I have a large text file with thousands of entries on it (each on a new line) which were copied and pasted from different sources, but the problem is that hundreds of the entries are likely to be there more than once. So I would like a batch program to look at this file (let's call it list.txt) and remove anything that appears twice (or more times) leaving only one of every entry. For example:

a
b
c
a
a
b

would become:

a
b
c

(Any duplicates have been removed, leaving only one of each unique entry in the list.)

Could answerers please leave the code in their answer? I will create a batch file from it. For a better chance of best answer, please include explanations of what the different commands are doing. Thanks.

2006-08-03 05:10:44 · 10 answers · asked by Rich 5 in Computers & Internet Programming & Design

Unfortunately, arnold's code does not work as an MS-DOS batch file (as he pointed out, it is Unix). However, the method of making a new list without duplicates is OK, if that helps anyone out there who has decided to help me!

2006-08-03 05:25:07 · update #1

I am using Windows XP and unfortunately my skills as a programmer are... ahem.. not too brilliant as of yet!

2006-08-03 05:33:14 · update #2

10 answers

Use:

C:\dedup "C:\My Documents\list.txt"

You first need to create dedup.bat in the root of your C: drive. It should contain the code below:


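@echo off
setlocal
REM Show the syntax help if no argument was given or the file does not exist.
if {%1} EQU {} goto syntax
if not exist %1 goto syntax
REM Store the input file name wrapped in exactly one pair of quotes.
set file=%1
set file="%file:"=%"
REM Build the name of a temporary work file in the same folder as the input file.
set work=%~pd1\%~nx1.tmp
set work="%work:"=%"
set work=%work:\\=\%
REM Sort the input into the work file so that duplicate lines end up next to each other.
sort %file% /O %work%
REM Delete the original file. It is rebuilt below without the duplicates.
del /f /q %file%
REM findstr /n puts a line number and a colon in front of every line, so for /f does not skip blank lines.
for /f "Tokens=1* Delims=:" %%s in ('findstr /n /v /c:"dO nOt FiNd" %work%') do set record=###%%t###&call :output
REM The sorted work file is left on disk. Remove the REM on the next line to delete it.
REM if exist %work% del /q %work%
endlocal
goto :EOF
:syntax
@echo ***************************
@echo Syntax: dedup "Input_File"
@echo ***************************
goto :EOF
:output
REM Called once per sorted line. A line equal to the previous one is skipped.
if not defined prev_rec goto :write
if "%record%" EQU "%prev_rec%" goto :EOF
:write
set prev_rec=%record%
REM Strip the ### markers that were added around the line in the for loop.
set record=%record:###=%
if "%record%" EQU "" goto :blknul
if "%record%" GTR " " @echo>>%file% %record%&goto :EOF
:blknul
REM Empty or blank lines end up here. Only one blank line is ever written.
if defined bn_rec goto :EOF
set bn_rec=Y
@echo.>>%file%


Note the following:
The line beginning with "for /f" (excerpted below) may be displayed as two lines, but when you create the batch file, the code must be all on one line.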

for /f "Tokens=1* Delims=:" %%s in ('findstr /n /v /c:"dO nOt FiNd" %work%') do set record=###%%t###&call :output
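
For readers who find the batch syntax hard to follow, the same idea (sort first, then write a line only if it differs from the one before it) can be sketched in Python. This is only an illustration of the technique, not a translation of the batch file: the function name dedup_sorted is made up for this sketch, and the special handling of blank lines is left out.

```python
def dedup_sorted(lines):
    """Sort the lines, then keep each line only if it differs from the previous one."""
    result = []
    prev = None  # plays the role of prev_rec in the batch file
    for record in sorted(lines):
        if prev is not None and record == prev:
            continue  # same as the line before it, so it is a duplicate
        result.append(record)
        prev = record
    return result


if __name__ == "__main__":
    print(dedup_sorted(["a", "b", "c", "a", "a", "b"]))  # prints ['a', 'b', 'c']
```

Sorting is what makes the single prev variable sufficient: once equal lines sit next to each other, one comparison per line finds every duplicate.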

2006-08-05 19:59:12 · answer #1 · answered by Mowgli 6 · 0 0

1

2017-01-20 06:21:12 · answer #2 · answered by ? 3 · 0 0

Here is a way that you can do it using Excel or some other spreadsheet.

Import the file into Excel. If you have one entry per text line, you will get one entry in the first cell of each row.

Sort the file so that duplicate rows sort together.

In column B, row 2, enter the formula =IF(A2=A1, 1, 0)

Copy that formula down all the rows of the spreadsheet.

It will be 0 for the first item of each duplicate set and 1 for every repeat. Enter 0 in B1 by hand, since the first row has nothing above it to compare with.

Save the page as a text file. The formulas go away and you save only the text and the 0's and 1's.

Reopen the saved file in Excel and sort on column B. All the 0 items will be together. They are your unique items. Delete the other rows, clear column B, and save.
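
The flag column from the spreadsheet steps above can be sketched in Python as follows. The function names flag_duplicates and unique_rows are invented for this illustration, and the list of pairs stands in for columns A and B.

```python
def flag_duplicates(entries):
    """Return (value, flag) pairs: flag is 1 when a row equals the row above it."""
    rows = sorted(entries)  # the "sort the file" step
    # The first row gets 0 by hand; every other row is compared with the one above.
    flags = [0] + [1 if cur == prev else 0 for prev, cur in zip(rows, rows[1:])]
    return list(zip(rows, flags))


def unique_rows(flagged):
    """Keep only the rows whose flag is 0, as in the final sort-and-delete step."""
    return [value for value, flag in flagged if flag == 0]


if __name__ == "__main__":
    flagged = flag_duplicates(["a", "b", "c", "a", "a", "b"])
    print(unique_rows(flagged))  # prints ['a', 'b', 'c']
```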

2006-08-03 06:16:12 · answer #3 · answered by rt11guru 6 · 0 0

I am not a master at batch and I would not be able to give you your entire answer, but I know enough to say you would need to have the program check a dictionary, and you might need to build the dictionary into the batch file. You would also need to make it relate misspellings to the words you were trying to write without having it consider a real word as a misspelling of another one. You would have to tell it to let you choose the word that corresponds to a misspelled word with more than one possible correction.

2016-03-26 21:40:13 · answer #4 · answered by Anonymous · 0 0

Using only the simple tools that come with the current distribution of Windows DOS (I've only ever heard "batch programs" used to mean DOS scripts), you cannot do this task. Try making an exe to do it, or get someone to make one for you. It would only take a few lines of C. If you have Microsoft Office, I can give you a VBA program which will do the job.

2006-08-03 05:28:24 · answer #5 · answered by Anonymous · 0 0

Which OS are you using?

If you know the C language, you can easily achieve this using file and string operations.
In Oracle SQL*Plus, you can load the file into a temporary table using bulk loading and then write a simple DISTINCT query to retrieve the unique values.

If you need a C or Oracle PL/SQL program for this, let me know.
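
As a rough illustration of the "load into a table, then select distinct" idea, here is a sketch that uses Python's built-in sqlite3 module in place of Oracle. SQLite is a substitution made for this example, and the table name entries, column name value, and function name distinct_entries are all made up.

```python
import sqlite3


def distinct_entries(lines):
    """Load the lines into a temporary in-memory table and read back the distinct values."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    try:
        conn.execute("CREATE TABLE entries (value TEXT)")
        conn.executemany(
            "INSERT INTO entries (value) VALUES (?)",
            [(line,) for line in lines],
        )
        rows = conn.execute(
            "SELECT DISTINCT value FROM entries ORDER BY value"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(distinct_entries(["a", "b", "c", "a", "a", "b"]))  # prints ['a', 'b', 'c']
```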

2006-08-03 05:30:20 · answer #6 · answered by Pands 2 · 0 0

Instead of a program, this can easily be done using the 'sort' utility in JCL.
In the mainframe world, use 'sort' / 'syncsort' / 'dfsort', whichever is installed on your machine,
then code the JCL. It should read as follows:

//PS010 EXEC SORT
//SORTIN DD DSN="your input file name here",
// DISP=SHR
//SORTOUT DD DSN="your output file here",
// DISP=(,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(10,10),RLSE)
//SYSIN DD *
SORT FIELDS=(01,01,A),FORMAT=CH
SUM FIELDS=NONE
/*
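Note the SYSIN parameters; those are the important ones.
The 'SUM FIELDS=NONE' statement is the one that removes the duplicates: records whose sort key is equal are reduced to a single record. Just remember to use the proper field position and length in the 'SORT FIELDS' syntax. As written, (01,01,A) starts at position 1 with a length of 1, so only the first character of each record is compared.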
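
To show why the field position and length matter, here is a small Python sketch of deduplicating on a key taken from a fixed position in each record. It only mimics the idea and is not how the mainframe sort is implemented. The name dedup_by_key is made up, and keeping the first record in input order among equal keys is a choice made for this sketch.

```python
def dedup_by_key(records, start=1, length=1):
    """Sort on a fixed-position key and keep one record per key value.

    start is 1-based, like the position in SORT FIELDS=(start,length,A).
    """
    def key(record):
        return record[start - 1:start - 1 + length]

    seen = set()
    kept = []
    # sorted() is stable, so among equal keys the earliest input record comes first
    for record in sorted(records, key=key):
        k = key(record)
        if k in seen:
            continue  # another record with this key was already kept
        seen.add(k)
        kept.append(record)
    return kept


if __name__ == "__main__":
    data = ["apple", "avocado", "banana"]
    print(dedup_by_key(data, 1, 1))  # prints ['apple', 'banana']
    print(dedup_by_key(data, 1, 7))  # prints ['apple', 'avocado', 'banana']
```

With a key length of 1, "avocado" is dropped even though it is not a duplicate line, which is why the length should cover the whole entry.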

Since you are thinking of writing a 'batch program' and running it, I am assuming that you are working in the mainframe world and will be executing the program using JCL.

Hope it helps.

2006-08-03 05:21:48 · answer #7 · answered by Anonymous · 0 0

If you have Access, then import the file into a table, select the distinct entries, and export them again as a text file.

2006-08-03 16:01:20 · answer #8 · answered by CeeVee 3 · 0 0

sort -u filename.txt > newfilename.txt

(but it's unix)
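
On a machine without the Unix sort command, roughly the same result can be sketched in Python. The function name sort_unique and the file names in the demo are placeholders. The sketch assumes the file is UTF-8 text and orders lines by character code, which may differ from the locale-based ordering that sort uses.

```python
import os
import tempfile


def sort_unique(src_path, dst_path):
    """Write the distinct lines of src_path to dst_path in sorted order."""
    with open(src_path, encoding="utf-8") as src:
        entries = {line.rstrip("\n") for line in src}  # a set keeps one of each
    with open(dst_path, "w", encoding="utf-8") as dst:
        for entry in sorted(entries):
            dst.write(entry + "\n")


if __name__ == "__main__":
    # Demo on a throwaway copy of the example list from the question
    folder = tempfile.mkdtemp()
    src = os.path.join(folder, "list.txt")
    dst = os.path.join(folder, "newlist.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("a\nb\nc\na\na\nb\n")
    sort_unique(src, dst)
    with open(dst, encoding="utf-8") as f:
        print(f.read())
```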

2006-08-03 05:16:07 · answer #9 · answered by arnold 3 · 0 0

sorry

2006-08-06 06:08:49 · answer #10 · answered by Anonymous · 0 0