I have a file with a large amount of data in it, and some of the rows are repeated. The basic idea is: sort the data, remove duplicates based on the first field, and then print the whole line.
I have tried the following, but it does not help:
#!/bin/sh

if [ $# -ne 1 ]
then
    echo "Usage - $0 file-name"
    exit 1
fi

if [ -f "$1" ]
then
    echo "$1 file exists"
    # Drops a line only when it repeats in full (compares the whole line)
    sort -u "$1" > results.csv
    # !x[$0]++ also keys on the whole line, so it removes nothing new here
    awk '!x[$0]++' results.csv > results-new.csv
else
    echo "Sorry, $1 file does not exist"
fi
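Reading the man pages again, I think the problem is that both steps compare the whole line: sort -u removes a line only when it repeats in full, and awk '!x[$0]++' keys on $0 as well, so neither of them deduplicates on the first field. What I am after is something like the sketch below. This assumes whitespace-separated fields; if the file is comma-separated, I believe the delimiter has to be passed explicitly (sort -t, -k1,1 and awk -F,):

# Sort by the first field only, then keep the first line seen for each
# distinct value of field 1, printing the whole line
sort -k1,1 "$1" | awk '!seen[$1]++' > results-new.csv

Is this the right approach, and will the stock HP-UX sort and awk handle it?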
Input data and expected output data are attached.
I am trying this on HP-UX.