Hi Daniweb members,
I have two files that are not of equal dimensions and I need to match them on a unique key much like an SQL natural join. The files are space delimited and contain no headings.
File 1
rs121 10001 A G
rs125 10002 C T
rs126 10003 T A
File 2
rs121 11001
rs122 11002
rs126 11004
rs125 11003
Required result:
file 3:
rs121 11001 A G
rs125 11003 C T
rs126 11004 T A
#all the fields from file two replace the first two columns of file 1 and the rest of file 1 is maintained as is.
The two files would have about 600 000 rows and 6042 columns so it is rather a large task to accomplish. I've started using awk:
awk 'BEGIN {OFS=" "} NR==FNR { a[$2]=$0 ;next} { for ( i in a) { if ($2==i) { print a[i],$4,$5} }}' file2 file1 > file3
This produces the following result:
file3
rs121 11001
A G
rs125 11003
C T
rs126 11004
T A
How do I remove the carriage returns? I think this is the problem?
I need to include another loop that will tell awk to print to the end of the file, I suppose something like this:
awk '{for(i=3;i<=NF;++i)print $i}'
but I'm not sure where to insert it into the above loop?
Many thanks,
Newbi