I am stuck with large text files which I have to merge and further work with my model.
I tried to follow the previous thread in text merging
http://www.daniweb.com/forums/thread38625.html
Here is the script which I used:
one = open("one.txt",'r')
two = open("two.txt",'r')
ofh = open("out.txt",'w')
# read in the first file and create a dict:
d = {}
for line in one:
# we remove the newline and split the line
k = line.strip().split('\t')
d = k[0]
for line in two:
result = ""
h= line.strip().split('\t')[0]
# for every character in the line ...
for i in h:
# check if there is an entry for it in the dict
if i in d:
# if yes, add the value to the result-string
result += d[i]
else: pass
# print the result
print>>ofh, result
ofh.close()
one.close()
two.close()
but keep getting error message and could not go further. I think I have not got the main point in the whole process so I am lost.
file one looks:
1 UK IR
1 UK SC
2 US WS
2 US CL
2 US ND
2 US TX
2 US NB
2 US SC
3 Germany WW
3 Germany BR
3 Germany DR
4 France PR
4 France ST
> it has over 2 million rows of diffrent countries
file two looks:
1 UK 2.3 3.1 5.3
2 US 3.3 3.4 2.3
3 Germnay 1.3 5.1 4.1
4 France 2.3 3.1 3.3
> file two has about 2 thousands entries
The two files are to be combined based on the first coloumns!
The output file should look like:
1 UK IR 2.3 3.1 5.3
1 UK SC 2.3 3.1 5.3
2 US WS 3.3 3.4 2.3
2 US CL 3.3 3.4 2.3
2 US ND 3.3 3.4 2.3
2 US TX 3.3 3.4 2.3
2 US NB 3.3 3.4 2.3
2 US SC 3.3 3.4 2.3
3 Germany WW 1.3 5.1 4.1
3 Germany BR 1.3 5.1 4.1
3 Germany DR 1.3 5.1 4.1
4 France PR 2.3 3.1 3.3
4 France ST 2.3 3.1 3.3
The files are larg grid files converted into text and have to be combined. It does not matter if all content of file two are joined with file one.
Any suggestions is greatly appreciated. I am stucked with this and could not go further with my work.
chebude