Hi all,
Hope everyone's doing well.
I have two tab-delimited files with 3 columns each; file1 contains 600050 rows and file2 contains 11221133 rows.
I am comparing file2 against file1 to match common entries in the first two columns: if file1[0:2] is in file2[0:2], write file2[0:2] + column 3 of file2, else write file1[0:2] + 5.
I did this using two dictionaries, with the elements of columns [0:2] as the key and column 3 as the value, but loading file2 into a dictionary gave a memory error:
dict2[item1,item2] = item3
MemoryError
I also tried a list, but that gave a memory error too. Sets work, but there are huge numbers of duplicates and the result is unordered. I want to preserve the same order as file1.
Please find the test files attached.
f1 = open(file1)
f2 = open(file2)  # very very large file
f3 = open(file3, 'w')
dict1 = {}
dict2 = {}
# Key each file on its first two columns; the third column is the value.
for line in f1:
    lstrip = line.strip('\n')
    item1, item2, item3 = lstrip.split()
    dict1[item1, item2] = item3
for line in f2:
    lstrip = line.strip('\n')
    item1, item2, item3 = lstrip.split()
    dict2[item1, item2] = item3  # here it's giving the memory error
for item in dict1.keys():
    if item in dict2:
        data = item[0] + '\t' + item[1] + '\t' + dict2[item] + '\n'
        f3.write(data)
    else:
        data = item[0] + '\t' + item[1] + '\t' + str(0) + '\n'
        f3.write(data)
f1.close()
f2.close()
f3.close()
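
One direction that might avoid the MemoryError is to keep only file1 (the small file) in memory and read file2 one line at a time, storing a value only when its key already exists in file1. Below is a minimal sketch of that idea, not a tested solution for the real data. It assumes that file1 fits in memory, that both files are whitespace-separated with at least three columns, and that the last matching row in file2 should win when a key is duplicated (the same behaviour as dict2 above). The function name merge_files and the default parameter are hypothetical; the default of "0" follows the else branch of the code above. OrderedDict is used so the output keeps file1's order.

```python
from collections import OrderedDict


def merge_files(path1, path2, out_path, default="0"):
    """Write one row per file1 key, in file1 order.

    The third column comes from file2 when the key matches,
    otherwise it is `default` (hypothetical parameter).
    """
    # Hold only the small file in memory; every key starts at the default.
    result = OrderedDict()
    with open(path1) as f1:
        for line in f1:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            result[(parts[0], parts[1])] = default

    # Stream the large file; a row is kept only if its key is in file1.
    with open(path2) as f2:
        for line in f2:
            parts = line.split()
            if len(parts) < 3:
                continue
            key = (parts[0], parts[1])
            if key in result:
                result[key] = parts[2]  # last match wins on duplicates

    # Write the rows back out in file1's original order.
    with open(out_path, "w") as out:
        for (a, b), value in result.items():
            out.write(a + "\t" + b + "\t" + value + "\n")
```

With this layout the memory use depends on the number of distinct keys in file1 rather than on the size of file2, and duplicate keys in file1 collapse to a single output row at the position of their first occurrence.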