Hi,
I am comparing 2000 files against one other file. I want the program to go through each line in both files and compare them. If a line is present in both, it has to be written to another file. What I tried was to open both files and use readlines() to read each into a list. Then I used a for loop like this:
import os
import string
from itertools import izip

chain_sep = []
# Read the tab-separated list of complexes
complex_file = open("1complex.txt", "r")
complex_lines = complex_file.readlines()
complex_lines = map(string.strip, complex_lines)
splitter = [s.split('\t') for s in complex_lines]
complex_file.close()
# Collect every ".pd" file in the current directory
for file in os.listdir("."):
    basename = os.path.basename(file)
    if basename.endswith(".pd"):
        chain_sep.append(basename)
for (i, s) in izip(chain_sep, splitter):
    fhandle_6 = open(i, "r")
    from_pd = fhandle_6.readlines()
    from_pd = map(string.strip, from_pd)
    fhandle_6.close()
    fhandle_13 = open(s[0] + ".cr", 'r')
    fhandle_13_l = fhandle_13.readlines()
    fhandle_13_l = map(string.strip, fhandle_13_l)
    fhandle_13.close()
    fopen_7 = open(i + "r.pdb", "w")
    fopen_8 = open(i + "l.pdb", "w")
    for (a, y) in izip(from_pd, fhandle_13_l):  # from_pd and fhandle_13_l are not of the same length :(
        if a[0:4] == "ATOM":
            if a[21] == "R":
                print >>fopen_7, a
            else:
                if a[7:13] == y[7:13]:
                    print >>fopen_8, a
    fopen_7.close()
    fopen_8.close()
The above code is only a chunk, by the way. My problem is that the two files are not the same size, so I feel that using zip or izip is not ideal in this situation. A part of the files I have to deal with is below:
file-1
ATOM 2197 [b]CB CYS I 51[/b] 38.091 -13.002 6.320 1.00 20.12
ATOM 2198 [b]SG CYS I 51[/b] 39.781 -12.827 5.691 1.00 26.67
ATOM 2199 [b]N MET I 52[/b] 37.845 -15.766 5.722 1.00 33.08
ATOM 2200 [b]CA MET I 52[/b] 38.312 -17.144 5.674 1.00 33.08
file-2
ATOM 2197 [b]O ASP L 50[/b] 18.653 89.329 84.802 1.00 0.00
ATOM 2198 [b]CB ASP L 50[/b] 16.004 87.278 84.523 1.00 0.00
ATOM 2199 [b]CG ASP L 50[/b] 15.349 86.109 85.277 1.00 0.00
ATOM 2200 [b]OD1 ASP L 50[/b] 15.347 85.935 86.514 1.00 0.00
The only part that is common to both files is the one in bold (the above is just a chunk of each file). So ideally I have to compare the bold data from file 1 and, if it exists in file 2, retain it and remove the remaining data.
For example:
[b]CB CYS I 51[/b]
[b]CB CYS I 51[/b]
If the above entry is present in both files, then I have to retain it in file-2 and remove all other entries. I tried to add the required list positions to the sample code you gave me, but I failed to get the results. Please let me know whether I can differentiate the above data and, if so, how I can do it. I tried the same in Perl and was able to do it very easily, but doing it in Python is proving tougher for me as I am very new to Python (learning for the past week or so).
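To make the goal concrete, here is a rough sketch of the comparison described above. It is only an illustration under assumptions, not working code from my script: the names atom_key and keep_common are hypothetical, the key is assumed to be the four whitespace-separated fields shown in bold (atom name, residue, chain, residue number), and the last line of file_2 is made up so that there is a match. Unlike zip or izip, which pair lines by position and stop at the shorter list, a set lookup does not depend on the order or the length of either file. If the real files use fixed-width columns where fields can run together, slicing by column would probably be safer than split().

```python
def atom_key(line):
    """Return (atom, residue, chain, residue number) for an ATOM line, else None."""
    fields = line.split()
    # Assumption: fields are whitespace-separated, as in the sample above
    if len(fields) < 6 or fields[0] != "ATOM":
        return None
    return tuple(fields[2:6])


def keep_common(reference_lines, target_lines):
    """Keep the ATOM lines of target_lines whose key also occurs in reference_lines."""
    wanted = set()
    for line in reference_lines:
        key = atom_key(line)
        if key is not None:
            wanted.add(key)
    # Non-ATOM lines have key None, which is never in wanted, so they are dropped
    return [line for line in target_lines if atom_key(line) in wanted]


file_1 = [
    "ATOM 2197 CB CYS I 51 38.091 -13.002 6.320 1.00 20.12",
    "ATOM 2198 SG CYS I 51 39.781 -12.827 5.691 1.00 26.67",
]
file_2 = [
    "ATOM 2198 CB ASP L 50 16.004 87.278 84.523 1.00 0.00",
    "ATOM 2201 CB CYS I 51 1.000 2.000 3.000 1.00 0.00",  # made-up line for illustration
]
result = keep_common(file_1, file_2)
```

Here result would hold only the CB CYS I 51 line from file_2, which is the behaviour I am after; the lines could then be written to the output file one by one.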
Cheers,
Chav