Hi all,
I have a text file as follows:
>s1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP
MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ
FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD
>s2
MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP
LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE
DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGVIKQGWLHKANVNST
. . .
I want to count the letter 'P' in each sequence. The output should be:
> s1:10
> s2:20
To achieve this, I wrote the following Python script:
infile=open("file1.txt",'r')
out=open("file2.csv",'w')
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name=line
    else:
        pattern = line.count('P')
        print '%s:%s' %(name,pattern)
        out.write('%s:%s\n' %(name,pattern))
It reads the file line by line, so it gives one count per line instead of one per sequence:
> s1:2
> s1:3
> s1:5
> s2:10
> s2:10
But I would like to have the output as follows:
> s1:10
> s2:20 . . .
Can anybody help me with how to do this?
Thanks in advance, Ni
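
One possible way to get a single total per sequence, as a minimal sketch: keep a running count for the current header and only emit it when the next '>' line (or the end of the file) is reached. The function names count_residue and write_counts are made up for illustration, and the sketch is written for Python 3, whereas the script above uses the Python 2 print statement. The header line is kept exactly as it appears in the file, including the leading '>'.

```python
def count_residue(lines, residue="P"):
    """Return a list of (header, count) pairs, one per FASTA record."""
    results = []
    name = None   # header of the record currently being read
    total = 0     # running count for that record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                results.append((name, total))
            name = line
            total = 0
        elif name is not None:
            # Sequence line: add its count to the current record.
            total += line.count(residue)
    # The last record has no following header, so flush it here.
    if name is not None:
        results.append((name, total))
    return results


def write_counts(in_path, out_path, residue="P"):
    """Write one 'header:count' line per record to out_path."""
    with open(in_path) as infile, open(out_path, "w") as out:
        for name, total in count_residue(infile, residue):
            out.write("%s:%s\n" % (name, total))
```

With the file names from the script above, this would be called as write_counts("file1.txt", "file2.csv"). Accumulating first and writing at the record boundary is what avoids the one-line-per-sequence-line output.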