Hi, I am trying to parse a file containing multiple pairwise alignments into a table. For example, the input looks like this:
Query= m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/32965/0_332_clipped_50:0
(282 letters)
Query: 8 TTTTTGAACAGCCCCAACAACTCTTCCGCTGCCGGTTGCTGCA-TTCCAGTTGTTCCACA 66
||||||||||||||||||||||||||||||||||||||||||| |||||||||||| |||
Sbjct: 4045830 TTTTTGAACAGCCCCAACAACTCTTCCGCTGCCGGTTGCTGCACTTCCAGTTGTTC-ACA 4045772
Query: 67 GTCCAGCTCCAGTTCAACGTCGGTTTAAATCGTCG--AGCT-GTATGAGAGATAAGCATA 123
| ||||||||||||||||||||||| |||||||| |||| |||||||||||||||| |
Sbjct: 4045771 GGTCAGCTCCAGTTCAACGTCGGTTTTAATCGTCGCCAGCTGGTATGAGAGATAAGCA-A 4045713
Query= m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/56521/6_684_clipped_527:0
(151 letters)
Query: 1 CTTCAAAGAGGGAGAATTACGTCGATATTACCGAAGGCTGGGAGAAGGGTGAAAATACAA 60
||||||||||||||||||||||||||||||| ||||||||||||| |||||||| |||||
Sbjct: 1500035 CTTCAAAGAGGGAGAATTACGTCGATATTAC-GAAGGCTGGGAGA-GGGTGAAA-TACAA 1500091
Query: 61 G--AGACGCTCGGCGAGCTGGCGCCG-ACCGACGCCCACGTTAATCG-ATTAAACTGCGT 116
||||| ||||||||||| | ||| ||||||||| ||| |||||| ||||||||||||
Sbjct: 1500092 TGAAGACG-TCGGCGAGCTG-CACCGCACCGACGCCAACGGTAATCGTATTAAACTGCGT 1500149
I would like to turn this into a table like the one below:
m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/32965/0_332_clipped_50:0 '\t' TTTTTGAACAGCCCCAACAACTCTTCCGCTGCCGGTTGCTGCA-TTCCAGTTGTTCCACAGTCCAGCTCCAGTTCAACGTCGGTTTAAATCGTCG--AGCT-GTATGAGAGATAAGCATA
||||||||||||||||||||||||||||||||||||||||||| |||||||||||| |||| ||||||||||||||||||||||| |||||||| |||| |||||||||||||||| |
TTTTTGAACAGCCCCAACAACTCTTCCGCTGCCGGTTGCTGCACTTCCAGTTGTTC-ACAGGTCAGCTCCAGTTCAACGTCGGTTTTAATCGTCGCCAGCTGGTATGAGAGATAAGCA-A
m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/56521/6_684_clipped_527:0 '\t' CTTCAAAGAGGGAGAATTACGTCGATATTACCGAAGGCTGGGAGAAGGGTGAAAATACAAG--AGACGCTCGGCGAGCTGGCGCCG-ACCGACGCCCACGTTAATCG-ATTAAACTGCGT
||||||||||||||||||||||||||||||| ||||||||||||| |||||||| ||||| ||||| ||||||||||| | ||| ||||||||| ||| |||||| ||||||||||||
CTTCAAAGAGGGAGAATTACGTCGATATTAC-GAAGGCTGGGAGA-GGGTGAAA-TACAATGAAGACG-TCGGCGAGCTG-CACCGCACCGACGCCAACGGTAATCGTATTAAACTGCGT
I tried writing a Python program to do this, shown below:
#!/usr/bin/env python
import sys

class Fasta:
    def __init__(self, name, pwiseseq):
        self.name = name
        self.pwiseseq = pwiseseq

def read_pw(file):
    items = []
    index = 0
    for line in file:
        if not line.strip():
            continue
        if line.startswith("Query="):
            if index >= 1:
                items.append(aninstance)
            index += 1
            name = line[7:-1]
        if line.find('Query:') >= 0:
            QseqPW = ''
            QseqPW = (line[7:-1]).strip('0123456789 ')
            aninstance = Fasta(name, QseqPW)
    items.append(aninstance)
    return items

filePW = open(sys.argv[1], 'r').readlines()
mydatasets = read_pw(filePW)
for i in mydatasets:
    print i.name + '\t' + i.pwiseseq
Unfortunately, the output I get shows only the last alignment line for each sequence header, like this:
m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/32965/0_332_clipped_50:0 '\t' GTCCAGCTCCAGTTCAACGTCGGTTTAAATCGTCG--AGCT-GTATGAGAGATAAGCATA
m100529_140129_SMRT1_c0000010190006406181231110_s0_p0/56521/6_684_clipped_527:0 '\t' G--AGACGCTCGGCGAGCTGGCGCCG-ACCGACGCCCACGTTAATCG-ATTAAACTGCGT
Can anybody help me solve this? Thanks.
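For reference, the program above builds a new Fasta object on every "Query:" line, so each one replaces the previous one and only the last block per header survives. Below is a minimal sketch of one possible fix, not a definitive implementation: it appends every "Query:" and "Sbjct:" block to the current record instead of overwriting it. The names read_pairwise, match_line and ALIGN_RE are hypothetical helpers introduced here. The sketch assumes each alignment line has the form "label: start sequence end", and it rebuilds the match line from the two sequences (a "|" wherever the bases are identical) rather than reading it from the file, which assumes a nucleotide-style match line.

```python
import re
import sys

# Assumed line shape: "Query: 8 ACGT-... 66" or "Sbjct: 4045830 ACGT... 4045772".
ALIGN_RE = re.compile(r"^(Query|Sbjct):\s*\d+\s+([A-Za-z-]+)\s+\d+\s*$")


def read_pairwise(lines):
    """Return a list of (name, query_seq, sbjct_seq), one per 'Query=' header."""
    records = []
    name = None
    parts = {"Query": [], "Sbjct": []}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Query="):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, "".join(parts["Query"]), "".join(parts["Sbjct"])))
            name = line[len("Query="):].strip()
            parts = {"Query": [], "Sbjct": []}
            continue
        m = ALIGN_RE.match(line)
        if m and name is not None:
            # Accumulate the block instead of replacing the earlier ones.
            parts[m.group(1)].append(m.group(2))
    if name is not None:
        records.append((name, "".join(parts["Query"]), "".join(parts["Sbjct"])))
    return records


def match_line(query, sbjct):
    """Rebuild the match line: '|' for identical bases, ' ' for mismatches and gaps."""
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(query, sbjct))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, query, sbjct in read_pairwise(handle):
            print(name + "\t" + query)
            print(match_line(query, sbjct))
            print(sbjct)
```

Run as a script with the alignment file as the first argument, this should print each header followed by a tab and the full query sequence, then the rebuilt match line and the subject sequence on the following two lines.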