The input file looks like this:
>sp|P81928|140U_DROME
67 198 Tim17 8.9e-19 No_clan
>sp|P20905|5HT1R_DROME
179 507 7tm_1 1.1e-97 CL0192
>sp|P28285|5HT2A_DROME
243 805 7tm_1 3.2e-73 CL0192
>sp|P28286|5HT2B_DROME
107 588 7tm_1 7.2e-82 CL0192
>sp|P13368|7LESS_DROME
439 520 fn3 1.4e-10 CL0159
1313 1380 fn3 3.4e-05 CL0159
1800 1890 fn3 3.6e-12 CL0159
2209 2481 Pkinase_Tyr 3.7e-91 CL0016
>sp|P14599|A4_DROME
26 198 A4_EXTRA 4.9e-75 No_clan
826 884 APP_amyloid 1.5e-24 No_clan
>sp|P91927|A60DA_DROME
178 446 LETM1 3.7e-114 No_clan
>sp|Q24093|ABHD2_DROME
158 362 Abhydrolase_1 5.5e-07 CL0028
>sp|Q9VIP7|ACASE_DROME
19 266 aPHC 8e-64 No_clan
>sp|P07140|ACES_DROME
26 601 COesterase 3.6e-172 CL0028
>sp|P09478|ACH1_DROME
26 240 Neur_chan_LBD 1.3e-78 No_clan
247 530 Neur_chan_memb 5.8e-78 No_clan
>sp|P17644|ACH2_DROME
46 261 Neur_chan_LBD 2.3e-76 No_clan
268 543 Neur_chan_memb 1.1e-73 No_clan
>sp|P04755|ACH3_DROME
28 236 Neur_chan_LBD 9.5e-71 No_clan
243 498 Neur_chan_memb 1.9e-69 No_clan
>sp|P25162|ACH4_DROME
31 245 Neur_chan_LBD 3e-73 No_clan
252 479 Neur_chan_memb 2.2e-71 No_clan
>sp|P16395|ACM1_DROME
121 318 7tm_1 2.2e-57 CL0192
695 772 7tm_1 3.3e-20 CL0192
>sp|Q9VAC5|ADA17_DROME
32 160 Pep_M12B_propep 3.7e-13 No_clan
394 464 Reprolysin 2.2e-05 CL0126
477 555 Disintegrin 2.9e-13 No_clan
>sp|Q9VW60|ADCY2_DROME
310 490 Guanylate_cyc 7.2e-54 CL0276
1102 1300 Guanylate_cyc 1.6e-55 CL0276
>sp|Q9VCY8|ADRL_DROME
198 419 HlyIII 3e-71 No_clan
>sp|Q26365|ADT_DROME
21 114 Mito_carr 8.1e-27 No_clan
127 217 Mito_carr 5.3e-23 No_clan
224 312 Mito_carr 8.4e-14 No_clan
The program I wrote is given below:
infile = open('memb_protein2.hmmout', 'r')
rec = infile.read()
infile.close()
# Each record starts with '>'; the text before the first '>' is empty.
records = rec.split('>')[1:]
for record in records:
    # First line is the protein header; every following non-empty line is one domain hit.
    rows = [row for row in record.split('\n') if row.strip()]
    protein = rows[0]
    dom_present = rows[1:]
    #print dom_present
    # Keep only proteins with more than one domain hit.
    if len(dom_present) > 1:
        line = []
        for hit in dom_present:
            #print hit
            # Columns: start, end, domain name, E-value, clan.
            dom = hit.split()
            #print dom[2]
            line.append(dom[2])
        # One output line per protein: ID followed by its domain names, tab-separated.
        print(protein + '\t' + '\t'.join(line))
#seq = ''.join(line)
#file('multi_dom','a').write(seq)
The result comes out as follows:
tr|B7YZE8|B7YZE8_DROME Ion_trans_N Ion_trans cNMP_binding
tr|B7Z145|B7Z145_DROME EGF_2 EGF_2 EGF_2 EGF_2 EGF_2 EGF_2 EGF_2
tr|Q8MQM7|Q8MQM7_DROME ANF_receptor Lig_chan-Glu_bd Lig_chan
tr|Q9VVS0|Q9VVS0_DROME Mito_carr Mito_carr Mito_carr
tr|O18367|O18367_DROME Na_Ca_ex Calx-beta Calx-beta Na_Ca_ex
tr|Q8IQW3|Q8IQW3_DROME Mito_carr Mito_carr
tr|Q8IGX4|Q8IGX4_DROME Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin
tr|Q9VFH5|Q9VFH5_DROME Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin Cadherin
tr|Q7KT97|Q7KT97_DROME Neur_chan_LBD Neur_chan_memb
tr|Q8SZN6|Q8SZN6_DROME Mito_carr Mito_carr
tr|Q8T0K9|Q8T0K9_DROME SBP_bac_3 Lig_chan
tr|Q9BML7|Q9BML7_DROME ANF_receptor 7tm_3
tr|Q0KIB1|Q0KIB1_DROME V-set I-set C2-set_2 I-set
tr|A1Z855|A1Z855_DROME Ion_trans KCNQ_channel
tr|A1Z9L9|A1Z9L9_DROME Mito_carr Mito_carr Mito_carr
tr|A8QI34|A8QI34_DROME Cation_ATPase_N E1-E2_ATPase Hydrolase Cation_ATPase_C
tr|Q9VSV5|Q9VSV5_DROME ANF_receptor Lig_chan-Glu_bd Lig_chan
tr|Q95SN5|Q95SN5_DROME Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a EGF_CA Ldl_recept_b Ldl_recept_b Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a Ldl_recept_a EGF_CA Ldl_recept_b
tr|Q24273|Q24273_DROME V-set C2-set_2 C2-set_2 I-set
tr|Q8IQQ6|Q8IQQ6_DROME SNF SNF
tr|Q8IPK1|Q8IPK1_DROME Mito_carr Mito_carr
tr|Q7PLW4|Q7PLW4_DROME Na_Ca_ex Na_Ca_ex
tr|Q5BHU9|Q5BHU9_DROME Mito_carr Mito_carr
tr|Q2PDZ0|Q2PDZ0_DROME Neur_chan_LBD Neur_chan_memb
tr|Q0E8N6|Q0E8N6_DROME ANF_receptor SBP_bac_3 Lig_chan
tr|Q0KIF2|Q0KIF2_DROME ANF_receptor Lig_chan-Glu_bd Lig_chan
tr|A1Z7J1|A1Z7J1_DROME I-set C2-set_2 C2-set_2 C2-set_2 C2-set_2 C2-set_2 I-set I-set I-set fn3
tr|A8DYR5|A8DYR5_DROME K_tetra Ion_trans
tr|B7YZR4|B7YZR4_DROME Ion_trans KCNQ_channel
tr|B7Z015|B7Z015_DROME C2-set_2 I-set I-set fn3
tr|Q9VYY5|Q9VYY5_DROME Ion_trans_2 Ion_trans_2
tr|Q8IN24|Q8IN24_DROME ANF_receptor 7tm_3
tr|Q7KUV2|Q7KUV2_DROME Neur_chan_LBD Neur_chan_memb
tr|Q3ZZY0|Q3ZZY0_DROME Ion_trans_2 Ion_trans_2
tr|Q0E9F2|Q0E9F2_DROME I-set C2-set_2 C2-set_2 C2-set_2 C2-set_2 C2-set_2 I-set I-set I-set fn3
tr|Q0E8B8|Q0E8B8_DROME V-set C2-set_2 C2-set_2 I-set
tr|A1Z9M0|A1Z9M0_DROME Mito_carr Mito_carr Mito_carr
tr|A1Z9P0|A1Z9P0_DROME Ion_trans_N Ion_trans cNMP_binding
tr|A8DYJ6|A8DYJ6_DROME V-set C2-set_2 I-set
tr|B7Z0Z2|B7Z0Z2_DROME Ion_trans DUF3451 Ion_trans Na_trans_assoc Ion_trans Ion_trans
tr|Q9VVM6|Q9VVM6_DROME Sulfate_transp STAS
tr|Q9VB20|Q9VB20_DROME PSI PSI
tr|Q967X6|Q967X6_DROME V-set C2-set_2 C2-set_2 I-set fn3
tr|Q8IQN2|Q8IQN2_DROME Voltage_CLC CBS CBS
Ignore the commented-out statements in the code. The results shown here are only an example: the program produces output of this kind containing more than 30,000 lines. I want to compare the result lines with each other and find out how many of the entries are similar to each other.
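To make the comparison concrete, here is a minimal sketch of one possible approach. It assumes that "similar" means either an identical ordered list of domains or a high overlap between the sets of domain names (Jaccard index). Both definitions, and the helper names `parse_architectures`, `group_identical` and `jaccard`, are illustrative assumptions and not part of the original program.

```python
from collections import defaultdict

def parse_architectures(lines):
    """Map each protein ID to its ordered tuple of domain names."""
    archs = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # protein ID plus at least one domain
            archs[fields[0]] = tuple(fields[1:])
    return archs

def group_identical(archs):
    """Group proteins whose domain order is exactly the same."""
    groups = defaultdict(list)
    for protein, arch in archs.items():
        groups[arch].append(protein)
    return groups

def jaccard(a, b):
    """Fraction of distinct domain names shared by two architectures."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / float(len(sa | sb))

# A few lines taken from the example output above.
sample = [
    "tr|Q9VVS0|Q9VVS0_DROME\tMito_carr\tMito_carr\tMito_carr",
    "tr|A1Z9L9|A1Z9L9_DROME\tMito_carr\tMito_carr\tMito_carr",
    "tr|Q8MQM7|Q8MQM7_DROME\tANF_receptor\tLig_chan-Glu_bd\tLig_chan",
    "tr|Q0E8N6|Q0E8N6_DROME\tANF_receptor\tSBP_bac_3\tLig_chan",
]

archs = parse_architectures(sample)
groups = group_identical(archs)

# Report every architecture that occurs in more than one protein.
for arch, proteins in groups.items():
    if len(proteins) > 1:
        print("%d proteins share: %s" % (len(proteins), " ".join(arch)))

# Partial similarity between two non-identical architectures.
score = jaccard(archs["tr|Q8MQM7|Q8MQM7_DROME"], archs["tr|Q0E8N6|Q0E8N6_DROME"])
print("Jaccard similarity: %.2f" % score)
```

Grouping identical architectures first keeps the work manageable for 30,000 lines, because the pairwise Jaccard comparison then only has to run over the distinct architectures rather than over every pair of proteins.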