Hi, I am stuck again on what is probably a trivial problem. I have a text file containing a large number of entries like this:
proteinid
sp|P13368|7LESS_DROME fn3 fn3 fn3 Pkinase_Tyr
sp|P14599|A4_DROME A4_EXTRA APP_amyloid
sp|P09478|ACH1_DROME Neur_chan_LBD Neur_chan_memb
sp|P17644|ACH2_DROME Neur_chan_LBD Neur_chan_memb
sp|P04755|ACH3_DROME Neur_chan_LBD Neur_chan_memb
sp|P25162|ACH4_DROME Neur_chan_LBD Neur_chan_memb
sp|P16395|ACM1_DROME 7tm_1 7tm_1
sp|Q9VAC5|ADA17_DROME Pep_M12B_propep Reprolysin Disintegrin
sp|Q9VW60|ADCY2_DROME Guanylate_cyc Guanylate_cyc
sp|Q26365|ADT_DROME Mito_carr Mito_carr Mito_carr
sp|Q8INB9|AKT1_DROME PH Pkinase Pkinase_C
sp|P15364|AMAL_DROME I-set I-set I-set
sp|P91926|AP2A_DROME Adaptin_N Alpha_adaptinC2 Alpha_adaptin_C
sp|P54362|AP3D_DROME Adaptin_N BLVR
sp|P18824|ARM_DROME Arm Arm Arm Arm
sp|P22700|ATC1_DROME Cation_ATPase_N E1-E2_ATPase Hydrolase Cation_ATPase_C
sp|P13607|ATNA_DROME Cation_ATPase_N E1-E2_ATPase Hydrolase Cation_ATPase_C
sp|P35381|ATPA_DROME ATP-synt_ab_N ATP-synt_ab ATP-synt_ab_C
sp|Q05825|ATPB_DROME ATP-synt_ab_N ATP-synt_ab ATP-synt_ab_C
sp|P12428|BROWN_DROME ABC_tran ABC2_membrane
sp|Q7KT91|C3390_DROME GBA2_N DUF608
sp|Q9VYY4|C4G15_DROME p450 p450
sp|P91645|CAC1A_DROME Ion_trans Ion_trans Ion_trans Ion_trans
In this file, each line gives a protein ID followed by the domains present in that protein. I want to know which proteins are similar in domain content and which proteins differ in domain content. The lines above are only an example; the real file has more than 30,000 entries, so it is impossible to check by eye.
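
One possible way to approach this is sketched below in Python. It is only a sketch under some assumptions that the post does not settle: fields are separated by whitespace, "similar in domain content" means "has the same set of domains" (repeats and order ignored by default, with a switch to require the exact ordered list), and partial similarity is scored as the Jaccard index of the two domain sets. The function names (parse_entries, group_by_domain_content, jaccard) and the file name in the comment are made up for illustration.

```python
from collections import defaultdict


def parse_entries(lines):
    """Yield (protein_id, domains) for every line that lists at least one domain."""
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skips blank lines and a lone header such as "proteinid"
        yield fields[0], fields[1:]


def group_by_domain_content(entries, ignore_repeats=True):
    """Map each distinct domain content to the list of proteins that have it.

    ignore_repeats=True  -> key is the set of domains (order and copies ignored)
    ignore_repeats=False -> key is the exact ordered domain list
    """
    groups = defaultdict(list)
    for protein_id, domains in entries:
        key = frozenset(domains) if ignore_repeats else tuple(domains)
        groups[key].append(protein_id)
    return groups


def jaccard(domains_a, domains_b):
    """Shared domains divided by all domains seen in either protein (0.0 to 1.0)."""
    a, b = set(domains_a), set(domains_b)
    return len(a & b) / len(a | b) if (a or b) else 1.0


# A few lines copied from the example above. For the real file, something like
#   with open("domains.txt") as fh:   # hypothetical file name
#       entries = list(parse_entries(fh))
# should work the same way.
sample = """proteinid
sp|P09478|ACH1_DROME Neur_chan_LBD Neur_chan_memb
sp|P17644|ACH2_DROME Neur_chan_LBD Neur_chan_memb
sp|P16395|ACM1_DROME 7tm_1 7tm_1
sp|Q8INB9|AKT1_DROME PH Pkinase Pkinase_C
""".splitlines()

entries = list(parse_entries(sample))
groups = group_by_domain_content(entries)

# Proteins that share their domain content with at least one other protein.
shared = {key: ids for key, ids in groups.items() if len(ids) > 1}
# Proteins whose domain content occurs only once in the file.
unique = [ids[0] for ids in groups.values() if len(ids) == 1]

for key, ids in shared.items():
    print(" ".join(sorted(key)), "->", ", ".join(ids))
print("unique:", ", ".join(unique))
```

Grouping by a dictionary key needs only one pass over the file, so 30,000 entries should not be a problem. Comparing every pair of proteins with jaccard is much more work (roughly 450 million pairs for 30,000 proteins), so it is probably better kept for a subset of interest.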