Hi, I have a tab-separated file like this:
Code Info source start end
GB01672 rpsblast_to_CDD protein_match
GB01672 rpsblast_to_CDD match_part 296 988
GB01673 rpsblast_to_CDD protein_match
GB01673 rpsblast_to_CDD match_part 3803 4147
GB01673 rpsblast_to_CDD match_part 1314 1907
GB01673 rpsblast_to_CDD match_part 3516 3932
GB01673 rpsblast_to_CDD match_part 3335 3463
GB01674 rpsblast_to_CDD protein_match
GB01674 rpsblast_to_CDD match_part 3724 406
GB01674 rpsblast_to_CDD match_part 1314 1907
GB01674 rpsblast_to_CDD match_part 3335 385
The start and end fields are empty in every protein_match row, so I need to fill them in for each code. The start should be the lowest number among the match_part rows (which come just below the protein_match row for each code), and the end should be the highest number among them. Keep in mind that each code can have several match_part rows.
For example, the output for the code GB01673 should be:
GB01673 rpsblast_to_CDD protein_match **1314 4147**
GB01673 rpsblast_to_CDD match_part 3803 4147
GB01673 rpsblast_to_CDD match_part 1314 1907
GB01673 rpsblast_to_CDD match_part 3516 3932
GB01673 rpsblast_to_CDD match_part 3335 3463
I would really appreciate it if someone could help me!
Thanks
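
One possible approach, sketched in Python under a few assumptions: the file really is tab-separated with a single header line, the third column holds protein_match or match_part, "lowest" means the smallest value in the start column and "highest" means the largest value in the end column. The function name fill_protein_match and the inline sample are made up for illustration; they are not from any existing tool.

```python
import csv
import io


def fill_protein_match(lines):
    """Return all rows, with protein_match start/end filled from match_part rows."""
    rows = [row for row in csv.reader(lines, delimiter="\t") if row]
    header, body = rows[0], rows[1:]

    # First pass: smallest start and largest end seen per code (assumption:
    # start is column 4 and end is column 5, as in the header of the post).
    spans = {}
    for row in body:
        if len(row) >= 5 and row[2] == "match_part":
            start, end = int(row[3]), int(row[4])
            low, high = spans.get(row[0], (start, end))
            spans[row[0]] = (min(low, start), max(high, end))

    # Second pass: write the span into each protein_match row of that code.
    result = [header]
    for row in body:
        if len(row) >= 3 and row[2] == "protein_match" and row[0] in spans:
            low, high = spans[row[0]]
            row = row[:3] + [str(low), str(high)]
        result.append(row)
    return result


# Small sample taken from the rows in the post, with real tab characters.
sample = (
    "Code\tInfo\tsource\tstart\tend\n"
    "GB01672\trpsblast_to_CDD\tprotein_match\t\t\n"
    "GB01672\trpsblast_to_CDD\tmatch_part\t296\t988\n"
    "GB01673\trpsblast_to_CDD\tprotein_match\t\t\n"
    "GB01673\trpsblast_to_CDD\tmatch_part\t3803\t4147\n"
    "GB01673\trpsblast_to_CDD\tmatch_part\t1314\t1907\n"
    "GB01673\trpsblast_to_CDD\tmatch_part\t3516\t3932\n"
    "GB01673\trpsblast_to_CDD\tmatch_part\t3335\t3463\n"
)

for row in fill_protein_match(io.StringIO(sample)):
    print("\t".join(row))
```

To run it on a real file, pass an open file object instead of the io.StringIO sample, for example `with open("input.tsv", newline="") as fh: rows = fill_protein_match(fh)`, where input.tsv is a placeholder name. Reading the data in two passes means the match_part rows do not have to come directly after their protein_match row, as long as they share the same code.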