ENSTRUG00000000009 ENSTRUT00000000011 1026 509 5896
ENSTRUG00000000011 ENSTRUT00000000014 420 63 482
ENSTRUG00000000012 ENSTRUT00000000015 10902 15313 93157
ENSTRUG00000000012 ENSTRUT00000000016 2844 23243 60985
as this is my input file it has five coloumns and there is for each line , we have to identify unique entry from the first coloumns which high value for the third coloumns like
ENSTRUG00000000009 ENSTRUT00000000012 1026 1503 6379
ENSTRUG00000000011 ENSTRUT00000000014 420 63 482
ENSTRUG00000000012 ENSTRUT00000000015 10902 15313 93157
my code is
from sys import *
import operator
file = open(argv[1],'r')
outfile = open(argv[2],'w')
buffer = []
gene = ''
cds = {}
rec = file.readlines()
for line in rec :
field = line.split()
if (gene != field[0]):
header = field[0]#header is the variable caries the values
print header,
#outfile.writelines(header+"\t")
gene = field[0]
transcript = field[1]
#print transcript
cds[field[1]]=field[2]
#print cds
protein = max(cds.iteritems(), key=operator.itemgetter(1))[0]
print protein