Hi everybody,
I have some code that builds an inverted index for a given text. Given a list of tokens, the function returns a list of (token, positions) pairs sorted by frequency, most frequent first.
Example:
inverted_index(['the', 'house', ',', 'the', 'beer'])
[('the', [0, 3]), ('beer', [4]), ('house', [1]), (',', [2])]
Code:
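def inverted_index(txt):
    # Map each token to the list of positions where it occurs in txt.
    ind = {}
    for word in txt:
        if word not in ind.keys():
            # First occurrence: search from the start of the list.
            i = txt.index(word)
            ind[word] = [i]
        else:
            # Later occurrence: search just after the last recorded position.
            i = txt.index(word, ind[word][-1] + 1)
            ind[word].append(i)
    # Most frequent tokens first.
    sorted_ind = sorted(ind.items(), key=lsort, reverse=True)
    return sorted_ind

def lsort(kv):
    # Sort key: number of positions recorded for a token.
    return len(kv[1])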
The code works, but it is very slow.
So my question is: how could it be rewritten to run faster?
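A likely bottleneck is that every `txt.index(...)` call rescans the list to find a position the loop has already reached, and on Python 2 `ind.keys()` also builds a fresh list on every iteration. A single-pass variant could take the position straight from `enumerate` instead. This is only a sketch assuming Python 3.7+ (insertion-ordered dicts); `inverted_index_fast` is a hypothetical name, not part of the code above.

```python
from collections import defaultdict

def inverted_index_fast(tokens):
    """Hypothetical single-pass version: map each token to its positions,
    sorted by number of occurrences (most frequent first)."""
    index = defaultdict(list)
    # enumerate yields each position directly, so no list.index() rescans are needed.
    for pos, token in enumerate(tokens):
        index[token].append(pos)
    # sorted() is stable even with reverse=True, so tokens with equal counts
    # keep the order in which they first appeared.
    return sorted(index.items(), key=lambda kv: len(kv[1]), reverse=True)

print(inverted_index_fast(['the', 'house', ',', 'the', 'beer']))
# [('the', [0, 3]), ('house', [1]), (',', [2]), ('beer', [4])]
```

Building the index is then one linear pass over the tokens, and only the final sort depends on the number of distinct tokens. Note that ties come out in first-appearance order, which may differ from the order in the example above.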
Thanks for any suggestions, Darek