Hey guys,
I'm working on a basic search engine and am really close to completion.
I currently have a function that takes a string and compares each word and its synonyms to a webpage.
My output at the moment is a list of pairs, [(x, y), (x, y), ..., (x, y)], where x is the "closeness" percentage of the search terms to a webpage and y is that webpage's contents.
I am almost there, but I now need to remove the items that have no match to a site (i.e., where x = 0).
I have found out that operator.itemgetter() can isolate just the first value of each pair, and I then filter out the zeros from there with this code:
import operator

def Google_search(string):
    internet_length = len(Internet)
    percentage_list = []
    for x in range(0, internet_length):
        position = x
        closeness_percentage = closeness(string, Internet[x])
        percentage_list.append([closeness_percentage, Internet[position]])
    # note: itemgetter(1) sorts on the webpage contents; itemgetter(0) would sort on the percentage
    sorted_list = sorted(percentage_list, key=operator.itemgetter(1), reverse = True)
    ## print sorted_list
    ## now to delete the ones with zero percentage
    get_percentages = operator.itemgetter(0)
    percentages = map(get_percentages, sorted_list)
    print percentages
    no_zeros = [x for x in percentages if x != 0]  # compare by value, not identity
    print no_zeros
    print sorted_list
So an example of the output would be:
[13, 0, 3, 2, 0, 0, 4, 0, 0, 6, 2, 3, 0, 0]
[13, 3, 2, 4, 6, 2, 3]
This is good. However, deleting the zeros from the percentage-only list does not delete them from the list with the webpages, obviously, as it's a new list!
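To make the problem concrete, here is a small sketch with made-up data (the page names and percentages are placeholders, not my real Internet list). It only illustrates that filtering the extracted percentages builds a separate list and leaves the list of pairs untouched:

```python
import operator

# Hypothetical stand-in for sorted_list: [closeness_percentage, webpage] pairs.
sorted_list = [[13, "page a"], [0, "page b"], [3, "page c"], [0, "page d"]]

# Pull out just the percentages (the first value of each pair).
percentages = list(map(operator.itemgetter(0), sorted_list))

# The comprehension builds a brand-new list; sorted_list is not modified.
no_zeros = [p for p in percentages if p != 0]

print(percentages)       # [13, 0, 3, 0]
print(no_zeros)          # [13, 3]
print(len(sorted_list))  # 4, the zero-percentage pairs are still there
```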
I have been straining my brain for hours about how to get around this! I think I need a loop that compares the 2nd value in each SUBLIST to the values of the original list, returns true on a match, and then filters the results. But I don't know how to do something like:
for x in range(0, length):
    for y in range(0, no_zeros_length):
        if sorted_list[x].itemgetter(1) == no_zeros:
            return true
Do you guys get what I mean? Or is there a much easier way to omit the zeros from the original list?
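One direction that might work is to filter the pairs themselves instead of the extracted percentages, so that each percentage stays attached to its webpage. This is only a rough sketch under assumptions: the data is placeholder data, `drop_zero_matches` is a hypothetical helper name, and it assumes the percentage is the first value of each pair, as in my code above.

```python
import operator

def drop_zero_matches(pairs):
    """Keep only the [percentage, webpage] pairs whose percentage is non-zero."""
    return [pair for pair in pairs if pair[0] != 0]

# Placeholder data standing in for percentage_list.
pairs = [[13, "page a"], [0, "page b"], [3, "page c"], [0, "page d"], [6, "page e"]]

matches = drop_zero_matches(pairs)

# Rank the remaining pairs by percentage (index 0), highest first.
ranked = sorted(matches, key=operator.itemgetter(0), reverse=True)

print(ranked)  # [[13, 'page a'], [6, 'page e'], [3, 'page c']]
```

Because the filter runs on the pairs, no second loop comparing two lists would be needed.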
Thanks heaps in advance!
P.S. I've attached the file (rename it to .py if you want to use it) so it's easier to understand what's going on, as this is part 4 and each part is dependent on the parts before it (I thought it would be too much code for a post)!