Hello, I'm having a slight problem with my code. The task is to create an indexing program, similar to the ones Google uses.
The problem I'm having is that we have to remove the common endings from the words left after the removal of the stop words (these words are held in a list variable, not a string variable). I go through the list and treat each item, one at a time, as a string, with code as follows:
leaf_words = "s","es","ed","er","ly","ing"
for words in line_stop_words:
    # line_stop_words is the list of words without any "stop words" present, eg only the essential info
    stemming_word = ""
    for chars in words:
        print chars
        stemming_word = chars
        if stemming_word[-1] == leaf_words:
            stemming_word[-1] = ""
            # to remove that letter from the string
        print stemming_word
I have two issues. First, each time it has finished with the first line of text, it throws an error saying the index is out of bounds. I believe the problem lies in the if statement, because I don't think the for loop is moving to the next item in the line_stop_words list.
Second, it doesn't actually remove the leaf word from the main string (say you have "blows": it doesn't remove the "s").
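To make the intended behaviour concrete, here is a minimal sketch of ending removal, not a definitive fix. It assumes each item of line_stop_words is a list of word strings, possibly with an empty string at the end. The helper name strip_ending and the sample data are hypothetical. It relies on two facts: a single character such as stemming_word[-1] can never equal the whole leaf_words tuple, and strings are immutable, so a new string has to be built by slicing.

```python
# Hypothetical helper: strips at most one common ending from a word.
# Longer endings are listed before "s" so "es" is tried before "s".
LEAF_WORDS = ("ing", "es", "ed", "er", "ly", "s")


def strip_ending(word, endings=LEAF_WORDS):
    """Return word without the first matching ending, or unchanged."""
    for ending in endings:
        # Keep at least one character, so the word "s" does not become "".
        if word.endswith(ending) and len(word) > len(ending):
            # Strings are immutable, so build a new one by slicing.
            return word[:-len(ending)]
    return word


# Assumed shape: one inner list of words per line of input.
line_stop_words = [["blows", "wind", ""], ["walking", "quickly"]]

# "if w" skips empty strings, which would otherwise break w[-1] style indexing.
stemmed = [[strip_ending(w) for w in words if w] for words in line_stop_words]
print(stemmed)
```

With the sample data, "blows" becomes "blow" and "walking" becomes "walk". Only one ending is removed per word, which is a simplification.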
Any help or advice you can give would be much appreciated.
The rest of the code, so you know what I'm talking about, is:
import string
i = 0
text_input = ""
total_text_input = ""
line = []
n = 0
char = ""
while i != 1:
    text_input = raw_input("")
    if text_input == ".":
        i = 1
    else:
        new_char_string = ""
        for char in text_input:
            if char in string.punctuation:
                char = " "
            new_char_string = new_char_string + char
        line = line + [new_char_string.lower()]
        total_text_input = (total_text_input + new_char_string).lower()
stop_words = "a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though"
line_stop_words = []
word_list = ""
sent = ""
word = ""
for sent in line:
    word_list = string.split(sent)
    new_string = ""
    for word in word_list:
        if word not in stop_words:
            new_string = new_string + word + ";"
    new_string = string.split(new_string, ";")
    line_stop_words = line_stop_words + [new_string]
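One thing worth noting about the loop above: joining the words with ";" and then splitting on ";" leaves an empty string at the end of every inner list, and indexing an empty string with [-1] raises an IndexError. That may be the source of the out-of-bounds error. A minimal sketch of the same filtering that keeps the words in a list throughout, using shortened sample values for stop_words and line (assumptions for illustration only):

```python
# Assumed inputs, shaped like the variables in the code above.
stop_words = ("a", "i", "it", "the", "and", "is")
line = ["the wind blows", "it is walking"]

line_stop_words = []
for sent in line:
    # str.split() with no argument splits on any run of whitespace.
    kept = [word for word in sent.split() if word not in stop_words]
    line_stop_words.append(kept)

print(line_stop_words)

# For comparison: the ";" round trip leaves a trailing empty string.
print("wind;blows;".split(";"))
```

Building each inner list directly avoids the trailing empty item, so every entry in line_stop_words is a non-empty word.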