I have a strange problem: I have written a for loop that checks a list of words for matches, but it is not checking all of the words.
I am using two files to check for matching words: Enwords.txt, a list of English words, and Encontract.txt, a list of English contractions. Checking against Enwords.txt works: I get a list of matched words and a corresponding list of words that don't match. But when I run the non-matched list against Encontract.txt, it doesn't check all of the words. Here is my code:
```python
fullwords = open("Enwords.txt").read()
contrwords = open("Encontract.txt").read()
wordlist = []
nonwordlist = []
failedwords = []
string4 = "I have already explained this I thought. Okay here it is again. I beleive that the laws of physics apply to the universe and always have. Those laws do not allow the universe to be only 6000 years old. So the only way in which God could be responsible for the creation of the universe is if the bible is incorrect in its description of creation. That doesn't mean there is no God, but only that if he does exist, he is not the author of the bible if he meant to have Genesis taken literally. This is what I believe. There is no way around it. If what you say is actually true, then all the laws of physics are malarky. I cannot accept that. The big bang happend 13.7 billion years ago. Period, So if God caused it, Genesis is simply wrong. If he didn't cause it, then there most likely is no God. I can't think of another way to put it."
testdoc = string4
stripped_text = ""
for c in testdoc:
    if c in "!@#$%^&*().[],{}<>?": # need to remove punctuation from list of words
        c = ""
    stripped_text += c
testdoc = stripped_text
words = testdoc.split(' ')
for word in words:
    word = word.lower()
    if word in fullwords: # test to see if word in full word list
        if word not in wordlist:
            wordlist.append(word)
    else:
        if word not in nonwordlist:
            nonwordlist.append(word)
print nonwordlist
word = ""
print "Checking Alternate Word List . . ." # check dictionary for alternate words and update
for word in nonwordlist:
    if word in contrwords:
        wordlist.append(word)
        nonwordlist.remove(word)
        print "Added ", word, " to the word list."
    else:
        print "Failed to find ", word
        nonwordlist.remove(word)
        failedwords.append(word)
print ""
wordlist.sort()
print "Word list: ", wordlist
print ""
nonwordlist.sort()
print "Non word list: ", nonwordlist
print ""
failedwords.sort()
print "Failed Words: ", failedwords
print ""
```
And here are the results:
```
$ python indextext.py
['beleive', '6000', "doesn't", 'happend', '137', "didn't", "can't"]
Checking Alternate Word List . . .
Failed to find beleive
Added doesn't to the word list.
Failed to find 137
Added can't to the word list.
Word list: ['accept', 'actually', 'again', 'ago', 'all', 'allow', 'already', 'always', 'and', 'another', 'apply', 'are', 'around', 'author', 'bang', 'be', 'believe', 'bible', 'big', 'billion', 'but', "can't", 'cannot', 'cause', 'caused', 'could', 'creation', 'description', 'do', 'does', "doesn't", 'exist', 'explained', 'for', 'genesis', 'god', 'have', 'he', 'here', 'i', 'if', 'in', 'incorrect', 'is', 'it', 'its', 'laws', 'likely', 'literally', 'malarky', 'mean', 'meant', 'most', 'no', 'not', 'of', 'okay', 'old', 'only', 'period', 'physics', 'put', 'responsible', 'say', 'simply', 'so', 'taken', 'that', 'the', 'then', 'there', 'think', 'this', 'those', 'thought', 'to', 'true', 'universe', 'way', 'what', 'which', 'wrong', 'years', 'you']
Non word list: ['6000', "didn't", 'happend']
Failed Words: ['137', 'beleive']
```
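The '137' in the output comes from the punctuation loop deleting the '.' in "13.7" before splitting on spaces. A minimal sketch of an alternative tokenizer, assuming a "word" is any run of lower-case letters, digits and apostrophes; `tokenize` and `TOKEN_RE` are hypothetical names. Note that it splits "13.7" into '13' and '7' rather than producing '137', which may or may not be what you want.

```python
import re

# Hypothetical tokenizer: after lower-casing, keep runs of letters, digits and
# apostrophes; every other character (spaces, commas, periods, ...) separates tokens.
TOKEN_RE = re.compile(r"[a-z0-9']+")

def tokenize(text):
    """Return the list of tokens in text, lower-cased, in their original order."""
    return TOKEN_RE.findall(text.lower())

print(tokenize("The big bang happend 13.7 billion years ago. Period, So"))
```

Because apostrophes are kept, contractions such as "can't" survive as single tokens.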
The problem is that it doesn't find "didn't", even though "didn't" is in Encontract.txt, and it doesn't appear to check "didn't", '6000', or 'happend' at all when it runs nonwordlist against Encontract.txt: none of them are reported as either added or failed.
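The skipped words are exactly every other item of the non-word list, which is the pattern you get when items are removed from a list while a for loop is iterating over it: each `remove()` shifts the remaining items one place left, so the loop's next position jumps past one of them. A minimal sketch reproducing this with the list printed above, plus one safer variant that builds new lists instead of mutating the one being looped over; the function names and the contraction set are hypothetical.

```python
def remove_while_iterating(items):
    """Mimic the question's loop: remove each item from the list being iterated."""
    items = list(items)          # private copy so the caller's list is untouched
    visited = []
    for item in items:
        visited.append(item)
        items.remove(item)       # shifts later items left, so the next one is skipped
    return visited, items

def partition(nonwords, contractions):
    """Safer variant: build new lists instead of mutating the one being looped over."""
    found = [w for w in nonwords if w in contractions]
    failed = [w for w in nonwords if w not in contractions]
    return found, failed

sample = ['beleive', '6000', "doesn't", 'happend', '137', "didn't", "can't"]
visited, leftover = remove_while_iterating(sample)
print(visited)   # ['beleive', "doesn't", '137', "can't"]
print(leftover)  # ['6000', 'happend', "didn't"]
```

The visited and leftover lists match the "Checking Alternate Word List" lines and the final non-word list in the output above. Looping over a copy (`for word in nonwordlist[:]:`) is another common way to avoid the skipping.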
The Enwords.txt and Encontract.txt files are plain text files with one word per line, used for checking for valid words. I have verified that the expected contractions (can't, didn't, doesn't) are in Encontract.txt, but I can't tell what is happening.
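A separate issue from the skipping: since `.read()` returns each file as one big string, `word in fullwords` tests whether `word` appears anywhere as a substring, not whether it is a whole line of the file. A minimal sketch, assuming one word per line, that loads the words into a set so membership means an exact match; `load_words` and the sample text are hypothetical stand-ins for Enwords.txt.

```python
def load_words(lines):
    """Build a set of lower-cased words from an iterable of lines, one word per line."""
    # strip() removes the trailing newline (including Windows "\r\n") and stray spaces;
    # blank lines are skipped.
    return set(line.strip().lower() for line in lines if line.strip())

# Hypothetical stand-in for Enwords.txt; with a real file you could pass the open
# file object itself, e.g. load_words(open("Enwords.txt")).
enwords_text = "believe\nthe\nuniverse\nconcatenate\n"

as_string = enwords_text                          # what .read() gives you
as_set = load_words(enwords_text.splitlines())    # one entry per line

print("cat" in as_string)  # True: "cat" is a substring of "concatenate"
print("cat" in as_set)     # False: no line is exactly "cat"
```

Set lookups are also fast for large word lists, which helps when checking every word of a document.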
Any help would be appreciated...