Hello everyone, I'm working on a random text generator (without using Markov chains), and it currently works without too many problems. First, here is the flow of my code:
1-Enter a sentence as input (this is the "trigger string" and is assigned to a variable)
2-Get the longest word in the trigger string
3-Search the whole Project Gutenberg corpus (the one shipped with NLTK) for sentences that contain this word, regardless of case
4-Return the longest sentence that contains the word from step 3
5-Append the sentence from step 4 to the sentence from step 1
6-Assign the sentence from step 4 as the new trigger sentence and repeat the process, i.e. get the longest word in the second sentence, and so on
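As a rough illustration of steps 2 to 6, here is a minimal sketch on a tiny in-memory corpus instead of Project Gutenberg. The names `TOY_CORPUS`, `longest_word`, `next_sentence` and `generate` are made up for this example, a fixed number of rounds stands in for the open-ended loop, and the trigger is assumed to be non-empty.

```python
# Toy stand-in for the Gutenberg sentences
# (assumption: tokenised into lists of words, like gutenberg.sents()).
TOY_CORPUS = [
    ["The", "Whale", "surfaced", "near", "the", "ship"],
    ["A", "whale", "is", "an", "extraordinary", "animal", "of", "the", "sea"],
    ["Extraordinary", "claims", "need", "evidence"],
]


def longest_word(sentence):
    """Step 2: longest whitespace-separated word (first one wins on ties)."""
    return max(sentence.split(), key=len)


def next_sentence(word, corpus):
    """Steps 3-4: longest sentence containing `word`, ignoring case; None if no match."""
    target = word.lower()
    matches = [" ".join(tokens) for tokens in corpus
               if target in (t.lower() for t in tokens)]
    return max(matches, key=len) if matches else None


def generate(trigger, corpus, rounds=3):
    """Steps 5-6: append each found sentence and use it as the next trigger."""
    pieces = [trigger]
    for _ in range(rounds):
        found = next_sentence(longest_word(trigger), corpus)
        if found is None:
            break
        pieces.append(found)
        trigger = found
    return " ".join(pieces)
```

With the trigger "I saw a whale", the first round picks the second toy sentence via "whale", and the second round picks that same sentence again via "extraordinary", so even this small example runs into a sentence that leads back to itself.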
And here is my code:
import nltk
from nltk.corpus import gutenberg
from random import choice
import smtplib

triggerSentence = raw_input("Please enter the trigger sentence: ")  # get input str
longestLength = 0
longestString = ""
listOfSents = gutenberg.sents()  # all sentences of gutenberg -list of lists format-
listOfWords = gutenberg.words()  # all words in gutenberg books -list format-

while triggerSentence:
    # so this is run every time through the loop
    split_str = triggerSentence.split()  # split the sentence into words
    # code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)
    # lowerStr = longestString.lower()
    # code to get the sentences containing the longest word, then selecting
    # a random one of these sentences that is longer than 40 characters
    sets = []
    for sentence in listOfSents:
        if sentence.count(longestString):
            sents = " ".join(sentence)
            if len(sents) > 40:
                sets.append(sents)
    triggerSentence = choice(sets)
    print triggerSentence
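One detail of the listing above that may feed into the repetition described next: `longestLength` and `longestString` are set once, before the `while` loop, so from the second pass on a word only replaces `longestString` if it is strictly longer than the longest word seen in any earlier sentence. A minimal sketch of a per-sentence search that starts from scratch on every call (the function name `longest_word_in` is hypothetical):

```python
def longest_word_in(sentence):
    """Return the longest whitespace-separated word of `sentence` ('' if it has none)."""
    longest = ""  # reset on every call, unlike the module-level longestString
    for piece in sentence.split():
        if len(piece) > len(longest):
            longest = piece
    return longest
```

Inside the loop this would be used as `longestString = longest_word_in(triggerSentence)`. It probably does not remove every repeat on its own, since a sentence can still lead back to itself through its own longest word.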
My concern is that the loop usually reaches a point where the same sentence is printed over and over again. To counter this problem, I decided to do the following:
*If the longest word in the current sentence is the same as it was in the previous sentence, delete this word from the current sentence and look for the next-longest word.
I tried a few implementations but could not get the solution above to work. Any suggestions on how to find the second-longest word? Thanks in advance.
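For reference, the rule described above could look roughly like this. It is only a sketch: `pick_word` and its `previous` argument are names invented here, and it drops every occurrence of the previous word, compared case-insensitively, before taking the longest remaining one.

```python
def pick_word(sentence, previous=None):
    """Longest word of `sentence`, skipping `previous` if it would be chosen again."""
    words = sentence.split()
    if not words:
        return None
    best = max(words, key=len)
    if previous is not None and best.lower() == previous.lower():
        # Same word as last time: remove it and fall back to the next-longest word.
        rest = [w for w in words if w.lower() != previous.lower()]
        if rest:
            best = max(rest, key=len)
    return best
```

In the loop, `previous` would be the word used for the last lookup, for example `longestString = pick_word(triggerSentence, longestString)`. If the sentence consists of nothing but the previous word, the sketch keeps that word rather than returning nothing.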