Hello everyone. I have written my own random text generator using a custom method (no Markov chains), and I would now like to try it on a text corpus larger than the one that ships with NLTK. Which data structure should I use to make the code run faster? Adding more text files will surely make it painfully slow to execute. My algorithm is as follows:
1- Enter the trigger sentence (only once, at the beginning of the program)
2- Get the longest word in the trigger sentence
3- Find all the sentences in the corpus that contain the word from step 2
4- Randomly select one of those sentences
5- Get the sentence that follows the one picked at step 4 (call it sentA to avoid ambiguity), as long as sentA is longer than 40 characters
6- Go back to step 2, with sentA from step 5 as the new trigger sentence
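For reference, the steps above can be sketched roughly as follows in Python with plain lists. This is a minimal sketch, not my exact code: the names `tokens`, `longest_word` and `generate`, the `steps` limit, and the simple regex tokenizer are assumptions made for illustration. It also assumes that a sentence only counts as a candidate when the sentence after it is longer than 40 characters, and that the loop stops when there are no candidates.

```python
import random
import re


def tokens(sentence):
    """Split a sentence into lowercase words (simple regex tokenizer, an assumption)."""
    return re.findall(r"[a-z']+", sentence.lower())


def longest_word(sentence):
    """Step 2: return the longest word, or None if the sentence has no words."""
    words = tokens(sentence)
    return max(words, key=len) if words else None


def generate(sentences, trigger, steps=5, min_len=40, rng=random):
    """Run steps 2-6 for at most `steps` rounds; return the sentences produced."""
    output = []
    current = trigger
    for _ in range(steps):
        word = longest_word(current)
        if word is None:
            break
        # Step 3: linear scan over the whole list on every round.
        # Assumption: keep index i only if the next sentence is long enough (step 5).
        candidates = [
            i for i in range(len(sentences) - 1)
            if word in tokens(sentences[i]) and len(sentences[i + 1]) > min_len
        ]
        if not candidates:
            break
        picked = rng.choice(candidates)   # step 4
        current = sentences[picked + 1]   # step 5: sentA becomes the new trigger
        output.append(current)
    return output
```

The list comprehension in step 3 re-tokenizes and rescans every sentence on every round, so the cost of each round grows with the size of the corpus.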
Which data structure would be best suited for this? I originally used lists in my code. Thanks in advance.