Best data structure for this problem?

Question

koveras vehcna 0 Newbie Poster

14 Years Ago

Hello everyone. I have written my own random text generator using a custom method (no Markov chains), and I would now like to try it on a text corpus larger than NLTK's. Which data structure should I use to make the code run faster? Adding more text files will surely make it painfully slow to execute. My algorithm is as follows:

1- Enter the trigger sentence (only once, at the beginning of the program)
2- Get the longest word in the trigger sentence
3- Find all the sentences in the corpus that contain the word from step 2
4- Randomly select one of those sentences
5- Get the sentence that follows the one picked at step 4 (call it sentA to avoid ambiguity), so long as sentA is longer than 40 characters
6- Go to step 2, with sentA from step 5 as the new trigger sentence

Which data structure would be best for this? I originally used lists in my code. Thanks in advance.
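For reference, the steps above might look roughly like this with plain lists. This is only a sketch, not the original code: the names `longest_word`, `next_sentence` and `generate` are made up for illustration, the corpus is assumed to be a list of sentence strings, "contains the word" is assumed to mean a case-insensitive whole-word match, and the 40-character rule is assumed to filter out candidates whose following sentence is too short.

```python
import random
import re

MIN_LENGTH = 40  # step 5: the follow-up sentence must be longer than this


def longest_word(sentence):
    """Step 2: return the longest word (the first one wins on ties)."""
    words = re.findall(r"\w+", sentence.lower())
    return max(words, key=len) if words else ""


def next_sentence(sentences, trigger, rng=random):
    """Steps 2-5, scanning the whole list of sentences on every call."""
    word = longest_word(trigger)
    if not word:
        return None
    # Step 3: indices of sentences that contain the word and whose
    # successor is long enough (assumed reading of the 40-character rule).
    candidates = [
        i for i in range(len(sentences) - 1)
        if word in re.findall(r"\w+", sentences[i].lower())
        and len(sentences[i + 1]) > MIN_LENGTH
    ]
    if not candidates:
        return None
    # Steps 4 and 5: pick one match at random, return the sentence after it.
    return sentences[rng.choice(candidates) + 1]


def generate(sentences, trigger, count, rng=random):
    """Step 6: feed each result back in as the next trigger sentence."""
    output = []
    for _ in range(count):
        trigger = next_sentence(sentences, trigger, rng)
        if trigger is None:  # no usable match, so stop early
            break
        output.append(trigger)
    return output
```

Every call to `next_sentence` re-tokenizes and scans the entire corpus, so the cost of each generated sentence grows with the corpus size. That is the part likely to hurt on a larger corpus.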

data-structure python

TrustyTony 888 ex-Moderator

14 Years Ago

Profile your code with cProfile to see which operations take the most time. I would think that a dictionary mapping each word to the list of sentences (or their indices) that contain it would be helpful.
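A minimal sketch of that suggestion, under some assumptions: the corpus is a list of sentence strings, words are matched case-insensitively, and the helper names (`tokenize`, `build_index`, `next_sentence`, `profile_call`) are hypothetical, not from the original code. Only `collections.defaultdict`, `cProfile` and `pstats` are real standard-library pieces.

```python
import cProfile
import io
import pstats
import random
import re
from collections import defaultdict


def tokenize(sentence):
    """Split a sentence into lowercase words."""
    return re.findall(r"\w+", sentence.lower())


def build_index(sentences):
    """Map each word to the indices of the sentences that contain it."""
    index = defaultdict(list)
    for i, sentence in enumerate(sentences):
        for word in set(tokenize(sentence)):  # set: one entry per sentence
            index[word].append(i)
    return index


def next_sentence(sentences, index, trigger, min_length=40, rng=random):
    """Look up the longest trigger word in the index instead of scanning."""
    words = tokenize(trigger)
    if not words:
        return None
    word = max(words, key=len)
    # Keep only matches that have a long enough following sentence.
    candidates = [
        i for i in index.get(word, [])
        if i + 1 < len(sentences) and len(sentences[i + 1]) > min_length
    ]
    if not candidates:
        return None
    return sentences[rng.choice(candidates) + 1]


def profile_call(func, *args):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 by cumulative time
    return result, stream.getvalue()
```

The index is built once, and each later lookup touches only the sentences that contain the word rather than the whole corpus. Something like `index, report = profile_call(build_index, sentences)` followed by `print(report)` would show where the time goes; storing indices rather than the sentences themselves keeps the dictionary small and makes "the sentence that follows" a simple `i + 1`.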

koveras vehcna 0 Newbie Poster · Answer 1 · 2011-04-18T11:48:48+00:00

Thanks for the information.