Displaying the original text in a string

Question

kailashbuki 0 Newbie Poster

14 Years Ago

I have a string as below

"i am genie. who are you? we worked together in a hotel called xsis."

Normally string like the above is cleaned. Cleaning involves removing whitespaces. Thus the cleaned version of the above string would be

"iamgenie.whoareyou?weworkedtogetherinahotelcalledxsis."

It is divided into kgrams of fixed size. Assuming size of kgram as five, the kgrams for the above string would be

'iamge', 'amgen', 'mgeni', 'genie', ... , 'isiam', 'siamg'

I want to display the actual text in the original string depending on the kgram chosen. For instance, if kgram 'iamge' is chosen, the output would be 'i am ge'. Please suggest me, if possible with a python implementation. Thanking you in advance.

python

6 Contributors
8 Replies
191 Views
1 Week Discussion Span
Latest Post 14 Years Ago Latest Post by TrustyTony

All 8 Replies

woooee 814 Nearly a Posting Maven

14 Years Ago

This is a simple dictionary with 'iamge' pointing to 'i am ge'. Most of the online tutorials cover dictionaries.

richieking 44 Master Poster

14 Years Ago

And how do you intend to work with this data format?

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

loveerl 0 Newbie Poster · Answer 1 · 2011-01-06T10:21:26+00:00

I'm not sure I understand how you are getting your string to look like that. Could you tell me how that occurs?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2011-01-08T07:14:24+00:00

Yes, richieking, I see that this n-gram are starting from every letter of the document, so my suggestion would be instead of wooees dict to record something like file position counter for n-gram starting positions. Would add quite a lot to space requirements though. To speed things up you would keep the original text in memory in addition to the n-gram data.

Or you could have tuple n-gram,(list of space indexes in n-gram) This list of indexes would be easy to get from n-gramming process as side product splitting data to n-gram stream and index stream of spaces (hole two to five of them)

Democles 0 Light Poster · Answer 3 · 2011-01-08T07:57:34+00:00

Not sure exactly what you are trying to do. You could have a dictionary or list (whatever you want to use) count the letters to each white space then later re-insert the white spaces based off the dictionary. Of course it wouldn't be a permanent fix, each sentence or paragraph would have it's own dictionary or list.Just an idea.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2011-01-08T16:40:56+00:00

t = "i am genie. who are you? we worked together in a hotel called xsis."
n  = 6
indexes = list(i for i,letter in enumerate(t) if not letter.isspace())
print 'ngram %i with spaces: %s' % (n,t[indexes[n]:indexes[n+5]])

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2011-01-08T23:32:11+00:00

t = "i am genie. who are you? we worked together in a hotel called xsis."
wrap = 7
indexes = list(index for index,letter in enumerate(t+t[:wrap]) if letter.isalpha())
for k in (2,3,5):
    print '\n', t
    print '\n%igram with non-letters:' % k
    print ', '.join(repr((t+t[:wrap])[indexes[n]:indexes[n+k]]) for n in range(len(indexes)-wrap+2))
    print '\n%igram only letters:' % k
    print ','.join("%r" % ''.join((t+t[:wrap])[index]  for index in indexes[n:n+k]) for n in range(len(indexes)-wrap+2))

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 6 · 2011-01-13T00:54:22+00:00

TrustyTony 888 ex-Moderator

14 Years Ago

Is this marked solved?

Displaying the original text in a string

Recommended Answers Collapse Answers

All 8 Replies

Recommended Answers