Finding text between two specified words when one of the two words changes

Question

romes87 0 Newbie Poster

12 Years Ago

Basically, I am trying to extract the text between two words inside a loop, where the first of the two words changes after each piece of information is extracted.

For example, the string is:

string = alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo

I want to extract the text between "alpha" and "end", and then between "bravo" and "end" (here, "somethingA" and "somethingB"). My file has quite a few of these unique words, so I keep them in a list and step through it with a counter. See the code below:

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # there will be more words here
counter = 0
stringOut = ''

# going through the list of words
while counter < len(words):
    firstWord = words[counter]
    lastWord = 'end'
    data = string[string.find(firstWord) + len(firstWord):string.find(lastWord)].strip()
    # this gives the text between the first occurrence of "alpha" and "end";
    # since I want just the smallest string between "alpha" and "end", I use another
    # while loop to check whether firstWord occurs again
    while firstWord in data:
        ignore, ignore2, data = data.partition(firstWord)
    counter = counter + 1
    stringOut += data + '\n'

print('output string is \n' + stringOut)

# This code gives the correct output for the text between the first word ("alpha") and "end",
# but when the list moves on to the next word, "bravo", it takes the text between the first
# "bravo" and the "end" that belongs to the information required for "alpha" ("somethingA").

Can anyone help me with this please? Any suggestions are welcome.

Many thanks.
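One way to keep the loop over the word list is to search for "end" only in the text after the previous match, and then take the last occurrence of the current word before that "end", which gives the smallest span. This is a minimal sketch, not the poster's code: it assumes the words appear in the string in the same order as in the list, each closed by its own "end", and the helper name extract_between is hypothetical. Like the original, it does plain substring matching, so "end" inside a word such as "bend" would still count.

```python
def extract_between(text, words, last_word='end'):
    """Return the text between each word in `words` and the next `last_word`.

    Hypothetical helper. Assumes the words appear in `text` in the same
    order as in `words`, each followed by its own `last_word`.
    """
    results = []
    search_from = 0  # only look at text after the previous end marker
    for first_word in words:
        end_pos = text.find(last_word, search_from)
        if end_pos == -1:
            break  # no more end markers, so nothing left to extract
        # The last occurrence of first_word before this end marker gives the smallest span.
        start = text.rfind(first_word, search_from, end_pos)
        if start != -1:
            results.append(text[start + len(first_word):end_pos].strip())
        # A word missing from this segment is skipped; move past this end marker either way.
        search_from = end_pos + len(last_word)
    return results


string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']
print('output string is\n' + '\n'.join(extract_between(string, words)))
```

The str.find and str.rfind start/end arguments do the work that the inner partition loop did in the original, without ever looking back at text that belonged to an earlier word.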

TrustyTony 888 ex-Moderator

12 Years Ago

You might find my code snippet useful as an example: http://www.daniweb.com/software-development/python/code/289548/picking-piece-of-string-between-separators

snippsat 661 Master Poster

12 Years Ago

For fun, here is one with regex, but I guess this is a school task?

import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')
for match in pattern.finditer(string):
    print(match.group(2).strip())

"""Output-->
somethingA
somethingB
"""
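If the value between the keyword and "end" can be more than one word, or if "end" might appear inside other words, a hedged variant of the same regex approach is sketched below (an illustration, not snippsat's code). The \b word boundaries keep "end" from matching inside "bend" or "send", and the middle group accepts any characters as long as no keyword or "end" starts there, so each match runs from the keyword closest to its "end". It also reports which keyword each value belongs to. Note that . does not match newlines unless the pattern is compiled with re.DOTALL.

```python
import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'

# \b word boundaries keep "end" from matching inside words such as "bend" or "send".
# The middle group consumes characters only while no keyword or "end" starts at that
# position, so a match cannot stretch back past a later keyword.
pattern = re.compile(r'\b(alpha|bravo)\b((?:(?!\b(?:alpha|bravo|end)\b).)*)\bend\b')

for match in pattern.finditer(string):
    print(match.group(1), '->', match.group(2).strip())
```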

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2013-02-05T06:31:34+00:00

This code would fail with words like 'bend' or 'send' inside the data, but it could give you an idea.

t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'
if 'end' in t:
    for part in t.split('end')[:-1]:
        last, key = max((part.rfind(key)+len(key), key) for key in keywords)
        print(key, ':', part[last:])
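The 'bend'/'send' problem mentioned above can be avoided by splitting on the whole word "end" with re.split and a \b word-boundary pattern instead of str.split. This is a minimal sketch keeping the same rfind/max idea, assuming plain substring matching is still acceptable for the keywords themselves. It also skips a segment that contains no keyword, where the original would compute a wrong index from rfind returning -1.

```python
import re

t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'

results = []
# Split only on the whole word "end", so words like "bend" or "send" in the data survive.
parts = re.split(r'\bend\b', t)
for part in parts[:-1]:  # the text after the final "end" has no closing marker
    # Index of the last occurrence of each keyword present in this part.
    found = [(part.rfind(key), key) for key in keywords if key in part]
    if not found:
        continue  # an "end" with no keyword before it: nothing to extract
    pos, key = max(found)  # the keyword that appears last is closest to "end"
    results.append((key, part[pos + len(key):].strip()))

for key, value in results:
    print(key, ':', value)
```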

romes87 0 Newbie Poster · Answer 2 · 2013-02-08T09:27:41+00:00

Thank you both for your replies. snippsat, your answer works perfectly.

I managed to get around the problem by marking the index in the string where the word appears and searching again from just past it:

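words = ['alpha', 'bravo']  # ...
counter = 0
word = words[counter]

# fileString holds the contents of my file
marker0 = fileString.index(word)           # position of the first occurrence of the word
marker0 = marker0 + len(word)              # step past that occurrence
marker0 = fileString.index(word, marker0)  # position of the next occurrence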

But your solution looks more robust, as it doesn't matter how many times the word appears in the string before the required information!

Thanks! :)

romes87 0 Newbie Poster · Answer 3 · 2013-02-08T09:39:33+00:00

But would using this line of code:

pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')

mean that the whole list of words needs to be written into it manually?
The reason I am asking is that I have many of these unique words that I need to extract information for, so:

word = ['alpha', 'bravo', '....', '.....', 'etc']  # could be quite a few here

Is there a way to use a variable within the re.compile statement?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2013-02-08T16:19:22+00:00

'|'.join(words)
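Building the alternation with '|'.join(words) lets the pattern come from the list instead of being typed in by hand. This is a minimal sketch, assuming the same pattern as snippsat's and that each keyword appears at most once before an "end" (a dict keeps only the last value if a keyword matches more than once). re.escape is added as a precaution in case any keyword contains regex metacharacters such as "." or "+", and collecting the results into a dict is just one option.

```python
import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # extend with as many keywords as needed

# re.escape protects keywords that contain regex metacharacters.
alternation = '|'.join(re.escape(word) for word in words)
pattern = re.compile(r'(' + alternation + r')(\s\w+\s)(end)')

# Map each keyword to the value found between it and "end".
found = {match.group(1): match.group(2).strip() for match in pattern.finditer(string)}
print(found)
```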