Basically, I am trying to extract the text between two strings inside a loop, where one of the two strings changes after each piece of information is extracted.

So, for example, the string is:

string = alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo

So I want to extract the text between alpha and end, and then between bravo and end. I have quite a few of these unique words in my file, so I keep them in a list and use a counter to go through them. See the code below:

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'

words = ['alpha', 'bravo'] #there will be more words here

counter = 0

stringOut = ''

#going through the list of words

while counter < len(words):

    firstWord = words[counter]

    lastWord = 'end'     

    data = string[string.find(firstWord)+len(firstWord):string.find(lastWord)].strip() 

    #this will give the text between the first occurrence of "alpha" and "end"
    #since I want just the smallest string between "alpha" and "end", I use another while loop
    #to see if firstWord occurs again

    while firstWord in data:
        _, _, data = data.partition(firstWord)

    counter = counter + 1

    stringOut += str(data) + str('\n')

print('output string is \n' + str(stringOut))

#this code gives the correct output for the text between the first word ("alpha") and "end".
#but when the list moves to the next string "bravo", it takes the text between the first "bravo" 
#and the "end" that was associated with the information required for "alpha" ("somethingA")

Can anyone help me with this please? Any suggestions are welcome.

Many thanks.
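A minimal sketch of one way to fix the loop above, assuming the goal is to pair each word with the nearest following "end" that has no other listed word in between. The helper name `between` is hypothetical. It relies on the start argument of `str.find` so each search resumes after the previous "end" instead of always restarting at the beginning of the string, and on `str.rfind` to take the occurrence of the word closest to "end".

```python
string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # there will be more words here
lastWord = 'end'


def between(text, firstWord, lastWord, allWords):
    """Return the text between firstWord and the next lastWord, skipping any
    pair where another word from allWords sits between them (None if no pair).
    Hypothetical helper, not a library function."""
    segStart = 0
    while True:
        # the start argument makes find() resume after the previous lastWord
        endPos = text.find(lastWord, segStart)
        if endPos == -1:
            return None
        segment = text[segStart:endPos]
        # rfind() picks the occurrence of firstWord closest to lastWord
        pos = segment.rfind(firstWord)
        if pos != -1:
            data = segment[pos + len(firstWord):]
            others = [w for w in allWords if w != firstWord]
            # reject this pair if another keyword lies between firstWord and lastWord
            if not any(w in data for w in others):
                return data.strip()
        segStart = endPos + len(lastWord)


stringOut = ''
for firstWord in words:
    data = between(string, firstWord, lastWord, words)
    if data is not None:
        stringOut += data + '\n'

print('output string is \n' + stringOut)
```

Like the original, this still uses plain substring matching, so a word such as "bend" inside the data would be treated as containing "end".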

For fun, here is one with regex, but I guess this is a school task?

import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')
for match in pattern.finditer(string):
    print(match.group(2).strip())

"""Output-->
somethingA
somethingB
"""

This code would fail with words like 'bend' or 'send' inside the data, but it could give you an idea.

t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'
if 'end' in t:
    for part in t.split('end')[:-1]:
        last, key = max((part.rfind(key)+len(key), key) for key in keywords)
        print(key, ':', part[last:])
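One way to avoid the 'bend'/'send' problem in the split approach above is to split with `re.split` on `\bend\b`, where `\b` is a word boundary, so "end" is only matched as a whole word. This is a sketch; the function name `extract` and its `terminator` parameter are hypothetical. It also skips keywords that are missing from a part, because `rfind` returns -1 for those and `-1 + len(key)` could otherwise win the `max()`.

```python
import re


def extract(text, keywords, terminator='end'):
    """Return (keyword, data) pairs for each keyword closest before a terminator."""
    # \b on both sides: 'bend' or 'send' inside the data is not a split point
    parts = re.split(r'\b%s\b' % re.escape(terminator), text)[:-1]  # drop text after the last terminator
    results = []
    for part in parts:
        # (position just after the keyword, keyword) for keywords present in this part
        found = [(part.rfind(key) + len(key), key) for key in keywords if key in part]
        if found:
            last, key = max(found)  # the keyword closest to the terminator wins
            results.append((key, part[last:].strip(' ,')))
    return results


t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
for key, data in extract(t, ('alpha', 'bravo')):
    print(key, ':', data)
```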

Thank you both for your replies. snippsat, your answer works perfectly.

I managed to get around the problem by marking the index in the string where the word appears:

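words = ['alpha', 'bravo'] #...
counter = 0
word = words[counter]  # index() needs a single string, not the whole list

# fileString holds the text read from the file
marker0 = fileString.index(word)            # first occurrence of word
marker0 = marker0 + len(word)               # move past it
marker0 = fileString.index(word, marker0)   # search again from there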

But your solution looks more robust, as it doesn't matter how many times the word appears in the string before the required information!
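Along the same lines as the marker approach above, `str.find` and `str.rfind` both accept optional start and end bounds. A sketch, assuming `fileString` holds the file text (here the example string stands in for it): `find_all` is a hypothetical helper that lists every occurrence by resuming the search past the last one, and `rfind(word, 0, endPos)` jumps straight to the occurrence closest before "end", however many times the word appears earlier.

```python
fileString = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'


def find_all(text, word):
    """Return the start index of every occurrence of word in text."""
    positions = []
    marker = text.find(word)
    while marker != -1:
        positions.append(marker)
        # resume the search just past the occurrence we found
        marker = text.find(word, marker + len(word))
    return positions


endPos = fileString.index('end')
# rfind(sub, start, end) only looks inside fileString[0:endPos] and returns the
# right-most match, i.e. the 'alpha' closest to the first 'end'
marker0 = fileString.rfind('alpha', 0, endPos)
data = fileString[marker0 + len('alpha'):endPos].strip()

print(find_all(fileString, 'alpha'), data)
```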

Thanks! :)

But would using this line of code:

pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')

mean that the whole list of words needs to be written into it manually?
The reason I am asking is that I have many of these unique words that I need to extract info for. So:

word = ['alpha', 'bravo', '....', '.....' , 'etc'] # could be quite a few here

Is there a way to use a variable within the re.compile statement?

'|'.join(words)
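A sketch of how `'|'.join(words)` plugs into the earlier pattern, so the list never has to be typed into the regex by hand. Passing each word through `re.escape` is an added precaution, assuming some words might contain regex metacharacters such as `.` or `+`; the surrounding parentheses keep the alternation limited to the keyword part of the pattern.

```python
import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # extend this list as needed

# re.escape treats each word literally; '|' makes them alternatives
alternation = '|'.join(re.escape(w) for w in words)
pattern = re.compile(r'(%s)(\s\w+\s)(end)' % alternation)

results = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(string)]
for key, data in results:
    print(key, data)
```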