hi,

I am working on a text processing project related to protein sequences. I want to list every occurrence of a search term along with its position. I tried the following, but it only reports the first hit.

text = 'MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF'
index = text.find('SA')
print(index)

Is it possible to get all hits and their positions?

thanks a lot

Of course, you simply move the find() start position up as you search ...

# multiple searches of a string for a substring
# using s.find(sub[ ,start[, end]])

text = 'MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF'

search = 'SA'
start = 0
while True:
    index = text.find(search, start)
    # if search string not found, find() returns -1
    # search is complete, break out of the while loop
    if index == -1:
        break
    print( "%s found at index %d" % (search, index) )
    # move to next possible start position
    start = index + 1

"""my result (index is zero based)-->
SA found at index 3
SA found at index 31
SA found at index 41
"""

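One detail of the loop above: because it restarts at index + 1, it also reports hits that overlap each other. For 'SA' in this sequence that makes no difference, but for a repeat such as 'SS' it does. A small sketch of both behaviours follows; find_all and its overlap flag are names made up here for illustration, not part of the code above.

```python
def find_all(text, search, overlap=True):
    """Return a list of every index at which search occurs in text."""
    if not search:
        raise ValueError("search must not be empty")
    # step 1 allows overlapping hits, len(search) skips past each hit
    step = 1 if overlap else len(search)
    hits = []
    start = text.find(search)
    while start != -1:
        hits.append(start)
        start = text.find(search, start + step)
    return hits

text = 'MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF'
print(find_all(text, 'SA'))                 # [3, 31, 41]
print(find_all(text, 'SS'))                 # [36, 37]
print(find_all(text, 'SS', overlap=False))  # [36]
```

Which setting is right depends on whether overlapping motifs count as separate hits in your analysis.
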
Or, perhaps more Pythonically, as a generator (look up generators and yield, then see which version you prefer):

# multiple searches of a string for a substring
# using s.find(sub[ ,start[, end]])

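def multis(search, text, start=0):
    # yield the index of every occurrence of search in text
    while start > -1:
        f = text.find(search, start)
        start = f
        if start > -1:
            yield f
            # step past this hit so the next find() looks further on
            start += 1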

print("tonyjv: pythonic generator")

print(" from vegaseat's code")
text = 'MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF'
search = 'SA'
print(text)
print(search)

print("Searching %s:" % search)
for i in multis(search, text):
    print("%s found at index %d" % (search, i))

"""tonyjv: pythonic generator
 from vegaseat's code
MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF
SA
Searching SA:
SA found at index 3
SA found at index 31
SA found at index 41
>>> """
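
Another option, if you do not mind the re module, is re.finditer(). This is only a sketch of the same search, not a replacement for the generator above. Note that finditer() returns non-overlapping matches, so the second half uses a lookahead pattern as one way to get overlapping hits.

```python
import re

text = 'MSKSASPKEPEQLRKLFIGGLSFETTDESLRSAHFESSSYGSAGRRF'
search = 'SA'

# re.escape() makes sure the search term is matched literally
for m in re.finditer(re.escape(search), text):
    print("%s found at index %d" % (search, m.start()))

# a zero-width lookahead matches at every position where the
# term starts, so overlapping hits are reported as well
overlapping = [m.start()
               for m in re.finditer('(?=%s)' % re.escape('SS'), text)]
print(overlapping)  # [36, 37]
```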

If you wanted to search that sequence for each of several terms held in a dictionary, what would the code look like?
Thanks

OK, here is a piece of code that may help, since you posted to the closed thread.

def multis(search, text, start=0):
    while start > -1:
        f = text.find(search, start)
        start = f
        if start > -1:
            yield f
            start += 1

# An example library of restriction enzymes and their recognition sites
site_enz = {'GAATTC': 'EcoRI', 'GGATCC': 'BamHI', 'AAGCTT': 'HindIII',
            'CCCGGG': 'SmaI', 'GATATC': 'EcoRV'}

dna = ["5'", 'CGATCGCTAGCTAGCTTGAATTCGACGATTTGCTAGGGCCAT ', "3'\n"]

# keep only the sequence itself, without the trailing space
dna = dna[1].rstrip()

for seq in site_enz:
    print(seq)
    for i in multis(seq, dna):
        print("%s in position %d" % (site_enz[seq], i))

"""Output (Python 3.7+, where a dict keeps its insertion order):
GAATTC
EcoRI in position 17
GGATCC
AAGCTT
CCCGGG
GATATC
"""
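
If you would rather keep the hits than print them, one possibility is to collect them into a dictionary keyed by enzyme name. The sketch below is self-contained, so it restates the generator and the site table (with BamHI as GGATCC, its standard site); find_sites is a hypothetical helper name, not something from the posts above.

```python
def multis(search, text, start=0):
    """Yield every index at which search occurs in text."""
    start = text.find(search, start)
    while start > -1:
        yield start
        start = text.find(search, start + 1)

site_enz = {'GAATTC': 'EcoRI', 'GGATCC': 'BamHI', 'AAGCTT': 'HindIII',
            'CCCGGG': 'SmaI', 'GATATC': 'EcoRV'}

def find_sites(dna, site_enz):
    """Map each enzyme whose site occurs in dna to a list of positions."""
    hits = {}
    for site, enzyme in site_enz.items():
        positions = list(multis(site, dna))
        if positions:  # leave out enzymes with no site in the sequence
            hits[enzyme] = positions
    return hits

dna = 'CGATCGCTAGCTAGCTTGAATTCGACGATTTGCTAGGGCCAT'
print(find_sites(dna, site_enz))  # {'EcoRI': [17]}
```

The positions are zero-based indexes of the start of each recognition site, the same as in the printed output above.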