Speeding up finding all spaces (' ') in text

Question

wheatontrue 0 Newbie Poster

15 Years Ago

Dear all,

I am trying to parse a lot of text. For small amounts of text, the while loop I use to find all spaces in the text works well:

# text is the string to search; marker is the substring to find (here ' ')
markerlist = []
counter = 0
while len(markerlist) < text.count(marker):
    markerlist.append(text.find(marker, counter))
    counter = text.find(marker, counter) + 1

This iterative process is very, very slow when working with a few megabytes of text. Can someone advise a faster method?

Best,

Wheaton

python

Ene Uran 638 Posting Virtuoso

15 Years Ago

Something like this should be faster, since it avoids the use of function calls:

text = """\
A list of Pythonyms ...

pythonanism: spending way too much time programming Python
pythoncology: the science of debugging Python programs
pythondemic: something bad that lots of Python programmers do
pythonerous: something hard to do in Python
pythong: a short piece of Python code that works
pythonorean: someone who knows the esoteric technical aspects of Python
pythonus: something you don't want to do in Python
pythonym: one of these words
ptyhoon: someone who is really bad at Python
sython: to use someone else's Python code
pythug: someone who uses python to crash computers
Pythingy: a function or tool that works, but that you don't understand
Pythonian: somebody that insists on using a very early version of python
PythUH? -  block of code that yields a very unexpected output
Pythealot:  a Python fanatic
"""

spaces = 0
for c in text:
    # add to the count of spaces
    if c == " ":
        spaces += 1

print( "This text has %d spaces" % spaces )

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

You can use a list comprehension to speed things up ...

# create a list of indexes for spaces in a text

text = "According to Sigmund Freud fuenf comes between fear and sex."
space_ix = [ix for ix, c in enumerate(text) if c == " "]
print(space_ix)  # [9, 12, 20, 26, 32, 38, 46, 51, 55]
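
For comparison, the same list of positions can probably also be built with `re.finditer` from the standard library. This is only a sketch of an alternative; nothing in this thread benchmarks it against the list comprehension, and the helper name `space_positions` is made up for illustration.

```python
import re

def space_positions(text, marker=" "):
    """Return the start index of every occurrence of marker in text.

    space_positions is a hypothetical helper name, not from the thread.
    """
    # re.escape makes sure the marker is matched literally
    pattern = re.compile(re.escape(marker))
    return [match.start() for match in pattern.finditer(text)]

text = "According to Sigmund Freud fuenf comes between fear and sex."
print(space_positions(text))  # [9, 12, 20, 26, 32, 38, 46, 51, 55]
```

Note that `finditer` reports non-overlapping matches, which makes no difference for a single-character marker such as a space.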

wheatontrue commented: Very helpful +1

wheatontrue 0 Newbie Poster · Answer 1 · 2009-02-06T01:25:53+00:00

Thanks Ene.

But this is actually not the same thing as what I'm trying to do. I am trying to make a list of the positions at which each space occurs. If I wanted to count the spaces I would just do

text.count(' ')

But I need a list of positions in the string.

scru 909 Posting Virtuoso Featured Poster · Answer 2 · 2009-02-06T01:37:27+00:00

It's not that hard to do that with the example given, granted the resulting code is a little bit nasty:

spaces = []
position = 0
for c in text:
    if c == ' ':
        spaces.append(position)
    position += 1

Not so hard, is it?

wheatontrue 0 Newbie Poster · Answer 3 · 2009-02-06T01:41:37+00:00

Okay, so this is faster because it doesn't call text.find and text.count?

I used my method originally because I thought iterating through every element would be slower. I don't know much about processing speed though.

Please let me know!
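
A likely reason the original loop is slow: `text.count(marker)` rescans the whole text on every pass of the while loop, and `text.find` is called twice per hit. Below is a minimal sketch of the same find-based idea without those repeated scans. The function name `find_all` is made up for illustration, and no timings were measured here.

```python
def find_all(text, marker=" "):
    """Collect the start index of each occurrence of marker using str.find.

    find_all is a hypothetical helper name, not from the thread.
    """
    positions = []
    start = text.find(marker)
    while start != -1:  # str.find returns -1 when there are no more hits
        positions.append(start)
        # resume the search just past the previous hit
        start = text.find(marker, start + 1)
    return positions

print(find_all("a b  c"))  # [1, 3, 4]
```

Each pass does one `find` call that picks up where the last one stopped, so the text is scanned roughly once in total instead of once per space.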

wheatontrue 0 Newbie Poster · Answer 4 · 2009-02-07T01:13:14+00:00

vegaseat, that is really quick! Thanks very much. Works like a charm. I've not explored list comprehensions before, but I'm going to go look them up now.