Parsing lines with strings containing spaces

Question

gridder 0 Newbie Poster

14 Years Ago

Hi, Wonder if anyone here can help with this:
I'm trying to split each line of data from a text file into a list or tuple.
The lines contain a mixture of ints, floats, and strings (with spaces).
I can do it easily enough using split() twice: once on the single quote (') and then on whitespace. I was hoping to use the csv module, but that only seems to work if the fields are separated by exactly one space, and in these files there can be many spaces between the numbers.
Is there a concise way of splitting up lines like these using a regex, or some other clever method?

10471 'AGHADA71' 20.00 2 0.00 0.00 1 1 1.0106 0.312
10475 'AGHADA75' 21.00 2 0.00 0.00 1 1 0.9940 -1.217
10810 'AHANE 10' 110.00 1 0.00 0.00 1 1 1.0517 -4.091

Thanks in advance!

python


bvdet 75 Junior Poster

14 Years Ago

You could use a combination of re and split. Find and remove all items in quotes first then split on space. Example:

import re

s = '''10471 'AGHADA71' 20.00 2 0.00 0.00 1 1 1.0106 0.312 'AGGTREW 45'
10475 'AGHADA75' 21.00 2 0.00 0.00 1 1 0.9940 -1.217
10810 'AHANE 10' 110.00 1 0.00 'AHANE 10' 'AHANE 10' 0.00 1 1 1.0517 -4.091 '''

lines = s.split('\n')

patt = re.compile(r'\'(.+?)\'')
output = []
for line in lines:
    lineList = []
    while True:
        m = patt.search(line)
        if m:
            lineList.append(m.group(1))
            # Replace the quoted item with a space so neighbouring fields stay separated
            line = line[:m.start()] + ' ' + line[m.end():]
        else:
            break
    lineList.extend(line.split())
    output.append(lineList)

for item in output:
    print(item)

I added additional quotes to test for a general solution. The output is:

>>> ['AGHADA71', 'AGGTREW 45', '10471', '20.00', '2', '0.00', '0.00', '1', '1', '1.0106', '0.312']
['AGHADA75', '10475', '21.00', '2', '0.00', '0.00', '1', '1', '0.9940', '-1.217']
['AHANE 10', 'AHANE 10', 'AHANE 10', '10810', '110.00', '1', '0.00', '0.00', '1', '1', '1.0517', '-4.091']
>>>

The drawback is that the quoted items come first in each list, so the original field order is lost.
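One possible way around the ordering drawback is to let a single regex match either a quoted item or a run of non-whitespace, and collect the matches left to right. This is only a sketch: the names TOKEN and split_keep_order are made up for illustration, and it assumes the quoted items never contain a single quote themselves.

```python
import re

# Either a single-quoted item (captured without its quotes)
# or a run of non-whitespace characters.
TOKEN = re.compile(r"'([^']*)'|(\S+)")

def split_keep_order(line):
    """Return the fields of line in their original order."""
    fields = []
    for m in TOKEN.finditer(line):
        # group(1) is set for a quoted item, group(2) for a bare one
        fields.append(m.group(1) if m.group(1) is not None else m.group(2))
    return fields

print(split_keep_order("10810 'AHANE 10'   110.00 1 0.00 0.00 1 1 1.0517 -4.091"))
```

Because the matches are taken in the order they occur, 'AHANE 10' stays in second position, and runs of several spaces between fields are simply skipped.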

Gribouillis 1,391 Programming Explorer

14 Years Ago

I once wrote a code snippet that parses command lines the way a C program parses argv. You could use it for your problem: save http://www.daniweb.com/code/snippet234768.html as argv.py

>>> from argv import buildargv
>>> print(buildargv("10210 'ARDNA 10' 110.00 1 0.00 0.00 1 1 1.0432 -3.954"))
['10210', 'ARDNA 10', '110.00', '1', '0.00', '0.00', '1', '1', '1.0432', '-3.954']

On a command line, the rule is the same: space is preserved in single quoted or double quoted arguments.
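If saving a separate snippet is not convenient, the standard library's shlex module applies similar shell-style quoting rules, so something along these lines may be enough. It is a sketch rather than a drop-in replacement for the snippet above: shlex.split() raises ValueError on an unbalanced quote and treats backslashes as escapes, which may or may not suit the data files in question.

```python
import shlex

line = "10210 'ARDNA 10'   110.00 1 0.00 0.00 1 1 1.0432 -3.954"

# shlex.split() splits on whitespace but keeps quoted text together,
# dropping the surrounding quotes.
fields = shlex.split(line)
print(fields)
```

The result is a list of strings in the original order, with 'ARDNA 10' kept as one item.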


Kruptein 15 Posting Whiz in Training · Answer 1 · 2010-02-19T17:57:08+00:00

Can you give an example and the expected output?

gridder 0 Newbie Poster · Answer 2 · 2010-02-19T18:02:30+00:00

I use readline() to obtain the text from the file, so input looks something like:
"10210 'ARDNA 10' 110.00 1 0.00 0.00 1 1 1.0432 -3.954"
Then I would like to split according to space but to ignore the possible spaces within the string, i.e. to get an output like
(10210, "ARDNA 10", 110.00, 1, 0.00, 0.00, 1, 1, 1.0432, -3.954)

The reason for this is that I need to address/ modify some of these entries later on, and it makes things easier if they are in a list or tuple.
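Once a line has been split into strings, getting ints and floats out of them could look roughly like the sketch below. The helper name convert is hypothetical, and the rule it applies (try int, then float, otherwise keep the string) is an assumption about what is wanted; a name that happens to look like a number, such as '10' or 'nan', would be converted too.

```python
def convert(token):
    """Try int first, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token

tokens = ['10210', 'ARDNA 10', '110.00', '1', '0.00', '0.00', '1', '1', '1.0432', '-3.954']

# A list rather than a tuple, so individual entries can be modified later
row = [convert(t) for t in tokens]
print(row)
```

Note that the floats lose their trailing zeros (110.00 becomes 110.0), so the values would need formatting again if the lines are written back to a file.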

Kruptein 15 Posting Whiz in Training · Answer 3 · 2010-02-19T18:10:32+00:00

Never mind, it did not work...

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 4 · 2010-02-19T20:53:32+00:00

You could potentially do something like this ...

line = "10810 'AHANE 10' 110.00 1 0.00 0.00 1 1 1.0517 -4.091"

# Splitting on the quote gives [text before, quoted name, text after]
rawlist = line.split("'")
print(rawlist)  # test
number = rawlist.pop(0).strip()
name = rawlist.pop(0).strip()
numbers = rawlist.pop(0).strip()
mylist = [number, name]
mylist.extend(numbers.split())

print(mylist)

gridder 0 Newbie Poster · Answer 5 · 2010-02-22T20:31:37+00:00

Thanks for all your replies - I get the impression that there is no terse one-line solution to this problem! The way I have it coded at the moment is as follows, which also deals with the case where there is no string within the data entry.

def split_strings_with_quotes(datastring):
    # Assumes one quoted string per line: text before, quoted text, text after
    left, middle, right = datastring.strip().split("'")[:3]
    newstring = left.split()
    newstring.append(middle)
    newstring.extend(right.split())
    return newstring

def split_strings_no_quotes(datastring):
    return datastring.strip().split()

def split_strings(datastring):
    if "'" in datastring:
        return split_strings_with_quotes(datastring)
    else:
        return split_strings_no_quotes(datastring)
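The same split-on-quote idea can probably be stretched to cover any number of quoted strings per line, including none. After splitting on the quote character, the chunks alternate between outside and inside quotes, so the odd-numbered chunks are the quoted items. This is a sketch only: split_quoted is a made-up name, and it assumes the quotes are balanced and never appear inside a name.

```python
def split_quoted(datastring):
    """Split on whitespace, keeping single-quoted items intact."""
    fields = []
    for i, chunk in enumerate(datastring.strip().split("'")):
        if i % 2:
            # Odd chunks were inside quotes: keep them whole
            fields.append(chunk)
        else:
            # Even chunks were outside quotes: split on whitespace
            fields.extend(chunk.split())
    return fields

print(split_quoted("10810 'AHANE 10' 110.00 1 0.00 0.00 1 1 1.0517 -4.091"))
print(split_quoted("10475 21.00 2 0.00"))
```

A line with no quotes produces a single even chunk, so it falls through to a plain split() without needing a separate function.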