Question about lists and regular expressions.

Tenzing 0 Newbie Poster

15 Years Ago

Hello, I am new to this.

What I am trying to do is take a .txt file and write parts of it to different lists. The file contains lines of FCODE,DESCRIPTION such as this:
DAAS44,AIRSTRIP region ruin/inactive/abandoned

I would like to split each line into FCODE and DESCRIPTION strings and append them to FCODE and DESCRIPTION lists so that I can retrieve entries in those lists later in the program.

I am reading the text file as follows:

infile = open(name, "r")
lines = infile.readlines()

This gives me a list in which each line of the file is a string.

## create a regular expression to find all strings beginning with RR
## these represent road feature codes.
reobj = re.compile("^RR[A-Z]")
## set up lists for the fcode and description
fcode=[]
desc=[]
for line in lines:
    if re.search(reobj,line):
        fcline=line.split(",")
        fcode=fcline[0]
        desc=fcline[1]
        print desc
        fcode=re.split("[\b]",fcode)

What I am trying to do is split the fcode and the description and get strings that each have a unique index value. Splitting the fcode and description seems to work, but the resulting lists only have index values 0 and 1, so I can't retrieve individual strings by index.

I thought the problem might be caused by reading in the txt file all at once rather than one line at a time, but indexing the lines works and I can take a slice of more than just [0:1], so the problem has to be with the way I'm using the regular expression search.

Beyond this, I had also considered splitting the description into uppercase and lowercase sections and adding each section to its own list. However, I do not know the regular expression to separate uppercase words from lowercase words and numbers.

Gribouillis 1,391 Programming Explorer

15 Years Ago

Your program seems too complicated for what you are trying to do. You could split the lines on the first "," without regular expressions:

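def rr_lines(lines):
    """Yield only the lines whose feature code starts with "RR"."""
    for line in lines:
        if line.startswith("RR"):
            yield line

def splitted_lines(lines):
    """Yield (fcode, desc) pairs, splitting each line on its first comma."""
    for line in rr_lines(lines):
        # rstrip drops the trailing newline that would otherwise end up in desc
        fcode, desc = line.rstrip("\n").split(",", 1)
        yield (fcode, desc)

if __name__ == "__main__":
    name = "filename.txt"
    # the with block closes the file once the loop finishes
    with open(name, "r") as infile:
        for item in splitted_lines(infile):
            print(item)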

Gribouillis 1,391 Programming Explorer

15 Years Ago

Oh, I see. Try this:

fcodes, descs = zip(*splitted_lines(infile))
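To illustrate what the one-liner above does: zip(*pairs) takes a sequence of (fcode, desc) pairs and regroups it into one sequence of fcodes and one of descs, so each entry gets its own index. The sketch below is only an illustration; the pairs list is made-up sample data standing in for the output of splitted_lines(), and the codes in it are hypothetical. It is written to run on Python 3.

```python
# Hypothetical sample data standing in for the output of splitted_lines().
pairs = [
    ("RRA10", "ROAD paved"),
    ("RRB20", "TRACK unpaved"),
    ("RRC30", "BRIDGE road"),
]

# The * passes every pair to zip as a separate argument; zip then groups
# all first elements together and all second elements together.
fcodes, descs = zip(*pairs)

# zip produces tuples; convert to lists if they need to be modified later.
fcodes = list(fcodes)
descs = list(descs)

print(fcodes[2])  # RRC30
print(descs[2])   # BRIDGE road
```

One caveat: if no line matches, zip(*pairs) produces nothing and the two-name unpacking raises ValueError, so an empty input may need to be handled separately.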

Tenzing 0 Newbie Poster · Answer 1 · 2010-02-19T01:00:20+00:00

OK, that works well for splitting, and I've never used the yield statement before. Thank you.

What I'm wondering now is how to assign the items it returns to lists, since each item it returns has the fcode at index 0 and the description at index 1. How can I append these to lists so that each line has a unique index?
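For readers following along, here is a minimal sketch of the explicit append approach being asked about. In the loop from the original post, the assignment fcode=fcline[0] rebinds the name fcode to a string on every pass, so the list created earlier is lost; calling append on separately named lists avoids that. The function name split_rr_lines and the RR sample lines are hypothetical, and the sketch assumes every RR line contains at least one comma. It is written to run on Python 3.

```python
def split_rr_lines(lines):
    """Return (fcodes, descs) lists for the lines that start with "RR"."""
    fcodes = []
    descs = []
    for line in lines:
        if not line.startswith("RR"):
            continue
        # Split on the first comma only, so commas inside the
        # description are preserved.
        code, text = line.rstrip("\n").split(",", 1)
        # append adds to the existing lists instead of replacing them.
        fcodes.append(code)
        descs.append(text)
    return fcodes, descs


# The first line comes from the original post; the RR lines are made up.
sample = [
    "DAAS44,AIRSTRIP region ruin/inactive/abandoned\n",
    "RRA10,ROAD paved\n",
    "RRB20,TRACK unpaved\n",
]
fcodes, descs = split_rr_lines(sample)
print(fcodes[1], descs[1])  # RRB20 TRACK unpaved
```

Because both lists are appended to in the same loop iteration, fcodes[i] and descs[i] always refer to the same input line.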

Tenzing 0 Newbie Poster · Answer 2 · 2010-02-19T02:50:47+00:00

OK, that works. I used:

fcode, desc = zip(*splitted_lines(lines))

Thanks a lot for the tips.

Tenzing 0 Newbie Poster · Answer 3 · 2010-02-19T05:39:53+00:00

I've been working on the second part of my question. From each description string, I want to extract a string containing only the uppercase words and the whitespace between them. I've tried the following:

for items in desc:
    desc1=re.findall("([A-Z,\s])", items)

However, this returns each uppercase letter and each whitespace character as a separate one-character string. Is there a way to extract the words and spaces as a single string and leave the rest?
Thanks in advance.
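
One possible approach, offered as a sketch rather than a definitive answer: the character class [A-Z,\s] matches a single character at a time, which is why findall returns one-character strings. Adding a + quantifier and word boundaries lets the pattern match whole uppercase words, and a repeated group picks up further uppercase words separated by whitespace. The names UPPER_RUN and split_description are hypothetical, the second sample description is made up, and the sketch assumes only the first run of uppercase words is wanted and that the words use plain ASCII letters. It is written to run on Python 3.

```python
import re

# One or more whole uppercase words, separated by whitespace.
# \b keeps the pattern from matching the capital letter of a
# mixed-case word such as "Road".
UPPER_RUN = re.compile(r"\b[A-Z]+\b(?:\s+[A-Z]+\b)*")


def split_description(desc):
    """Return (uppercase_part, rest) for a description string."""
    match = UPPER_RUN.search(desc)
    if match is None:
        return "", desc.strip()
    upper = match.group(0)
    # Everything outside the matched run is treated as the remainder.
    rest = (desc[:match.start()] + desc[match.end():]).strip()
    return upper, rest


print(split_description("AIRSTRIP region ruin/inactive/abandoned"))
# ('AIRSTRIP', 'region ruin/inactive/abandoned')
print(split_description("ROAD BRIDGE over river"))
# ('ROAD BRIDGE', 'over river')
```

If a description can contain several separate uppercase runs, UPPER_RUN.findall(desc) returns all of them as a list of strings instead of just the first.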