How do I extract values from a single column in a CSV file?

Question

fingerpainting 0 Newbie Poster

14 Years Ago

Background:

I'm extracting values from a file which is sometimes an xls and sometimes an xlsx file. An xls is easily read with xlrd, but xlrd nor any other Python library (as far as I could find) supports xlsx, so instead I'm using xlsx2csv to convert to csv and then reading values from that.

I'm new to Python and only have beginner's knowledge of other scripting languages.

Problem:

I'd like to select values from a specific column of the csv file (and fill a list with them). This is where I'm getting confused. When I convert the CSV with

infile = path + "\\" + filename
xlsx2csv(infile, open("data.csv", "w+"))

and open it manually in Excel, I can see the values I want in column 5 at row 23 and onwards (with columns up to 5 and rows up to 23 containing values I do not want).

But when I open the csv file within Python with

values = csv.reader(open('data.csv', 'rb'), delimiter=' ')

I'm getting a list of lists. Printing the values with

for row in values:
    print row

gives the following (edited) result:

I feel like I've already gone wrong at this point. Or should I go with this and then split up the lists and get the last value from each separate list except the empty ones? I hope someone can help me to an elegant solution.

python

3 Contributors
6 Replies
9K Views
6 Days Discussion Span
Latest Post 14 Years Ago Latest Post by fingerpainting

TrustyTony 888 ex-Moderator

14 Years Ago

Could you post beginning of the original csv file or attach it from advanced view, manage attachments? I would guess that delimeter is ','

TrustyTony 888 ex-Moderator

14 Years Ago

You can put one list comprehension or normal for before your for reading out the unneeded rows, or from itertools import islice and list(islice(values,22) Alternatively you can try this line or variation of it for line 4:

MyValues.append(row[4] if len(row)>4 else None)

Edited 14 Years Ago by TrustyTony because: n/a

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

fingerpainting 0 Newbie Poster · Answer 1 · 2010-07-21T21:27:06+00:00

Could you post beginning of the original csv file or attach it from advanced view, manage attachments? I would guess that delimeter is ','

Thank you! I've changed the delimiter to ',' and the values now stay intact. I've also flagellated myself for such a silly oversight (I now realise that the words in sentences were getting broken up because they were separated by spaces and I used a space as a delimiter), but this doesn't seem to have made me any smarter so far.

Let me try and select the values again and report back with how I failed or succeeded.

Tommymac501 4 Junior Poster in Training · Answer 2 · 2010-07-22T03:39:58+00:00

Thank you! I've changed the delimiter to ',' and the values now stay intact. I've also flagellated myself for such a silly oversight (I now realise that the words in sentences were getting broken up because they were separated by spaces and I used a space as a delimiter), but this doesn't seem to have made me any smarter so far.
Let me try and select the values again and report back with how I failed or succeeded.

When you open a file with the CSV module, you're given a 'list', which is indexed. If you want the data in a spcific column to be put into another list, just scroll through it. If the data you need is in the 5th column (zero based), use that index. i.e.

MyValues = [] #create an empty list
values = csv.reader(open('data.csv', 'rb'), delimiter=' ')
for row in values:
  MyValues.append(row[5])

Thats pretty much it.

fingerpainting 0 Newbie Poster · Answer 3 · 2010-07-26T22:29:09+00:00

Thank you, that does it. (With row[4] instead of row[5], since as you pointed out it is zero based.)

There is one problem that pops up. I found that I was only getting values up to a certain row, then it failed with "IndexError: list index out of range". Looking at the original list I noticed that the breaking point was at a list with 4 items instead of 5 (so it can't reach row[4]). I don't even need that list, so I would have to start the for loop at a particular list (23).

fingerpainting 0 Newbie Poster · Answer 4 · 2010-07-27T15:16:59+00:00

You can put one list comprehension or normal for before your for reading out the unneeded rows, or from itertools import islice and list(islice(values,22) Alternatively you can try this line or variation of it for line 4:
MyValues.append(row[4] if len(row)>4 else None)

Brilliant. I went with the code snippet. All I had to do after that was put in another for loop to select values of a particular format and I had exactly what I wanted!