Python CSV Blank Cells Read as strings

Question

pbnoj 0 Newbie Poster

12 Years Ago

Hi there,

My problem involves reading in and working with an excel file that contains blank cells randomly throughout the spreadsheet. The reason there are blank cells is because I have expression data for various cell types but for some of the cell types the expression data was not available. My program works great (Finding standard deviations, averages etc.) when I fill the empty cells with 0, but when I leave them blank I get the following error:

   Resistance.append(map(float, row[4:]))
ValueError: could not convert string to float:

Basically when I try to append the data to a new array and convert it to floats I get the error because of the blank cells. If I don't do the float function, I get errors everywhere in the programd because I can't do the Stdev, for example I will have the following error:

withinA1 = numpy.std(a1)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2433, in std
    return _wrapit(a, 'std', axis, dtype, out, ddof)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
TypeError: cannot perform reduce with flexible type

How can I read through and tell python that the empty cells should just be skipped over? I've tried:

for label in Reader:
    while i < len(label):
        # if either the item in col1 or col2 is empty remove both of them
        if label[i] == '':
            del label[i]
        # otherwise, increment the index
        else:
            i += 1
    if len(label) == 0 or label[j] == '':
        continue
    else:
        Resistance.append(map(float, label[4:]))
        i += 1

and:

for row in Reader:
    if len(row) == 0 or row[i] == '':
        i+=1
        continue
    else:
        Resistance.append(map(float, row[4:]))
        i+=1

To no avail. Any help would be greatly appreciated, thank you.

python

2 Contributors
8 Replies
4K Views
1 Day Discussion Span
Latest Post 12 Years Ago Latest Post by Gribouillis

Gribouillis 1,391 Programming Explorer

12 Years Ago

Well, here is something which should always work

def to_float(sequence):
    """Generate a sequence of floats by converting every item in
    a given sequence to float, and ignoring failing conversions"""
    for x in sequence:
        try:
            yield float(x)
        except ValueError:
            pass

Now if you want to append all the values in columns 4, 5, ... for all rows,
you can write a generator

def sheet_values(reader):
    """Generate the cell values that we want to convert and append to Resistance"""
    for row in reader:
        for x in row[4:]:
            yield x

Finally, here is how to append all the values

Resistance.append(list(to_float(sheet_values(Reader)))

Perhaps you could describe what is this variable Resistance. Is it a list, a list of lists ? What
is your expected content for Resistance ?

Edited 12 Years Ago by Gribouillis

Gribouillis 1,391 Programming Explorer

12 Years Ago

The only thing you need to do is determine which lists you want to append to Resistance. For example if you want to append the values from a column numbered k, you can write

def column_values(rows, k):
    for row in rows:
        yield row[k]

rows = list(Reader) # convert to list in case Reader is a generator.
Resistance.append(list(to_float(column_values(rows, 0))))

Edited 12 Years Ago by Gribouillis

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2012-12-22T10:26:54+00:00

You could insert 0.0 when there is no data

Resistance.append(tuple((float(x) if x.strip() else 0.0) for x in row[4:]))

pbnoj 0 Newbie Poster · Answer 2 · 2012-12-23T20:05:59+00:00

That doesn't work because adding 0 changes the standard deviation

pbnoj 0 Newbie Poster · Answer 3 · 2012-12-23T21:00:43+00:00

Thank you for your help.

The code does work indeed, the only problem is that now it is one large array, and I need to be able to distinguish the original rows from one another so that I may calculate standard deviations as I see fit. Sorry if that was not clear

pbnoj 0 Newbie Poster · Answer 4 · 2012-12-23T21:10:48+00:00

Yes Resistance is a list of lists. My raw data is in the form of an excel spreadsheet. I have A1, A2, B1..B6, C1-C15, etc... My goal is to calculate a running standard deviation of each column (index) from A1 to A2, so STDEV OF A1[0]A2[0], A1[1]A2[1] etc. for each of the subclasses. I will then find the average standard deviation within each subclass

pbnoj 0 Newbie Poster · Answer 5 · 2012-12-23T21:47:33+00:00

Great, I guess I'm more of a beginner than I thought.

I am now getting this error:

ValueError: setting an array element with a sequence.

and I believe it is because since we took out some zeros, the arrays are of different sizes and the stdev indexing gets thrown off. Is there a way to tell python to completely ignore an index position rather than no longer assuming it's not there?

So, having something like:

[[[-1.0, -1.38, na, 0.24, 0.01, 0.47, -0.38, -0.64, -0.47, -0.71, 0.79, 1.61, 0.88, -0.99, -0.46, -0.23, -0.23, -1.0, -0.01, -0.52, -0.58, -1.17, 1.3, -1.08, -0.53, -0.75, 0.8, -0.23, -0.19, -0.83, -0.43, -0.07, 1.84, -0.4, -0.54, -0.4, 1.82, -0.61, 0.17]], [[-0.89, -0.59, -0.27, -0.63, 0.14, 1.35, -0.28, -0.51, -0.36, -0.52, 0.1, 1.58, -0.78, -0.86, -0.92, -0.41, -0.22, -1.0, -0.24, -0.2, -0.15, -0.73, 0.05, -0.68, -0.52, -0.27, 1.36, -0.44, -0.06, -0.78, -0.02, -0.22, 2.04, -0.16, -0.45, -0.34, 1.33, -0.71, 1.14]]]

I put an na up there as an example just so that the boxes are the same dimension

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 6 · 2012-12-23T21:56:51+00:00

You should have a look in numpy masked arrays. I can not help you much here, because I never tried it.