I'm a newbie to python and I have a problem. My problem is that I have a CSV file with hundreds and hundreds of rows of data. Each row contains temperature data that has been recorded daily. What I want to do is try to extract certain rows of data from the csv file. For example I want to be able to give a function two different days.(mm/dd/yyyy) I'll call one 'initialDay' and the other 'finalDay'. Now I would like to extract the data in between these two days. My data is constructed so that's DATE is a column. Does Python offer any way of extracting just the data between my two dates?? Another quick example: let say I entered:
initialDay = 1/1/2000 and finalDay = 1/4/2000
I would want the daily temp recorded for january 1st 2000 through january 4th 2000.
So if initialDay = finalDay it should only return the temp for that single day.

Does Python offer any way of extracting just the data between my two dates??

Yes more than one way depends how data are organized.

post an short example off that cvs file.
Mark where 'initialDay' and 'finalDay' are.

STATION,DATE,MAX,MIN
PRIMET,9/14/1990,30.6,6.3
PRIMET,9/15/1990,33.9,7.7
PRIMET,9/16/1990,32.8,8.9
PRIMET,9/17/1990,31.3,9.3
PRIMET,9/18/1990,31.7,9.7
RS20,9/14/1990,12.2,5.7
RS20,9/15/1990,18.5,3.7
RS20,9/16/1990,24.9,8
RS20,9/17/1990,23.7,9.3
RS20,9/18/1990,26.6,9.4
VANMET,9/14/1990,18.5,6
VANMET,9/15/1990,9.3,6.5
VANMET,9/16/1990,11.8,3.7
VANMET,9/17/1990,11.5,2.3
VANMET,9/18/1990,12.2,1.9


****Above is a short example of my CSV file. Now the first column is different temp Stations where data was recorded. Now lets say I want to extract the rows of data for each station between 9/14/1990 and 9/17/1990. Meaning:

initialDay = 9/14/1990
finalDay = 9/17/1990

The end result should be 4 rows of data for each station being that each station recorded data on those days. So a total of 12 rows. The 3rd and 4th columns represent max and min temps for the day.

thanks

If i understood correctly:

initial_date = '9/14/1990'
initial_month, initial_day, initial_year = initial_date.split('/')
final_date = '9/17/1990'
final_month, final_day, final_year = final_date.split('/')

f_in = open('stations_temp.csv').readlines()
f_out = open('filtered_stations_temp.csv', 'w')

f_out.write(f_in[0])

for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    month, day, year = date.split('/')
    if initial_month <= month <= final_month and \
    initial_day <= day <= final_day and \
    initial_year <= year <= final_year:
        f_out.write(f_in[i])

f_out.close()

Cheers and Happy coding

Hey thanks this is very helpful. My only problem is what if I didn't want to create a new file but rather store the filtered data in a list can this be done easily? I was able to write some code that will do this for one specific day of the year. Below is the code:

import csv

def filtered_stations_temps(YearlyData, initial_date):
        stationTemps = []
        for row in YearlyData:
                if row[1] in initial_date:
                        stationTemps.append(row[0:4])
        return list(stationTemps)

if __name__ == '__main__':

        YearlyData = list(csv.reader(open('stations_temps.csv'), delimiter=','))
        print filtered_stations_temps(YearlyData, ['9/14/1990'])

I will need to somehow give 'filtered_stations_temps' another argument for 'final_date' and include it in some statement.

If you change Bent Slayer code,to this.
It will give you a list.

initial_date = '9/14/1990'
initial_month, initial_day, initial_year = initial_date.split('/')
final_date = '9/17/1990'
final_month, final_day, final_year = final_date.split('/')

f_in = open('test.csv').readlines()
#f_out = open('filtered_stations_temp.csv', 'w')

my_list = []
for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    month, day, year = date.split('/')
    if initial_month <= month <= final_month and \
    initial_day <= day <= final_day and \
    initial_year <= year <= final_year:
        my_list.append(f_in[i].strip())

#f_out.close()
print my_list

Another way.

start_date = '9/14/1990'
end_date = '9/18/1990'

flag = 1
my_list = []
linelist = open('test.csv')
for line in linelist:
    if start_date in line: # or line.startswith('STATION') ## to get first line
        flag = 0
    if end_date in line:
        flag = 1
    if not flag and not end_date in line:
       #print line,
       my_list.append(line.strip())

print my_list

The first offered solution is not going to work if statistics goes over month boundary.

The second solution of snippsat is quite usable if each stations data are always sorted in date order and if there is always some statistics from both given days or second day is after last day of statistics.

I know the thread is solved and closed, but I got a question.

How does my solution don't work?

I don't get it.

It will copy all lines in wich the date is between that range, for sure.

Cheers and Happy coding

I know the thread is solved and closed, but I got a question.

How does my solution don't work?

I don't get it.

It will copy all lines in wich the date is between that range, for sure.

Cheers and Happy coding

Use:

start_date = '06/12/2007'
end_date = '01/27/2008'

and attached modified csv to see what I meant

Thanks mate by the clearing up.

I wasn't really seeing it.

import time

#initial_date = '9/14/1990'
initial_date = '6/12/2007'
initial = time.mktime(time.strptime(initial_date, "%m/%d/%Y"))

#final_date = '9/17/1990'
final_date = '1/27/2008'
final = time.mktime(time.strptime(final_date, "%m/%d/%Y"))

f_in = open('stations_temp_tonyjv.csv').readlines()
f_out = open('filtered_stations_temp.csv', 'w')

f_out.write(f_in[0])

for i in range(1, len(f_in)):
    station, date, max_temp, min_temp = f_in[i].split(',')
    date = time.mktime(time.strptime(date, "%m/%d/%Y"))
    if initial <= date <= final:
        f_out.write(f_in[i])

f_out.close()

Cheers and Happy coding.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.