Get daily averages of monthly database

Question

glez_b 0 Newbie Poster

11 Years Ago

I have a long list of data structured in the following way

Date , Time, Temperature, Moisture, Accumulated precipitation

1/01/2011, 00:00, 23, 50, 2,
1/01/2011, 00:15, 22, 45, 1,
1/01/2011, 00:30, 20, 39, 0,
1/01/2011, 01:00, 25, 34, 0,
1/01/2011, 01:15, 23, 50, 0,
.
.
.
.
1/01/2011, 23:45, 22, 40, 0,
.
.
.
.
31/01/2011, 00:00, 23, 45, 0,

How can I get the daily averages of the Temperature and Moisture variables for the 31 days of the month? Does anyone have an idea how to do this?

python

woooee 814 Nearly a Posting Maven

11 Years Ago

and Moisture for the 31 day of the month,


Split the rec to isolate the date and split the date into month, day, and year.

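rec = "1/01/2011, 00:00, 23, 50, 2,"
# Fields are comma-separated; strip the surrounding whitespace from each one
fields = [field.strip() for field in rec.split(",")]
print(fields[0])
# The dates in the question are in day/month/year order
dd, mm, ccyy = fields[0].split("/")
if int(dd) == 31:
    print("Found the 31st")
else:
    print("%s is not the 31st" % dd)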

slate 241 Posting Whiz in Training

11 Years Ago

If you cannot come up with an at least partially working solution, then you should go back to tutorials or easier tasks.

If you cannot work out how to open a file and read it line by line, then I recommend googling it.
http://rosettacode.org/wiki/Read_a_file_line_by_line


BearofNH 104 Posting Whiz

11 Years Ago

You didn't tell us what was in the DirViento column.
I sincerely hope you are not trying to convert a temperature value in degrees to a temperature value in radians...

rrashkin 41 Junior Poster in Training · Answer 1 · 2013-10-05T14:26:04+00:00

Let's say d is a date string in the format you posted (day/month/year). Then
m = datetime.datetime.strptime(d, "%d/%m/%Y").month
returns the month as an integer. You could construct a dictionary that collects the data for each month:
dctTemp[m].append(newTemp)

Then you can average the values when you collect all the days.

slate 241 Posting Whiz in Training · Answer 2 · 2013-10-05T14:26:19+00:00

You define four dictionaries:
sum_temperature day-> sum of temperature on the given day
count_temperature day-> count of temperature data on given day
likewise with moisture.

You read in the file line by line.
You parse one line into day, temperature, moisture. Time and precipitation are not needed for the task. (?)
sum_temperature[day]+=temperature
count_temperature[day]+=1
likewise you do with moisture.

After the file is parsed, you loop on the sum_temperature keys and values. On each day you get the average temperature:
sum_temperature[day]/count_temperature[day]

Problems left for you:
Increasing sum_temperature the first time. Keycheck or defaultdict.
Empty day average.

Now you can ask yourself, the docs, Google, or us how to program it.
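
The steps above can be sketched roughly as follows. This is only one possible reading of the outline, not slate's own code: the function name daily_averages and the sample list are made up for illustration, the records are assumed to look like the ones in the question with the header row already skipped, and a single shared count dictionary stands in for the two separate counters because every line carries both values. defaultdict takes care of the "first time" problem, and a day without records never becomes a key, so no empty average is computed.

```python
from collections import defaultdict

def daily_averages(lines):
    """Return {date_string: (mean_temperature, mean_moisture)} for the given records."""
    sum_temperature = defaultdict(float)
    sum_moisture = defaultdict(float)
    count = defaultdict(int)
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) < 4 or not fields[0]:
            continue  # skip blank or incomplete lines
        day = fields[0]  # the date string itself serves as the key
        sum_temperature[day] += float(fields[2])
        sum_moisture[day] += float(fields[3])
        count[day] += 1
    return {day: (sum_temperature[day] / count[day], sum_moisture[day] / count[day])
            for day in count}

# Hypothetical sample built from the rows shown in the question
sample = [
    "1/01/2011, 00:00, 23, 50, 2,",
    "1/01/2011, 00:15, 22, 45, 1,",
    "1/01/2011, 00:30, 20, 39, 0,",
    "31/01/2011, 00:00, 23, 45, 0,",
]
averages = daily_averages(sample)
for day, (temperature, moisture) in sorted(averages.items()):
    print("%s: temperature %.2f, moisture %.2f" % (day, temperature, moisture))
```

To run it on a real file, pass the open file object (after reading past the header line) instead of the sample list.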

glez_b 0 Newbie Poster · Answer 3 · 2013-10-08T06:32:44+00:00

How do I program it?

glez_b 0 Newbie Poster · Answer 4 · 2013-10-09T20:28:26+00:00

I have a file 'path-tracks.csv' containing date, lat, lon, id, humidity, and rainfall data recorded every 15 minutes. I need to obtain daily averages from this monthly database. How do I get these daily averages? One more thing: my monthly database is not continuous; there are days with no data at certain times. How can I deal with this and still get daily averages? I am running some tests on a file with only a few records, which has the following form:

data=np.loadtxt('path-tracks.csv',delimiter=',',skiprows=1,dtype=str)
data
array([['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
       ['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
       ['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
       ['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
       ['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
       ['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
       ['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
       ['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
       ['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
       ['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
       ['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
       ['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
        '1'],
       ['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
       ['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
       ['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
       ['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']], 
      dtype='|S16')

So far I have written a script that gets the average over all days. My question is: how can I get the average for each day?

import numpy as np
import datetime as dt

temp = []
hume = []
preci = []
date = []
# open() replaces the Python 2-only file(); the with block closes the file
with open('path-tracks.csv', 'r') as f:
    f.readline()  # skip the header row
    for line in f:
        data = line.split(',')
        temp.append(int(data[4]))
        hume.append(int(data[5]))
        preci.append(int(data[7]))
        date.append(dt.datetime.strptime(data[0], '%d-%m-%Y %H:%M'))

# array() was never imported on its own, so use the np. prefix
temp = np.array(temp)
hume = np.array(hume)
preci = np.array(preci)
date = np.array(date)
# Work on the separate arrays instead of a combined one named all,
# which would shadow the built-in all()
print('max: %s %s' % (temp.max(), hume.max()))
print('sum: %s' % preci.sum())
print('mean: %s %s' % (temp.mean(), hume.mean()))

slate 241 Posting Whiz in Training · Answer 5 · 2013-10-10T12:03:30+00:00

If you do not have source data for the daily average, then you must decide what to do. This is not a programming question. If you intend to draw some graph of the data, then using the previous day's data comes to mind.

from collections import defaultdict
import datetime 

dataset=[['02-05-2004 06:15', '19.7', '-95.2', '1', '45', '-38', 'CCM', '1'],
       ['02-05-2004 06:30', '19.7', '-94.7', '1', '34', '-48', 'CCM', '2'],
       ['02-05-2004 06:45', '19.3', '-93.9', '1', '57', '-60', 'CCM', '3'],
       ['02-05-2004 07:15', '19', '-93.5', '1', '89', '-58', 'CCM', '4'],
       ['02-05-2004 07:30', '19', '-92.8', '1', '34', '-50', 'CCM', '2'],
       ['02-05-2004 07:45', '19.2', '-92.6', '1', '23', '-40', 'CCM', '1'],
       ['02-05-2004 08:15', '19.9', '-93', '1', '10', '-43', 'CCM', '1'],
       ['02-05-2004 08:45', '20', '-92.8', '1', '50', '-32', 'CCM', '2'],
       ['30-05-2004 09:15', '23.1', '-100.2', '2', '45', '-45', 'SCME', '3'],
       ['30-05-2004 09:45', '23.2', '-100', '2', '68', '-56', 'SCME', '1'],
       ['30-05-2004 10:15', '23.3', '-100', '2', '90', '-48', 'SCME', '2'],
       ['30-05-2004 10:45', '23.3', '-100.2', '2', '100', '-32', 'SCME',
        '1'],
       ['31-05-2004 03:15', '23.4', '-99', '3', '12', '-36', 'SCM', '1'],
       ['31-05-2004 03:45', '23.5', '-98.9', '3', '34', '-46', 'SCM', '2'],
       ['31-05-2004 04:15', '23.6', '-98.7', '3', '56', '-68', 'SCM', '1'],
       ['31-05-2004 04:45', '23.7', '-98.8', '3', '78', '-30', 'SCM', '1']]


daily_data=defaultdict(list)
for data in dataset:
    temp=int(data[4])
    hume=int(data[5])
    preci=int(data[7])
    time=datetime.datetime.strptime(data[0],'%d-%m-%Y %H:%M')
    day=datetime.date(time.year,time.month,time.day)
    daily_data[day].append((temp,hume,preci))

minday,maxday=min(daily_data.keys()), max(daily_data.keys())

day=minday
previous_temp_average=None
while day <= maxday:
    if daily_data[day]:
        # float() avoids integer (floor) division under Python 2
        temp_average=float(sum(data[0] for data in daily_data[day]))/len(daily_data[day])
    else:
        temp_average=previous_temp_average
    print("Average temperature for day: %s is %s" % (day,temp_average))
    previous_temp_average=temp_average
    day=day+datetime.timedelta(days=1)

glez_b 0 Newbie Poster · Answer 6 · 2013-10-11T20:32:12+00:00

Thanks for your comment; it was a big help. I found another way to compute daily averages of several variables, for example for a database that has the following structure:

Fecha,Time, DirViento, MagViento, Temperatura, Humedad, PreciAcu.

Each column has daily data recorded every 15 minutes. Here is the code that calculates the sum of the PreciAcu column and the averages of the temperature and humidity columns:

import pandas as pd

data = pd.read_csv('tancoyol.csv')
# set_index(..., inplace=True) returns None, so keep the returned frame instead
index5 = data.set_index(['Fecha','Hora'])

# Level 0 of the index is 'Fecha', so this groups the rows by date
grouped = index5.groupby(level=0)

stat_cea = grouped.agg({'Temperatura':'mean','Humedad':'mean','PreciAcu':'sum'})

print('Done............')

Now I have one more question: I need to convert the DirViento column to radians. How can I do this and add the new column to my data file? In summary, I need to obtain 3 columns from the DirViento column.

Specifically:
First, I need to convert each value in the DirViento column to radians (new column called Rad).
Second, I need to get the sine of the Rad column (new column called Sin).
Third, I need to get the cosine of the Rad column (new column called Cos).

How can I achieve this?

slate 241 Posting Whiz in Training · Answer 7 · 2013-10-14T15:01:12+00:00

You use math.sin, math.cos and math.pi.
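
As a small sketch of that suggestion for a single value: math.radians is the standard-library shortcut for multiplying by math.pi / 180, and the helper name direction_columns is made up here. The three sample directions are taken from readings posted in this thread. For a whole pandas column, the NumPy functions np.radians, np.sin and np.cos do the same thing element by element.

```python
import math

def direction_columns(direction_degrees):
    """Return (rad, sin, cos) for one direction value given in degrees."""
    rad = math.radians(direction_degrees)  # same as degrees * math.pi / 180
    return rad, math.sin(rad), math.cos(rad)

for degrees in (318, 342, 57):
    rad, sine, cosine = direction_columns(degrees)
    print("%d deg -> Rad %.4f, Sin %.4f, Cos %.4f" % (degrees, rad, sine, cosine))
```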

rrashkin 41 Junior Poster in Training · Answer 8 · 2013-10-19T12:42:49+00:00

I'm guessing it's "wind direction"

glez_b 0 Newbie Poster · Answer 9 · 2013-10-19T16:37:51+00:00

BearofNH:
Thanks for your comments, but I'm not doing what you think.

rrashkin:
Indeed, I need to calculate the average wind, Temperature and Moisture and the sum of Precipitation from a monthly database that has daily data recorded every 15 minutes. A sample of the dataframe is as follows:

    Fecha_Hora        DirV  MagV   Temp  HumR  PreciAcu
0   2011/07/01 00:00   318   6.6  21.22   100       1.7
1   2011/07/01 00:15   342   5.5  21.20   100       1.7
2   2011/07/01 00:30   329   6.6  21.15   100       4.8
3   2011/07/01 00:45   279   7.5  21.11   100       4.2
4   2011/07/01 01:00   318   6.0  21.16   100       2.5
5   2011/07/01 01:15   329   7.1  21.13   100       4.0
6   2011/07/01 01:30   300   4.7  21.15   100       1.3
7   2011/07/01 01:45   291   3.1  21.23   100       2.2
8   2011/07/01 02:00   284   7.6  21.29   100       1.3
10  2011/07/02 00:00   281   3.6  21.47   100       3.2
11  2011/07/02 00:15   360   2.7  21.52   100       2.5
12  2011/07/02 00:30    57   1.2  21.53   100       0.0
13  2011/07/02 00:45   300   3.4  21.69   100       0.0
14  2011/07/02 01:00   359   5.9  21.67   100       0.0
15  2011/07/01 01:15   309   1.8  21.65   100       0.0

I'm using pandas to read the data, and I do the calculations with numpy and scipy. I apply a few rules: if a direction value is 0, replace it with 360; obtain the components u and v; if the component v is 0, the resulting wind direction is 0, otherwise it is atan(u/v); and the magnitude is (u**2 + v**2)**0.5.
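
For comparison, here is a hedged sketch of a vector average that uses math.atan2 instead of the atan(u/v) rule with its special case for v equal to 0. It is not the calculation used in this post: the function name resultant_wind is made up, u is taken as magnitude * sin(direction) and v as magnitude * cos(direction), and the result is reported in the same direction convention as the input. Whether that matches the convention of the station data is an assumption to check.

```python
import math

def resultant_wind(directions_deg, magnitudes):
    """Vector-average wind readings; returns (direction_deg, magnitude)."""
    n = len(directions_deg)
    if n == 0:
        raise ValueError("need at least one reading")
    # Mean of the u and v components over all readings
    u = sum(m * math.sin(math.radians(d)) for d, m in zip(directions_deg, magnitudes)) / n
    v = sum(m * math.cos(math.radians(d)) for d, m in zip(directions_deg, magnitudes)) / n
    magnitude = math.hypot(u, v)  # same as (u**2 + v**2) ** 0.5
    # atan2 copes with v == 0 and picks the correct quadrant, unlike atan(u / v)
    direction = math.degrees(math.atan2(u, v)) % 360
    return direction, magnitude

# Two readings from the sample above (directions 318 and 342, magnitudes 6.6 and 5.5)
direction, magnitude = resultant_wind([318, 342], [6.6, 5.5])
print("direction %.1f, magnitude %.2f" % (direction, magnitude))
```

Averaging the components rather than the raw angles avoids the wrap-around problem where readings such as 350 and 10 would otherwise average to 180.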

To get the daily mean and sum I do the following:

# set_index(..., inplace=True) returns None, so keep the returned frame instead
index5=data.set_index(['Fecha','Hora'])
grouped = index5.groupby(level=0)
stat_cea=grouped.agg({'MagRes':'mean','DirRes':'mean','Temperatura':'mean','Humedad':'mean','PreciAcu':'sum'})

Fecha         DirRes  Humedad  MagRes  PreciAcu  Temperatura
2011/07/01  141.0588      100  4.6475      30.4        21.35
2011/07/02  153.8235       99  3.3455      18.0    21.841765
2011/07/03  133.8824       99  4.6470       4.0    21.347059

Now I want to save the output to an Excel file, but I would like the columns to be rounded to one decimal place, e.g. 153.8235 becomes 153.8. After that, I want to convert the DirRes column to an integer, while MagRes, PreciAcu and Temperatura remain floats.

How can I achieve this in pandas, or is there another way to do it? I would appreciate your help.
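
One possible "another way", sketched with the Python 3 standard library only rather than pandas: round each value with the built-in round(), convert DirRes with int(), and write CSV text, which Excel can open. The helper name format_row is made up, the rows are the three result rows shown above, and writing CSV instead of a native Excel file is a deliberate substitution.

```python
import csv
import io

def format_row(fecha, dir_res, humedad, mag_res, preci_acu, temperatura):
    """Round one daily-statistics row: DirRes to an int, the rest to 1 decimal."""
    return [
        fecha,
        int(round(dir_res)),
        round(humedad, 1),
        round(mag_res, 1),
        round(preci_acu, 1),
        round(temperatura, 1),
    ]

# The three result rows posted above
rows = [
    ("2011/07/01", 141.0588, 100, 4.6475, 30.4, 21.35),
    ("2011/07/02", 153.8235, 99, 3.3455, 18.0, 21.841765),
    ("2011/07/03", 133.8824, 99, 4.6470, 4.0, 21.347059),
]

# Write to an in-memory buffer; open a real file the same way to save it
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Fecha", "DirRes", "Humedad", "MagRes", "PreciAcu", "Temperatura"])
for row in rows:
    writer.writerow(format_row(*row))
csv_text = buffer.getvalue()
print(csv_text)
```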