Help with Empirical CDF implementation

noniterum03 0 Newbie Poster

14 Years Ago

I am a new Python user and I am trying to code an implementation to calculate an empirical CDF. So far, I have some code (below) that returns a list of tuples [(datapoint, P(X>=x)), ...]. The problem I am trying to resolve is how to handle repeated data, e.g. [1,1,4,6,7,...]. My implementation can't handle repeated numbers. Any ideas to improve it would be welcome, thanks.

class EmpiricalCDF:
    
    def __init__(self,datalist):
        
        '''
        class that holds a list of data and returns cdf
        
        defined as p(X>=x)
        
        ''' 
        self.datalist = datalist
        self.n = len(datalist)
    
    def cdf_data(self):   
        data = self.datalist
        plotdata =[]
        
        for i in range(len(data)):
            
            n = float(self.n)
            length = len(data)
            plotdata.append((data[0],length/n))
            data.pop(0)
       
        return plotdata
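A minimal sketch of one way to handle repeated values, assuming the goal is one (x, P(X>=x)) pair per distinct value. The function name survival_points is hypothetical, not from the post. It counts each value with collections.Counter and walks the distinct values in sorted order, so the input list is never modified.

```python
from collections import Counter


def survival_points(datalist):
    """Return [(x, P(X >= x)), ...] for each distinct x, in ascending order.

    Hypothetical helper (sketch): reads the data without modifying it.
    """
    n = float(len(datalist))
    counts = Counter(datalist)      # how many times each value occurs
    remaining = len(datalist)       # number of points >= the current value
    points = []
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]      # step past every copy of x at once
    return points


print(survival_points([1, 1, 4, 6, 7]))
# [(1, 1.0), (4, 0.6), (6, 0.4), (7, 0.2)]
```

Because all copies of a value are removed from the running count in one step, the two 1s collapse into a single point instead of producing two different fractions.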

slate 241 Posting Whiz in Training

14 Years Ago

I am not sure I understand correctly.

The empirical distribution function is a function of two arguments: the dataset and a real number.
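For reference, the usual definitions for data x_1, ..., x_n are below; the second one is the P(X>=x) tail quantity the question computes. Repeated values simply contribute several indicator terms, so ties need no special-case handling.

```latex
\hat{F}_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le t\},
\qquad
\hat{S}_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \ge t\}
```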

What is your cdf_data meant to return?
BTW, you are losing all your data (in self.datalist) by calling cdf_data: data = self.datalist does not make a copy, so each data.pop(0) removes an item from self.datalist itself.
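A small illustration of the aliasing point, in plain Python (the variable names are just for illustration): assignment binds a second name to the same list, while list(...) makes a new shallow copy.

```python
# Assignment gives the same list object a second name; nothing is copied.
original = [1, 1, 4, 6, 7]
alias = original
alias.pop(0)
print(original)            # [1, 4, 6, 7]: the original list lost its first item

# list(...) builds a new (shallow) copy, so popping from it leaves the original alone.
original = [1, 1, 4, 6, 7]
data_copy = list(original)
data_copy.pop(0)
print(original)            # [1, 1, 4, 6, 7]
```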

My implementation would be:

def cdf_data(self, t):
    # fraction of data points <= t; each repeated value is counted
    return sum(1 for d in self.datalist if d <= t) / float(self.n)
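A sketch along the same lines that avoids rescanning the whole list for every query: sort a copy once, then count points with the standard-library bisect module. The method names cdf and survival are assumptions for illustration, and the sketch assumes a non-empty dataset.

```python
from bisect import bisect_left, bisect_right


class EmpiricalCDF:
    """Sketch: empirical CDF built from a sorted copy of the data (assumes non-empty data)."""

    def __init__(self, datalist):
        self.sorted_data = sorted(datalist)  # sorted copy; the caller's list is not modified
        self.n = len(self.sorted_data)

    def cdf(self, t):
        # P(X <= t): bisect_right gives the number of points <= t, ties included
        return bisect_right(self.sorted_data, t) / float(self.n)

    def survival(self, t):
        # P(X >= t): bisect_left gives the number of points < t; the rest are >= t
        return (self.n - bisect_left(self.sorted_data, t)) / float(self.n)


ecdf = EmpiricalCDF([1, 1, 4, 6, 7])
print(ecdf.cdf(1))       # 0.4
print(ecdf.survival(4))  # 0.6
```

After the one-time sort, each query is a binary search, which only pays off if the function is evaluated many times.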

SgtMe 46 Veteran Poster Featured Poster · Answer 1 · 2010-11-09T02:12:09+00:00

I was looking at compacting your code slightly, and I noticed something.

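self.n = len(datalist)
data = self.datalist
...
n = float(self.n)
length = len(data)

Therefore, on the first pass through the loop

n = length

And then: ...(length/n)
This would be 1 on that pass (after that, length drops by one with each data.pop(0)).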

You can also shorten cdf_data (keeping the length/n calculation for the moment) to:

def cdf_data(self):
    data = self.datalist
    plotdata = []

    for i in range(len(data)):
        # removed: "n = float(self.n)"
        length = len(data)
        plotdata.append((data[0], length / float(self.n)))  # divide by self.n directly
        data.pop(0)

    return plotdata
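Taking the compaction one step further, the loop can be written without pop at all, which also leaves the data list intact. On pass i the list still holds n - i items, so the i-th pair is (data[i], (n - i)/n). This sketch is a standalone function with the hypothetical name cdf_pairs; like the original it emits one pair per data point, so the fractions read as P(X>=x) only for sorted input, and repeated values still get separate pairs.

```python
def cdf_pairs(datalist):
    """Return the same (value, fraction) pairs as the pop-based loop, without mutating datalist."""
    n = len(datalist)
    # enumerate supplies the index i; len(data) on pass i of the original loop equals n - i
    return [(x, (n - i) / float(n)) for i, x in enumerate(datalist)]


print(cdf_pairs([1, 1, 4, 6, 7]))
# [(1, 1.0), (1, 0.8), (4, 0.6), (6, 0.4), (7, 0.2)]
```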