I am new to Python and am trying to implement an empirical CDF. So far, I have some code (below) that returns a list of tuples [(datapoint, P(X>=x)), ...]. The problem I am trying to solve is how to handle repeated data, e.g. [1,1,4,6,7..]. My implementation can't handle repeated numbers: each copy of a value gets its own tuple with a different probability. Any ideas to improve my implementation would be welcome, thanks.
class EmpiricalCDF:
    def __init__(self, datalist):
        '''
        class that holds a list of data and returns cdf
        defined as p(X>=x)
        '''
        self.datalist = datalist
        self.n = len(datalist)

    def cdf_data(self):
        data = self.datalist
        plotdata = []
        for i in range(len(data)):
            n = float(self.n)
            length = len(data)
            plotdata.append((data[0], length/n))
            data.pop(0)
        return plotdata
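
One possible way to handle the repeats, as a minimal sketch rather than a definitive fix: count how often each distinct value occurs, walk the distinct values in ascending order, and remove all copies of a value at once. This keeps the P(X>=x) convention used above. The function name `empirical_cdf_points` is made up for illustration; `collections.Counter` is from the standard library. Unlike `cdf_data`, the sketch sorts the values itself and does not modify the input list.

```python
from collections import Counter


def empirical_cdf_points(data):
    """Return [(x, P(X >= x)), ...] with one tuple per distinct value.

    Hypothetical helper, not part of the original class.
    """
    n = len(data)
    if n == 0:
        return []
    counts = Counter(data)   # distinct value -> number of occurrences
    remaining = n            # how many data points are >= the current value
    points = []
    for value in sorted(counts):
        points.append((value, remaining / float(n)))
        remaining -= counts[value]   # drop every copy of this value at once
    return points


print(empirical_cdf_points([1, 1, 4, 6, 7]))
# [(1, 1.0), (4, 0.6), (6, 0.4), (7, 0.2)]
```

With this approach the repeated 1 appears only once, with probability 1.0, and the next distinct value 4 gets 3/5 = 0.6 because both copies of 1 were removed together.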