A Smart Package for data comparison

Question

hughesadam_87 54 Junior Poster

15 Years Ago

Hey guys,

This question is more about the organization of Python modules than about what goes inside them.

In my project, we have several scientists performing similar analyses on several different data sets. For each project, the science changes, but the analysis often requires almost identical data handling. For example, one project may require us to take in a large, tab-delimited data file, take all of the information from one column and store it, then compare this information with information from another column in a different file. The protocol is usually to build dictionaries with lists as values, which seems to be the best way to ensure no information is lost during comparisons. Sometimes I just compare one list with another. Sometimes I compare a list with the keys in a dictionary. Sometimes I compare one dictionary's keys to another's.

My question is: Do you know of any packages that could help streamline this process? We now have several nearly identical modules floating around which do almost the same thing, and it is getting tedious to keep them all organized. Are there any packages available for this type of comparison, namely column comparison between data files? If I could start using the same package for all of this analysis, it would really clear up some of the clutter in the research group.

Advice is greatly appreciated.

data-science python

All 2 Replies

zachabesh 5 Junior Poster

15 Years Ago

Hey,

I had a similar issue. I believe the easiest answer is to create a shared pylib folder and put your module in it. It's not wise to have multiple copies of a module, because when you change one, you end up having to change them all. So, have all your scientists point their PYTHONPATH environment variable to the shared location, and then they can all import the same module. For example:

If your scientists are:

Bob -who compares number of apples with number of oranges
Paul -who compares population of two countries

your module "mymodule.py" has one function:

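#mymodule.py

def compare_stuff(num1, num2):
    """Return a short message saying which value is larger."""
    if num1 > num2:
        return '%s is greater!' % num1
    elif num2 > num1:
        return '%s is greater!' % num2
    else:
        # Equal values: neither one is greater.
        return '%s and %s are equal!' % (num1, num2)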

Bob would have a script on his computer that says:

import mymodule

apples = 20
oranges = 10

print(mymodule.compare_stuff(apples, oranges))

and Paul would have a script that says:

import mymodule

France = 10000000
America = 250000000

print(mymodule.compare_stuff(France, America))

I realize this is simplistic, but I hope it makes sense. Each scientist has access to the same functionality, but they can use it however they'd like.
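For something closer to the column comparisons described in the question, here is a rough sketch of what the shared module could hold. It assumes the files are tab-delimited and that the key and value columns are given by position; the names column_index and compare_keys and the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def column_index(lines, key_col, value_col, delimiter="\t"):
    """Map each entry in key_col to the list of value_col entries seen with it."""
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) <= max(key_col, value_col):
            continue  # skip blank or short rows
        index[row[key_col]].append(row[value_col])
    return dict(index)


def compare_keys(left, right):
    """Return (shared, only_left, only_right) as sorted lists.

    Works for lists or dictionaries, since set() of a dict gives its keys.
    """
    left_keys, right_keys = set(left), set(right)
    return (
        sorted(left_keys & right_keys),
        sorted(left_keys - right_keys),
        sorted(right_keys - left_keys),
    )


# Hypothetical sample data standing in for two tab-delimited files.
file_a = io.StringIO("geneA\t1.2\ngeneB\t0.4\ngeneA\t3.1\n")
file_b = io.StringIO("geneB\t7\ngeneC\t9\n")

index_a = column_index(file_a, key_col=0, value_col=1)
index_b = column_index(file_b, key_col=0, value_col=1)
shared, only_a, only_b = compare_keys(index_a, index_b)
print(shared, only_a, only_b)
```

With real files you would pass an open file object instead of the StringIO stand-ins. Keeping lists as the dictionary values means repeated keys (geneA above) keep every value rather than overwriting the earlier one.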

Also, if your scientists are bad with computers and can't set environment variables (it happens), just have them add these two lines at the start of the script (before trying to import mymodule):

import sys
sys.path.append('your shared pylib here')
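As a rough sketch of that approach, the snippet below uses a temporary folder as a stand-in for the shared pylib (the real path is whatever your group sets up) and a throwaway module named shared_demo_module, both made up for illustration. It appends the folder only if it is not already on sys.path, then imports from it.

```python
import importlib
import os
import sys
import tempfile

# Stand-in for the shared folder; in practice this would be the real
# shared location, which is not specified here.
shared_pylib = tempfile.mkdtemp()

# Write a tiny module there so the example is self-contained.
module_path = os.path.join(shared_pylib, "shared_demo_module.py")
with open(module_path, "w") as handle:
    handle.write("def compare_stuff(num1, num2):\n    return max(num1, num2)\n")

# Add the folder once; running this again in the same session
# will not add a duplicate entry.
if shared_pylib not in sys.path:
    sys.path.append(shared_pylib)

# Make sure the import system notices the newly written file.
importlib.invalidate_caches()
shared_demo_module = importlib.import_module("shared_demo_module")
print(shared_demo_module.compare_stuff(20, 10))
```

Because the folder is appended to the end of sys.path, a module with the same name found earlier on the path would be imported instead, so it helps to give the shared module a distinctive name.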

Also, make sure they have read access to the shared directory (and write access if they will be editing the module there). Note that network paths look like this:

//computername/folder

where folder is the name they have designated for their network path. Somewhat unfortunate. Make sure it matches up on their computers.

hughesadam_87 54 Junior Poster · Answer 1 · 2009-07-21T03:44:46+00:00

Yeah that is probably the best solution to the problem. I had been thinking about that as well. Thanks for the input.
