For my new job I need to learn a bit of Python to parse and extract data from .txt files.
Essentially, I have a table that looks like this:
Pair NO.   Sense       Antisense   Coding/Noncoding   Cis/Trans   Overlap
ATH00001   At1g02170   At1g02180   coding-coding      cis         3
As you can see, the top row holds the column headers and the row below it is actual data. I have 30,000 of these data lines, and I need to extract the following:
If the first two letters of the first column are MM or HS, I need to extract the first and third columns of that line.
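A minimal sketch of that condition, assuming the first column is the first whitespace-separated token on each line. `str.startswith` accepts a tuple of prefixes, so both codes can be tested in one call. The helper name `wanted` and the IDs passed to it below are made up for illustration.

```python
def wanted(pair_id):
    # Hypothetical helper: True if pair_id begins with "MM" or "HS".
    # startswith() accepts a tuple and returns True if any prefix matches.
    return pair_id.startswith(("MM", "HS"))


print(wanted("MM00001"))   # True
print(wanted("HS00002"))   # True
print(wanted("ATH00001"))  # False
```

The match is case-sensitive, so an ID written as `mm00001` would not be picked up.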
Technically I could do this in Excel, but soon I'll be moving on to databases, where Python will be the only option.
I was just wondering what the best approach would be. I was thinking of splitting each line with aString.split(), then taking only the first and third elements and writing them to a new file.
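A quick sketch of that split on the sample row from the table, assuming no field inside a data row contains a space (the header's `Pair NO.` does, so the header row is better skipped than split). With no arguments, `str.split()` splits on any run of whitespace and discards the trailing newline.

```python
line = "ATH00001 At1g02170 At1g02180 coding-coding cis 3\n"

fields = line.split()                 # split on runs of spaces/tabs; drops the "\n"
first, third = fields[0], fields[2]   # Python counts from 0, so column 3 is fields[2]
print(first, third)                   # ATH00001 At1g02180
```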
However, I'm utterly clueless as to how to start.
I guess I would open the file:
inp = open("SAdatabsse.txt","r")
Create a file to write to:
outp = open("SAdatabse2.txt","w")
Then I'd read the lines with the readlines method and check the first two letters with find('MM',0,2) or find('HS',0,2). After that, I guess I'd use some sort of boolean expression in a loop so that, whenever it comes up true, the first and third columns of that line are written to the new file.
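Putting those steps together, here is a minimal Python 3 sketch. A few choices differ from the plan above and are worth flagging. The file is iterated line by line rather than loaded with `readlines()`, which avoids holding the whole file in memory (30,000 lines would fit either way). The prefix test uses `startswith`, because `find('MM', 0, 2)` returns `0` on a match and `-1` otherwise, so using its result directly as a boolean gets the logic backwards. The header row is skipped with `next()`. The function name `extract_pairs` and the tab-separated output format are assumptions.

```python
def extract_pairs(in_path, out_path, prefixes=("MM", "HS")):
    """Write columns 1 and 3 of every row whose first column starts with one of prefixes.

    Returns the number of rows written. Hypothetical helper, not a library function.
    """
    count = 0
    # 'with' closes both files automatically, even if an error occurs midway
    with open(in_path, "r") as inp, open(out_path, "w") as outp:
        next(inp, None)              # skip the header row ("Pair NO. Sense ...")
        for line in inp:             # read one line at a time instead of readlines()
            fields = line.split()
            if len(fields) < 3:      # skip blank or truncated lines
                continue
            if fields[0].startswith(prefixes):
                # Tab-separated output is an assumption; change the separator as needed
                outp.write(fields[0] + "\t" + fields[2] + "\n")
                count += 1
    return count
```

With the file names from above, the call would be `extract_pairs("SAdatabsse.txt", "SAdatabse2.txt")`, which returns how many rows were written.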
Technically, I think I could do it, but I just have no idea how to structure the code, seeing as this is my first time working with Python.
So please, if you can, I would appreciate any and all help.
Thanks for your time. - Siberian