parsing newlines together

Question

laver68xo 0 Newbie Poster

12 Years Ago

I'm trying to take a file that looks like this:

taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC

And get it to look like this:
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC

I'm using a python script, so far this is what I have:

#!/usr/bin/python

import sys

if len(sys.argv) < 2:
    print "usage: finalmyscript.py infile.txt"
    sys.exit(1)

fname = sys.argv[1]

handle = open(fname, "r")
list = handle.readlines()

for line in list:
    parts = line.rstrip('\n')
    linearr = parts.split()
    combine = ''.join(linearr[0])
    print combine

handle.close()

The script removes the '\n' at the end of each line, but it still doesn't join the lines into a single line. Can anyone help me see where I'm going wrong?
Thanks!
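Why the loop prints one line per input line: each pass through `for line in list` runs `print combine` once, and `print` writes a newline after each value. `''.join(linearr[0])` does not merge lines either; it only re-joins the characters of a single string. The lines have to be collected first and printed once per taxon. Below is a minimal Python 3 sketch of that idea. It assumes, as in the sample, that every header line starts with "taxon" and that every other non-blank line is sequence data for the most recent header. `join_records` and `main` are hypothetical names, not part of the original script.

```python
def join_records(lines):
    """Yield one 'header sequence' string per taxon.

    Assumption: header lines start with "taxon"; every other non-blank
    line belongs to the most recent header.
    """
    header = None
    seq_parts = []
    for line in lines:
        line = line.strip()          # drop '\n' and surrounding spaces
        if not line:
            continue                 # skip blank lines
        if line.startswith("taxon"):
            if header is not None:   # finish the previous taxon first
                yield header + " " + "".join(seq_parts)
            header, seq_parts = line, []
        else:
            seq_parts.append(line)   # collect sequence lines for this taxon
    if header is not None:           # emit the last taxon in the file
        yield header + " " + "".join(seq_parts)


def main(argv):
    """Hypothetical command-line entry point; argv[1] is the input file."""
    if len(argv) < 2:
        print("usage: finalmyscript.py infile.txt")
        return 1
    with open(argv[1]) as handle:
        for record in join_records(handle):
            print(record)
    return 0
```

To run it as a script, add `import sys` and `if __name__ == "__main__": sys.exit(main(sys.argv))` at the bottom. The file name still comes from the command line, so nothing from the data file is hard-coded.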

python


vegaseat 1,735 DaniWeb's Hypocrite

12 Years Ago

Hint ...

s = '''\
taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
'''

s2 = ""
for ix, line in enumerate(s.split('\n')):
    line = line.rstrip()
    if ix == 0:
        # add a space
        line += ' '
    s2 += line

print(s2)

''' result ...
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC
'''
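The same result can be had without building the string piece by piece: `str.splitlines()` drops the line endings, the first line is the header, and `''.join` glues the rest together. Here is a Python 3 sketch that uses the same sample string as the hint above and assumes the header is always the first line:

```python
s = '''\
taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
'''

lines = s.splitlines()                        # no '\n' left, no trailing empty item
s2 = lines[0] + ' ' + ''.join(lines[1:])      # header, one space, joined sequence
print(s2)
```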


vegaseat 1,735 DaniWeb's Hypocrite

12 Years Ago

You can do something like this ...

''' infile_test.py
data processing from a file

file infile.txt has content ...
taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC 
'''

fname = "infile.txt"

with open(fname) as fin:
    s2 = ""
    for line in fin:
        line = line.rstrip()
        if "taxon" in line:
            # add a space
            line += ' '
            # start a new line before every taxon except the first
            if s2:
                s2 += '\n'
        s2 += line
    print(s2)

''' result ...
taxon1 ACCGTGGATCCCTATTGATTGGATATTATC
taxon2 TTCATATGTAGGATTTCATAGATGGCCCCC
'''
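If you also want to look a sequence up by its taxon name afterwards, the lines can be collected into a dictionary instead of one growing string. This Python 3 sketch assumes header lines start with "taxon", headers are unique (a repeated header would replace the earlier entry), and Python 3.7 or later so the dict keeps the file order. `parse_taxa` is a hypothetical name.

```python
def parse_taxa(lines):
    """Map each taxon header to its joined sequence (headers assumed unique)."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith("taxon"):
            current = line
            records[current] = []         # start collecting for this taxon
        elif current is not None:
            records[current].append(line)
    # join each taxon's lines once at the end
    return {name: "".join(parts) for name, parts in records.items()}

# Example use with the file from the post:
# with open("infile.txt") as fin:
#     for name, seq in parse_taxa(fin).items():
#         print(name, seq)
```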


chris.stout 14 Junior Poster in Training · Answer 1 · 2013-04-18T16:55:01+00:00

If you get rid of the for loop at the end of your script (lines 14 - 19), you can print the input file as one line with:

print ''.join(list).replace("\n", " ")
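If the taxa need to stay on separate lines, the replace-based idea can be narrowed so that it only removes the newlines inside each record. This Python 3 sketch uses the standard `re` module and assumes header lines start with "taxon" and that the text was read in text mode, so line endings are plain '\n'. `join_by_header` is a hypothetical name.

```python
import re

def join_by_header(text):
    # 1) replace the newline after each header line with a space
    text = re.sub(r"^(taxon\S*)\n", r"\1 ", text, flags=re.MULTILINE)
    # 2) drop every remaining newline that is NOT followed by a header
    return re.sub(r"\n(?!taxon)", "", text)
```

With the sample file this gives one line per taxon, separated by a single '\n', and no trailing newline.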

laver68xo 0 Newbie Poster · Answer 2 · 2013-04-18T17:21:10+00:00

Vegaseat:
That would work if I weren't reading the input from a file. The input file can only be passed in as an argument when I run my script, so I can't hard-code any of its contents, such as the actual DNA sequences.

Chris, I used a line like that before. It does bring the lines together, but it puts all the taxa on one line, and I need them split by taxon.

Thank you guys!

woooee 814 Nearly a Posting Maven · Answer 3 · 2013-04-18T17:52:56+00:00

Does the file have more than one taxon? In the example you posted there is only one, so all of the solutions handle only one. Try this as a hint, although there are other ways to do it.

handle = open(fname, "r")
all_data = handle.read()
print all_data.split("taxon")
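`split("taxon")` returns a list in which the first item is an empty string (the text before the first header), and the word "taxon" itself is removed from every piece. This Python 3 sketch turns those pieces into one line per taxon. It assumes "taxon" never occurs inside the sequence data, which holds for A/C/G/T strings. `split_taxa` is a hypothetical name.

```python
def split_taxa(all_data):
    records = []
    for chunk in all_data.split("taxon"):
        if not chunk.strip():
            continue                    # the empty piece before the first header
        words = chunk.split()           # e.g. ['1', 'ACCGTGGATC', ...]
        # split("taxon") removed the word itself, so put it back on the header
        records.append("taxon" + words[0] + " " + "".join(words[1:]))
    return records
```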

laver68xo 0 Newbie Poster · Answer 4 · 2013-04-18T17:56:40+00:00

There are 3 taxa in total, but I only posted 2 of them.

taxon1
ACCGTGGATC
CCTATTGATT
GGATATTATC
taxon2
TTCATATGTA
GGATTTCATA
GATGGCCCCC

laver68xo 0 Newbie Poster · Answer 5 · 2013-04-18T18:11:48+00:00

That last comment helped a lot! Thank you! The all_data.split got all 3 taxa on one line for me.

HiHe 174 Junior Poster · Answer 6 · 2013-04-20T22:35:11+00:00

At this point it would be nice to know what your input data is and what you expect your output data to look like.
