I need to take a text file containing a number of gene sequences in FASTA format, e.g.:

>geneA
agctactactacgatcgaacgtagctactactacgatcgaacgtagctactactacgatc
gaacgtagctactactacgatcgaacgtagctactactacgatcgaacgtagctactact
acgatcgaacgtagctactactacgatcgaacgtagctactactacgatcgaacgtagct
actactacgatcgaacgtagctactactacgatcgaacgtagctactactacgatcgaac
gtagctactactacgatcgaacgtactacgatcgaacgta

and put it into:

geneA agctactactacgatcgaacgtagctactactacgatcgaacgtagctac

where the whole sequence is on one line. I can concatenate a single sequence in Excel, but I have 200+ to fit into this two-column format so that I can use Python to open the text file and load it into an SQL file. I only need some ideas on how to get the sequences into the two-column format.

Cheers

To take a series of lines in a text file that are separated by newline characters and mash them into a single line of text:

```python
# Read from your input file using readlines(), then:
out_txt = '%s %s' % (my_txt[0], ''.join(my_txt[1:]))
```

Then you can write out_txt to your new file. Note that the code above assumes you read your file with readlines() into a variable named my_txt.
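For a single record, the same join can be wrapped in a small function. This is a minimal sketch that assumes the input holds exactly one FASTA record; the name fasta_record_to_line is hypothetical. Unlike the one-liner, it also strips the newline from every line and drops the leading '>' from the header, so the first column is just the gene name.

```python
def fasta_record_to_line(lines):
    """Join one FASTA record (header line + sequence lines) into 'name sequence'."""
    # Remove the trailing newline (and any stray whitespace) from every line.
    stripped = [line.strip() for line in lines]
    # Drop the leading '>' so the first column is just the gene name.
    name = stripped[0].lstrip('>')
    # Concatenate all remaining lines into one unbroken sequence.
    sequence = ''.join(stripped[1:])
    return '%s %s' % (name, sequence)
```

For example, passing it the list returned by readlines() on a one-record file gives a single line such as "geneA agctactact...", ready to write out.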


The code above doesn't strip the newlines, so you want to use .strip() as well. If the file contains multiple "geneX" records, append each line to a list until the next "gene" header is found. When it is found, send the list to a function that concatenates the lines, similar to the example above (if you .strip() each line before appending it to the list, you should be able to use that code as is), and writes the result to the SQL file. Then reset the list to empty and start appending again, beginning with the current "gene" record. After the loop that reads the records finishes, you need one more call that sends the final list to the function.
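A minimal sketch of that loop follows. It assumes every header line starts with '>' as in standard FASTA, so it tests for '>' rather than for the word "gene". All function names are hypothetical, and it writes two-column text to any file-like object rather than straight into a database; you can then load those lines with whatever Python SQL code you already have.

```python
import io


def write_record(name, seq_lines, out):
    """Write one record as 'name sequence' on a single line."""
    out.write('%s %s\n' % (name, ''.join(seq_lines)))


def fasta_to_two_columns(in_lines, out):
    """Convert FASTA lines to 'name sequence' lines, one per record."""
    name = None
    seq_lines = []
    for line in in_lines:
        line = line.strip()  # strip the newline before appending
        if not line:
            continue  # skip blank lines between records
        if line.startswith('>'):
            # A new header: flush the previous record first, if there is one.
            if name is not None:
                write_record(name, seq_lines, out)
            name = line[1:]
            seq_lines = []  # start a fresh list for the new record
        else:
            seq_lines.append(line)
    # The loop ends without seeing another header, so flush the final record.
    if name is not None:
        write_record(name, seq_lines, out)


def fasta_text_to_two_columns(text):
    """Convenience wrapper: convert a FASTA string and return the result."""
    out = io.StringIO()
    fasta_to_two_columns(text.splitlines(), out)
    return out.getvalue()


# Example usage (file names are hypothetical):
# with open('genes.txt') as fin, open('genes_two_col.txt', 'w') as fout:
#     fasta_to_two_columns(fin, fout)
```

Writing each record as soon as the next header appears means only one sequence is held in memory at a time. That hardly matters for 200 genes, but the same loop works unchanged on much larger files.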
