Hi all,
My first, but likely not my last post. I’m teaching myself C++ with basically no programming background. I’ve been using Borland, reading books (such as teach yourself in 21 days, etc.) and reading this website, but now I’ve run into a problem and after three days of trying to figure it out, I decided to seek your help. My code so far is below.
Here’s the deal, I’m trying to write a program that imports a file of DNA sequences (basically A T C and Gs) from one format called FASTA and export those sequences into a new format called PHASE. There is a lot of nonsense in the FASTA file that I need to skip over and the sequence length may change file to file; hence, I’m using a switch to get to the good parts of the FASTA file. That seemed to work out well. But now putting the data into the array is a problem. It does not seem to work like the more simple examples that I’ve seen and practiced.
To start: I want to bring in the first and second DNA sequences, after a bunch of junk characters, into a 2D array. My program nearly works in that it goes to the correct part of the FASTA file before entering characters into the array, but then for some reason it gets stuck in a loop. I’m only seeking help on getting these sequences into a 2D array. But if you have other tips for a beginner working a project like this, let me know.
Here the code and an abridged example of a FASTA file.
#include <iostream>
#include <string>
#include <fstream> //Provides input and output classes
using namespace std;
#pragma hdrstop
#pragma argsused
int main(int argc, char* argv[])
{
ifstream FASTAin("FASTA2.txt");
if (!FASTAin)
{
cout << "File not found.\n";
cout << "Press enter to exit.\n";
}
ofstream PHASEout("PHASE2.txt");
if (!PHASEout)
{
cout << "Unable to export the new PHASE.txt file.";
cout << "Press enter to exit.\n";
}
cout << "File found.\n";
cout << "Press enter continue.\n";
getchar();
const int M=2;
const int N=5000;
char seq1[M][N];
char IDname[80]="";
char ch='a';
unsigned short int row=0;
unsigned short int col=0;
int flipper=-9;
unsigned short int t=0;
while (FASTAin.get(ch))
{
if (ch==10)
{
flipper=0;
}
if (ch=='>') //10 = ascii code for <enter> character
{
flipper=1;
}
if (flipper==-9)
{
int ignore=0;
cout << "."; ignore++;
}
if (flipper==0)
{
for (row=0; row<=M; row++)
{
for (col=0; col<=N; col++)
{
seq1[row][col] = ch;
//Here is the problem!
}
}
}
if (flipper==1)
{
IDname[t]=ch; t++;
}
}
if (FASTAin.eof ()) //ignore this
{
for (int k=1;k<(M+1);k++) //for all int up to i
{
cout << seq1[k]; //print idname
PHASEout << seq1[k];
}
}
cout << seq1[row][col];
cout << "\n***End of FASTA file contents.***\n";
getchar();
FASTAin.close();
PHASEout.close();
getchar();
return 0;
}
------------THE FILE: FASTA.txt--Important stuff starts at BAGATT...
[oi]
>'1_{GreeNlANd}' [Jun 10, 2005]
BAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACGACCGCTATGTATTTCGTAC
ATTACTGCCAGTCACCATGAATATTGTACGGTACCATAAATACTTGACCACCTGTAGTAC
ATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCA
ACCTTCAACTATCACACATCAACTGCAACTCCAAAGCCACCCCTCGCCCACTAGGATACC
AACAAACCTATCCACCCTTAACAGTACATAGTACATAAAACCATTTACCGTACATAGCAC
ATTACAGTCAAATCCCTTCTCE
>'2_{GreeNlANd}'
bAGATTTGGGTACCACCCAAGTATTGACTCACCCATCAACAACCGCTATGTATCTCGTAC
ATTACTGCCAGTCACCATGAATATTGTACGGTACCATAAATACTTGACCACCTGTAGTAC
ATAAAAACCCAATCCACATCAAAACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCA
ACCTTCAACTATCACACATCAACTGCAACTCCAAAGCCACCCCTCGCCCACTAGGATACC
AACAAACCTATCCACCCTTAACAGTACATAGTACATAAAACCATTTACCGTACATAGCAC
ATTACAGTCAAATCCCTTCTCe