Dear Internets,

I am trying to write a program that extracts the "header" from a FASTA file and then outputs a portion of the header to a new file. The part of the header I need is the "CONTIG_X" part. When I run this program, it prints the same "CONTIG_X" many times over before moving on to the next one.

Can you please tell me what I am doing wrong?

#include <stdio.h>
int main(int argc, char *argv[]){
    char line[200];
    char part1[25], part2[25], part3[100], part4[25], part5[25];
    int c;
    FILE *file_in = fopen(argv[1], "r");
    FILE *file_out = fopen("header_chart1.txt", "w");

    while(fgets(line, 200, file_in) != NULL){

        if(line[0] == '>')  
            sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
            fprintf(file_out, "%s\n", part2);
    }
    fclose(file_out);
    fclose(file_in);
}

The output looks like this:
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
...

The file (argv[1]) the program opens looks like this, except with thousands of contigs instead of just 2:

ACDU01000003 | CONTIG_3 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [14502-15283] | 782 nt
CGTCGTGGCCATAATAGTCGTCCTTGTCATCATGGCCATAGCCATACGAGTCGTACGAAT
CATGGCCGTAGTAATCGTCCTTCTTGTCGTCATGACCATAGCCATACGAGTCGTACGAGT
CGTAGCTGTCGTAGTCATCCTTGTTGTCATAGCCATAGCCATACGAATCATACGAATCAT
ACDU01000001 | CONTIG_1 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [1-833] | 833 nt
CTCCGACTCGCCAGAGTCAAATGGGCTTGCCGAGCGGACGCAGGGTGTGCTCAAGTCGAT
GGTGCGTGCGGCCATGACGGCCGCCAAGGCGCCGGATTCCCTCTGGCCAGAGTGTGTGCG
CGCGGCGTGCTATGTGCGCAACCGTGTGCCAAGTGACTCGCTCGATGGTCGCTCGCCATA

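Try enclosing the entire if body in braces. Without braces, an if statement governs only the single statement that follows it, regardless of indentation.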

while(fgets(line, 200, file_in) != NULL){
   if(line[0] == '>') {
      sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
      fprintf(file_out, "%s\n", part2);
   }
}

What you currently have behaves, instead, like this:

while(fgets(line, 200, file_in) != NULL){
   if(line[0] == '>') {
      sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
   }
   fprintf(file_out, "%s\n", part2);
}

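Notice how, in the second example, you always print the contents of part2 no matter what the if statement evaluates to. On sequence lines sscanf is skipped, so part2 still holds the name from the most recent header, and that same name gets written once for every sequence line that follows it.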
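If it helps, here is a rough sketch of how the loop could be made a bit more defensive once the braces are in place. The helper names extract_contig and write_contigs are made up for illustration (they are not from your program), and the sketch assumes headers start with '>' and use " | " between fields, as your sscanf format implies. It checks sscanf's return value and limits the field widths so a long header cannot overflow the 25-byte buffers.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original program): copies the second
   '|'-separated field of a FASTA header line into out.
   Returns 1 on success, 0 if the line is not a header or has no second field. */
int extract_contig(const char *line, char *out, size_t out_size)
{
    char field1[25], field2[25];

    if (line[0] != '>')
        return 0;   /* sequence line (or empty string), not a header */

    /* %24s reads at most 24 characters, leaving room for the terminator. */
    if (sscanf(line + 1, "%24s | %24s", field1, field2) != 2)
        return 0;   /* header did not contain two fields */

    snprintf(out, out_size, "%s", field2);
    return 1;
}

/* Hypothetical helper: reads `in` line by line and writes one contig name
   per header line to `out`. Returns the number of names written. */
int write_contigs(FILE *in, FILE *out)
{
    char line[200];
    char name[25];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* Braces keep the fprintf tied to the header check. */
        if (extract_contig(line, name, sizeof name)) {
            fprintf(out, "%s\n", name);
            count++;
        }
    }
    return count;
}
```

In main you would still open the two files with fopen, check that neither call returned NULL, pass them to write_contigs, and fclose both afterwards.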

OH WOW.

That was so trivial yet so helpful!

Thanks from the bottom of my random access memory.
