Dear Internets,
I am trying to write a program that extracts the header lines from a FASTA file and writes a portion of each header to a new file. The part of the header I need is the "CONTIG_X" field. When I run the program, it prints the same "CONTIG_X" many times before moving on to the next one.
Can you please tell me what I am doing wrong?
#include <stdio.h>

int main(int argc, char *argv[]){
    char line[200];
    char part1[25], part2[25], part3[100], part4[25], part5[25];
    int c;
    FILE *file_in = fopen(argv[1], "r");
    FILE *file_out = fopen("header_chart1.txt", "w");

    while(fgets(line, 200, file_in) != NULL){
        if(line[0] == '>')
            sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
        fprintf(file_out, "%s\n", part2);
    }
    fclose(file_out);
    fclose(file_in);
}
The output looks like this:
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
...
The input file (argv[1]) looks like this, except with thousands of contigs instead of just two:
>ACDU01000003 | CONTIG_3 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [14502-15283] | 782 nt
CGTCGTGGCCATAATAGTCGTCCTTGTCATCATGGCCATAGCCATACGAGTCGTACGAAT
CATGGCCGTAGTAATCGTCCTTCTTGTCGTCATGACCATAGCCATACGAGTCGTACGAGT
CGTAGCTGTCGTAGTCATCCTTGTTGTCATAGCCATAGCCATACGAATCATACGAATCAT
>ACDU01000001 | CONTIG_1 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [1-833] | 833 nt
CTCCGACTCGCCAGAGTCAAATGGGCTTGCCGAGCGGACGCAGGGTGTGCTCAAGTCGAT
GGTGCGTGCGGCCATGACGGCCGCCAAGGCGCCGGATTCCCTCTGGCCAGAGTGTGTGCG
CGCGGCGTGCTATGTGCGCAACCGTGTGCCAAGTGACTCGCTCGATGGTCGCTCGCCATA
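A note on the likely cause, with a sketch of one possible fix. In the program above the if has no braces, so it governs only the sscanf call. The fprintf runs for every line read, including the sequence lines, and each time it reprints whatever part2 still holds from the last header. That would explain one copy of "CONTIG_X" per line of that contig. The sketch below assumes header lines start with '>' and that the ID is the second '|'-separated field, as in the sample. The function names extract_contig_id and write_contig_ids are hypothetical helpers, not part of the original program.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not in the original post): copies the second
 * '|'-separated field of a FASTA header line into out.
 * Returns 1 on success, 0 if the line is not a header or the field
 * is missing or too long for out. */
int extract_contig_id(const char *line, char *out, size_t out_size)
{
    char id[100];

    if (line[0] != '>')
        return 0;                   /* sequence line, not a header */

    /* %*s skips the first field without storing it; %99s reads one
     * whitespace-delimited token after the first '|' and cannot
     * overflow id[]. */
    if (sscanf(line, "%*s | %99s", id) != 1)
        return 0;

    if (strlen(id) >= out_size)
        return 0;                   /* would not fit in the caller's buffer */
    strcpy(out, id);
    return 1;
}

/* Writes one ID per header line and returns how many were written.
 * The fprintf is inside the braces of the header check, so sequence
 * lines produce no output. */
int write_contig_ids(FILE *in, FILE *out)
{
    char line[200];
    char id[25];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (extract_contig_id(line, id, sizeof id)) {
            fprintf(out, "%s\n", id);
            count++;
        }
    }
    return count;
}
```

To use it, main would open argv[1] and "header_chart1.txt" as before, check that both fopen calls returned non-NULL, and pass the two FILE pointers to write_contig_ids. The minimal change to the original program is simply to put braces around the sscanf and fprintf lines so that both belong to the if.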