Hi :-|

I am working on a project based on DNA sequences in Perl. I want to make a program that reverse complements the DNA sequence and writes it into a file, e.g. revdna.fsa (which I have done). The problem is that I want to keep the first line unaltered, because it's just a description line and has to stay that way even though I change the lines after it.
Also, if I want to keep the first line AND add some more text to it, how do I do that?

This is a DNA sequence taken from the GenBank database (the first line is not part of the sequence and is the line I want to keep as it is):

>AB000410 1559 bp mRNA Homo sapiens
GAGAAGATAAGTCGCAAGGAGGGGGCGGGACCCACACCTCAGGAAAGCCGGAGAATTGGG
GCACGCAAGCGGGGGGGCTTTGATGACCCCCCAAAGGGCGAGGCATGCAGGAGGTGGAGG
AATTAAGTGAAACAGGGAAGGTTGTTAAACAGCACCGTGTGGGCGAGGCCTTAAGGGTCG
TGGTCCTCGTCTGGGCGGGGTCTTTGGGCGTCGACGAGGCCTGGTTCTGGGTAGGCGGGG
CTACTACGGGGCGGTGCCTGCTGTGGAAATGCCTGCCCGCGCGCTTCTGCCCAGGCGCAT
GGGGCATCGTACTCTAGCCTCCACTCCTGCCCTGTGGGCCTCCATCCCGTGCCCTCGCTC
TGAGCTGCGCCTGGACCTGGTTCTGCCTTCTGGACAATCTTTCCGGTGGAGGGAGCAAAG
TCCTGCACACTGGAGTGGTGTACTAGCGGATCAAGTATGGACACTGACTCAGACTGAGGA
GCAGCTCCACTGCACTGTGTACCGAGGAGACAAGAGCCAGGCTAGCAGGCCCACACCAGA
CGAGCTGGAGGCCGTGCGCAAGTACTTCCAGCTAGATGTTACCCTGGCTCAACTGTATCA
CCACTGGGGTTCCGTGGACTCCCACTTCCAAGAGGTGGCTCAGAAATTCCAAGGTGTGCG


I have done this so far:

open(IN, '<',"dna.dat") or die "Can't read file\n $!";
$dna = "";
while(defined($line=<IN>)){
chomp $line;
$dna .= $line;
}
close IN;

$cdna = "";
for($i=0; $i < length($dna); $i++){
$base =substr($dna, $i, 1);

if($base eq "A"){
$base= "T";
}
elsif($base eq "T"){
$base="A";
}
elsif($base eq "C"){
$base= "G";
}
elsif($base eq "G"){
$base= "C";
}
else {
die "Unknown base; $base\n";
}
$cdna .= $base;
}


$rdna = "";
for($i=-1; $i >= -length($cdna); $i--){

$base =substr($cdna, $i, 1);

$rdna .= $base;
}

print "The DNA string is now reversed complemented: $rdna\n";


__END__

Here the file "dna.dat" contains the DNA sequence without the first line.

I hope you can help me skip the first line ;)

Say your data is in a variable called $page. You can use the split function to split on the first line break only, and then apply your data processing code only to the text that appears after it. Take a look at this:

@stuff = split(/\n/, $page, 2);

You can find out how the split function works here:

http://perldoc.perl.org/functions/split.html

In the example above, the first line ends up in $stuff[0] and everything after the first line break is put into $stuff[1]. Remember, some characters (including \ ) have special meanings when used inside a regular expression (//): \n already stands for a line break, so it must not be escaped with a second \ , which would match a literal backslash followed by the letter n. It would be useful to give this page about regular expressions a good read if you haven't yet:

http://perldoc.perl.org/perlretut.html

I hope this helps.

Steven.
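
A minimal sketch of the split approach described above, assuming the whole record has already been read into one string. The helper name split_fasta_record and the shortened sample data are made up for illustration; they are not from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA record held in one string into
# its description line and its sequence (line breaks removed).
sub split_fasta_record {
    my ($page) = @_;
    # A limit of 2 means: split at the first line break only.
    my ($header, $body) = split /\n/, $page, 2;
    $body = '' unless defined $body;
    $body =~ s/\s+//g;    # join the sequence lines into one string
    return ($header, $body);
}

# Shortened sample record, for illustration only.
my $page = ">AB000410 1559 bp mRNA Homo sapiens\nGAGAAGATAA\nGCACGCAAGC\n";
my ($header, $sequence) = split_fasta_record($page);
print "$header\n$sequence\n";
```

Only $sequence would then go through the complement and reverse steps, while $header stays untouched.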

#!/usr/bin/perl
use strict;
use warnings;
open(IN, '<',"dna.dat") or die "Can't read file\n $!";
my $first_line = <IN>;
my $dna = '';
while(my $line=<IN>){
   chomp $line;
   $dna .= $line;
}
close IN;
my $cdna = '';
for my $i (0 .. length($dna)-1){
   $_ = substr($dna, $i, 1);
   if (!/[TAGC]/) {die "Unknown base: '$_'\n";}
   $cdna .= ($_ eq 'A') ? 'T' : 
            ($_ eq 'T') ? 'A' :
            ($_ eq 'C') ? 'G' :
            ($_ eq 'G') ? 'C' : '';
}
my $rdna = reverse $cdna;
print "The DNA string is now reversed complemented:\n\n $first_line\n$rdna\n";

this line:

$rdna = reverse $cdna;

should be:

my $rdna = reverse $cdna;
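
As an alternative to the character-by-character loop, the complement step can probably be done in one go with Perl's tr/// operator. This is only a sketch, assuming the sequence is upper case and contains nothing but A, C, G and T; the helper name reverse_complement is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse complement of an upper-case DNA string.
sub reverse_complement {
    my ($dna) = @_;
    die "Unknown base in '$dna'\n" if $dna =~ /[^ACGT]/;
    my $rc = reverse $dna;    # scalar assignment, so the string is reversed
    $rc =~ tr/ACGT/TGCA/;     # swap each base for its complement
    return $rc;
}

print reverse_complement("GAGAAGATAAGT"), "\n";
```

tr/// translates every character in a single pass, so no substr calls or if/elsif chains are needed.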

Hi guys,

Thanks a lot for your big help!
Kevin, I have tried the following, but when I write the data into a file I have trouble displaying the first line without it repeating for every new data line (because of the loop).
How do I display the first line only once, followed by the dataset?

I have tried this:

open(IN, '<',"dna.fsa") or die "Can't read file\n $!";
my $first_line = <IN>;
my $dna = '';
while(my $line=<IN>){
chomp $line;
$dna .= $line;
}
close IN;


$cdna = "";
for($i=0; $i < length($dna); $i++){
$base =substr($dna, $i, 1);

if($base eq "A"){
$base= "T";
}
elsif($base eq "T"){
$base="A";
}
elsif($base eq "C"){
$base= "G";
}
elsif($base eq "G"){
$base= "C";
}
else {
die "Unknown base; $base\n";
}
$cdna .= $base;
}

my $rdna = reverse $cdna;

substr($first_line, -1, 0)= "ComplementStrand";

open(OUT,'>' , "revdna.fsa") or die "Can't write file\n $!";

for($i=0; $i < length($rdna);$i+=60){
$base =substr($rdna, $i, 60);
print OUT "$base\n";
}
close OUT;

print "The DNA in FASTA format is now reversed complemented: \n", "$first_line $rdna\n";

As you can see, only the DNA sequence gets written to revdna.fsa. What should I do to display the first line, followed by the given string?

Thanks

Print it outside the loop.

open(OUT,'>' , "revdna.fsa") or die "Can't write file\n $!";
print OUT "$first_line\n";
for($i=0; $i < length($rdna);$i+=60){
$base =substr($rdna, $i, 60);
print OUT "$base\n";
}
close OUT;
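
Put together, writing the header once and then the wrapped sequence might look like the sketch below. The helper name format_fasta and the sample data are made up for illustration; only the file name revdna.fsa and the 60-character line width come from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the text of a FASTA record, header first,
# then the sequence wrapped at $width characters per line.
sub format_fasta {
    my ($header, $sequence, $width) = @_;
    $width ||= 60;
    chomp $header;                 # avoid a blank line after the header
    my $text = "$header\n";        # the header is added once, before the loop
    for (my $i = 0; $i < length($sequence); $i += $width) {
        $text .= substr($sequence, $i, $width) . "\n";
    }
    return $text;
}

# Sample header and sequence, for illustration only.
my $out_file = "revdna.fsa";
open(my $out, '>', $out_file) or die "Can't write $out_file: $!";
print {$out} format_fasta(">AB000410 ComplementStrand\n", "ACGT" x 40);
close $out or die "Can't close $out_file: $!";
```

Because the helper chomps its own copy of the header, it should not matter whether the line still carries its newline from the input file.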

Hmm, I had tried that, and it still wouldn't work: when I open revdna.dat in nedit, only the sequence is showing!
The first line apparently can't be seen in the text editor.

Hard to say why it's not working, because the code you posted never prints $first_line to a file, and you print to a file called "revdna.fsa" but in your last post you mention "revdna.dat".

This line in your last code post:

substr($first_line, -1, 0)= "ComplementStrand";

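is simpler written as follows, once the trailing newline has been removed with chomp (otherwise the text is appended after the newline):

chomp $first_line;
$first_line .= "ComplementStrand";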
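
A line read with <IN> still ends in its newline, which is why the two ways of adding text behave differently. The sketch below compares them on a made-up copy of the description line; the variable names $with_substr and $with_append are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The description line as it would come from <IN>, newline included.
my $first_line = ">AB000410 1559 bp mRNA Homo sapiens\n";

# Lvalue substr: insert the text just before the trailing newline.
my $with_substr = $first_line;
substr($with_substr, -1, 0) = " ComplementStrand";

# chomp first, then append; the newline is added back when printing.
my $with_append = $first_line;
chomp $with_append;
$with_append .= " ComplementStrand";

print $with_substr eq "$with_append\n" ? "same\n" : "different\n";
```

Either form keeps the original description intact and places the extra text at the end of it.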
