Hello there,
I have a CSV file that looks like this:

```
Chromosom_id fstart fstop Count
1 105 1 14.5
1 105 1 14.5
1 105 1 14.5
1 813 797 4
1 813 797 22
1 813 797 4
```
Here fstart is the position where a match to the genome starts and fstop is where it stops (in the first row, the match starts at 105 and ends at 1). Count is the number of similar matches, all of equal length, found within that region (1-105). Only regions whose count is greater than some arbitrary value (say 7) should be taken into account. My code is below:
```perl
open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
my @hit_clusters = <$fh>;
close $fh;

my ($id, $fstart, $fstop, $count);
my ($cluster_start, $cluster_stop, $cluster_dist);
my $row_number = 0;

foreach my $file_line (@hit_clusters) {
    next if $file_line =~ m/^\s*$/;          # skip blank lines
    next if $file_line =~ m/^Chromosom_id/;  # skip the header line
    if ($file_line =~ m/^(.+?)\t(\d+?)\t(\d+?)\t(\d+?)\b/) {
        ($id, $fstart, $fstop, $count) = ($1, $2, $3, $4);
        if ($count >= $mini_num_hits) {  # keep only rows whose count reaches the threshold
            if (!$row_number) {
                # cluster_start gets the smaller of fstart/fstop
                if ($fstart > $fstop) {
                    $cluster_start = $fstop;
                } else {
                    $cluster_start = $fstart;
                }
                # cluster_stop gets the larger of fstart/fstop
                if ($fstop < $fstart) {
                    $cluster_stop = $fstart;
                } else {
                    $cluster_stop = $fstop;
                }
                ++$row_number;
            }
            # ... (rest of the loop, including the print, not shown)
        }
    }
}
```
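The orientation step above (putting the smaller of fstart/fstop into the cluster start and the larger into the cluster stop) can be written more compactly. This is a minimal sketch, assuming the columns are tab-separated as the regex implies; `parse_hit` is a hypothetical helper, `min`/`max` come from the core List::Util module, and the Count pattern also accepts decimals such as 14.5, which the original `(\d+?)\b` cuts off at 14:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: parse one tab-separated line into
# (id, region_start, region_stop, count) with start <= stop,
# whichever direction the match runs in.
# Returns an empty list for blank, header, or malformed lines.
sub parse_hit {
    my ($line) = @_;
    return if $line =~ /^\s*$/;          # blank line
    return if $line =~ /^Chromosom_id/;  # header line
    my ($id, $fstart, $fstop, $count) =
        $line =~ /^([^\t]+)\t(\d+)\t(\d+)\t(\d+(?:\.\d+)?)/ or return;
    return ($id, min($fstart, $fstop), max($fstart, $fstop), $count);
}

my $mini_num_hits = 7;    # example threshold from the question
for my $line ("1\t105\t1\t14.5\n", "1\t813\t797\t4\n") {
    my ($id, $start, $stop, $count) = parse_hit($line) or next;
    next if $count < $mini_num_hits;       # drop low-count regions
    print "$id\t$start\t$stop\t$count\n";  # prints 1, 1, 105, 14.5 (tab-separated)
}
```

The second example row is skipped because its count (4) is below the threshold.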
The problem is that `$row_number` does not seem to increment, and the script prints the same values every time:
```
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
1 105 1
```
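A likely cause, offered as a guess because the print statement is not part of the snippet: `$row_number` does increment, but only once. After the first qualifying row it is 1, so `!$row_number` is false for every later row and `$cluster_start`/`$cluster_stop` keep the first row's values. The stripped-down sketch below uses made-up (fstart, fstop) pairs and keeps only that gate, to show the effect:

```perl
use strict;
use warnings;

# Simplified stand-in for the loop in the question: only the
# $row_number gate is kept, to show how often its body runs.
my @rows = ([105, 1], [105, 1], [813, 797]);   # made-up (fstart, fstop) pairs
my $row_number = 0;
my ($cluster_start, @seen);
for my $row (@rows) {
    my ($fstart, $fstop) = @$row;
    if (!$row_number) {            # true only while $row_number == 0
        $cluster_start = $fstart < $fstop ? $fstart : $fstop;
        ++$row_number;             # becomes 1 and never returns to 0
    }
    push @seen, $cluster_start;    # what a later print would see
}
print "@seen\n";                   # prints "1 1 1": row 3 never updates it
print "row_number = $row_number\n";   # prints "row_number = 1"
```

If this is the cause, the gate needs to become a per-row decision: on every qualifying row, either extend the current cluster or close it and start a new one.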
What I need to do is this: take the first fstart in the file as `$cluster_start`. While reading through the file, if I find another fstop that is less than 250 away from that fstart, I add the two counts together and extend the region from the first fstart to the current fstop. Then I reset `$cluster_start` to the new fstart and continue.
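The rule above can be read in more than one way, so the following is only a sketch of one reading. It assumes hits from a single chromosome that are already filtered by the count threshold, normalized so that start <= stop, and sorted by start; a hit joins the current cluster when its stop lies less than 250 beyond the start of the previous hit in the cluster. `cluster_hits` and the example hits are hypothetical:

```perl
use strict;
use warnings;
use List::Util qw(max);

# Assumed rule (one reading of the question): each hit is [start, stop, count]
# with start <= stop, hits are sorted by start, and a hit joins the current
# cluster when its stop lies less than $max_dist beyond the start of the
# previous hit in that cluster (the "anchor").
sub cluster_hits {
    my ($max_dist, @hits) = @_;
    my @clusters;
    my $cur;    # current cluster: {start, stop, count, anchor}
    for my $hit (@hits) {
        my ($start, $stop, $count) = @$hit;
        if ($cur && $stop - $cur->{anchor} < $max_dist) {
            $cur->{count} += $count;                  # add the counts together
            $cur->{stop} = max($cur->{stop}, $stop);  # extend the region
            $cur->{anchor} = $start;                  # reset to the new start
        } else {
            push @clusters, $cur if $cur;             # close the previous cluster
            $cur = { start => $start, stop => $stop,
                     count => $count, anchor => $start };
        }
    }
    push @clusters, $cur if $cur;
    return map { [ @{$_}{qw(start stop count)} ] } @clusters;
}

my @hits = ([1, 105, 14.5], [120, 240, 9], [797, 813, 22]);   # made-up input
for my $c (cluster_hits(250, @hits)) {
    print join("\t", @$c), "\n";
}
```

With this made-up input, the first two hits merge into one region (1-240, count 23.5) and the third starts a new cluster. To handle several chromosomes, one could group the hits by Chromosom_id first and call `cluster_hits` once per group.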
Thanks in advance,