Dear All,
I have two files, and I need to map file_1 to file_2 by matching the 1st column of file_1 against the 9th column of file_2 (index 8, counting from 0), then print the matching rows with their annotations. I can do most of this with the Perl script below.
But the problem is that my Perl script does not seem to differentiate the dot (.) in the IDs that I use to map the columns.
My file_1 and file_2 look like the following:
file_1:
CUFF.2 chr1:14362-29806 24.2763 22.1124 26.4401 OK
CUFF.23 chr1:89294-173862 4.95251 3.44948 6.45555 OK
and
file_2:
chr1 Cufflinks transcript 11869 14409 1 + . CUFF.2 transcript_id ENST00000456328.2 FPKM 0.0000000000 frac 0.000000 conf_lo 0.000000 conf_hi
0.059559 cov 0.000000 full_read_support no
chr1 Cufflinks exon 11869 12227 1 + . CUFF.23 transcript_id ENST00000456328.2 exon_number 1 FPKM 0.0000000000 frac 0.000000 conf_lo 0.000000
conf_hi 0.059559 cov 0.000000
The Perl code I used:
`
use strict;
use warnings;

# Open the first file and save its first two columns in a hash (ID => value)
my %gen_id;
open my $fh, '<', $ARGV[0] or die "can't open file: $!";
while (<$fh>) {
    my ( $id, $value ) = split;
    $gen_id{$id} = $value;
}
close $fh or die "can't close file: $!";

# Open the second file, then step through it and find
# matching values to display
my %get_data;
open $fh, '<', $ARGV[1] or die "can't open file: $!";
while (<$fh>) {
    # Compare the value at index 8 (counting from 0) with the IDs from
    # file_1; on a match, keep indexes 1 to 9, keyed by index 0
    foreach my $file ( [split] ) {
        if ( exists $gen_id{ $file->[8] } ) {
            $get_data{ $file->[0] } = join "\t" => @$file[ 1, 2, 3, 4, 5, 6, 7, 8, 9 ];
        }
    }
}
print $_, "\t", $get_data{$_}, "\t", $/ for sort keys %get_data;
close $fh or die "can't close file: $!";
#print $_,"\t",$get_data{$_}, $/ for sort keys %get_data;
`
This code gives me output that looks like this:
chr1 Cufflinks exon 11869 12227 1 + . CUFF.23 transcript_id ENST00000456328.2 exon_number 1 FPKM 0.0000000000 frac 0.000000
whereas I expect to get both of these lines:
chr1 Cufflinks exon 11869 12227 1 + . CUFF.23 transcript_id ENST00000456328.2 exon_number 1 FPKM 0.0000000000 frac 0.000000
chr1 Cufflinks transcript 11869 14409 1 + . CUFF.2 transcript_id ENST00000456328.2 FPKM 0.0000000000 frac 0.000000 conf_lo 0.000000
That is, the Perl script seems to treat CUFF.2 and CUFF.23 as the same ID and removes one of the lines as a duplicate. In reality they are not duplicates; they are two different names. It would be really great if someone could help me alter the code a little so that I get the output I want.
Thank you
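
One possible reading of the script above: a Perl hash lookup compares whole strings, so `CUFF.2` and `CUFF.23` are already distinct keys in `%gen_id`. What does collide is the key of `%get_data`, which is column 0 (`chr1` for both sample rows), so the second match overwrites the first. Below is a minimal sketch of one way around that, collecting every matching row in an array instead of a hash keyed by chromosome. The helper name `match_rows` is hypothetical, and the inline sample rows are shortened versions of the rows in the post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the file_2 rows whose ID column (index 8)
# appears in the first column of file_1. Rows are collected in an array,
# so two rows on the same chromosome do not overwrite each other.
sub match_rows {
    my ( $file1_lines, $file2_lines ) = @_;

    # Remember every ID seen in column 0 of file_1
    my %gen_id;
    for my $line (@$file1_lines) {
        my ($id) = split ' ', $line;
        $gen_id{$id} = 1 if defined $id;
    }

    # Keep each file_2 row whose index-8 field is a known ID
    my @matched;
    for my $line (@$file2_lines) {
        my @fields = split ' ', $line;
        next unless @fields > 8 && exists $gen_id{ $fields[8] };
        push @matched, join "\t", @fields;
    }
    return @matched;
}

# Sample rows shortened from the post (assumption: one record per line)
my @file1 = (
    "CUFF.2 chr1:14362-29806 24.2763 22.1124 26.4401 OK",
    "CUFF.23 chr1:89294-173862 4.95251 3.44948 6.45555 OK",
);
my @file2 = (
    "chr1 Cufflinks transcript 11869 14409 1 + . CUFF.2 transcript_id ENST00000456328.2",
    "chr1 Cufflinks exon 11869 12227 1 + . CUFF.23 transcript_id ENST00000456328.2 exon_number 1",
);

print "$_\n" for match_rows( \@file1, \@file2 );
```

With the sample data this prints both rows, in the order they appear in file_2. To run it on real files, the two arrays could be filled by reading `$ARGV[0]` and `$ARGV[1]` line by line, as in the original script.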