I have two tab-delimited data sets, Hdata.txt and Edata.txt. Each contains information about a group of people, and the first column in both holds the individual's last name. I wrote a Perl program that compares the two files and prints the individuals that appear in BOTH data sets. Both files are huge and the program takes more than an hour to run, so I was wondering whether there is a better way to do this.
I appreciate your help,
Thanks
Here is the program
#!/usr/bin/perl -w
open(Hdata, "Hdata.txt") || die "Can not open the file1\n";
open(Edata, "Edata.txt") || die "Can not open the file2\n";
open(outdata, ">Match.txt") || die "Can not open the file3\n";
# Reads the data #
@H=<Hdata>;
@E=<Edata>;
for($j=1; $j<=$#H; $j++){
    for($k=1; $k<=$#E; $k++){
        $l1=$H[$j];
        chomp $l1;
        @line1 = split(/\t/,$l1);
        $l2=$E[$k];
        chomp $l2;
        @line2 = split(/\t/,$l2);
        $flag=0;
        for($i=1; $i<=$#line1; $i++){
            if ($line1[$i] ne $line2[$i]){
                $flag=1;
                last;
            } # end if
        } # end for i
        if ($flag==0){
            print outdata "$line1[0]\t$line2[0]\n";
        } # end if
    } # end for k
} # end for j
close(Hdata);
close(Edata);
close(outdata);
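
One direction that might help is replacing the nested loops with a hash lookup, so each file is read only once. Below is a minimal sketch rather than a tested replacement. It assumes, as in the program above, that the first line of each file is a header and that two rows match when every column after the first is identical; it also assumes both files have the same number of columns and that Edata.txt fits in memory. The names find_matches, row_key and %names_for are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a row into its name (first column) and a key built from the
# remaining columns, which are the ones the original program compares.
sub row_key {
    my ($line) = @_;
    chomp $line;
    my ($name, @rest) = split /\t/, $line;
    return ($name, join("\t", @rest));
}

# Hypothetical helper: write "H name <tab> E name" for every pair of rows
# whose non-name columns are identical.
sub find_matches {
    my ($h_path, $e_path, $out_path) = @_;

    open(my $e_fh, '<', $e_path)   or die "Cannot open $e_path: $!\n";
    open(my $h_fh, '<', $h_path)   or die "Cannot open $h_path: $!\n";
    open(my $out,  '>', $out_path) or die "Cannot open $out_path: $!\n";

    # Index the E file once: key => list of names that share those values.
    my %names_for;
    my $e_header = <$e_fh>;    # assumption: first line is a header
    while (my $line = <$e_fh>) {
        my ($name, $key) = row_key($line);
        push @{ $names_for{$key} }, $name;
    }

    # Stream the H file and look each row up in the index.
    my $h_header = <$h_fh>;    # assumption: first line is a header
    while (my $line = <$h_fh>) {
        my ($name, $key) = row_key($line);
        next unless exists $names_for{$key};
        print {$out} "$name\t$_\n" for @{ $names_for{$key} };
    }

    close($e_fh);
    close($h_fh);
    close($out) or die "Cannot close $out_path: $!\n";
}

# Run only when three file names are given on the command line.
find_matches(@ARGV) if @ARGV == 3;
```

Saved as, say, match.pl, it would be run as `perl match.pl Hdata.txt Edata.txt Match.txt`. The work grows with the total number of lines in the two files instead of with their product, which is where the nested loops spend their time. If the intent is really to match on the last name in the first column, the key in row_key could be the name itself instead of the other columns.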