Search contents of a file

Question

perly 0 Newbie Poster

13 Years Ago

I have two files:

File1:

M1U152S44906X14127_xu
M1U7S112336U117688_xu

File2 (tab delimited):

T1X19S17508N179711_xu AAU_779
M1U152S44906X14127_xu xcup
M1U7S112336U117688_xu mmna

I want to search the content of File 2 using the content of File 1 and then display the output as follows:

Date of search:
The following matches were found in File 1:

T1X19S17508N179711_xu Nothing found
M1U152S44906X14127_xu xcup
M1U7S112336U117688_xu mmna

my code:

#!C:\bin\perl.exe 
use warnings;  
use strict; 

my $REPORT_FILE  = 'outFile.txt';   

my $F1 = 'File1.txt';   
open(RF,"<$F1") || die "can't open $F1 $!";   


my $F2 = 'File2.txt';   
open(RXNs,"<$F2") || die "can't open $F2 $!";   


my %line;    
my %var1; 
my %var2; 

while (my $line = <RF>){ 
            $line = split('\t'); 
            $var1{$1} = {2}; 
} 
close(RF); 

while (my $line = <RXNs>){ 
            $line = split('\n'); 
            $var1{$2}={1}; 
} 
close(RXNs); 



open(DATA,"+>OutFile.txt") or die "Can't open data";   



if (exists $var1{$var2}){   

                print DATA $var1{1}."\t".$var2{2}."\n" ;   

        }   
        else { 

            print "$var2 not found in the file\n";  #Here I want the program to output - Nothing found 
}   

close DATA;


This code returned errors that I have not been able to solve.


Please help!!!

Thanks

All 11 Replies

d5e5 109 Master Poster

13 Years Ago

If one of 2teez's scripts works for you with the large files, good. Otherwise you could try the following modified version. When working with large files you need to consider memory limitations vs. speed. I think 2teez's scripts make good use of memory but take a bit longer than using a hash to store the contents of the first file. On the other hand, using a hash could exceed the limits of your computer memory if the first file is too large, in which case the program will stop and you'll get an error message.

#!/usr/bin/perl
use warnings;
use strict;

my $time = scalar localtime();    # get date of search
my $heading = <<"HOF";
 Date of search: $time
 The Following Matches were Found in File 1:
HOF
my %hash;
my $file1 = 'File1.txt';          # file1
open my $fh, '<', $file1 or die "can't open $file1:$!";
while (<$fh>) {
    chomp;
    undef $hash{$_};
}
close $fh;

my $file2 = 'File2.txt';          # file 2
open my $fh_new, '>', 'output.txt' or die "can't open file:$!";    # output file
open $fh, '<', $file2 or die "can't open this file:$!";
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line =~ m/(.+?)\s+?.+?$/ ) {
        $checked_word = $1;
        exists $hash{$checked_word}
          ? print $fh_new $line, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";
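
A compact variant of the same hash-lookup idea, offered only as a sketch: it swaps the regex for split on the tab delimiter and works on in-memory lists, so it can be tried without File1.txt and File2.txt. The sub name report_matches and the sample data are assumptions made for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the IDs from file 1 and the tab-delimited
# lines from file 2, and returns one report line per non-blank file-2 line.
sub report_matches {
    my ( $ids, $lines ) = @_;

    # Build the lookup table once; only the keys matter, so the values are undef.
    my %seen = map { $_ => undef } @$ids;

    my @report;
    for my $line (@$lines) {
        # Split into at most two fields: the ID and the rest of the line.
        my ( $id, $rest ) = split /\t/, $line, 2;
        next unless defined $id && length $id;    # skip blank lines
        push @report, exists $seen{$id} ? $line : "$id\tNo Match Found";
    }
    return @report;
}

my @file1 = qw(M1U152S44906X14127_xu M1U7S112336U117688_xu);
my @file2 = (
    "T1X19S17508N179711_xu\tAAU_779",
    "M1U152S44906X14127_xu\txcup",
    "M1U7S112336U117688_xu\tmmna",
);

print "$_\n" for report_matches( \@file1, \@file2 );
```

Because the lookup table is built once, each line of file 2 costs a single hash lookup, which is the speed-for-memory trade-off described above.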

d5e5 109 Master Poster

13 Years Ago

perly, Please make sure that the HOF at the end of that heading string starts at the beginning of a line in your script and is the only word on that line.

 my $heading = <<"HOF";
     Date of search: $time
     The Following Matches were Found in File 1:
HOF

Make sure there are no spaces or tabs before the final HOF and there should be no other characters on that line.

Or, if you prefer, remove the above lines and replace with the following:
my $heading = qq(Date of search: $time\nThe Following Matches were Found in File 1:\n);
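
To check that the two forms really are interchangeable, here is a small sketch that builds the heading both ways and compares the results. The variable names $from_heredoc and $from_qq are made up for this example, and the here-document text is written without leading spaces so that it matches the qq() version exactly.

```perl
use strict;
use warnings;

my $time = scalar localtime();

# Here-document: the terminator HOF must start in column 1, alone on its line.
my $from_heredoc = <<"HOF";
Date of search: $time
The Following Matches were Found in File 1:
HOF

# Same text built with qq(), which has no terminator line to get wrong.
my $from_qq = qq(Date of search: $time\nThe Following Matches were Found in File 1:\n);

my $verdict = $from_heredoc eq $from_qq ? 'identical' : 'different';
print "$verdict\n";
```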

Edited 13 Years Ago by d5e5

2teez 43 Posting Whiz

13 Years Ago

Hi perly,

Yes, it works now!!!

Good for you!! I'm sure you are happy now!

a. Was this line, using undef, to help free memory?

No! The line:

  undef $hash{$_};

assigns undef as the value of each key in the hash. It can just as well be written like so:

  $hash{$_} = undef;

b. Why was defined used here:

defined returns a Boolean value telling whether EXPR has a value other than the undefined value undef. In the while loop it keeps reading until <$fh> returns undef at the end of the file.
Run

  perldoc -f defined

from your command line to read more. It's only 59 lines long!
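
A small sketch of the case that defined guards against: a final line consisting of "0" with no trailing newline is a false value but still a defined one. The in-memory file and the array names are assumptions made for this example. Note that Perl already applies an implicit defined test to the plain form while (my $line = <$fh>), so the second loop below tests the value for truth by hand to make the difference visible.

```perl
use strict;
use warnings;

# In-memory "file" whose last line is "0" with no trailing newline.
my $data = "first\n0";

# Explicit defined(): the loop stops only when <$fh> returns undef at end of file.
open my $fh, '<', \$data or die "can't open in-memory file: $!";
my @with_defined;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    push @with_defined, $line;
}
close $fh;

# Testing the value itself for truth: the string "0" is false, so it is lost.
open $fh, '<', \$data or die "can't open in-memory file: $!";
my @truth_only;
while (1) {
    my $line = <$fh>;
    last unless $line;
    chomp $line;
    push @truth_only, $line;
}
close $fh;

print scalar(@with_defined), " lines with defined, ",
  scalar(@truth_only), " with a plain truth test\n";
```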

c. I did not understand this regexp completely, especially why "?" was used when ".+" has already been used.

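a? = match 'a' 1 or 0 times,
a+ = match 'a' 1 or more times, i.e., at least once.
Here, however, the "?" does not mean "optional". Placed directly after "+", it makes the quantifier non-greedy, so ".+?" matches as few characters as possible instead of as many as possible. In other words, it limits the extent of the "+" match.
For detailed info, please check: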

perldoc perlre

OR

perldoc perlretut
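
The difference is easiest to see on a line with more than one run of whitespace. The sketch below uses a made-up sample line (the trailing " extra" is not in perly's data) and compares the regex from the script with a greedy version of it. With the thread's own two-column data both versions capture the same text.

```perl
use strict;
use warnings;

# Assumed sample line: an ID, a tab, then two more words.
my $line = "M1U152S44906X14127_xu\txcup extra";

# Non-greedy: (.+?) stops at the first run of whitespace.
my ($lazy) = $line =~ m/(.+?)\s+?.+?$/;

# Greedy: (.+) grabs as much as it can and backs off only to the last whitespace.
my ($greedy) = $line =~ m/(.+)\s+.+$/;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```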

d. Could you please explain this code block

Here, Perl's built-in ternary (conditional) operator was used instead of an if-else block.
In fact, the expression above could also be written as:

if(exists $hash{$checked_word}){
   print $fh_new $line, $/;
}
else{
  print $fh_new $checked_word, " No Match Found", $/;
}

The if-else version takes about twice as many lines as the former and is not as compact as the ternary operator version.

All that code says is: if the hash key exists, print the line just read to the filehandle $fh_new; if not, print $checked_word followed by the string " No Match Found" to the same filehandle.

Hope this helps.
If you are satisfied, please mark this thread as solved. Thanks.
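
Since the ternary operator yields a value, another way to write this is to pick the text first and print once. This is only a sketch: the sample lines, the split on whitespace, and the names $result and @report are assumptions for the example and are not taken from d5e5's script.

```perl
use strict;
use warnings;

# One known ID; only the key matters, so the value is undef.
my %hash = ( M1U152S44906X14127_xu => undef );

my @report;
for my $line ( "M1U152S44906X14127_xu\txcup", "T1X19S17508N179711_xu\tAAU_779" ) {
    # First whitespace-separated field is the word to look up.
    my ($checked_word) = split /\s+/, $line;

    # The ternary picks one of two strings; a single print then emits it.
    my $result = exists $hash{$checked_word}
      ? $line
      : "$checked_word No Match Found";

    push @report, $result;
    print $result, $/;
}
```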

d5e5 commented: Good explanations. +8

2teez 43 Posting Whiz · Answer 1 · 2012-05-28T05:34:01+00:00

Hi perly,
There are several gray areas in the code you posted that time will not permit me to point out here.
However, the code below does what you wanted. Please take it as a guide.
It was written on Windows, so don't mind the #!/usr/bin/perl line.
Hope it helps.

#!/usr/bin/perl
use warnings;
use strict;
use Fcntl qw(O_RDONLY);
use Tie::File;
use List::Util qw(first);
use Readonly;

my $time = scalar localtime(); # get date of search

Readonly my $heading => <<"HOF";
 Date of search: $time
 The Following Matches were Found in File 1:

HOF

my $file1 = 'File1.txt'; # file1

tie my @array_file1, 'Tie::File', $file1, mode => O_RDONLY
  or die "can't tie this file: $!";

my $file2 = 'File2.txt'; # file 2

open my $fh_new, '>', 'output.txt' or die "can't open file:$!";    # output file
open my $fh, '<', $file2 or die "can't open this file:$!";
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line =~ m/(.+?)\s+?.+?$/ ) {
        $checked_word = $1;
    }
    my $match_word = first { $checked_word eq $_ } @array_file1;
    defined $match_word ? print $fh_new $line, $/ 
                        : print $fh_new $checked_word," No Match Found", $/;
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";

perly 0 Newbie Poster · Answer 2 · 2012-05-28T07:07:08+00:00

Hi 2teez,

Thanks for writing the code. It worked for the small files above, but it hung and produced no output when I ran it on large files (> 9000 entries each).

2teez 43 Posting Whiz · Answer 3 · 2012-05-28T11:46:56+00:00

Hi Perly,
change lines 19 and 20 (the tie statement) from:

 tie my @array_file1, 'Tie::File', $file1, mode => O_RDONLY
    or die "can't tie this file: $!";

to:

my @array_file1;
tie @array_file1, 'Tie::File', $file1, mode => O_RDONLY
    or die "can't tie this file: $!";

and see if it works for the large files (> 9000 entries).

If not, then try the script below; it's just a small modification of the previous one:

#!/usr/bin/perl
use warnings;
use strict;
use IO::File;

my $time = scalar localtime();    # get date of search

my $heading = <<"HOF";
 Date of search: $time
 The Following Matches were Found in File 1:

HOF

my $file1 = 'File1.txt';          # file1

my $matched_word = {};
my $fh_o         = IO::File->new($file1);
while (<$fh_o>) {
    chomp;
    $matched_word->{$_} = 1;
}
$fh_o->close;

my $file2 = 'File2.txt';          # file 2

open my $fh_new, '>', 'output.txt' or die "can't open file:$!";    # output file
open my $fh, '<', $file2 or die "can't open this file:$!";
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line =~ m/(.+?)\s+?.+?$/ ) {
        $checked_word = $1;
        exists $matched_word->{$checked_word}
          ? print $fh_new $line, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";

Hope it helps.

perly 0 Newbie Poster · Answer 4 · 2012-05-28T16:37:59+00:00

Hi 2teez and d5e5,

Thank you both for the scripts.

The modification (below) to 2teez's script still did not work:

my @array_file1; tie @array_file1, 'Tie::File', $file1, mode => O_RDONLY or die "can't tie this file: $!";

However, 2teez's second script and d5e5's script both generated a similar error, as follows:

"Can't find the string terminator "HOF" anywhere before EOF at prog.pl line 5."

Thanks

perly 0 Newbie Poster · Answer 6 · 2012-05-29T03:03:53+00:00

Yes, it works now!!! I replaced the previous code with the following:

my $heading = qq(Date of search: $time\nThe Following Matches were Found in File 1:\n);

and also deleted spaces at the end of my files.

Thanks so much, 2teez and d5e5. I just have some questions regarding the last code from d5e5:

a. Was this line, using undef, to help free memory?

open my $fh, '<', $file1 or die "can't open $file1:$!";
while (<$fh>) {
    chomp;
    undef $hash{$_};
}

b. Why was defined used here:

while ( defined( my $line = <$fh> ) ) {

c. I did not understand this regexp completely, especially why "?" was used when ".+" has already been used.

if ( $line =~ m/(.+?)\s+?.+?$/ ) {
    $checked_word = $1;

d. Could you please explain this code block

exists $hash{$checked_word}
? print $fh_new $line, $/
: print $fh_new $checked_word, " No Match Found", $/;
  }

d5e5 109 Master Poster · Answer 7 · 2012-05-29T14:39:08+00:00

undef $hash{$_};
assigns undef as the value of each key in the hash. It can just as well be written like so:

$hash{$_} = undef;

Yes. We need the hash to store keys but we don't need to associate any values with these keys. The undef means something like null or nothing. If we put $hash{$_} = 1; or $hash{$_} = qq(n'importe quoi); it would have worked just as well, except I think we might use more memory to save keys with values than to save keys without values. But the main reason I used undef was it indicates to anyone reading it that we don't need to associate any particular value with the key.
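
A quick sketch of the point about keys without values: exists asks only whether the key is present, while defined asks about the value stored under it. The keys alpha, beta, gamma and delta are made up for the example.

```perl
use strict;
use warnings;

my %hash;
undef $hash{alpha};    # key created, value is undef
$hash{beta}  = 1;      # key with a true value
$hash{gamma} = 0;      # key with a false value

# delta was never stored, so it is the only key for which exists is false.
for my $key (qw(alpha beta gamma delta)) {
    my $exists  = exists $hash{$key}  ? 'yes' : 'no';
    my $defined = defined $hash{$key} ? 'yes' : 'no';
    print "$key: exists=$exists defined=$defined\n";
}
```

This is why the lookup in the script tests with exists: a defined or plain truth test on $hash{$checked_word} would fail for every key stored with undef.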

perly 0 Newbie Poster · Answer 8 · 2012-05-30T02:31:47+00:00

Very clear!!! Gentlemen, thanks for your explanations and for writing the wonderful scripts. Much appreciated.
