d5e5 109 Master Poster

my $save_header; #define g̶l̶o̶b̶a̶l̶ lexical variable whose scope includes entire script

my @flds = split;#r̶e̶a̶d̶ ̶f̶i̶l̶e̶ ̶c̶o̶n̶t̶e̶n̶t̶s̶ ̶i̶n̶t̶o̶ ̶a̶n̶ ̶a̶r̶r̶a̶y̶ Equivalent to my(@flds) = split(' ', $_, 0);

print "\n" if $save_header and $_ ne $save_header; #print newline i̶n̶ ̶b̶o̶t̶h̶ ̶c̶a̶s̶e̶s̶ before printing a new header (blank line between groups)

$save_header = $_;#assign the value of the current header ($_) to a variable so we can compare in the future to know if we are about to start another header group (i.e. whether we want a blank line separating previous group from current group.

sub findit {#sub routine -opens file and returns m̶a̶t̶c̶h̶e̶d̶ ̶e̶l̶e̶m̶e̶n̶t̶s̶ true (non-zero value) if parameter found in file, false (0) if not found in file.

return(1) if index($_,$word) > -1;# c̶h̶e̶c̶k̶ ̶i̶f̶ ̶s̶e̶a̶r̶c̶h̶ ̶w̶o̶r̶d̶ ̶f̶r̶o̶m̶ ̶i̶n̶p̶u̶t̶ ̶f̶i̶l̶e̶2̶.̶t̶x̶t̶ ̶m̶a̶t̶c̶h̶e̶s̶ ̶a̶ ̶p̶a̶r̶t̶i̶c̶u̶l̶a̶r̶ ̶i̶n̶d̶e̶x̶ ̶e̶l̶e̶m̶e̶n̶t̶ ̶i̶n̶ ̶i̶n̶p̶u̶t̶ ̶f̶i̶l̶e̶1̶ If the word is NOT found in the current record the index will give a value of -1. If the word is found, the index will give a number indicating where in the current record the word was found. I can't explain it better than http://perldoc.perl.org/functions/index.html

d5e5 109 Master Poster

You could also try the following:

#!/usr/bin/perl
use strict; 
use warnings; 

open my $fh, '<', 'input1.csv' or die "Failed to open input1.csv $!";
my $save_header;
while (<$fh>){
    my @flds = split;
    if (@flds == 1){
        s/\s//g;
        print "\n" if $save_header and $_ ne $save_header;
        print "$_:\n";
        $save_header = $_;
    }
    else{
        print "$flds[0]\n" if findit($flds[0]);
    }
}

sub findit {
    my $word = shift;
    open my $fh, '<', 'input2.csv' or die "Failed to open input2.csv $!";   
    while (<$fh>) {
        return(1) if index($_,$word) > -1;
    }
}
d5e5 109 Master Poster

Look closely at lines 16 - 17.

                my $Header;
                $hash{$Header} = $_;

You declare a lexical variable called $Header and assign no value to it, so its value is undefined. Then you use $Header (whose value is undefined) as a key to $hash. In effect you are assigning all the header values to one key (key = undef) in your hash. Each value replaces the previous value because you always use the same undefined key.

d5e5 109 Master Poster

The following script creates a table (if it doesn't already exist) and saves all the image names (extension included) and the directory names.

#!/usr/bin/perl
use strict; 
use warnings; 
use DBI;
use File::Glob ':glob';
use File::Basename;

my $dbh=DBI->connect('dbi:mysql:daniweb','david','dogfood') ||
   die "Error opening database: $DBI::errstr\n";

$dbh->do("CREATE TABLE IF NOT EXISTS images (id INT NOT NULL AUTO_INCREMENT,
                                i_name VARCHAR(64),
                                i_dir VARCHAR(100),
                                PRIMARY KEY (id))");

my $sth = $dbh->prepare("insert into `images` (i_name, i_dir) values (?, ?)")
             or die "prepare failed: " . $dbh->errstr();  


my @files = </home/david/Pictures/*>;
foreach (@files) {
    my($filename, $dir) = fileparse($_);
    $sth->execute($filename, $dir) or die "execute failed: " . $sth->errstr(); 
}

# Close connection
$sth->finish();
undef($dbh);
2teez commented: Nice one +3
d5e5 109 Master Poster

I should add that experienced programmers do not require their users to memorize the real names of all their subroutines to use as command-line arguments at run-time. Make the command-line arguments short, easy to remember and spell. Use a hash or an if-elsif-else statement to associate the command-line arguments with the name of the appropriate function or subroutine. If the argument value does not correspond to any of the expected values, your script should display a warning or error message.

d5e5 109 Master Poster

You could write your subroutines as values in a hash in your script and then run it from the commandline with the name of your desired function as the first argument on the command line. For example, a script named many_functions.pl

#!/usr/bin/perl
use warnings;
use strict;

my %funcs = (thisismyfunction1 => sub {print 'hello world number 1',"\n";},
             thisismyfunction2 => sub {print 'hello world number 2',"\n";},
            thisismyfunction3 => sub {print 'hello world number 3',"\n";},);

my $what_to_do = $ARGV[0];

$funcs{$what_to_do}();

Test it from the command line as follows:

david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction1
hello world number 1
david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction2
hello world number 2
david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction3
hello world number 3

You may prefer to name your subroutines, especially if they may contain lots of code, and store references to the subroutines as values in a hash. The following would give the same results as the above script:

#!/usr/bin/perl
use warnings;
use strict;

my %funcs = (thisismyfunction1 => \&world_1,
             thisismyfunction2 => \&world_2,
            thisismyfunction3 => \&world_3,);

my $what_to_do = $ARGV[0];

$funcs{$what_to_do}();

sub world_1 {print 'hello world number 1',"\n";}
sub world_2 {print 'hello world number 2',"\n";}
sub world_3 {print 'hello world number 3',"\n";}
d5e5 109 Master Poster

Do you have a solution now to the Perl aspect of your question? If so please mark this topic solved.

d5e5 109 Master Poster

Hi 2teez,

doesn't have to associate to any persons name

I don't know the reason for Serpterion's requirement to append application ids onto records for persons that probably have no relation to those ids. I would consider the column of ids as logically separate from the name and city records. Perhaps putting them there saves the trouble of putting them in a separate spreadsheet. I wouldn't recommend arranging the data that way if there's any chance of its becoming a long-term application that you would eventually have to explain to someone else. A data base analyst, for example, wouldn't like it.

d5e5 109 Master Poster

The following modified version of my script outputs the ids so they start at the top of your id column.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
my @lines;
my @ids;
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @ids,$id;
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
        push @columns,$name,$loc;
        push @lines, [@columns];
    }
}
while (@lines || @ids){
    my ($name, $loc,$line, $id);
    if (@lines){
        $line = shift @lines;
        ($name,$loc) = @$line;
    }
    if (@ids){
        $id = shift @ids;
    }
    my @columns;
    push @columns, $name,$loc,$id;
    my $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
}

Outputs:
"Bob Gray","Burmingham,AL",10012
"Ron Alberta","Omaha,NE",10013
,,10013

d5e5 109 Master Poster

Multi-line records with no record separator make it tricky. As for writing CSV output it's worth installing the Text::CSV module if you don't already have it.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @columns,undef,undef,$id;#Make two empty columns followed by id
        $status = $csv->combine(@columns);    # combine columns into a string
        print $csv->string(), "\n";          # print the combined string
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
    push @columns,$name,$loc;
    $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
    }

}

For me, running the above script and reading your sample input prints the following:

"Bob Gray","Burmingham,AL"
"Ron Alberta","Omaha,NE"
,,10012
,,10013
,,10013

d5e5 109 Master Poster

Was "$/" in lines the scripts (32 and 33 - d5e5) and lines (28 and 30 - 2teez) used as a dereference and why?

No. I copied the lines where "$/" occurs from 2teez' scripts. By default the Perl special variable has a value of "\n" and, since our scripts don't modify it, we could have put "\n" instead and achieved the same result.

$/
The input record separator, newline by default. This influences Perl's idea of what a "line" is.
from perldoc perlvar

2teez commented: Nice one!!! +2
d5e5 109 Master Poster

open my $fh, '&lt;', $file1 or die "can't open $file1:$!";

Why do you have statements like the above where I see '&lt;' instead of '<'? Besides fixing the above, mostly all you need to do to get what you want is to switch the filenames, as follows:

#!/usr/bin/perl
use warnings;
use strict;
my $time = scalar localtime();    # get date of search

#Heading should say 'File 2', not 'File 1'
my $heading =
  qq(Date of search: $time\nThe Following Matches were Found in File 2:\n);
my %hash;

#Save the file you want to search ($file2, not $file1, into %hash)
my $file2 = 'File2.txt';    # file 2: file with two columns to be searched with contents of file1.
open my $fh, '<', $file2 or die "can't open $file2:$!";
while (<$fh>) {
    chomp;
    my ($probe, $value) = split /\s+/, $_;
    $hash{$probe} = $value;
}
close $fh;

my $file1 =
  'File1.txt'; # file1: one column of probe names for searching the second file.
open my $fh_new, '>', 'outFile.txt' or die "can't open file:$!";   # output file
open $fh, '<', $file1 or die "can't open this file:$!"; # ($file1, not $file2)
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line ) {
        $checked_word = $line;
        exists $hash{$checked_word}
          ? print $fh_new $checked_word, ' ', $hash{$checked_word}, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";
d5e5 109 Master Poster

undef $hash{$_};
assigns undef as a value to the each key of the hash variable. That can as well be written like so:

$hash{$_} = undef;

Yes. We need the hash to store keys but we don't need to associate any values with these keys. The undef means something like null or nothing. If we put $hash{$_} = 1; or $hash{$_} = qq(n'importe quoi); it would have worked just as well, except I think we might use more memory to save keys with values than to save keys without values. But the main reason I used undef was it indicates to anyone reading it that we don't need to associate any particular value with the key.

d5e5 109 Master Poster

perly, Please make sure that the HOF at the end of that heading string starts at the beginning of a line in your script and is the only word on that line.

 my $heading = <<"HOF";
     Date of search: $time
     The Following Matches were Found in File 1:
HOF

Make sure there are no spaces or tabs before the final HOF and there should be no other characters on that line.

Or, if you prefer, remove the above lines and replace with the following:
my $heading = qq(Date of search: $time\nThe Following Matches were Found in File 1:\n);

d5e5 109 Master Poster

If one of 2teez's scripts works for you with the large files, good. Otherwise you could try the following modified version. When working with large files you need to consider memory limitations vs. speed. I think 2teez's scripts make good use of memory but take a bit longer than using a hash to store the contents of the first file. On the other hand, using a hash could exceed the limits of your computer memory if the first file is too large, in which case the program will stop and you'll get an error message.

#!/usr/bin/perl
use warnings;
use strict;

my $time = scalar localtime();    # get date of search
my $heading = <<"HOF";
 Date of search: $time
 The Following Matches were Found in File 1:
HOF
my %hash;
my $file1 = 'File1.txt';          # file1
open my $fh, '<', $file1 or die "can't open $file1:$!";
while (<$fh>) {
    chomp;
    undef $hash{$_};
}
close $fh;

my $file2 = 'File2.txt';          # file 2
open my $fh_new, '>', 'output.txt' or die "can't open file:$!";    # output file
open $fh, '<', $file2 or die "can't open this file:$!";
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line =~ m/(.+?)\s+?.+?$/ ) {
        $checked_word = $1;
        exists $hash{$checked_word}
          ? print $fh_new $line, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";
d5e5 109 Master Poster

I face some real problems here. I need a Perl script that can open a text file, read a series of URLs, get the content of the pages and save it to another file.

Divide your task into steps, such as:

  1. open a text file
  2. read a series of URLs
  3. get the content of the pages for each URL
  4. save contents of pages to another file

Which of these steps gives you problems?

Could you attach an example of the text file you need to read? (Click on 'Files' in the Post menu here to upload a text file.)

d5e5 109 Master Poster

First, I agree with 2teez that what when we write scripts we present them as examples of ways to do something you ask us how to do. I can't give you perfect solutions that will work without your having to make your own changes to adapt them to your requirements.

Second, my subroutine to sort the dates and times can be improved. If you replace the by_date_time subroutine with the following it should sort the report lines in descending order which I believe is what you want. (I included the statement for sorting in ascending order in a comment.)

sub by_date_time{
    my ($dta, $dtb);
    my @flds_a = split /\s+/, $a;
    if (@flds_a > 4){
        $dta = join ' ', @flds_a[4,5];
    }
    else{
        $dta = '';
    }

    my @flds_b = split /\s+/, $b;
    if (@flds_b > 4){
        $dtb = join ' ', @flds_b[4,5];
    }
    else{
        $dtb = '';
    }

    #$dta cmp $dtb;#Sort in Ascending Order
    $dtb cmp $dta;#Sort in Descending Order
}

Total alarms=156

And third, I see more than 1000 lines in the OUTPUT.txt (see attached file) so I don't understand why the total should be 156.

d5e5 109 Master Poster

The following works if you have the Tie::File module. If you don't have it you can find it on CPAN.

#!/usr/bin/perl
use warnings;
use strict;
my %has_site;
my ( $site_file, $input_file ) = qw(site.txt INPUT.txt);

open my $fh, '<', $site_file or die "can't open this file:$!";
while (<$fh>) {
    s{^\s+}{};
    s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
    my ( $bsc, $bcf, $site ) = split /\s+/, $_;
    $has_site{$bsc . '  ' . $bcf} = $site;#I prefer looking up with bsc and bcf as key
}
close $fh or die "can't close file:$!";

open my $fh2, '>', 'OUTPUT.txt'
  or die "can't open this file:$!";    # output file

open $fh, '<', $input_file or die "can't open this file:$!";
my $hdrs = <$fh>;
$hdrs =~ s{^\s+}{};
$hdrs =~ s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
$hdrs = 'SITE         ' . $hdrs;
$hdrs =~ s{\s+$}{};#Remove CR/LF
print $fh2 $hdrs, "\n";

while (<$fh>) {
    s{^\s+}{};
    s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
    s{\(\d\d\d\d\d\) \d\d\d\d}{            };#Remove unwanted (25441) 7622 etc.
    my ( $bsc, $bcf ) = split /\s+/, $_;
    my $key = $bsc . '  ' . $bcf;
    my $site;
    if (exists $has_site{$key}){
        $site = $has_site{$key};
    }
    else{
        $site = '______'
    }
    print $fh2 $site, " " x length($site), $_, "\n";

}
close $fh  or die "can't close file:$!";
close $fh2 or die "can't close file:$!";

#Now you want to sort OUTPUT.txt
#This should really be a separate script
undef %has_site; #Reclaim …
d5e5 109 Master Poster

Another question for abulut. I see lines in INPUT.txt that include BADYK01 BCF-1621 and BADYK01 BCF-3000 but I don't find any BCF-1621 or BCF-3000 in the site.txt file. Do you want these lines output with no site?

d5e5 109 Master Poster

Thanks for helping .. could you pls order them time by time. Also they should be from the new time to last time

abulut, Do you mean you want them ordered by a combination of date and time? Should 2012-03-16 14:29:54.61 print before (above) 2012-04-23 07:16:21.83 for example? Normally we order by date and time, not time alone.

d5e5 109 Master Poster

First, always use strict; for anything more than a one-liner.

Second, indexing of arrays starts at 0, so you will refer to the eighth element in the result of your split as $whatever[7]. (Note that $whatever[9] is undefined.)

Also note that the package global variables $a and $b have a special meaning in Perl so avoid declaring lexical variables $a and $b having the same name as those special variables because that can have side effects.

Since the decision whether or not to print a line depends only on the contents of that semicolon-delimited string, you don't need to assign all the columns to individual variables. Just print the whole line after testing the one element that determines whether you want to print it. The following works for me. See the attachements file1.txt and result1.txt

#!/usr/bin/perl
use strict;
use warnings;

#inputfile
my $input_file = 'file1.txt';
my $output_file = 'result1.txt';

open my $fh_in, '<', $input_file or die "Failed to open $input_file: $!";
open my $fh_out, '>', $output_file or die "Failed to open $output_file: $!";

while (<$fh_in>){
    chomp;
    if (ok($_) == 1){
        print $fh_out $_, "\n";
    }
}

sub ok{
    my $rec = shift;
    return 1 if $rec =~ m/^#/; #Comments ok to print
    my @cols = split /\s+/, $rec;
    my $test_this = $cols[7];
    return 0 if $test_this =~ m/sno=;/; #Don't print
    return 0 if $test_this =~ m/toss=head->head/ or $test_this =~ m/toss=tail->tail/;
    return 1; #Must have passed tests
}
d5e5 109 Master Poster

Thanks for helping test this. I couldn't understand why my script worked for me and not for perllearner007.

Just as you did, I selected and copied the data provided by perllearner007 as posted on top of this post and pasted it into Komodo edit to make the above attached file, without changing anything. Komodo edit doesn't automatically strip leading spaces and my browser (Google Chrome) shouldn't make a difference, but who knows?

I agree that perllearner007's task is essentially solved. If we added some regex to strip out leading spaces before parsing each data line we could easily make a script that would work for data having or not having leading spaces. See perlfaq4 for one way to strip leading and trailing spaces.

d5e5 109 Master Poster

To answer the first part of your question about removing spaces from the number and reading from four input files:

#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(PT1.txt PT2.txt YK1.txt YK2.txt);

while (my $line = <>){
    my @fields = split /\s+/, $line;
    print @fields[6..8], "\t", $fields[9], ' | ', $ARGV, "\n";
}
d5e5 109 Master Poster

I wonder if the data gets modified in the process of posting to and copying from Daniweb?

When I select the data provided at the top of this article, copy and paste it into my text editor and save I get the attached (please see the File Attachment.) The families.txt file that I've been working with has no spaces to the left of the data. I didn't remove them so if perllearner007's original records start with spaces, then transferring to and from the Daniweb post somehow stripped them. If you will post the families.txt file you are testing with as a File Attachment then I can test my script with it.

I think if we can verify that we're all testing with the same data we can settle on a script that works for all, whether or not the script makes use of regex or the split function.

d5e5 109 Master Poster

2teez, when I made your change the script didn't work for me anymore. I just prints blank lines and prompts for Name again. I don't understand why it works for you. Maybe the default behavior of split has changed, or maybe the Tie::File module that comes with my version of Perl works differently. My version of Perl is 5.14 subversion 2 for i686 linux.

When I run the following simplified script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

while (<DATA>){
    my @fields = split;
    print Dumper(\@fields);
}

__DATA__
002  Kate      Rhodes,Mich           Positive

My output is:

$VAR1 = [
          '002',
          'Kate',
          'Rhodes,Mich',
          'Positive'
        ];
d5e5 109 Master Poster

When I run the above script in my terminal, searching for names that exist in the file (Kate, etc.) and one that doesn't (Ogden) I get the following output:

david@david-laptop:~/Programming/data$ perl ../Perl/temp01.pl

Name: Kate

Kate Rhodes
Shirley Rhodes

Name: Shirley

Kate Rhodes
Shirley Rhodes

Name: Ogden

Name: Dexter

Aaron Sunders
Dexter Sunders

Name:
david@david-laptop:~/Programming/data$

I don't know why it doesn't work for you. Are you sure your input file exists in the specified path and contains the expected records? One way to debug is to add print statements to the script at strategic points to print the contents of some of the variables. How do I debug my Perl programs Also, the fact that the script doesn't tell you when a name is not found is a bug, not a feature. I forgot to check for that case and give a message when the name doesn't match any of the records.

Try running the following, modified version of the script:

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
die "Unable to find $filename" unless -e $filename;
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#added for debugging
my $count = @families;
print "$filename contains the following $count lines:\n", join "\n", @families, "\n";
#-----

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: "; …
d5e5 109 Master Poster

The following works for me. Make sure you change the path to your own directory before running it.

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: ";
    chomp( my $name = <> );
    last unless $name =~ /\S/;
    print "\n";
    my %fam_names;
    foreach (@families){
        my @fields = split;
        my ($firstname, $family_tag) = @fields[1, 2];
        next unless $firstname eq $name;
        my ($lastname) = split /,/, $family_tag;
        undef $fam_names{$lastname};
    }

    foreach my $k (keys %fam_names){
        foreach my $rec (@families){
            my @fields = split /\s+/, $rec;
            my ($firstname, $family_tag) = @fields[1, 2];
            my ($lastname) = split /,/, $family_tag;
            print "$firstname $lastname\n" if $lastname eq $k;
        }
    }
}
d5e5 109 Master Poster

Line 40 probably doesn't do whatever you intend it to do. if ( my $Tags = $names{$name} ) { declares a new variable and assigns the value of $names{$name} to it, and tests this assignment with an if. That won't give an error message but it doesn't make much sense.

d5e5 109 Master Poster

2teez when I run your script and enter Kate when prompted I get no output. No error but no output either. I made a few changes in the loop that builds the hash and now it works OK for me.

perllearner007 I couldn't recreate your error but you might as well try the following modified version which works OK for me.

#!/usr/bin/perl
use warnings;
use strict;
use Carp qw(croak);
croak
"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"
  unless defined $ARGV[0];    ## error msg if input file is not specified
print "Enter name to search: ";
my %has_val;
chomp( my $name = <STDIN> );    ## get name to search from user
my $input_file = $ARGV[0];
open my $fh, '<', $input_file
  or croak "can't open this file:$!";    ## open the file to read from

while (<$fh>) {
    next if !/^\d+?/;
    #Had to make changes here to get it to work for me d5e5
    chomp;
    my ($name_id, $fam_tag) = (split /\s+/)[1,2];
    my ($fam_name) = split /,/, $fam_tag;
    $has_val{$name_id} = $fam_name;
    #End of d5e5's changes
}
close $fh or croak "can't close this file:$!";    ## close file
$name = ucfirst( lc($name) );    ## this will yield first letter capital always

foreach my $fnd ( keys %has_val ) {
    if ( $fnd eq $name ) {
        my $surname = $has_val{$fnd};
        while ( my ( $key, $value ) = each %has_val ) {
            print $key, '  ', $value, $/ if $surname eq $value;
        }
    }
}
d5e5 109 Master Poster

P.s: An error occurs at line 19th solved by an arobase: push @{$h_fruit{$fruitname}}, @fields;

For some reason perl didn't give me an error or warning about line 19 and it seemed to work OK, but you are correct: a good way to dereference the array reference $h_fruit{$fruitname}, as I needed to do in order to push values into it, is to enclose it in curly brackets and prefix it with an @, as you did.

d5e5 109 Master Poster

Your idea of a hash with keys of fruit names and values as array references should be sufficient. In your last script you convert the hash to a hash reference. You don't need a hash reference in this case and it makes the code look more complex when you want to dereference your array. The following should work:

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);#You probably have this module.

my $file=$ARGV[0];
my %h_fruit;#Keys will be fruits. Values will be references to arrays of values.

#Always test that file open worked, and die if it failed
#Best practice: open lexical filehandle such as $fh instead of bareword such as fruit_file
open (my $fh, '<', $file) or die "Failed to open $file: $!";

while (my $line = <$fh>){
    chomp $line;
    #You can use your regex but split on spaces looks simpler to me.
    my @fields = split(/\s+/, $line);#split on spaces
    my $fruitname = shift(@fields); #Remove the first field and assign it to $fruitname
    $h_fruit{$fruitname} = [] unless exists($h_fruit{$fruitname});
    push $h_fruit{$fruitname}, @fields;
}

foreach my $fruit (keys %h_fruit){
    my @values = @{$h_fruit{$fruit}};#Dereference the array reference
    my $nbval = @values;#Number of fields in the array
    my $sum = sum(@values);
    my $moy = $sum / $nbval;
    print $fruit, ' Average: ', $moy, "\n";
}
d5e5 109 Master Poster
d5e5 109 Master Poster

The set function gives a set of the unique items in a list. Iterate through that and print the count method of the list for each item.

#!/usr/bin/env python
def checkForWidthCount(mylist):
    for x in set(mylist):#Iterate through set of unique list items
        print x, 'count = ', mylist.count(x)

ExampleOfList = [1,2,3,3,3,5,5,5,6]
checkForWidthCount(ExampleOfList)
d5e5 109 Master Poster

If you replace those lines with the following it works without problems.
my $record_separator = "\n\n";

When I make that change the script prints headings only, because I have Linux. I hope abulut is still reading this, because I think you (replic) have solved the problem of running it under Windows.

d5e5 109 Master Poster

I suspect the statements that need to change to make it run under Windows are one of the following two:
use constant {WINDOWS_LINE_END_CHAR => chr(13) . chr(10)}; my $record_separator = WINDOWS_LINE_END_CHAR x 2;

The input file records contain newline characters and Perl assumes newline as the default record separator. When I examine the input file (LOG.txt) it appears to me that each true record ends in a double set of the Windows newline characters. Unlike Linux, Windows ends its lines with a combination of Carriage Return and Line Feed characters. I think that newline \n means just Line Feed in Linux and Carriage Return + Line Feed in Windows. That's why I specify the line separator for input (not output) explicitly. But for some reason running the script in Windows hides or modifies the newline characters. I don't know why and I would have to experiment with various line separators and test under Windows, which I don't have.

It looks like a long complex script to debug, but I suspect that getting a simple script reading multi-line records to work under Windows would reveal the secret to getting this one to work.

d5e5 109 Master Poster

@abulut, I received a private message from you saying you get no output when you run my script, but I can't understand why not. When I run the above script on my computer I see 50 lines in the output file named KESIK_SEKTOR.txt which I have attached to this message. I don't know why this would not work on your computer. I believe your operating system is Windows and mine is Linux, but I don't know why this would make a difference. I don't have Windows on my computer so I don't think there is anything else I can do to make it work under Windows.

d5e5 109 Master Poster

If you also want to capture (or extract) the portion of the string that matches your pattern, then have a look at Extracting Matches.

d5e5 109 Master Poster

Sorry, the above example of using qr to store a pattern is not quite correct. You could store your pattern as follows:
my $word = qr(DOT\d+);

For example:

#!/usr/bin/perl
use strict; 
use warnings;

my $string = 'Number = DOT123';
my $word = qr(DOT\d+);

if ($string =~ m/^Number\s=\s$word/){
    print "$string matches the $word pattern\n"
}
else{
    print "$string does not match the $word pattern\n"
}
d5e5 109 Master Poster

my $word=~/DOT\d+/; attempts to match whatever is in $word and there is nothing in $word at this point in the script.

You can store a regex pattern in a variable using the Regexp-like quote function as follows: my $word = qr(/DOT\d+/);

d5e5 109 Master Poster

I had to guess what characters to look for to know when the end of an input record occurs because each input record has several lines. The following script prints 50 BCCH MISSING lines on my computer.

#!/usr/bin/perl
use strict; 
use warnings;

use constant {
    ASC     => 1,
    DESC    => -1
};

my $date_pattern = '\d\d\d\d-\d\d-\d\d';
my $time_pattern = '\d\d:\d\d:\d\d\.\d\d';
my $filename = 'LOG.txt';

my $filenane="KESIK_SEKTOR.txt";
open my $fh, '<', $filename or die "Failed to open $filename: $!";

use constant {WINDOWS_LINE_END_CHAR => chr(13) . chr(10)};
my $record_separator = WINDOWS_LINE_END_CHAR x 2;
my $ctr = 0;
my %lines;#Save report lines to sort later
while (my $aref = get_next_rec($record_separator)){
    my ($bsc,$bcf,$date,$time,$saha,$alarm) = @$aref;
    $lines{$date . $time. $ctr++} = $aref if $alarm eq 'BCCH MISSING';
}

open my $fp2, '>', $filenane or die "Cannot open $filenane for output\n :$!";
print $fp2 " BSC          BCF          DATE           TIME         SAHA        ALARM \n";
foreach (sort {($a cmp $b) * ASC} keys %lines){
    my @flds = @{$lines{$_}};
    my $line = sprintf '%-10s%-12s%-15s%-16s%-14s%-12s', @flds;
    print $fp2 $line, "\n";
}

sub get_next_rec{
    local $/ = shift;
    while (my $r = <$fh>){
        my $dtrec = $r =~ m/^\s+([\w]+)\s+(\w+-\d+).+($date_pattern)\s+($time_pattern)/;
        if ($dtrec){
            my ($bsc,$bcf,$date,$time) = ($1,$2,$3,$4);
            
            my $saha;
            if ($r =~ m/([A-Z]{2}\d{5})/m){
                $saha = $1;
            }
            else{
                $saha = 'SahaNotFound'
            }
            
            my $alarm;
            if ($r =~ m/ALARM.+\d\d\d\d\s((\w+\s?)+\w+)/ms){
                $alarm = $1;
            }
            else{
                $alarm = 'ALARMNotFound'
            }
            return [($bsc,$bcf,$date,$time,$saha,$alarm)];
        }
        else{
            next;
        }
    }
}
d5e5 109 Master Poster

Sorry, I was able to answer your original question about how to sort lines by date and time but I'm not able to filter your input data in the right way to give exactly the output you want. You may need to hire an expert to do that.

d5e5 109 Master Poster

You can read the second file into a hash if the values in the first column are unique. By 'unique' I mean that '153814_at' occurs no more than once in the second file, so you can use it as a key in your hash whose value is 0.09276438. After building your hash by reading through your second file you can open your first file. For each record in your first file you can split the record into its two columns and see if an entry corresponding to the value in the first column occurs in the hash. If it does you can take the value associated with that key from the hash and you have the three data items that you want to print.

For example:
file1.txt

beans   green
lime    green
banana  yellow
carrot  orange

file2.txt

apple   fruit
carrot  vegetable
beans   vegetable
banana  fruit
spinach vegetable

script.pl

#!/usr/bin/perl
use strict; 
use warnings; 

my $filename1 = 'file1.txt';
my $filename2 = 'file2.txt';
my %categories;#Hash to store key=>value pairs from file2.txt

open my $fh, '<', $filename2 or die "Failed to open $filename2: $!";
while (<$fh>){
    chomp;
    my ($name, $cat) = split;
    $categories{$name} = $cat;
}
close $fh;

open $fh, '<', $filename1 or die "Failed to open $filename1: $!";
while (<$fh>){
    chomp;
    my ($name, $colour) = split;
    my $cat;
    
    if (exists $categories{$name}){
        $cat = $categories{$name}
    }
    else{
        $cat = "***$name Not Found in $filename2***";
    }
    print "$name\t$colour\t$cat\n";
}

close $fh;

Output:

beans	green	vegetable
lime	green …
d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 
use constant {
    ASC     => 1,
    DESC    => -1
};

my $date_pattern = '\d\d\d\d-\d\d-\d\d';
my $time_pattern = '\d\d:\d\d:\d\d\.\d\d';
my $filename = 'LOG.txt';
my %lines;#Save report lines to sort later

open my $fh, '<', $filename or die "Failed to open $filename: $!";

while (my $line = <$fh>){
    $line =~ s/\R//g; #Remove Windows end-of-line characters because I have Linux
    if ($line =~ m/^\s+(\w+\d\d)\s+(\w+-\d\d\d\d)\s+\w+\s+($date_pattern)\s+($time_pattern)/){
        my ($bsc,$bcf,$date,$time) = ($1,$2,$3,$4);
        my $saha = get_saha();
        next if $saha eq '***SAHA NOT FOUND***';
        my $alarm = get_alarm();
        next if $alarm eq '***Alarm NOT FOUND***';
        #print "$bsc\t$bcf\t$date\t$time\t$saha\t$alarm\n";
        my $ctr = sprintf '%04d', $.;
        $lines{$date . $time . $ctr}{'bsc'} = $bsc;
        $lines{$date . $time . $ctr}{'bcf'} = $bcf;
        $lines{$date . $time . $ctr}{'date'} = $date;
        $lines{$date . $time . $ctr}{'time'} = $time;
        $lines{$date . $time . $ctr}{'saha'} = $saha;
        $lines{$date . $time . $ctr}{'alarm'} = $alarm;
    }
}

print " BSC          BCF          DATE           TIME         SAHA        ALARM \n";

#To change sort from ascending to descending sequence replace ASC with DESC
#in the following line.
foreach (sort {($a cmp $b) * ASC} keys %lines){
    my @flds = ($lines{$_}{'bsc'}, $lines{$_}{'bcf'},
                $lines{$_}{'date'}, $lines{$_}{'time'},
                $lines{$_}{'saha'}, $lines{$_}{'alarm'});
    my $line = sprintf '%-10s%-12s%-15s%-16s%-14s%-12s', @flds;
    
    #print $line, "\n";#Change this statement to the following
    print $line, "\n" if $lines{$_}{'alarm'} =~ 'BCCH MISSING';
}

sub get_saha{
    while (my $line = <$fh>){
        $line =~ s/\R//g; #Remove Windows end-of-line characters because I have Linux
        #Changed the following to include alarms preceded by one or more asterixes *
        if ($line =~ m/^\*+\s+ALARM\s+(\w+)/){
            return $1;
        }
        if …
d5e5 109 Master Poster

I wouldn't call either of them wrong. For multiple inserts in a loop, I prefer the prepare + execute with place holders rather than the do, for a couple of reasons. In theory it makes more sense to prepare the insert statement once outside your loop and then execute it inside the loop for each set of values. You say it makes no difference in speed for MySQL, so there is no need to change scripts that already run satisfactorily ("If it ain't broke, don't fix it"). However if one day you migrate to a different database system and want to run the same scripts maybe it will make a difference, since some databases may optimize execution of previously prepared sql better than MySQL does.

Another reason. Look what happens in the following example of using $dbh->do in a loop. It works for some names but not the last one because it contains an apostrophe, whereas preparing the sql beforehand and passing the values to place holders avoids this problem.

#!/usr/bin/perl
use strict; 
use warnings; 
use DBI;

my $dbh=DBI->connect('dbi:mysql:daniweb','david','dogfood') ||
   die "Error opening database: $DBI::errstr\n";

$dbh->do("CREATE TABLE IF NOT EXISTS people (fname VARCHAR(20),
                                lname VARCHAR(20))");

while (<DATA>){
    chomp;
    my ($fname, $lname) = split;
    my $sql = "insert into `people` (fname, lname) values ('$fname', '$lname')";
    print $sql, "\n";
    $dbh->do($sql) or die "do failed: " . $dbh->errstr(); 
}

# Close connection
undef($dbh);

__DATA__
John Doe
Jane Smith
Martin O'Hara

Outputs

DBD::mysql::db do failed: You have an error in your …
replic commented: Always answers with a well written response and adds some code to point out certain problems etc. +1
d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 

my $date_pattern = '\d\d\d\d-\d\d-\d\d';
my $time_pattern = '\d\d:\d\d:\d\d\.\d\d';
my $filename = 'LOG.txt';
my %lines;#Save report lines to sort later

open my $fh, '<', $filename or die "Failed to open $filename: $!";

while (my $line = <$fh>){
    $line =~ s/\R//g; #Remove Windows end-of-line characters because I have Linux
    if ($line =~ m/^\s+(\w+\d\d)\s+(\w+-\d\d\d\d)\s+\w+\s+($date_pattern)\s+($time_pattern)/){
        my ($bsc,$bcf,$date,$time) = ($1,$2,$3,$4);
        my $saha = get_saha();
        next if $saha eq '***SAHA NOT FOUND***';
        my $alarm = get_alarm();
        next if $alarm eq '***Alarm NOT FOUND***';
        #print "$bsc\t$bcf\t$date\t$time\t$saha\t$alarm\n";
        my $ctr = sprintf '%04d', $.;
        $lines{$date . $time . $ctr}{'bsc'} = $bsc;
        $lines{$date . $time . $ctr}{'bcf'} = $bcf;
        $lines{$date . $time . $ctr}{'date'} = $date;
        $lines{$date . $time . $ctr}{'time'} = $time;
        $lines{$date . $time . $ctr}{'saha'} = $saha;
        $lines{$date . $time . $ctr}{'alarm'} = $alarm;
    }
}

print " BSC          BCF          DATE           TIME         SAHA        ALARM \n";
foreach (sort {$b cmp $a} keys %lines){
    my @flds = ($lines{$_}{'bsc'}, $lines{$_}{'bcf'},
                $lines{$_}{'date'}, $lines{$_}{'time'},
                $lines{$_}{'saha'}, $lines{$_}{'alarm'});
    my $line = sprintf '%-10s%-12s%-15s%-16s%-14s%-12s', @flds;
    
    print $line, "\n";
}

sub get_saha{
    while (my $line = <$fh>){
        $line =~ s/\R//g; #Remove Windows end-of-line characters because I have Linux
        if ($line =~ m/^\*\*\*\s+ALARM\s+(\w+)/){
            return $1;
        }
        if ($line =~ m/^\s+(\w+\d\d)\s+(\w+-\d\d\d\d)\s+\w+\s+($date_pattern)\s+($time_pattern)/){
            return '***SAHA NOT FOUND***';
        }
    }
}

sub get_alarm{
    while (my $line = <$fh>){
        $line =~ s/\R//g; #Remove Windows end-of-line characters because I have Linux
        if ($line =~ m/^\s+\(\d+\)\s+\d+\s+([\w\s]+)$/){
            return $1;
        }
        if ($line =~ m/^\s+(\w+\d\d)\s+(\w+-\d\d\d\d)\s+\w+\s+($date_pattern)\s+($time_pattern)/){
            return '***Alarm NOT FOUND***';
        }
    }
}

Outputs:

BSC          BCF          DATE           TIME         SAHA        ALARM …
d5e5 109 Master Poster

I've tried that you did not run script. Can you check it again. as the following must be cut according to SEKTORLER.txt log.txt file.

My output txt
 BSC          BCF          DATE           TIME         SAHA        ALARM 
BCORK01      BCF-0021    2012-02-29     19:11:41.65   CO11801  BCH MISSING    
BCORK01      BCF-0021    2012-02-29     19:11:42.65   CO11802  BCH MISSSING  
BCORK01      BCF-0021    2012-02-29     19:11:43.65   CO11802  BCH MISSSING

Sorry, I don't understand what you mean by 'I've tried that you did not run script.' Also, you said previously 'the top of the written must be the newest date' but the sample output you show above has the most recent date + time at the bottom.

d5e5 109 Master Poster

Because I hope to use two value $gia and $name1 in my script. And I tried to use

my $position="$gia$name1";
    read_positions ($position);

but it has error to read file.

The erro: Failed open 6114067_contig34:No such file or directory at c:\users\PCUSER\desktop\thu.pl line 129

I do not know how to solve that error. Could you show me how to solve this error?

That message says that a file named '6114067_contig34' does not exist in your current directory. Either the file named '6114067_contig34' is in a different directory or it has a different name. Should it have an extension (such as dot something). For example if the file is really named '6114067_contig34.txt' then trying to open '6114067_contig34' will result in the error message you are getting.

Do you know what the current directory is when perl runs your scripts? What is the output when you run the following?

#!/usr/bin/perl
use strict; 
use warnings; 

use Cwd qw();
my $path = Cwd::cwd();
print "$path\n";

my @files = <6114067_contig34*>;

foreach my $file (@files) {
    print $file . "\n";
}

print "No files exist starting with '6114067_contig34' in $path\n" if @files == 0;
d5e5 109 Master Poster

Rather than change your long complex script, you can run a second script that takes the resulting file as input, then modifies the spacing and sorts the lines.

#!/usr/bin/perl
use strict; 
use warnings; 

my $filename = 'KESIK_SEKTORLER.txt';
open my $fh, '<', $filename or die "Failed to open $filename: $!";

my @file = <$fh>; #Read file into array so you can sort
map s/\R//g, @file; #Remove Windows end-of-line characters because I have Linux

@file = sort sortsub @file;

foreach(@file){
    s/\s+/ /g;#Replace many spaces with one space
    s/^\s//;#Remove space at beginning of line
    print $_, "\n";
}

sub sortsub{
    my $date_pattern = '\d\d\d\d-\d\d-\d\d';
    my $time_pattern = '\d\d:\d\d:\d\d\.\d\d';
    my ($a_dt) = $a =~ m/($date_pattern\s+$time_pattern)/;
    my ($b_dt) = $b =~ m/($date_pattern\s+$time_pattern)/;
    #print "adt is $a_dt\tbdt is $b_dt\n";
    $a_dt = '9' x 23 unless defined $a_dt;
    $b_dt = '9' x 23 unless defined $b_dt;
    $b_dt cmp $a_dt;
}
d5e5 109 Master Poster

You want to put the content of $gia and $name together to have the name of an existing file that you can pass to a subroutine which will open and read that file, right? I don't know why you would ask how to separate that file name into columns, but the following is an example of passing a variable containing one scalar value that should be a valid file name.

#!/usr/bin/perl
use strict; 
use warnings; 

my $gia = 12;
my $name = 'tai';

my $oneval = $gia.$name; #Name of file should be '12tai', right?

print_first_argument($oneval);#Call subroutine and pass one scalar value as argument

sub print_first_argument{
    my ($filename) = @_;#Assign argument to lexical variable
    print "The first argument is '$filename'\n";
    #If $filename contains the name of a file that exists 
    #then you should be able to open and read it.
}

If my ($filename) = @_;#Assign argument to lexical variable looks confusing you can substitute with my $filename = shift(@_);#Assign argument to lexical variable which does exactly the same thing.

d5e5 109 Master Poster
#!/usr/bin/perl
use strict; 
use warnings; 

my $gia = '12tai';

#Capture sequence of digits into one variable,
#capture sequence of non-digits into another variable
my ($number, $name) = $gia =~ m/(\d+)(\D+)/;

print "$number $name\n";#Two values separated by one space