d5e5 109 Master Poster

The following may not exactly correspond to what you mean by 'numeric' but it will serve as an example of what the Scalar::Util module can do.

#!/usr/bin/perl
use warnings;
use strict;

use Scalar::Util qw(looks_like_number);
while (my $teststring = <DATA>){
    chomp $teststring;#Remove newline character from end of string
    if (looks_like_number($teststring)){
        print $teststring, ' looks like a number to me', "\n";
    }
    else{
        print $teststring, ' does NOT look like a number to me', "\n";
    }
}

__DATA__
hello
5
2.3
-2.3
7t7
d5e5 109 Master Poster

Sorry, I'm stumped. After replacing the " with ' I don't see any difference in the paths. I don't use Windows so can't test unlinking in Windows.

d5e5 109 Master Poster
#!/usr/bin/perl
use warnings;
use strict;

my $path = <DATA>;
chomp($path);
print 'Original path: ', $path, "\n";

$path =~ s/"/'/g;#Replace all " with '
print 'Changed path: ', $path, "\n";

__DATA__
"C:\Users\ABC\XYZ\1.txt"
Aditya_MUHAHA commented: Works +0
d5e5 109 Master Poster

my $save_header; #define g̶l̶o̶b̶a̶l̶ lexical variable whose scope includes entire script

my @flds = split;#r̶e̶a̶d̶ ̶f̶i̶l̶e̶ ̶c̶o̶n̶t̶e̶n̶t̶s̶ ̶i̶n̶t̶o̶ ̶a̶n̶ ̶a̶r̶r̶a̶y̶ Equivalent to my(@flds) = split(' ', $_, 0);

print "\n" if $save_header and $_ ne $save_header; #print newline i̶n̶ ̶b̶o̶t̶h̶ ̶c̶a̶s̶e̶s̶ before printing a new header (blank line between groups)

$save_header = $_;#assign the value of the current header ($_) to a variable so we can compare in the future to know if we are about to start another header group (i.e. whether we want a blank line separating previous group from current group.

sub findit {#sub routine -opens file and returns m̶a̶t̶c̶h̶e̶d̶ ̶e̶l̶e̶m̶e̶n̶t̶s̶ true (non-zero value) if parameter found in file, false (0) if not found in file.

return(1) if index($_,$word) > -1;# c̶h̶e̶c̶k̶ ̶i̶f̶ ̶s̶e̶a̶r̶c̶h̶ ̶w̶o̶r̶d̶ ̶f̶r̶o̶m̶ ̶i̶n̶p̶u̶t̶ ̶f̶i̶l̶e̶2̶.̶t̶x̶t̶ ̶m̶a̶t̶c̶h̶e̶s̶ ̶a̶ ̶p̶a̶r̶t̶i̶c̶u̶l̶a̶r̶ ̶i̶n̶d̶e̶x̶ ̶e̶l̶e̶m̶e̶n̶t̶ ̶i̶n̶ ̶i̶n̶p̶u̶t̶ ̶f̶i̶l̶e̶1̶ If the word is NOT found in the current record the index will give a value of -1. If the word is found, the index will give a number indicating where in the current record the word was found. I can't explain it better than http://perldoc.perl.org/functions/index.html

d5e5 109 Master Poster

You could also try the following:

#!/usr/bin/perl
use strict; 
use warnings; 

open my $fh, '<', 'input1.csv' or die "Failed to open input1.csv $!";
my $save_header;
while (<$fh>){
    my @flds = split;
    if (@flds == 1){
        s/\s//g;
        print "\n" if $save_header and $_ ne $save_header;
        print "$_:\n";
        $save_header = $_;
    }
    else{
        print "$flds[0]\n" if findit($flds[0]);
    }
}

sub findit {
    my $word = shift;
    open my $fh, '<', 'input2.csv' or die "Failed to open input2.csv $!";   
    while (<$fh>) {
        return(1) if index($_,$word) > -1;
    }
}
d5e5 109 Master Poster

Look closely at lines 16 - 17.

                my $Header;
                $hash{$Header} = $_;

You declare a lexical variable called $Header and assign no value to it, so its value is undefined. Then you use $Header (whose value is undefined) as a key to $hash. In effect you are assigning all the header values to one key (key = undef) in your hash. Each value replaces the previous value because you always use the same undefined key.

d5e5 109 Master Poster

You may also want to look at the Bio::DB::Fasta module. I haven't tried it but according to the docs it has the following method to extract specific substrings from fasta files:

$raw_seq = $db->seq($id [,$start, $stop])
Return the raw sequence (a string) given an ID and optionally a start and stop position in the sequence.

d5e5 109 Master Poster

I would create a text file in comma separated values (or tab-separated if you prefer) format extension because I find it easier to write text files. Then you can start up Excel and import the movies_or_whatever.txt file into a sheet using Excel and save it as an Excel file.

d5e5 109 Master Poster

What data are you trying to read? Please post an example of your input data and some example code showing how you try to read it. I don't know much about XML and nothing about soap but maybe somebody here does and can help you if you provide relevant information.

d5e5 109 Master Poster

OpenOffice::OODoc looks like the module you need to install and learn how to use. I have no expertise with it so if you don't want to learn how to do it yourself I think you may need to hire a programmer with the relevant experience to write a script that does exactly what you want. Just to illustrate what the OpenOffice::OODoc module can do, here is a script that creates a document file with three pages of your sample data:

#!/usr/bin/perl
use strict; 
use warnings; 

use OpenOffice::OODoc;

my $oofile = odfDocument
                        (
                        file            => 'oowriter_test.odt',
                        create          => 'text'
                        );

my $styles = odfDocument
                (
                container => $oofile,
                part => 'styles'
                );
# process any content and style element

$styles->createStyle
                (
                "BlueStyle",
                parent => 'Text body',
                family => 'paragraph',
                properties =>
                        {
                        area            => 'text',
                        'fo:color'      => rgb2oo('blue')
                        }
                );

while (my $rec = <DATA>){

    my $pg = $oofile->appendParagraph
                (
                text => $rec,
                style => "BlueStyle"
                );
    $oofile->setPageBreak($pg,style => 'autopb', position=>'after');
}

$oofile->save;

__DATA__
HariKrishna  1200
Srikanth     1201
Madhav       2345
d5e5 109 Master Poster

Scripting makes it easier to do something that we repeatedly do, so what scripting language would serve you best depends on your answer to the question what do you do repeatedly that has several steps, that you would like to do in fewer steps, automatically? As Aristotle said, "We are what we repeatedly do."

I would add, "we want to write scrips that do what we repeatedly do."

d5e5 109 Master Poster

Any simple ways to get hyperlinks using perl without the HTML table?

Yes. What's wrong with the example shown in the docs?

use LWP::Simple;
$content = get("http://www.sn.no/");

I assume that you want to retrieve the html content from the site refered to in a given hyperlink, right? If the components required to build the link are in a text file you can read the text file and put the components together to make a valid URL, which you can pass to the get method and assign the returned text to a variable.

d5e5 109 Master Poster

The following script creates a table (if it doesn't already exist) and saves all the image names (extension included) and the directory names.

#!/usr/bin/perl
use strict; 
use warnings; 
use DBI;
use File::Glob ':glob';
use File::Basename;

my $dbh=DBI->connect('dbi:mysql:daniweb','david','dogfood') ||
   die "Error opening database: $DBI::errstr\n";

$dbh->do("CREATE TABLE IF NOT EXISTS images (id INT NOT NULL AUTO_INCREMENT,
                                i_name VARCHAR(64),
                                i_dir VARCHAR(100),
                                PRIMARY KEY (id))");

my $sth = $dbh->prepare("insert into `images` (i_name, i_dir) values (?, ?)")
             or die "prepare failed: " . $dbh->errstr();  


my @files = </home/david/Pictures/*>;
foreach (@files) {
    my($filename, $dir) = fileparse($_);
    $sth->execute($filename, $dir) or die "execute failed: " . $sth->errstr(); 
}

# Close connection
$sth->finish();
undef($dbh);
2teez commented: Nice one +3
d5e5 109 Master Poster

I should add that experienced programmers do not require their users to memorize the real names of all their subroutines to use as command-line arguments at run-time. Make the command-line arguments short, easy to remember and spell. Use a hash or an if-elsif-else statement to associate the command-line arguments with the name of the appropriate function or subroutine. If the argument value does not correspond to any of the expected values, your script should display a warning or error message.

d5e5 109 Master Poster

You could write your subroutines as values in a hash in your script and then run it from the commandline with the name of your desired function as the first argument on the command line. For example, a script named many_functions.pl

#!/usr/bin/perl
use warnings;
use strict;

my %funcs = (thisismyfunction1 => sub {print 'hello world number 1',"\n";},
             thisismyfunction2 => sub {print 'hello world number 2',"\n";},
            thisismyfunction3 => sub {print 'hello world number 3',"\n";},);

my $what_to_do = $ARGV[0];

$funcs{$what_to_do}();

Test it from the command line as follows:

david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction1
hello world number 1
david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction2
hello world number 2
david@david-laptop:~/Programming/Perl$ perl many_functions.pl thisismyfunction3
hello world number 3

You may prefer to name your subroutines, especially if they may contain lots of code, and store references to the subroutines as values in a hash. The following would give the same results as the above script:

#!/usr/bin/perl
use warnings;
use strict;

my %funcs = (thisismyfunction1 => \&world_1,
             thisismyfunction2 => \&world_2,
            thisismyfunction3 => \&world_3,);

my $what_to_do = $ARGV[0];

$funcs{$what_to_do}();

sub world_1 {print 'hello world number 1',"\n";}
sub world_2 {print 'hello world number 2',"\n";}
sub world_3 {print 'hello world number 3',"\n";}
d5e5 109 Master Poster

Immediately after the statement that attempts to find a value for $interface, you can put a statement that skips to the next record if no interface value was found in the current record... next unless defined $interface;

The following works for me:

#!/usr/bin/perl
use warnings;
use strict;

my $filename = 'cisco.txt';
open my $fh, '<', $filename or die "Failed to open $filename: $!";

while (my $rec = get_rec($fh,'!')){
    next if $rec eq '!'; #Skip top line
    my $interface;
    my $description;
    my $acl1;
    my $acl2;
    my $shutdown;
    ($interface) = $rec =~ m/(?:interface Serial|interface GigabitEthernet)(.+)/;
    next unless defined $interface;
    ($description) = $rec =~ m/(?:description )(.+)/;
    ($acl1,$acl2) = $rec =~ m/(?:ip access-group )(.+)?\n(?:ip access-group )(ACL.+)/;
    ($acl1) = $rec =~ m/(?:ip access-group )(.+)/ unless defined $acl1;
    $acl1 ||= 'NO ACL';
    $acl2 ||= 'NO ACL';
    if ($rec =~ m/shutdown/){
        $shutdown = 'YES';
    }
    else{
        $shutdown = 'NO';
    }
    print $interface,", ",$acl1,", ",$acl2,", ",$shutdown,", ",$description,", " . "\n";
}

sub get_rec{
    my ($fh,$record_separator) = @_;
    local $/ = $record_separator;
    <$fh>;
}
d5e5 109 Master Poster

If you run perl and your script from the command-line interface in a DOS window or a Terminal in linux and print to STDIN then maybe the DOS or Terminal can't print those characters. I think that depends on the platform. However if you print something encoded as utf-8 to a file and read it with an editor or viewer that can read utf-8 encoded files you should see the characters you expect.

Can you post a simple script that demonstrates the problem so we can reproduce the problematic output?

d5e5 109 Master Poster

In Perl I would change the value of the special variable representing the input record separator $/ to '!' to read entire records instead of one line at a time. Then you need only one loop within which you can use regex to capture values into variables.

#!/usr/bin/perl
use warnings;
use strict;

my $filename = 'cisco.txt';
open my $fh, '<', $filename or die "Failed to open $filename: $!";

while (my $rec = get_rec($fh,'!')){
    next if $rec eq '!'; #Skip top line
    my $interface;
    my $description;
    my $acl1;
    my $acl2;
    my $shutdown;
    ($interface) = $rec =~ m/(?:interface Serial|interface GigabitEthernet)(.+)/;
    ($description) = $rec =~ m/(?:description )(.+)/;
    ($acl1,$acl2) = $rec =~ m/(?:ip access-group )(.+)?\n(?:ip access-group )(ACL.+)/;
    ($acl1) = $rec =~ m/(?:ip access-group )(.+)/ unless defined $acl1;
    $acl1 ||= 'NO ACL';
    $acl2 ||= 'NO ACL';
    if ($rec =~ m/shutdown/){
        $shutdown = 'YES';
    }
    else{
        $shutdown = 'NO';
    }
    print $interface,", ",$acl1,", ",$acl2,", ",$shutdown,", ",$description,", " . "\n";
}

sub get_rec{
    my ($fh,$record_separator) = @_;
    local $/ = $record_separator;
    <$fh>;
}
d5e5 109 Master Poster

Do you have a solution now to the Perl aspect of your question? If so please mark this topic solved.

d5e5 109 Master Poster

Hi 2teez,

doesn't have to associate to any persons name

I don't know the reason for Serpterion's requirement to append application ids onto records for persons that probably have no relation to those ids. I would consider the column of ids as logically separate from the name and city records. Perhaps putting them there saves the trouble of putting them in a separate spreadsheet. I wouldn't recommend arranging the data that way if there's any chance of its becoming a long-term application that you would eventually have to explain to someone else. A data base analyst, for example, wouldn't like it.

d5e5 109 Master Poster

The following modified version of my script outputs the ids so they start at the top of your id column.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
my @lines;
my @ids;
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @ids,$id;
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
        push @columns,$name,$loc;
        push @lines, [@columns];
    }
}
while (@lines || @ids){
    my ($name, $loc,$line, $id);
    if (@lines){
        $line = shift @lines;
        ($name,$loc) = @$line;
    }
    if (@ids){
        $id = shift @ids;
    }
    my @columns;
    push @columns, $name,$loc,$id;
    my $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
}

Outputs:
"Bob Gray","Burmingham,AL",10012
"Ron Alberta","Omaha,NE",10013
,,10013

d5e5 109 Master Poster

Multi-line records with no record separator make it tricky. As for writing CSV output it's worth installing the Text::CSV module if you don't already have it.

#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV;

my $csv = Text::CSV->new ( { binary => 1 } )  # should set binary attribute.
                 or die "Cannot use CSV: ".Text::CSV->error_diag ();

#slurp file into one string because it contains multi-line records
my $string = do {
    local $/;
    my $filename = 'people.txt';
    open my $fh, '<', $filename or die "Failed to open $filename: $!";
    <$fh>
};
$string =~ s/\n/!sep!/g;#Replace newlines with !sep! marker
#Break the string into records beginning with "\n"
$string =~ s/(Name|ApplicationID)/\n$1/g;

#Process each 'record'
foreach (split "\n", $string){
    next if m/^$/;#Remove empty 'record'
    my ($id) = m/^ApplicationID:.*?(\d+)(?:!sep!)?$/g;
    my $status;
    my @columns;
    if ($id){
        push @columns,undef,undef,$id;#Make two empty columns followed by id
        $status = $csv->combine(@columns);    # combine columns into a string
        print $csv->string(), "\n";          # print the combined string
        next;
    }
    my ($name,$loc) = m/^Name[:|\s](.+?)(?:!sep!).*?Location:(.+?)(?:!sep!)?$/g;
    if ($loc){
    push @columns,$name,$loc;
    $status = $csv->combine(@columns);    # combine columns into a string
    print $csv->string(), "\n";          # print the combined string
    }

}

For me, running the above script and reading your sample input prints the following:

"Bob Gray","Burmingham,AL"
"Ron Alberta","Omaha,NE"
,,10012
,,10013
,,10013

d5e5 109 Master Poster

Was "$/" in lines the scripts (32 and 33 - d5e5) and lines (28 and 30 - 2teez) used as a dereference and why?

No. I copied the lines where "$/" occurs from 2teez' scripts. By default the Perl special variable has a value of "\n" and, since our scripts don't modify it, we could have put "\n" instead and achieved the same result.

$/
The input record separator, newline by default. This influences Perl's idea of what a "line" is.
from perldoc perlvar

2teez commented: Nice one!!! +2
d5e5 109 Master Poster

open my $fh, '&lt;', $file1 or die "can't open $file1:$!";

Why do you have statements like the above where I see '&lt;' instead of '<'? Besides fixing the above, mostly all you need to do to get what you want is to switch the filenames, as follows:

#!/usr/bin/perl
use warnings;
use strict;
my $time = scalar localtime();    # get date of search

#Heading should say 'File 2', not 'File 1'
my $heading =
  qq(Date of search: $time\nThe Following Matches were Found in File 2:\n);
my %hash;

#Save the file you want to search ($file2, not $file1, into %hash)
my $file2 = 'File2.txt';    # file 2: file with two columns to be searched with contents of file1.
open my $fh, '<', $file2 or die "can't open $file2:$!";
while (<$fh>) {
    chomp;
    my ($probe, $value) = split /\s+/, $_;
    $hash{$probe} = $value;
}
close $fh;

my $file1 =
  'File1.txt'; # file1: one column of probe names for searching the second file.
open my $fh_new, '>', 'outFile.txt' or die "can't open file:$!";   # output file
open $fh, '<', $file1 or die "can't open this file:$!"; # ($file1, not $file2)
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line ) {
        $checked_word = $line;
        exists $hash{$checked_word}
          ? print $fh_new $checked_word, ' ', $hash{$checked_word}, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";
d5e5 109 Master Poster

undef $hash{$_};
assigns undef as a value to the each key of the hash variable. That can as well be written like so:

$hash{$_} = undef;

Yes. We need the hash to store keys but we don't need to associate any values with these keys. The undef means something like null or nothing. If we put $hash{$_} = 1; or $hash{$_} = qq(n'importe quoi); it would have worked just as well, except I think we might use more memory to save keys with values than to save keys without values. But the main reason I used undef was it indicates to anyone reading it that we don't need to associate any particular value with the key.

d5e5 109 Master Poster

perly, Please make sure that the HOF at the end of that heading string starts at the beginning of a line in your script and is the only word on that line.

 my $heading = <<"HOF";
     Date of search: $time
     The Following Matches were Found in File 1:
HOF

Make sure there are no spaces or tabs before the final HOF and there should be no other characters on that line.

Or, if you prefer, remove the above lines and replace with the following:
my $heading = qq(Date of search: $time\nThe Following Matches were Found in File 1:\n);

d5e5 109 Master Poster

If one of 2teez's scripts works for you with the large files, good. Otherwise you could try the following modified version. When working with large files you need to consider memory limitations vs. speed. I think 2teez's scripts make good use of memory but take a bit longer than using a hash to store the contents of the first file. On the other hand, using a hash could exceed the limits of your computer memory if the first file is too large, in which case the program will stop and you'll get an error message.

#!/usr/bin/perl
use warnings;
use strict;

my $time = scalar localtime();    # get date of search
my $heading = <<"HOF";
 Date of search: $time
 The Following Matches were Found in File 1:
HOF
my %hash;
my $file1 = 'File1.txt';          # file1
open my $fh, '<', $file1 or die "can't open $file1:$!";
while (<$fh>) {
    chomp;
    undef $hash{$_};
}
close $fh;

my $file2 = 'File2.txt';          # file 2
open my $fh_new, '>', 'output.txt' or die "can't open file:$!";    # output file
open $fh, '<', $file2 or die "can't open this file:$!";
print $fh_new $heading;
while ( defined( my $line = <$fh> ) ) {
    chomp $line;
    my $checked_word;
    if ( $line =~ m/(.+?)\s+?.+?$/ ) {
        $checked_word = $1;
        exists $hash{$checked_word}
          ? print $fh_new $line, $/
          : print $fh_new $checked_word, " No Match Found", $/;
    }
}
close $fh     or die "can't close file:$!";
close $fh_new or die "can't close file:$!";
d5e5 109 Master Poster

I used to have @david_ellis on Twitter and I thought if I made it shorter people would retweet me more often. (This was before Twitter came up with their new retweet action that doesn't include the posters handle in the character count.) I shortened it to @d5e5 because my first name starts with 'd' and has 5 letters.

d5e5 109 Master Poster

I face some real problems here. I need a Perl script that can open a text file, read a series of URLs, get the content of the pages and save it to another file.

Divide your task into steps, such as:

  1. open a text file
  2. read a series of URLs
  3. get the content of the pages for each URL
  4. save contents of pages to another file

Which of these steps gives you problems?

Could you attach an example of the text file you need to read? (Click on 'Files' in the Post menu here to upload a text file.)

d5e5 109 Master Poster

suppose my input string is "a*b+c/\/" and I want a Perl regex expression to match it.

#!/usr/bin/perl
use warnings;
use strict;

my $x = 'String of 8 words including numbers and letters';
my $y = 'String including characters like a*b+c/\/ plus text';
my $z = 'Some numbers 7, 8 9 etc.';

my $substring = 'a*b+c/\/';

foreach my $string ($x, $y, $z){
    if ($string =~ m/\Q$substring\E/){
        print qq('$&' ), "found in '$string'\n";
    }
    else{
        print q($substring ), "NOT found in '$string'\n";
    }
}

See quotemeta

d5e5 109 Master Poster

First, I agree with 2teez that what when we write scripts we present them as examples of ways to do something you ask us how to do. I can't give you perfect solutions that will work without your having to make your own changes to adapt them to your requirements.

Second, my subroutine to sort the dates and times can be improved. If you replace the by_date_time subroutine with the following it should sort the report lines in descending order which I believe is what you want. (I included the statement for sorting in ascending order in a comment.)

sub by_date_time{
    my ($dta, $dtb);
    my @flds_a = split /\s+/, $a;
    if (@flds_a > 4){
        $dta = join ' ', @flds_a[4,5];
    }
    else{
        $dta = '';
    }

    my @flds_b = split /\s+/, $b;
    if (@flds_b > 4){
        $dtb = join ' ', @flds_b[4,5];
    }
    else{
        $dtb = '';
    }

    #$dta cmp $dtb;#Sort in Ascending Order
    $dtb cmp $dta;#Sort in Descending Order
}

Total alarms=156

And third, I see more than 1000 lines in the OUTPUT.txt (see attached file) so I don't understand why the total should be 156.

d5e5 109 Master Poster

Since each feature, or F2, can have more than one lines you can still build a hash with features as keys and for the value make a reference to an array. The following example doesn't print exactly the layout you want because that will take more work but you can try running it.

#!/usr/bin/perl
use warnings;
use strict;

my %HoH;

@ARGV = qw(FILE1.txt FILE2.txt);

#Hash of hash references
while ( my $line = <> ) {
    $line =~ s/^\s+//;

    #Remove space characters and tabs (if any) from start of line
    chomp $line;
    next
      unless $line =~ m/^LINE|^FILE /;  #Skip if doesn't start with FILE or LINE
    my $oldnew;
    if ( $ARGV eq 'FILE1.txt' ) {
        $oldnew = 'old';
    }
    else {
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key}{$oldnew} = [] if not exists $HoH{$key}{$oldnew};
    push $HoH{$key}{$oldnew}, [@fields];
}

my %unique_lines;
foreach my $k ( sort keys %HoH ) {
    my $ctr = 0;
    foreach my $aref (@{$HoH{$k}{'old'}}){
        my ($o_ver, $o_dt, $o_num) = @$aref[3..5];
        my ($n_ver, $n_dt, $n_num) = @{$HoH{$k}{'new'}[$ctr]}[3..5];
        my @plist = ($k,$o_ver,$n_ver,$o_num,$n_num,$o_dt,$n_dt);
        foreach (@plist){
            $_ = 'None' unless $_;
        }
        undef $unique_lines{join "\t", @plist};
        $ctr++;
    }
    $ctr = 0;
    foreach my $aref (@{$HoH{$k}{'new'}}){
        my ($o_ver, $o_dt, $o_num) = @{$HoH{$k}{'old'}[$ctr]}[3..5];
        my ($n_ver, $n_dt, $n_num) = @$aref[3..5];
        my @plist = ($k,$o_ver,$n_ver,$o_num,$n_num,$o_dt,$n_dt);
        foreach (@plist){
            $_ = 'None' unless $_;
        }
        undef $unique_lines{join "\t", @plist};
        $ctr++;
    }
}

foreach (sort keys %unique_lines){
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\n", split;
}
d5e5 109 Master Poster

Sorry, I don't know the answer. I wrote the script assuming F2 contained a unique key for each record. If F2 can have the same value for more than one record in the same file then it won't serve as a unique key for a hash. To write a script to do what you want, you would need to choose some other data structure that corresponds to assumptions you make about your data.

d5e5 109 Master Poster

The following works if you have the Tie::File module. If you don't have it you can find it on CPAN.

#!/usr/bin/perl
use warnings;
use strict;
my %has_site;
my ( $site_file, $input_file ) = qw(site.txt INPUT.txt);

open my $fh, '<', $site_file or die "can't open this file:$!";
while (<$fh>) {
    s{^\s+}{};
    s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
    my ( $bsc, $bcf, $site ) = split /\s+/, $_;
    $has_site{$bsc . '  ' . $bcf} = $site;#I prefer looking up with bsc and bcf as key
}
close $fh or die "can't close file:$!";

open my $fh2, '>', 'OUTPUT.txt'
  or die "can't open this file:$!";    # output file

open $fh, '<', $input_file or die "can't open this file:$!";
my $hdrs = <$fh>;
$hdrs =~ s{^\s+}{};
$hdrs =~ s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
$hdrs = 'SITE         ' . $hdrs;
$hdrs =~ s{\s+$}{};#Remove CR/LF
print $fh2 $hdrs, "\n";

while (<$fh>) {
    s{^\s+}{};
    s{\s+$}{};#I have linux so chomp doesn't remove CR/LF for me
    s{\(\d\d\d\d\d\) \d\d\d\d}{            };#Remove unwanted (25441) 7622 etc.
    my ( $bsc, $bcf ) = split /\s+/, $_;
    my $key = $bsc . '  ' . $bcf;
    my $site;
    if (exists $has_site{$key}){
        $site = $has_site{$key};
    }
    else{
        $site = '______'
    }
    print $fh2 $site, " " x length($site), $_, "\n";

}
close $fh  or die "can't close file:$!";
close $fh2 or die "can't close file:$!";

#Now you want to sort OUTPUT.txt
#This should really be a separate script
undef %has_site; #Reclaim …
d5e5 109 Master Poster

Another question for abulut. I see lines in INPUT.txt that include BADYK01 BCF-1621 and BADYK01 BCF-3000 but I don't find any BCF-1621 or BCF-3000 in the site.txt file. Do you want these lines output with no site?

d5e5 109 Master Poster

Thanks for helping .. could you pls order them time by time. Also they should be from the new time to last time

abulut, Do you mean you want them ordered by a combination of date and time? Should 2012-03-16 14:29:54.61 print before (above) 2012-04-23 07:16:21.83 for example? Normally we order by date and time, not time alone.

d5e5 109 Master Poster

Skip the lines that don't start with FILE or LINE by putting next unless $line =~ m/^FILE|LINE/;#Skip if doesn't start with FILE or LINE near the start of the loop that reads the files. The data you posted above appears to have a lot of spaces or tabs at the start of each line. It's better to post data as file attachments so we can see what's really in the files. The following script seems to work OK for me.

#!/usr/bin/perl
use strict;
use warnings;
@ARGV = qw(FILE1.txt FILE2.txt);
my %HoH; #Hash of hash references
while (my $line = <>){
    $line =~ s/^\s+//;#Remove space characters and tabs (if any) from start of line
    chomp $line;
    next unless $line =~ m/^FILE|LINE/;#Skip if doesn't start with FILE or LINE

    my $oldnew;
    if ($ARGV eq 'FILE1.txt'){
        $oldnew = 'old';
    }
    else{
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key} = {} if not exists $HoH{$fields[1]};
    $HoH{$key}{$oldnew}{'F3'} = $fields[2];
    $HoH{$key}{$oldnew}{'F4'} = $fields[3];
    $HoH{$key}{$oldnew}{'F5'} = $fields[4];
    $HoH{$key}{$oldnew}{'F6'} = $fields[5];
}

foreach my $k (sort keys %HoH){
    my $o_f3 = exists $HoH{$k}{'old'}{'F3'} ? $HoH{$k}{'old'}{'F3'}: 'NONE';
    my $n_f3 = exists $HoH{$k}{'new'}{'F3'} ? $HoH{$k}{'new'}{'F3'}: 'NONE';
    my $o_f4 = exists $HoH{$k}{'old'}{'F4'} ? $HoH{$k}{'old'}{'F4'}: 'NONE';
    my $n_f4 = exists $HoH{$k}{'new'}{'F4'} ? $HoH{$k}{'new'}{'F4'}: 'NONE';
    my $o_f5 = exists $HoH{$k}{'old'}{'F5'} ? $HoH{$k}{'old'}{'F5'}: 'NONE';
    my $n_f5 = exists $HoH{$k}{'new'}{'F5'} ? $HoH{$k}{'new'}{'F5'}: 'NONE';
    my $o_f6 = exists $HoH{$k}{'old'}{'F6'} ? $HoH{$k}{'old'}{'F6'}: 'NONE';
    my $n_f6 = exists $HoH{$k}{'new'}{'F6'} ? $HoH{$k}{'new'}{'F6'}: 'NONE';
    printf "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n", $k,$o_f3,$n_f3,$o_f4,$n_f4,$o_f5,$n_f5,$o_f6,$n_f6;
}
d5e5 109 Master Poster

If both files will always be in the same order you can always compare line 2 in one file with line 2 in the other, line 3 with line 3, etc.

Considering that the line begin with "FILE" or "LINE" string always

I don't see any lines begining with a "LINE" string. You know your data better than I do. You need to make some assumptions about the layout of the data that determine what columns to compare with each other.

d5e5 109 Master Poster

First, always use strict; for anything more than a one-liner.

Second, indexing of arrays starts at 0, so you will refer to the eighth element in the result of your split as $whatever[7]. (Note that $whatever[9] is undefined.)

Also note that the package global variables $a and $b have a special meaning in Perl so avoid declaring lexical variables $a and $b having the same name as those special variables because that can have side effects.

Since the decision whether or not to print a line depends only on the contents of that semicolon-delimited string, you don't need to assign all the columns to individual variables. Just print the whole line after testing the one element that determines whether you want to print it. The following works for me. See the attachements file1.txt and result1.txt

#!/usr/bin/perl
use strict;
use warnings;

#inputfile
my $input_file = 'file1.txt';
my $output_file = 'result1.txt';

open my $fh_in, '<', $input_file or die "Failed to open $input_file: $!";
open my $fh_out, '>', $output_file or die "Failed to open $output_file: $!";

while (<$fh_in>){
    chomp;
    if (ok($_) == 1){
        print $fh_out $_, "\n";
    }
}

sub ok{
    my $rec = shift;
    return 1 if $rec =~ m/^#/; #Comments ok to print
    my @cols = split /\s+/, $rec;
    my $test_this = $cols[7];
    return 0 if $test_this =~ m/sno=;/; #Don't print
    return 0 if $test_this =~ m/toss=head->head/ or $test_this =~ m/toss=tail->tail/;
    return 1; #Must have passed tests
}
d5e5 109 Master Poster

Thanks for helping test this. I couldn't understand why my script worked for me and not for perllearner007.

Just as you did, I selected and copied the data provided by perllearner007 as posted on top of this post and pasted it into Komodo edit to make the above attached file, without changing anything. Komodo edit doesn't automatically strip leading spaces and my browser (Google Chrome) shouldn't make a difference, but who knows?

I agree that perllearner007's task is essentially solved. If we added some regex to strip out leading spaces before parsing each data line we could easily make a script that would work for data having or not having leading spaces. See perlfaq4 for one way to strip leading and trailing spaces.

d5e5 109 Master Poster

Start by reading both files into a suitable data structure (see the data structure cookbook). Then you can read and test values from your data structure to print the details of your report.

#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(FILE1.txt FILE2.txt);

my %HoH; #Hash of hash references
while (my $line = <>){
    chomp $line;
    my $oldnew;
    if ($ARGV eq 'FILE1.txt'){
        $oldnew = 'old';
    }
    else{
        $oldnew = 'new';
    }
    my @fields = split /\s+/, $line;
    my $key = $fields[1];
    $HoH{$key} = {} if not exists $HoH{$fields[1]};
    $HoH{$key}{'version'} = $fields[3];
    $HoH{$key}{$oldnew}{'date'} = $fields[4];
    $HoH{$key}{$oldnew}{'num'} = $fields[5];
}
#use Data::Dumper;
#print Dumper(\%HoH);

foreach my $k (sort keys %HoH){
    my $ver = $HoH{$k}{'version'};
    my $o_num = exists $HoH{$k}{'old'}{'num'} ? $HoH{$k}{'old'}{'num'}: 'NONE';
    my $n_num = exists $HoH{$k}{'new'}{'num'} ? $HoH{$k}{'new'}{'num'}: 'NONE';
    my $o_dt = exists $HoH{$k}{'old'}{'date'} ? $HoH{$k}{'old'}{'date'}: 'NONE';
    my $n_dt = exists $HoH{$k}{'new'}{'date'} ? $HoH{$k}{'new'}{'date'}: 'NONE';
    printf "%s\t%s\t%s\t%s\t%s\t%s\n", $k,$ver,$o_num,$n_num,$o_dt,$n_dt;
}
d5e5 109 Master Poster

To answer the first part of your question about removing spaces from the number and reading from four input files:

#!/usr/bin/perl
use strict;
use warnings;

@ARGV = qw(PT1.txt PT2.txt YK1.txt YK2.txt);

while (my $line = <>){
    my @fields = split /\s+/, $line;
    print @fields[6..8], "\t", $fields[9], ' | ', $ARGV, "\n";
}
d5e5 109 Master Poster

I wonder if the data gets modified in the process of posting to and copying from Daniweb?

When I select the data provided at the top of this article, copy and paste it into my text editor and save I get the attached (please see the File Attachment.) The families.txt file that I've been working with has no spaces to the left of the data. I didn't remove them so if perllearner007's original records start with spaces, then transferring to and from the Daniweb post somehow stripped them. If you will post the families.txt file you are testing with as a File Attachment then I can test my script with it.

I think if we can verify that we're all testing with the same data we can settle on a script that works for all, whether or not the script makes use of regex or the split function.

d5e5 109 Master Poster

2teez, when I made your change the script didn't work for me anymore. I just prints blank lines and prompts for Name again. I don't understand why it works for you. Maybe the default behavior of split has changed, or maybe the Tie::File module that comes with my version of Perl works differently. My version of Perl is 5.14 subversion 2 for i686 linux.

When I run the following simplified script:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

while (<DATA>){
    my @fields = split;
    print Dumper(\@fields);
}

__DATA__
002  Kate      Rhodes,Mich           Positive

My output is:

$VAR1 = [
          '002',
          'Kate',
          'Rhodes,Mich',
          'Positive'
        ];
d5e5 109 Master Poster
#!/usr/bin/perl
use strict;
use warnings;

my @array = <DATA>;

my ($s_min, $e_max) = (99999, 0);
foreach (reverse @array){
    chomp;
    next if m/^\s/;
    my ($code, $source, $start, $end) = (split)[0,2,3,4];
    if ($source eq 'match_part'){
        $s_min = $s_min < $start ? $s_min : $start;
        $e_max = $e_max > $end ? $e_max : $end;
    }
    else{
        $start = $s_min;
        $end = $e_max;
        ($s_min, $e_max) = (99999, 0);
    }
    $_ = join "\t", $code, $source, $start, $end;
}

foreach (@array){
    print "$_\n";
}
__DATA__
  Code  Info              source          start     end    
GB01672 rpsblast_to_CDD     protein_match       
GB01672 rpsblast_to_CDD     match_part  296     988
GB01673 rpsblast_to_CDD     protein_match       
GB01673 rpsblast_to_CDD     match_part  3803    4147
GB01673 rpsblast_to_CDD     match_part  1314    1907
GB01673 rpsblast_to_CDD     match_part  3516    3932
GB01673 rpsblast_to_CDD     match_part  3335    3463
GB01674 rpsblast_to_CDD     protein_match       
GB01674 rpsblast_to_CDD     match_part  3724    406
GB01674 rpsblast_to_CDD     match_part  1314    1907
GB01674 rpsblast_to_CDD     match_part  3335    385
d5e5 109 Master Poster

When I run the above script in my terminal, searching for names that exist in the file (Kate, etc.) and one that doesn't (Ogden) I get the following output:

david@david-laptop:~/Programming/data$ perl ../Perl/temp01.pl

Name: Kate

Kate Rhodes
Shirley Rhodes

Name: Shirley

Kate Rhodes
Shirley Rhodes

Name: Ogden

Name: Dexter

Aaron Sunders
Dexter Sunders

Name:
david@david-laptop:~/Programming/data$

I don't know why it doesn't work for you. Are you sure your input file exists in the specified path and contains the expected records? One way to debug is to add print statements to the script at strategic points to print the contents of some of the variables. How do I debug my Perl programs Also, the fact that the script doesn't tell you when a name is not found is a bug, not a feature. I forgot to check for that case and give a message when the name doesn't match any of the records.

Try running the following, modified version of the script:

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
die "Unable to find $filename" unless -e $filename;
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#added for debugging
my $count = @families;
print "$filename contains the following $count lines:\n", join "\n", @families, "\n";
#-----

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: "; …
d5e5 109 Master Poster

The following works for me. Make sure you change the path to your own directory before running it.

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;
use Fcntl 'O_RDONLY';

#Change the following path to input directory and file if necessary
my $filename = '/home/david/Programming/data/families.txt';

# open an existing file in read-only mode
tie my @families, 'Tie::File', $filename, mode => O_RDONLY or die "$!";

#Infinite loop, exit by entering blank input
while (1){
    print "\nName: ";
    chomp( my $name = <> );
    last unless $name =~ /\S/;
    print "\n";
    my %fam_names;
    foreach (@families){
        my @fields = split;
        my ($firstname, $family_tag) = @fields[1, 2];
        next unless $firstname eq $name;
        my ($lastname) = split /,/, $family_tag;
        undef $fam_names{$lastname};
    }

    foreach my $k (keys %fam_names){
        foreach my $rec (@families){
            my @fields = split /\s+/, $rec;
            my ($firstname, $family_tag) = @fields[1, 2];
            my ($lastname) = split /,/, $family_tag;
            print "$firstname $lastname\n" if $lastname eq $k;
        }
    }
}
d5e5 109 Master Poster

Line 40 probably doesn't do whatever you intend it to do. if ( my $Tags = $names{$name} ) { declares a new variable and assigns the value of $names{$name} to it, and tests this assignment with an if. That won't give an error message but it doesn't make much sense.

d5e5 109 Master Poster

2teez when I run your script and enter Kate when prompted I get no output. No error but no output either. I made a few changes in the loop that builds the hash and now it works OK for me.

perllearner007 I couldn't recreate your error but you might as well try the following modified version which works OK for me.

#!/usr/bin/perl
use warnings;
use strict;
use Carp qw(croak);
croak
"Usage: script.pl db_name_file.txt, you must specify the file that contained names to search from"
  unless defined $ARGV[0];    ## error msg if input file is not specified
print "Enter name to search: ";
my %has_val;
chomp( my $name = <STDIN> );    ## get name to search from user
my $input_file = $ARGV[0];
open my $fh, '<', $input_file
  or croak "can't open this file:$!";    ## open the file to read from

while (<$fh>) {
    next if !/^\d+?/;
    #Had to make changes here to get it to work for me d5e5
    chomp;
    my ($name_id, $fam_tag) = (split /\s+/)[1,2];
    my ($fam_name) = split /,/, $fam_tag;
    $has_val{$name_id} = $fam_name;
    #End of d5e5's changes
}
close $fh or croak "can't close this file:$!";    ## close file
$name = ucfirst( lc($name) );    ## this will yield first letter capital always

foreach my $fnd ( keys %has_val ) {
    if ( $fnd eq $name ) {
        my $surname = $has_val{$fnd};
        while ( my ( $key, $value ) = each %has_val ) {
            print $key, '  ', $value, $/ if $surname eq $value;
        }
    }
}
d5e5 109 Master Poster

You can also do this by tieing your file to an array using Tie::File so any change you make to the array will get written to the file.

#!/usr/bin/perl
use warnings;
use strict;

use Tie::File;

my $filename = 'lookup.txt';
my @lookup;

die "$filename not found" unless -e $filename;
tie @lookup, 'Tie::File', $filename or die "Failed to tie $filename: $!";

edit_file('InstanceName1','DomainName2','Server5'); #Edit existing record

edit_file('InstanceName8','DomainName2','Server999'); #Add new record

untie @lookup;            # all finished

sub edit_file{
    my ($iname, $dname, $sname) = @_;
    my $found = 0;

    foreach my $rec (@lookup){
        chomp $rec;
        my @keyarray = split(/::/, $rec);
        if ($keyarray[0] . $keyarray[1] eq $iname . $dname){
            $found++;
            $rec = join('::', $iname, $dname, $sname);
        }
    }
    if ($found == 0){
        push @lookup, join('::', $iname, $dname, $sname);
    }
}
asadarun commented: Thanks +0