Compare two files & output in third file in Perl scripting

Question

sandeepau 0 Newbie Poster

13 Years Ago

Hello, can you please help me with the following scenario in Perl scripting? I want to compare two text files and save the output of this comparison in a third file, with a flag (I-Insert, D-Delete, U-Update) at the end of each line.

File1.txt -
1|abc
2|efg
3|xyz

File2.txt
1|abc
2|efh
4|pqr

Expected output is - File3.txt
2|efh|U
3|xyz|D
4|pqr|I

d5e5 109 Master Poster

13 Years Ago

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

while (<$fh1>){
    last if eof($fh2);
    my $comp_line = <$fh2>;
    chomp($_, $comp_line);
    my @rec1 = split /\|/;
    my @rec2 = split /\|/, $comp_line;
    
    #I-Insert, D-Delete, U-Update
    print $fh3 "$rec2[0]|$rec2[1]|U\n" if $rec2[0] eq $rec1[0] and $rec2[1] ne $rec1[1];
    print $fh3 "$rec1[0]|$rec1[1]|D\n" if $rec2[0] ne $rec1[0] and $rec2[1] ne $rec1[1];
    print $fh3 "$rec2[0]|$rec2[1]|I\n" if $rec2[0] ne $rec1[0];
}

d5e5 109 Master Poster

13 Years Ago

For this data I would suggest reading and saving the first file into a hash of hashes to save each record with a flag with value of 'D'. Then read through the second file to compare its records with the saved records and change the flag's value as needed.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif ($_ ne $save{$key}->{'data'}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

d5e5 109 Master Poster

13 Years Ago

Instead of the above solution, you could simply write a function to return the portion of the record that you want to compare.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my @rec = split /\|/;
    my $key = $rec[2];
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif (string_to_compare($_) ne string_to_compare($save{$key}->{'data'})){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

sub string_to_compare{
    my $line = shift;
    my ($skip1, $skip2, $key, $remainder) = split /\|/, $line, 4;
    return $remainder;
}
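
The `split` limit of 4 is what lets this work when the number of fields varies: everything after the third `|` ends up in one string. The following is only a minimal sketch of that behaviour. The `$new` record is a made-up variation of a sample record (only its first two columns differ), and returning an empty string for short lines is an assumption, not something the code above does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return everything after the third '|'-separated field.
# The limit of 4 stops split after three separators, so the fourth
# element holds the rest of the line, however many fields it has.
sub string_to_compare {
    my $line = shift;
    my (undef, undef, undef, $remainder) = split /\|/, $line, 4;
    # Assumption: a line with fewer than four fields compares as empty.
    return defined $remainder ? $remainder : '';
}

my $old = '1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1000.00';
# Hypothetical record: same key and data, different first two columns.
my $new = '9999999|20110706|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1000.00';

print string_to_compare($old), "\n";
print string_to_compare($old) eq string_to_compare($new) ? "same\n" : "changed\n";
```

With these two records the comparison reports "same", because the differences are confined to the skipped columns.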

Salem commented: For effort, even though the OP is getting a free ride +17

k_manimuthu commented: Nice coding sequence & derivation +5

sandeepau 0 Newbie Poster · Answer 1 · 2011-07-11T19:56:46+00:00

Thank you very much, David, for the solution. It works perfectly for the above scenario. However, the number of columns/fields in input file1 and file2 is sometimes not fixed. It could be 10, more, or fewer, so array indexing will not work in that case. Can you please suggest an approach for the following scenario?

In this example, the third field of each input file is the unique key column, and I want the differences based on that key column.
e.g.
File1 ->
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048888|7704|48888|PE|08/12/2008 11:38:54|0|1000.00
File2.txt ->
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048883|7704|48883|PE|10/01/2009 14:33:18|1|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1001.00

Expected output ->

1780437|20110705|00000077040000000000000048883|7704|48883|PE|10/01/2009 14:33:18|1|1000.00|I
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1001.00|U
1780437|20110705|00000077040000000000000048888|7704|48888|PE|08/12/2008 11:38:54|0|1000.00|D

sandeepau 0 Newbie Poster · Answer 2 · 2011-07-12T19:35:33+00:00

Thanks David! This is working for me.

Also, I have realised one thing about the 'U' flag: to detect an update, I don't need to consider the first and second columns, since they are just a timestamp and an ID. So, for 'U', the comparison should start from the third column (the unique column).

So I am thinking of removing the first two columns from both input files as an initial step, before starting the comparison. Could you please suggest any other approach here?

d5e5 109 Master Poster · Answer 3 · 2011-07-12T21:34:40+00:00

In that case I would save the portion following the unique column in a separate element of the %save data called 'compare_this' and use it for comparing with the corresponding remainder of each record from file2.

#!/usr/bin/perl
use strict;
use warnings;

my ($file1, $file2, $file3) = qw(1.txt 2.txt 3.txt);
open my $fh1, '<', $file1 or die "Can't open $file1: $!";
open my $fh2, '<', $file2 or die "Can't open $file2: $!";
open my $fh3, '>', $file3 or die "Can't open $file3: $!";

my %save; #Hash of hashes to store records from file1 for comparison with file2
while (<$fh1>){
    chomp;
    my ($skip1, $skip2, $key, $remainder) = split(/\|/, $_, 4);
    $save{$key}->{'data'} = $_; #Save current record in hash
    $save{$key}->{'compare_this'} = $remainder; #Save last part of current record in hash
    $save{$key}->{'flag'} = 'D';
}

while (<$fh2>){
    chomp;
    my ($skip1, $skip2, $key, $remainder) = split(/\|/, $_, 4);
    
    if (not exists $save{$key}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'I';
    }elsif ($remainder ne $save{$key}->{'compare_this'}){
        $save{$key}->{'data'} = $_;
        $save{$key}->{'flag'} = 'U';
    }else{
        delete $save{$key};
    }
}

foreach (sort keys %save){
    print $fh3 "$save{$_}->{'data'}|$save{$_}->{'flag'}\n";
}

sandeepau 0 Newbie Poster · Answer 4 · 2011-07-13T18:10:18+00:00

Thank you David! This is working perfectly fine for me. Thanks again!

d5e5 109 Master Poster · Answer 5 · 2011-07-13T21:03:14+00:00

You are welcome. Please don't forget to mark this thread 'solved'.

sathya21 0 Newbie Poster · Answer 6 · 2011-08-31T11:36:43+00:00

comedy comedy

mantoj 0 Newbie Poster · Answer 7 · 2011-08-31T11:38:40+00:00

what comedy da setya??????????

sandeepau 0 Newbie Poster · Answer 8 · 2011-12-05T13:07:11+00:00

I'm reopening this thread since I need a few changes in order to optimize the compare process.

Can someone suggest how to handle sorted input in this comparison process? Assume I'm getting sorted input data from both input files. The comparison should then start from the first record of File 1 against the first record of File 2, and only move past records in File 2 whose key is less than the current key in File 1. That would save comparison time and help optimise the process, since there would be no need to check every record of one file against the other when both input files are presorted.

Thank you very much in advance!
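
One possible approach for presorted input is a merge-style walk: read one record from each file, compare the keys, and advance only the file whose key is smaller. This is a sketch, not a reply from the thread. It assumes both files are sorted ascending on the third field, that keys are fixed-width so string comparison (`lt`/`gt`) matches the sort order, and that every line has at least three fields. The names `next_record` and `compare_sorted` are made up for illustration, and the sample data is read from in-memory strings instead of 1.txt and 2.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the next record from a filehandle.
# Returns (key, remainder, whole line), or an empty list at end of file.
sub next_record {
    my $fh = shift;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    my (undef, undef, $key, $remainder) = split /\|/, $line, 4;
    return ($key, defined $remainder ? $remainder : '', $line);
}

# Walk two key-sorted inputs in step and return the flagged lines.
# Neither file is held in memory; only one record per file at a time.
sub compare_sorted {
    my ($fh1, $fh2) = @_;
    my @out;
    my @r1 = next_record($fh1);
    my @r2 = next_record($fh2);

    while (@r1 and @r2) {
        if ($r1[0] lt $r2[0]) {        # key only in file1: deleted
            push @out, "$r1[2]|D";
            @r1 = next_record($fh1);
        } elsif ($r1[0] gt $r2[0]) {   # key only in file2: inserted
            push @out, "$r2[2]|I";
            @r2 = next_record($fh2);
        } else {                       # same key: updated if the rest differs
            push @out, "$r2[2]|U" if $r1[1] ne $r2[1];
            @r1 = next_record($fh1);
            @r2 = next_record($fh2);
        }
    }
    # Whatever is left in one file has no partner in the other.
    while (@r1) { push @out, "$r1[2]|D"; @r1 = next_record($fh1); }
    while (@r2) { push @out, "$r2[2]|I"; @r2 = next_record($fh2); }
    return @out;
}

# Sample data from this thread, held in strings for the demonstration.
my $file1_text = <<'END';
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048888|7704|48888|PE|08/12/2008 11:38:54|0|1000.00
END
my $file2_text = <<'END';
1780437|20110705|00000077040000000000000048881|7704|48881|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048882|7704|48882|PE|08/12/2008 11:38:54|0|1000.00
1780437|20110705|00000077040000000000000048883|7704|48883|PE|10/01/2009 14:33:18|1|1000.00
1780437|20110705|00000077040000000000000048887|7704|48887|PE|08/11/2008 11:38:54|0|1001.00
END

open my $fh1, '<', \$file1_text or die "Can't open in-memory file1: $!";
open my $fh2, '<', \$file2_text or die "Can't open in-memory file2: $!";
print "$_\n" for compare_sorted($fh1, $fh2);
```

For the sample data this prints the 48883 record flagged I, the 48887 record flagged U, and the 48888 record flagged D, in that order. To run it against real files, open `$fh1` and `$fh2` on the file names as in the earlier scripts. Because the output follows the merge order, it comes out already sorted by key and no hash or final `sort` is needed.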