File comparison

BastienP 0 Light Poster

15 Years Ago

Hello there,

I'm currently using a Perl script that checks whether every line in a file named 1.csv is present in a file named 2.csv, and writes only the lines that differ.

The output is a file called out.csv.

What I'm trying to do is write only the most recent lines to out.csv. The file 1.csv is older than 2.csv (they are two different extracts from Active Directory).

Currently, both versions of a changed line, the one from 1.csv and the one from 2.csv, are printed to out.csv.

In other words, it prints all the differences between the two files, not only the newest lines.

I haven't checked yet, but I think the lines from 1.csv (the older file, the ones I'd like to drop) are printed first. I'm not sure, though, and haven't done enough tests; some of them seemed to come out in a random order.

I'm trying to find out how to check the lines and print only the newer ones. It seems I should do that in two steps:

- run the script to delete the identical lines
- run another script that counts the lines and drops the first n/2 lines.

use strict;
use warnings;

my $f1 = $ENV{path_Files}."COMP/1.CSV";
my $f2 = $ENV{path_Files}."COMP/2.CSV";
my $outfile = $ENV{path_Files}."COMP/update.csv";
my %results = ();

open FILE1, "$f1" or die "Could not open file: $! \n";
while(my $line = <FILE1>){
   $results{$line}=1;
}
close(FILE1);

open FILE2, "$f2" or die "Could not open file: $! \n";
while(my $line =<FILE2>) {
   $results{$line}++;
}
close(FILE2);


open (OUTFILE, ">$outfile") or die "Cannot open $outfile for writing \n";
foreach my $line (keys %results) {
   print OUTFILE $line if $results{$line} == 1;
}
close OUTFILE;

What do you think?

Bastien
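The counter in the script above records how many times a line was seen, so a count of 1 cannot say which file the line came from. A minimal sketch of one alternative, assuming both extracts fit in memory: OR a different bit into the hash value for each file (1 for the old file, 2 for the new one), then keep only lines whose value is exactly 2. The sub name `lines_only_in_new`, the constants `IN_OLD`/`IN_NEW` and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical flags: one bit per input file.
use constant {
    IN_OLD => 1,    # line seen in the older extract (1.csv)
    IN_NEW => 2,    # line seen in the newer extract (2.csv)
};

# Return the lines that occur in @$new but never in @$old.
sub lines_only_in_new {
    my ($old, $new) = @_;
    my %seen;
    $seen{$_} |= IN_OLD for @$old;
    $seen{$_} |= IN_NEW for @$new;
    # Resulting values: 1 = old only, 2 = new only, 3 = both files.
    my @only_new = grep { $seen{$_} == IN_NEW } keys %seen;
    # Hash keys come back in no fixed order; sort to make the result stable.
    return sort { $a cmp $b } @only_new;
}

# Made-up sample data standing in for the two extracts.
my @old = ("Fred\n", "Bob\n", "Gail\n");
my @new = ("Frederick\n", "Bob\n", "Hubert\n", "Hubert\n");
my @result = lines_only_in_new(\@old, \@new);
print @result;
```

Because `|=` sets a bit instead of adding 1, a line repeated inside 2.csv keeps the value 2; with the `++` counter it would reach 2 and be treated like a line present in both files. Sorting is only there for repeatable output: Perl does not define the order of `keys`, which is also why the original output looked random.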

d5e5 109 Master Poster

14 Years Ago

Here is an example of how to find differences between an older and newer file, using a hash. For simplicity I used ordinary text data, but it should work for csv files as well since you don't need to parse the lines into fields at this point. (You can parse the data with another script, later.)

I attached the data files, 'names_1.txt' and 'names_2.txt' to this post.

#!/usr/bin/perl
#print_diffs.pl
use strict;
use warnings;

my $dir = '/home/david/Programming/Perl/data';
my $f1 = "$dir/names_1.txt";
my $f2 = "$dir/names_2.txt";
open FILE1, "$f1" or die "Could not open $f1: $! \n";

my %results = ();#Hash to store lines from files
my %meaning = (-1, "In new file but not in old file",
               0, "In both new and old files",
               1, "In old file but not in new file",); #Hash associating values with statuses

while(my $line = <FILE1>){
    chomp $line;
    $results{$line}=1;
}
close(FILE1);

open FILE2, "$f2" or die "Could not open $f2: $! \n";
while(my $line =<FILE2>) {
    chomp $line;
    $results{$line}--; # Decrement instead of increment: new-only lines end at -1, lines in both files at 0
}
close(FILE2);

foreach(sort order keys %results){
    print "Key $_ has value $results{$_} which means $meaning{$results{$_}}\n";
}

sub order{
    $results{$a} <=> $results{$b};
}

names_1.txt

Fred
Bob
Dan
Terry
Gail
Jane
Rose
George
Jack

names_2.txt

Frederick
Bob
Dan
Terry
Jane
Rose
George
Jack
Hubert
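To check the classification without the attachments, the same 1 / 0 / -1 bookkeeping can be run on in-memory copies of the two name lists above. This is a sketch, not the attached script: the `or $a cmp $b` tie-breaker is an addition so that names with the same status come out in a repeatable, alphabetical order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-memory copies of names_1.txt (old) and names_2.txt (new).
my @old = qw(Fred Bob Dan Terry Gail Jane Rose George Jack);
my @new = qw(Frederick Bob Dan Terry Jane Rose George Jack Hubert);

my %results;
$results{$_} = 1 for @old;   # every old name starts at 1
$results{$_}-- for @new;     # undef-- gives -1 (new only); 1-- gives 0 (both)

my %meaning = (
    -1 => 'In new file but not in old file',
     0 => 'In both new and old files',
     1 => 'In old file but not in new file',
);

# Sort by status first, then alphabetically within each status.
for my $key (sort { $results{$a} <=> $results{$b} or $a cmp $b } keys %results) {
    print "Key $key has value $results{$key} which means $meaning{$results{$key}}\n";
}
```

With this data, Frederick and Hubert are reported as new, Fred and Gail as present only in the old file, and the remaining seven names as present in both.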

d5e5 109 Master Poster

14 Years Ago

I tested with the following old entries in 1.csv:

user,Petergr," CN=Peter Graham,OU=Newport,DC=cp,dc=com"
user,Janiebo," CN=Janie Bourne,OU=Newport,DC=cp,dc=com"
user,Edgardu," CN=Edgar Dunn,OU=Newport,DC=cp,dc=com"
user,Belindaha," CN=Belinda Hart,OU=Newport,DC=cp,dc=com"
user,Mayja," CN=May Jamieson,OU=Newport,DC=cp,dc=com"
user,Leroyot," CN=Leroy Ota,OU=Newport,DC=cp,dc=com"

I modified the 4th line, added a 7th line, and saved the result as 2.csv:

user,Petergr," CN=Peter Graham,OU=Newport,DC=cp,dc=com"
user,Janiebo," CN=Janie Bourne,OU=Newport,DC=cp,dc=com"
user,Edgardu," CN=Edgar Dunn,OU=Newport,DC=cp,dc=com"
user,Belindajo," CN=Belinda Jones,OU=Newport,DC=cp,dc=com"
user,Mayja," CN=May Jamieson,OU=Newport,DC=cp,dc=com"
user,Leroyot," CN=Leroy Ota,OU=Newport,DC=cp,dc=com"
user,Rabbitbr," CN=Rabbit Brer,OU=Newport,DC=cp,dc=com"

The following program should print only the data for Belindajo and for Rabbitbr, because all the other lines in 2.csv exist in 1.csv and so are not new.

#!/usr/bin/perl
#find_new_records.pl
use strict;
use warnings;

my $dir = '/home/david/Programming/Perl/data';
my $f1 = "$dir/1.csv";
my $f2 = "$dir/2.csv";
open FILE1, "$f1" or die "Could not open $f1: $! \n";

my %results = ();#Hash to store lines from files
my %meaning = (-1, "In new file but not in old file",
               0, "In both new and old files",
               1, "In old file but not in new file",); #Hash associating values with statuses

while(my $line = <FILE1>){
    chomp $line;
    $results{$line}=1;
}
close(FILE1);

open FILE2, "$f2" or die "Could not open $f2: $! \n";
while(my $line =<FILE2>) {
    chomp $line;
    $results{$line}--; # Decrement instead of increment: lines only in the new file end at -1
}
close(FILE2);

foreach(keys %results){
    print "Key $_\n" if $results{$_} == -1;
}

The above gives me the following output:

Key user,Rabbitbr," CN=Rabbit Brer,OU=Newport,DC=cp,dc=com"
Key user,Belindajo," CN=Belinda Jones,OU=Newport,DC=cp,dc=com"

BastienP commented: Great stuff, as usual !!!! +1

BastienP 0 Light Poster · Answer 1 · 2010-09-28T21:14:28+00:00

Hello There,

I'm replying to this old post because I still don't know how to handle the problem I have.

Unfortunately, the solution you gave, David, doesn't meet my needs, because I already know which file is the latest one (and thus where the accurate info is).

Let's say 2.csv is the newest file, which holds the correct information, and 1.csv holds the old entries.

What I need is a script that checks every line of 2.csv and drops the lines that are also in 1.csv. The reason is that I'm updating modified entries, and I don't need to update old lines that are unchanged and carry no new info.

I hope I'm making myself clear; don't hesitate to ask me for explanations.

Thanks in advance,
Regards,
Bastien
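The requirement described above (keep each line of 2.csv that does not appear in 1.csv) can be written as a streaming filter that also preserves the order of 2.csv, instead of the unspecified order of `keys %results`. A sketch under a few assumptions: the sub name `write_updated_lines` and the command-line usage are hypothetical, and the `\r` stripping assumes the two extracts might not use the same line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write every line of $new_path that does not occur in $old_path to
# $out_path, keeping the order of $new_path. Returns the number written.
sub write_updated_lines {
    my ($old_path, $new_path, $out_path) = @_;

    # Remember every line of the old extract.
    my %in_old;
    open my $old_fh, '<', $old_path or die "Could not open $old_path: $!\n";
    while (my $line = <$old_fh>) {
        $line =~ s/\r?\n\z//;    # assumption: tolerate CRLF vs LF endings
        $in_old{$line} = 1;
    }
    close $old_fh;

    # Stream the new extract and keep only lines the old one lacks.
    open my $new_fh, '<', $new_path or die "Could not open $new_path: $!\n";
    open my $out_fh, '>', $out_path or die "Could not open $out_path for writing: $!\n";
    my $written = 0;
    while (my $line = <$new_fh>) {
        $line =~ s/\r?\n\z//;
        next if $in_old{$line};    # unchanged entry: nothing to update
        print {$out_fh} "$line\n";
        $written++;
    }
    close $new_fh;
    close $out_fh or die "Could not close $out_path: $!\n";
    return $written;
}

# Hypothetical usage: perl find_updates.pl 1.csv 2.csv update.csv
if (@ARGV == 3) {
    my $count = write_updated_lines(@ARGV);
    print "$count new or changed line(s) written to $ARGV[2]\n";
}
```

Only 1.csv is held in memory; 2.csv is read one line at a time, so the output lists new and modified entries in the same order as the newer extract.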

BastienP 0 Light Poster · Answer 2 · 2010-09-29T01:10:40+00:00

David,

Once again you are the man! Just wonderful; of course it works exactly the way I needed!

I know you're in Canada, but if you ever come to Paris, you've just won a flight in a light plane. I fly near Versailles and its famous palace, so there's a nice view of it and of Paris! :-) And I really mean it: I'd be glad to show you this wonderful view!

Thanks once again

Regards
Bastien

d5e5 109 Master Poster · Answer 3 · 2010-09-29T22:20:19+00:00

Bastien,
Je vous en prie. You are welcome, and thanks for the offer. My wife and I would love to see Paris, although we have no overseas trips planned for the next few years. Maybe some day. :)

Regards,
David
