I have a bit of an issue trying to obtain some data from a CSV file using Perl. I can sort the file and remove any duplicates, leaving only 4 or 5 rows of data. My problem is that the original file contains a lot more columns, and when I try to run this script on it, it finds that all the rows are unique.
I have the following fields within the original file:
LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID
The data which I need to obtain and sort is within the OP, PROBE_CARD and TESTER_ID fields.
How can I go about this?
The code that I use after manually deleting the fields that I do not require is as follows:
#!/usr/bin/perl -w
use strict;
my $csvfile = 'probecards.csv';
my $newfile = 'new.csv';
my $fieldnames = 1;
open (IN, "<$csvfile") or die "Couldn't open input CSV file: $!";
open (OUT, ">$newfile") or die "Couldn't open output file: $!";
my $header;
$header = <IN> if $fieldnames;
my @data = sort <IN>;
while( <IN> ) {
push @data, join "\t", (split /\t/)[4,5,8];
}
print OUT $header;
my $n = 0;
my $lastline = '';
foreach my $currentline (@data) {
next if $currentline eq $lastline;
print OUT $currentline;
$lastline = $currentline;
$n++;
}
close IN; close OUT;
print "Processing complete. In = " . scalar @data . " records, Out = $n records\n";
Input CSV file:
LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID
200812,12630M196,139,2660,S25E3N36,88BCRA,16/03/2008 12:05,IN01
200812,12630M196,1,2660,S25E3N36,88BLBHDA,16/03/2008 13:04,IN01
200812,12630M196,508,2660,S25E3N36,88BCRA,16/03/2008 13:41,IN01
200812,12630M196,437,2660,S25E3N35,88CLNHCC,16/03/2008 14:18,IN04
200812,12630M196,465,2660,S25E3N36,88BCRA,16/03/2008 15:34,IN01
200812,12630M196,27,2660,S25E3N36,88BCRA,16/03/2008 18:00,IN01
200812,12630M196,18,2660,S25E3N27,88BCRA,16/03/2008 19:03,IN03
200812,12630M196,11,2660,S25E3N36,88BCRA,17/03/2008 14:37,IN01
200812,12620M189,526,2660,S25E3N36,8PMVCVAE,17/03/2008 15:21,IN01
200812,12630M196,167,2660,S25E3N36,88BCRA,17/03/2008 19:02,IN01
200812,12630M196,652,2660,S25E3N36,88BCRA,17/03/2008 19:39,IN01
200812,12630M196,765,2660,S25E3N36,88CLNHCC,17/03/2008 20:15,IN01
Output required:
OP,PROBE_CARD,TESTER_ID
2660,S25E3N36,IN01
2660,S25E3N27,IN03
2660,S25E3N35,IN04
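A minimal sketch of one way this could be approached, assuming the fields are plain comma-separated values with no quoted or embedded commas. The helper name unique_rows is hypothetical, and ordering the result by TESTER_ID is only a guess based on the sample output above. It picks the three columns by header name, keys a hash on the picked values to drop duplicates, and sorts what is left.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the header line followed by the data lines and
# returns the unique "OP,PROBE_CARD,TESTER_ID" combinations.
# Assumes simple comma-separated fields with no quoted or embedded commas.
sub unique_rows {
    my ($header, @lines) = @_;
    $header =~ s/\r?\n\z//;                     # strip the line ending

    # Map each column name to its position so columns are picked by name.
    my @names = split /,/, $header;
    my %col;
    @col{@names} = (0 .. $#names);

    my @wanted = qw(OP PROBE_CARD TESTER_ID);
    my (%seen, @rows);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;
        next unless length $line;               # skip blank lines
        my @fields = split /,/, $line;
        my @picked = @fields[ @col{@wanted} ];  # keep only the three columns
        # Compare on the picked columns only, not on the whole line.
        push @rows, \@picked unless $seen{ join(',', @picked) }++;
    }

    # Assumed ordering: TESTER_ID first, then PROBE_CARD, then OP.
    my @sorted = sort {
        $a->[2] cmp $b->[2] or $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0]
    } @rows;
    return map { join(',', @$_) } @sorted;
}

# A few lines taken from the input file above.
my @sample = (
    "LAO_START_WW,PROGRAM,ID,OP,PROBE_CARD,DEVREVSTEP,TEST_START,TESTER_ID\n",
    "200812,12630M196,139,2660,S25E3N36,88BCRA,16/03/2008 12:05,IN01\n",
    "200812,12630M196,437,2660,S25E3N35,88CLNHCC,16/03/2008 14:18,IN04\n",
    "200812,12630M196,465,2660,S25E3N36,88BCRA,16/03/2008 15:34,IN01\n",
    "200812,12630M196,18,2660,S25E3N27,88BCRA,16/03/2008 19:03,IN03\n",
);

print "OP,PROBE_CARD,TESTER_ID\n";
print "$_\n" for unique_rows(@sample);
```

With these sample lines the sketch prints the header followed by the three rows listed under "Output required". For the real file, the lines could be read from probecards.csv with a lexical filehandle and passed in the same way. If the data ever contains quoted fields, a CSV parser such as the Text::CSV module from CPAN would be safer than a plain split.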
Any help would be gratefully received.
I know it's something to do with the duplicates being removed within the next if,
but I cannot sort it out... maybe I have been looking at this too long.
Regards,
Colin