I have a large data set (12,000 rows x 14 columns); the first 4 rows are shown below:
x1 y1 0.02 NAN NAN NAN NAN NAN NAN 0.004 NAN NAN NAN NAN
x2 y2 NAN 0.003 NAN 10 NAN 0.03 NAN 0.004 NAN NAN NAN NAN
x3 y3 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
x4 y4 NAN 0.004 NAN NAN NAN NAN 10 NAN NAN 30 NAN 0.004
I need to remove every row in which columns 3-14 are all "NAN" (such as the x3 row above) and then output the rest of the dataset. I wrote the following code:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
open(IN, "<", "file1.txt") or die "Can't open file for reading:$!";
open(OUT, ">", "file2.txt") or die "Can't open file for writing:$!";
my $header = <IN>;
print OUT $header;
my $at_line = 0;
my $col3;
my $col4;
my $col5;
my $col6;
my $col7;
my $col8;
my $col9;
my $col10;
my $col11;
my $col12;
my $col13;
my $col14;
while (<IN>){
    chomp;
    my @sections = split(/\t/);
    $col3 = $sections[2];
    $col4 = $sections[3];
    $col5 = $sections[4];
    $col6 = $sections[5];
    $col7 = $sections[6];
    $col8 = $sections[7];
    $col9 = $sections[8];
    $col10 = $sections[9];
    $col11 = $sections[10];
    $col12 = $sections[11];
    $col13 = $sections[12];
    $col14 = $sections[13];
    # Skip the row (remembering its line number) when columns 3-14 are all "NAN"
    if ($col3 eq "NAN" && $col4 eq "NAN" && $col5 eq "NAN" && $col6 eq "NAN" && $col7 eq "NAN" && $col8 eq "NAN" && $col9 eq "NAN" && $col10 eq "NAN" && $col11 eq "NAN" && $col12 eq "NAN" && $col13 eq "NAN" && $col14 eq "NAN"){
        $at_line = $.;
    }
    else {
        print OUT "$_\n";
    }
}
close(IN);
close(OUT);
Running this code gave the following error:
Use of uninitialized value $col3 in string eq at filter.pl
line 46, <IN> line 2 (#1)
How can I make this program work? Thanks.
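For reference, here is a minimal sketch of one way the filter could be written. It rests on assumptions not confirmed above: that the columns may be separated by spaces rather than single tabs (which would explain why $sections[2] is undefined after split(/\t/)), that a row should be dropped only when all of columns 3-14 are "NAN" (matching the && test in the code above), and that a missing field can be treated as "NAN". The helper names all_nan and filter_rows are made up for illustration, and the demo reads the four sample rows from an in-memory string instead of file1.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 when every value in columns 3-14 (array indices 2..13) is "NAN".
# Assumption: a missing field counts as "NAN", so short lines do not
# trigger "uninitialized value" warnings.
sub all_nan {
    my @fields = @_;
    for my $i (2 .. 13) {
        my $value = defined $fields[$i] ? $fields[$i] : 'NAN';
        return 0 if $value ne 'NAN';
    }
    return 1;
}

# Copy rows from the input handle to the output handle,
# skipping rows whose columns 3-14 are all "NAN".
sub filter_rows {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        # split ' ' splits on any run of whitespace (spaces or tabs)
        my @fields = split ' ', $line;
        next if all_nan(@fields);
        print {$out} "$line\n";
    }
    return;
}

# Demo on the four sample rows, using in-memory filehandles.
my $sample = <<'END';
x1 y1 0.02 NAN NAN NAN NAN NAN NAN 0.004 NAN NAN NAN NAN
x2 y2 NAN 0.003 NAN 10 NAN 0.03 NAN 0.004 NAN NAN NAN NAN
x3 y3 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN
x4 y4 NAN 0.004 NAN NAN NAN NAN 10 NAN NAN 30 NAN 0.004
END

my $result = '';
open(my $in,  '<', \$sample) or die "Can't open sample for reading: $!";
open(my $out, '>', \$result) or die "Can't open buffer for writing: $!";
filter_rows($in, $out);
close($in);
close($out);

print $result;   # prints the x1, x2 and x4 rows; x3 is dropped
```

To run this against real files, the two in-memory handles could be replaced with handles opened on file1.txt and file2.txt, and the header line could be read and printed before calling filter_rows. Looping over indices 2..13 also avoids the twelve separate $colN variables, where a numbering slip is easy to make.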