I need a Perl script to extract the starting and ending positions of a column.

Here is the input:

HETATM 7749 C1 NFG A1001 -31.772 -7.604 -23.847 0.80 61.71 C
HETATM 7750 O1 NFG A1001 -30.806 -8.518 -23.305 0.80 65.42 O
HETATM 7751 C2 NFG A1001 -32.987 -7.507 -22.895 0.80 60.75 C
HETATM 7752 F NFG A1001 -34.161 -7.378 -23.591 0.80 60.24 F
HETATM 7753 C3 NFG A1001 -32.871 -6.337 -21.918 0.80 59.86 C
HETATM 7754 O3 NFG A1001 -33.668 -6.536 -20.739 0.80 59.28 O
HETATM 7755 C4 NFG A1001 -31.392 -6.167 -21.609 0.80 58.86 C
HETATM 7756 O4 NFG A1001 -31.150 -5.379 -20.446 0.80 57.89 O
HETATM 7757 C5 NFG A1001 -30.787 -5.553 -22.865 0.80 58.98 C
HETATM 7758 C6 NFG A1001 -29.272 -5.425 -22.793 0.80 58.68 C
HETATM 7759 O6 NFG A1001 -28.680 -6.725 -22.846 0.80 58.41 O
HETATM 7760 O5 NFG A1001 -31.154 -6.319 -24.031 0.80 60.35 O
HETATM 7761 C11 NFG A1001 -30.085 -9.565 -23.872 0.80 66.39 C
HETATM 7762 C12 NFG A1001 -29.121 -10.165 -23.041 0.80 66.65 C
HETATM 7763 N1 NFG A1001 -28.897 -9.736 -21.759 0.80 66.91 N
HETATM 7764 O11 NFG A1001 -29.546 -8.809 -21.297 0.80 66.97 O
HETATM 7765 O12 NFG A1001 -28.045 -10.270 -21.048 0.80 66.67 O
HETATM 7766 C13 NFG A1001 -28.350 -11.226 -23.507 0.80 66.53 C
HETATM 7767 C14 NFG A1001 -28.500 -11.725 -24.792 0.80 66.55 C
HETATM 7768 N2 NFG A1001 -27.683 -12.771 -25.140 0.80 66.72 N
HETATM 7769 O21 NFG A1001 -26.873 -13.217 -24.331 0.80 66.38 O
HETATM 7770 O22 NFG A1001 -27.734 -13.283 -26.248 0.80 66.70 O
HETATM 7771 C15 NFG A1001 -29.466 -11.137 -25.637 0.80 66.48 C
HETATM 7772 C16 NFG A1001 -30.257 -10.063 -25.182 0.80 66.43 C

The output I want is:

A1001 7749
A1001 7772

Hi nakshi,

Here is one.

#!/usr/bin/perl
use warnings;
use strict;

my @data_array;
push @data_array, join " " => @{ [split] }[ 4, 1 ] while <DATA>;

local $" = "\n";
print "@data_array[0,$#data_array]";

__DATA__
HETATM 7749 C1 NFG A1001 -31.772 -7.604 -23.847 0.80 61.71 C
HETATM 7750 O1 NFG A1001 -30.806 -8.518 -23.305 0.80 65.42 O
HETATM 7751 C2 NFG A1001 -32.987 -7.507 -22.895 0.80 60.75 C
HETATM 7752 F NFG A1001 -34.161 -7.378 -23.591 0.80 60.24 F
HETATM 7753 C3 NFG A1001 -32.871 -6.337 -21.918 0.80 59.86 C
HETATM 7754 O3 NFG A1001 -33.668 -6.536 -20.739 0.80 59.28 O
HETATM 7755 C4 NFG A1001 -31.392 -6.167 -21.609 0.80 58.86 C
HETATM 7756 O4 NFG A1001 -31.150 -5.379 -20.446 0.80 57.89 O
HETATM 7757 C5 NFG A1001 -30.787 -5.553 -22.865 0.80 58.98 C
HETATM 7758 C6 NFG A1001 -29.272 -5.425 -22.793 0.80 58.68 C
HETATM 7759 O6 NFG A1001 -28.680 -6.725 -22.846 0.80 58.41 O
HETATM 7760 O5 NFG A1001 -31.154 -6.319 -24.031 0.80 60.35 O
HETATM 7761 C11 NFG A1001 -30.085 -9.565 -23.872 0.80 66.39 C
HETATM 7762 C12 NFG A1001 -29.121 -10.165 -23.041 0.80 66.65 C
HETATM 7763 N1 NFG A1001 -28.897 -9.736 -21.759 0.80 66.91 N
HETATM 7764 O11 NFG A1001 -29.546 -8.809 -21.297 0.80 66.97 O
HETATM 7765 O12 NFG A1001 -28.045 -10.270 -21.048 0.80 66.67 O
HETATM 7766 C13 NFG A1001 -28.350 -11.226 -23.507 0.80 66.53 C
HETATM 7767 C14 NFG A1001 -28.500 -11.725 -24.792 0.80 66.55 C
HETATM 7768 N2 NFG A1001 -27.683 -12.771 -25.140 0.80 66.72 N
HETATM 7769 O21 NFG A1001 -26.873 -13.217 -24.331 0.80 66.38 O
HETATM 7770 O22 NFG A1001 -27.734 -13.283 -26.248 0.80 66.70 O
HETATM 7771 C15 NFG A1001 -29.466 -11.137 -25.637 0.80 66.48 C
HETATM 7772 C16 NFG A1001 -30.257 -10.063 -25.182 0.80 66.43 C

Output

A1001 7749
A1001 7772

#!/usr/bin/perl
use warnings;
use strict;

my $data_array;

#my @data_array;

my $dir = '/home/test/Desktop/pdb';
opendir(DIR, $dir) or die $!;
my @data_array = grep (/\.pdb$/, readdir (DIR));
closedir DIR;

open IN, "<$data_array" or die "$!";
        while($data_array=<IN>) {
        foreach my $data_array (@data_array) {
        push @data_array, join " " => @{ [split] }[ 4, 1 ] ;
        local $" = "\n";
        print "@data_array[0,$#data_array]";
        }
    }

exit;

I want to open each PDB file in a directory, read it into an array, and print the output for each file. So I put the files in a directory and ran the script.

But the script reports an error at line 14:

Use of uninitialized value $data_array in concatenation (.) or string at ./dani.pl line 14.
No such file or directory at ./dani.pl line 14.

Please help me.

You declared a scalar called $data_array on line 5 but never assigned it a value, and then used it as the file name in the open on line 14.
Moreover, although Perl is not confused by having both $data_array and @data_array, you might be. Since there is already an array called @data_array, why not use a different name for the scalar?

There are other things you are mixing up as well, such as reusing that same variable name for each line read from the filehandle IN.

Some good practices:
Use a lexical variable for your filehandle, not a bareword like IN.
Use the three-argument form of open, like so: open my $fh, '<', $filename or die "...";
Close your opened filehandles.

I re-wrote your code from the start. This should work for you.

#!/usr/bin/perl
use warnings;
use strict;
use Cwd qw(abs_path);

my $dir = $ARGV[0];
$dir = abs_path($dir);

## change to the directory to use
chdir $dir or die "directory doesn't exists\n";

my @data_array;   ## array for collection

opendir my $dh, $dir or die "can't open directory: $!";
while ( my $filename = readdir($dh) ) {
    next if $filename eq '.' or $filename eq '..';
    open my $fh, '<', $filename or die "can't open file: $!";
    while (<$fh>) {
        push @data_array, join " " => @{ [split] }[ 4, 1 ];
    }
    {
        local $" = "\n";
        print "@data_array[0,$#data_array]\n";
    }
    close $fh or die "can't close file: $!";
    @data_array = ();  # empty the array for the next file
}
closedir $dh or die "can't close directory: $!";

You could replace $ARGV[0] with '/home/test/Desktop/pdb' in your own program, or pass the directory on the command line.
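
Those two options could also be combined. This is only a sketch: the helper name pick_dir is made up for illustration, and the default path is the one from your post. It uses the command-line argument when one is given and falls back to the default otherwise.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of the script above): return the first
# extra argument if it is non-empty, otherwise the default directory.
sub pick_dir {
    my ( $default, @args ) = @_;
    return ( @args && length $args[0] ) ? $args[0] : $default;
}

my $dir = pick_dir( '/home/test/Desktop/pdb', @ARGV );
print "Using directory: $dir\n";
```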

I think it is a lot easier to do the whole thing in a single pass than to collect the file names first and then loop over them with a for-loop.
So, for each file: collect the values needed, then print what is required.

Hope this helps.

I have changed the script and given the path, but it still complains:

Use of uninitialized value in join or string at ./dp.pl line 20,

So please help me.

Can you post lines 15 to 30 of your script? What do they look like?
The script given previously should work properly.
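
If the warning persists, one possible cause is an input line with fewer than five whitespace-separated fields (a blank line, or a short record such as END), because the slice [ 4, 1 ] then contains undef. A minimal sketch of a guard, assuming records like the ones you posted; the helper name first_and_last and the inline sample lines are made up for illustration:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns the "field 4, field 1" strings for the
# first and last usable lines. Lines with fewer than five fields are
# skipped, so join never sees an undefined value.
sub first_and_last {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next if @fields < 5;    # blank or short line: nothing to join
        push @found, join ' ', @fields[ 4, 1 ];
    }
    return @found ? @found[ 0, $#found ] : ();
}

# Sample records, including a blank line and a short END record.
my @sample = (
    "HETATM 7749 C1 NFG A1001 -31.772 -7.604 -23.847 0.80 61.71 C\n",
    "\n",
    "HETATM 7772 C16 NFG A1001 -30.257 -10.063 -25.182 0.80 66.43 C\n",
    "END\n",
);

print "$_\n" for first_and_last(@sample);
```

With the sample lines above this prints A1001 7749 and A1001 7772, each on its own line.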

Thanks

   1. #!/usr/bin/perl
   2. use warnings;
   3. use strict;
   4. use Cwd qw(abs_path);
   5.
   6. my $dir = '/home/test/Desktop/pdb';
   7. $dir = abs_path($dir);
   8.
   9. ## change to the directory to use
  10. #chdir $dir or die "directory doesn't exists\n";
  11.
  12. my @data_array; ## array for collection
  13.
  14. opendir my $dh, $dir or die "can't open directory: $!";
  15. while ( my $filename = readdir($dh) ) {
  16. next if $filename eq '.' or $filename eq '..';
  17. open my $fh, '<', $filename or die "can't open file: $!";
  18. while (<$fh>) {
  19. push @data_array, join " " => @{ [split] }[ 4, 1 ];
  20. }
  21. {
  22. local $" = "\n";
  23. print "@data_array[0,$#data_array]\n";
  24. }
  25. close $fh or die "can't close file: $!";
  26. @data_array = (); # empty the array for the next file
  27. }
  28. closedir $dh or die "can't close directory: $!";

HETATM 3575 C1 NAG A 472A 44.533 -1.415 16.010 1.00 6.18 C
HETATM 3576 C2 NAG A 472A 44.916 -0.562 14.804 1.00 6.44 C
HETATM 3577 C3 NAG A 472A 46.183 -1.114 14.153 1.00 6.38 C
HETATM 3578 C4 NAG A 472A 47.304 -1.170 15.150 1.00 6.09 C

I need another script. How do I extract column 5, i.e. 472A, which contains both numbers and letters?

I have tried using a regular expression, something like this:

if $column[5] =~ /[0-9A-Z]+/
print $column[5];

But it doesn't work. Please help.

Hi,
Why was line 10 commented out? Please also check the file you are using as input.
Using the example given in my first post, you can modify it to get your column 5, bearing in mind that array indices in Perl start from 0, not 1.

Hope that helps
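
As a rough sketch of that modification, assuming the fields are separated by whitespace as in the four lines you posted: split puts HETATM at index 0, so 472A lands at index 5. The helper name residue_field and the pattern are illustrative choices, not taken from the earlier posts. Note also that Perl's if needs parentheses around the condition and braces around the body, which may be why your attempt did not run.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: returns the field at index 5 (e.g. "472A") if it
# is made of digits optionally followed by one capital letter,
# otherwise undef.
sub residue_field {
    my ($line) = @_;
    my @column = split ' ', $line;
    return undef unless defined $column[5];
    return $column[5] =~ /^[0-9]+[A-Z]?$/ ? $column[5] : undef;
}

my @sample = (
    "HETATM 3575 C1 NAG A 472A 44.533 -1.415 16.010 1.00 6.18 C\n",
    "HETATM 3576 C2 NAG A 472A 44.916 -0.562 14.804 1.00 6.44 C\n",
);

for my $line (@sample) {
    my $field = residue_field($line);
    if ( defined $field ) {
        print "$field\n";
    }
}
```

Anchoring the pattern with ^ and $ matters here: an unanchored /[0-9A-Z]+/ would also match a coordinate such as -31.772, since it only needs one digit somewhere in the field.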