Parsing a Rebase file

Question

Anthony Cameron -2 Light Poster

14 Years Ago

Hello, I am parsing a REBASE file using several subroutines from the BeginPerlBioinfo module. I have included the subroutines I think I need, but I keep getting the message "Use of uninitialized value $site in concatenation (.) or string" (the message also mentions <$fh>).

use strict;
use warnings;
use P4;

# Declare and initialize variables
my %rebase_hash = (  );
my @file_data = (  );
my $query = '';
my $dna = '';    # never assigned anywhere below, so match_positions() searches an empty string
my $recognition_site = '';
my $regexp = '';
my @locations = (  );


# Get the REBASE data into a hash, from file "rebase.txt"
%rebase_hash = parseREBASE('rebase.txt');

# Prompt user for restriction enzyme names, create restriction map
do {
    print "Search for what restriction site for (or quit)?: ";

    $query = <STDIN>;

    chomp $query;

    # Exit if empty query
    if ( $query =~ /^\s*$/ ) {
        exit;
    }

    # Perform the search in the DNA sequence
    if ( exists $rebase_hash{$query} ) {

        ($recognition_site, $regexp) = split( " ", $rebase_hash{$query} );

        # Create the restriction map
        @locations = match_positions($regexp, $dna);

        # Report the restriction map to the user
        if (@locations) {
            print "Searching for $query $recognition_site $regexp\n";
            print "A restriction site for $query at locations:\n";
            print join(" ", @locations), "\n";
        } else {
            print "A restriction site for $query is not in the DNA:\n";
        }
    }
    print "\n";
} until ( $query =~ /quit/ );

exit;

P4 is the name of the module I made up; it contains the subroutines I think I need.
rebase.txt is the file that I need to parse.

The subroutines that I have used are:

# open_file
#
#   - given filename, set filehandle

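sub open_file {

    my($filename) = @_;

    # Three-argument open with an explicit read mode; stop with the
    # system error message rather than returning an unopened filehandle
    open(my $fh, '<', $filename)
        or die "Cannot open file $filename: $!\n";

    return $fh;
}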


# A subroutine to get data from a file given its filename
# get_file_data
sub get_file_data {

    my($filename) = @_;

    use strict;
    use warnings;

    # Initialize variables
    my @filedata = (  );

    unless( open(GET_FILE_DATA, $filename) ) {
        print STDERR "Cannot open file \"$filename\"\n\n";
        exit;
    }

    @filedata = <GET_FILE_DATA>;

    close GET_FILE_DATA;

    return @filedata;
}
sub IUB_to_regexp {

    my($iub) = @_;

    my $regular_expression = '';

    my %iub2character_class = (

    A => 'A',
    C => 'C',
    G => 'G',
    T => 'T',
    R => '[GA]',
    Y => '[CT]',
    M => '[AC]',
    K => '[GT]',
    S => '[GC]',
    W => '[AT]',
    B => '[CGT]',
    D => '[AGT]',
    H => '[ACT]',
    V => '[ACG]',
    N => '[ACGT]',
    );

    # Remove the ^ signs from the recognition sites
    $iub =~ s/\^//g;

    # Translate each character in the iub sequence
    for ( my $i = 0 ; $i < length($iub) ; ++$i ) {
        $regular_expression
          .= $iub2character_class{substr($iub, $i, 1)};
    }

    return $regular_expression;
}
sub match_positions {

    my($regexp, $sequence) = @_;

    use strict;

    #
    # Declare variables
    #

    my @positions = (  );

    #
    # Determine positions of regular expression matches
    #

    while ( $sequence =~ /$regexp/ig ) {

        push ( @positions, pos($sequence) - length($&) + 1);
    }

    return @positions;
}

#
# A subroutine to return a hash where
#    key   = restriction enzyme name
#    value = whitespace-separated recognition site and regular expression

sub parseREBASE {

    my($rebasefile) = @_;

    use strict;
    use warnings;

    # Declare variables
    my @rebasefile = (  );
    my %rebase_hash = (  );
    my $name;
    my $site;
    my $regexp;

    # Read in the REBASE file
    my $rebase_filehandle = open_file($rebasefile);

    while(<$rebase_filehandle>) {

        # Discard header lines
        ( 1 .. /Rich Roberts/ ) and next;

        # Discard blank lines
        /^\s*$/ and next;

        # Split the two (or three if includes parenthesized name) fields
        my @fields = split( " ", $_);

        # Get and store the name and the recognition site

        # Remove parenthesized names, for simplicity's sake,
        # by not saving the middle field, if any,
        # just the first and last
        $name = shift @fields;

        $site = pop @fields;

        # Translate the recognition sites to regular expressions
        $regexp = IUB_to_regexp($site);

        # Store the data into the hash
        $rebase_hash{$name} = "$site $regexp";
    }

    # Return the hash containing the reformatted REBASE data
    return %rebase_hash;
}



1;


d5e5 109 Master Poster

14 Years Ago

I copied and ran your script but couldn't reproduce the error you got. The program kept prompting me with the message "Search for what restriction site for (or quit)?: " as long as I typed and entered some input. When I pressed enter with no input the program exited with no error. I created a dummy file called 'rebase.txt' but didn't know what to put in it.

That message you got, "Use of uninitialized value $site in concatenation (.) or string", probably means that the $site variable has no value assigned to it when some statement attempts to combine it with another string. But I don't know what data to enter to get your program to reproduce the error you are getting.
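
For readers who want to see the warning in isolation, here is a minimal sketch that is not taken from the original script. It assumes only that `use warnings` is in effect, and it collects the warning in an array (the `@warnings` name and the `GAATTC` value are illustrative) instead of letting it go to STDERR.

```perl
use strict;
use warnings;

# Capture warnings so the message can be inspected instead of printed to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $site;                      # declared but never assigned, so it is undef
my $regexp = 'GAATTC';         # illustrative value only
my $value  = "$site $regexp";  # interpolating undef triggers the warning

print $warnings[0];
```

The string is still built (undef is treated as an empty string), which is why the program keeps running and only complains.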

d5e5 109 Master Poster

14 Years Ago

Now I get the error.
Regarding this statement in the parseREBASE sub: my @fields = split( " ", $_); I don't understand why you split each line of the file on spaces, because each non-blank line appears to contain only one field followed by an end-of-line character, with no spaces. For example, lines 10 through 15 of the file you attached look like this:

AanI
TTA!TAA
AarI
CACCTGCNNNN!
AasI
GACNNNN!NNGTC

... so why split on spaces?
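
A small sketch of what that split does to one of the lines above may help. It assumes the line holds a single field, as in the sample, and reuses the same shift/pop steps as parseREBASE.

```perl
use strict;
use warnings;

# One line in the one-field-per-line layout quoted above.
my $line = "AanI\n";

my @fields = split " ", $line;   # a single element: "AanI"
my $name   = shift @fields;      # "AanI"; @fields is now empty
my $site   = pop @fields;        # undef, because nothing is left to pop

printf "name=%s site=%s\n", $name, defined $site ? $site : '(undef)';
```

With only one field on the line, shift takes it as the name and pop has nothing left to return, so $site ends up undefined.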

My computer time is just about over for today but I'll try to have another look at this tomorrow.


Anthony Cameron -2 Light Poster · Answer 1 · 2010-12-02T02:35:05+00:00

I copied and ran your script but couldn't reproduce the error you got. The program kept prompting me with the message "Search for what restriction site for (or quit)?: " as long as I typed and entered some input. When I pressed enter with no input the program exited with no error. I created a dummy file called 'rebase.txt' but didn't know what to put in it.
That message you got, "Use of uninitialized value $site in concatenation (.) or string", probably means that the $site variable has no value assigned to it when some statement attempts to combine it with another string. But I don't know what data to enter to get your program to reproduce the error you are getting.

Hi thank you for your help and attached is the information I was using as the txt file.

Anthony Cameron -2 Light Poster · Answer 2 · 2010-12-03T00:06:55+00:00

I see what you mean about the split on spaces. Do you think if I remove that then the program will work? Thanks

d5e5 109 Master Poster · Answer 3 · 2010-12-03T02:06:13+00:00

I see what you mean about the split on spaces. Do you think if I remove that then the program will work? Thanks

No, that sounds too optimistic. Usually when I debug a program I get rid of one error and then another one pops up. :(

I think the reason the $site variable has no value when you try to concatenate it with something else is that the parseREBASE subroutine expects to find both the name and the site on each line that it reads, but in the file you attached there is only one field on each line. After reading a line to get the name, the program should read the next line to get the site. I made a change to the parseREBASE sub to read the next line and assign it to $site so $site will not be uninitialized when it is used. Try replacing the parseREBASE sub with the following:

sub parseREBASE {

    my($rebasefile) = @_;

    use strict;
    use warnings;

    # Declare variables
    my @rebasefile = (  );
    my %rebase_hash = (  );
    my $name;
    my $site;
    my $regexp;

    # Read in the REBASE file
    my $rebase_filehandle = open_file($rebasefile);

    while(<$rebase_filehandle>) {

        # Discard header lines
        ( 1 .. /Rich Roberts/ ) and next;

        # Discard blank lines
        /^\s*$/ and next;
        #--------------------------Start of changes 2010-12-02 d5e5
        ##### The following commented-out code assumes there are two or three fields
        ##### in each line of the file you attached, but there is only one
        ##### field per line. You have to read two lines to get each name-site pair.
        ##### Split the two (or three if includes parenthesized name) fields
        ####my @fields = split( " ", $_);
        ####
        ##### Get and store the name and the recognition site
        ####
        ##### Remove parenthesized names, for simplicity's sake,
        ##### by not saving the middle field, if any,
        ##### just the first and last
        ####$name = shift @fields;
        ####
        ####$site = pop @fields;
        chomp;
        $name = $_;
        # Read next line from file to get value of site.
        $_ = <$rebase_filehandle>;
        chomp;
        $site = $_;
        #--------------------------End of changes 2010-12-02 d5e5

        # Translate the recognition sites to regular expressions
        $regexp = IUB_to_regexp($site);

        # Store the data into the hash
        $rebase_hash{$name} = "$site $regexp";
    }

    # Return the hash containing the reformatted REBASE data
    return %rebase_hash;
}
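
Two edge cases may still produce the same kind of warning with the two-lines-per-record approach. If the file ends on a name with no site line after it, the second read returns undef. And if the site lines really contain "!" as in the sample quoted earlier, IUB_to_regexp only strips "^", so looking up "!" in its table would yield undef. The sketch below is one possible way to guard against both. The helper name read_name_site_pairs is hypothetical (it is not part of BeginPerlBioinfo), header skipping is left out, and treating both "^" and "!" as cut markers is an assumption based on that sample.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BeginPerlBioinfo): read alternating
# name/site lines from an open filehandle and return a name => site hash.
sub read_name_site_pairs {
    my ($fh) = @_;
    my %site_for;

    while ( defined( my $name = <$fh> ) ) {
        next if $name =~ /^\s*$/;        # skip blank lines
        chomp $name;

        my $site = <$fh>;
        last unless defined $site;       # name with no site line: stop quietly
        chomp $site;

        $site =~ s/[\^!]//g;             # drop cut markers (assumed to be ^ or !)
        $site_for{$name} = $site;
    }
    return %site_for;
}

# Demo on an in-memory file built from the sample lines quoted in this thread,
# plus a dangling name at the end to exercise the end-of-file check.
my $sample = "AanI\nTTA!TAA\nAarI\nCACCTGCNNNN!\nAasI\nGACNNNN!NNGTC\nAatI\n";
open( my $fh, '<', \$sample ) or die "Cannot open in-memory file: $!";
my %site_for = read_name_site_pairs($fh);
close $fh;

print "$_ => $site_for{$_}\n" for sort keys %site_for;
```

The cleaned site strings could then be passed to IUB_to_regexp as before; the dangling AatI entry is simply dropped rather than stored with an undefined site.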

Anthony Cameron -2 Light Poster · Answer 4 · 2010-12-03T02:24:58+00:00

That is brilliant, thank you very much.
