Confounding Regular Expression Problem

Question

muppetjones 0 Newbie Poster

16 Years Ago

I'm still kinda new at this, so please bear with me, but I have tried debugging the program (perl -d) and using Carp as well. I just can't seem to isolate the problem.

I'm using a regex to search through a series of sequences read in with BioPerl (this is not the issue). The program reads in the sequences without a problem, and with a smaller set of test sequences everything works.

I'm getting a panic: malloc thrown in the middle of the regex match, but I can't figure it out for the life of me. I know what the perl manual says it means, but I'm not seeing what is causing that in my program.

Any help would be greatly appreciated!!

This is the command line output when the error is thrown:

[06:52:41 ~/symmetry 191] $ perl symmetryfinder.pl

Files will be saved to <sf_exons/2008-07-07_18-52-43/>
Creating Regex...
        done.
Getting sequences from <exons>...
        done.
Loooping through sequences...
panic: malloc at symmetryfinder.pl line 183.
Searching through sequence 0[06:52:58 ~/symmetry 192] $

This is the code segment where the error occurs:
The regular expression is applied on line 21 of this segment (the line beginning with @matches).

#---------------------------------------
#-- Get Sequences from File ------------ 
#---------------------------------------
print 'Getting sequences from <'.$filename.">...\n";
@sequences = file2seqs($filename);
print "\tdone.\n";

#$sequences[0] = 'AGGCCTAAACTGAAATAGTTTAG';
#$sequences[1] = 'CTAAACTTG';

print "\n".$sequences[0]->seq()."\n";
print "\n".$sequences[1]->seq()."\n";

#---------------------------------------
#-- Read Sequences for Symmetry --------
#---------------------------------------
print "Loooping through sequences...\n\t";
foreach my $seq (@sequences) { # each sequence object is stored in $seq
    
    print "\rSearching through sequence ".$tempcounter++;
    @matches = ($seq->seq() =~ /$regex/g) or carp();

} print "\n";

This is my regular expression:

$regex = qr/
    ##### Find the sequence
    ([agct]) # initial nucleotide
    (?=([agct]{5}
        )
     )
    
    ##### Add sequence to mer list and increment
    (?{
        # Clear our buffers
        $temp = '';
        @swaptemp = ();
        $swapstring = '';

        # Create mer and mer compliment
        $temp = $1.$2;

        @swaptemp = split('',$temp); # split our seq into an array
        while(@swaptemp) { # loop through the array
            $swapstring .= $swap{ pop(@swaptemp) }; # symmetry swap
        }
        
        # Check for previous existence
        if(!exists $merlist{$temp}) {
            # Declare & Initialize
            $merlist{$temp} = 0;
            $merlist{$swapstring} = 0;
         
            # Save swaps (for faster search later)
            $merswap{$temp} = $swapstring;
            $merswap{$swapstring} = $temp;
        }

        # increment our mer count
        $merlist{$temp}++;
    })

    /xi;

perl regex


All 4 Replies

KevinADC 192 Practically a Posting Shark

16 Years Ago

It appears you are just running out of memory. Where the error occurs is here:

symmetryfinder.pl line 183

Your regexp does not appear to be valid. I don't think you can run code inside a regexp like that. Maybe it is something new with perl 6 I am not aware of.


muppetjones 0 Newbie Poster · Answer 1 · 2008-07-08T20:09:10+00:00

KevinADC wrote: "It appears you are just running out of memory. Where the error occurs is here: symmetryfinder.pl line 183. Your regexp does not appear to be valid. I don't think you can run code inside a regexp like that. Maybe it is something new with perl 6 I am not aware of."

The problem is that line 183 is just the regex match, and the debugger doesn't handle it well (it doesn't display the embedded code in the line-by-line step-through), so I can't tell where the problem is.

Memory is the direct issue; whether I'm actually running out is the question. According to perldiag, panic: malloc means something requested a negative number of bytes from malloc. Why and where that request is made is my issue.

When run with the following lines instead of from a file, it works just fine. So, I'm not quite sure whether it's BioPerl or something I'm doing.

#$sequences[0] = 'AGGCCTAAACTGAAATAGTTTAG';
#$sequences[1] = 'CTAAACTTG';

As far as the regex is concerned, it is perfectly valid to have code inside. The /x modifier (used like /i or /g) allows whitespace and comments inside the regex, which keeps embedded code readable. Code is added with (?{ <code> }) or (??{ <code> }). The former simply runs the code during matching; the latter runs the code and inserts its result into the regex as a pattern.
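As a minimal sketch of the difference between the two constructs, assuming a reasonably recent Perl 5 (the variable $count, the helper counted_run, and the test strings are made up for illustration and do not come from symmetryfinder.pl):

```perl
use strict;
use warnings;

# Hypothetical counter, declared with `our` so the embedded code block
# sees a package variable rather than a closed-over lexical.
our $count = 0;

# (?{ ... }) runs the code each time the engine passes that point;
# here that is once per 'a' consumed.
'aaa' =~ /^(?:a(?{ $count++ }))*$/;

# (??{ ... }) runs the code and uses its return value as a sub-pattern:
# the leading number decides how many 'x' characters must follow.
sub counted_run {
    my ($string) = @_;
    return $string =~ /^(\d+)(??{ 'x{' . $1 . '}' })$/ ? 1 : 0;
}

print "count = $count\n";
print "3xxx matches: ", counted_run('3xxx'), "\n";
print "3xx matches:  ", counted_run('3xx'),  "\n";
```

Neither pattern needs /x; the modifier only matters for laying the code out over several lines.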

Any other help would be greatly appreciated. I'm going to keep trucking away at it.

muppetjones 0 Newbie Poster · Answer 2 · 2008-07-08T22:17:00+00:00

So I've further isolated the problem: I believe the split is causing the trouble, and I assume it's because split itself uses the regex engine.

Line numbers here refer to the snippet below.
Line 6 causes problems for some reason, even though all it does is clear $temp to make sure nothing is in there.
Line 14 causes a panic: malloc even when line 12 is used instead of line 11, but line 15 does not cause a problem either way.
Lines 16-18 cause no problem so long as valid variables are involved.

So there you have it: it appears you cannot use split inside a regular expression, even within a proper (?{ }) code block. Does anyone know why this might be? Or perhaps a workaround?

I think I'll try just capturing each character individually and swapping that, but if anyone has a solution to the problem, I'd appreciate it!
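One possible workaround for the split problem described above is to keep the regex free of embedded code and do the bookkeeping in an ordinary loop. The sketch below is not the original program: the sub name count_mers is hypothetical, it uppercases the input instead of using /i, and it assumes %swap was a plain complement table (A<->T, G<->C), which tr/// can replace.

```perl
use strict;
use warnings;

# Count overlapping k-mers and record each one's reverse complement.
# Returns two hash references: the counts and the mer <-> complement map.
sub count_mers {
    my ($sequence, $k) = @_;
    my (%merlist, %merswap);
    $sequence = uc $sequence;

    # The lookahead captures $k bases without consuming them, so the
    # /g loop advances one position at a time, as the original regex did.
    while ($sequence =~ /(?=([ACGT]{$k}))/g) {
        my $mer = $1;

        # Reverse complement without split or a lookup hash.
        my $swapstring = scalar reverse $mer;
        $swapstring =~ tr/ACGT/TGCA/;

        if (!exists $merlist{$mer}) {
            $merlist{$mer}        = 0;
            $merlist{$swapstring} = 0;
            $merswap{$mer}        = $swapstring;
            $merswap{$swapstring} = $mer;
        }
        $merlist{$mer}++;
    }
    return (\%merlist, \%merswap);
}

my ($counts, $swaps) = count_mers('AGGCCTAAACTGAAATAGTTTAG', 6);
printf "%s (%s): %d\n", $_, $swaps->{$_}, $counts->{$_}
    for sort keys %$counts;
```

Because all the work happens outside the pattern, the debugger can step through it line by line, which the embedded version did not allow.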

$regex = qr/
    ([AGCT])
    (?=([AGCT]{5}))

    (?{
        #$temp = '';
        @swaptemp = ();
        $swapstring = '';
        
        # Create mer and mer compliment
        $temp = $1.$2;
        #$temp = 'AGCTAG';
        
        @swaptemp = split("",$temp); # split our seq into an array
#        @swaptemp = qw( A G T C C C );
        while(@swaptemp) { # loop through the array
            $swapstring .= $swap{ pop(@swaptemp) } ; # symmetry swap
        }

.....

/xi;

KevinADC 192 Practically a Posting Shark · Answer 3 · 2008-07-08T23:03:38+00:00

My bad about the code in the regexp. That was something new for me. I looked it up and sure enough:

A bit of magic: executing Perl code in a regular expression
Normally, regexps are a part of Perl expressions. Code evaluation expressions turn that around by allowing arbitrary Perl code to be a part of a regexp. A code evaluation expression is denoted (?{code}), with code a string of Perl statements.
Be warned that this feature is considered experimental, and may be changed without notice.

Note the last sentence in the above quote from perldoc. The feature is considered experimental, which basically means use it at your own risk and it may not even work properly. The quote is taken from the Perl 5.10 regexp tutorial posted on the perldoc website.

One change I would recommend is to replace this inefficient loop:

while(@swaptemp) { # loop through the array
   $swapstring .= $swap{ pop(@swaptemp) } ; # symmetry swap
}

this should work better:

for (reverse @swaptemp) { # loop through the array backwards
   $swapstring .=  $swap{ $_} ; # symmetry swap
}

or maybe:

for ( @swaptemp) { # loop through the array
   $swapstring .=  $swap{ $_}; # symmetry swap
}
$swapstring = reverse $swapstring; # reverse the string
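To check that the three loop variants build the same string, here is a small sketch. The %swap table (a plain complement map) and the sample mer are assumptions, since the thread never shows how %swap is defined:

```perl
use strict;
use warnings;

# Assumed complement table; the real %swap is not shown in the thread.
my %swap  = (A => 'T', T => 'A', G => 'C', C => 'G');
my @chars = split //, 'AGCTAG';

# Variant 1: the original pop loop (it empties its array, so use a copy).
my @copy     = @chars;
my $from_pop = '';
while (@copy) {
    $from_pop .= $swap{ pop @copy };
}

# Variant 2: walk the array backwards.
my $from_reverse_list = '';
$from_reverse_list .= $swap{$_} for reverse @chars;

# Variant 3: swap in order, then reverse the finished string.
my $from_reverse_string = join '', map { $swap{$_} } @chars;
$from_reverse_string = reverse $from_reverse_string;

print "$from_pop $from_reverse_list $from_reverse_string\n";
```

All three should print the same reverse complement, so the choice between them is about readability and speed, not about the result.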

As far as the other problem(s), I really have no idea. Ask on the BioPerl website or maybe someone on www.perlmonks.com will be able to help.
