Hi all,

I've started to get acquainted with Perl's HTML::Parser, which does a really good job at parsing HTML (duh ;-)
But for the life of me, I can't get it to return the actual text from the subroutine! I keep getting a reference instead :\ :\

use HTML::Parser;
package MyParser;

sub start { 
    my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
    if ($tagname eq 'img') {
        my $Title=$attr->{ title };
        # now this works well, and actually prints the value and not the reference!
        print "Image title found: ", $attr->{ title }, "\n";
        return($Title);
    } else {return();}
}  
my $parser = MyParser->new;
my $html="<img src='blah.jpg' title='hi!'>";
my $title;
$title= $parser -> parse($partial_html);
print "title is: $title\n"

# ------ OUTPUT -----
# Image title found: hi!
# title is: MyParser=HASH(0x9d2f40)

Can anyone explain to me what I am doing wrong? (Or how to correctly return values from the subroutine?)

Thanks a bunch! Blessed be those who answer! :)
-FH

hi,

> Can anyone explain to me what I am doing wrong? (Or how to correctly return values from the subroutine?)

What you are doing wrong is that:

  1. You are not subclassing HTML::Parser, so running the code as you posted it should produce this error:
    Can't locate object method "new" via package "MyParser" at ... line 13.
    That is because there is no "new" in your class MyParser. If you had subclassed (inherited from) HTML::Parser, you would have had access to its "new" method.

  2. Using warnings and strict is good practice in Perl. With them enabled, you would have seen that there is no variable called $partial_html, from this message:
    Global symbol "$partial_html" requires explicit package name at trash.pl line 19.
    Execution of trash.pl aborted due to compilation errors.

Lastly, the HTML::Parser documentation says this about the parse method: "If an invoked event handler aborts parsing by calling $p->eof, then $p->parse() will return a FALSE value. Otherwise the return value is a reference to the parser object ($p)."

That is why you are getting

title is: MyParser=HASH(0x9d2f40)

which is a reference to the parser object, not the value returned by your start handler.
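
To see that return value for yourself, here is a minimal sketch. It assumes HTML::Parser is installed and uses a bare parser with no handlers (api_version => 3), which is my own setup for illustration and not part of the original code:

```perl
use strict;
use warnings;
use HTML::Parser;
use Scalar::Util qw(refaddr);

# A bare parser with no handlers: it just consumes the input.
my $p   = HTML::Parser->new( api_version => 3 );
my $ret = $p->parse("<img src='blah.jpg' title='hi!'>");

# Equal addresses mean parse() handed back the parser object itself,
# not anything a handler might have returned.
my $same = ( ref $ret && refaddr($ret) == refaddr($p) ) ? 1 : 0;
print "parse() returned the parser itself: ", ( $same ? "yes" : "no" ), "\n";

# Tell the parser the document is finished.
$p->eof;
```

Whatever a handler returns is simply discarded, so the value has to be passed out some other way.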

The first issue can be resolved by subclassing HTML::Parser. For the last one, store the value in a global variable that you can read after parsing.

The following should work:

use warnings;
use strict;

package MyParser;
use base qw(HTML::Parser);

my $title;

sub start {
    my ( $self, $tagname, $attr, $attrseq, $origtext ) = @_;
    if ( $tagname eq 'img' ) {
        $title = $attr->{title};

        # now this works well, and actually prints the value and not the reference!
        print "Image title found: ", $attr->{title}, "\n";
    }
}
my $parser = MyParser->new;
my $html   = "<img src='blah.jpg' title='hi!'>";

$parser->parse($html);
$parser->eof;    # signal end of document so any buffered input is flushed
print "title is: $title\n";
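
If you would rather avoid the global, one alternative is to keep the value on the parser object itself. This is only a sketch: the img_titles hash key and the img_titles accessor are names I made up for illustration. It relies on parser objects being plain hashes in which HTML::Parser reserves only keys starting with "_hparser":

```perl
use strict;
use warnings;

package MyParser;
use base qw(HTML::Parser);

# Called for every start tag (same v2-style callback as above).
sub start {
    my ( $self, $tagname, $attr ) = @_;
    return unless $tagname eq 'img' && defined $attr->{title};

    # img_titles is a hypothetical key of our own, not part of HTML::Parser.
    push @{ $self->{img_titles} }, $attr->{title};
}

# Hypothetical accessor: returns every title collected so far.
sub img_titles {
    my ($self) = @_;
    return @{ $self->{img_titles} || [] };
}

package main;

my $parser = MyParser->new;
$parser->parse("<img src='blah.jpg' title='hi!'>");
$parser->eof;

my @titles = $parser->img_titles;
print "title is: $_\n" for @titles;
```

This way each parser carries its own results, so two parsers in the same script do not overwrite each other's $title.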

Hi 2teez!

Of course you are correct. In order to provide a code snippet as short as possible, I removed some stuff (like strict and warnings). I did not notice that I had also erased several important lines.

Thanks a bunch for all the corrections! Working with a global variable did the trick! :)

Best,
-FH
