Hey,

I have used the LWP::Simple module and saved the source of a website in a file. I am trying to extract all the data between the <head> tags and pass it to a variable to process.

So far I can't seem to extract the data properly. Any suggestions?

my $data = getstore("http://www.google.com/", "website.txt");
unless(is_success($data)){
	die "Could not retrieve website: $data";
}
open(PAGE, "website.txt") or die "$!";
my @info = <PAGE>;
close(PAGE);
my @meta;
my $i = 0;
my $stuff;
foreach $stuff(@info){
	$meta[$i] = ($stuff =~ m/<head(.*?)<\/head>/);
	$i++;
}
$i = 0;
foreach $_ (@meta){
	#print $meta[$i];
	print $_;
}

Thanks

UPDATE:

I managed to get the data into one string and now I am trying to match with a regular expression.

I am having trouble with the regular expression.

my $d = ($s =~ m/<head>(.*)<\/head>/);

$s is the scalar holding the whole page as one string. I want to extract the contents of the head tag from $s and assign them to $d.

Try:

my ($d) = $s =~ m/<head>(.*?)<\/head>/is;
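The parentheses around $d matter here: they put the match in list context, so it returns the captured text instead of just true or false. A minimal sketch of the difference, assuming a small made-up page in $s (the sample HTML and the $matched variable are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample page, standing in for the downloaded source in $s.
my $s = "<html>\n<HEAD>\n<title>Example</title>\n</HEAD>\n<body></body>\n</html>\n";

# Scalar context: the match only reports success (1) or failure (empty string).
my $matched = ($s =~ m/<head>(.*?)<\/head>/is);

# List context: the parentheses around $d make the match return its captures.
my ($d) = $s =~ m/<head>(.*?)<\/head>/is;

print "scalar context gives: $matched\n";
print "list context gives: $d\n";
```

In the scalar version $matched only tells you whether the pattern matched, which is why the earlier attempt never held the head contents.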

Thanks, that worked great.

What exactly does /is do? Also, one more question: if I try to extract the meta tags and place each one individually in an array, will it work? Assume there may be one or more meta tags inside.

Something like this:
my (@m) = $s =~ m/<meta (.*?)>/is;

This is what I came up with, but it seems to repeat the first match rather than checking for the next meta tag.

my (@m) = $d =~ m/(<meta (.*?)>){1,5}/is;

my (@m) = $s =~ m/<meta (.*?)>/gis;

You can look up the regexp modifiers in any regexp tutorial.

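i - case-insensitive matching
s - treats the string as a single line, so . also matches newlines and a match can span several lines
g - global match, finds every match in the string instead of stopping at the first; in list context it returns all of them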
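To see what each modifier changes, here is a small sketch that runs the same pattern three ways. The sample string and the array names (@plain, @first, @all) are made up for the example; it assumes one meta tag written in upper case and one split across two lines:

```perl
use strict;
use warnings;

# Hypothetical head contents: one upper-case tag, one tag split over two lines.
my $s = "<META name=\"a\" content=\"1\">\n<meta name=\"b\"\n content=\"2\">\n";

# No modifiers: case-sensitive, and . does not cross the newline.
my @plain = $s =~ m/<meta (.*?)>/;

# /is: ignores case and lets . match newlines, but stops after the first match.
my @first = $s =~ m/<meta (.*?)>/is;

# /gis: same as above, but returns the capture from every match.
my @all = $s =~ m/<meta (.*?)>/gis;

print "no modifiers: ", scalar(@plain), " match(es)\n";
print "/is:          ", scalar(@first), " match(es)\n";
print "/gis:         ", scalar(@all), " match(es)\n";
print "  [$_]\n" for @all;
```

Without /g the list only ever holds the captures from the first match, which is why the array did not fill up with the later meta tags.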
