Hello, I want to parse a file that contains the following log entries:

1183245991.961 0.079 137.157.56.34 200 1277 GET http://linux.pacific.net.au/linux/packman/suse/10.2/repodata/repomd.xml "text/xml"
1183327698.250 2.568 137.157.56.212 200 57891 GET http://csc3-2004-crl.verisign.com/CSC3-2004.crl "application/pkix-crl"
1183328737.107 0.570 137.157.56.223 301 777 GET http://www.starnet.com/expiredialog/demo.php?product_name=xwin32&time_remain=1800&version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid={dfe167ce-f51c-4fd4-af80-25f31246f6bc} "text/html"
1183328737.908 0.781 137.157.56.139 200 5696 GET http://www.starnet.com/expiredialog/demo.php/xwin32/1800?version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid=%7bdfe167ce-f51c-4fd4-af80-25f31246f6bc%7d "text/html"
1183328738.777 0.759 137.157.56.91 200 3726 GET http://www.starnet.com/css/starnet.css "text/css"
1183679515.608 0.059 137.157.56.51 200 520 GET http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif "image/gif"

Basically, I need to extract the FQDN from each URL. I only want the hosts that begin with www., but I'm also getting other http:// hosts, such as image servers. An example of correct output would be:

http://www.carsales.com
http://www.sensis.com.cn
http://www.smh.com.au

But I am getting entries like images.google.com as well.

Here is my code:

@FQDN = split (/\./, $entry[6]);    # Grab the domain name (i.e. aol.com, attbi.com)
$length = 1;
$temp = $FQDN[$length] . "." . $FQDN[$length+1];
$Domain{$temp}++;

Please advise.

Hello Watery!

Why not just use a simple regex to match the pattern you're looking for? This one works for me, but you might be able to simplify it even more:

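use strict;
use warnings;

# Three-argument open with a lexical filehandle
open my $fh, '<', 'sample.txt' or die "Cannot open sample.txt: $!";

while (my $line = <$fh>) {

   # Capture the scheme and host of any URL whose host starts with "www."
   if ( $line =~ m{(http://www\.[^/\s]+)} ) {
      print "$1\n";
   }

}

close $fh;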

With your sample text, my output looks like:


http://www.starnet.com
http://www.starnet.com
http://www.starnet.com
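
If you also want the per-host counts that your %Domain hash was collecting, something along these lines might work. Treat it as a sketch only: www_host is a made-up helper name, and it assumes the URL is always the seventh whitespace-separated field and always uses plain http://. Anchoring the pattern at the start of the URL field keeps a "www." inside a query string from matching by accident.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original posts): returns the scheme and
# host, e.g. "http://www.example.com", when the URL's host starts with "www.",
# and undef otherwise.
sub www_host {
    my ($line) = @_;

    # Fields are whitespace-separated; the URL is assumed to be field index 6.
    my @entry = split ' ', $line;
    return unless defined $entry[6];

    # Stop at the first "/", ":", "?", "#" or whitespace after the host.
    return $entry[6] =~ m{^(http://www\.[^/:?\#\s]+)} ? $1 : undef;
}

my %Domain;    # host => number of matching log lines

# Three lines taken from the sample log in the question.
my @sample = (
    '1183245991.961 0.079 137.157.56.34 200 1277 GET http://linux.pacific.net.au/linux/packman/suse/10.2/repodata/repomd.xml "text/xml"',
    '1183328738.777 0.759 137.157.56.91 200 3726 GET http://www.starnet.com/css/starnet.css "text/css"',
    '1183679515.608 0.059 137.157.56.51 200 520 GET http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif "image/gif"',
);

for my $line (@sample) {
    my $host = www_host($line);
    $Domain{$host}++ if defined $host;
}

# Print each www. host with its count, sorted by host name.
print "$_ $Domain{$_}\n" for sort keys %Domain;
```

For those three lines it prints "http://www.starnet.com 1". To run it against the real log, you would replace the @sample loop with a while loop over the filehandle.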


I hope this helps!
