Hello, I want to parse a file that contains log entries like the following:
1183245991.961 0.079 137.157.56.34 200 1277 GET http://linux.pacific.net.au/linux/packman/suse/10.2/repodata/repomd.xml "text/xml"
1183327698.250 2.568 137.157.56.212 200 57891 GET http://csc3-2004-crl.verisign.com/CSC3-2004.crl "application/pkix-crl"
1183328737.107 0.570 137.157.56.223 301 777 GET http://www.starnet.com/expiredialog/demo.php?product_name=xwin32&time_remain=1800&version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid={dfe167ce-f51c-4fd4-af80-25f31246f6bc} "text/html"
1183328737.908 0.781 137.157.56.139 200 5696 GET http://www.starnet.com/expiredialog/demo.php/xwin32/1800?version=8.0.2216&locale=en_AU&w32iso639ulang=en&uuid=%7bdfe167ce-f51c-4fd4-af80-25f31246f6bc%7d "text/html"
1183328738.777 0.759 137.157.56.91 200 3726 GET http://www.starnet.com/css/starnet.css "text/css"
1183679515.608 0.059 137.157.56.51 200 520 GET http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif "image/gif"
My main concern is extracting the FQDN from the URL in each entry. I only want the hosts that begin with www.; however, I am also getting other http:// hosts, such as image servers. An example of correct output would be:
http://www.carsales.com
http://www.sensis.com.cn
http://www.smh.com.au
But I am getting entries like images.google.com as well.
Below is my code:
@FQDN = split(/\./, $entry[6]);   # Grab the domain name (e.g. aol.com, attbi.com)
$length = 1;
$temp = $FQDN[$length] . "." . $FQDN[$length + 1];
$Domain{$temp}++;
Please advise.
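
For reference, here is a minimal sketch of the behaviour I am after. It assumes the URL is the seventh whitespace-separated field ($entry[6]), as in the entries above. The helper name www_site and the @lines array are made up for illustration, and the regex is only a rough host matcher, not a full URL parser. It takes the host part of the URL first and only then checks for the www. prefix, instead of splitting the whole URL on dots.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "scheme://host" when the host starts with
# "www.", and nothing (undef in scalar context) otherwise.
sub www_site {
    my ($url) = @_;
    # Capture the scheme and the host: the host is everything after "://"
    # up to the first "/", ":", "?" or "#".
    my ($scheme, $host) = $url =~ m{^(\w+)://([^/:?#]+)} or return;
    return unless $host =~ /^www\./i;   # keep www. hosts only
    return lc "$scheme://$host";
}

# Three of the sample entries from the log above.
my @lines = (
    '1183245991.961 0.079 137.157.56.34 200 1277 GET http://linux.pacific.net.au/linux/packman/suse/10.2/repodata/repomd.xml "text/xml"',
    '1183328738.777 0.759 137.157.56.91 200 3726 GET http://www.starnet.com/css/starnet.css "text/css"',
    '1183679515.608 0.059 137.157.56.51 200 520 GET http://fdimages.fairfax.com.au/crtvs/bulletpoint.gif "image/gif"',
);

my %Domain;
for my $line (@lines) {
    my @entry = split ' ', $line;           # split on whitespace
    next unless @entry >= 7;                # skip short or malformed lines
    my $site = www_site($entry[6]) or next; # skip non-www hosts
    $Domain{$site}++;
}

# Print each www. site with its hit count.
print "$_ $Domain{$_}\n" for sort keys %Domain;
```

With the three entries above this prints only "http://www.starnet.com 1"; the linux.pacific.net.au and fdimages.fairfax.com.au entries are skipped.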