I have a script that parses and validates all the fields in an input weblog file.
The first field is ip_address, which currently holds a value like 12.45.24.245.
The requirement has changed: ip_address can now also be a dummy value such as $10.00, $23.123.34. or $12.233.
How should I change my regular expression so that it handles both kinds of value?
Code:
#!/usr/bin/perl -w
use strict;
while (<DATA>) {
next unless m|^ # skip lines that do not match, otherwise $1..$11 keep stale values
(\d+\.\d+\.\d+\.\d+)? # capture clientip
\s # followed by space
([\w-]+)\s # capture '-' or their membership id
\[(\d{1,2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) # then the date
\s\+\d{4}\]\s" # the ' +0100] "' ready for the method on the next line
(\w{3,4})\s # the HTTP method
(\/.*?)\s # The request
(\w{4}\/\d\.\d)"\s # the protocol
(\d{3})\s([\d-]+?)\s" # status & content length
(.+?)"\s" # referer
(.*?)"\s" # useragent will need post processing
(.+?)" # All cookie string, will need post processing
|x;
my $cookies = cookieStringCleaner($11);
my ($persistant, $session);
foreach my $loopvar (@$cookies) {
if ($loopvar =~ /^eBizDAn/i) {
$persistant = $loopvar;
}
elsif ($loopvar =~ /^eBizCo/i) {
$session = $loopvar;
}
}
print "\n\n\nLINE: $.\nIP: $1\nMEMBER: $2\nDATE: $3\nMETHOD: $4
\nREQUEST: $5\nPROTOCOL: $6\nSTATUS: $7\nCONLEN: $8\nREFERER:$9\nAGENT:
$10\nCOOKIE Persist: $persistant\nCOOKIE Session: $session";
}
#
# SUBROUTINES
#
sub cookieStringCleaner { # no empty prototype: the sub takes one argument
my $cookieString = shift;
# clean up the data a bit, remove spaces and '-'
# the '-' is an error by (other language) random num generator.
# taking it out will make lookups easier as they will just be a number
$cookieString =~ tr/ //d;
$cookieString =~ tr/-//d;
my @cookies = split(/;/, $cookieString);
return \@cookies;
}
I tried replacing (\d+\.\d+\.\d+\.\d+)? with something like (\$.*|\d+\.\d+\.\d+\.\d+)?, but for $10.00 this captures two extra fields, so the value returned is "$10.00 - -".
Please suggest a fix. Thanks in advance.
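
One possible direction, as a minimal sketch rather than a tested fix for the full log format: limit the dummy value to a dollar sign followed by digits and dots, \$[\d.]+, instead of \$.*, so that it cannot run past the first space. The helper name first_field and the sample lines below are made up for illustration, and only the first field is checked. Note also that the script above uses | as the regex delimiter, so adding an alternation there would probably be simpler with a different delimiter such as m{ ... }x.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# First-field pattern: either a '$' dummy value or a dotted-quad IP.
# [\d.]+ can only consume digits and dots, so it stops at the first space.
my $ip_or_dummy = qr/(\$[\d.]+|\d+\.\d+\.\d+\.\d+)/;

# Hypothetical helper: return the first field of a log line, or undef
# when the line starts with neither form.
sub first_field {
    my ($line) = @_;
    return $line =~ /^$ip_or_dummy\s/ ? $1 : undef;
}

# Made-up sample lines; only the leading field matters here.
my @samples = (
    '12.45.24.245 - [rest of line]',
    '$10.00 - [rest of line]',
    '$23.123.34. - [rest of line]',
    '$12.233. - [rest of line]',
);

for my $line (@samples) {
    my $field = first_field($line);
    print defined $field ? "$field\n" : "no match\n";
}
```

If this behaves as intended, the same group could replace the clientip capture in the main pattern, leaving the remaining fields unchanged.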