I have a .txt log file and most of it is crap. But some lines show when a user logs in and at what time they logged in. Below is a portion of the log file. For example, "user1" is a user logging in and "user2" is another user logging in. So far I have written a Python app that counts how many times each user logged in and when they logged in. I have also counted how many distinct users logged in during the day, and found the top three users.
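For reference, the per-user counts, the number of distinct users and the top three described above can also be computed with `collections.Counter`. This is a minimal sketch, assuming the user names have already been pulled out of the log; the `names` list is made-up sample data, not real log content:

```python
from collections import Counter

# Hypothetical sample: one entry per matched login line
names = ['user1', 'user2', 'user1', 'user3', 'user1', 'user2', 'user4']

logins = Counter(names)            # user -> number of logins
distinct_users = len(logins)       # how many different users logged in
top_three = logins.most_common(3)  # the three users with the most logins

print(logins['user1'])  # 3
print(distinct_users)   # 4
print(top_three)
```

`most_common(3)` already returns the entries sorted from most to least frequent, so there is no need to sort and slice by hand.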
However, I have not been able to figure out how to count how many users logged in during a three-hour time frame, for example from 12:00 to 15:00 and from 15:00 to 18:00. I tried a few things (the last part of my code below), but nothing really worked.
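One way to frame the three-hour question: map each login time to the start of its window (`hour - hour % 3`, so 12:xx and 14:xx both fall in 12:00-15:00, while 15:xx falls in 15:00-18:00), then collect the distinct user names per window in a set. A minimal sketch, assuming the logins are already available as `(user, datetime)` pairs; the `logins` sample and the `users_per_window` helper are hypothetical:

```python
from collections import defaultdict
from datetime import datetime


def users_per_window(logins, width=3):
    """Map 'HH:00-HH:00' labels to the set of distinct users seen in that window."""
    windows = defaultdict(set)
    for user, when in logins:
        start = when.hour - when.hour % width  # e.g. 14 -> 12, 15 -> 15
        label = '%02d:00-%02d:00' % (start, start + width)
        windows[label].add(user)  # a set counts each user once per window
    return windows


# Hypothetical sample data
logins = [
    ('user1', datetime(2008, 2, 1, 12, 5)),
    ('user2', datetime(2008, 2, 1, 14, 59)),
    ('user1', datetime(2008, 2, 1, 13, 30)),
    ('user3', datetime(2008, 2, 1, 15, 0)),
]

for label, users in sorted(users_per_window(logins).items()):
    print(label, len(users))
# 12:00-15:00 2
# 15:00-18:00 1
```

Using a set per window means a user who logs in several times within the same three hours is still counted once.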
Example of what the .txt log file looks like:
```
<IP SNIPPED> - user1 [01/Feb/2008:04:32:12 -0500] "GET /controller?method=getUser HTTP/1.0" 200 305
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /eagent.jnlp HTTP/1.1" 200 -
<IP SNIPPED>- - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jnlp HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /smack.jar HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:38 -0500] "HEAD /jh.jar HTTP/1.1" 200 -
<IP SNIPPED> - - [01/Feb/2008:04:57:39 -0500] "HEAD /images/DCI.gif HTTP/1.1" 200 -
<IP SNIPPED>- - noone [01/Feb/2008:04:57:40 -0500] "GET /controller?method=getNode&name=S14000068 HTTP/1.0" 200 499
<IP SNIPPED> - - [01/Feb/2008:04:57:40 -0500] "GET /help/helpset.hs HTTP/1.1" 200 547
<IP SNIPPED> - - [01/Feb/2008:04:57:43 -0500] "GET /help/map.jhm HTTP/1.1" 200 59650
<IP SNIPPED> - user2 [01/Feb/2008:00:19:16 -0500] "GET /controller?method=getUser HTTP/1.0" 200 307
```
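Each interesting line carries the user name just before the bracketed timestamp, and the timestamp follows the `%d/%b/%Y:%H:%M:%S %z` layout. A minimal sketch of turning one such line into a `(user, datetime)` pair with Python 3's `datetime.strptime`; the `parse_login` name is hypothetical, and lines with `- -` in the user position return `None`. Note that `%b` only matches English month abbreviations such as `Feb` under an English or C locale.

```python
import re
from datetime import datetime

# User name directly followed by the bracketed timestamp,
# e.g. 'user1 [01/Feb/2008:04:32:12 -0500]'
LOGIN_RE = re.compile(r'([A-Za-z0-9]+) \[([^\]]+)\]')


def parse_login(line):
    """Return (user, timezone-aware datetime) for a line with a user name, else None."""
    m = LOGIN_RE.search(line)
    if not m:
        return None
    when = datetime.strptime(m.group(2), '%d/%b/%Y:%H:%M:%S %z')
    return m.group(1), when


line = ('<IP SNIPPED> - user1 [01/Feb/2008:04:32:12 -0500] '
        '"GET /controller?method=getUser HTTP/1.0" 200 305')
user, when = parse_login(line)
print(user, when.isoformat())  # user1 2008-02-01T04:32:12-05:00
```

Keeping the `-0500` offset in the `datetime` means the times can later be compared or converted safely even if logs from different servers are mixed.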
Here is what I have done so far.
```python
import re
import time

fn = 'localhost_access_log.2008-02-01.txt'

# Use a regular expression to get the user name and timestamp from each line.
# Lines with "- -" in the user position don't match, because the name must be
# alphanumeric and directly followed by " [".
pattLog = re.compile(r'([a-zA-Z0-9]+) \[([^\]]+)\]')

with open(fn) as f:
    fileList = f.readlines()

logdict = {}  # user name -> list of timestamp strings
for item in fileList:
    m = pattLog.search(item)
    if m:
        logdict.setdefault(m.group(1), []).append(m.group(2))

# Count how many times each user logged in and when they logged in
for key in logdict:
    n = len(logdict[key])
    print('User %s logged in %d time%s:\n%s\n' %
          (key, n, 's' if n > 1 else '', '\n'.join(logdict[key])))

# Find the three users who logged in most often
freqList = [[len(logdict[key]), key] for key in logdict]
freqList.sort(reverse=True)
# [:3] keeps the top three ([1:4] would skip the most frequent user).
# To leave out an account such as "noone", filter it out by name instead.
print('The three most frequent users that logged in are: %s.' % (freqList[:3],))

# Count how many distinct users logged in today (one dict key per user)
count = len(logdict)
print('%s users logged in today' % count)

# This is where I try to find how many users logged in during a certain time frame
d1 = '01/Feb/2008:04:57:40 -0500'
d2 = '01/Feb/2008:15:57:40 -0500'

def time_comp(upper, lower, d):
    # upper and lower use the format %H:%M:%S
    tu = time.strptime(upper, '%H:%M:%S')
    tl = time.strptime(lower, '%H:%M:%S')
    # keep only the clock time of d: '01/Feb/2008:04:57:40 -0500' -> '04:57:40'
    tm = time.strptime(d.split()[0].split(':', 1)[1], '%H:%M:%S')
    return tl <= tm <= tu

print(time_comp('16:00:00', '10:00:00', d1))  # False: 04:57:40 is before 10:00
print(time_comp('16:00:00', '10:00:00', d2))  # True: 15:57:40 is within 10:00-16:00

for d in (d1, d2):
    if time_comp('16:00:00', '10:00:00', d):
        print('User logged in during the target time.')
    else:
        print('Out of range')
```
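To go from a single `time_comp` check to an actual count, the same kind of comparison can be applied to every timestamp in `logdict`, counting a user once if any of their logins falls in the range. A minimal sketch, assuming `logdict` has the shape built above (user name mapped to a list of timestamp strings); the sample dict and the `login_time` / `users_in_range` helpers are hypothetical. Like `time_comp`, it ignores the `-0500` offset and compares clock times only, but it treats each range as half-open (lower <= time < upper) so that a login at exactly 15:00:00 counts toward 15:00-18:00 and not also toward 12:00-15:00:

```python
import datetime


def login_time(stamp):
    """'01/Feb/2008:04:57:40 -0500' -> datetime.time(4, 57, 40); the offset is ignored."""
    return datetime.datetime.strptime(stamp.split()[0], '%d/%b/%Y:%H:%M:%S').time()


def users_in_range(logdict, lower, upper):
    """Distinct users with at least one login where lower <= time < upper."""
    return {user for user, stamps in logdict.items()
            if any(lower <= login_time(s) < upper for s in stamps)}


# Hypothetical data in the same shape as logdict
logdict = {
    'user1': ['01/Feb/2008:04:32:12 -0500', '01/Feb/2008:12:10:00 -0500'],
    'user2': ['01/Feb/2008:00:19:16 -0500'],
    'user3': ['01/Feb/2008:15:00:00 -0500', '01/Feb/2008:16:45:03 -0500'],
}

ranges = [(datetime.time(12), datetime.time(15)),
          (datetime.time(15), datetime.time(18))]
for lower, upper in ranges:
    users = users_in_range(logdict, lower, upper)
    print('%s-%s: %d user(s)' % (lower.strftime('%H:%M'), upper.strftime('%H:%M'), len(users)))
# 12:00-15:00: 1 user(s)
# 15:00-18:00: 1 user(s)
```

Because the result is a set of names rather than a number, it can also be printed to see *which* users were active in each window.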