Hello all,

I'm an information security professional who's decided to teach myself Python. For my first project I wanted to make something I would actually find useful, so I've developed a small program for log file parsing, and amazingly, after some trial and error, it works :) However, I've reached an impasse, and I'm not yet familiar enough with fancy output formatting to figure it out, so I've come asking for advice:

Essentially, what I have so far is some code that prompts the user for the paths to two separate files: the log file to examine, and a 'signature file', which is just a text file where each line is a regular expression for a possibly malicious log signature, followed by a commented-out description of what that regular expression represents. The program compiles the entire contents of the signature file, ignoring the commented portion, into one large regular expression (basically I just did '|'.join(lines)). It then compares the resulting piped regular expression against each line in the log file, increments a count whenever it gets a match, prints each matching line to the screen, and finally prints the total count.
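For reference, here is a minimal sketch of the approach described above, assuming each signature line has the form "regex, whitespace, # description". The helper names (load_patterns, count_total_matches), the use of re.IGNORECASE, and the sample signatures and log lines are all hypothetical; in the real program the lines would come from the two files the user supplies.

```python
import re

def load_patterns(sig_lines):
    """Return the regex part of each signature line, dropping the '# ...' comment.

    Assumes each line looks like: <regex><whitespace># <description>
    """
    patterns = []
    for line in sig_lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and whole-line comments
        # Keep only the text before the first run of whitespace followed by '#'
        patterns.append(re.split(r'\s+#', line, maxsplit=1)[0])
    return patterns

def count_total_matches(sig_lines, log_lines):
    """Join all signatures into one alternation and count matching log lines."""
    combined = re.compile('|'.join(load_patterns(sig_lines)), re.IGNORECASE)
    total = 0
    for line in log_lines:
        if combined.search(line):
            print(line.rstrip())  # show the matching log line
            total += 1
    return total

# Hypothetical sample data standing in for the two files
signatures = [
    r"\.\./\.\./        # Directory traversal",
    r"union\s+select    # SQL injection",
]
log = [
    "GET /index.html HTTP/1.1",
    "GET /../../etc/passwd HTTP/1.1",
    "GET /item?id=1 UNION SELECT password FROM users HTTP/1.1",
]
print('Total matches: %d' % count_total_matches(signatures, log))
```

Because the joined alternation only tells you that some signature matched, this structure can produce a grand total but not a per-signature breakdown.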

I thought it was pretty cool for a first program, but I'd like to take it a step further, as I'm finding that the program spewing out lines of the log file is rather ugly. What I'd like to do is add some functionality where it still does the regular expression comparison, but instead of printing out each line, it shows one column with the commented description and a second column with the number of times that particular expression was found. I've been playing around with things like str.find and str.rjust, but I'm kind of lost.
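For the two-column layout, str.ljust and str.rjust are enough once each column has a fixed width. A minimal sketch, assuming the counts are already available as (description, count) pairs; the function name format_report and the column widths are arbitrary choices for illustration:

```python
def format_report(counts, desc_width=30, count_width=25):
    """Build aligned report lines from (description, count) pairs."""
    # Header: left-justify the first column, right-justify the second
    lines = ['--Signatures Found--'.ljust(desc_width) +
             '--Number of Matches--'.rjust(count_width)]
    for description, count in counts:
        # ljust pads on the right, rjust pads on the left, so the counts line up
        lines.append(description.ljust(desc_width) + str(count).rjust(count_width))
    return lines

# Hypothetical counts, just to show the layout
for row in format_report([('Directory traversal', 12), ('SQL injection', 3)]):
    print(row)
```

The same layout can also be written with format specs, e.g. '{:<30}{:>25}'.format(description, count).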

I'm not sure how to get it to count the matches for each individual regular expression rather than the total count. Nor can I figure out how to get it to ignore the comment while doing the search, but then print in a column the exact part I told it to ignore if it does find a match. For example, say I was checking a web server's log file against a signature file containing regular expressions for known attack signatures. I'm trying to get the output to look something like this:

--Signatures Found--      --Number of Matches--
Malicious Signature #1                      999
Malicious Signature #2                      999
Malicious Signature #3                      999
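One way to get a count per signature is to stop joining everything into one big pattern and instead compile each signature separately, keeping its description alongside it, then tally matches in a collections.Counter keyed by description. A minimal sketch, assuming the signature file has already been split into (regex, description) pairs; count_per_signature and the sample data are hypothetical:

```python
import re
from collections import Counter

def count_per_signature(signatures, log_lines):
    """Count how many log lines match each signature separately.

    `signatures` is assumed to be a list of (regex, description) pairs.
    """
    compiled = [(re.compile(pattern, re.IGNORECASE), desc)
                for pattern, desc in signatures]
    counts = Counter()
    for line in log_lines:
        for regex, desc in compiled:
            if regex.search(line):
                counts[desc] += 1  # one line may count toward several signatures
    return counts

# Hypothetical signatures and log lines
signatures = [
    (r"\.\./", "Directory traversal"),
    (r"union\s+select", "SQL injection"),
    (r"<script", "Cross-site scripting"),
]
log = [
    "GET /../../etc/passwd HTTP/1.1",
    "GET /item?id=1 UNION SELECT password FROM users HTTP/1.1",
    "GET /../item?id=1 union select 1 HTTP/1.1",
]
counts = count_per_signature(signatures, log)
for pattern, desc in signatures:
    if counts[desc]:  # only list signatures that were actually found
        print('%-30s %5d' % (desc, counts[desc]))
```

This counts matching lines, so a line containing the same signature twice counts once; using len(regex.findall(line)) instead would count every occurrence. It also lets one log line count toward several signatures, which a single search over a joined alternation cannot do.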

I'm an information security professional who's decided to teach myself python

Yes, a good choice.
I put together something that may go in the direction you describe.

import re

text = '''\
A fast and friendly dog.
email1@dot.net
myetest@online.no
My car is red.
han445@net.com
'''

find_email = re.findall(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b', text, re.IGNORECASE)
print 'Text has %d email addresses' % len(find_email)
print find_email
# enumerate() yields (index, item) pairs, so unpack them directly
for index, email in enumerate(find_email):
    print 'Email #%d --> %s' % (index, email)

'''Out-->
Text has 3 email addresses
['email1@dot.net', 'myetest@online.no', 'han445@net.com']
Email #0 --> email1@dot.net
Email #1 --> myetest@online.no
Email #2 --> han445@net.com
'''
# Python 3.x
find_email = re.findall(r'\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b', text, re.IGNORECASE)
print('Text has %d email addresses' % len(find_email))
print(find_email)
for index, email in enumerate(find_email):
    print('Email #%d --> %s' % (index, email))

Thanks for the help there; that's pretty much what I was looking for as far as output formatting is concerned. Now for the regular expression part. Does anyone know how to make the program use the list of regular expressions in my signature file while ignoring the commented portions, and then print the commented portion to the screen if it finds a match for that regular expression? Basically, my sig file looks like this:

<Scary Malicious Signature #1> ----------------------- #This is a scary sig
<Scary Malicious Signature #2> ------------------------# Also a scary sig.

BTW, the ----------------- isn't in my signature file; I'm just using it to indicate the space between the columns, as I couldn't figure out the proper forum formatting.
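For a file laid out like that, one way to separate each regex from its description is to split the line at the first run of whitespace followed by '#'. That is an assumption about the format: a regex that itself contains whitespace followed by '#' would be cut short. If you'd rather keep the single joined regex, each signature can be wrapped in a named group (the names sig0, sig1, ... are hypothetical) so that match.lastgroup tells you which alternative matched, and the description can be looked up from that. A minimal sketch, with made-up sample data:

```python
import re
from collections import Counter

def parse_signature_file(lines):
    """Split each 'regex   # description' line into a (regex, description) pair.

    Assumption: the description begins at the first whitespace-then-'#'.
    """
    signatures = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and whole-line comments
        parts = re.split(r'\s+#\s*', line, maxsplit=1)
        pattern = parts[0]
        # If a line has no comment, fall back to showing the regex itself
        description = parts[1] if len(parts) > 1 else pattern
        signatures.append((pattern, description))
    return signatures

def count_with_named_groups(signatures, log_lines):
    """Use one combined regex, tagging each alternative with a named group."""
    group_to_desc = {}
    pieces = []
    for i, (pattern, description) in enumerate(signatures):
        name = 'sig%d' % i  # group names must be valid Python identifiers
        group_to_desc[name] = description
        pieces.append('(?P<%s>%s)' % (name, pattern))
    combined = re.compile('|'.join(pieces), re.IGNORECASE)
    counts = Counter()
    for line in log_lines:
        match = combined.search(line)
        if match:
            # The wrapping sigN group closes last, so lastgroup names it
            counts[group_to_desc[match.lastgroup]] += 1
    return counts

# Hypothetical signature file and log contents
sig_file = [
    r"\.\./\.\./          # Directory traversal attempt",
    r"union\s+select      # SQL injection",
]
log = [
    "GET /../../etc/passwd HTTP/1.1",
    "GET /index.html HTTP/1.1",
    "GET /item?id=1 UNION SELECT password FROM users HTTP/1.1",
]
counts = count_with_named_groups(parse_signature_file(sig_file), log)
for description, count in counts.most_common():
    print('%-30s %5d' % (description, count))
```

Two caveats with the joined version: numbered backreferences such as \1 inside a signature will point at the wrong group once the patterns are combined, and each search reports only one alternative, so a line matching two signatures is credited to just one of them. Compiling each signature separately avoids both problems at the cost of one search per signature per line.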
