Hey guys,
I have a problem. I have a chunk of text like this:

No.     Time        Source                Destination           Protocol Info
      2 0.005318    192.168.110.33        192.168.110.44        ICMP     Echo (ping) request

Frame 2 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
Internet Control Message Protocol

No.     Time        Source                Destination           Protocol Info
      3 0.998730    192.168.110.33        192.168.110.44        DHCP     DHCP Offer    - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol

No.     Time        Source                Destination           Protocol Info
      4 0.998917    0.0.0.0               255.255.255.255       DHCP     DHCP Request  - Transaction ID 0x9e0e832

Frame 4 (348 bytes on wire, 348 bytes captured)
Ethernet II, Src: IntelCor_4d:77:83 (00:13:02:4d:77:83), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Internet Protocol, Src: 0.0.0.0 (0.0.0.0), Dst: 255.255.255.255 (255.255.255.255)
User Datagram Protocol, Src Port: bootpc (68), Dst Port: bootps (67)
Bootstrap Protocol

Basically, I'm trying to extract the DHCP server and the IP address it's giving me. So I want to look for the text between "DHCP Offer" and "Bootstrap", and then from that chunk get the part between "Internet Protocol, Src:" and "(".

temp = re.findall("DHCP\s+Offer(.)Bootstrap", text)
print(temp)
name = re.findall("(Internet Protocol)\sSrc:.[(]", temp)
print(name)

But I think it's not reading past the first line between the search words.
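A minimal sketch of the two-step approach, assuming the capture has already been read into a string `text` (the sample below is a trimmed copy of packet 3). Two changes make it work: `(.*?)` with `re.DOTALL` instead of `(.)`, so the chunk can span several lines, and looping over the list that `re.findall` returns, because the second `findall` needs a string, not a list.

```python
import re

# Trimmed copy of the DHCP Offer packet from the capture above.
text = """No.     Time        Source                Destination           Protocol Info
      3 0.998730    192.168.110.33        192.168.110.44        DHCP     DHCP Offer    - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

# Step 1: grab everything between "DHCP Offer" and "Bootstrap".
# re.DOTALL lets "." match newlines, so the chunk can span lines;
# ".*?" is non-greedy, so it stops at the first "Bootstrap".
chunks = re.findall(r"DHCP\s+Offer(.*?)Bootstrap", text, re.DOTALL)

# Step 2: findall returned a list, so search each chunk (a string) separately
# for the text between "Internet Protocol, Src:" and "(".
servers = []
for chunk in chunks:
    servers.extend(re.findall(r"Internet Protocol,\s+Src:\s*([^\s(]+)", chunk))

print(servers)  # ['192.168.110.33']
```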

Thanks! That helps. I also found another solution that is a bit simpler.
I do something like this:

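import re

def findDHCP(filename):
    with open(filename, "r") as f:
        text = f.read()
    # Each match is the summary line up to the word "Offer".
    word = re.findall(r"(.*)Offer", text)
    splitter = re.compile(r"\s+")
    for n in word:
        # The leading spaces give an empty first field, so s[3] is Source and s[4] is Destination.
        s = splitter.split(n)
        print("The DHCP Server IP is :" + s[3])
        print("The IP allocated to me is :" + s[4])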

But the problem is that this reads only to the end of that particular line. Now I want to read a line that is a constant number of lines below the one I matched with the code above. The problem is that this second line doesn't have any distinctive text to search for. Is there some way I can do this?
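One way to do this, sketched under assumptions: split the text into lines, remember the index of the matching line, and read the line at a fixed offset below it. `OFFSET = 5` is a hypothetical value that lands on the "User Datagram Protocol" line in this capture; adjust it to the line you need. Note that `str.split()` with no argument drops the empty first field that `splitter.split` produced, so Source and Destination are at indices 2 and 3 here.

```python
# Trimmed copy of the DHCP Offer packet from the capture above.
text = """No.     Time        Source                Destination           Protocol Info
      3 0.998730    192.168.110.33        192.168.110.44        DHCP     DHCP Offer    - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

OFFSET = 5  # hypothetical: how many lines below the summary line the wanted line sits

lines = text.splitlines()
results = []
for i, line in enumerate(lines):
    if "DHCP Offer" in line:
        fields = line.split()  # ['3', '0.998730', source, destination, 'DHCP', ...]
        server, client = fields[2], fields[3]
        target = i + OFFSET
        # Guard against running off the end of the text.
        below = lines[target] if target < len(lines) else None
        results.append((server, client, below))
        print("The DHCP Server IP is :" + server)
        print("The IP allocated to me is :" + client)
        print("Line %d below: %s" % (OFFSET, below))
```

To run it on a file instead, build `lines` with `with open(filename) as f: lines = f.read().splitlines()`.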

Basically, I want to read a block of text spread over a few lines between a KNOWN start point and end point (marked in bold with [B]...[/B] below), and what I want to extract is the Src Port and the Dst Port on the "Transmission Control Protocol" line.

No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      [B]Response: 150[/B] Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)

No.     Time        Source                Destination           Protocol Info
    141 3.051247    137.132.69.169        172.19.134.182        FTP-DATA FTP Data: 1368 bytes

Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
[B]FTP Data[/B]

Here is code that finds the Src Port and Dst Port in your second example.
When you write regular expressions, always use raw strings prefixed with 'r', like r"my regex". Also, the character '.' in a regex matches any character except a newline. If you want '.' to match newlines too, pass the re.DOTALL flag, for example re.compile(r"my regex", re.DOTALL).
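A small demonstration of both points; the sample string is made up from two fields of the capture.

```python
import re

text = "Src Port: ftp (21)\nDst Port: 59785 (59785)"

# Without re.DOTALL, "." stops at the newline, so nothing matches.
without = re.findall(r"Src Port:(.*)Dst Port", text)
# With re.DOTALL, "." also matches "\n", so the match can cross lines.
with_dotall = re.findall(r"Src Port:(.*)Dst Port", text, re.DOTALL)

print(without)      # []
print(with_dotall)  # [' ftp (21)\n']

# Raw strings keep backslashes for the regex engine: r"\b" is a word boundary,
# while "\b" without the r is a backspace character, which never appears here.
raw = re.findall(r"\bPort\b", text)
cooked = re.findall("\bPort\b", text)
print(raw)     # ['Port', 'Port']
print(cooked)  # []
```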

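#!/usr/bin/env python
import re

datafile = "chunk2.txt"

# Each packet starts with the column-header line, followed by the packet number.
head_re = re.compile(r"No[.]\s+Time\s+Source\s+Destination\s+Protocol Info\s+(\d+)")

def packets(filename):
    """Split the file into packets. Yield pairs (number, packet_content)."""
    with open(filename) as f:
        text = f.read()
    i = None
    num = None
    for match in head_re.finditer(text):
        if i is not None:
            # The previous packet runs up to the start of this header.
            yield (num, text[i:match.start()])
        i = match.start()
        num = int(match.group(1))
    if i is not None:
        # The last packet runs to the end of the file.
        yield (num, text[i:])

# [^,\n] keeps each port field on its own line, so a port at the end of a line
# (like the UDP lines in the DHCP capture) does not swallow the lines after it.
ports_re = re.compile(r"Src Port:\s*([^,\n]*?),\s*Dst Port:\s*([^,\n]*)")

def ports(packet):
    """Search the src port and dst port in a packet."""
    for match in ports_re.finditer(packet):
        yield (match.group(1).strip(), match.group(2).strip())

if __name__ == "__main__":
    for num, p in packets(datafile):
        print(num, list(ports(p)))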

Your regexes have some mistakes in the flags and pattern details. Also, your data contains only one DHCP Offer and one DHCP Request. If you need both IPs, you can use this:

>>> print(re.findall(r'DHCP\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['192.168.110.33', '0.0.0.0']

To get the offer and request IPs individually, use:

>>> print(re.findall(r'DHCP Offer\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['192.168.110.33']
>>> print(re.findall(r'DHCP Request\s+.*?Internet Protocol,\s+Src:\s*(.+?)\s*\(.*?Bootstrap', text, re.M|re.S))
['0.0.0.0']
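If the goal is the server plus the address it is offering, one option (a sketch, assuming this Wireshark text-export layout) is to capture both Src and Dst from the IP line of the Offer packet in one pattern. Treating the Offer's destination as the allocated address is an assumption that holds for this capture (it matches what the split-based solution above prints as s[4]); in general the offered address is carried in the BOOTP "Your (client) IP address" field, which this summary export does not show.

```python
import re

# Trimmed copy of packets 2 and 3 from the DHCP capture above.
text = """No.     Time        Source                Destination           Protocol Info
      2 0.005318    192.168.110.33        192.168.110.44        ICMP     Echo (ping) request

Frame 2 (98 bytes on wire, 98 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
Internet Control Message Protocol

No.     Time        Source                Destination           Protocol Info
      3 0.998730    192.168.110.33        192.168.110.44        DHCP     DHCP Offer    - Transaction ID 0x9e0e832

Frame 3 (347 bytes on wire, 347 bytes captured)
Ethernet II, Src: Cisco-Li_4d:e1:30 (00:1c:10:4d:e1:30), Dst: IntelCor_4d:77:83 (00:13:02:4d:77:83)
Internet Protocol, Src: 192.168.110.33 (192.168.110.33), Dst: 192.168.110.44 (192.168.110.44)
User Datagram Protocol, Src Port: bootps (67), Dst Port: bootpc (68)
Bootstrap Protocol
"""

# From "DHCP Offer", skip lazily to the first IP header line after it and
# capture the address after "Src:" and the one after "Dst:".
offer_re = re.compile(
    r"DHCP Offer.*?Internet Protocol,\s+Src:\s*(\S+)\s*\(.*?\),\s+Dst:\s*(\S+)\s*\(",
    re.DOTALL,
)

m = offer_re.search(text)
if m:
    server, offered_to = m.groups()
    print("DHCP server:", server)
    print("Offer sent to:", offered_to)
```

Because the pattern starts at "DHCP Offer", the IP line of the earlier ICMP packet is never considered.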

Hey guys!
I have another question related to my initial question and the packet structure.
I want to extract the packets whose protocol is either TCP or HTTP. How can I do that?
Right now I use this RE, but it extracts ALL the info (which I want) from all the packets.

datalines = re.findall("Protocol Info[\s]+(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)

Thanks!

datalines = re.findall("Protocol Info(.*)[HTTP,TCP](.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)(.*\s*.*)",text)

This is what I have right now, but it's not giving me any results at all.
I basically want to select every summary line (and its 5 following lines) if that line says either TCP or HTTP.
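A sketch of one way to do this without one giant regex: split the text into packets at each column-header line, read the Protocol column from the summary line that follows, and keep the packets whose protocol is in a wanted set. It assumes the column layout shown in this thread (No., Time, Source, Destination, Protocol, Info, with no spaces inside the first five columns); the helper names are hypothetical. As for the attempt above, `[HTTP,TCP]` is a character class that matches a single character out of H, T, P, comma and C; "HTTP or TCP" would be `(?:HTTP|TCP)`.

```python
import re

# Trimmed copy of the FTP capture above (without the [B] tags).
sample = """No.     Time        Source                Destination           Protocol Info
    140 3.050240    137.132.69.169        172.19.134.182        FTP      Response: 150 Opening BINARY mode data connection for /pub/ubuntu/dists/dapper-security/Contents-i386.gz (3376313 bytes).

Frame 140 (179 bytes on wire, 179 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: ftp (21), Dst Port: 59785 (59785), Seq: 85, Ack: 178, Len: 113
File Transfer Protocol (FTP)

No.     Time        Source                Destination           Protocol Info
    141 3.051247    137.132.69.169        172.19.134.182        FTP-DATA FTP Data: 1368 bytes

Frame 141 (1434 bytes on wire, 1434 bytes captured)
Ethernet II, Src: Ditech_55:38:00 (00:d0:02:55:38:00), Dst: Usi_ac:fe:1e (00:16:41:ac:fe:1e)
Internet Protocol, Src: 137.132.69.169 (137.132.69.169), Dst: 172.19.134.182 (172.19.134.182)
Transmission Control Protocol, Src Port: 50003 (50003), Dst Port: 48115 (48115), Seq: 1, Ack: 1, Len: 1368
FTP Data
"""

# Every packet starts with this column-header line.
HEADER_RE = re.compile(r"^No\.\s+Time\s+Source\s+Destination\s+Protocol\s+Info", re.MULTILINE)


def split_packets(text):
    """Return the text of each packet, from its header line up to the next header."""
    starts = [m.start() for m in HEADER_RE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


def protocol_of(packet):
    """Return the Protocol column of the summary line, or None if it is missing."""
    lines = packet.splitlines()
    if len(lines) < 2:
        return None
    fields = lines[1].split()  # No., Time, Source, Destination, Protocol, Info...
    return fields[4] if len(fields) > 4 else None


def select_packets(text, wanted=("TCP", "HTTP")):
    """Keep packets whose Protocol column is one of `wanted`."""
    return [p for p in split_packets(text) if protocol_of(p) in wanted]


def tcp_packets(text):
    """Alternative: keep packets that have a TCP header line, whatever the top protocol."""
    return [p for p in split_packets(text) if "Transmission Control Protocol" in p]


# This sample has no TCP or HTTP rows, so select FTP rows to show the output.
for packet in select_packets(sample, wanted=("FTP", "FTP-DATA")):
    print(packet.splitlines()[1])
```

Each returned packet is the whole block (summary line plus the lines below it), so there is no need to count "5 succeeding lines". Keep in mind that Wireshark's Protocol column shows the highest-level protocol it decoded, so FTP traffic carried over TCP is listed as FTP, not TCP; if you mean "every packet carried over TCP", a check like `tcp_packets` may be closer to what you want.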
