Parsing an arbitrary binary file

Question

Enders_Game 13 Newbie Poster

14 Years Ago

So I'm trying to make a parser for a game replay.
I just need help getting started; I have docs that explain the format.

At the moment I'm just trying to get the header parsed.

replay = open(r'C:\Users\CookieMonster\Desktop\mix.w3g', 'rb')

#offset | size | description
print(replay.read(0x1c)) #0x0000 | 28 chars  | zero terminated string "Warcraft III recorded game\0x1A\0"
print(replay.read(4)) #0x001c |  1 dword  | fileoffset of first compressed data block (header size)
print(replay.read(4)) #0x0020 |  1 dword  | overall size of compressed file
print(replay.read(4)) #0x0024 |  1 dword  | replay header version:
print(replay.read(4)) #0x0028 |  1 dword  | overall size of decompressed data (excluding header)
print(replay.read(4)) #0x002c |  1 dword  | number of compressed data blocks in file

I get the output

b'Warcraft III recorded game\x1a\x00'
b'D\x00\x00\x00'
b'\xfby\x0b\x00'
b'\x01\x00\x00\x00'
b'\x9b8\x1c\x00'
b'\xe2\x00\x00\x00'

How do I parse these into something I can understand?
For example, for the second dword (size of compressed file), how do I convert the bytes into an int?
I tried googling and found a suggestion to use int(replay.read(4), 16), but it doesn't work: ValueError: invalid literal for int() with base 16: b'\xfby\x0b\x00'
Later on I'll have to convert bytes into a string, so knowing how to do that would be useful too.
The bytes object docs didn't offer much help either.

Also, why does reading 4 bytes sometimes give back four \x.. escapes and sometimes only three?


griswolf 304 Veteran Poster · Answer 1 · 2010-09-24T10:09:09+00:00

You want to look at the struct module, which lets you parse an arbitrary sequence of binary types. You need to be aware of big-endian versus little-endian byte order in the file.
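
As a rough sketch of that suggestion, the header fields from the question could be unpacked as below. The bytes are rebuilt from the output printed in the question rather than read from the file, and the variable names are made up for illustration. The sketch assumes the dwords are little-endian unsigned 32-bit values, which fits the sample (the first dword then reads as 0x44 = 68, a plausible header size). It also touches on the "only three \x.." question: bytes that happen to be printable ASCII are displayed as characters, so 0x79 shows up as y.

```python
import struct

# Header bytes rebuilt from the output printed in the question (0x30 bytes).
header = (
    b"Warcraft III recorded game\x1a\x00"  # 0x0000: 28-byte magic string
    b"D\x00\x00\x00"      # 0x001c: header size
    b"\xfby\x0b\x00"      # 0x0020: size of compressed file
    b"\x01\x00\x00\x00"   # 0x0024: header version
    b"\x9b8\x1c\x00"      # 0x0028: size of decompressed data
    b"\xe2\x00\x00\x00"   # 0x002c: number of compressed blocks
)

# "<"   = little-endian, no padding (assumed byte order)
# "28s" = 28 raw bytes, "5I" = five unsigned 32-bit integers
HEADER_FORMAT = "<28s5I"

(magic, header_size, compressed_size,
 version, decompressed_size, block_count) = struct.unpack(HEADER_FORMAT, header)

print(header_size, compressed_size, version, decompressed_size, block_count)

# A single field can also be converted without struct.
same_size = int.from_bytes(b"\xfby\x0b\x00", "little")

# Bytes to string: drop the trailing 0x1A and NUL, then decode.
title = magic[:26].decode("ascii")
print(title)

# Why "only three \x..": printable bytes are shown as characters.
# 0x79 is ASCII "y", so all four bytes are still there.
print(list(b"\xfby\x0b\x00"))
```

With a real file, the same format string could be applied to replay.read(struct.calcsize(HEADER_FORMAT)) in place of the hard-coded bytes. int(..., 16) fails in the question because it expects text such as "0b79fb", not raw bytes.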