Python and the JPEG Image File, Part 1, The Header
Intro
The JPEG image file format (.jpg) is very popular on the internet because it packs a lot of picture information into a relatively small file. Competing file formats include GIF and PNG. GIF is limited to 256 colors per image (8 bit), whereas JPEG supports 16,777,216 colors (24 bit). JPEG can typically achieve 10:1 to 20:1 compression without visible loss. It lets you specify a quality setting (1 to 100); a higher quality setting gives less compression.
JPEG header
Sharp edges in images, like borders and embedded text, give JPEG a hard time. That's why you can insert a text comment directly into the JPEG file header. As the name implies, the header is the first part of the JPEG file, followed by the compressed picture. In this part of the tutorial we take a look at the header of a typical JPEG file and extract some of the information it contains. To keep things simple, I have attached a sample JPEG image file that contains an 80x80 blue square at a resolution of 200dpi. This file also contains a text comment that can be extracted. Blue is the favorite color of vegaseat.
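The start of such a header can be checked on a few bytes built in memory. What follows is only a minimal sketch, written in Python 3 syntax rather than the Python 2.4 of the listing that follows. The function name `read_jfif_header` and the hand-built `sample` bytes are assumptions made for illustration and are not taken from the attached file. The byte offsets follow the JFIF layout: FF D8, then an FF E0 segment whose payload starts with "JFIF" and carries the density units and the horizontal and vertical density.

```python
import struct

def read_jfif_header(data):
    """Return (units, x_density, y_density) from a JFIF header, or None."""
    # need at least 18 bytes to reach the density fields
    if len(data) < 18:
        return None
    # every JPEG file starts with the marker FF D8
    if data[:2] != b"\xff\xd8":
        return None
    # a JFIF file continues with an FF E0 segment identified by "JFIF\0"
    if data[2:4] != b"\xff\xe0" or data[6:11] != b"JFIF\x00":
        return None
    # index 13 = units (1 means dots per inch), 14-15 and 16-17 = densities
    return struct.unpack(">BHH", data[13:18])

# hypothetical header built by hand: JFIF version 1.01, 200 dots per inch
sample = (b"\xff\xd8\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00"
          + b"\x01\x01" + struct.pack(">BHH", 1, 200, 200) + b"\x00\x00")

print(read_jfif_header(sample))
```

With a real file you would pass in the bytes read via `open(name, "rb").read()` instead of `sample`.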
# print out the hex bytes of a jpeg file, find end of header, image size, and extract any text comment
# (JPEG = Joint Photographic Experts Group)
# tested with Python24 vegaseat 21sep2005
try:
    # the sample jpeg file is an "all blue 80x80 200dpi image, saved at a quality of 90"
    # with the quoted comment added
    imageFile = 'Blue80x80x200C.JPG'
    data = open(imageFile, "rb").read()
except IOError:
    print "Image file %s not found" % imageFile
    raise SystemExit
# initialize empty list
hexList = []
for ch in data:
    # make a hex byte
    byt = "%02X" % ord(ch)
    hexList.append(byt)
#print hexList  # test
print
print "hex dump of an 80x80 200dpi all blue jpeg file:"
print "(the first two bytes FF and D8 mark a jpeg file)"
print "(index 6,7,8,9 spells out the subtype JFIF)"
k = 0
for byt in hexList:
    # add spacer every 8 bytes
    if k % 8 == 0:
        print " ",
    # new line every 16 bytes
    if k % 16 == 0:
        print
    print byt,
    k += 1
print
print "-"*50
# the header is taken to go from FF D8 to the first FF C4 marker
# (FF C4 introduces a Huffman table definition)
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'C4':
        print "end of header at index %d (%s)" % (k, hex(k))
        break
# find pixel width and height of image
# counted from the FF of an FF C0 or FF C2 marker, height is at offset 5,6
# (highbyte,lowbyte) and width is at offset 7,8
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and (hexList[k+1] == 'C0' or hexList[k+1] == 'C2'):
        #print k, hex(k)  # test
        height = int(hexList[k+5],16)*256 + int(hexList[k+6],16)
        width = int(hexList[k+7],16)*256 + int(hexList[k+8],16)
        print "width = %d height = %d pixels" % (width, height)
# find any comment inserted into the jpeg file
# marker is FF FE followed by the highbyte/lowbyte of comment length, then comment text
# (the length counts its own two bytes; length-3 also skips the last byte of the segment)
comment = ""
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'FE':
        #print k, hex(k)  # test
        length = int(hexList[k+2],16)*256 + int(hexList[k+3],16)
        #print length  # test
        for m in range(length-3):
            comment = comment + chr(int(hexList[k + m + 4],16))
            #print chr(int(hexList[k + m + 4],16)),  # test
            #print hexList[k + m + 4],  # test
if len(comment) > 0:
    print comment
else:
    print "No comment"
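
The listing above finds markers by scanning for byte pairs, which can give a false hit if the same two bytes happen to occur inside a segment's data. A possible alternative is to step through the header one segment at a time, using the length stored after each marker. What follows is only a sketch under stated assumptions: it is written in Python 3 syntax, the names `iter_segments`, `describe` and `segment` are made up for illustration, the test bytes are built by hand rather than read from the attached file, and it assumes every header marker carries a length field (true for the usual header segments, not for every marker the format defines). Stripping a trailing zero byte from the comment is also an assumption about how some tools write comments.

```python
import struct

def iter_segments(data):
    """Yield (marker, payload) for each header segment before the image data."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (FF D8 missing)")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at index %d" % pos)
        marker = data[pos + 1]
        # FF DA starts the compressed scan data, FF D9 ends the file
        if marker in (0xDA, 0xD9):
            break
        if pos + 4 > len(data):
            raise ValueError("truncated segment at index %d" % pos)
        # highbyte/lowbyte length, which counts itself but not the marker
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length

def describe(data):
    """Collect image size and text comments from the header segments."""
    info = {"width": None, "height": None, "comments": []}
    for marker, payload in iter_segments(data):
        if marker in (0xC0, 0xC2):
            # payload starts with precision (1 byte), height (2), width (2)
            _, info["height"], info["width"] = struct.unpack(">BHH", payload[:5])
        elif marker == 0xFE:
            # assumption: drop any trailing zero byte a tool may have added
            info["comments"].append(payload.rstrip(b"\x00").decode("latin-1"))
    return info

def segment(marker, payload):
    """Build one segment for the hand-made test data (hypothetical helper)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

# hand-built header: JFIF segment, comment, 80x80 frame with 3 components
sample = (
    b"\xff\xd8"
    + segment(0xE0, b"JFIF\x00\x01\x01\x01\x00\xc8\x00\xc8\x00\x00")
    + segment(0xFE, b"all blue\x00")
    + segment(0xC0, struct.pack(">BHHB", 8, 80, 80, 3)
              + bytes([1, 0x11, 0, 2, 0x11, 1, 3, 0x11, 1]))
    + b"\xff\xda"
)

print(describe(sample))
```

Because each step jumps over a whole segment, bytes inside the comment or the JFIF data are never mistaken for markers.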