Hi everyone,
As a personal project, I've decided to write a small script that takes a film title from raw_input, looks up its IMDB rating, and returns the result. As an extra challenge, I decided to use the re module.
This is how far I have got (yes, I have yet to wrap most things in functions; I will do that once I have ironed out the following problems):
```python
from BeautifulSoup import BeautifulSoup
import urllib2
import re

# Get the source code of a page (function used later)
def fetchsource(url):
    response = urllib2.urlopen(url)
    source = response.read()
    return source

# Ask for a film title
title = raw_input("Please enter a film title: ")

# Format the raw_input string for searching
raw_string = re.compile(' ')  # search for a space in the string
searchstring = raw_string.sub('+', title)  # replace it with +
print searchstring

# Find the film page URL
url = "http://www.imdb.com/find?s=" + searchstring
print url
source = fetchsource(url)
soup = BeautifulSoup(source)
filmlink = soup.find('a', href=re.compile("title\/tt[0-9]*\/"))
print filmlink
```
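A side note on the search-string step: the standard library can already turn spaces into `+` for a URL query, and it also percent-encodes characters such as `&` or `?` that the `re.sub` approach leaves untouched. A minimal sketch, written for Python 3 where the function is `urllib.parse.quote_plus` (in Python 2 it is `urllib.quote_plus`); the two helper names and the titles are just illustrative inputs, not part of the script above:

```python
import re
from urllib.parse import quote_plus

def regex_format(title):
    # The approach from the script: swap each space for "+"
    return re.compile(' ').sub('+', title)

def stdlib_format(title):
    # quote_plus also turns spaces into "+", but additionally
    # percent-encodes characters that are special in a query string
    return quote_plus(title)

for title in ["The Godfather", "Fast & Furious"]:
    print(regex_format(title), "|", stdlib_format(title))
```

For a plain title both give the same result; for a title containing `&`, only `quote_plus` produces `%26`, so the `&` is not read as the start of a new query parameter.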
If you run this code, it prints the search string and the search URL fine. The problem is that my regex for getting the URL of the film page from the search results page never matches anything, so "filmlink" is always empty. I'm not sure why I'm getting no value here.
Is my regex wrong, or have I not passed the right options?
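One way to narrow this down is to test the pattern on its own, away from the network. The sketch below (Python 3, standard library only) applies the same pattern with `search()`, which is how BeautifulSoup checks a compiled regex against an attribute value, to a few hand-written hrefs. The sample links are assumptions about what a results-page link might look like, not copied from IMDB. If the pattern matches these but `soup.find` still returns nothing, the fetched page probably does not contain such links, and printing part of `source` would be the next thing to check.

```python
import re

# Same pattern as in the script; a raw string avoids relying on
# Python leaving the unknown string escape "\/" untouched.
pattern = re.compile(r"title\/tt[0-9]*\/")

# Hypothetical hrefs, written by hand for this test
sample_hrefs = [
    "/title/tt0068646/",
    "/title/tt0068646/?ref_=fn_al_tt_1",
    "/name/nm0000008/",
]

for href in sample_hrefs:
    # search() looks for the pattern anywhere in the string
    found = pattern.search(href)
    print(href, "->", found.group(0) if found else None)

# match() only tries at position 0, and these hrefs start with "/",
# so it finds nothing here even though search() does
print(pattern.match("/title/tt0068646/"))
```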
Also, I don't quite understand what re.compile() actually does, but it works! Could somebody explain it in an easy-to-understand sentence or two?
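In short, `re.compile()` turns a pattern string into a pattern object once, so you can call methods like `.sub()`, `.search()` or `.match()` on it repeatedly without re-parsing the pattern each time. The module-level functions (`re.sub(' ', '+', title)` and so on) do the same thing in one step and keep a cache of recently compiled patterns, so compiling up front is mostly about readability and reuse. A small sketch (Python 3; the title is an arbitrary example):

```python
import re

title = "The Shawshank Redemption"

# Compile once: a pattern object that matches a single space
space = re.compile(' ')

# Call a method on the compiled object...
compiled_result = space.sub('+', title)

# ...or use the module-level function, which compiles (and caches) internally
oneshot_result = re.sub(' ', '+', title)

print(compiled_result)                    # The+Shawshank+Redemption
print(compiled_result == oneshot_result)  # True

# The same pattern object can be reused for other operations
print(len(space.findall(title)))          # 2
```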
Many thanks for your help.