I wrote a scraper for a website, but I'm having problems running my code...
#!/usr/bin/env python
from bs4 import BeautifulSoup
import urllib2
import re
# Get the links...
html = urllib2.urlopen('http://www.blah.fi/asdf.html').read()
links = re.findall(r'''<a\s+.*?href=['"](.*?)['"].*?(?:</a|/)>''', html, re.I)
links_range = links[6:]
# Scrape and append the output...
f = open("test.html", "a")
for link in links_range:
    html = urllib2.urlopen('http://www.blah.fi/' + link).read()
    soup = BeautifulSoup(open(html))
    content = soup.find(id="content")
    f.write(content.encode('utf-8') + '<hr>')
f.close()
Here is the error...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
IOError: [Errno 36] File name too long: '\xef\xbb\xbf<!DOCTYPE html PUBLIC "...
If I remove the 'for' loop and run a single instance of a page, it runs correctly.
What does the error mean?
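
One plausible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: inside the loop, `html` already holds the page source returned by `read()`, so `open(html)` asks the operating system to open a file whose name is the entire HTML document. That would explain why the "file name" in the message starts with `<!DOCTYPE html`. The snippet below reproduces the idea using only the standard library. The `html` string is a made-up stand-in for a downloaded page and `try_open_as_path` is a hypothetical helper; the exact errno depends on the platform and on the length of the string.

```python
import errno

# Hypothetical stand-in for the text returned by urlopen(...).read().
html = '<!DOCTYPE html><html><body>' + '<p>filler</p>' * 100 + '</body></html>'


def try_open_as_path(text):
    """Pass text to open() as if it were a filename.

    Returns the symbolic errno name if the call fails, or None if it succeeds.
    """
    try:
        handle = open(text)
    except (IOError, OSError) as exc:
        # The page source is not a valid path, so the OS rejects it.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    handle.close()
    return None


result = try_open_as_path(html)
print(result)
```

If this reading is right, passing the string straight to the parser, as in `BeautifulSoup(html)`, avoids the `open()` call altogether, since `BeautifulSoup` accepts markup as a string as well as a file object. The single-page run may have worked because it opened a real local file rather than downloaded text, though the post does not show that version.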