Hello,
I am a beginner in Python and have the following problem: I retrieve an excerpt of an HTML page from the web and want to store the result in a variable (before processing it with a regular expression).
The function does fetch the HTML source, but when I assign the function's return value to the variable t_main_page, the interpreter tells me the variable is of type None.
Here is the code:
#!/usr/bin/env python
# Script to fetch and parse the specific web page of PPI for Manufactured Goods
# on http://www.stats.gov.cn/english/ .
import re
import urllib.error
import urllib.request
from html.parser import HTMLParser

def fetch_main_page():
    """
    Open the web page and retrieve the HTML code.
    Returns: str (decoded from gb2312)
    """
    main_page = ''
    try:
        main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
    except (UnicodeDecodeError, urllib.error.URLError) as e:
        fetch_main_page()
    else:
        return main_page

t_main_page = fetch_main_page()
print(t_main_page)
"""
relevant_links = re.findall('<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>', t_main_page)
for link in relevant_links:
    print(link)
"""
Can someone tell me how to store the string returned by a function in a variable that I can then pass to the regexp?
Thanks! :)
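
A note on the likely cause, with a sketch of one possible fix. In the posted function, the except branch calls fetch_main_page() again but does not return its result, and the else branch only runs when no exception was raised. So whenever the first attempt raises UnicodeDecodeError or URLError, the outer call reaches the end of the function and returns None, even if the retry succeeded. The sketch below replaces the recursion with a bounded loop. The parameters max_attempts and opener, and the helper find_ppi_links, are hypothetical names introduced here for illustration (opener only exists so the function can be tried without network access); the URL, the 20000-byte read, the gb2312 decoding and the regular expression are taken from the post.

```python
import re
import urllib.error
import urllib.request

URL = "http://www.stats.gov.cn/english/"


def fetch_main_page(url=URL, max_attempts=3, opener=urllib.request.urlopen):
    """Return the first 20000 bytes of the page decoded as gb2312.

    Returns None if every attempt fails.
    max_attempts and opener are hypothetical parameters added for this sketch.
    """
    for _ in range(max_attempts):
        try:
            with opener(url) as response:
                # Returning here hands the string back to the caller.
                return response.read(20000).decode("gb2312")
        except (UnicodeDecodeError, urllib.error.URLError):
            continue  # try again, up to max_attempts times
    return None


def find_ppi_links(html):
    """Return the href values of links whose text starts with the PPI title."""
    pattern = r"<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>"
    return re.findall(pattern, html)
```

With this version, t_main_page = fetch_main_page() holds either a string or None, so it is worth checking for None before passing it to find_ppi_links. If keeping the recursion is preferred, writing return fetch_main_page() in the except branch should also pass the result back, though a page that never decodes would then recurse until Python raises RecursionError.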