Hi
I have the source of a web page that shows the weather, and I want to extract the data. My only hurdle left is removing all the HTML formatting tags, meaning everything inside and including the <> brackets. I have the page source stored as a string, so maybe the string module could do it. I don't know.
lllllIllIlllI 178 Veteran Poster
Recommended Answers
Well, you might consider the BeautifulSoup module.
link:
http://www.crummy.com/software/BeautifulSoup/
It has the capability to extract tags and values relatively easily.
Jeff
Python also has the HTMLParser module, which can help you muchly:
# extract a specified text from web page HTML source code
import urllib2
import HTMLParser
import cStringIO  # acts like file in memory

class HTML2Text(HTMLParser.HTMLParser):
    """
    extract text from HTML code
    basically using inherited class HTMLParser and …
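The snippet above is cut off in this copy of the thread, so here is a minimal sketch of the same idea (subclassing the standard library's HTML parser and keeping only the text) written for Python 3, where the module lives at `html.parser`. This is not the original poster's code: the `strip_tags` helper, the choice to skip `<script>` and `<style>` content, and the sample `page` string are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class HTML2Text(HTMLParser):
    """Collect the text of an HTML string and drop every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &deg; into text.
        super().__init__(convert_charrefs=True)
        self._pieces = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Called for the text between tags; ignore script/style bodies.
        if not self._skip_depth:
            self._pieces.append(data)

    def get_text(self):
        # Join the pieces with spaces, then collapse runs of whitespace.
        # Assumption: a space between adjacent text runs is acceptable.
        return " ".join(" ".join(self._pieces).split())


def strip_tags(html):
    """Hypothetical helper: return the text content of an HTML string."""
    parser = HTML2Text()
    parser.feed(html)
    parser.close()
    return parser.get_text()


# Made-up sample standing in for the downloaded weather page source.
page = "<html><body><h1>Weather</h1><p>Temp: <b>21</b> &deg;C</p></body></html>"
print(strip_tags(page))  # prints: Weather Temp: 21 °C
```

Since the page source is already held in a string, it can be passed straight to `strip_tags`. Note that joining text runs with a space means inline markup such as `<b>foo</b>bar` comes out as `foo bar`, which is usually fine for pulling values out of a weather page but is a design choice, not a requirement of the parser.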