I'm trying to learn the very basics of HTML parsing in Python. Through these forums I learned what a parser is:
"Parsing often means "perform syntax analysis" on a program or a text. It means checking whether a text obeys given grammar rules and extracting the corresponding information. For example, suppose that you define the rule that the structure of a question in English is auxiliary verb + subject + main verb + rest. Then the output of the statement parse("Are they playing football?") could be a hierarchy of tuples, or other objects, like this:
("question",
    ("auxiliary verb", "are"),
    ("subject", "they"),
    ("verb", "playing"),
    ("rest", "football"),
)
Programs and compilers handle such trees more easily than raw text." (thanks for that explanation Gribouillis)
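
To make sure I understood the quoted explanation, here is a minimal sketch of what such a parse() function might look like. It is purely a toy: it assumes the first three words are always the auxiliary verb, the subject and the main verb, and the function itself is hypothetical (it is not part of any library).

```python
def parse(sentence):
    """Toy parser for the rule: auxiliary verb + subject + main verb + rest.

    Assumption: words are separated by whitespace and the first three
    words fill the first three grammatical slots, in that order.
    """
    words = sentence.rstrip("?").split()
    if len(words) < 3:
        raise ValueError("expected at least an auxiliary verb, a subject and a main verb")
    auxiliary, subject, verb, *rest = words
    return (
        "question",
        ("auxiliary verb", auxiliary.lower()),
        ("subject", subject),
        ("verb", verb),
        ("rest", " ".join(rest)),
    )


tree = parse("Are they playing football?")
print(tree)
```

A real parser would check the words against the grammar instead of trusting their position, but the result has the same nested-tuple shape as in the quote.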
So what would the output be if I fed this data:
<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
to the Python HTML parser (i.e., from html.parser import HTMLParser)?
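
As far as I can tell, HTMLParser does not return a tree by itself. It is event-driven: feed() calls handler methods such as handle_starttag, handle_endtag and handle_data, and the base class does nothing in them unless a subclass overrides them. Below is a minimal sketch of how one might record those calls. The class name EventCollector and the shape of the recorded tuples are my own assumptions, not something the library provides.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper that records every handler call as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip the whitespace-only text (newlines) between tags.
        if data.strip():
            self.events.append(("data", data))


document = """<html>
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>"""

collector = EventCollector()
collector.feed(document)
collector.close()

for event in collector.events:
    print(event)
```

So the "output" is a flat sequence of start, data and end events in document order. Building a nested structure like the tuple hierarchy in the quote would be up to the subclass.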