Hi,
I have an HTML page stored in a single variable.
I need to build a method that extracts the content of a tag (def extract_tag(self, tag_name)).
For example, given this webpage:
<div id="mw-page-base" class="noprint"></div>
<div id="mw-head-base" class="noprint"></div>
<!-- content -->
<div id="content" class="mw-body">
<a id="top"></a>
<div id="mw-js-message" style="display:none;"></div>
<!-- sitenotice -->
<div id="siteNotice"><!-- centralNotice loads here --></div>
<!-- /sitenotice -->
<!-- firstHeading -->
<h1 id="firstHeading" class="firstHeading"><span dir="auto">Earth</span></h1>
</div>
and the tag named "content" (the div with id="content"), the method should return:
<a id="top"></a>
<div id="mw-js-message" style="display:none;"></div>
<!-- sitenotice -->
<div id="siteNotice"><!-- centralNotice loads here --></div>
<!-- /sitenotice -->
<!-- firstHeading -->
<h1 id="firstHeading" class="firstHeading"><span dir="auto">Earth</span></h1>
I want to do this with a regex, but I'm not familiar with Python.
Naively, I think the pattern I'm looking for is "\<tag_name(.(\n)(\<div id.(\n)\<\/div>)).\<\/div>".
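A quick way to see why a single "match up to </div>" pattern is risky: a lazy `.*?` stops at the first `</div>` it reaches, which here belongs to a nested div, not to the "content" div. This is only a sketch on a shortened copy of the sample page; the variable names are illustrative.

```python
import re

# Shortened copy of the sample page from the question.
html = """<div id="content" class="mw-body">
<a id="top"></a>
<div id="mw-js-message" style="display:none;"></div>
<h1 id="firstHeading" class="firstHeading"><span dir="auto">Earth</span></h1>
</div>"""

# Lazily match everything after the opening tag up to the FIRST "</div>".
# re.DOTALL lets "." match newlines too.
naive = re.compile(r'<div id="content"[^>]*>(.*?)</div>', re.DOTALL)
m = naive.search(html)
print(repr(m.group(1)))
# '\n<a id="top"></a>\n<div id="mw-js-message" style="display:none;">'
```

Making it greedy (`.*`) instead jumps to the last `</div>` in the whole string, which is only correct when nothing follows the target div.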
Note that other tags can be nested inside the given tag.
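Because Python's standard `re` module has no recursive patterns, one regex cannot reliably match arbitrarily nested divs. A common workaround is to use a regex only to find the `<div ...>` and `</div>` tags and count nesting depth in Python. The sketch below assumes that `tag_name` is the `id` of a `<div>`, that tags are lowercase with a double-quoted `id` as in the sample, and that no `<div` text appears inside comments or scripts (that would fool the counter). The class name `HtmlPage` is hypothetical; only `extract_tag` comes from the question.

```python
import re


class HtmlPage:
    """Hypothetical wrapper holding the page text; extract_tag is the method from the question."""

    # Matches an opening <div ...> (group 1 is "") or a closing </div> (group 1 is "/").
    _DIV_TAG = re.compile(r'<(/?)div\b[^>]*>')

    def __init__(self, html):
        self.html = html

    def extract_tag(self, tag_name):
        """Return the inner HTML of the <div> whose id is tag_name, or None if not found."""
        # Find the opening tag; re.escape keeps ids like "mw-js-message" literal.
        opening = re.search(r'<div\b[^>]*\sid="' + re.escape(tag_name) + r'"[^>]*>', self.html)
        if opening is None:
            return None
        depth = 1  # we are now inside the target <div>
        # Scan the following <div> / </div> tags, tracking nesting depth.
        for tag in self._DIV_TAG.finditer(self.html, opening.end()):
            if tag.group(1):        # closing </div>
                depth -= 1
                if depth == 0:      # this one closes the target <div>
                    return self.html[opening.end():tag.start()]
            else:                   # nested opening <div>
                depth += 1
        return None  # no matching </div>: unbalanced markup


SAMPLE = """<div id="mw-page-base" class="noprint"></div>
<div id="mw-head-base" class="noprint"></div>
<!-- content -->
<div id="content" class="mw-body">
<a id="top"></a>
<div id="mw-js-message" style="display:none;"></div>
<!-- sitenotice -->
<div id="siteNotice"><!-- centralNotice loads here --></div>
<!-- /sitenotice -->
<!-- firstHeading -->
<h1 id="firstHeading" class="firstHeading"><span dir="auto">Earth</span></h1>
</div>"""

page = HtmlPage(SAMPLE)
# Prints the <a id="top"> ... </h1> lines shown in the question.
print(page.extract_tag("content").strip())
```

The result keeps the newline right after the opening tag and before the closing one, hence the `.strip()` in the usage line.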
Is this pattern good enough? How do I use re.compile and re.match?
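On the `re.compile` / `re.match` part: `re.compile` builds a reusable pattern object, and `match` only tries at the very start of the string, so for a tag in the middle of a page `search` (or `findall` / `finditer`) is usually what you want. A small sketch; the string and pattern are illustrative only:

```python
import re

html = '<!-- content -->\n<div id="content" class="mw-body">\n<a id="top"></a>\n</div>'

# Compile once and reuse; group 1 captures the value of the id attribute.
open_div = re.compile(r'<div\s+id="([^"]*)"')

# Pattern.match (like re.match) only tries position 0, and this string
# starts with a comment, so there is no match.
print(open_div.match(html))       # None

# Pattern.search (like re.search) scans forward to the first match anywhere.
m = open_div.search(html)
print(m.group(0))                 # <div id="content"
print(m.group(1))                 # content

# Pattern.findall returns the captured group for every match.
print(open_div.findall(html))     # ['content']
```

The module-level `re.match(pattern, string)` and `re.search(pattern, string)` behave the same way; compiling just avoids re-parsing the pattern on every call.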
Thanks,
Net
BTW, I know it can be done with BeautifulSoup (bs), but I'd prefer not to use it.