How to strip the HTML tags using BeautifulSoup?

Question

meensatwork 0 Newbie Poster

14 Years Ago

Hello,
I am trying to get a table from an HTML page. I succeeded in getting the table values, but I have a problem getting the field names in the first column.

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey

The code above prints the text along with the <b></b> tags. I want to remove the tags and read only the text. I found a function named striphtml on the net, but it expects a string as its argument and does not accept this soup object. I have provided the full code below. Can somebody advise me on this?

import urllib2
import re
# BeautifulSoup 3 module: import the class directly instead of "import *"
from BeautifulSoup import BeautifulSoup

def striphtml(data):
    """Remove anything that looks like an HTML tag from a string."""
    p = re.compile(r'<.*?>')
    return p.sub('', data)

pageurl = "http://www.cholawealthdirect.com/Corporateinfo/CompSearch.aspx?id=KFR1&cocode=476"
page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)

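# Find the cell that wraps the data table, then collect its rows
table = soup.find('td', {"id": "_ctl0_InnerTable"})
rows = table.findAll('tr')

# enumerate() supplies the row and column numbers,
# so the manual rowIndex/colIndex counters are not needed
for rowIndex, tr in enumerate(rows):
    cols = tr.findAll('td')
    print "----Row No----", rowIndex
    for colIndex, td in enumerate(cols):
        # .string is None when a cell holds more than one child node,
        # e.g. text plus a <b> tag
        print "Column no", colIndex, td.string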

# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey
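To use the striphtml() helper from the question, each tag first has to be turned into a string: str(tag) returns a tag's markup, so with BeautifulSoup the call would be [striphtml(str(b)) for b in paramKey]. The sketch below is Python 3 and runs without BeautifulSoup installed; the tag_markup list is hypothetical sample data standing in for those str(b) values.

```python
import re

# Same pattern as the striphtml() helper in the question.
TAG_RE = re.compile(r'<.*?>')


def striphtml(data):
    """Remove anything that looks like an HTML tag from a string."""
    return TAG_RE.sub('', data)


# Hypothetical stand-ins for str(b) on each tag returned by table.findAll('b');
# with BeautifulSoup this list would be [str(b) for b in paramKey].
tag_markup = ['<b>Name</b>', '<b>Value</b>']

field_names = [striphtml(m) for m in tag_markup]
print(field_names)  # ['Name', 'Value']
```

The regex is a blunt tool: it can be fooled by a > inside an attribute value and leaves entities such as &amp; undecoded, so reading each tag's text, as the reply suggests, is usually more reliable.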

html-css python

richieking 44 Master Poster · Answer 1 · 2011-01-20T18:42:29+00:00

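To get the values inside the tags, print the tag's text instead of the tag itself. On line 31 (print paramKey) you would write:

print paramKey.text

However, findAll returns a list of tags, so paramKey is a list and has no .text of its own. You must iterate over paramKey and take .text from each tag:

print([x.text for x in paramKey])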
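The key idea is to ask each tag for its text rather than its markup, giving one result per tag in the list. As a stand-in that runs with only the Python 3 standard library, here is a minimal sketch of the same idea using html.parser.HTMLParser instead of BeautifulSoup; the BoldTextCollector class and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class BoldTextCollector(HTMLParser):
    """Collect the text content of every <b> element, with tags stripped."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside one or more <b> elements
        self.current = []   # text pieces of the <b> element being read
        self.names = []     # finished results, one string per <b>

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == 'b':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'b' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.names.append(''.join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        # Only keep text that sits inside a <b> element
        if self.depth:
            self.current.append(data)


sample_html = ('<table><tr><td><b>Name</b></td><td>ACME</td></tr>'
               '<tr><td><b>Value</b></td><td>10</td></tr></table>')
collector = BoldTextCollector()
collector.feed(sample_html)
collector.close()
print(collector.names)  # ['Name', 'Value']
```

With the current BeautifulSoup package (bs4), the equivalent of the reply's one-liner is [b.get_text() for b in soup.find_all('b')].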

Hope you got the idea :)