Hello,
I am trying to get a table from an HTML page. I succeeded in getting the table values, but I have a problem getting the field names in the first column.
# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey
The code above prints the text together with the <b></b> tags. I want to remove the tags and read only the text. I found a function named striphtml on the net, but it expects a string as its argument and does not accept this soup object. I have provided the full code below. Can somebody advise me on this?
import re
import urllib2
from BeautifulSoup import BeautifulSoup

def striphtml(data):
    # Remove anything that looks like an HTML tag from a string
    p = re.compile(r'<.*?>')
    return p.sub('', data)
pageurl = "http://www.cholawealthdirect.com/Corporateinfo/CompSearch.aspx?id=KFR1&cocode=476"
page = urllib2.urlopen(pageurl)
soup = BeautifulSoup(page)
rowIndex = 0
colIndex = 0
table = soup.find('td', { "id" : "_ctl0_InnerTable" })
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    print "----Row No----", rowIndex
    for td in cols:
        print "Column no", colIndex, cols[colIndex].string
        colIndex = colIndex + 1
    colIndex = 0
    rowIndex = rowIndex + 1
# Getting the field names. How do I strip the <b> tags and get only the names?
paramKey = table.findAll('b')
print paramKey
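
One possible direction, as a minimal sketch only: since striphtml wants a string, each element returned by findAll('b') could first be converted with str() and then passed through the regex. The sample strings below are made-up stand-ins for what str() of a <b> tag might look like; they are not taken from the actual page, and field_names is just an illustrative variable name.

```python
import re

TAG_RE = re.compile(r'<.*?>')

def striphtml(data):
    """Remove anything that looks like an HTML tag from a string."""
    return TAG_RE.sub('', data)

# Hypothetical stand-ins for str(tag) of each element in table.findAll('b');
# the real page content will differ.
sample_markup = ['<b>Market Cap</b>', '<b> Face Value </b>']

# Strip the tags, then trim surrounding whitespace from each name.
field_names = [striphtml(s).strip() for s in sample_markup]
print(field_names)
```

With the real soup this would presumably become something like [striphtml(str(b)) for b in paramKey]. Since the loop above already reads cols[colIndex].string on the td cells, reading b.string on each <b> tag might work as well, though that is untested against this page.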