BeautifulSoup and accented words

Question

Huakalero 0 Newbie Poster

13 Years Ago

Hi, I am using BeautifulSoup to get data from a web page. With some help I was able to get a list of cities with the correct accents.

Now I am trying to get a list of the movie theaters in a selected city, but the names come back with weird characters in place of the accented letters.

Code:

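# -*- coding: utf-8 -*-
# Python 2 with BeautifulSoup 3. The coding line above is needed because
# the string literal at the bottom contains non-ASCII characters.
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

# Download the listings page for the selected city
page = urlopen("http://www.cinepolis.com/_CARTELERA/cartelera.aspx?ic=2")
html = page.read()
soup = BeautifulSoup(html)

# Each theater name sits in a <span class="TitulosBlanco">
complejos = soup.findAll('span', {'class': 'TitulosBlanco'})
compList = []
for comp in complejos:
    name = comp.contents[0]  # first child of the span: the name text
    compList.append(name)
    print "Complejo %s agregado áé" % name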

This is the output I get:

Complejo CinÃ©polis VIP GalerÃas Diana Acapulco agregado áé
Complejo CinÃ©polis GalerÃas Diana Acapulco agregado áé
Complejo CinÃ©polis Acapulco agregado áé
Complejo CinÃ©polis Acapulco Renacimiento agregado áé
Complejo CinÃ©polis La Isla agregado áé
Complejo CinÃ©polis Pie de la Cuesta agregado áé
Complejo CinÃ©polis Sendero Acapulco agregado áé

Gribouillis 1,391 Programming Explorer

13 Years Ago

Did you forget to pass the argument convertEntities=BeautifulSoup.HTML_ENTITIES to BeautifulSoup()?
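
For reference, convertEntities deals with HTML entities such as &eacute; that appear in the markup. Below is a minimal sketch of that kind of conversion. It uses Python 3's standard html.unescape as a stand-in, which is an assumption made for illustration and not BeautifulSoup's own code. The name decode_entities and the sample strings are hypothetical.

```python
from html import unescape  # Python 3 standard library


def decode_entities(text):
    """Replace HTML entities such as &eacute; with the characters they name."""
    return unescape(text)


# Hypothetical sample text, not taken from the real page
with_entities = "Cin&eacute;polis Galer&iacute;as"
converted = decode_entities(with_entities)  # entities become accented letters

# Text that was already mis-decoded contains no entities to convert
already_garbled = "Cin\u00c3\u00a9polis"
unchanged = decode_entities(already_garbled)  # comes back exactly as it went in
```

Text like CinÃ©polis contains no entities, so a conversion of this kind would leave it as it is.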

Huakalero 0 Newbie Poster · Answer 1 · 2011-06-17T20:41:58+00:00

Short version: I am already using that argument, and the massage as well.
The code above is a small snippet I wrote so people in the forum could copy and run it. In my project code, I have a method that returns the soup using the argument you just mentioned, along with the massage to correct the page.
That works for getting the cities, but not in this case.
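
The "Ã©" pattern in the output is what UTF-8 bytes look like when they are decoded as Latin-1, so one possible cause is that the page is UTF-8 but was decoded with the wrong encoding. Below is a minimal sketch that reproduces and reverses that effect with plain strings. It assumes Python 3 and does not touch BeautifulSoup or the real page. The helper names garble and repair are hypothetical.

```python
def garble(text):
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(text):
    """Undo garble(). Raises an error if the text holds characters outside Latin-1."""
    return text.encode("latin-1").decode("utf-8")


original = "Cin\u00e9polis"   # "Cinépolis"
garbled = garble(original)    # the "é" turns into the two characters "Ã©"
repaired = repair(garbled)    # back to the original string
```

If that is what is happening here, telling BeautifulSoup the encoding up front may be worth a try. BeautifulSoup 3 accepts a fromEncoding argument, for example BeautifulSoup(html, fromEncoding="utf-8"), assuming the page really is served as UTF-8.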