I am trying to encode text extracted with a module called Beautiful Soup, and I just need some direction on solving the problem. The error says the characters map to <undefined>, which I take to mean these Unicode characters are not defined in the charmap.
The error I get is: UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-34: character maps to <undefined>
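'charmap' is the name Python reports for its single-byte code-page codecs such as cp1252 and cp437. On Windows, these are typically what a console uses. The sketch below, in Python 3 syntax, reproduces the same error by encoding Cyrillic text with cp1252. That choice of code page is an assumption, since the question doesn't say which one the console uses. `try_encode` is a hypothetical helper, not a library function.

```python
# Sketch: reproduce the question's error with a Western single-byte code page.
# cp1252 is an assumed stand-in for the asker's (unstated) console code page.

def try_encode(text, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if a character has no mapping."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc

word = "\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438"  # first word of the title

err = try_encode(word, "cp1252")  # fails: cp1252 has no Cyrillic letters
print(err)

ok = try_encode(word, "cp1251")   # a Cyrillic code page has one byte per letter
```

The exception reports the codec as 'charmap' with the reason "character maps to <undefined>", matching the message in the question. The Cyrillic code page cp1251 encodes the same word without complaint.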
The sequence being encoded is: u'\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438 \u043f\u0440\u0435\u0432\u043e\u0434 \u043d\u0430 \u0440\u0430\u0437\u0433\u043b\u0435\u0436\u0434\u0430\u0447\u0430 \u041c\u043e\u0437\u0438\u043b\u043b\u0430.'
The text should be: Български превод на разглеждача Мозилла.
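The u'\u0411...' value is simply how Python 2 shows the repr of a unicode object. It is the same 40-character string as the expected title. This suggests the parsing step produced the right characters and only the step that writes them out fails. The quick check below is written in Python 3 syntax, where the u prefix is optional:

```python
# The escaped literal from the question and the expected title, compared directly.
escaped = ("\u0411\u044a\u043b\u0433\u0430\u0440\u0441\u043a\u0438 "
           "\u043f\u0440\u0435\u0432\u043e\u0434 \u043d\u0430 "
           "\u0440\u0430\u0437\u0433\u043b\u0435\u0436\u0434\u0430\u0447\u0430 "
           "\u041c\u043e\u0437\u0438\u043b\u043b\u0430.")
expected = "Български превод на разглеждача Мозилла."

print(escaped == expected)  # True
print(len(escaped))         # 40
```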
The code I'm using is pasted below. It may not make much sense without some familiarity with BeautifulSoup, but the encoding error above is what I need help with:
    import re
    from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 (Python 2)

    # Parses the long name for a project from its index page.
    def parse_project_longname(html):
        p = re.compile('>Name: <strong>.+?</strong>')
        results = p.findall(html)
        if results:
            name = results[0]
            # Strip the 15-character '>Name: <strong>' prefix and the 9-character '</strong>' suffix.
            name = name[15:len(name) - 9]
            # Convert HTML entities in the name into Unicode characters.
            name = BeautifulSoup(name, convertEntities=BeautifulSoup.HTML_ENTITIES)
            name = name.contents[0]
        else:
            name = None
        return name

    def test():
        utils = FLOSSmoleutils('dbInfoTest.txt')
        select = 'SELECT project_name, indexhtml FROM sv_project_indexes WHERE datasource_id=2'
        utils.cursor.execute(select)
        results = utils.cursor.fetchall()
        for result in results:
            name = result[0]
            html = result[1]
            print("Name: " + name)
            id = SavannahParsers.parse_project_longname(html)
            print(id)

    test()
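The traceback isn't included, but the error most likely comes from print(id). Printing a unicode object makes Python encode it with the console's encoding, and a Western code page has no bytes for Cyrillic. One option is to encode explicitly with the 'replace' error handler, so that unmappable characters turn into ? instead of raising an error. The following is a minimal sketch in Python 3 syntax. `console_safe` is a hypothetical helper name, not part of Beautiful Soup or the standard library.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding can't represent replaced by '?'.

    encoding defaults to sys.stdout's encoding, falling back to UTF-8
    when the stream does not report one.
    """
    if text is None:  # parse_project_longname returns None when no name is found
        return "None"
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Round-trip through the target encoding; 'replace' substitutes '?' for unmappable chars.
    return text.encode(encoding, errors="replace").decode(encoding)

longname = "\u041c\u043e\u0437\u0438\u043b\u043b\u0430."  # "Мозилла."
print(console_safe(longname))                  # prints instead of raising
as_cp1252 = console_safe(longname, "cp1252")   # each Cyrillic letter becomes '?'
```

In the question's Python 2 code, the rough equivalent is print(id.encode(sys.stdout.encoding or 'utf-8', 'replace')), with import sys and after checking that id is not None. If the goal is to see the actual Cyrillic rather than question marks, another approach is to change the output encoding itself. You can do this with the PYTHONIOENCODING environment variable, or with sys.stdout.reconfigure(encoding='utf-8') on Python 3.7+. Whether the characters then display correctly still depends on the console.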
Any help or direction would be appreciated. Thank you.