How to convert 64-bit unicode

Question

linuxoidoz 0 Light Poster

14 Years Ago

Hi,

How can I convert a 64-bit unicode string into a text string? I'm converting ASCII characters, for example, like this:

str = unichr(int('00A9', 16))

But how can I convert unicode 'U2082' or any other character beyond the ASCII range?

Thank you.

python


TrustyTony 888 ex-Moderator

14 Years Ago

A Unicode string can contain a great many characters that do not fit in the ASCII range. UTF-8, however, can encode all of them as variable-length byte sequences. (The code below is for Python 2.6; Python 3 changes string handling considerably.)

a=u'asfasdfö'
b=a.encode('utf8')
print a
print b
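To tie this back to the original question, here is a minimal sketch, written for Python 3 (where chr replaces Python 2's unichr and every str is already Unicode), of turning a hexadecimal code point such as 2082 into a character and then into UTF-8 bytes. The helper name char_from_hex is made up for illustration.

```python
def char_from_hex(hex_code):
    """Return the one-character string for a hexadecimal code point such as '2082'."""
    return chr(int(hex_code, 16))

subscript_two = char_from_hex('2082')   # U+2082 SUBSCRIPT TWO
copyright_sign = char_from_hex('00A9')  # U+00A9 COPYRIGHT SIGN

# Encoding turns the text into bytes; UTF-8 uses 3 bytes for U+2082 and 2 for U+00A9.
utf8_subscript = subscript_two.encode('utf-8')
utf8_copyright = copyright_sign.encode('utf-8')

# ascii() keeps the printout readable even on a terminal that cannot show the characters.
print(ascii(subscript_two), utf8_subscript)
print(ascii(copyright_sign), utf8_copyright)
```

Decoding the bytes with .decode('utf-8') gives back the original characters.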

However, Wikipedia says:

The Python language environment officially only uses UCS-2 internally since version 2.1, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Python can be compiled to use UCS-4 (UTF-32) but this is commonly only done on Unix systems.

I found this code by googling: http://www.xml.com/cs/user/view/cs_msg/2915

As I say in the article: "if possible, use a Python install compiled to use UCS4 character storage." Micah Dubinko asked how to check whether your current Python build is such. The best test right now is to take advantage of one of the bugs present in UCS2 builds and not UCS4 builds. The test that Eric van der Vlist came up with, for example:

if len(u'\U00010800') == 1:
    print "UCS4"
else: #len is 2 in UCS2 builds
    print "UCS2"
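The same check can probably be made without relying on the length quirk, by looking at sys.maxunicode. The sketch below is written for Python 3 and the function name build_kind is only illustrative. On Python 3.3 and later it should always report "UCS4", because the narrow/wide build distinction no longer exists there.

```python
import sys

def build_kind():
    """Report whether this interpreter can hold code points above U+FFFF in one unit."""
    # Narrow (UCS2) builds of Python 2 report 0xFFFF; wide builds report 0x10FFFF.
    return "UCS4" if sys.maxunicode > 0xFFFF else "UCS2"

kind = build_kind()
# A character outside the Basic Multilingual Plane, as in the test above.
astral_length = len('\U00010800')

print(kind, astral_length)
```

Code points up to U+FFFF, such as U+2082 from the question, fit in a single code unit on either kind of build, so a "UCS2" result does not prevent working with them.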


Gribouillis 1,391 Programming Explorer

14 Years Ago

You can use the unidecode module, available from http://pypi.python.org/pypi. For example:

>>> str = unichr(int('00A9', 16))
>>> str
u'\xa9'
>>> from unidecode import unidecode
>>> unidecode(str)
'(c)'

Also, you should not use 'str' as a variable name because it's the name of a builtin type.
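If installing a third-party module is not an option, a rougher approximation is possible with the standard library alone. This is a different technique from unidecode, not a replacement for it: the sketch below (Python 3, with the hypothetical helper name to_ascii_approx) applies NFKD normalization and then drops whatever still is not ASCII, so characters with no decomposition, such as the copyright sign, simply disappear instead of becoming '(c)'.

```python
import unicodedata

def to_ascii_approx(text):
    """Approximate text in ASCII using compatibility decomposition (NFKD)."""
    # NFKD splits 'é' into 'e' plus a combining accent and maps '₂' to '2'.
    decomposed = unicodedata.normalize('NFKD', text)
    # Anything that is still non-ASCII after decomposition is silently dropped.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

water = to_ascii_approx('H\u2082O')
cafe = to_ascii_approx('caf\u00e9')
copyright_sign = to_ascii_approx('\u00a9')

print(water)                  # H2O
print(cafe)                   # cafe
print(repr(copyright_sign))   # '' (no decomposition, so it is dropped)
```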



linuxoidoz 0 Light Poster · Answer 1 · 2010-05-25T05:54:09+00:00

The above code outputs "UCS2". Does that mean my Python doesn't support 64-bit unicode, and that I can't output any unicode strings other than ASCII?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 2 · 2010-05-25T10:13:35+00:00

You can output normal unicode and UTF-8 etc, not only ASCII.
http://en.wikipedia.org/wiki/UTF-16/UCS-2
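One way to output such text, sketched here under the assumption that writing to a UTF-8 file is acceptable, is to open the file with an explicit encoding. io.open is used because it exists in both Python 2.6+ and Python 3; the file name out.txt and the sample text are made up for the example.

```python
import io
import os
import tempfile

text = u'H\u2082O \u00a9'   # 'H₂O ©' written with escapes so the source stays ASCII

# Write into a scratch directory so the example leaves the working directory alone.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# The encoding argument makes io.open convert the text to UTF-8 bytes on write.
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(text)

# Reading in binary mode shows the bytes that actually landed on disk.
with io.open(path, 'rb') as handle:
    raw = handle.read()

# Reading with the same encoding recovers the original text.
with io.open(path, 'r', encoding='utf-8') as handle:
    round_trip = handle.read()

print(repr(raw))
```

Printing directly to a terminal also works when the terminal's encoding can represent the characters; the file route avoids depending on that.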

linuxoidoz 0 Light Poster · Answer 3 · 2010-05-26T03:21:37+00:00

OK, but how do I do that in Python? Thanks.