How to convert unicode codepoint to integer?

Question

bchin 0 Newbie Poster

15 Years Ago

Hi,
I have been searching high and low on Google, and I cannot figure out how to convert Unicode to integers. Take the Unicode code point u'\u3001', for example. I know this is supposed to be the ideographic comma. Its UTF-8 encoding, written in hexadecimal, is 0xE38081. I know that if I convert 0xE38081 to an integer, it is supposed to be 14909569. 14909569 is the answer I want, but I cannot figure out how to do this in Python.

>>> unichr(0x3001)
u'\u3001'
>>> str(unichr(0x3001))
'\xe3\x80\x81'
>>> int('\xe3\x80\x81',16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 16: '\xe3\x80\x81'
>>> int('0xe38081',16)
14909569
>>>

How come int() won't take the syntax \xE3\x80\x81? How can I strip or replace the \x? The string functions strip() and replace() do not work either. Is there another method that can deal with a Unicode code point?

python

TrustyTony 888 ex-Moderator

15 Years Ago

Here is my roundabout attempt. I got there in the end, though perhaps not in the most elegant way, as Python did not let me take ord() of the individual bytes that make up the UTF-8 letter. So I ended up manipulating the repr of that letter with string operations.

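# Python 2
a = unichr(0x3001)          # the character U+3001
b = a.encode('utf8')        # its three UTF-8 bytes
print b
c = repr(b)                 # the text "'\xe3\x80\x81'" with the escapes spelled out
print c
# delete the backslashes, the x's and the quotes, leaving 'e38081'
d = r"0x" + c.translate(None, r"\x'")
print d, int(d, 16)         # 0xe38081 14909569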
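
For what it's worth, the reason int() rejects '\xe3\x80\x81' is that the string holds three raw bytes, not the hex digits "e38081"; the \x only appears when Python displays those bytes, so strip() and replace() have nothing to find. Below is a minimal sketch of two ways to get the number without going through repr. It assumes Python 3 (the session above is Python 2), and the function names utf8_as_int and utf8_as_int_via_hex are made up for illustration.

```python
import binascii


def utf8_as_int(ch):
    """Read the UTF-8 encoding of ch as one big-endian integer."""
    encoded = ch.encode("utf-8")            # b'\xe3\x80\x81' for U+3001
    return int.from_bytes(encoded, "big")   # the bytes become one number


def utf8_as_int_via_hex(ch):
    """Same result, but through a hex string that int(..., 16) accepts."""
    hex_text = binascii.hexlify(ch.encode("utf-8")).decode("ascii")  # 'e38081'
    return int(hex_text, 16)


print(utf8_as_int(chr(0x3001)))             # 14909569
print(utf8_as_int_via_hex(chr(0x3001)))     # 14909569
```

In Python 2 the hex route should look much the same, with binascii.hexlify(b) applied to the encoded str b from the code above and the result passed to int(..., 16).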

bchin 0 Newbie Poster · Answer 1 · 2010-05-01T02:23:58+00:00

bchin 0 Newbie Poster

15 Years Ago

Thanks for the help, tonyjv!