Hi,
I have been searching high and low on Google, and I cannot figure out how to convert Unicode to integers. Take the Unicode code point U+3001 (u'\u3001'), for example. I know this is supposed to be the ideographic comma, and that its UTF-8 encoding in hexadecimal is 0xE38081. I know that if I convert 0xE38081 to an integer, it is supposed to be 14909569. 14909569 is the answer I want, but I cannot figure out how to get it in Python.
>>> unichr(0x3001)
u'\u3001'
>>> str(unichr(0x3001))
'\xe3\x80\x81'
>>> int('\xe3\x80\x81',16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 16: '\xe3\x80\x81'
>>> int('0xe38081',16)
14909569
>>>
Why won't int() accept the string '\xe3\x80\x81'? How can I strip or replace the \x parts? The string methods strip() and replace() do not work either. Is there another method that can deal with a Unicode code point?
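For reference, here is a minimal sketch of one way to get the number described above. The helper name utf8_as_int is made up for illustration, and the sketch assumes that what is wanted is the UTF-8 bytes read as a single big-endian integer. The string '\xe3\x80\x81' holds three raw bytes, not the characters backslash and x, which would explain why int(), strip() and replace() find nothing to work with. Turning the bytes into hex digits first gives int() something it can parse.

```python
import binascii

def utf8_as_int(text):
    """Hypothetical helper: UTF-8 bytes of `text` read as one big-endian integer."""
    encoded = text.encode('utf-8')          # u'\u3001' -> the three bytes E3 80 81
    hex_digits = binascii.hexlify(encoded)  # -> the hex digits e38081, which int() can parse
    return int(hex_digits, 16)

print(utf8_as_int(u'\u3001'))  # 14909569
print(ord(u'\u3001'))          # 12289, i.e. the code point 0x3001 itself
```

If only the code point number is needed rather than the UTF-8 byte value, ord() on the Unicode character should already be enough.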