Newbie alert - numerical value for Unicode characters

Question

Dorson8009 0 Newbie Poster

14 Years Ago

I've searched all over for an answer to this, including this forum, so sorry if I missed something.
Anyway, I'd like to get a numerical code from extended characters like ß or ü and so on.
I don't use them very much myself, as I'm a native English speaker! But they pop up often enough that I should be able to support them when they arise.
I have found some information on long chars, but I didn't manage to find a resource I could understand well enough to actually use.
For example:

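#include <iostream>

int main() {
    char c = 'h';
    int i = c;                // implicit conversion from char to int
    std::cout << i << "\n";   // prints 104 on ASCII-based systems
}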

i is now equal to 104, the standard ASCII code for 'h'.
How can I consistently get the same number from one of the extended characters, and convert back again if needed?

c++

Narue 5,707 Bad Cop

14 Years Ago

If this is a serious project (as opposed to something for learning Unicode), I'd suggest ICU. Managing Unicode is a bitch without a good library.

Narue 5,707 Bad Cop

14 Years Ago

ICU is enormously complex because Unicode is enormously complex.

Quoting Dorson8009: "...when all I need is a number to character and back again conversion?"

Let's assume you want to do it manually. You'd need to support at least UTF-8, UTF-16 (including surrogates), and UTF-32. The process is different for converting each of those into a code point. Now, in all honesty that's not especially difficult. It's more difficult than calling a library function, but straightforward, in my opinion.
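
To give a rough idea of what the manual route involves, here is a minimal sketch of the UTF-8 and UTF-16 cases. The names decode_utf8, encode_utf8 and combine_surrogates are made up for illustration, and the code assumes reasonably well-formed input: it checks lead and continuation bytes but does not reject overlong encodings or stray surrogates, which a real library would. UTF-32 needs no decoding, since each code unit already is the code point.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode the UTF-8 sequence starting at s[pos] into a
// code point and advance pos past it. Throws on malformed input.
// Assumption: overlong encodings and encoded surrogates are NOT rejected.
char32_t decode_utf8(const std::string& s, std::size_t& pos) {
    auto byte_at = [&](std::size_t i) -> std::uint8_t {
        if (i >= s.size()) throw std::runtime_error("truncated UTF-8 sequence");
        return static_cast<std::uint8_t>(s[i]);
    };

    const std::uint8_t lead = byte_at(pos);
    std::size_t extra = 0;  // number of continuation bytes that follow
    char32_t cp = 0;

    if (lead < 0x80)                { extra = 0; cp = lead; }         // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }  // 11110xxx
    else throw std::runtime_error("invalid UTF-8 lead byte");

    for (std::size_t k = 1; k <= extra; ++k) {
        const std::uint8_t b = byte_at(pos + k);
        if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
        cp = (cp << 6) | (b & 0x3F);  // each continuation byte carries 6 bits
    }
    pos += extra + 1;
    return cp;
}

// Hypothetical helper: the reverse direction, code point to UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::runtime_error("code point out of Unicode range");
    }
    return out;
}

// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
char32_t combine_surrogates(char16_t high, char16_t low) {
    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        throw std::runtime_error("not a valid surrogate pair");
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

With this, the two bytes 0xC3 0x9F (ß in UTF-8) decode to 223 (U+00DF), and encode_utf8(0xFC) gives back the two bytes 0xC3 0xBC for ü. Which bytes you actually receive depends on how your source file, terminal, or input data is encoded, so that has to be known before any of this applies.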

The hard part comes when you realize that you're probably not just converting a character to a code point, you're likely introducing general Unicode support including I/O and comparisons, which opens up a can of worms like normalization (and normalization is stupidly complex if you're thinking about doing it manually).
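
A small illustration of why normalization matters for comparisons, as a sketch rather than a fix: the same visible "ü" can be stored either as one precomposed code point (U+00FC) or as "u" followed by a combining diaeresis (U+0308). The variable and function names below are just for illustration.

```cpp
#include <string>

// Two spellings of the same visible character "ü".
const std::u32string precomposed = U"\u00FC";   // single code point
const std::u32string decomposed  = U"u\u0308";  // 'u' + combining diaeresis

// Naive comparison: compares code points one by one, with no normalization.
bool same_code_points(const std::u32string& a, const std::u32string& b) {
    return a == b;
}
```

Here same_code_points(precomposed, decomposed) returns false even though both strings display as "ü". Making them compare equal requires normalizing both to the same form first, which is the kind of work a library such as ICU handles for you.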

Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 1 · 2011-01-15T05:52:26+00:00

If you hurry up you can correct the code tags.

[code=cplusplus]

Notice there are no spaces, and it's cplusplus, not c++.

Are you talking about converting UNICODE wchar_t* to char*? Here is a thread that shows one way to do it.
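
One standard-library route for that conversion (not necessarily what the linked thread shows) is std::wcsrtombs from <cwchar>. The sketch below wraps it in a hypothetical helper called narrow. The result depends on the current C locale: in the default "C" locale only plain ASCII is guaranteed to convert, so a real program would normally call std::setlocale first.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert a wide string to a multibyte (char) string
// using the current C locale. Conversion stops at the first embedded L'\0'.
std::string narrow(const std::wstring& ws) {
    std::mbstate_t state{};
    const wchar_t* src = ws.c_str();

    // With a null destination, wcsrtombs only reports the bytes required
    // (not counting the terminating null) and leaves src unchanged.
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::runtime_error("character not representable in current locale");

    std::string out(len, '\0');
    state = std::mbstate_t{};
    src = ws.c_str();
    std::wcsrtombs(&out[0], &src, len, &state);  // second pass does the conversion
    return out;
}
```

For example, narrow(L"hello") yields "hello". Note that this changes the encoding of the text; it does not give you the numeric code of a character.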

Dorson8009 0 Newbie Poster · Answer 2 · 2011-01-15T06:10:06+00:00

Hi, thanks for your speedy answers! I'm having a look at ICU. I'll mark the thread "solved" in a day or so just in case any other good ideas turn up.

Dorson8009 0 Newbie Poster · Answer 3 · 2011-01-15T06:22:06+00:00

Hi again. ICU seems enormously complex. Is it overkill when all I need is a number to character and back again conversion? Or is this the only way?

Dorson8009 0 Newbie Poster · Answer 4 · 2011-01-16T01:23:21+00:00

OK, thanks people for your insight, I appreciate it!