I'm using ASP to dynamically generate an XML file (for an RSS feed) that pulls information from a SQL DB. Users input the data into the DB using a UI that is a straightforward html form. However, there is a textarea field in that form in which most users seem to paste text from Microsoft word (or some other microsoft product that uses the ms word character set i.e. ms outlook).
My problem is that when this form submits to the database the ms characters are saved in the db as the microsoft character set. When the ASP GETs the information from the DB it seems that it pulls the same microsoft characters that are stored in the db out (as you would expect it would). However, the browser cannot interpret these characters (maybe they are not UTF-8 compatible?) and in place of these characters SOME browsers put a "unidentifiable character" character symbol and other browsers have the xml file (rss feed) break entirely.
I am using MS SQL and there is an option to set a field as xml which requires all information stored in the field to be xml compatible. However, this field does not perform any type of implicit conversion. It spits back an error message to the user on the web page which is unacceptable feedback.
Turning "Paste" off in the textarea is not an option. Users must still retain the ability to paste from something that uses an MS character set.
The input would look something like this:
(in ms character set obviously)
Hello. You're the biggest foo-bar I know.
The output would look something like this:
Hello� You�re the biggest foo�bar I know�
Does anybody have any ideas as to how to address this problem?
Is there some type of converting function in SQL that will convert the microsoft character set into something UTF-8 compatible?
Or is there some way server side to convert the information in this field.