Sooo ... I learned during my schema migration that our webpages use UTF-8 encoding but our MySQL database (all InnoDB tables) uses latin1. After finally running some utf8_decode() php magic, I was finally able to successfully get all posts to display correctly again.
However, now that I realize there's an issue, I want to do something about it moving forward. I'm not so sure it's a good idea to continue to be throwing UTF-8 encoded characters into a latin1 column in the database.
While we are an English-speaking site, a lot of users will sometimes use non-English characters within their posts/usernames, not to mention we have a lot of curly quotes that were copy/pasted from Microsoft Word.
We use the CodeIgniter PHP framework, and one of its functions is called ascii_to_entities(). I could run all user-submitted data through this function, and essentially it would be converted to HTML entities (i.e. © is the copyright symbol) where it can be stored nicely in both UTF-8 and latin1 databases and also display nicely on webpages. However, is this really something that I want to do? Is this the better alternative than converting MySQL to UTF-8?
If I were to go ahead and convert MySQL to UTF-8, would it correctly fix all of the UTF-8 characters currently stored in the latin1 database? (They look like gibberish when browsing through phpMyAdmin, for example, although they render just fine in the UTF-8 website.)
Being the SEO that I am, I'm also concerned about the search engine repercussions of my decision. Suppose a staff writer writes an article which makes heavy use of MS Word's "smart quotes". When I View the HTML source of that webpage, either I'm going to see curly quotes or I'm going to see a whole bunch of HTML entities. Neither are standard English characters. Which does googlebot prefer? ;)
Trying to make an educated decision about which is the best route moving forward.