Hello,

I am a training developer in .NET and web 2.0 technologies and this one is an exercise from one of our first lessons in PHP: accessing a MySQL database from a PHP script.

The problem I've encountered is that while database is thought to be in Unicode (i.e. the values contain accented characters, as this exercise is in French), the PHP script result in garbage characters (actually just one, a enlongated rectangle).

The exercise consists of a PHP script and a MySQL database with two tables where the script queries all entry data from one column in just one table (so no joins necessary).

The MySQL table is called "Restaurant" - well, actually this side isn't that much important yet. (Which means I'll come back to that later.)

The PHP script is

<?php
   mysql_connect("localhost","root","");
   mysql_select_db("Restaurant");
   $myQuery = "SELECT LibPlat FROM Plats";
   $rs = mysql_query($myQuery);
   while ($row = mysql_fetch_array($rs)) {
        echo ($row['LibPlats']."<br />\n");
   }
?>

My setup consists of Eclipse with PDT and Zend Debugger (latest version, i.e. 3.5.1 and just updated). I use Zend Server CE with its MySQL install (phpMyAdmin).

Basically the script does just what I've been asking it to do, except that the French accents (like in thé or café) don't appear (there's this rectangle instead).

Shawn Olson (or rather Andrew Penry on Olson's website) says that storing data in UTF-8 is not enough on MySQL: "[y]ou must also tell MySQL that the data you are passing into it is UTF-8 otherwise it will assume it is in its default encoding."

His code looks like this:

<?php
   $DB = new mysqli('localhost', 'user', 'root', 'dbname');
   $DB->query("SET NAMES 'utf8'");
   if (!empty($_POST['ta'])) {
        $DB->query("UPDATE document SET unicodeText='{$_POST['ta']}' WHERE ID=1");
   }
   $result = $DB->query("SELECT unicodeText FROM document WHERE ID=1");
   $return = $result->fetch_object();
   $result->close();
?>

I just have a little bit of a problem decyphering his code. I'm pretty sure that I can ignore his mysqli line. And $DB->query(""); looks pretty much the same way like my two-liner $myQuery = ""; mysql_query($myQuery); but then I run into some code I can't my hands nor tails of.

Does anyone know how to adapt Andrew's code snippet for my exercise? Or does anyone have a better idea of how to achieve that goal, and, as Andrew wrote, configuring the server may not always be an option, yet he hints that his code should eventually do the same thing as if I would set the server up to use unicode as the default character encoding.

Thank you!

Source:
Unicode Data with PHP 5 and MySQL 4.1

Hey.

Basically the script does just what I've been asking it to do, except that the French accents (like in thé or café) don't appear (there's this rectangle instead).

Do you set the charset for the resulting page?
By default, most browsers use the ISO charset, so if you plan on printing Unicode characters into it, you must specify that in the request.
Try adding this to the top of the page and see what happens.

header('content-type: text/html; charset=utf8');

You can also swap charsets in your browser, just to check it out.
If you use Firefox you do: View->Character Encoding->[new encoding]
If you use Internet Explorer... get Firefox! IE is the bane of web development :)

I am sorry that I didn't mention it.

The PHP is embedded in an HTML page. Doctype and everything is declared and the resulting HTML checks out as it should as XHTML 1.0 Strict on the W3C validator. Already when the PHP parser writes the HTML code the accented characters come out garbled, so this isn't the reason. Thanks anyway! Any help is appreciated (and if only to rule out some possible error sources)!

Shawn Olson (or rather Andrew Penry on Olson's website) says that storing data in UTF-8 is not enough on MySQL: "[y]ou must also tell MySQL that the data you are passing into it is UTF-8 otherwise it will assume it is in its default encoding."

Well, I don't know what they are on about there, but I have never come across a situation where I explicitly needed to specify a charset when sending a query, and I use UTF-8 exclusively in my PHP apps.
Not beyond specifying a charset in the CREATE TABLE commands, anyways.

You might want to check this manual entry out tho: 9.1.4. Connection Character Sets and Collations

I just have a little bit of a problem decyphering his code. I'm pretty sure that I can ignore his mysqli line. And $DB->query(""); looks pretty much the same way like my two-liner $myQuery = ""; mysql_query($myQuery); but then I run into some code I can't my hands nor tails of.

See if this makes more sense:

<?php
    // Open a new MySQL connection
    $DB = new mysqli('localhost', 'user', 'root', 'dbname');
    
    // Execute a query that sets the charset to be used in the following queries.
    $DB->query("SET NAMES 'utf8'");
    
    // Checks if a POST value named 'ta' was passed or not.
    if (!empty($_POST['ta'])) {
        // Update a row in the 'document' table using the passed data.
        $DB->query("UPDATE document SET unicodeText='{$_POST['ta']}' WHERE ID=1");
    }
    // Fetch a row from the 'document', the same row we just updated.
    $result = $DB->query("SELECT unicodeText FROM document WHERE ID=1");
    
    // Assign the result from the previous query to the $return variable.
    // I assume it will be used later in the code. Otherwise this line is pointless.
    $return = $result->fetch_object();
    
    // Close the result set from the previous query.
    $result->close();
?>

I am sorry that I didn't mention it.

The PHP is embedded in an HTML page. Doctype and everything is declared and the resulting HTML checks out as it should as XHTML 1.0 Strict on the W3C validator. Already when the PHP parser writes the HTML code the accented characters come out garbled, so this isn't the reason. Thanks anyway! Any help is appreciated (and if only to rule out some possible error sources)!

Ahh ok.

Is the MySQL table created to use UTF-8?
If you are not sure, try using the SHOW CREATE TABLE command. It should provide a DEFAULT CHARSET clause or two.

And how do you insert the data into MySQL?
You mention using phpMyAdmin. I've had problems inserting Unicode data via phpMyAdmin in the past. It seemed to only want to send ISO chars, even into a table clearly specified as UTF8.

Try applying the utf8_encode function to your data before printing it:

echo utf8_encode($row['LibPlats']) ."<br />\n";

Thank you - both ways work.

Using a query to set the charset for a query:

<?php
	mysql_connect("localhost", "root", "");
	mysql_select_db("restaurant");
	$myQuery = "SET NAMES 'utf8'";
	mysql_query($myQuery);
	
	if (!empty($_POST['ta'])) {
		mysql_query("UPDATE Plats SET LibPlats='{$_POST['ta']}'");
	}
	$myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
	$rs = mysql_query($myQuery);
	while ($row = mysql_fetch_array($rs)) {
		print ($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
	}
?>

And setting the charset for print/echo:

<?php
	mysql_connect("localhost", "root", "");
	mysql_select_db("restaurant");
	$myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
	$rs = mysql_query($myQuery);
	while ($row = mysql_fetch_array($rs)) {
		print utf8_encode($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
	}
?>

What I don't understand yet is the if-condition, especially with the $_POST array. Why 'ta'? (Of course, this lack of understanding doesn't prevent me from using the code!)

And: setting the charset for print seems the quicker way, so what may be the drawback of its use over setting the charset for the query?

What I don't understand yet is the if-condition, especially with the $_POST array. Why 'ta'?

The $_POST array represents the data sent via a HTML <form>, using the POST method. So the data in $_POST['ta'] is just the data in whatever form element was named "ta".

The if statement checks to see if there is any data in that element, and if there is, updates the database with that value.

And: setting the charset for print seems the quicker way, so what may be the drawback of its use over setting the charset for the query?

I'm not sure, to be honest. I've never seen a setup that required the charset to be explicitly set like that.
The obvious downside is that by using the utf_encode function, you need to encode each field individually. By setting it for the queries you can just set it and forget it.

I still suspect there is something wrong with the charset configuration in your MySQL server. If the tables are correctly set up to use UTF-8, PHP should be getting the data UTF-8 encoded.
I recommend you check your CREATE TABLE statements. See what charset you are actually using on them.

Okay, I verified, recreated, exported, re-recreated my tables so everything is in UTF-8 character set:

-- phpMyAdmin SQL Dump
-- version 3.2.0
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Nov 05, 2009 at 09:01 PM
-- Server version: 5.1.35
-- PHP Version: 5.3.0

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

--
-- Database: `restaurant`
--
CREATE DATABASE `restaurant` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
USE `restaurant`;

-- --------------------------------------------------------

--
-- Table structure for table `categorie`
--

CREATE TABLE IF NOT EXISTS `categorie` (
  `IdCategorie` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `LibCategorie` varchar(255) NOT NULL,
  `OrdreCategorie` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`IdCategorie`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=5;

--
-- Dumping data for table `categorie`
--

INSERT INTO `categorie` (`IdCategorie`, `LibCategorie`, `OrdreCategorie`) VALUES
(1, 'Entrées', 1),
(2, 'Plats', 2),
(3, 'Desserts', 3),
(4, 'Boissons', NULL);

-- --------------------------------------------------------

--
-- Table structure for table `plats`
--

CREATE TABLE IF NOT EXISTS `plats` (
  `IdPlats` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  `LibPlats` varchar(255) NOT NULL,
  `PrixPlats` decimal(5,2) DEFAULT NULL,
  `IdCategorie` smallint(6) NOT NULL,
  PRIMARY KEY (`IdPlats`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=15 ;

--
-- Dumping data for table `plats`
--

INSERT INTO `plats` (`IdPlats`, `LibPlats`, `PrixPlats`, `IdCategorie`) VALUES
(1, 'soup chinois', '2.99', 1),
(2, 'soup russe', '2.49', 1),
(3, 'bienenstich', '2.79', 3),
(4, 'steak haché', '6.99', 2),
(5, 'gratin dauphin', '6.39', 2),
(6, 'mohnpielen', '2.75', 3),
(7, 'canard aigre-doux', '7.29', 2),
(8, 'coca zéro', '1.59', 4),
(9, 'coca light', '1.59', 4),
(10, 'coca cola', '1.79', 4),
(11, 'perri-air', '18.99', 4),
(12, 'thé', '2.19', 4),
(13, 'café', '3.39', 4),
(14, 'eau plat', '1.29', 4);

Yet the PHP code still produces the rectangles instead of the correct character set.

So I think that in the case of unicode (UTF-8) characters, the queries have to be declared UTF-8. (Yeah, it makes sense to declare the charset encoding as early and globally as possible.)

So for the time being all that's need is:

<?php
	mysql_connect("localhost", "root", "");
	mysql_select_db("restaurant");
	$myQuery = "SET NAMES 'utf8'";
	mysql_query($myQuery);
	
	$myQuery = "SELECT LibPlats, PrixPlats FROM Plats";
	$rs = mysql_query($myQuery);
	while ($row = mysql_fetch_array($rs)) {
		print ($row['LibPlats']."<br />\n&euro;".$row['PrixPlats']."<br />\n");
	}
?>

Should the if-condition testing $_POST's contents be necessary at some later point, I'll ask again.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.