PHP/MySQL character set: storing HTML of international content

character encoding, MySQL, PHP

I'm completely confused by what I've read about character sets. I'm developing an interface to store French text formatted as HTML in a MySQL database.

What I understood was that the safe way to have all French special characters displayed properly would be to store them as UTF-8, so I've created a MySQL database with utf8 specified for the database and each table.
I can see through phpMyAdmin that the characters are stored exactly as they should be. But outputting these characters via PHP gives me erratic results: accented characters are replaced by meaningless characters. Why is that?

Do I have to utf8_encode or utf8_decode them? Note: the HTML page's character encoding is set to UTF-8.

More generally, what is the safe way to store this data? Should I combine htmlentities, addslashes, and utf8_encode when saving, and stripslashes, html_entity_decode, and utf8_decode when outputting?

Best Answer

MySQL converts character sets on the fly to the connection charset, the encoding in which the client sends and receives data. You can set this charset with the SQL statement

SET NAMES utf8

or use a specific API function such as mysql_set_charset():

mysql_set_charset("utf8", $conn);
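Note that the old mysql_* extension was removed in PHP 7.0. A rough equivalent with mysqli might look like the sketch below. The function name openUtf8Connection and the credentials in the usage comment are placeholders, not anything from the question.

```php
<?php
// Hypothetical helper: opens a mysqli connection and sets the connection
// charset before any query runs.
function openUtf8Connection(string $host, string $user, string $pass, string $db): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $conn = new mysqli($host, $user, $pass, $db);

    // Same effect as SET NAMES utf8, but the client library is also told
    // about the charset, which matters for real_escape_string().
    $conn->set_charset('utf8');

    return $conn;
}

// Usage (placeholder credentials):
// $conn = openUtf8Connection('localhost', 'app_user', 'secret', 'app_db');
```

The charset is set right after connecting so that every later query and result on that connection goes through the same conversion.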

If this is done correctly there's no need to use functions such as utf8_encode() and utf8_decode().
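To see why an extra utf8_encode() does harm rather than good, here is a small sketch of what happens to text that is already UTF-8. It uses iconv() as a stand-in, since utf8_encode() is just an ISO-8859-1 to UTF-8 conversion and is deprecated as of PHP 8.2. The sample character is my own choice.

```php
<?php
// "é" as UTF-8 is the two bytes C3 A9.
$utf8 = "\xC3\xA9";

// Misreading those bytes as ISO-8859-1 and converting "to UTF-8" again
// is what utf8_encode() effectively does to text that is already UTF-8.
$doubleEncoded = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), "\n";          // c3a9
echo bin2hex($doubleEncoded), "\n"; // c383c2a9, which a browser shows as "Ã©"
```

That "Ã©" pattern is the typical sign of text having been encoded twice somewhere along the way.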

You also have to make sure that the browser uses the same encoding. This is usually done using a simple header:

header('Content-type: text/html;charset=utf-8');

(Note that the charset is called utf-8 in the browser but utf8 in MySQL.)

In most cases the connection charset and the web charset are the only things you need to keep track of, so if it still doesn't work, there's probably something else you're doing wrong. Try experimenting with it a bit; it usually takes a while to fully understand.
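One way to experiment is to look at the raw bytes PHP actually receives. A small sketch, where describeBytes is a made-up helper name and the sample strings stand in for a value read from the database:

```php
<?php
// Hypothetical debugging helper: reports whether a string is valid UTF-8
// and shows its bytes in hex.
function describeBytes(string $s): string
{
    // preg_match() with the /u modifier fails on invalid UTF-8 input.
    $valid = preg_match('//u', $s) === 1;

    return ($valid ? 'valid UTF-8' : 'not valid UTF-8') . ': ' . bin2hex($s);
}

echo describeBytes("\xC3\xA9"), "\n"; // "é" in UTF-8
echo describeBytes("\xE9"), "\n";     // "é" in ISO-8859-1
```

If an "é" coming out of the database shows up as e9, the connection is probably still delivering Latin-1. If it shows up as c3a9, the database side is fine and the problem is more likely in how the page is served.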