How to Avoid Character Encoding Issues in a Web Application

character encoding, utf-8

In previous web applications I've built, users have entered exotic characters into forms, and those characters were stored strangely in the database and sometimes appeared different or double-encoded when retrieved and displayed back in the browser. I'm starting a new project now, and I want to prevent these issues right from the start.

What I'm looking for is a checklist of things I can do to prevent character encoding issues such as these, no matter what users enter into forms. If I set my database tables to UTF-8, and set all of my web pages to assume content is UTF-8, is this enough? Will some characters still appear differently than the user entered them? Should I do some validation on the client side that doesn't let users enter in certain characters?

Best Answer

"If I set my database tables to UTF-8, and set all of my web pages to assume content is UTF-8, is this enough?"

Not by itself. You also need to ensure that the connection between the web application and the database doesn't mangle the encoding (I believe for MySQL you need to set this explicitly on the connection, for instance in the connection string).
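As a rough illustration of naming the encoding on the connection, here is a minimal Python sketch that builds a SQLAlchemy-style MySQL URL with an explicit charset parameter. The helper name mysql_url, the credentials, the host, and the choice of the PyMySQL driver are all assumptions made for the example; other drivers and languages spell this option differently, so check the documentation for yours. The sketch only builds the string and does not connect to anything. It uses utf8mb4 because MySQL's older "utf8" charset stores at most three bytes per character and so cannot hold every Unicode character.

```python
from urllib.parse import urlencode


def mysql_url(user, password, host, database, charset="utf8mb4"):
    """Build a connection URL that states the connection charset explicitly.

    Hypothetical helper for illustration; the URL shape follows the
    SQLAlchemy + PyMySQL convention, which is an assumption here.
    """
    query = urlencode({"charset": charset})
    return f"mysql+pymysql://{user}:{password}@{host}/{database}?{query}"


# Placeholder credentials, not real ones.
url = mysql_url("app", "secret", "localhost", "appdb")
print(url)
```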

Basically, you need to ensure that every step in the chain (the browser, the web application, the database connection, and the database storage) is using the same encoding.
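To see why one mismatched step is enough to cause the double-encoding described in the question, here is a small self-contained Python sketch. It does not use a real database; it only simulates a chain in which one step wrongly reads UTF-8 bytes as Latin-1. The variable names are made up for the illustration.

```python
original = "café"

# Consistent chain: encode and decode with the same encoding at every step.
stored = original.encode("utf-8")
assert stored.decode("utf-8") == original

# Broken chain: one step misreads the UTF-8 bytes as Latin-1 ...
misread = stored.decode("latin-1")

# ... and a later step encodes that wrong text as UTF-8 again (double encoding).
double_encoded = misread.encode("utf-8")
shown = double_encoded.decode("utf-8")
print(shown)  # the "é" now appears as two characters, "Ã©"

# In this simple case the damage can be undone by reversing the wrong step.
repaired = shown.encode("latin-1").decode("utf-8")
assert repaired == original
```

The repair at the end only works because the mistake here is known and lossless. In a real application it is much easier to keep every step on UTF-8 than to guess afterwards which step went wrong.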
