PHP outputs a normal question mark instead of ‘ or ’

encoding, parsing, PHP, xml

I'm currently parsing an RSS feed and subparsing the html in the description field in order to create a custom XML structure.

In the description field there are ‘ and ’ signs and PHP outputs them as regular question marks. How come?

I've tried different encodings like UTF-8 and ISO-8859-1, but nothing works.

This is the XML I'm parsing: http://www.ilovetechno.be/artists_rss.xml

This is how it should get parsed: http://www.crowdsurferapp.com/clients/ilovetechno/

Best Answer

The encoding of an XML document is determined in a predefined order:

  1. charset parameter in the HTTP header field Content-Type:

    Content-Type: application/xml; charset=<character encoding>
  2. encoding attribute in the XML declaration:

    <?xml version="1.0" encoding="<character encoding>"?>

If both are missing, the default character encoding is used: UTF-8, or UTF-16 if the document starts with a UTF-16 byte order mark.
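
The lookup order above could be sketched in PHP roughly as follows. This is only an illustration: `detectXmlEncoding` is a hypothetical helper name, the regular expressions are simplified, and the caller is assumed to pass in the raw `Content-Type` header value (or `null` if there is none) together with the document bytes.

```php
<?php
// Hypothetical helper (not part of PHP): returns the encoding an XML
// document claims to use, checking the sources in the order listed above.
function detectXmlEncoding(?string $contentTypeHeader, string $xml): string
{
    // 1. charset parameter of the HTTP Content-Type header
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentTypeHeader, $m)) {
        return strtoupper($m[1]);
    }

    // 2. encoding attribute in the XML declaration
    //    (an optional UTF-8 byte order mark may precede it)
    if (preg_match('/^(?:\xEF\xBB\xBF)?<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']/i', $xml, $m)) {
        return strtoupper($m[1]);
    }

    // 3. default: UTF-16 if a UTF-16 byte order mark is present, else UTF-8
    $bom = substr($xml, 0, 2);
    if ($bom === "\xFF\xFE" || $bom === "\xFE\xFF") {
        return 'UTF-16';
    }
    return 'UTF-8';
}
```

Note that the result is only what the document declares; a feed can still contain bytes that do not match its declared encoding.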

So in order to parse the XML document with the proper encoding, you need to look for that information. Take a look at the question PHP: Detect encoding and make everything UTF-8 for a solution of mine.

I also recommend using UTF-8 for your internal processing and as the output encoding, since it is one of the default character encodings for XML.
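
A minimal sketch of converting a document to UTF-8 before parsing it, assuming the mbstring extension is available and the source encoding is already known. `xmlToUtf8` is a hypothetical helper name, and the sample input is made up for illustration; it uses Windows-1252, where ‘ and ’ are the single bytes 0x91 and 0x92, as one example of an encoding in which those characters can go wrong. Whether that applies to the feed in the question is not something the sketch establishes, and it does not handle byte order marks.

```php
<?php
// Hypothetical helper (not part of PHP): re-encodes an XML string to UTF-8
// and updates the encoding attribute of the XML declaration to match.
function xmlToUtf8(string $xml, string $sourceEncoding): string
{
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        $xml = mb_convert_encoding($xml, 'UTF-8', $sourceEncoding);
    }

    // Keep the declaration consistent with the new byte encoding, so the
    // XML parser does not decode the UTF-8 bytes with the old charset.
    return preg_replace(
        '/^(<\?xml[^>]*\bencoding\s*=\s*["\'])[^"\']+(["\'])/i',
        '${1}UTF-8${2}',
        $xml,
        1
    );
}

// Made-up sample: bytes 0x91 and 0x92 are ‘ and ’ in Windows-1252.
$input  = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><d>\x91hi\x92</d>";
$output = xmlToUtf8($input, 'Windows-1252');
```

The resulting string can then be handed to the XML parser of your choice and written out again as UTF-8 without a further conversion step.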