PHP – How to prevent PHP’s DOMDocument from encoding HTML entities

Tags: anchor, createtextnode, domdocument, href, php

I have a function that replaces the href attribute of every anchor in a string using PHP's DOMDocument. Here's a snippet:

$doc        = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($text);
$anchors    = $doc->getElementsByTagName('a');

foreach($anchors as $a) {
    $a->setAttribute('href', 'http://google.com');
}

return $doc->saveHTML();

The problem is that loadHTML($text) wraps $text in doctype, html, body, etc. tags. I tried working around this by using createTextNode() instead of loadHTML():

$doc        = new DOMDocument('1.0', 'UTF-8');
$node       = $doc->createTextNode($text);
$doc->appendChild($node);
...

Unfortunately, this encodes all the markup as entities (anchors included), so the tags come out as literal text. Does anyone know how to turn this off? I've already looked through the docs thoroughly and tried hacking around it, but can't figure it out.

Thanks! 🙂

Best Answer

$text is a translated string with place-holder anchor tags

If these placeholders have a strict, well-defined format, a simple preg_replace or preg_replace_callback might do the trick.
In general I don't recommend manipulating HTML documents with regular expressions, but for a small, well-defined subset they are suitable.
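
As a rough illustration of the preg_replace_callback idea, here is a minimal sketch. The exact placeholder format isn't given in the question, so this assumes the anchors are simple tags such as <a>...</a> or <a href="...">...</a> with no ">" inside attribute values. The function name replaceAnchorHrefs is made up for this example.

```php
<?php
// Hypothetical helper (not from the question): sets the href of every
// opening <a> tag in $text to $url and leaves the rest of the string alone.
// Assumption: anchors are simple placeholders with no ">" inside attributes.
function replaceAnchorHrefs(string $text, string $url): string
{
    // Escape the URL so it is safe inside a double-quoted attribute.
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    $result = preg_replace_callback(
        '/<a\b([^>]*)>/i', // opening <a ...> tags only; </a> is not matched
        function (array $m) use ($href): string {
            // Drop any existing href attribute, keep the other attributes.
            $attrs = preg_replace(
                '/\s+href\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
                '',
                $m[1]
            );
            return '<a href="' . $href . '"' . $attrs . '>';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo replaceAnchorHrefs(
    'Click <a>here</a> or <a href="#" class="x">there</a>.',
    'http://google.com'
);
// prints: Click <a href="http://google.com">here</a> or <a href="http://google.com" class="x">there</a>.
```

Because the string never passes through DOMDocument, no doctype/html/body wrapper is added and nothing outside the matched tags is re-encoded. If the real placeholders are less regular than assumed here, a proper HTML parser is probably the safer route.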