PHP – How to use PHP’s DOM extension and loadHTML


It was suggested to me that in order to close some "dangling" HTML tags, I should use PHP's DOM extension and loadHTML.

I've been trying for a while, searching for tutorials, reading this page, trying various things, but can't seem to figure out how to use it to accomplish what I want.

I have this string: <div><p>The quick brown <a href="">fox jumps...

I need to write a function that closes any HTML tags that were left open.

Just looking for a starting point here. I can usually figure things out pretty quickly.

Best Answer

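This can be done with PHP's DOMDocument class, using the DOMDocument::loadHTML() and DOMDocument::normalizeDocument() methods. loadHTML() does the actual repair: libxml's HTML parser closes the unclosed elements as it builds the document tree. normalizeDocument() isn't strictly needed for that, but it doesn't hurt.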

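<?php
    $html = '<div><p>The quick brown <a href="">fox jumps';

    $DDoc = new DOMDocument();
    # libxml's HTML parser closes the dangling <a>, <p> and <div> while parsing.
    $DDoc->loadHTML($html);
    $DDoc->normalizeDocument();

    # saveHTML() serializes the whole document, including the DOCTYPE and
    # the <html>/<body> wrapper that loadHTML() adds around a fragment.
    echo $DDoc->saveHTML();
?>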

Outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html><body><div><p>The quick brown <a href="">fox jumps</a></p></div></body></html> 

From there, use substr() and strpos() to strip away the HTML you don't want, like so:

<?php
    $html = '<div><p>The quick brown <a href="">fox jumps';

    $DDoc = new DOMDocument();
    $DDoc->loadHTML($html);
    $DDoc->normalizeDocument();

    $html = $DDoc->saveHTML();

    # Remove everything before and including the opening <body> tag.
    $start = strpos($html, '<body>') + strlen('<body>');
    # Remove everything from the closing </body> tag onwards. Searching for it,
    # instead of cutting off a fixed 14 characters, also copes with the newline
    # that saveHTML() writes after </html>.
    $end = strrpos($html, '</body>');
    $html = substr($html, $start, $end - $start);

    echo $html;
?>
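Since the question asks for a function, the same idea can be wrapped in one. This is a minimal sketch, assuming PHP 7 or later (for the string type hints); the name closeDanglingTags is hypothetical, not part of PHP. Instead of trimming the serialized string, it passes each child of <body> to saveHTML(), which accepts a node argument since PHP 5.3.6, so the DOCTYPE and the <html>/<body> wrapper never appear in the output.

```php
<?php
// Hypothetical helper: closeDanglingTags() is an illustrative name, not a PHP function.
function closeDanglingTags(string $fragment): string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($fragment) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of printing them,
    // then restore whatever error mode was active before.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each top-level node inside <body> on its own, which skips
    // the DOCTYPE, <html> and <body> that loadHTML() added.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo closeDanglingTags('<div><p>The quick brown <a href="">fox jumps'), "\n";
```

Called on the string from the question, this should print the div/p/a markup with the three missing closing tags added and no wrapper. On PHP 5.4+ built against libxml 2.7.8 or later, passing LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD as the second argument to loadHTML() is another way to keep the wrapper out, though LIBXML_HTML_NOIMPLIED is known to misbehave when the fragment has more than one top-level element.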