If the following statements are true:
- All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
- All HTML attributes are enclosed in either single or double quotes.
- There are no <script> tags in the document.

are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding named HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?
Best Answer
htmlspecialchars() is enough to prevent document-creation-time HTML injection with the limitations you state (i.e. no injection into tag content/unquoted attribute).

However, there are other kinds of injection that can lead to XSS.

The condition that there are no <script> tags in the document doesn't cover all cases of JS injection. You might, for example, have an event handler attribute whose value embeds user input (this requires JS-escaping inside HTML-escaping). Even worse is a javascript: link (this requires JS-escaping inside URL-escaping inside HTML-escaping).
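A minimal sketch of the event handler and javascript: link cases, not a definitive implementation. The variable $xss, the alert() call, and all helper names are hypothetical. html_entity_decode() stands in for the browser, which entity-decodes an attribute value before the JavaScript engine sees it. rawurlencode() is used for the URL layer, on the assumption that a + produced by urlencode() would not be turned back into a space when the browser percent-decodes a javascript: URL.

```php
<?php
// Hypothetical attacker-controlled input: closes the JS string and call, then appends code.
$xss = "');alert(document.cookie);//";

// Stand-in for the browser: attribute values are entity-decoded before being run as JS.
function browserDecodesAttribute(string $attr): string
{
    return html_entity_decode($attr, ENT_QUOTES, 'UTF-8');
}

// BAD: HTML-escaping alone for a value placed inside a JS string literal.
function handlerHtmlOnly(string $value): string
{
    return "alert('" . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . "')";
}

// JS-escaping: json_encode() emits a double-quoted string literal; the JSON_HEX_*
// flags additionally hex-escape <, >, &, ' and " inside it.
function jsLiteral(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Event handler attribute: JS-escape first, then HTML-escape.
function handlerLayered(string $value): string
{
    return htmlspecialchars('alert(' . jsLiteral($value) . ')', ENT_QUOTES, 'UTF-8');
}

// javascript: link: JS-escape, then URL-escape, then HTML-escape.
function linkLayered(string $value): string
{
    $js = 'alert(' . jsLiteral($value) . ')';
    return htmlspecialchars('javascript:' . rawurlencode($js), ENT_QUOTES, 'UTF-8');
}

echo '<div onmouseover="' . handlerHtmlOnly($xss) . '">bad</div>', "\n";
echo '<div onmouseover="' . handlerLayered($xss) . '">better</div>', "\n";
echo '<a href="' . linkLayered($xss) . '">better</a>', "\n";

// What the JS engine receives in the BAD case: the quote is back and the payload runs.
echo browserDecodesAttribute(handlerHtmlOnly($xss)), "\n";
// alert('');alert(document.cookie);//')
```

In the layered versions the payload stays inside a single JavaScript string literal after the browser undoes each layer, which is why the order of the escaping calls matters.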
It is usually best to avoid these constructs anyway, but especially when templating. Writing
<?php echo htmlspecialchars(urlencode(json_encode($something))) ?>
is quite tedious.

And... injection issues can happen on the client side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.

And... XSS has a wider range of causes than just injections. Other common causes are:
- allowing the user to create links without checking for known-good URL schemes (javascript: is the most well-known harmful scheme, but there are more)
- deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode, which is invariably exploitable)
- allowing the user to upload files (which can, through various means, be reinterpreted as HTML or XML)
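
For the first cause in the list, a known-good scheme check could look roughly like the sketch below. The allowlist contents, the function name hasKnownGoodScheme(), and the sample $userUrl are assumptions for illustration. The approach is to allow only listed schemes rather than trying to block harmful ones, and to reject relative URLs and anything containing control characters or spaces.

```php
<?php
// Hypothetical allowlist; adjust to the schemes the application actually needs.
const ALLOWED_SCHEMES = ['http', 'https', 'mailto'];

// Returns true only for URLs that carry an explicit scheme found on the allowlist.
function hasKnownGoodScheme(string $url): bool
{
    // Browsers strip tabs and newlines from URLs and trim surrounding whitespace,
    // so "java\tscript:" still means javascript:. Reject such input outright.
    if (preg_match('/[\x00-\x20\x7F]/', $url)) {
        return false;
    }

    // parse_url() returns null when there is no scheme and false on malformed input.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Schemes are case-insensitive, so compare in lower case.
    return is_string($scheme) && in_array(strtolower($scheme), ALLOWED_SCHEMES, true);
}

// Usage: fall back to a harmless target, and still HTML-escape the attribute value.
$userUrl = 'JaVaScRiPt:alert(1)'; // hypothetical user-supplied link
$href = hasKnownGoodScheme($userUrl) ? $userUrl : '#';
echo '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">link</a>', "\n";
```

The scheme check and htmlspecialchars() solve different problems: the first decides whether the URL may be used at all, the second only keeps it from breaking out of the quoted attribute.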