Security – How Facebook Strips HTML/Apostrophes for XSS but Also Displays It

Tags: escaping, html, security, text-processing

I'm not quite sure whether this question belongs on Programmers.SE rather than Stack Overflow, but here goes. When Facebook (or any other large site) is given input containing something like an apostrophe or HTML, it can strip the input of its malicious intent and still display it properly. My current sanitizing function in PHP just strips those characters or makes them harmless with htmlentities() and similar functions. So if I write an HTML tag, I want it to be sanitized but also displayed on the website. How do I do this?

Best Answer

In general, anything entered by a user (or by an untrusted machine or program using an API) needs to be escaped before it is embedded in code that gets interpreted (HTML, JavaScript, etc.). Escaping is, I think, what you mean by "making it harmless". Most languages and libraries provide functions for this, such as PHP's htmlentities().
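A minimal sketch of what that means in PHP, assuming a UTF-8 page. The helper name `escape_html` is hypothetical. It uses htmlspecialchars(), which converts only the characters that matter to HTML syntax (`& < > " '`); htmlentities() from the question also works, but additionally converts every character that has a named entity. The point is that the tag is not removed: it is encoded, so the browser displays it as text instead of interpreting it.

```php
<?php
// Hypothetical helper: escape untrusted text for an HTML element body or a
// quoted attribute value. ENT_QUOTES also converts ' and "; ENT_SUBSTITUTE
// replaces invalid UTF-8 instead of returning an empty string.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

$comment = '<b>Hello</b> <script>alert(1)</script>';

// The markup is turned into entities, so the browser shows the tags literally
// rather than rendering bold text or running the script.
echo '<p>' . escape_html($comment) . '</p>', PHP_EOL;
```

This prints `<p>&lt;b&gt;Hello&lt;/b&gt; &lt;script&gt;alert(1)&lt;/script&gt;</p>`, which a browser renders as the original text, tags included.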

If you don't escape it, storing it is basically the only safe thing you can do with it. Analyzing it can also be OK, as long as the analyzer cannot be commandeered by its input (i.e. it is robust and defensive and has no exploitable bugs).

Modifying input (e.g. stripping dangerous characters) can also be effective, but it is hard to do it in such a way that legitimate characters are not suppressed (false positives). For example, if someone's name is John O'Malley-O'Hara, you don't want the system to remove the apostrophes (or the text between them), even though they look like the single-quote delimiters common in code. In other words, it is so hard to make sure that input modification is done right that it is perhaps better not to do it at all.
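To make the contrast concrete, here is a small sketch comparing a naive quote-stripping filter (a hypothetical example of "input modification", not any particular library's behavior) with escaping. Stripping permanently corrupts the name, while escaping only changes its HTML encoding; the browser renders `&apos;` as an apostrophe, and decoding gives back the exact original.

```php
<?php
$name = "John O'Malley-O'Hara";

// Naive "sanitizing" by deleting quote characters: legitimate data is lost.
$stripped = str_replace(["'", '"'], '', $name);

// Escaping keeps the data intact and only changes how it is written in HTML.
$escaped = htmlspecialchars($name, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $stripped, PHP_EOL; // John OMalley-OHara
echo $escaped, PHP_EOL;  // John O&apos;Malley-O&apos;Hara

// Decoding the escaped form yields the original name exactly.
var_dump(html_entity_decode($escaped, ENT_QUOTES | ENT_HTML5, 'UTF-8') === $name); // bool(true)
```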

I think the best approach is to treat all input carefully and escape it when displaying it. Some languages and frameworks can assist you with this (see "taint mode").
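One way to apply "escape it when displaying it" by hand is sketched below, under a few assumptions: the array `$stored` stands in for a real database, and the helpers `html_out` and `js_string` are hypothetical names. The input is stored exactly as typed, and each output context gets its own escaping: htmlspecialchars() for HTML, and json_encode() with its `JSON_HEX_*` flags for a JavaScript string literal, so that text like `</script>` cannot close the script block.

```php
<?php
// Hypothetical in-memory "database": the comment is stored exactly as typed.
$stored = ['author' => "John O'Malley-O'Hara", 'body' => 'I <3 <b>bold</b> text'];

// Escape for HTML element content or quoted attribute values.
function html_out(string $s): string
{
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');
}

// Escape for a JavaScript string literal inside a <script> block.
// json_encode adds the surrounding quotes; the JSON_HEX_* flags encode
// < > & ' " as \u00XX sequences.
function js_string(string $s): string
{
    return json_encode($s, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR);
}

// Escaping happens only here, at output time, matched to each context.
$page = '<p><strong>' . html_out($stored['author']) . '</strong>: '
      . html_out($stored['body']) . '</p>' . PHP_EOL
      . '<script>const author = ' . js_string($stored['author']) . ';</script>' . PHP_EOL;

echo $page;
```

Keeping the stored value raw means the same data can later be escaped correctly for a different context (HTML, JavaScript, CSV, an email subject) without first undoing an HTML-specific transformation.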
