HTML Special Entities – How to Convert from Standard Input Stream in Linux

bash, html, linux, scripting, sed

CentOS

Is there an easy way to convert HTML special entities from a data stream? I'm passing data to a bash script and sometimes that data includes special entities. For example:

"test" & test $test ! test @ # $ % ^ & *

I'm not sure why some characters show up fine and others don't, but unfortunately I don't have control over the incoming data.

I'm thinking I might be able to use sed here, but that seems cumbersome and possibly prone to false positives. Is there a Linux command I could pipe to that specializes in decoding this type of data?

Best Answer

PHP is well suited to this. This example requires PHP 5:

cat file.html | php -R 'echo html_entity_decode($argn), "\n";'
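If PHP is not installed, the same pipe-through approach can be sketched with Python 3's standard-library html.unescape. This is not part of the original answer: the function name decode_entities is hypothetical, python3 is assumed to be on the PATH, and the sample input is only an assumed encoding of the question's example.

```shell
# Hypothetical helper (not from the original answer): decode HTML entities
# read from standard input, one line at a time, and write to standard output.
# Assumes python3 is available on the PATH.
decode_entities() {
    python3 -c '
import html, sys
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))
'
}

# Example: pipe any stream through the helper.
# The input below is an assumed encoding of the sample from the question.
printf '%s\n' '&quot;test&quot; &amp; test &#36;test &#33; test' | decode_entities
# prints: "test" & test $test ! test
```

Because the helper only reads stdin and writes stdout, it can sit anywhere in a pipeline inside the bash script, for example `some_command | decode_entities | other_command`.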