If you use an appropriate class or library, it will do the escaping for you. Many XML issues are caused by string concatenation.
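As a minimal sketch of that point, assuming Python's standard `xml.etree.ElementTree` as the "appropriate library" (the note text is a made-up example), compare concatenation with letting the library serialise the text:

```python
import xml.etree.ElementTree as ET

# Hypothetical user input containing XML special characters
comment = 'Tom & Jerry say "5 < 6"'

# Fragile: concatenation inserts the raw text, producing malformed XML
concatenated = "<note>" + comment + "</note>"

# Safer: build the element and let the library escape the text when serialising
note = ET.Element("note")
note.text = comment
serialised = ET.tostring(note, encoding="unicode")
```

Here `serialised` becomes `<note>Tom &amp; Jerry say "5 &lt; 6"</note>` and parses back to the original text, while `concatenated` is rejected by the parser because of the bare `&` and `<`.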
XML escape characters
There are only five:
"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
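The table can be written as a lookup. This is a sketch in Python; `escape_all_five` is a hypothetical helper name, while `xml.sax.saxutils.escape` is the standard-library function, which escapes only `&`, `<` and `>` unless you pass the quote entities yourself:

```python
from xml.sax.saxutils import escape

# The five predefined XML entities
XML_ENTITIES = {'"': "&quot;", "'": "&apos;", "<": "&lt;", ">": "&gt;", "&": "&amp;"}

def escape_all_five(text: str) -> str:
    """Hypothetical helper: replace every special character with its entity.

    Mapping one character at a time means an '&' is never escaped twice.
    """
    return "".join(XML_ENTITIES.get(ch, ch) for ch in text)

# Standard library: &, < and > by default, quotes only when supplied explicitly
stdlib_escaped = escape("\"'<>&", {'"': "&quot;", "'": "&apos;"})
```

Both approaches turn `"'<>&` into `&quot;&apos;&lt;&gt;&amp;`.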
Escaping characters depends on where the special character is used.
The examples can be validated at the W3C Markup Validation Service.
Text
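The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text (the one exception is that > must be escaped when it appears in the sequence ]]>, which is not allowed in text):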
<?xml version="1.0"?>
<valid>"'></valid>
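To check the rule locally rather than only with the validator, here is a small sketch using Python's `xml.etree.ElementTree` (the helpers `text_of` and `is_well_formed` are illustrative names, not part of any library). It parses the example above and shows that `<` and `&` still have to be escaped:

```python
import xml.etree.ElementTree as ET

def text_of(document: str) -> str:
    """Parse a document and return the root element's text content."""
    return ET.fromstring(document).text

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# From the example above: ", ' and > appear unescaped in element text
relaxed = text_of('<?xml version="1.0"?>\n<valid>"\'></valid>')

# < and & still need escaping in text; the parser decodes the entities
strict = text_of("<valid>&lt;&amp;</valid>")
```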
Attributes
The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:
<?xml version="1.0"?>
<valid attribute=">"/>
The ' character needn't be escaped in attributes if the value is delimited by double quotes ("):
<?xml version="1.0"?>
<valid attribute="'"/>
Likewise, the " character needn't be escaped in attributes if the value is delimited by single quotes ('):
<?xml version="1.0"?>
<valid attribute='"'/>
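Python's `xml.sax.saxutils.quoteattr` applies essentially these rules: it escapes `&`, `<` and `>`, then picks double quotes, or single quotes if the value contains a double quote, and falls back to `&quot;` only when both quote characters occur. A sketch (the `attribute_round_trip` helper is hypothetical) that quotes a value and parses it back:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def attribute_round_trip(value: str) -> tuple[str, str]:
    """Quote value with quoteattr(), embed it in an element, and parse it back."""
    quoted = quoteattr(value)
    parsed = ET.fromstring(f"<valid attribute={quoted}/>").get("attribute")
    return quoted, parsed
```

For example, `attribute_round_trip("'")` yields the quoted form `"'"`, and `attribute_round_trip('"')` yields `'"'`, matching the two examples above. Note that `quoteattr` still escapes `>` as `&gt;`, which is the "safe way" rather than the minimum required.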
Comments
None of the five special characters can be escaped in comments; entity references are not recognized there, so an escape would appear literally:
<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>
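A quick way to see that comments are not decoded, sketched with Python's `xml.dom.minidom` (chosen because it keeps comment nodes, unlike ElementTree's default parser). The extra `&amp;` in the comment is added only to show it stays literal:

```python
from xml.dom import minidom

# Entity references are not expanded inside comments
doc = minidom.parseString('<valid><!-- "\'<>& and &amp; --></valid>')
comment = doc.documentElement.firstChild
```

`comment.data` is the raw text between `<!--` and `-->`, including the literal `&amp;`.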
CDATA
None of the five special characters can be escaped in CDATA sections; their content is not parsed for markup or entity references:
<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
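A sketch with Python's `xml.etree.ElementTree` showing that CDATA content comes back verbatim (again with an extra `&amp;` to show it is not decoded):

```python
import xml.etree.ElementTree as ET

# Inside CDATA nothing is interpreted, so "&amp;" is returned as-is
root = ET.fromstring('<valid><![CDATA["\'<>& &amp;]]></valid>')

# ElementTree does not preserve CDATA sections: on output it escapes the text instead
reserialised = ET.tostring(root, encoding="unicode")
```

The second line illustrates the library point from the start: when the library writes the text back out, it escapes `&`, `<` and `>` for you rather than relying on a CDATA wrapper.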
Processing instructions
None of the five special characters can be escaped in XML processing instructions; their content is passed to the application unparsed:
<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>
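The example document can be parsed with Python's `xml.dom.minidom`, which keeps processing instructions from the prolog as nodes. This sketch collects them and reads the raw PI data:

```python
from xml.dom import minidom

doc = minidom.parseString('<?xml version="1.0"?>\n<?process <"\'&> ?>\n<valid/>')

# Collect processing-instruction nodes that sit before the root element
pis = [node for node in doc.childNodes
       if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
```

`pis[0].target` is `process`, and its `data` holds `<"'&>` exactly as written (possibly with the trailing space before `?>`).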
XML vs. HTML
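HTML has its own, much larger set of named character references (for example &nbsp; and &eacute;), which an XML parser does not recognize unless they are declared in a DTD.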
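The difference is easy to demonstrate with Python's standard library: `html.unescape` knows the HTML names, while `xml.etree.ElementTree` rejects them. Numeric character references such as `&#233;` work in both. The `parses_as_xml` helper is just an illustrative name:

```python
import html
import xml.etree.ElementTree as ET

# HTML knows named references such as &eacute; and &nbsp;
html_text = html.unescape("caf&eacute;&nbsp;bar")

def parses_as_xml(document: str) -> bool:
    """Return True if a plain XML parser (no DTD) accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False
```

`parses_as_xml("<p>caf&eacute;</p>")` is `False`, whereas `parses_as_xml("<p>caf&#233;</p>")` is `True`.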
The problem is that the output of the transform() method of the XSLT processor is being serialised as a string when you access the output property (either directly or indirectly), and Windows uses UTF-16 encoding for strings. The MSDN documentation of the output property mentions this almost in passing at the foot of the page:
In this case, the output is always generated in the Unicode encoding, and the encoding attribute on the element is ignored.
(By "the Unicode encoding" they mean UTF-16.)
If you use transformNodeToObject, specifying a new DOMDocument object as the output, then you can save the UTF-8 encoded serialisation from that.
Better still for your case, if you have an object implementing the IStream interface, such as the stream associated with the file you're trying to save, you can pass that to transformNodeToObject to send the UTF-8 output directly to disk. (I can't remember whether you have to open and close the file manually in this case, so you'll have to experiment with that.)
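MSXML is a Windows COM library, so the sketch below does not call it. As a rough analogue only, assuming Python's `xml.etree.ElementTree`, it shows the same distinction: serialising to a text string gives you characters whose byte encoding is decided later, while writing to a binary stream with an explicit encoding produces UTF-8 bytes with a matching declaration (`io.BytesIO` stands in for the file stream):

```python
import io
import xml.etree.ElementTree as ET

root = ET.fromstring("<doc>caf\u00e9</doc>")

# Analogue of reading a string "output" property: the result is text (str),
# so its eventual byte encoding depends on whoever writes it out later
as_text = ET.tostring(root, encoding="unicode")

# Analogue of passing a stream: write UTF-8 bytes directly, with a
# declaration that matches the bytes actually produced
stream = io.BytesIO()  # stands in for a file opened in binary mode
ET.ElementTree(root).write(stream, encoding="utf-8", xml_declaration=True)
as_bytes = stream.getvalue()
```

The point carried over from the answer is the design choice: hand the serialiser a byte stream and an encoding, rather than taking a string and encoding it yourself afterwards.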
You're making this too complicated. Just select the name attribute from the child nodes of all com nodes with an XPath expression:
Use //com/file/@name if you need the expression to be more specific (in case there are other child nodes with a name attribute). If you also want attributes from a parent node, you'll have to modify it like this: