Xml – Preserving entity references when transforming XML with XSLT

xml, xslt, xslt-2.0

How can I preserve entity references when transforming XML with XSLT (2.0)? With all of the processors I've tried, the entity gets resolved by default. I can use xsl:character-map to handle the character entities, but what about text entities?

For example, this XML:

<!DOCTYPE doc [
<!ENTITY so "stackoverflow">
<!ENTITY question "How can I preserve the entity reference when transforming with XSLT??">
]>
<doc>
  <text>Hello &so;!</text>
  <text>&question;</text>
</doc>

transformed with the following XSLT:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

produces the following output:

<doc>
   <text>Hello stackoverflow!</text>
   <text>How can I preserve the entity reference when transforming with XSLT??</text>
</doc>

The output should look like the input (minus the doctype declaration for now):

<doc>
  <text>Hello &so;!</text>
  <text>&question;</text>
</doc>

I'm hoping that I don't have to pre-process the input by replacing all ampersands with &amp;amp; (like &amp;amp;question;) and then post-process the output by replacing all &amp;amp; with &amp;.

Maybe this is processor specific? I'm using Saxon 9.

Thanks!

Best Answer

If you know what entities will be used and how they are defined, you can do the following (quite primitive and error-prone, but still better than nothing):

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:character-map name="mapEntities">
  <xsl:output-character character="&amp;" string="&amp;"/>
 </xsl:character-map>

 <xsl:variable name="vEntities" select=
 "'stackoverflow',
 'How can I preserve the entity reference when transforming with XSLT\?\?'
 "/>

 <xsl:variable name="vReplacements" select=
 "'&amp;so;', '&amp;question;'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/">
  <xsl:text disable-output-escaping="yes"><![CDATA[<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>
]]>
  </xsl:text>

  <xsl:apply-templates/>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:value-of select=
  "my:multiReplace(.,
                   $vEntities,
                   $vReplacements,
                   count($vEntities)
                   )
  " disable-output-escaping="yes"/>
 </xsl:template>

 <xsl:function name="my:multiReplace">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:param name="pEnts" as="xs:string*"/>
  <xsl:param name="pReps" as="xs:string*"/>
  <xsl:param name="pCount" as="xs:integer"/>

  <xsl:sequence select=
  "if($pCount > 0)
     then
      my:multiReplace(replace($pText,
                              $pEnts[1],
                              $pReps[1]
                              ),
                      subsequence($pEnts,2),
                      subsequence($pReps,2),
                      $pCount -1
                      )
      else
       $pText
  "/>
 </xsl:function>
</xsl:stylesheet>

when applied on the provided XML document:

<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>
<doc>
    <text>Hello &so;!</text>
    <text>&question;</text>
</doc>

the wanted result is produced:

<!DOCTYPE doc [ <!ENTITY so "stackoverflow">
<!ENTITY question
"How can I preserve the entity reference when transforming with XSLT??"> ]>

  <doc>
      <text>Hello &so;!</text>
      <text>&question;</text>
</doc>

Do note:

The special (RegEx) characters in the entity values must be escaped, because those values are used as the search patterns of replace() (hence the \?\? in $vEntities).
We needed to resort to DOE (disable-output-escaping), which isn't recommended because it violates the principles of the XSLT architecture and processing model. In other words, this solution is a nasty hack.
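
Both notes can be softened somewhat. The following is only a sketch under stated assumptions, not a replacement for the answer above: it escapes the regex metacharacters automatically, and it emits the ampersand through a character map instead of DOE in the text() template (the stylesheet above declares mapEntities but never references it from xsl:output). The names my:escapeRegex, my:replaceAll and mapAmp are made up for illustration, and the sketch assumes that the private-use character U+E000 never occurs in the input. The DOCTYPE is left out here, as in the question.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:my"
 exclude-result-prefixes="xs my">

 <!-- Assumption: U+E000 (a private-use character) never occurs in the input -->
 <xsl:output omit-xml-declaration="yes" use-character-maps="mapAmp"/>

 <!-- On serialization, U+E000 is written as a bare, unescaped ampersand -->
 <xsl:character-map name="mapAmp">
  <xsl:output-character character="&#xE000;" string="&amp;"/>
 </xsl:character-map>

 <!-- Entity values as plain strings; no manual regex escaping needed -->
 <xsl:variable name="vEntities" as="xs:string*" select=
 "'stackoverflow',
  'How can I preserve the entity reference when transforming with XSLT??'"/>

 <!-- Entity references, with U+E000 standing in for the ampersand -->
 <xsl:variable name="vReplacements" as="xs:string*" select=
 "'&#xE000;so;', '&#xE000;question;'"/>

 <!-- Identity template -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="text()">
  <xsl:value-of select="my:replaceAll(., 1)"/>
 </xsl:template>

 <!-- Hypothetical helper: backslash-escapes regex metacharacters so that
      a literal string can be used as the pattern argument of replace() -->
 <xsl:function name="my:escapeRegex" as="xs:string">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:sequence select=
  "replace($pText,
           '(\.|\[|\]|\\|\||\-|\^|\$|\?|\*|\+|\{|\}|\(|\))',
           '\\$1')"/>
 </xsl:function>

 <!-- Hypothetical helper: applies the entity/replacement pairs one after
      another, starting with the pair at position $pIndex -->
 <xsl:function name="my:replaceAll" as="xs:string">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:param name="pIndex" as="xs:integer"/>
  <xsl:sequence select=
  "if ($pIndex le count($vEntities))
     then my:replaceAll(replace($pText,
                                my:escapeRegex($vEntities[$pIndex]),
                                $vReplacements[$pIndex]),
                        $pIndex + 1)
     else $pText"/>
 </xsl:function>
</xsl:stylesheet>
```

Applied to the document from the question, this should serialize the two text nodes as Hello &so;! and &question;. The limitation of the original approach remains: every literal occurrence of an entity's replacement text is rewritten, whether or not it came from an entity reference in the source.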

Related Solutions

Xml – Transforming XML mixed nodes with disable-output-escaping

If I understand you right, you want text nodes to come out as literal text (disable-output-escaping="yes"), but the rest of the transformation should work normally (<bold> to <b> etc.)

Template modes can help:

<xsl:stylesheet 
  version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
  <xsl:output method="xml" encoding="utf-8" omit-xml-declaration="yes" />

  <xsl:template match="paragraph">
    <p>
      <xsl:apply-templates mode="literal" />
    </p>
  </xsl:template>

  <!-- literal templates (invoked in literal mode) -->
  <xsl:template match="bold" mode="literal">
    <b><xsl:apply-templates mode="literal"/></b>
  </xsl:template>
  <xsl:template match="italic" mode="literal">
    <i><xsl:apply-templates mode="literal"/></i>
  </xsl:template>
  <xsl:template match="text()" mode="literal">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>

  <!-- normal templates (invoked when you don't use a template mode) -->
  <xsl:template match="bold">
    <b><xsl:apply-templates /></b>
  </xsl:template>
  <xsl:template match="italic">
    <i><xsl:apply-templates /></i>
  </xsl:template>

</xsl:stylesheet>

Xml – XSLT applied to XML doc with xmlns attribute

Have you tried prefixing element names with the doc: namespace prefix in your match and select expressions?

<xsl:template match="doc:contents">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates select="doc:contentitem">
      <xsl:sort select="doc:id" data-type="number"/>
    </xsl:apply-templates>
  </xsl:copy>
</xsl:template>
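
For the doc: prefix to match anything, the stylesheet has to bind it to the same namespace URI that the source document declares in its xmlns attribute. A minimal sketch of a complete stylesheet around the template above follows; the URI http://example.com/doc is a placeholder assumption, since the real one is not shown here.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:doc="http://example.com/doc">
  <!-- Assumption: http://example.com/doc is a placeholder; use the exact
       URI from the xmlns="..." attribute of the source document -->

  <!-- Identity template: copies everything not matched more specifically -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <!-- Copies the contents element and emits its contentitem children
       sorted by their numeric id -->
  <xsl:template match="doc:contents">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="doc:contentitem">
        <xsl:sort select="doc:id" data-type="number"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The prefix itself is arbitrary; only the URI has to agree with the source document. Elements in a default namespace are never matched by unprefixed names in XSLT 1.0 patterns, which is why the prefix is needed even though the source uses none.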
