What is the difference between #PCDATA and #CDATA in DTD?
Xml – Difference between PCDATA and CDATA in DTD
Best Answer
PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.

CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.

By default, everything is PCDATA. For example, in markup such as <bar><test>content!</test></bar>, <bar> will be parsed, and it will have no text content, but one child element.

When we want to specify that an element will only contain text, and no child elements, we use the keyword #PCDATA, because this keyword specifies that the element must contain parsable character data: any text in which less-than (<) and ampersand (&) do not appear literally and are written as &lt; and &amp; instead. Greater-than (>), quote (') and double quote (") may appear literally in element content, except that > must be escaped when it follows ]].

If <bar> instead contains a CDATA section, as in <bar><![CDATA[<test>content!</test>]]></bar>, its content will not be parsed and is thus the literal text <test>content!</test>.
There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.

Another type of content model allowing plain text contents is CDATA. In XML, the element content model may not implicitly be set to CDATA, but in SGML, it means that markup and entity references are ignored in the contents of the element. In attributes of CDATA type, however, entity references are replaced.

In XML, #PCDATA is the only plain text content model. You use it whenever you want to allow text content in the element. The CDATA content model may be used explicitly through the CDATA section markup (<![CDATA[ ... ]]>) within #PCDATA, but element contents may not be defined as CDATA by default.

In a DTD, the type of an attribute that contains arbitrary text must be CDATA. The CDATA keyword in an attribute declaration has a different meaning than the CDATA section in an XML document. In a CDATA section all characters are legal (including the <, >, &, ' and " characters), except the ]]> sequence, which ends the section.

#PCDATA is not appropriate for the type of an attribute. It is used for the type of "leaf" text.

#PCDATA is prepended by a hash in the content model to distinguish this keyword from an element named PCDATA (which would be perfectly legal).
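To make the two meanings of CDATA concrete, here is a minimal Python sketch, again using xml.etree.ElementTree. The element name note, the attribute name title and the sample values are hypothetical. ElementTree does not validate against the DTD; the internal subset is included only to show where #PCDATA and the CDATA attribute type are declared.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "note" and "title" are illustrative names.
# #PCDATA appears in the element's content model, while CDATA is the
# declared type of the attribute.
doc = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note title CDATA #REQUIRED>
]>
<note title="Q&amp;A &lt;draft&gt;">x &lt; y <![CDATA[&amp; <b>]]></note>"""

note = ET.fromstring(doc)

# In an attribute of CDATA type, entity references are replaced.
title = note.get("title")

# In element content, &lt; is replaced, but everything inside the
# CDATA section (including &amp; and <b>) is kept literally.
text = note.text

print(title)  # Q&A <draft>
print(text)   # x < y &amp; <b>
```

Actually enforcing the declarations (for example rejecting a child element inside note) would need a validating parser, which the standard library's ElementTree is not.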