Xml – One xml namespace equals one and only one schema file

schema xml xsd

…or Why do these files validate in Visual Studio 2010 but not with xmllint (see note 1)?

I'm currently working against a published xml schema where the original author's habit is to break down the schemas into several .xsd-files, but where some schema files have the same targetNamespace. Is this really "allowed"?

Example (extremely simplified):

File    targetNamespace    Contents
------------------------------------------------------------
b1.xsd  uri:tempuri.org:b  complex type "fooType"
b2.xsd  uri:tempuri.org:b  simple type "barType"

a.xsd   uri:tempuri.org:a  imports b1.xsd and b2.xsd
                           definition of root element "foo", that
                           extends "b:fooType" with an attribute
                           of "b:barType"

(Complete file contents below.)

Then I have an xml file, data.xml, with this content:

<?xml version="1.0"?>
<foo bar="1" xmlns="uri:tempuri.org:a" xmlns:xs="http://www.w3.org/2001/XMLSchema" />

For a long time, I have believed that all of this was correct, since Visual Studio apparently allows this schema style. However, today I decided to set up a command line utility for validating xml files, and I chose xmllint.

When I ran xmllint --schema a.xsd data.xml, I was presented with this warning:

a.xsd:4: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import':
Skipping import of schema located at 'b2.xsd' for the namespace 'uri:tempuri.org:b', since this
namespace was already imported with the schema located at 'b1.xsd'.

The fact that the import of b2.xsd was skipped obviously leads to this error:

a.xsd:9: element attribute: Schemas parser error : attribute decl. 'bar', attribute 'type':
The QName value '{uri:tempuri.org:b}barType' does not resolve to a(n) simple type definition.

If xmllint is correct, there would be an error in the published specs I'm working against. Is there? And Visual Studio would be wrong. Is it?

I do realize the difference between xs:import and xs:include. Right now, I just don't see how xs:include could fix things, since:

  • b1.xsd and b2.xsd have the same targetNamespace
  • they both differ in targetNamespace from a.xsd
  • and they do not (need to) know about each other

Is this a flaw in the original schema specification? I'm beginning to think that the third bullet point is crucial. Should the fact that they don't know about each other have led to placing them in different namespaces to begin with?


b1.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:b" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="fooType" />
</xs:schema>

b2.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:b" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="barType">
    <xs:restriction base="xs:integer" />
  </xs:simpleType>
</xs:schema>

a.xsd:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:a" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:b="uri:tempuri.org:b">
  <xs:import namespace="uri:tempuri.org:b" schemaLocation="b1.xsd" />
  <xs:import namespace="uri:tempuri.org:b" schemaLocation="b2.xsd" />
  <xs:element name="foo">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="b:fooType">
          <xs:attribute name="bar" type="b:barType" />
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

Notes:

1) I'm using the Windows port of libxml2/xmllint found at www.zlatkovic.com.

Best Answer

The crux of the problem here is what it means to have two different <import> elements that both refer to the same namespace.

It helps to clarify the meaning when you consider that the schemaLocation attribute of <import> is entirely optional. When you leave it out, you're just saying "I want to import schema of namespace XYZ into this schema". The schemaLocation is just a hint as to where to find the definition of that other schema.
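As a minimal sketch of that point (not taken from the published schemas), here is a.xsd with a single import that names only the namespace. How the processor then locates the components of uri:tempuri.org:b is left to the tool, so whether this resolves at all depends on the processor you use:

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:a" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:b="uri:tempuri.org:b">
  <!-- Namespace-only import: no schemaLocation hint is given. -->
  <!-- The processor has to find the declarations for uri:tempuri.org:b by some other means. -->
  <xs:import namespace="uri:tempuri.org:b" />
</xs:schema>
```

Read this way, a second <import> for the same namespace adds nothing new as far as the namespace is concerned; it only offers a second location hint, which a processor is free to ignore.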

The precise meaning of <import> is a bit fuzzy when you read the W3C spec, possibly deliberately so. As a result, interpretations vary.

Some XML processors tolerate multiple <import> elements for the same namespace, and essentially amalgamate the documents named by all of the schemaLocation attributes into a single target.

Other processors are stricter, and decide that only one <import> per target namespace is valid. I think this is more correct, when you consider that schemaLocation is optional.

In addition to the VS and xmllint examples you gave, Xerces-J is also super-strict, and ignores subsequent <import> elements for the same target namespace, giving much the same error as xmllint does. XML Spy, on the other hand, is much more permissive (but then, XML Spy's validation is notoriously flaky).

To be safe, you should not have these multiple imports. A given namespace should have a single "master" document, which in turn has an <include> for each sub-document. This master is often highly artificial, acting only as a container for these sub-documents.
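Applied to the files in the question, a sketch of this layout could look as follows. The file name b.xsd is an assumption made up for illustration; b1.xsd and b2.xsd stay exactly as they are.

b.xsd (hypothetical master document for uri:tempuri.org:b):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:b" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Container only: pulls both same-namespace documents into one schema. -->
  <xs:include schemaLocation="b1.xsd" />
  <xs:include schemaLocation="b2.xsd" />
</xs:schema>
```

a.xsd (now with one import per namespace):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="uri:tempuri.org:a" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:b="uri:tempuri.org:b">
  <!-- Single import for uri:tempuri.org:b, pointing at the master document. -->
  <xs:import namespace="uri:tempuri.org:b" schemaLocation="b.xsd" />
  <xs:element name="foo">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="b:fooType">
          <xs:attribute name="bar" type="b:barType" />
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With this layout, b1.xsd and b2.xsd still do not need to know about each other; only the master document refers to both. The same command as before, xmllint --schema a.xsd data.xml, should then no longer need to skip an import.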

From what I've seen, this is generally considered "best practice" for XML Schema when it comes to maximum tool compatibility, but some will argue that it's a hack that takes away from elegant schema design.

Meh.