Advanced Topics

In this section, we take a look at some important features available in XML Schema specifications that were not illustrated by the simple example we picked.

Defining New Types

In previous sections, we looked at how a built-in type could be used as a base type to create new, more restrictive types using facets. For example, the employee_id simpleType is a restriction of the built-in int type. XML Schema provides other ways to create new types as well.

Creating New simpleTypes from Other simpleTypes

A simpleType can be used as a base type to create a new, more restrictive type. For example, Flute Bank may assign employee IDs from 1 through 10,000 for U.S.-based employees and 10,001 through 100,000 for non-U.S.-based employees. In such a case, we can create two new types, using the employee_id simple type as a base (instead of a built-in type):

<!-- employee_id is declared as a named type so that other types can use it as a base -->
<xsd:simpleType name="employee_id">
    <xsd:restriction base="xsd:int">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="100000"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="us_employee_id">
    <xsd:restriction base="employee_id">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="10000"/>
    </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name="non_us_employee_id">
    <xsd:restriction base="employee_id">
        <xsd:minInclusive value="10001"/>
        <xsd:maxInclusive value="100000"/>
    </xsd:restriction>
</xsd:simpleType>

Given Flute Bank's numbering scheme, employee IDs always start at 1. To prevent derived types (such as us_employee_id) from changing this lower bound, fix the value of the minInclusive facet in the employee_id simpleType:

<xsd:simpleType name="employee_id">
    <xsd:restriction base="xsd:int">
        <!-- fixed="true": derived types cannot change this value -->
        <xsd:minInclusive value="1" fixed="true"/>
        <xsd:maxInclusive value="100000"/>
    </xsd:restriction>
</xsd:simpleType>

By adding the fixed attribute and setting its value to true, we ensure that no derived simple type can change the minInclusive value from 1.
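The effect of a fixed facet can be sketched with two derivations. This is an illustration only, and it assumes employee_id is available as a named simpleType with minInclusive fixed at 1. Note that a fixed facet blocks any change to the value, including raising it, so a type with a higher lower bound can no longer be derived from employee_id:

```xml
<!-- Accepted: minInclusive is restated with the same fixed value (1) -->
<xsd:simpleType name="us_employee_id">
    <xsd:restriction base="employee_id">
        <xsd:minInclusive value="1"/>
        <xsd:maxInclusive value="10000"/>
    </xsd:restriction>
</xsd:simpleType>

<!-- Rejected: tries to change the fixed minInclusive value from 1 to 10001 -->
<xsd:simpleType name="non_us_employee_id">
    <xsd:restriction base="employee_id">
        <xsd:minInclusive value="10001"/>
        <xsd:maxInclusive value="100000"/>
    </xsd:restriction>
</xsd:simpleType>
```

If both ranges are needed, one option is to leave minInclusive unfixed in the base type; restriction by itself already prevents a derived type from going below the base type's lower bound.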

Deriving Types by Extension

We define a Flute employee as follows:

<!-- A named complexType, so that other types can use it as a base -->
<xsd:complexType name="employee">
    <xsd:sequence>
        <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="name" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="extn" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="email" minOccurs="1" maxOccurs="1"/>
    </xsd:sequence>
</xsd:complexType>

<xsd:element name="employee" type="employee"/>

It is possible to create a new employee type with additional elements by extending the above complexType (i.e., to derive a new complex type by extension). To continue with the example of a Flute employee, assume that a principal-level employee at Flute has two additional subelements: a seniority level (level 1, level 2, etc.) and a specialization (J2EE, security, J2ME, Web services, etc.). A new type called principalEmployee can be created, with the new elements added to the base employee:

<xsd:complexType name="principalEmployee">
    <xsd:complexContent>
        <xsd:extension base="employee">
            <xsd:sequence>
                <xsd:element name="level" type="xsd:int"/>
                <xsd:element name="specialization" type="xsd:string"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

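In the above example, the content model of the Flute employee type is extended inside a complexContent element. The extension base is the employee type, to which two new subelements are added. In an instance document, the new subelements appear after the subelements inherited from the base type.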
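As a sketch of how the extended type might be used, the fragment below declares a hypothetical element named principal and shows a matching instance. The element name and the level and specialization values are illustrative; namespace declarations are omitted, and the schema is assumed to set elementFormDefault="qualified" so that all subelements share one namespace:

```xml
<!-- Schema: a hypothetical element whose type is principalEmployee -->
<xsd:element name="principal" type="principalEmployee"/>

<!-- Instance: the five inherited subelements, then the two added by extension -->
<principal>
    <employee_id>10000</employee_id>
    <name>
        <first_name>John</first_name>
        <last_name>Doe</last_name>
    </name>
    <extn>27304</extn>
    <dept>110-433-2089</dept>
    <email>john.doe@acme.com</email>
    <level>2</level>
    <specialization>security</specialization>
</principal>
```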

Deriving Types by Restriction

Another way to derive a type is by restriction; that is, to derive a new type by narrowing what the base type allows. Consider this employee definition:

<!-- A named complexType, so that it can serve as the base of a restriction -->
<xsd:complexType name="employee">
    <xsd:sequence>
        <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="name" minOccurs="0" maxOccurs="1"/>
        <xsd:element ref="extn" maxOccurs="unbounded"/>
        <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
        <xsd:element ref="email" minOccurs="0" maxOccurs="1"/>
    </xsd:sequence>
</xsd:complexType>

In the above example, an employee may have any number of phone extensions (maxOccurs="unbounded") and need not have an email address (minOccurs="0"). We can derive a new employee type by restricting it to a maximum of two phone extensions and no email address:

<xsd:complexType name="restrictedEmployee">
    <xsd:complexContent>
        <xsd:restriction base="employee">
            <xsd:sequence>
                <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="name" minOccurs="0" maxOccurs="1"/>
                <xsd:element ref="extn" maxOccurs="2"/>
                <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>

In the restricted employee type, we have repeated all subelements that are unchanged. This is a requirement: a restriction must restate every subelement it keeps. The extn element is more restrictive, in that we have changed the maxOccurs value from unbounded to 2. We have also removed the optional email subelement from the definition of the restricted type.
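To make the effect concrete, here is a sketch of an instance that fits the restricted type. The element name contractor is hypothetical (it stands for any element declared with type restrictedEmployee), the second extension number is illustrative, and namespace declarations are omitted:

```xml
<!-- Valid: at most two extn elements and no email element -->
<contractor>
    <employee_id>10000</employee_id>
    <name>
        <first_name>John</first_name>
        <last_name>Doe</last_name>
    </name>
    <extn>27304</extn>
    <extn>27305</extn>
    <dept>110-433-2089</dept>
</contractor>
```

Adding a third extn element or an email element would make the instance invalid against restrictedEmployee, although it would still fit the base employee type.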

To limit the derivation of new types from a base type, specify the final attribute on the base type, using one of the values shown here:

<xsd:complexType name="employee" final="#all | restriction | extension">

If final="#all", derivation by both restriction and extension is prohibited. If final="restriction", derivation by extension is possible, but derivation by restriction is not.
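A minimal sketch of the second case follows. The content models are abbreviated for illustration and are not the full employee definition:

```xml
<!-- Base type that forbids derivation by restriction -->
<xsd:complexType name="employee" final="restriction">
    <xsd:sequence>
        <xsd:element ref="employee_id"/>
        <xsd:element ref="extn" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

<!-- Allowed: derivation by extension -->
<xsd:complexType name="principalEmployee">
    <xsd:complexContent>
        <xsd:extension base="employee">
            <xsd:sequence>
                <xsd:element name="level" type="xsd:int"/>
            </xsd:sequence>
        </xsd:extension>
    </xsd:complexContent>
</xsd:complexType>

<!-- Rejected by the schema processor: employee is final for restriction -->
<xsd:complexType name="restrictedEmployee">
    <xsd:complexContent>
        <xsd:restriction base="employee">
            <xsd:sequence>
                <xsd:element ref="employee_id"/>
                <xsd:element ref="extn" maxOccurs="2"/>
            </xsd:sequence>
        </xsd:restriction>
    </xsd:complexContent>
</xsd:complexType>
```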

Unique Values

In the employeeList XML instance, it is possible to list the same employee more than once, because the employeeList schema does not prevent this. One way to ensure that each employee appears only once in a list instance is to define employee_id as a key. When an element is defined as a key, the validating parser ensures that the instance document contains only unique values for that element.

<xsd:element name="employeeList">
    <xsd:complexType>
        <xsd:sequence>
            <xsd:element ref="employee" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

    <xsd:key name="emp_key">
        <xsd:selector xpath="fl:employee"/>
        <xsd:field xpath="fl:employee_id"/>
    </xsd:key>
</xsd:element>

Note that the selector (which identifies the set of elements to which uniqueness applies) and the field (which identifies the value that must be unique) are specified using XPath expressions, which require namespace-qualified names. Here the fl prefix is assumed to be bound to the schema's target namespace.
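As an illustration, the following instance sketch would be rejected under the emp_key constraint because two employee entries carry the same employee_id. It assumes the fl prefix is bound to http://www.flute.com; the remaining subelements are left out for brevity:

```xml
<fl:employeeList xmlns:fl="http://www.flute.com">
    <fl:employee>
        <fl:employee_id>10000</fl:employee_id>
        <!-- other subelements omitted for brevity -->
    </fl:employee>
    <fl:employee>
        <!-- Duplicate key value: violates emp_key -->
        <fl:employee_id>10000</fl:employee_id>
        <!-- other subelements omitted for brevity -->
    </fl:employee>
</fl:employeeList>
```

A key also requires every selected employee to have an employee_id, so an entry that omits the field would fail the constraint as well.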

Assembling Schemas

Different departments within Flute may define schemas for entities the department primarily owns. For example, the HR department may define the base schema for all Flute employees, and the marketing department may define what a Flute customer schema looks like. These schemas can be put into different schema documents and included in another schema that references elements defined in the original schemas. When schemas are included in another schema document, all schemas must have the same namespace as the including schema. If one of the included schemas has no targetNamespace, it takes the namespace of the including schema:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.flute.com"
            xmlns="http://www.flute.com">
    <xsd:include schemaLocation="employee.xsd"/>
    <xsd:include schemaLocation="customer.xsd"/>
    <!-- declarations that use the included components go here -->
</xsd:schema>
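As a sketch of the last rule, suppose employee.xsd declares no targetNamespace. The contents shown here are hypothetical. When the schema above includes it, the employee_id declaration is treated as if it belonged to http://www.flute.com:

```xml
<?xml version="1.0"?>
<!-- employee.xsd (hypothetical contents): note the absence of targetNamespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="employee_id" type="xsd:int"/>
</xsd:schema>
```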

Just as the include element allows you to include schemas from the same namespace, the import element allows you to import schemas from different namespaces:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.flute.com"
            xmlns="http://www.flute.com">
    <xsd:import namespace="http://wwww.mktg.flute.com"
                schemaLocation="customer.xsd"/>
    <xsd:import namespace="http://wwww.hr.flute.com"
                schemaLocation="employee.xsd"/>
    <!-- declarations that use the imported components go here -->
</xsd:schema>
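Imported components are referenced through a prefix bound to their namespace. The sketch below assumes that customer.xsd and employee.xsd declare global elements named customer and employee; the account_manager element and the mktg and hr prefixes are hypothetical names chosen for illustration:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.flute.com"
            xmlns="http://www.flute.com"
            xmlns:mktg="http://wwww.mktg.flute.com"
            xmlns:hr="http://wwww.hr.flute.com">
    <xsd:import namespace="http://wwww.mktg.flute.com"
                schemaLocation="customer.xsd"/>
    <xsd:import namespace="http://wwww.hr.flute.com"
                schemaLocation="employee.xsd"/>

    <!-- Hypothetical element that combines components from both imports -->
    <xsd:element name="account_manager">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="hr:employee"/>
                <xsd:element ref="mktg:customer" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
```

The namespace value on each import must match the targetNamespace of the imported schema document exactly.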

Making Schemas Extensible

A sidebar earlier in the chapter discussed one way to extend an XML schema: by providing instructions to another tool, such as Schematron, in an appinfo element, we can get the other tool to provide validations that XML schema parsers cannot handle. The results from the two validations will need to be combined (see Figure A.6).

Figure A.6: Extending XML schema using the appinfo element
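As a rough sketch of this approach, a Schematron rule can be embedded in the annotation of an element declaration. The rule below is hypothetical: it states a co-occurrence constraint (an employee must have a name or an email address), which an XML Schema content model alone cannot express. The sch prefix is bound to the ISO Schematron namespace, and the fl prefix is assumed to be declared to the Schematron tool as the schema's target namespace:

```xml
<xsd:element name="employee">
    <xsd:annotation>
        <xsd:appinfo>
            <!-- Ignored by the schema parser; extracted and run by a Schematron tool -->
            <sch:pattern id="employee-contact"
                         xmlns:sch="http://purl.oclc.org/dsdl/schematron">
                <sch:rule context="fl:employee">
                    <sch:assert test="fl:name or fl:email">
                        An employee must have a name or an email address.
                    </sch:assert>
                </sch:rule>
            </sch:pattern>
        </xsd:appinfo>
    </xsd:annotation>
    <!-- complexType definition of employee goes here -->
</xsd:element>
```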

Extensibility Elements: Creating Schemas That Can Evolve

So far, we have discussed XML schemas that provide a rigid, static structure to instance documents. If Flute Bank wanted to add a new attribute to the employee element or wanted new information in the employee list, the only way to achieve it would be to modify the schema. This might not be a problem if the schema is used and controlled by a single entity. But if a schema for an invoice is published by an industry consortium and is used by many organizations, changing it will not be easy. XML Schema provides the any and anyAttribute elements to make schemas extensible and open to evolution.

The example below shows the employee schema fragment modified to include the any element:

<xsd:element name="employee">
   <xsd:complexType>
      <xsd:sequence>
          <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
          <xsd:element ref="name" minOccurs="0" maxOccurs="1"/>
          <xsd:element ref="extn" maxOccurs="unbounded"/>
          <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
          <xsd:element ref="email" minOccurs="0" maxOccurs="1"/>
          <xsd:any minOccurs="0"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

The instance document fragment below is now valid!

<employee>
    <employee_id>10000</employee_id>
    <name>
        <first_name>John</first_name>
        <last_name>Doe</last_name>
    </name>
    <extn>27304</extn>
    <dept>110-433-2089</dept>
    <email>john.doe@acme.com</email>
    <office xmlns="http://www.flute.boston.com"> boston </office>
</employee>

In the above example, the instance document adds a new subelement, office, to employee. The office element belongs to the namespace http://www.flute.boston.com. The schema can specify the namespace from which such new elements must come. Here we did not restrict it: by default, an any element without a namespace attribute lets authors of instance documents add new vocabulary from any namespace. To restrict the namespace of the new vocabulary, we can specify the any element with the namespace attribute as follows:

<xsd:any namespace="http://www.specific.namespace.com"/>

Now only elements defined in the http://www.specific.namespace.com namespace can be added to an instance document.

The following means that new elements in the instance document must not belong to the targetNamespace (any other namespace is okay):

<xsd:any namespace="##other"/>

The following means that only elements belonging to the target namespace can be added:

<xsd:any namespace="##targetNamespace"/>
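The wildcard also accepts a processContents attribute, which controls how strictly the parser validates the extension elements: strict (the default) requires a declaration for each one, lax validates only those for which a declaration can be found, and skip performs no validation beyond well-formedness. A minimal sketch of a wildcard for the employee sequence, with the other subelements left out for brevity:

```xml
<xsd:sequence>
    <xsd:element ref="employee_id"/>
    <!-- other employee subelements omitted for brevity -->

    <!-- Any number of elements from other namespaces, validated only if declared -->
    <xsd:any namespace="##other" processContents="lax"
             minOccurs="0" maxOccurs="unbounded"/>
</xsd:sequence>
```

Using ##other rather than the default also keeps the wildcard from overlapping with optional elements in the target namespace that precede it, which some schema processors report as an ambiguous content model.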

The anyAttribute element allows the author of an instance document to add one or more attributes:

<xsd:element name="employee">
   <xsd:complexType>
      <xsd:sequence>
          <xsd:element ref="employee_id" minOccurs="1" maxOccurs="1"/>
          <xsd:element ref="name" minOccurs="0" maxOccurs="1"/>
          <xsd:element ref="extn" maxOccurs="unbounded"/>
          <xsd:element ref="dept" minOccurs="1" maxOccurs="1"/>
          <xsd:element ref="email" minOccurs="0" maxOccurs="1"/>
          <xsd:any minOccurs="0"/>
      </xsd:sequence>
      <xsd:anyAttribute/>
   </xsd:complexType>
</xsd:element>
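An instance that takes advantage of anyAttribute might look like the sketch below. The badge attribute and its value are hypothetical. Whether a parser accepts the attribute as-is depends on the wildcard's processContents setting: under the default, strict, a declaration for the attribute must be available, while lax or skip relax that requirement:

```xml
<!-- badge is a hypothetical attribute that the employee type does not declare -->
<employee badge="B-1024">
    <employee_id>10000</employee_id>
    <!-- remaining subelements as in the earlier instance -->
</employee>
```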

Adding any and anyAttribute elements ensures that schemas can evolve. If a receiving application does not know how to handle an extension, it can simply ignore it. That way, new features can be added to instance documents, and processing nodes can gradually be updated to handle them.