XML Schema Tutorial - Part 3

This article gives an overview of some of the more advanced topics of XML Schemas and how to use them.

Data Types Overview

It is often useful to be able to take the definition of an existing entity and extend it to add more specific information. In most modern development languages, such as C++, C# or Java, we would call this specialization, inheritance or subclassing.

This same concept also exists in the XML Schema standard, allowing us to take an existing type definition and extend it. Types defined in an XSD can also be restricted (although this behaviour has no real parallel in most development languages).

Extending Complex Types

It is possible to take an existing <xs:complexType> and extend it. Let's see how this may be useful with an example.

Looking at the AddressType that we defined earlier (in Part 1), let's assume our company has now gone international and we need to capture country-specific addresses. In this case we need specific information for UK addresses (County and Postcode), and for US addresses (State and Zipcode).

So we can take our existing definition of address and extend it as follows:
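As a minimal sketch of what such an extension could look like, the schema below assumes that the "AddressType" from Part 1 is a sequence of two string elements, "Line1" and "Line2". The child element names come from the text; treating them all as xs:string is an assumption.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Assumed shape of the base type from Part 1 -->
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="Line1" type="xs:string"/>
      <xs:element name="Line2" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- UK addresses: the base content followed by County and Postcode -->
  <xs:complexType name="UKAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="County" type="xs:string"/>
          <xs:element name="Postcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!-- US addresses: the base content followed by State and Zipcode -->
  <xs:complexType name="USAddressType">
    <xs:complexContent>
      <xs:extension base="AddressType">
        <xs:sequence>
          <xs:element name="State" type="xs:string"/>
          <xs:element name="Zipcode" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

With an extension, the new elements are always appended after the content of the base type, so "Line1" and "Line2" come first in both derived types.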

Notice that each of the two new address types extends the original 'base' address type using <xs:extension base="AddressType">.

The newly introduced <xs:extension> construct indicates that we are extending an existing type, and its base attribute specifies the type itself. There is also another new construct, the <xs:complexContent> element, which is just a container for the extension.

So to reiterate, we are defining a new <xs:complexType> called "USAddressType"; this extends the existing type "AddressType" and adds to it a sequence containing the elements "State" and "Zipcode".

This is clearer when viewed graphically:

[Figure: Extending Complex Types]

We can now use these new types as follows:
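A minimal sketch of such declarations follows. The element names "UKAddress" and "USAddress" are assumptions chosen for illustration, and the fragment is meant to sit inside an <xs:schema> element alongside the type definitions.

```xml
<!-- Hypothetical element names; each element is bound to one of the derived types -->
<xs:element name="UKAddress" type="UKAddressType"/>
<xs:element name="USAddress" type="USAddressType"/>
```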

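Sample XML for these elements, assuming they are named "UKAddress" and "USAddress", may look like this:

    <UKAddress>
      <Line1>34 thingy street</Line1>
      <Line2>someplace</Line2>
      <County>somerset</County>
      <Postcode>w1w8uu</Postcode>
    </UKAddress>

    <USAddress>
      <Line1>234 Lancaster Av</Line1>
      <Line2>Smallville</Line2>
      <State>Florida</State>
      <Zipcode>34543</Zipcode>
    </USAddress>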

Restricting Complex Types

The previous section showed how to take an existing definition and extend it to create new types. But there is another option: instead of adding to the type, we could restrict it.

Taking the same AddressType example, we can create a new type called "InternalAddressType". Let's assume "InternalAddressType" only needs the "Line1" element of the address.
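A minimal sketch of this restricted type is shown below. It assumes that "Line1" is declared as xs:string in "AddressType" and that "Line2" is optional in the base type; the fragment belongs inside an <xs:schema> element next to "AddressType".

```xml
<!-- Keeps only Line1; valid only if Line2 is optional (minOccurs="0") in AddressType -->
<xs:complexType name="InternalAddressType">
  <xs:complexContent>
    <xs:restriction base="AddressType">
      <xs:sequence>
        <xs:element name="Line1" type="xs:string"/>
      </xs:sequence>
    </xs:restriction>
  </xs:complexContent>
</xs:complexType>
```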

Notice that the new address type restricts the original 'base' address type using <xs:restriction base="AddressType">.

We are defining a new type "InternalAddressType". The <xs:restriction> element says we are restricting the existing type "AddressType", and we are only allowing the existing child element "Line1" to be used in this new definition. The <xs:complexContent> element is just a container for the restriction.

We also need to make a small modification to the base type. Derivation by restriction does not allow you to add elements, or to omit them unless they are optional in the base type; it simply allows you to restrict their valid values, e.g. set a default value or set type="string" where previously no type was specified. So we must change "Line2" in "AddressType" to have minOccurs="0".
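The modified base type could then look like the sketch below, again assuming that "AddressType" consists of two string elements, "Line1" and "Line2".

```xml
<!-- Line2 is now optional, so a restricted type is allowed to leave it out -->
<xs:complexType name="AddressType">
  <xs:sequence>
    <xs:element name="Line1" type="xs:string"/>
    <xs:element name="Line2" type="xs:string" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```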

Note: As we are restricting an existing type, the only definitions that can appear in the <xs:restriction> are a subset of the ones defined in the base type "AddressType". They must also be enclosed in the same compositor (in this case a sequence) and appear in the same order.

We can now use this new type as follows:
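As a sketch, the type can be bound to an element declaration inside the schema. The element name "InternalAddress" is an assumption made for illustration.

```xml
<!-- Hypothetical element name bound to the restricted type -->
<xs:element name="InternalAddress" type="InternalAddressType"/>
```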

Sample XML for this element, assuming it is named "InternalAddress", may look like this:

    <InternalAddress>
      <Line1>Desk 4, Second Floor</Line1>
    </InternalAddress>

Using the xsi:type Attribute

We have just shown how we can create new types based on existing ones. This in itself is pretty useful, and will potentially reduce the amount of complexity in your schemas, making them easier to maintain and understand. However, there is an aspect to this that has not yet been covered. In the above examples we created 3 new types (UKAddressType, USAddressType and InternalAddressType), all based on AddressType.

So, if we have an element that explicitly specifies it is of type "UKAddressType", then "UKAddressType" is what must appear in the XML document.

But if an element specifies that it is of type "AddressType", then any of the 4 types can appear in the XML document (UKAddressType, USAddressType, InternalAddressType or AddressType). The thing to consider now is how the XML parser will know which type you meant to use. It needs to know, otherwise it cannot do proper validation.

It knows because, if you want to use a type other than the one explicitly specified in the schema (in this case "AddressType"), you have to tell the parser which type you are using. This is done in the XML document using the xsi:type attribute.

Let's look at an example:
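A minimal sketch of a schema fragment for this situation is given below. The element names "Person", "Name" and "HomeAddress" are assumptions made for illustration; the important part is that the address element is declared with the base type "AddressType".

```xml
<!-- Hypothetical element names; HomeAddress is declared with the base type -->
<xs:element name="Person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="HomeAddress" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```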

[Figure: Using the xsi:type Attribute]

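Sample XML for the above, assuming a "Person" element with "Name" and "HomeAddress" children, may look like the following:

    <Person>
      <Name>Fred</Name>
      <HomeAddress>
        <Line1>22 whatever place, someplace</Line1>
        <Line2>sometown, ss1 6gy</Line2>
      </HomeAddress>
    </Person>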

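However, the following is also valid (again assuming a "Person" element with "Name" and "HomeAddress" children):

    <Person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Name>Fred</Name>
      <HomeAddress xsi:type="USAddressType">
        <Line1>234 Lancaseter Av</Line1>
        <Line2>SmallsVille</Line2>
        <State>Florida</State>
        <Zipcode>34543</Zipcode>
      </HomeAddress>
    </Person>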

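Let's look at that in more detail. The address element carries an xsi:type attribute naming "USAddressType", which tells the parser to validate the element against that type instead of the "AddressType" declared in the schema; this is why the extra "State" and "Zipcode" children are accepted. The xsi prefix must be bound to the namespace http://www.w3.org/2001/XMLSchema-instance, usually with an xmlns:xsi declaration on the root element.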

We'll learn more about namespaces in the next section.

Extending Simple Types

There are 3 ways in which a simpleType can be extended: Restriction, List and Union. The most common is Restriction, but we will cover the other 2 as well.

Restriction

Restriction is a way to constrain an existing type definition. We can apply a restriction to the built in data types xs:string, xs:integer, xs:date, etc. or ones we create ourselves.
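As a minimal sketch, a restricted string type named "LetterType" that accepts a single lower or upper case letter could be written as the following five lines. The fragment belongs inside an <xs:schema> element, and the pattern value is an assumption consistent with the description in this section.

```xml
<xs:simpleType name="LetterType">
  <xs:restriction base="xs:string">
    <xs:pattern value="[a-zA-Z]"/>
  </xs:restriction>
</xs:simpleType>
```

A pattern in XSD must match the whole value, so "a" and "Q" are accepted while "ab" and "7" are rejected.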

Here we are defining a restriction on the existing type "string"; we are applying a regular expression to it to limit the values it can take.

This can be shown graphically in Liquid Studio as follows:

[Figure: Simple Type Content]

[Figure: Simple Type Restriction Property]

Let's go through this line by line.

  1. A <xs:simpleType> tag is used to define our new type; we must give the type a unique name - in this case "LetterType".
  2. We are restricting an existing type - so the tag is <xs:restriction> (you can also extend an existing type - but more about this later). We are basing our new type on a string so base="xs:string".
  3. We are applying a restriction in the form of a regular expression; this is specified using the <xs:pattern> element. The regular expression means the data must contain a single lower or upper case letter a through to z.
  4. A closing tag for the restriction.
  5. A closing tag for the simple type.

Restrictions may also be referred to as Facets. For a complete list see the W3C XSD Standard, but to give you an idea, here are a few examples of pattern values:

[0-9][0-9][0-9] - 3 characters, each of which has to be a digit between 0 and 9.

[a-z][0-9][A-Z] - The 1st character has to be between a and z, the 2nd between 0 and 9, and the 3rd between A and Z. These are case sensitive.

[a-zA-Z] - 1 character that can be either a lower or upper case letter A to Z.

[123] - 1 character that has to be 1, 2 or 3.

([a-z])* - Zero or more occurrences of a to z.

([q][u])+ - One or more occurrences of a pair of letters, in this case a q followed by a u, for example qu, ququ.

([a-z][0-9])+ - As above, one or more pairs where the 1st character is a lower case letter between a and z and the 2nd is a digit between 0 and 9, for example a1, c2, z1f4.

It is important to note that not all facets are valid for all data types - for example, maxInclusive has no meaning when applied to a string. For the combinations of facets that are valid for a given data type refer to the W3C XSD standard.

Union

A union is a mechanism for combining two or more different data types into one.

The following defines two simple types: "SizeByNumberType", which allows all the positive integers up to 21 (e.g. 10, 12, 14), and "SizeByStringNameType", which allows the values small, medium and large.
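A minimal sketch of these two definitions follows, assuming xs:positiveInteger with a maxInclusive facet for the numeric sizes and an enumeration over xs:string for the named sizes. The fragment belongs inside an <xs:schema> element.

```xml
<!-- Positive integers from 1 to 21 -->
<xs:simpleType name="SizeByNumberType">
  <xs:restriction base="xs:positiveInteger">
    <xs:maxInclusive value="21"/>
  </xs:restriction>
</xs:simpleType>

<!-- One of three fixed names -->
<xs:simpleType name="SizeByStringNameType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="small"/>
    <xs:enumeration value="medium"/>
    <xs:enumeration value="large"/>
  </xs:restriction>
</xs:simpleType>
```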

[Figure: Simple Types]

We can then define a new type called "USClothingSizeType" as a union of the types "SizeByNumberType" and "SizeByStringNameType" (although we can add any number of types, including the built in types, separated by whitespace).
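As a sketch, the union can be expressed with the memberTypes attribute of <xs:union>, which takes a whitespace-separated list of type names. The fragment belongs inside an <xs:schema> element that also defines the two member types.

```xml
<!-- Accepts any value that is valid for at least one of the member types -->
<xs:simpleType name="USClothingSizeType">
  <xs:union memberTypes="SizeByNumberType SizeByStringNameType"/>
</xs:simpleType>
```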

[Figure: Simple Type Union]

[Figure: Simple Type Union Properties]

This means the type can contain any of the values that the two members can take (e.g. 1, 2, 3, ..., 20, 21, small, medium, large).

This new type can then be used in the same way as any other <xs:simpleType>.

List

A list allows the value (in the XML document) to contain a number of valid values separated by whitespace.

A List is constructed in a similar way to a Union, the difference being that we can only specify a single type. This new type can contain a list of values, each of which must be valid for the type named by the itemType property. The values must be whitespace separated. So, with "SizeByNumberType" as the item type, a valid value would be "5 9 21".
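A minimal sketch of such a list type is shown below. The type name "SizesInStockType" is hypothetical, and the item type is assumed to be the "SizeByNumberType" from the Union section, which is consistent with the example value "5 9 21".

```xml
<!-- Hypothetical type name; each whitespace-separated item must be a valid SizeByNumberType -->
<xs:simpleType name="SizesInStockType">
  <xs:list itemType="SizeByNumberType"/>
</xs:simpleType>
```

An element declared with this type would accept content such as "5 9 21", but not "5 9 22", because 22 exceeds the maximum allowed by the item type.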