Exploring the new features of XML Schema 1.1 (Part 1)

XML Schema 1.1 was promoted to W3C Recommendation in April of this year, so it is time to explore what has changed compared to the previous version.

First, the standard has been renamed W3C XML Schema Definition Language (XSD). Beyond that, XSD 1.1 offers exciting new features while preserving backward compatibility. This post is the first in a series demonstrating some of the new features of XSD 1.1.

One of the two newly introduced constraining facets is called assertion (the other one is called explicitTimezone). As you will see, it is a powerful new feature that comes in handy when defining datatypes. The facet constrains the value space with a user-supplied logical expression that every value must satisfy.

The following simple example demonstrates how to use the assertion facet:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="number">
        <xs:simpleType>
            <xs:restriction base="xs:integer">
                <xs:assertion test="abs($value mod 2) eq 1"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

</xs:schema>

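Note that the above looks just like a plain old schema document, except for the assertion element. Nothing in the document explicitly declares that XSD 1.1 is being used: the namespace URI is the same as in XSD 1.0, and it is up to the schema processor which version of the language it applies.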

The test attribute of the assertion element contains an XPath 2.0 expression that evaluates to true or false: its result is converted to a boolean as if by the boolean function (that is, its effective boolean value is taken). Within the expression, the variable $value refers to the value being checked.
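To see the facet at work, here are a few candidate instances of the number element declared above, each meant as a separate instance document. One detail worth knowing: in XPath 2.0 the result of mod takes the sign of the dividend, so -3 mod 2 is -1 rather than 1, which is why the test wraps the remainder in abs.

```xml
<!-- Accepted: 7 mod 2 = 1 -->
<number>7</number>

<!-- Accepted: -3 mod 2 = -1, and abs(-1) = 1 (without abs this would be rejected) -->
<number>-3</number>

<!-- Rejected: 4 mod 2 = 0, so the assertion evaluates to false -->
<number>4</number>

<!-- Rejected: "3.5" is not a valid xs:integer lexical form, so it fails regardless of the assertion -->
<number>3.5</number>
```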

Since mod is the modulo operator, the value space of the datatype defined above is the set of odd integers. Note that an equivalent solution is to use pattern matching with a regular expression, which is also available in XML Schema 1.0. Replacing the assertion element in the example above with

    <xs:pattern value=".*[13579]"/>

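also results in the same value space. Although a pattern is matched against the lexical representation rather than the value, every lexical form of an odd integer (for example 7, +7, -13 or 0013) ends in an odd digit.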
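Assembled into a complete schema document, the pattern-based variant would look roughly like this. Since it relies only on the pattern facet, it needs nothing beyond XSD 1.0. Keep in mind that XSD patterns are implicitly anchored, so the whole lexical form has to match.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="number">
        <xs:simpleType>
            <xs:restriction base="xs:integer">
                <!-- Implicitly anchored: the entire lexical form must end in an odd digit -->
                <xs:pattern value=".*[13579]"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

</xs:schema>
```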

However, there are situations in which regular expressions cannot help. Consider palindromes, for example: let’s try to define a new datatype whose value space is the set of palindrome strings. As you may recall from your theory of computation class, the language of palindromes is not regular, so it cannot be described by a regular expression. The good news is that we can do it with XPath functions.

Since there is an XPath function called reverse,

<xs:simpleType name="palindromeString">
    <xs:restriction base="xs:string">
        <xs:assertion test="$value eq reverse($value)"/>
    </xs:restriction>
</xs:simpleType>

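might seem a reasonable first attempt. Unfortunately, reverse operates on sequences and cannot reverse the characters of a string: applied to a single string, it returns a one-item sequence containing that same string, so this test is always true and every string would be accepted.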

The following trick does the job. First, we turn the string being checked into a sequence of Unicode codepoints (i.e., a sequence of integers) using the string-to-codepoints function. Then we apply the reverse function to the resulting sequence. Finally, the codepoints-to-string function turns it back into a string. Our solution is now the following:

<xs:simpleType name="palindromeString">
    <xs:restriction base="xs:string">
        <xs:assertion test="$value eq codepoints-to-string(reverse(string-to-codepoints($value)))"/>
    </xs:restriction>
</xs:simpleType>
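With the type as it stands, the comparison is exact, character by character. The sketch below assumes a hypothetical element named word declared with this type (the name is only for illustration) and traces the test by hand for a few separate instance documents; the trace is worked out manually, not processor output.

```xml
<!-- Hypothetical element declaration using the palindromeString type above -->
<xs:element name="word" type="palindromeString"/>

<!--
  Tracing the test for <word>Racecar</word>:
    string-to-codepoints("Racecar") = (82, 97, 99, 101, 99, 97, 114)
    reverse(...)                    = (114, 97, 99, 101, 99, 97, 82)
    codepoints-to-string(...)       = "racecaR"
  "Racecar" eq "racecaR" is false, so this instance is rejected.
-->
<word>Racecar</word>

<!-- Accepted: "racecar" is unchanged by the reversal -->
<word>racecar</word>

<!-- Rejected: the reversal yields "neve ro ddo reven" -->
<word>never odd or even</word>
```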

One more step is needed to complete the job: the comparison must ignore case, whitespace and punctuation. To do that, we replace both occurrences of $value in the test attribute with lower-case(replace($value, '[\s\p{P}]', '')). Here the replace function removes all whitespace (\s) and punctuation (the Unicode category \p{P}) characters from the string, and lower-case then folds the remaining characters to lower case.
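To illustrate what the normalizing expression produces, the comment below traces it by hand on one of the sample inputs listed at the end of this post.

```xml
<!--
  Normalizing the input "Madam, I'm Adam":
    replace($value, '[\s\p{P}]', '')  = "MadamImAdam"  (spaces, comma and apostrophe removed)
    lower-case(...)                   = "madamimadam"
  Reversing the codepoints of "madamimadam" gives back "madamimadam",
  so the eq comparison in the assertion is true.
-->
```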

Our final solution is the following:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:simpleType name="palindromeString">
        <xs:restriction base="xs:string">
            <xs:assertion test="lower-case(replace($value, '[\s\p{P}]', '')) eq codepoints-to-string(reverse(string-to-codepoints(lower-case(replace($value, '[\s\p{P}]', '')))))"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:element name="palindrome" type="palindromeString"/>

</xs:schema>
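The normalizing expression appears twice in the test above because XPath 2.0, the language used by XSD 1.1 assertions, has no let expression. One possible way to avoid the repetition (a sketch, not part of the original examples) is to bind the normalized string once with a quantified expression. Because lower-case always returns exactly one string here, the every ... satisfies expression ranges over a single item and simply evaluates the condition for it.

```xml
<xs:simpleType name="palindromeString">
    <xs:restriction base="xs:string">
        <!-- $s is bound to the normalized string; the quantifier ranges over that single item -->
        <xs:assertion test="every $s in lower-case(replace($value, '[\s\p{P}]', ''))
                            satisfies $s eq codepoints-to-string(reverse(string-to-codepoints($s)))"/>
    </xs:restriction>
</xs:simpleType>
```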

For example, the following are all valid instances of the palindrome element:

<palindrome>never odd or even</palindrome>
<palindrome>Madam, I'm Adam</palindrome>
<palindrome>
    A man, a plan, a canal - Panama!
</palindrome>
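Conversely, strings that do not read the same backwards after normalization are rejected. Note that the normalization only removes whitespace and punctuation and folds case; it does not strip diacritics, so a palindrome that depends on ignoring accents would also be rejected by this schema, as the second instance below shows (traces worked out by hand).

```xml
<!-- Rejected: normalizes to "helloworld", which reversed is "dlrowolleh" -->
<palindrome>Hello, World!</palindrome>

<!-- Rejected: normalizes to "ésoperesteicietserepose", whose first
     character "é" does not match its last character "e" -->
<palindrome>Ésope reste ici et se repose</palindrome>
```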

You can download the examples above in a ZIP archive here.
