A new API for access to schema information

By Michael Kay on May 21, 2012 at 04:12 p.m.

For a while now, Saxon has been able to export the schema component model as an XML document, making it accessible to XSLT and XQuery applications. Information at this level is much easier to process than the raw schema documents because it represents the "compiled" schema, so it is not sensitive to the stylistic variations in the way users choose to write schemas. Nevertheless, it's not a particularly easy model to work with, partly because chasing all the references from one component to another isn't much fun.

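I've been experimenting today with a new interface that represents the schema components directly as a functional model: each component is a function, and calling it with the name of a property returns the value of that property.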

So saxon:schema() returns the "Schema" schema component as a function, and saxon:schema()('element declarations') gets all the element declarations, as a list of functions. If you want the element declaration with a particular name, that's

let $bookDecl := saxon:schema()('element declarations')[.('name') = 'BOOK']

which is likely to be sufficiently common that I propose to offer a short-cut.
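Until such a short-cut exists, the lookup can be wrapped in a user-defined function. The following is only a sketch, assuming an XQuery 3.0 processor that supports saxon:schema() as described here; local:element-decl is a hypothetical helper name of my own choosing, not part of Saxon.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: Hypothetical helper: returns the global element declarations whose
   'name' property equals $name. More than one may match if the same
   local name is declared in several namespaces. :)
declare function local:element-decl($name as xs:string) as function(*)* {
  saxon:schema()('element declarations')[.('name') = $name]
};

(: Iterate rather than call the result directly, because a dynamic
   function call on an empty or multi-item sequence is an error. :)
for $decl in local:element-decl('BOOK')
return $decl('nillable')
```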

The properties of the element are then available, for example

$bookDecl('nillable')

tests whether the declaration is nillable, and

$bookDecl('abstract')

tests whether it is abstract.
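Because each declaration is just a function, the same property lookups can be applied across the whole schema. As a rough sketch (the report format and the element name decl are invented for illustration, and I am assuming the property values atomize to strings and booleans):

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: One summary element per global element declaration. :)
for $decl in saxon:schema()('element declarations')
return
  <decl name="{$decl('name')}"
        nillable="{$decl('nillable')}"
        abstract="{$decl('abstract')}"/>
```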

Many properties have values that are themselves schema components, also represented as functions. So to discover whether an element has element-only content you can test:

if ($bookDecl('type definition')('variety') = 'element-only') ...

All the schema components and property records are available as described in the XSD 1.1 specification, with a few exceptions (hopefully temporary):

(a) Saxon currently does not provide access to annotations (they are discarded at schema processing time).

(b) One or two other properties are not fully implemented, for example, {context} and {fundamental facets}.

(c) The types available directly from the Schema do not include built-in types.

(d) Saxon represents multiple patterns and assertions of a simple type as multiple facet objects rather than a single facet object.

(e) There's one property that's hard to represent directly in the XDM data model: the value of the enumeration facet is a sequence of sequences. So I'm currently exposing the enumeration values as string values rather than typed values.

Every component has an additional property "class" which tells you what kind of schema component it is. This is particularly necessary for the components representing facets, where the class is the only way of telling, for example, whether you have a minInclusive or a maxExclusive facet.
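To see which facets a simple type carries, one could list the "class" of each facet. This is a sketch under assumptions: PRICE is a hypothetical element with a simple type, and the property name 'facets' is my guess based on the {facets} property in the XSD 1.1 specification and the naming pattern of the properties shown earlier.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: PRICE is a hypothetical element assumed to have a simple type.
   'facets' is an assumed property name, following XSD 1.1 {facets}. :)
for $decl in saxon:schema()('element declarations')[.('name') = 'PRICE'],
    $facet in $decl('type definition')('facets')
return $facet('class')
```

The exact strings returned for "class" are not given above, so the sketch only lists them rather than comparing against a particular value.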

That's a lot of capability, but I think a few other things are needed:

(i) getting top-level global components by name. I think the best way to do this is a function saxon:element-declaration() that takes the QName of the required component and returns the component directly; there will be one of these functions for each kind of component.
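For item (i), usage of the proposed function might look something like the sketch below. The function is only a proposal at this stage, so its final name and signature may differ; the element BOOK is assumed to be declared in no namespace.

```xquery
declare namespace saxon = "http://saxon.sf.net/";

(: saxon:element-declaration() is the proposed function, not yet a
   settled API. QName('', 'BOOK') names BOOK in no namespace. :)
let $bookDecl := saxon:element-declaration(QName('', 'BOOK'))
return $bookDecl('nillable')
```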

(ii) a few other 'convenience' properties that short-circuit complex navigation of the component model. For example, starting with an element declaration, getting all the element declarations of its permitted children, without having to trawl your way through the complex type and its particles and terms and their substitution groups. I guess only experience of real applications will show what convenience functions are really needed.

(iii) linking to this model from validated instances (a subset of the PSVI - initially perhaps just the ability to get from a node to the schema component representing its type annotation) and perhaps also from validation errors: if you use try/catch to catch a validation error, it would be nice if the error object gives access to the relevant schema component.

Note that I haven't mentioned maps. The functions that implement schema components might be maps, or they might not. That's an implementation detail. One of the nice things about the XSLT 3.0 model for maps (already implemented in Saxon 9.4) is that maps are functions, so if an interface like this one is defined in terms of functions, it can use maps behind the scenes in the implementation, or not, as it pleases. In fact my current implementation isn't using maps, because that makes it easier to do a lazy implementation in which there is no underlying data structure other than the Java classes that are already there to implement the component model.
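The interchangeability can be illustrated with a small self-contained sketch that does not touch the schema API at all. It uses the map constructor syntax that was eventually standardized in XQuery 3.1, which differs from the draft syntax in Saxon 9.4; local:describe and the two stand-in components are invented for the example.

```xquery
(: A caller that only knows it has been given a function. :)
declare function local:describe($component as function(*)) as xs:string {
  concat($component('name'), ' nillable=', $component('nillable'))
};

(: Stand-in component backed by a map. :)
let $as-map := map { 'name': 'BOOK', 'nillable': false() }

(: Stand-in component backed by an inline function, computing each
   property on demand with no underlying data structure. :)
let $as-function := function($property as xs:string) {
  if ($property = 'name') then 'BOOK'
  else if ($property = 'nillable') then false()
  else ()
}

return (local:describe($as-map), local:describe($as-function))
```

Both calls go through the same dynamic function call syntax, so the caller cannot tell which representation is in use.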

One nice property of this API is that I can't see many obstacles to standardizing it so it's available in all XSLT and XQuery implementations. The main work, of defining the schema component model, has already been done in the XSD specs.