What's new in Saxon 9.5?

By Michael Kay on March 18, 2013 at 11:25 a.m.

The following is a provisional list of changes expected in the Saxon 9.5 release, which is in the final testing phase. There could still be changes to this list, for example a feature could be dropped if it is found to be unreliable during testing.

Optimization

The xsl:result-document instruction in Saxon-EE is now asynchronous. That is, the code to output the result document runs in a separate thread, in parallel with other processing. The maximum number of threads used by xsl:result-document instructions is limited by the configuration option FeatureKeys.RESULT_DOCUMENT_THREADS which defaults to the number of processors available to the Java VM; setting this to zero or one will suppress multithreading. Setting FeatureKeys.ALLOW_MULTITHREADING to false has the same effect. (This can be useful when debugging, because otherwise the output from xsl:message and fn:trace() can be very confusing).

The facility can also potentially cause problems if the code calls extension functions that have side-effects. Multi-threading can therefore be controlled, if required, using the saxon:asynchronous attribute on the xsl:result-document instruction: use saxon:asynchronous="no" to suppress multi-threading. Asynchronous processing of xsl:result-document is automatically suppressed if tracing (using a TraceListener) is enabled.
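The thread limit described above can be pictured with a small sketch in plain Java. This is not Saxon's implementation: the class name, the writeAll method, and the in-memory store are all hypothetical, and the only behaviour taken from the text is that the default limit is the number of available processors and that a limit of zero or one means no extra threads are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Saxon.
final class ResultDocumentThreadsSketch {

    /** Writes each (href, content) pair into an in-memory store, using at most `limit` threads. */
    static Map<String, String> writeAll(Map<String, String> documents, int limit) {
        Map<String, String> store = new ConcurrentHashMap<>();
        if (limit <= 1) {
            // Zero or one: no extra threads, every document is written by the caller.
            documents.forEach((href, content) -> store.put(href, content));
            return store;
        }
        ExecutorService pool = Executors.newFixedThreadPool(limit);
        documents.forEach((href, content) -> pool.execute(() -> store.put(href, content)));
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("result documents still being written");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("chapter1.html", "<html>1</html>");
        docs.put("chapter2.html", "<html>2</html>");
        // Assumed default, mirroring the text: one thread per available processor.
        int limit = Runtime.getRuntime().availableProcessors();
        System.out.println(writeAll(docs, limit).keySet());
    }
}
```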

The collection() function is also now multi-threaded in Saxon-EE. Each document in the collection is parsed in a separate thread, and the documents are processed in the order in which parsing completes. This makes the order of the documents less predictable than in previous releases, though it was never guaranteed or documented.
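To see why the document order becomes less predictable, here is a minimal sketch in plain Java of processing work in completion order rather than submission order. It does not use any Saxon API: the class and method names are hypothetical, and the "parsing" is a stand-in that just builds a string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; this is not how Saxon is implemented.
final class CompletionOrderSketch {

    /** Runs one task per document and returns the results in the order the tasks finish. */
    static List<String> loadAll(List<String> uris) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, uris.size()));
        try {
            ExecutorCompletionService<String> done = new ExecutorCompletionService<>(pool);
            for (String uri : uris) {
                // Stand-in for parsing; a real implementation would build a tree here.
                done.submit(() -> "parsed:" + uri);
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < uris.size(); i++) {
                // take() yields the next task to complete, not the next one submitted.
                results.add(done.take().get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // The printed order may differ from run to run.
        System.out.println(loadAll(Arrays.asList("a.xml", "b.xml", "c.xml")));
    }
}
```

Code that needs a stable order would have to sort the results explicitly, for example by document URI.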

Extensions

New XSLT 3.0 instructions such as xsl:iterate and xsl:try no longer have synonyms in the Saxon namespace.

Extension function saxon:for-each-group() is dropped (superseded by "group by" in XQuery).

New extension functions saxon:schema() and saxon:type are available, giving access to schema information. The saxon:schema() function obtains information from all the schema components available in the query or stylesheet; the saxon:type function gives information about the type annotation of a node. In both cases, the information is returned as a function item which behaves like a map from schema component property names to values; for example the name of the type annotation of a node is given by saxon:type($node)('name').

A new extension function saxon:send-mail() is available to send email via an SMTP server.

A new extension function saxon:key-map() is available: this allows the index constructed using xsl:key to be viewed as a map, giving extra functionality such as getting all the entries in a given range, enumerating the key values, concatenating keys across multiple documents, processing the entries in sorted order, etc.

The third argument of saxon:transform(), which supplies parameters to an XSLT transformation, may now take the form of a map, allowing the parameter value to be any value whatsoever (the old mechanism restricted it to atomic values.)

The extension function saxon:index has changed to expect a function as its second argument rather than a prepared expression, and it now returns a map which can be accessed using all the standard map functions. The extension function saxon:find is now a synonym for map:get. There is no longer an option to specify a collation.

A new flag "v" has been added to saxon:deep-equal() to suppress the check that two elements have the same "variety" of type: for example if one has element-only content, the other must have element-only content. This check was not performed in previous releases; in this release it is performed by default (as required by the fn:deep-equal() specification), but may be suppressed using this option. The option is useful when comparing validated and unvalidated documents.

The proposed EXPath file module (see http://www.expath.org/modules/file/) is implemented (in Saxon-PE and -EE). This provides a number of extension functions for reading and writing files in filestore, creating and deleting files and directories, listing the contents of directories, and so on.

The EXPath zip module (see http://expath.org/spec/zip) is implemented. The implementation is derived from the original implementation by Florent Georges, but has been more closely integrated into Saxon and more thoroughly tested. This module is open source code; the extensions are integrated into the Saxon-PE and Saxon-EE distribution, and are available to Saxon-HE users in source code form, where they can be linked to the product in the same way as any other extension functions.

A new serialization option saxon:attribute-order is available. The value is a whitespace-separated list of attribute names. Attributes whose names are present in the list come first, in the order specified; other attributes follow, sorted first by namespace URI and then by local name.
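The ordering rule can be sketched as a comparator in plain Java. This is an illustration of the rule as stated above, not the serializer's code: the Name class and the order method are hypothetical, and comparing strings with String.compareTo is an assumption, since the text does not say which string ordering is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical illustration only; not Saxon's serializer.
final class AttributeOrderSketch {

    /** Minimal attribute name: namespace URI ("" for no namespace) plus local name. */
    static final class Name {
        final String uri;
        final String local;
        Name(String uri, String local) { this.uri = uri; this.local = local; }
        @Override public boolean equals(Object o) {
            return o instanceof Name && ((Name) o).uri.equals(uri) && ((Name) o).local.equals(local);
        }
        @Override public int hashCode() { return Objects.hash(uri, local); }
        @Override public String toString() { return uri.isEmpty() ? local : "Q{" + uri + "}" + local; }
    }

    /** Position in the preferred list, or a large value for names that are not listed. */
    private static int rank(Name n, List<Name> preferred) {
        int i = preferred.indexOf(n);
        return i < 0 ? Integer.MAX_VALUE : i;
    }

    /** Listed names first, in list order; the rest by namespace URI, then local name. */
    static List<Name> order(List<Name> attributes, List<Name> preferred) {
        List<Name> sorted = new ArrayList<>(attributes);
        sorted.sort((a, b) -> {
            int ra = rank(a, preferred);
            int rb = rank(b, preferred);
            if (ra != rb) return Integer.compare(ra, rb);
            int c = a.uri.compareTo(b.uri);   // assumed string ordering
            return c != 0 ? c : a.local.compareTo(b.local);
        });
        return sorted;
    }

    public static void main(String[] args) {
        String xml = "http://www.w3.org/XML/1998/namespace";
        List<Name> attributes = Arrays.asList(
                new Name("", "class"), new Name("", "id"),
                new Name(xml, "lang"), new Name("", "href"));
        List<Name> preferred = Arrays.asList(new Name("", "id"), new Name("", "href"));
        System.out.println(order(attributes, preferred));
    }
}
```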

When the -wrap option is used in XQuery, the serializer is now capable of generating an XML representation of a map, using nested elements to represent the entries in the map with their keys and values.

External Object Models

Support for JDOM2 is added.

Support for Apache Axiom is added.

A number of optimizations have been made to the support modules for other external object models, noticeably to speed up navigation of the descendant axis.

The class DocumentBuilderFactoryImpl, which constructed a DOM wrapper around a TinyTree, and which has been deprecated since 9.3, is now removed.

XSD Extensions

Parameterized validation: a new extension to XSD 1.1 is implemented to allow schema validation to be parameterized. The saxon:param element can be added to a schema to declare a parameter; the value of the parameter can be referenced in XPath expressions, for example in assertions. The parameter values can be set from the command line when running the Validate command, or from the s9api (Java) and Saxon.Api (.NET) interfaces when validating from an application. It is also possible to initiate parameterized validation using a new saxon:validate extension function available in XSLT and XQuery.

Internal Changes

Regular expressions: a new regular expression engine is introduced, based on the Apache Jakarta product.

The implementation of the base64 encoding and decoding routines has been rewritten in order to simplify the set of open source licenses in use.

XPath 3.0 Functions and Operators

fn:serialize() function: the implementation has been changed to match the 2011 version of the W3C specification. This changes the format of the serialization parameters supplied in the second argument to the function.

The functions has-children(), innermost(), and outermost() are implemented.

The function parse-xml-fragment() is implemented.

In format-date() and friends, timezone formatting has been updated to match the current spec. This may create incompatible changes. The changes apply whether or not XPath 3.0 processing is enabled.

The implementation of deep-equal() has been changed to enforce the rule that the type annotations of two element nodes must have the same variety. This corrects a long-standing non-conformance, but may cause incompatibility, especially when comparing a schema-validated document against an unvalidated document. A new flag "v" has been added to the Saxon equivalent function saxon:deep-equal() to suppress this check.

Command line

Command-line interfaces: added the option -quit:off to prevent exiting the Java VM in the event of a failure; instead, a RuntimeException is thrown. This is useful when the command line interfaces are invoked from another Java application, for example Ant, as it allows the calling application to recover.

XSD 1.0 Support

There have been changes to the way the schema processor handles recovery from validation errors. By default the processor tries to continue validating to the end of the file, reporting as many validation errors as it can. All errors are reported to the error() method of the ErrorListener, which by default throws no exception; if the error() method does choose to throw an exception then validation terminates immediately. Validation can also be terminated early if an error limit is reached; the error limit can be set in the ParseOptions object that controls the validation. (The value 1 causes termination after the first error is reported).
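The same pattern (keep going by default, stop when the listener throws, or stop once an error limit is hit) can be tried out with the JDK's built-in JAXP validator. The sketch below deliberately uses the standard javax.xml.validation API and a SAX ErrorHandler as an analogy; it is not Saxon's ErrorListener or ParseOptions interface, and the class, the handler, and the limit parameter are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Analogy using the JDK's JAXP validator; not the Saxon API.
final class ErrorLimitSketch {

    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='list'><xs:complexType><xs:sequence>"
          + "<xs:element name='n' type='xs:integer' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static final String XML = "<list><n>a</n><n>b</n></list>";

    /** Records every error; throws once `limit` errors have been seen (0 means no limit). */
    static final class LimitedHandler implements ErrorHandler {
        final int limit;
        final List<String> errors = new ArrayList<>();
        LimitedHandler(int limit) { this.limit = limit; }

        public void warning(SAXParseException e) { }
        public void error(SAXParseException e) throws SAXException {
            errors.add(e.getMessage());
            if (limit > 0 && errors.size() >= limit) throw e;   // terminate validation
        }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    /** Validates xml against xsd and returns how many errors were reported before stopping. */
    static int countErrors(String xsd, String xml, int limit) {
        LimitedHandler handler = new LimitedHandler(limit);
        try {
            Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            validator.setErrorHandler(handler);
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
            } catch (SAXException stopped) {
                // The handler threw (limit reached or fatal error): validation ended early.
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.errors.size();
    }

    public static void main(String[] args) {
        System.out.println("limit 1:  " + countErrors(XSD, XML, 1));
        System.out.println("no limit: " + countErrors(XSD, XML, 0));
    }
}
```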

At the end of the document, if there have been one or more validation errors, then a fatal error is thrown unless the "recover after validation errors" option (FeatureKeys.VALIDATION_WARNINGS) is set, in which case no exception occurs and processing can continue. Note that in this case the type annotations in any result document are unreliable. The setting FeatureKeys.VALIDATION_WARNINGS is ignored in the case where a document used as input to XQuery or XSLT is being validated, because in that case the incorrect type annotations would cause inconsistencies during subsequent processing. TODO: check this works.

If an element is found that cannot legitimately appear according to the content model of its parent, Saxon previously abandoned the validation of the content of that element, as well as the content of its following siblings. This has been changed so that the content of the offending element, and the content of its siblings, is now validated, using the context-determined type for the element (the "element declarations consistent" constraint ensures that if the same element name appears more than once in a content model, each must be associated with the same type).

The validation exception made available to the ErrorListener now includes a structured path indicating location of the offending node (previously the path was available only as a string). It also includes, where available, a reference to the schema type against which validation was attempted.

The error messages produced when a sequence of elements does not conform to the content model of a complex type have been improved. There is now more effort to distinguish different causes of the error: for example, too many repetitions of a repeated element, a mandatory element that has been omitted, or an element that is in the wrong namespace.

Saxon now recognizes the XLink namespace and fetches the schema for this namespace locally rather than fetching it from the W3C web site (which will often time out).

XPath 3.0 Support

Revised the EQName syntax as per spec bug 15399: it is now Q{uri}local.

Affecting XPath 2.0 also, some very long-standing bugs have been fixed in the handling of schema-validated documents using xsi:nil - specifically, ensuring that the typed value of a nilled element is always an empty sequence, getting the matching of types element(N, T) and element(N, T?) right, as well as the impact of nillability on type subsumption.

XSLT 3.0 Support

The bind-group and bind-grouping-key variables on the xsl:for-each-group element are now implemented.

This involved some internal refactoring of the way variables are managed during the XSLT static analysis phase.

Implemented the composite attribute of the xsl:for-each-group element.

Changed the implementation of xsl:merge to reflect the revisions in the draft XSLT 3.0 specification (removed xsl:merge-input element; added sort-before-merge attribute).

Implemented accumulators (the new xsl:accumulator feature) for both streamed and unstreamed documents.

Implemented the xsl:stream instruction.

Implemented the error-code attribute of xsl:message.

Implemented the start-at attribute of xsl:number.

Implemented the context-item attribute of xsl:evaluate.

Implemented the xsl:assert instruction.

Implemented the xsl:map and xsl:map-entry instructions.

The restriction that xsl:import must precede other declarations in a stylesheet module has been removed.

Implemented the on-empty attribute of xsl:attribute.

Implemented the xsl:on-empty attribute of literal result elements. Tested with and without byte code generation. TODO: test with streaming, and with freestanding construction (evaluateItem).

Saxon now allows the EQName syntax Q{uri}local in places where a QName is required, for example the name attribute of xsl:variable, xsl:template, xsl:function etc, the first argument of the functions key(), system-property(), and function-available(), the final argument of format-number(), etc. This is useful for static names in the case where stylesheets are generated programmatically, and it is always useful for dynamic names because it avoids the dependency on the static namespace context.

It is now a static error if the same NodeTest appears in an xsl:strip-space and an xsl:preserve-space declaration at the same import precedence. This is a backwards incompatibility in the 3.0 specification.

There is now a warning message if the namespace URI of the document element of the principal input document does not match the namespace URIs used in the template rules of the stylesheet. This is designed to catch the common beginner's mistake of writing (for example) match="html" when the element to be matched is actually in a (default) namespace. The precise conditions for the warning are that the stylesheet contains one or more template rules in the initial mode that match an explicit element name, and none of these template rules matches an element in the namespace used for the top level element of the principal source document.

Implemented more of the new pattern syntax: patterns matching variables, namespace nodes, ... Patterns that match atomic values can no longer be used as part of a pattern that uses "union", "intersect", or "except" (as a result of clarification of the XSLT 3.0 specification.)

Affecting XSLT 2.0 also, a very long-standing bug has been fixed: documents read using the collection() function are now subjected to whitespace stripping as defined by xsl:strip-space declarations in the stylesheet.

Also affecting XSLT 2.0, a change has been made in the behavior of xsl:result-document when the href attribute is absent. Previously this caused the constructed result document to be serialized to the location specified by the base output URI. Now it causes the constructed document to be sent to the primary output destination selected in the calling API (which might, for example, be a DOMResult or an XdmDestination). Both interpretations appear to be allowed by the specification. Note that omitting the href attribute is done typically when you want to validate the result document against a schema, though the same effect can be achieved using xsl:document.

Whitespace stripping: the xsl:strip-space declaration now has no effect on a node if it is within the scope of an XSD 1.1 assertion: that is, whitespace text nodes are not stripped if any ancestor node has been validated against a type that contains an assertion. This is because changing the content of such a node could invalidate the assertion, thus breaking type safety.

Implemented content value templates. These allow expressions contained in curly braces to be contained in text nodes within a sequence constructor, rather like attribute value templates; the facility can be enabled by setting expand-text="yes" on any enclosing XSLT element (for example, <xsl:stylesheet>). Example: {$greeting}, {$first} {$last}, how are things? Note that the details of the facility in the specification (for example, handling of boundary whitespace) are subject to change.

The new match pattern syntax match="?{ expr }" is implemented. This matches an item T if the expression has an effective boolean value of true when evaluated with T as the context item. This construct replaces the ~ itemType syntax defined in earlier XSLT 3.0 drafts, but for the moment Saxon implements both.

Some of the features NOT implemented in XSLT 3.0 include:

Streaming

There has been considerable development of the streaming capability, much of it involving removal of restrictions that were not previously documented (or known).

The xsl:stream instruction is now implemented, making the saxon:stream extension function obsolescent (though it is retained for the time being). Accumulators are also implemented, in both streaming and non-streaming mode.

Streaming: Saxon is moving towards a design where it implements the "guaranteed streamability" rules in the W3C draft precisely, unless the configuration option ALLOW_STREAMABILITY_EXTENSIONS is set, in which case Saxon may implement streaming (or near-streaming) for some additional cases.

Certain constructs using positional filters can now be evaluated in streaming mode. The filter must be on a node test that uses the child axis and selects element nodes. The forms accepted are expressions that can be expressed as x[position() op N] where N is an expression that is independent of the focus and is statically known to evaluate to a number, x is a node test using the child axis, and op is one of the operators eq, le, lt, gt, or ge. Alternative forms of this construct such as x[N], remove(x, 1), head(x), tail(x), and subsequence(x, 1, N) are also accepted.

Streaming is now possible for xsl:for-each-group using the group-adjacent, group-starting-with, or group-ending-with attributes.

XQuery 3.0

Saxon 9.5 fully implements the XQuery 3.0 Candidate Recommendation of January 2013.

Forwards references to global variables are now allowed.

Added support for variables such as err:code and err:description within the catch block of try/catch.

Added support for function annotations.

Added support for the required-feature and prohibited-feature declarations.

XQuery 3.0 extensions can now co-exist with XQuery Update extensions in the same query. 

.NET API

The XPathSelector interface now has an EffectiveBooleanValue method.

Various methods have been added to bring the interface closer to the functionality level of the Java s9api interface.

System Programming Interfaces

In the Configuration class, many of the getters and setters for individual configuration properties have been removed, relying instead on the general-purpose methods getConfigurationProperty() and setConfigurationProperty(). To make these easier to use in the common case of boolean properties, the new methods getBooleanProperty() and setBooleanProperty() are introduced. Property names that are only relevant to Saxon-PE or Saxon-EE are now generally recognized only by a ProfessionalConfiguration or EnterpriseConfiguration respectively; previously some were also recognized by a Saxon-HE configuration while others were not.

The classes ValueRepresentation and Value have been replaced by the new class Sequence. In some interfaces Sequence also replaces SequenceIterator, since passing a LazySequence (which implements Sequence) has all the same performance benefits as passing a SequenceIterator, while still allowing an Item to be passed directly, without wrapping.

The change affects (simplifies) the interface for integrated extension functions, where arguments and result values are now passed as Sequence objects rather than as SequenceIterator instances.

In the NodeInfo interface, the two methods getTypedValue() and atomize() for obtaining the typed value of a node have been unified into a single method, atomize(), which returns the new type AtomicSequence. In the vast majority of cases the typed value is a single atomic value, and this case works efficiently because AtomicValue implements AtomicSequence.

Also in the NodeInfo interface, the method getAttributeValue(fingerprint) has been dropped; callers should use getAttributeValue(uri, local) instead.

The OutputURIResolver interface has been changed: a new method newInstance() is added. This change is made because xsl:result-document is now multi-threaded by default, and since it's likely that existing implementations of OutputURIResolver won't be thread-safe, making an interface change is better than a semantic change that will cause the code to break in difficult-to-diagnose ways. The new method typically returns a new instance of the OutputURIResolver class, so that each instance of the class only needs to remember information about one result document, and is automatically thread-safe.
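The idea behind newInstance() can be sketched in plain Java. The Resolver interface below is a simplified, hypothetical stand-in, and its method signatures are not those of Saxon's OutputURIResolver; the point is only that each result document gets its own resolver object, so per-document state never has to be shared between threads.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; the interface is a simplified stand-in, not Saxon's.
final class PerDocumentResolverSketch {

    interface Resolver {
        Resolver newInstance();              // factory: one instance per result document
        StringWriter resolve(String href);   // open the destination
        void close();                        // finish the destination
    }

    /** Remembers exactly one result document, so its fields need no locking. */
    static final class InMemoryResolver implements Resolver {
        private final Map<String, String> store;   // shared, thread-safe sink
        private String href;                       // per-document state
        private StringWriter writer;               // per-document state

        InMemoryResolver(Map<String, String> store) { this.store = store; }

        public Resolver newInstance() { return new InMemoryResolver(store); }
        public StringWriter resolve(String href) {
            this.href = href;
            this.writer = new StringWriter();
            return writer;
        }
        public void close() { store.put(href, writer.toString()); }
    }

    /** Writes each document on its own thread, each through a fresh resolver instance. */
    static Map<String, String> writeAll(String... hrefs) {
        Map<String, String> store = new ConcurrentHashMap<>();
        Resolver prototype = new InMemoryResolver(store);
        List<Thread> threads = new ArrayList<>();
        for (String href : hrefs) {
            Thread t = new Thread(() -> {
                Resolver r = prototype.newInstance();
                r.resolve(href).write("content of " + href);
                r.close();
            });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        System.out.println(writeAll("a.html", "b.html").keySet());
    }
}
```

If the href and writer fields lived on a single shared instance instead, two threads could overwrite each other's state, which is the kind of difficult-to-diagnose failure the interface change is meant to prevent.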

In the Java classes that implement the schema component model (in package com.saxonica.schema) many of the methods that return an Iterator have been replaced by methods that return a Set or a List. This simplifies processing by enabling use of the Java for-each construct, and is more convenient when the saxon:schema extension function is used to process schema components in the form of XPath functions.

The toString() method on the Expression class has been more carefully defined and implemented in an attempt to achieve the result that (for an Expression within the scope of XPath) the result is a legal and equivalent XPath 3.0 expression with no dependencies on namespace prefixes other than (a) the binding of the prefix "xs" to the standard Schema namespace, and (b) the assumption that the XPath functions namespace is the default function namespace. In other cases QNames are expanded using EQName notation Q{uri}local. There may be a few remaining cases where the output does not yet satisfy these intentions.

Extensibility

Saxon has long provided the ability to have an Item that wraps an external Java or .NET object, which can be supplied in arguments to extension function calls or used in the response from an extension function call. In the past, such values have appeared in the type hierarchy under "atomic value", that is, as a peer of types such as xs:boolean and xs:string. This has changed so they are no longer atomic values, instead forming a fourth kind of item alongside nodes, atomic values, and function items.

The string value and typed value of an external object are the same; they are the xs:string value that results from calling its toString() method.

The handling of extension items that wrap a Java null has been tidied up. When a Java extension function returns null, this is mapped to an XDM empty sequence, not to an extension item that wraps null. When an empty sequence is supplied to a Java extension function that expects an Object, the value null will be passed to the Java method. Extension items are therefore no longer allowed to wrap a Java null.

With reflexive extension functions, the handling of Java arrays has been improved; the component type of the array is now considered when deciding which of several overloaded methods is the best fit.

With reflexive extension functions, if there are two methods where one expects a String and the other a CharSequence, the one expecting a CharSequence is now preferred. Previously the function call was rejected as ambiguous. Although there is no particular reason for choosing one rather than the other, it's likely that both methods will have the same effect, so choosing one arbitrarily is better than reporting an error.

With reflexive extension functions, if there are two methods of the right name and arity, the decision which to use is now postponed until after type checking has been done. This is particularly useful when arguments are supplied in the form of variable references. Previously the decision was postponed only if the early analysis showed both methods as equally preferred; now it is always postponed.

When a constructor is called the code now does what the documentation has always said: a wrapped external object is returned with no conversion. So for example Date:new() will return a wrapped java.util.Date object rather than an instance of xs:dateTime, and HashMap:new() will return a wrapped HashMap rather than an XPath map. The code also avoids doing conversions other than object unwrapping for the first argument of a call to an instance method.

When a reflexive extension function throws an exception, the exception details are now captured in an error that can be caught using the try/catch capabilities in XSLT 3.0 and XQuery 3.0. In particular, the error code is a QName whose namespace URI is "http://saxon.sf.net/java-type" and whose local name is the class name of the exception type, for example java.net.URISyntaxException.

It is now possible to use the reflexive call mechanism to call instance-level methods of the implementation classes for XDM values. For example, the following gets the current Julian instant:

<xsl:value-of select="d:toJulianInstant(current-dateTime())"
              xmlns:d="java:net.sf.saxon.value.DateTimeValue"/>

When an instance of java.util.Map is passed as a parameter to a stylesheet or query, then by default it is accessible within the stylesheet or query as an XPath 3.0 map object. However, if the parameter is declared with a required type of jt:java.util.Map, then it will instead be retained as an external object wrapping the java.util.Map, which means for example that (with the usual caveats and disclaimers) it is updateable by calling its put() method as an extension function.

The SQL Extension

Added the value #auto for the column-tag and row-tag attributes of sql:query, and made this the default; when this value is used, the element names used for rows and cells are taken from the table name used in the query, and the column names returned in the query metadata. To get the old default behaviour, specify row-tag="row" and column-tag="col".

Serialization

Support for XHTML 5 serialization, as defined in the 3.0 Serialization specification, is available. (Use <xsl:output method="xhtml" html-version="5.0"/>).

Comments and processing instructions no longer prevent an immediately following start tag being indented if it would have been indented in the absence of the comment or PI.

(However, comments and PIs are not themselves indented, because it is not possible to do this safely without looking ahead to see if the comment or PI is followed by significant character content.)