The pattern match="para except appendix//para"

By Michael Kay on May 26, 2022 at 03:15p.m.

If you saw this pattern in an XSLT stylesheet, I can guess your reaction: "I haven't seen a pattern like that before. Cool, a neat way of matching paragraphs that aren't in an appendix. Must remember that and use it myself."

Sadly, it doesn't do what you think. Consider this input document:


<appendix id="A">
    <section id="A.1">
        <para>Ipsum lorem.</para>
    </section>
</appendix>
        

You'd probably be as surprised as I was to see that the Ipsum lorem paragraph in this example matches the pattern para except appendix//para.

To see why this is true, go to the spec, section 5.5.3:

An item N matches a pattern P if the following applies, where EE is the equivalent expression to P: N is a node, and the result of evaluating the expression root(.)//(EE) with a singleton focus based on N is a sequence that includes the node N.

So, this is saying that a node matches the pattern if it is selected by the expression root(.)//(para except appendix//para). Assuming that we're in a tree rooted at a document node, that means it must be selected by the expression /descendant-or-self::node()/(para except appendix//para).

Now, in our example document, one of the nodes selected by /descendant-or-self::node() is the section element; and when we evaluate (para except appendix//para) starting at the section element, the first operand (para) selects our paragraph, and the second operand (appendix//para) doesn't select it, so the expression as a whole selects it, and therefore it matches the pattern.
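One way to check this reasoning, independently of how any particular processor implements the pattern itself, is to evaluate the equivalent expression as ordinary XPath. The stylesheet below is a minimal sketch, assuming an XSLT 2.0 or later processor and the sample document above as the source; the result wrapper element is a made-up name used only for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start at the document node and evaluate the equivalent
       expression from section 5.5.3 as a plain XPath expression. -->
  <xsl:template match="/">
    <!-- "result" is an illustrative wrapper, not part of the spec -->
    <result>
      <xsl:copy-of select="root(.)//(para except appendix//para)"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Run against the sample document, the result element should contain a copy of the Ipsum lorem paragraph: the except is evaluated once for each context node, and with the section element as context, appendix//para selects nothing.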

That's totally counter-intuitive, and it's certainly not what the Working Group intended. It's a nasty bug. So the question is, what can we do about it, given that this is a published spec and there are implementations out there, and user applications that depend on it?

Is there anything we can do about it?

Perhaps we should start by asking: what would we like the spec to say, if we had the opportunity to change it?

Given that we already have a special rule for patterns with a top-level union operator (see §6.5 rule 2), we could add a special rule for patterns with a top-level intersect or except operator: a pattern of the form A except B matches an item if pattern A matches the item and pattern B does not. (And analogously for intersect.)

If that's what we think we need to do, that leaves two challenges:

Starting with the second point, there are several possibilities:

The third option seems the most satisfactory. And that suggests a route forward for the spec: in XSLT 4.0, if and when we manage to get it defined, deprecate the except and intersect operators at the top level of a pattern, and replace them with new operators that have the expected intuitive semantics.
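In the meantime, the intended meaning ("a para that is not inside an appendix") can be written with a predicate instead of a top-level except. This is a sketch of a common workaround, not something the spec or the proposal above prescribes. It assumes an XSLT 3.0 processor (for xsl:mode) and that "in an appendix" means having an appendix ancestor at any depth.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Walk the whole tree, but emit nothing for nodes that
       no template rule matches. -->
  <xsl:mode on-no-match="shallow-skip"/>

  <!-- Matches a para only if none of its ancestors is an appendix. -->
  <xsl:template match="para[not(ancestor::appendix)]">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample document, this produces no output, because the only para has an appendix ancestor. The predicate is tested against the candidate node itself, so the result does not depend on which context node the path happens to be evaluated from.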