Java Generics revisited

By Michael Kay on January 21, 2020 at 09:14 p.m.

In Saxon 9.9 we took considerable pains to adopt Java Generics for processing sequences: in particular the Sequence and SequenceIterator classes, and all their subclasses, became Sequence<? extends Item> and SequenceIterator<? extends Item>.
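For readers who haven't worked with those signatures, here is a minimal sketch of what wildcard-typed sequences look like in practice. The interfaces are hypothetical, heavily cut-down stand-ins that borrow Saxon's names; they are not the real Saxon API. `ListSequence` and `Wildcards` are invented purely for illustration.

```java
import java.util.List;

// Hypothetical, much-simplified stand-ins for Saxon's interfaces; the real
// classes have many more methods and differ in detail.
interface Item {
    String getStringValue();
}

interface SequenceIterator<T extends Item> {
    // Returns the next item, or null when the sequence is exhausted.
    T next();
}

interface Sequence<T extends Item> {
    SequenceIterator<T> iterate();
}

// A list-backed sequence (hypothetical, for illustration only).
final class ListSequence<T extends Item> implements Sequence<T> {
    private final List<T> items;

    ListSequence(List<T> items) {
        this.items = items;
    }

    @Override
    public SequenceIterator<T> iterate() {
        return new SequenceIterator<T>() {
            private int pos = 0;

            @Override
            public T next() {
                return pos < items.size() ? items.get(pos++) : null;
            }
        };
    }
}

final class Wildcards {
    // Accepts a sequence of any Item subtype. Each element can safely be used
    // as an Item, but nothing more specific is known at this point.
    static String concatenate(Sequence<? extends Item> seq) {
        StringBuilder sb = new StringBuilder();
        SequenceIterator<? extends Item> it = seq.iterate();
        Item item;
        while ((item = it.next()) != null) {
            sb.append(item.getStringValue());
        }
        return sb.toString();
    }
}
```

Note that the wildcard buys very little inside `concatenate`: every item is still just an `Item`.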

I'm now coming to the conclusion that this was a mistake; or at any rate, that we went too far.

What exactly are the benefits of using Generics? They're supposed to improve type safety and reduce the need for casts, which, if applied incorrectly, can trigger run-time exceptions. So it's all about detecting more of your errors at compile time.
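The textbook illustration of that benefit uses plain `java.util` collections rather than anything from Saxon. With a raw type the cast is written by hand and a mistake surfaces only at run time; with a type parameter the compiler rejects the mistake and no cast is needed. `CastDemo` is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.List;

final class CastDemo {
    // Without generics: the compiler cannot check what goes into the list,
    // so a wrong cast on the way out fails only at run time.
    @SuppressWarnings("rawtypes")
    static int rawTotalLength(List raw) {
        int total = 0;
        for (Object o : raw) {
            total += ((String) o).length();   // may throw ClassCastException
        }
        return total;
    }

    // With generics: the element type is checked at compile time and no cast
    // is needed; passing a List<Integer> here would not compile.
    static int typedTotalLength(List<String> typed) {
        int total = 0;
        for (String s : typed) {
            total += s.length();
        }
        return total;
    }
}
```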

Well, I don't think we've been seeing those benefits. And the main reason for that is that in most cases, when we're processing sequences, we don't have any static knowledge of the kind of items we are dealing with.

Sure, when we process a particular XPath path expression, we know whether it's going to deliver nodes or atomic values. But when we write the Java code in Saxon to handle path expressions, all we know is that the result will always be a sequence of items.

There are some cases where particular kinds of expression only handle nodes, or only handle atomic values. For example, the input sequences for a union operator will always be sequences of nodes. It would be nice if we didn't have to handle a completely general sequence and cast every item to class NodeInfo. But it's an illusion to think we can get extra type safety that way. The operands of a union are arbitrary expressions, and the iterators returned by the subexpressions are going to be arbitrary iterators; there's no way we can translate the type-safety we are implementing at the XPath level into type-safe evaluators at the Java level.
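A minimal sketch of the situation described above, assuming hypothetical, simplified versions of `Item`, `NodeInfo` and a non-generic `SequenceIterator`. The `documentOrderKey()` method is invented here as a stand-in for real document-order comparison, and `UnionSketch` is not Saxon's union implementation. The point it shows is that the evaluator receives general iterators, so the cast to `NodeInfo` cannot be removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified stand-ins; not Saxon's real interfaces.
interface Item { }

interface NodeInfo extends Item {
    // Hypothetical: a number giving the node's position in document order.
    long documentOrderKey();
}

interface SequenceIterator {
    Item next();   // returns null at the end of the sequence
}

final class UnionSketch {
    // The operands are arbitrary subexpressions, so all this code receives is
    // a general iterator over Items. The XPath type rules require the items to
    // be nodes, but Java cannot see that, so each item must still be cast.
    static List<NodeInfo> union(SequenceIterator a, SequenceIterator b) {
        // Sorting by the key gives document order; equal keys are treated
        // as the same node, which removes duplicates.
        TreeMap<Long, NodeInfo> byOrder = new TreeMap<>();
        for (SequenceIterator it : new SequenceIterator[] {a, b}) {
            Item item;
            while ((item = it.next()) != null) {
                NodeInfo node = (NodeInfo) item;   // unavoidable cast
                byOrder.putIfAbsent(node.documentOrderKey(), node);
            }
        }
        return new ArrayList<>(byOrder.values());
    }
}
```

Making `SequenceIterator` generic would not remove the cast; it would only move it to wherever the general iterator is converted into a node iterator.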

It's particularly obvious that generics give us no type-safety at the API level. In s9api, XPathSelector.evaluate() returns an XdmValue. That's a lot better than the JAXP equivalent, which just returns Object, but the programmer still has to do casting to convert the items in the returned XdmValue to nodes, strings, integers, or whatever. And there's no way we can change that; the XPath expression is supplied as a string at run-time, so it's only at run-time that we know what type of items it returns. If that's true at the API level, it's equally true internally. Any kind of expression can invoke any other kind of expression (that's what orthogonality in language design is about), which means that the interfaces between an expression and its subexpressions are always going to be general-purpose sequences whose item type is known only at execution time.
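To see why a type parameter can't help here, consider a hypothetical generic wrapper around an evaluator whose "expression" arrives as a string. The toy expression language (a comma-separated list of literals) and the names `TypedResult`, `evaluate` and `evaluateAs` are invented for this sketch; real s9api code would instead test each `XdmItem` to see whether it is an `XdmNode` or an `XdmAtomicValue`. The generic signature looks type-safe, but the check still happens item by item at run time.

```java
import java.util.ArrayList;
import java.util.List;

final class TypedResult {
    // Hypothetical toy evaluator: which item types come back depends on the
    // string, which is only available at run time, so the most precise
    // static result type is List<Object>.
    static List<Object> evaluate(String expression) {
        List<Object> items = new ArrayList<>();
        for (String token : expression.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue;
            }
            if (t.matches("-?\\d+")) {
                items.add(Long.parseLong(t));   // integer literal
            } else {
                items.add(t);                   // anything else is a string
            }
        }
        return items;
    }

    // The compiler cannot check the caller's choice of T against the
    // expression; Class.cast performs that check at run time, exactly as a
    // hand-written cast would.
    static <T> List<T> evaluateAs(String expression, Class<T> type) {
        List<T> typed = new ArrayList<>();
        for (Object item : evaluate(expression)) {
            typed.add(type.cast(item));   // ClassCastException on a mismatch
        }
        return typed;
    }
}
```

A call such as `evaluateAs("1, x", Long.class)` compiles without complaint and fails only when it runs.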

There are a couple of aspects of Java generics that cause us real pain.

But having got Generics working, at great effort, in 9.9, should we retain them or drop them?

One reason I'm motivated to drop them is .NET. We have a significant user base on .NET, but we have something of a potential crisis looming in terms of ongoing support for this platform. Microsoft appear to be basing their future strategy around .NET Core, allowing .NET Framework to fade away into the sunset. But the technology we use for bridging to .NET, namely IKVM, only supports .NET Framework and not .NET Core; and Jeroen Frijters, who single-handedly developed IKVM and supported it for umpteen years (with no revenue stream to support it), has thrown in the towel and is no longer taking it forward. So we're looking at a number of options for a way forward on .NET. One of these is source code conversion; and to make source code conversion viable without forking the code, we need to minimise our dependencies on Java features that don't translate easily to C#. Notable among those features is generics.

In the short term, I think I'm going to roll back the use of generics in selected areas where they are clearly more trouble than they are worth. That's particularly true of Sequence and its subclasses, including Item. For SequenceIterator it's probably worth keeping generics for the time being, but we'll keep that under review.
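One possible shape for that split, as a hypothetical sketch only (not Saxon's actual code): `Item` and `Sequence` lose their type parameters, while `SequenceIterator` keeps one so that code which genuinely knows it is handling nodes can still say so. All interfaces here are simplified stand-ins, and `MixedDesign` is an invented name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; not Saxon's real interfaces.
interface Item {
    String getStringValue();
}

interface NodeInfo extends Item {
    String getLocalPart();
}

// Keeps its type parameter.
interface SequenceIterator<T extends Item> {
    T next();   // returns null at the end of the sequence
}

// No type parameter: a Sequence is just a sequence of items.
interface Sequence {
    SequenceIterator<? extends Item> iterate();
}

final class MixedDesign {
    // General-purpose code: any Sequence, with items seen only as Item.
    static int count(Sequence seq) {
        int n = 0;
        SequenceIterator<? extends Item> it = seq.iterate();
        while (it.next() != null) {
            n++;
        }
        return n;
    }

    // Specialised code (for example, walking an axis) keeps the precise type
    // and needs no cast to reach node-specific methods.
    static List<String> localNames(SequenceIterator<NodeInfo> nodes) {
        List<String> names = new ArrayList<>();
        NodeInfo node;
        while ((node = nodes.next()) != null) {
            names.add(node.getLocalPart());
        }
        return names;
    }
}
```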