Formatting of floating point numbers

By Michael Kay on February 06, 2006 at 02:18p.m.

Given the query 1.11111111111111112E-7 some users might be surprised to see the answer displayed as 1.1111111111111111E-7. But those are probably people who don't realize that these are two different lexical representations of the same xs:double value (because of limitations of double-precision floating point). The question now is, given that you realize these are both different ways of displaying the same number, which one should you choose?

I ran into this problem in trying to pass all the tests in the XQTS test suite. There are one or two tests there where Saxon produces an answer that's off by a tiny amount from the published result, not because of rounding errors in the calculations, but because there is more than one way of printing the same number. In the Candidate Recommendation we have rules for converting a double (or float) to a string (you'll find them in section 17.1.2 of the Functions and Operators book). These rules specify that "there must be as many, but only as many, additional digits as are needed to uniquely distinguish the value from all other values for the datatype", and I think we (the WG) were under the impression that this gives you an unambiguous answer. These words are taken straight from the XPath 1.0 spec, which in turn took them straight from Java, so I took a look to see what Java does.

It turns out that it's very hard to reconcile what Java does with what Java actually says. As I reported in a bug entry, the single-precision float whose internal representation in hex is 58901722 is formatted by Java as 1.26743223E15, but the final digit "3" adds nothing, since all literals in the range 1.26743220E15 to 1.26743229E15 produce the same value.

I proposed three options to the WG: either say that you can choose any lexical representation that "round-trips" back to the original value; or that you must choose the exact decimal representation of the internal binary value, regardless how many digits it takes; or a tightening of the existing rules to make them unambiguous. The WGs chose the first option - which makes the implementor's life easier because you can use an existing floating-point library, at the cost of recognizing that different products will produce different lexical output. That's probably a wise choice at this stage of the game.

As it happens, and quite fortuitously, I was having to face problems with this code in Saxon at the same time. Saxon currently takes the Java output of float-to-string conversion and then does some post-processing to make it fit the XPath rules. This code wasn't working on the Saxon.NET platform because of bugs in the IKVMC cross-compiler. So I decided to bring the code "in house" and  implement float-to-string conversion within Saxon itself rather than relying on an external library (starting, of course, from published open-source code, in this case code published by Jack Shirazi in the O'Reilly book Java Performance Tuning).

Jack's code is extremely fast but unfortunately (as he admits) it doesn't always produce the right answer: when you round-trip, the result is sometimes out by a tiny fraction because of decimal division rounding errors. What I'm currently doing is to test if the value round-trips, and if it doesn't, I'm adjusting it up or down. The result still seems to be faster than the original code. The only slightly unsatisfactory feature is that whereas Java always produced a value that was right in the middle of the range of possible values, my code often produces a value that's at one extreme or the other. But it satisfies the new conformance rules, and it seems to perform OK, so it will have to do.