Taking Saxon-CE Forward

The work we are doing on Saxon-CE (XSLT 2.0 in the browser) received a very encouraging reception at XML Prague. Our half-day tutorial/workshop was very well attended, and it produced a lot of useful feedback and new ideas which we are hoping to incorporate. Discussions in the corridors of the conference also demonstrated a lot of interest. If you're not familiar with the current status, it's now on beta release, and we've been doing a lot of work that can be summed up as taking it from the prototype we showed last year to an industrial-strength product with stable interfaces, a range of development, debugging, and instrumentation tools, responsive performance, and full cross-browser compatibility.

There are a few technical things we need to do before the product is finished. Providing a JavaScript API is one; another is to provide control over management of the browser URL bar and history, so that users can save bookmarks and use the back button in the way that they expect; and we need to enable asynchronous fetching of XML documents from the server to create AJAX-style applications (since the J stands for JavaScript, perhaps we should call it AXAX).

We're also starting to plan the commercial side of shipping a product. That includes pricing, the logistics of licensing and selling, and some basic marketing. We're not going to give the product away: there are too many good software products that have stalled because the value delivered to users wasn't flowing back to the developers to invest in the product. At the same time, giving software away free, within limits, is by far the best way of establishing a presence in the market.

Let me share my current thinking on pricing. I'm interested in feedback, so let me know what you think: these are just current ideas, so the final scheme might turn out quite different.

Firstly, the software will be licensed by domain (the name of the web domain on which it is deployed). We've already got that mechanism in place in the beta. We're hoping it's simple, clear, and hassle-free. We're not charging per developer, because that would be complicated and because the cost isn't directly related to the value; and we're not charging per end-user or per access, because that would be madness. Perhaps for people who want to deploy on multiple domains - the IBMs of this world - we will offer some kind of discount for domains after the first.

Secondly, we'll classify users into two groups: small and large. (Not "commercial" and "non-commercial": that leaves far too many questions of definition.) A small user is an individual, company, or organization with revenue less than $x per annum (let's say $1m for the sake of argument), and a large user is anyone else. The entity whose revenue matters is the legal owner of the domain for which the license is held, which is nice because we can look that up; in many cases we can also check the revenue from public information, so there's not too much room for cheating.

Finally, we'll give you a choice of getting a one-year license or a permanent license. A one-year license will switch off at the end of the year. We hope that's a painful enough prospect that people who are taking the software seriously will choose to pay for a permanent license, while leaving a low-cost option for those who aren't quite sure yet how much commitment they want to make.

So that gives four prices, looking something like this (in GBP):

Price                 12-month    Permanent
Small organization    free        £500
Large organization    £500        £2500

How do we tell if we have got this right? The prices look high by some standards, low by others. I think the test of whether the price is in the right ballpark is that it is (a) less than the value users are getting from the software, and (b) greater than the money we are spending on developing it. On those criteria, it looks good to me, but I welcome your views!

Maps

I optimistically assumed that the W3C XQuery/XSLT specification for maps would be published long before Saxon 9.4 was released. I was wrong - the XQuery WG didn't rubber-stamp the XSLT WG's spec as expected, but insisted on reviewing it from first principles, starting a long drawn out study of requirements and use-cases which is still ongoing.

So Saxon 9.4 has maps, but no documentation for them. To remedy this, here is the spec. It happens to be a snapshot of a draft of the XSLT 3.0 specification, but of course everything can change before a draft is published. And as always when Saxon implements features in a draft specification, if the spec changes then Saxon will change with it, regardless of compatibility. Use at your own risk.

20.1 Maps

A map is an additional kind of item.

[Definition: A map comprises a collation and a set of entries. Each entry comprises a key which is an arbitrary atomic value, and an arbitrary sequence called the associated value. Within a map, no two entries have the same key, when compared using the eq operator under the map's collation. It is not necessary that all the keys should be mutually comparable (for example, they can include a mixture of integers and strings). Key values will never be of type xs:untypedAtomic, and they will never be the xs:float or xs:double value NaN.]

The function call map:get($map, $key) can be used to retrieve the value associated with a given key.

A map can also be viewed as a function from keys to associated values. To achieve this, a map is also a function item. The function corresponding to the map has the signature function($key as xs:anyAtomicType) as item()*. Calling the function has the same effect as calling the map:get function: the expression $map($key) returns the same result as map:get($map, $key). For example, if $books-by-isbn is a map whose keys are ISBNs and whose associated values are book elements, then the expression $books-by-isbn("0470192747") returns the book element with the given ISBN. The fact that a map is a function item allows it to be passed as an argument to higher-order functions that expect a function item as one of their arguments.

Like all other values, maps are immutable. For example, the map:remove function creates a new map by removing an entry from an existing map, but the existing map is not changed by the operation.

Like sequences, maps have no identity. It is meaningful to compare the contents of two maps, but there is no way of asking whether they are "the same map": two maps with the same content are indistinguishable.

20.1.1 The Type of a Map

The syntax of ItemTypeXP30 as defined in XPath is extended as follows:

MapType
[17]   ItemType   ::=   ... | MapType 
[18]   MapType   ::=   'map' '(' ( '*' | (AtomicOrUnionTypeXP30 ',' SequenceTypeXP30) ) ')'

The ItemType map(K, V) matches an item M if (a) M is a map, and (b) every entry in M has a key that matches K and an associated value that matches V. For example, map(xs:integer, element(employee)) matches a map if all the keys in the map are integers, and all the associated values are employee elements. Note that a map (like a sequence) carries no intrinsic type information separate from the types of its entries, and the type of existing entries in a map does not constrain the type of new entries that can be added to the map.

The ItemType map(*) is equivalent to map(xs:anyAtomicType, item()*), and matches any map regardless of its contents.

Because a map is a function, the type map(K, V) is derived from function(K) as V, and instances of map(K, V) can be used wherever the required type is function(K) as V.
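
To make the matching rule for map(K, V) concrete, here is a rough Python model. It is not part of the spec and says nothing about how Saxon implements maps. A map is modelled as a plain dict, and the K and V tests are stood in for by hypothetical predicate functions key_ok and value_ok (each associated value is treated as a single item for simplicity). The point is only that the type is checked against whatever entries the map holds at the time.

```python
def matches_map_type(m, key_ok, value_ok):
    """True if every entry has a key accepted by key_ok and a value accepted by value_ok."""
    return all(key_ok(k) and value_ok(v) for k, v in m.items())


def is_integer(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, int) and not isinstance(x, bool)


def is_string(x):
    return isinstance(x, str)


answers = {0: "no", 1: "yes"}
extended = {**answers, "maybe": "perhaps"}   # a new dict; answers is untouched

print(matches_map_type(answers, is_integer, is_string))   # True
print(matches_map_type(extended, is_integer, is_string))  # False: one key is a string
print(matches_map_type({}, is_integer, is_string))        # True: an empty map matches any map type
```

Adding the string-keyed entry was not prevented by the "type" of the original map; the extended map simply no longer matches the narrower type.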

20.1.2 Functions that operate on Maps

The functions defined in this section use a conventional namespace prefix map, which is assumed to be bound to the namespace URI http://www.w3.org/2005/xpath-functions/map.

There is no operation to atomize a map or convert it to a string.

The number of entries in the map may be obtained as count(map:keys($map)).

20.1.2.1 map:new
Summary

Creates a new map: either an empty map, or a map that combines entries from a number of existing maps.

Signatures

new() as map(*)

 

new($maps as map(*)*) as map(*)

 

new($maps as map(*)*,
$collation as xs:string) as map(*)

Rules

The function map:new constructs and returns a new map.

The zero-argument form of the function returns an empty map whose collation is the default collation in the static context. It is equivalent to calling the one-argument form of the function with an empty sequence as the value of the first argument.

The one-argument form of the function returns a map that is formed by combining the contents of the maps supplied in the $maps argument. It is equivalent to calling the two-argument form of the function with the default collation from the static context as the second argument.

The two-argument form of the function returns a map that is formed by combining the contents of the maps supplied in the $maps argument. The collation of the new map is the value of the $collation argument. The supplied maps are combined as follows:

  1. There is one entry in the new map for each distinct key value present in the union of the input maps, where keys are considered distinct according to the rules of the fn:distinct-values function with $collation as the collation.

  2. The associated value for each such key is taken from the last map in the input sequence $maps that contains an entry with this key. If this map contains more than one entry with this key (which can happen if its collation is different from that of the new map) then it is implementation-dependent which of them is selected.

There is no requirement that the supplied input maps should have the same or compatible types. The type of a map (for example map(xs:integer, xs:string)) is descriptive of the entries it currently contains, but is not a constraint on how the map may be combined with other maps.
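
The combination rules can be sketched in Python as follows. This is an illustrative model only, under some loud assumptions: a map is a dict, and a collation is stood in for by a hypothetical function (codepoint or caseblind below) that turns a key into a comparison key. Python's own key equality also differs from eq in places (for example, True and 1 are the same dict key), so treat this as a picture of the "last map wins" rule rather than a faithful implementation.

```python
def codepoint(key):
    """Stand-in for the codepoint collation: keys compare as themselves."""
    return key


def caseblind(key):
    """Stand-in for a case-blind collation (hypothetical)."""
    return key.casefold() if isinstance(key, str) else key


def map_new(maps=(), collation=codepoint):
    """Combine maps into a new dict; for equal keys the last map wins."""
    chosen = {}  # comparison key -> (original key, associated value)
    for m in maps:
        for key, value in m.items():
            # A later entry with an equal key replaces the earlier one.
            chosen[collation(key)] = (key, value)
    return {key: value for key, value in chosen.values()}


week = {0: "Sonntag", 6: "Samstag"}
merged = map_new([week, {6: "Sonnabend"}])
print(merged[6])   # Sonnabend
print(week[6])     # Samstag: the input map is unchanged
print(map_new([{"A": 1}, {"a": 2}], caseblind))   # {'a': 2}
```

Where a single input map holds two keys that are equal under the new collation, this sketch keeps whichever it meets last; the spec leaves that choice implementation-dependent.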

Examples

let $week := map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag"}

The expression map:new() returns map{} (Returns an empty map, whose collation is the default collation from the static context).

The expression map:new(()) returns map{} (Returns an empty map, whose collation is the default collation from the static context).

The expression map:new((map:entry(0, "no"), map:entry(1, "yes"))) returns map{0:="no", 1:="yes"} (Returns a map with two entries; the collation of the map is the default collation from the static context).

The expression map:new(($week, map{7:="Unbekannt"})) returns map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag", 7:="Unbekannt"} (The value of the existing map is unchanged; a new map is created containing all the entries from $week, supplemented with a new entry.).

The expression map:new(($week, map{6:="Sonnabend"})) returns map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Sonnabend"} (The value of the existing map is unchanged; a new map is created containing all the entries from $week, with one entry replaced by a new entry. Both input maps contain an entry with the key value 6; the one used in the result is the one that comes last in the input sequence.).

The expression map:new((map{"A":=1}, map{"a":=2}), "http://collation.example.com/caseblind") returns map{"a":=2} (Assuming that the keys of the two entries are equal under the rules of the chosen collation, only one of the entries can appear in the result; the one that is chosen is the one from the last map in the input sequence. If both entries were in the same map, it would be implementation-dependent which was chosen.).

20.1.2.2 map:collation
Summary

Returns the URI of the supplied map's collation

Signature

collation($input as map(*)) as xs:string

Rules

The function map:collation returns the collation URI of the map supplied as $input.

Examples

The expression map:collation(map:new((), "http://collation.example.com/caseblind")) returns "http://collation.example.com/caseblind".

20.1.2.3 map:keys
Summary

Returns a sequence containing all the key values present in a map

Signature

keys($input as map(*)) as xs:anyAtomicType*

Rules

The function map:keys takes any map as its $input argument and returns the keys that are present in the map as a sequence of atomic values, in implementation-dependent order.

Examples

The expression map:keys(map{1:="yes", 2:="no"}) returns some permutation of (1,2) (The result is in implementation-dependent order.).

20.1.2.4 map:contains 
Summary

Tests whether a supplied map contains an entry for a given key

Signature

contains($map as map(*),
$key as xs:anyAtomicType) as xs:boolean

Rules

The function map:contains returns true if the map supplied as $map contains an entry with a key equal to the supplied value of $key; otherwise it returns false. The equality comparison uses the map's collation; no error occurs if the map contains keys that are not comparable with the supplied $key.

If the supplied key is xs:untypedAtomic, it is converted to xs:string. If the supplied key is the xs:float or xs:double value NaN, the function returns false.

Examples

let $week := map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag"}

The expression map:contains($week, 2) returns true().

The expression map:contains($week, 9) returns false().

The expression map:contains(map{}, "xyz") returns false().

The expression map:contains(map{"xyz":=23}, "xyz") returns true().

The expression map:contains(map{"abc":=23, "xyz":=()}, "xyz") returns true().

20.1.2.5 map:get
Summary

Returns the value associated with a supplied key in a given map.

Signature

get($map as map(*),
$key as xs:anyAtomicType) as item()*

Rules

The function map:get attempts to find an entry within the map supplied as $map that has a key equal to the supplied value of $key. If there is such an entry, it returns the associated value; otherwise it returns an empty sequence. The equality comparison uses the map's collation; no error occurs if the map contains keys that are not comparable with the supplied $key.

If the supplied key is xs:untypedAtomic, it is converted to xs:string. If the supplied key is the xs:float or xs:double value NaN, the function returns an empty sequence.

Notes

A return value of () from map:get could indicate that the key is present in the map with an associated value of (), or it could indicate that the key is not present in the map. The two cases can be distinguished by calling map:contains.

Invoking the map as a function item has the same effect as calling get: that is, when $map is a map, the expression $map($K) is equivalent to get($map, $K). Similarly, the expression get(get(get($map, 'employee'), 'name'), 'first') can be written as $map('employee')('name')('first').
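
The ambiguity described in these notes is easy to see in a small Python model. This is only a sketch under stated assumptions: a map is a dict, the empty sequence is modelled by the empty tuple (the name EMPTY is mine), and collations are ignored.

```python
import math

EMPTY = ()  # models the empty sequence


def _is_nan(key):
    return isinstance(key, float) and math.isnan(key)


def map_contains(m, key):
    # A NaN key is never present, so the answer is always False for it.
    return not _is_nan(key) and key in m


def map_get(m, key):
    # Absent keys and NaN keys both yield the empty sequence.
    if _is_nan(key):
        return EMPTY
    return m.get(key, EMPTY)


m = {7: EMPTY}
print(map_get(m, 7), map_contains(m, 7))   # () True
print(map_get(m, 9), map_contains(m, 9))   # () False
```

Both lookups return the empty sequence; only the contains test tells the two situations apart.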

Examples

let $week := map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag"}

The expression map:get($week, 4) returns "Donnerstag".

The expression map:get($week, 9) returns () (When the key is not present, the function returns an empty sequence.).

The expression map:get(map:entry(7,()), 7) returns () (An empty sequence as the result can also signify that the key is present and the associated value is an empty sequence.).

20.1.2.6 map:entry
Summary

Creates a map that contains a single entry (a key-value pair).

Signature

entry($key as xs:anyAtomicType,
$value as item()*) as map(*)

Rules

The function map:entry returns a new map which normally contains a single entry. The collation of the new map is the default collation from the static context. The key of the entry in the new map is $key, and its associated value is $value.

If the supplied key is the xs:float or xs:double value NaN, the returned map is empty (that is, it contains no entries).

If the supplied key is xs:untypedAtomic, it is converted to xs:string.

Notes

The function map:entry is intended primarily for use in conjunction with the function map:new. For example, a map containing seven entries may be constructed like this:

map:new((
   map:entry("Su", "Sunday"),
   map:entry("Mo", "Monday"),
   map:entry("Tu", "Tuesday"),
   map:entry("We", "Wednesday"),
   map:entry("Th", "Thursday"),
   map:entry("Fr", "Friday"),
   map:entry("Sa", "Saturday")
   ))

Unlike the map{...} expression, this technique can be used to construct a map with a variable number of entries, for example:

map:new(for $b in //book return map:entry($b/isbn, $b))

Examples

The expression map:entry("M", "Monday") returns map{"M":="Monday"}.

20.1.2.7 map:remove
Summary

Constructs a new map by removing an entry from an existing map

Signature

remove($map as map(*),
$key as xs:anyAtomicType) as map(*)

Rules

The function map:remove returns a new map. The collation of the new map is the same as the collation of the map supplied as $map. The entries in the new map correspond to the entries of $map, excluding any entry whose key is equal to $key.

No failure occurs if the input map contains no entry with the supplied key; the input map is returned unchanged.

Examples

let $week := map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag"}

The expression map:remove($week, 4) returns map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 5:="Freitag", 6:="Samstag"}.

The expression map:remove($week, 23) returns map{0:="Sonntag", 1:="Montag", 2:="Dienstag", 3:="Mittwoch", 4:="Donnerstag", 5:="Freitag", 6:="Samstag"}.

20.1.2.8 fn:deep-equal
Summary

This function is extended to handle maps.

Signatures

deep-equal($parameter1 as item()*,
$parameter2 as item()*) as xs:boolean

 

deep-equal($parameter1 as item()*,
$parameter2 as item()*,
$collation as xs:string) as xs:boolean

Rules

The $collation argument identifies a collation which is used at all levels of recursion when strings are compared (but not when names are compared), according to the rules in [FO:choosing-a-collation].

If the two sequences are both empty, the function returns true.

If the two sequences are of different lengths, the function returns false.

If the two sequences are of the same length, the function returns true if and only if every item in the sequence $parameter1 is deep-equal to the item at the same position in the sequence $parameter2. The rules for deciding whether two items are deep-equal follow.

Call the two items $i1 and $i2 respectively.

If $i1 and $i2 are both atomic values, they are deep-equal if and only if ($i1 eq $i2) is true, or if both values are NaN. If the eq operator is not defined for $i1 and $i2, the function returns false.

If one of the pair $i1 or $i2 is an atomic value and the other is not, or if one is a node and the other is not, the function returns false.

If $i1 and $i2 are both maps, the result is true if and only if all the following conditions apply:

  1. Both maps have the same number of entries.

  2. Both maps have the same collation.

  3. For every entry in the first map, there is an entry in the second map that:

    1. has the same key (compared using the eq operator under the maps' collation), and

    2. has the same associated value (compared using the fn:deep-equal function, under the collation supplied in the original call to fn:deep-equal).
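
As a rough illustration of the three conditions for maps, here is a Python sketch. The class name XMap and the helper names are hypothetical, nodes are left out altogether, and key lookup uses Python's dict equality rather than a real collation, so this only approximates eq. Associated values are modelled as tuples, since each one is a sequence.

```python
import math


class XMap:
    """A map modelled as a dict of key -> sequence (tuple), plus a collation name."""

    def __init__(self, entries=None, collation="codepoint"):
        self.entries = dict(entries or {})
        self.collation = collation


def atomic_equal(a, b):
    # Two NaN values count as deep-equal, unlike under eq.
    both_nan = (isinstance(a, float) and isinstance(b, float)
                and math.isnan(a) and math.isnan(b))
    return both_nan or a == b


def item_equal(a, b):
    if isinstance(a, XMap) and isinstance(b, XMap):
        # Conditions 1 and 2: same number of entries, same collation.
        if len(a.entries) != len(b.entries) or a.collation != b.collation:
            return False
        # Condition 3: every entry has a counterpart with a deep-equal value.
        return all(key in b.entries and deep_equal(value, b.entries[key])
                   for key, value in a.entries.items())
    if isinstance(a, XMap) or isinstance(b, XMap):
        return False
    return atomic_equal(a, b)


def deep_equal(seq1, seq2):
    return len(seq1) == len(seq2) and all(
        item_equal(x, y) for x, y in zip(seq1, seq2))


nan = float("nan")
print(deep_equal([XMap({"a": (1,), "b": (2,)})], [XMap({"b": (2,), "a": (1.0,)})]))  # True
print(deep_equal([XMap({"a": (nan,)})], [XMap({"a": (nan,)})]))                      # True
print(deep_equal([XMap({})], [XMap({}, "caseblind")]))                               # False: collations differ
```

Note that the order of entries plays no part in the comparison, which matches the fact that a map is a set of entries rather than a list.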

If $i1 and $i2 are both nodes, they are compared as described below:

  1. If the two nodes are of different kinds, the result is false.

  2. If the two nodes are both document nodes then they are deep-equal if and only if the sequence $i1/(*|text()) is deep-equal to the sequence $i2/(*|text()).

  3. If the two nodes are both element nodes then they are deep-equal if and only if all of the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. The two nodes are both annotated as having simple content or both nodes are annotated as having complex content.

    3. The two nodes have the same number of attributes, and for every attribute $a1 in $i1/@* there exists an attribute $a2 in $i2/@* such that $a1 and $a2 are deep-equal.

    4. One of the following conditions holds:

      • Both element nodes have a type annotation that is simple content, and the typed value of $i1 is deep-equal to the typed value of $i2.

      • Both element nodes have a type annotation that is complex content with elementOnly content, and each child element of $i1 is deep-equal to the corresponding child element of $i2.

      • Both element nodes have a type annotation that is complex content with mixed content, and the sequence $i1/(*|text()) is deep-equal to the sequence $i2/(*|text()).

      • Both element nodes have a type annotation that is complex content with empty content.

  4. If the two nodes are both attribute nodes then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. The typed value of $i1 is deep-equal to the typed value of $i2.

  5. If the two nodes are both processing instruction nodes, then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes have the same name, that is (node-name($i1) eq node-name($i2)).

    2. The string value of $i1 is equal to the string value of $i2.

  6. If the two nodes are both namespace nodes, then they are deep-equal if and only if both the following conditions are satisfied:

    1. The two nodes either have the same name or are both nameless, that is fn:deep-equal(node-name($i1), node-name($i2)).

    2. The string value of $i1 is equal to the string value of $i2 when compared using the Unicode codepoint collation.

  7. If the two nodes are both text nodes or comment nodes, then they are deep-equal if and only if their string-values are equal.

Error Conditions

An error is raised [tba] if either input sequence contains a function item that is not a map.

Notes

Two nodes are not required to have the same type annotation, and they are not required to have the same in-scope namespaces. They may also differ in their parent, their base URI, and the values returned by the is-id and is-idrefs accessors (see Section 5.5 is-id Accessor DM30 and Section 5.6 is-idrefs Accessor DM30). The order of children is significant, but the order of attributes is insignificant.

The contents of comments and processing instructions are significant only if these nodes appear directly as items in the two sequences being compared. The content of a comment or processing instruction that appears as a descendant of an item in one of the sequences being compared does not affect the result. However, the presence of a comment or processing instruction, if it causes a text node to be split into two text nodes, may affect the result.

The result of fn:deep-equal(1, current-dateTime()) is false; it does not raise an error.

Examples

The expression fn:deep-equal(map{}, map{}) returns true().

The expression fn:deep-equal(map{"a":=1, "b":=2}, map{"b":=2, "a":=1.0}) returns true().

The expression fn:deep-equal(map{"a":=xs:double('NaN')}, map{"a":=xs:float('NaN')}) returns true().

let $at :=

<attendees>
  <name last='Parker' first='Peter'/>
  <name last='Barker' first='Bob'/>
  <name first='Peter' last='Parker'/>
</attendees>

The expression fn:deep-equal($at, $at/*) returns false().

The expression fn:deep-equal($at/name[1], $at/name[2]) returns false().

The expression fn:deep-equal($at/name[1], $at/name[3]) returns true().

The expression fn:deep-equal($at/name[1], 'Peter Parker') returns false().

20.1.3 Map Expressions

A new kind of expression is added to the syntax of XPath.

The syntax of PrimaryExprXP30 is extended to permit MapExpr as an additional alternative.

MapExpr := "map" "{" (KeyExpr ":=" ValueExpr ("," KeyExpr ":=" ValueExpr )*)? "}"

KeyExpr := ExprSingle

ValueExpr := ExprSingle

Note:

Two variations on this syntax are under consideration: removing the leading keyword "map", and using the token ":" in place of ":=". This would bring the syntax closer to JavaScript and JSON notation. However, special lexical rules would be needed to disambiguate this use of ":" from other uses. Feedback is invited.

The value of the expression is a map whose entries correspond to the key-value pairs obtained by evaluating the successive KeyExpr and ValueExpr expressions.

Each KeyExpr expression is evaluated and atomized; a dynamic error occurs if the result is not a single atomic value. If the key value is of type xs:untypedAtomic it is converted to xs:string. The associated value is the result of evaluating the corresponding ValueExpr. The collation of the new map is the default collation from the static context. If the key value is NaN then the key/value pair is not added to the map. If two or more keys are equal under the collation of the map then the last occurrence is added to the map and the others are ignored.

For example, the following expression constructs a map with seven entries:

map {
  "Su" := "Sunday",
  "Mo" := "Monday",
  "Tu" := "Tuesday",
  "We" := "Wednesday",
  "Th" := "Thursday",
  "Fr" := "Friday",
  "Sa" := "Saturday"
}

Note:

Unlike the map:new function, the number of entries in a map that is constructed using a map expression is known statically, except where duplicate keys or NaN values cause some entries to be ignored.

20.1.4 Examples using maps

This section gives some examples of where maps can be useful.

Example: Using maps with xsl:iterate

This example uses maps in conjunction with the xsl:iterate instruction to find the highest-earning employee in each department, in a single streaming pass of an input document containing employee records.

<xsl:stream href="employees.xml">
  <xsl:iterate select="*/employee">
    <xsl:param name="highest-earners" 
               as="map(xs:string, element(employee))" 
               select="map:new()"/>
    <xsl:variable name="this" select="copy-of(.)" as="element(employee)"/> 
    <xsl:next-iteration>
      <xsl:with-param name="highest-earners"
          select="let $existing := $highest-earners($this/department)
                  return if ($existing/salary gt $this/salary)
                         then $highest-earners
                         else map:new(($highest-earners, map:entry($this/department, $this)))"/>
    </xsl:next-iteration>
    <xsl:on-completion>
      <xsl:for-each select="map:keys($highest-earners)">
        <department name="{.}">
          <xsl:copy-of select="$highest-earners(.)"/>
        </department>
      </xsl:for-each>
    </xsl:on-completion>
  </xsl:iterate>
</xsl:stream>

 

Example: Using Maps to implement Complex Numbers

A complex number might be represented as a map with two entries, the keys being the xs:boolean value true for the real part, and the xs:boolean value false for the imaginary part. A library for manipulation of complex numbers might include functions such as the following:

<xsl:function name="i:complex" as="map(xs:boolean, xs:double)">
  <xsl:param name="real" as="xs:double"/>
  <xsl:param name="imaginary" as="xs:double"/>
  <xsl:sequence select="map{ true() := $real, false() := $imaginary }"/>
</xsl:function>

<xsl:function name="i:real" as="xs:double">
  <xsl:param name="complex" as="map(xs:boolean, xs:double)"/>
  <xsl:sequence select="$complex(true())"/>
</xsl:function>

<xsl:function name="i:imaginary" as="xs:double">
  <xsl:param name="complex" as="map(xs:boolean, xs:double)"/>
  <xsl:sequence select="$complex(false())"/>
</xsl:function>

<xsl:function name="i:add" as="map(xs:boolean, xs:double)">
  <xsl:param name="arg1" as="map(xs:boolean, xs:double)"/>
  <xsl:param name="arg2" as="map(xs:boolean, xs:double)"/>
  <xsl:sequence select="i:complex(i:real($arg1)+i:real($arg2), i:imaginary($arg1)+i:imaginary($arg2))"/>
</xsl:function>

<xsl:function name="i:multiply" as="map(xs:boolean, xs:double)">
  <xsl:param name="arg1" as="map(xs:boolean, xs:double)"/>
  <xsl:param name="arg2" as="map(xs:boolean, xs:double)"/>
  <xsl:sequence select="i:complex(
      i:real($arg1)*i:real($arg2) - i:imaginary($arg1)*i:imaginary($arg2),
      i:real($arg1)*i:imaginary($arg2) + i:imaginary($arg1)*i:real($arg2))"/>
</xsl:function>

Note:

This example demonstrates how useful it would be to allow user-defined type aliases, so that callers of this function library could write code that treats the value simply as a complex-number, not as a map. A proposal to introduce such type aliases is under consideration.

 

Example: Using a Map as an Index

Given a set of book elements, it is possible to construct an index in the form of a map allowing the books to be retrieved by ISBN number.

Assume the book elements have the form:

<book>
  <isbn>0470192747</isbn>
  <author>Michael H. Kay</author>
  <publisher>Wiley</publisher>
  <title>XSLT 2.0 and XPath 2.0 Programmer's Reference</title>
</book>

An index may be constructed as follows:

<xsl:variable name="isbn-index" as="map(xs:string, element(book))"
    select="map:new(for $b in //book return map{$b/isbn := $b})"/>

This index may then be used to retrieve the book for a given ISBN using either of the expressions map:get($isbn-index, "0470192747") or $isbn-index("0470192747").

In this simple form, this replicates the functionality available using xsl:key and the key function. However, it also provides capabilities not directly available using the key function: for example, the index can include book elements in multiple source documents. It also allows processing of all the books using a construct such as <xsl:for-each select="map:keys($isbn-index)">.

 

Example: A Map containing named Functions

As in JavaScript, a map whose keys are strings and whose associated values are function items can be used in a similar way to a class in object-oriented programming languages.

Suppose an application needs to handle customer order information that may arrive in three different formats, with different hierarchic arrangement:

  1. Flat structure:

    <customer id="c123">...</customer>
    <product id="p789">...</product>
    <order customer="c123" product="p789">...</order>
    
  2. Orders within customer elements:

    <customer id="c123">
       <order product="p789">...</order>
    </customer>
    <product id="p789">...</product>
    
  3. Orders within product elements:

    <customer id="c123">...</customer>
    <product id="p789">
      <order customer="c123">...</order>
    </product>
    

An application can isolate itself from these differences by defining a set of functions to navigate the relationships between customers, orders, and products: orders-for-customer, orders-for-product, customer-for-order, and product-for-order. These functions can be implemented in different ways for the three different input formats. For example, with the first format the implementation might be:

<xsl:variable name="flat-input-functions" as="map(xs:string, function(*))"
  select="map {
            'orders-for-customer' := 
                 function($c as element(customer)) as element(order)* 
                    {$c/../order[@customer=$c/@id]},
            'orders-for-product' := 
                 function($p as element(product)) as element(order)* 
                    {$p/../order[@product=$p/@id]},
            'customer-for-order' := 
                 function($o as element(order)) as element(customer) 
                    {$o/../customer[@id=$o/@customer]},
            'product-for-order' := 
                 function($o as element(order)) as element(product) 
                    {$o/../product[@id=$o/@product]} }                    
         "/>

Having established which input format is in use, the application can bind the appropriate implementation of these functions to a variable such as $input-navigator, and can then process the input using XPath expressions such as the following, which selects all products for which there is no order: //product[empty($input-navigator("orders-for-product")(.))]


A new regex engine

Welcome to the new blog site. I hope it meets expectations.

I've been busy integrating a new regex engine into Saxon over the last week or so.

Why?

The current arrangement is that Saxon (using code written originally by James Clark) parses the regular expression supplied to functions like matches() and replace(), or to the XSD pattern facet, and translates it into a Java JDK regular expression.

The disadvantages of this are (a) the process is inefficient, as the regular expression is effectively parsed twice (and the JDK regular expression that Saxon generates may be very long and complicated in some cases), (b) Saxon has little control over the details of evaluation, needed to achieve precise conformance to the XSD and XPath specs, (c) there are bugs in the JDK regex engine which Oracle appear to be in no hurry to fix.

In Saxon 9.4 we've moved forward to Unicode 6.0 (the JDK is still on 3 point-something) and this compounds the problem, because the expansions of character classes such as \p{Lu} become increasingly long-winded.

For Saxon-CE there is an added impetus. GWT doesn't support the Java regular expression API; instead it provides an interface to JavaScript regular expressions. We could have written yet another variant of the regex translator, including digging out the old version that targeted a regex engine with no support for non-BMP Unicode characters, but this didn't feel like an attractive option; it wasn't even easy to see how constructs such as XSD's subtraction operator ([A-Z-[aeiou]]) should be supported. It seemed a good time for Saxon to have its own regex engine.

I looked at two candidates: JRegex and Apache Jakarta. On paper, JRegex looked a better option, but I found the code extremely obscure and badly documented. I did a preliminary integration of JRegex, but it crashed on some of the tests, and understanding the engine well enough to debug and fix these problems didn't look like being much fun. So I decided to go with Jakarta.

Jakarta is not a particularly sophisticated regex engine, but it's easy to understand how it works and therefore easy to change it. It's also surprisingly small - not much bigger than the current JDK regex translator. I started by changing the front-end to handle the (slightly different) XSD and XPath dialects of regular expressions, which was straightforward enough, except perhaps for the rather subtle rules on how hyphens appearing within a character class (i.e. between square brackets) are parsed, and how many digits get swallowed by a back-reference.

The next change was to ensure that all Unicode codepoints (including surrogate pairs) are handled as single characters rather than pairs of 16-bit values. This didn't prove difficult, partly because the engine was already using an abstraction for character strings (although this was used only for the input string and not for the regex itself.) Then all character classes such as \p{Lu}, \i, and \c had to be implemented in terms of the Unicode 6.0 definitions - which wasn't hard, because the existing regex translator already understood these constructs.
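To make the "one codepoint, one character" idea concrete, here is a minimal sketch in JavaScript (the Saxon-CE target language). It is not Saxon's code: the helper name toCodePoints is hypothetical, and it simply shows how a UTF-16 string can be expanded into an array of codepoints so that a surrogate pair counts as a single character.

```javascript
// Hypothetical helper (not part of Saxon): expand a UTF-16 string into an
// array of Unicode codepoints, combining each valid surrogate pair into one value.
function toCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const hi = s.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        // Standard UTF-16 decoding of a high/low surrogate pair.
        out.push((hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000);
        i++; // skip the low surrogate, it has been consumed
        continue;
      }
    }
    out.push(hi); // BMP character (or an unpaired surrogate, kept as-is)
  }
  return out;
}

const gClef = "\uD834\uDD1E"; // U+1D11E, stored as two 16-bit values
const codePoints = toCodePoints("a" + gClef + "b");
```

A matcher that walks the codePoints array rather than the raw string sees three characters here, where the underlying string has a length of four.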

At this stage nearly all the tests ran successfully (and fortunately, there's a large suite of regression tests for both XSD and XQuery).

Jakarta enforced a rule that in the construct (E)*, E must not be "nullable". For example, you aren't allowed to write (a?)*. That seems a reasonable restriction, but unfortunately there's no such limitation in the XSD or XPath specs, so I had to work out what to do with this construct. Simply removing the restriction meant that the matching engine would fail to terminate on these constructs (after all, every string contains an infinite number of consecutive empty sequences...) The pro-tem solution I've come up with, though I'm not 100% enamoured of it, is for the (*) operator to keep track of where it has been: if the same operation is applied at the same position in the input string more than once, the second attempt is deemed to be a no-match. (If anyone knows a better approach, let me know.)
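The "keep track of where it has been" idea can be sketched with a toy backtracking matcher. This is an illustration under my own assumptions, not the Jakarta or Saxon code: the node shapes ({type: "char"}, {type: "opt"}, {type: "star"}) and the continuation-passing style are invented for the example. The point is the seen set in the star case, which turns a second visit to the same input position into a no-match.

```javascript
// Toy backtracking matcher. k is the continuation: "what must match next".
function match(node, input, pos, k) {
  switch (node.type) {
    case "char":
      return pos < input.length && input[pos] === node.c && k(pos + 1);
    case "opt": // E? : try E first, then the empty match
      return match(node.e, input, pos, k) || k(pos);
    case "star": {
      const seen = new Set(); // positions at which this star has already been tried
      const loop = (p) => {
        if (seen.has(p)) return false; // same operation, same position: deemed a no-match
        seen.add(p);
        // Try one more iteration of E, otherwise stop iterating and continue with k.
        return match(node.e, input, p, loop) || k(p);
      };
      return loop(pos);
    }
    default:
      throw new Error("unknown node type: " + node.type);
  }
}

// Anchored match of the whole input.
const fullMatch = (node, input) => match(node, input, 0, (p) => p === input.length);

// (a?)* : the body of the star is nullable.
const nullableStar = { type: "star", e: { type: "opt", e: { type: "char", c: "a" } } };
```

Without the seen check, fullMatch(nullableStar, "aab") would loop forever at position 2, because a? keeps matching the empty string there; with it, the call terminates and reports no match.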

The final problem shown up by the tests was that matching long input strings had a tendency to cause a stack overflow. This is because the engine is using recursion to work its way through the string, this being the easy way to implement back-tracking. It's well known that backtracking isn't a brilliant approach to regex evaluation, especially for pathological cases, but unfortunately the kind of regexes in popular use (with substring capture and backreferences) aren't true "regular" expressions in the sense that the theorists use the term, and the better algorithms can only handle truly regular expressions. At this stage I felt some pragmatic improvements were more appropriate than a rewrite using a new algorithm, and it seems that all the test cases that were failing in this way were amenable to a simple optimization: if you can detect that (E*) must be followed by something that doesn't match E, then the evaluation of E* will never need to backtrack, and therefore it can be evaluated using iteration rather than recursion.
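As a rough sketch of that optimization, assume the simplest possible case: E and the term that follows it (call it F) are single-character tests that are known never to accept the same character. The function name and its parameters are hypothetical. Because the loop can never need to give a character back, it can be a plain while loop, and the stack depth no longer depends on the length of the input.

```javascript
// Anchored match of  E* F  where isE and isF are disjoint single-character tests.
// No recursion: the star is evaluated by iteration because it never backtracks.
function matchStarThen(isE, isF, input) {
  let pos = 0;
  while (pos < input.length && isE(input[pos])) {
    pos++; // consume greedily; nothing consumed here could ever be matched by F
  }
  // Exactly one character must remain, and it must satisfy F.
  return pos === input.length - 1 && isF(input[pos]);
}

const isA = (c) => c === "a";
const isB = (c) => c === "b";
const longInput = "a".repeat(1000000) + "b"; // long enough to defeat a naive recursive matcher
const longResult = matchStarThen(isA, isB, longInput);
```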

It's now all running, both in "big" Saxon (-HE, -PE, -EE) and also in the client-side Saxon-CE engine. We haven't measured performance yet, but the results are unlikely to be surprising: initial compilation of regular expressions should be considerably faster; evaluation of simple regular expressions should be much the same, and evaluation of pathological regular expressions involving backtracking will probably be bad (as it already is in the JDK engine).



Tuples

The XQuery FLWOR expression has semantics that are defined in terms of a sequence of tuples of variable bindings. This strange concept has never really had any direct representation in Saxon, because it simply isn't needed. The vast majority of XQuery 1.0 FLWOR expressions are equivalent to a nested loop delivering a sequence of items, like nested xsl:for-each instructions in XSLT. There's only one case in XQuery 1.0 that can't be handled using this model, and that's a FLWOR expression that has two or more "for" clauses, and an "order by" clause that doesn't break down into a major key that depends on the outer "for" variable, and a minor key that depends on the inner "for" variable. It's actually quite hard to construct a realistic example that exploits this feature, so the fact that the Saxon implementation is somewhat bizarre doesn't really matter too much, because it is very rarely needed. 

XQuery 3.0, however, introduces several new clauses to the FLWOR expression. In fact, it's no longer a FLWOR expression because it now has an optional "count" clause, an optional "group by" clause, and an optional "window" clause - so it's now a FLWCGWOR expression. Not a term that is very likely to catch on. However, all these new constructs are defined essentially as operations that take a stream of tuples as input, and deliver a stream of tuples as output. So to implement these features, Saxon really needs to have some proper machinery for handling tuples. It's an odd state of affairs, having these composable constructs ("clauses") that aren't actually expressions, manipulating things ("tuples") that aren't actually values, but that's the way XQuery works. It seems to have something to do with a relational algebra heritage, where operations defined in terms of tuples are of course the core stuff of the data model.

To understand what these strange things called tuples are, it might help to start with simple examples and gradually increase the complexity. Consider the expression: for $x in //a where $x[@b] return ($x+1). Here there's only one variable, so the tuple is a 1-tuple. The formal model of FLWOR expressions is that there is a succession of values of the variable $x (as many as there are elements selected by //a) acting as input to the "where" clause, and the output is another succession of values of $x (this time containing as many values as match the predicate [@b]). The "return" clause then takes this stream of 1-tuples as input, and turns it into a sequence of items that are now regular values which can be stored in variables, passed to functions, and so on. Saxon actually implements this succession of 1-tuples as a sequence of items (with pipelined evaluation, of course, so it never holds the whole sequence in memory); because they are 1-tuples, they can be treated as regular values and don't require any special treatment.

Now consider this expression: for $d in //dept, $e in //empl order by $e/salary return $d/name. This is where things start getting weird, and where we need to understand tuples. The semantics are defined as delivering a sequence of ($d, $e) pairs (one pair for each department, employee combination), sorting this sequence of pairs by $e/salary, and then for each pair computing $d/name. Saxon does this by exploiting the "ObjectValue" class, which is a kind of XPath Item that encapsulates any Java object. It was designed to allow Java extension functions to return values that can be passed transparently to other Java extension functions, but it allows us to pack any kind of data we like into an Item, and this enables us to reuse all the machinery for handling sequences of items when we want to handle a sequence of tuples. In fact Saxon doesn't just pack the values of ($d, $e) into this composite pseudo-item, it also adds the values of the sort keys, and indeed the return value. Rather than sorting the input stream and then computing the return value for each tuple in the sorted input, we compute the return values for every item in the input before doing the sort. So we end up in effect with a sequence of values each of which has the form (salary, name); we sort this sequence on the value of salary, and then deliver the value of name. In contrast with the model described in the spec where the tuple holds the input variables ($d, $e), in the Saxon implementation the tuple holds the precomputed value of the return clause. Which means it doesn't actually need to contain the input values, because they will not be needed again. Precomputing the return value, and sorting all the return values along with the input tuples, is probably rather inefficient, at any rate in its use of memory. Also, it only works because a FLWOR expression can only have a single ORDER BY clause and a single RETURN clause. We therefore never need to implement a pipeline of operations working on the tuple stream; we only ever do one operation, which is sorting.

To implement the extended FLWOR expression, we probably need to do something that is closer to the model used to describe the semantics. In this implementation the "stream of tuples" can be represented simply as a pair of integers, the locations on the stack where the values of $d and $e are held. When we ask for the next tuple, the contents of these variables change, but we don't actually need to return anything: it's more of an advance() operation with side-effects than a get-next() operation. The evaluation of the FLWOR expression as a whole can now be modelled like this (it's easiest to think of it in push mode, though Saxon will more often be using pull): until (exhausted) { advance(); return compute-result(); } where advance() advances the tuple stream, causing the values of the relevant variables $d and $e to be updated on the stack, and compute-result() evaluates the return clause based on these values of the variables. In the case where there's an "order by" clause, the "order by" pseudo-expression works like this: until (exhausted) { advance(); compute-sort-keys(); add-to-list(L); } sort(L), and when it in turn is called to advance(), it gets the next tuple from the sorted list, and updates the slots for $d and $e in the local stack frame. The sorted list needs to contain the values of the variables ($d, $e) and the sort keys, as at present, but it does not need to hold the result value; that should be computed after sorting.
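The clause-by-clause model can be sketched as a pull pipeline of JavaScript generators. This is only an illustration under simplifying assumptions, not the Saxon design: each tuple is a plain object of variable bindings rather than a set of stack slots updated in place, and the function names (forClause, orderByClause, returnClause, evaluate) and the sample data are invented. What it does share with the description above is the ordering of work: the sort list holds the bindings and the sort key, and the result is computed only after sorting.

```javascript
// Each clause consumes a stream of tuples and produces a stream of tuples.
function* forClause(tuples, name, itemsFor) {
  for (const t of tuples) {
    for (const item of itemsFor(t)) yield { ...t, [name]: item }; // extend the tuple
  }
}

function* orderByClause(tuples, keyFor) {
  const list = [];
  for (const t of tuples) list.push({ tuple: t, key: keyFor(t) }); // bindings + sort key only
  list.sort((a, b) => a.key - b.key);
  for (const entry of list) yield entry.tuple;
}

// The return clause is the only step that turns tuples into ordinary values.
function* returnClause(tuples, resultFor) {
  for (const t of tuples) yield resultFor(t);
}

// Invented sample data standing in for //dept and //empl.
const depts = [{ name: "Sales" }, { name: "R&D" }];
const empls = [{ salary: 300 }, { salary: 100 }, { salary: 200 }];

// for $d in //dept, $e in //empl order by $e/salary return <resultFor>
function evaluate(resultFor) {
  const pairs = forClause(forClause([{}], "d", () => depts), "e", () => empls);
  return [...returnClause(orderByClause(pairs, (t) => t.e.salary), resultFor)];
}

const names = evaluate((t) => t.d.name);       // the query from the text
const salaries = evaluate((t) => t.e.salary);  // same pipeline, exposing the sort key
```

Because every clause has the same tuples-in, tuples-out shape, further clauses such as "where" or "count" could be added as extra generators in the chain without touching the others.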

This looks like quite a substantial change, and it's because it feels fairly daunting that we haven't implemented this part of XQuery 3.0 in Saxon yet (the GROUP BY subset that's implemented in Saxon 9.3 is essentially the subset that can be implemented using conventional sequences of items, without resorting to the complexities of tuples. It also happens to match what XSLT implements, and what 99% of users actually need). However, I think I'm starting to understand what needs to be done to make this happen. Excuse me airing my thoughts in public, it always helps me to get my ideas straight to try and explain them to the world! I had been thinking of putting most of this off beyond Saxon 9.4, but I'm starting to feel more confident in tackling some of the changes sooner. 

The fact that other XQuery WG members have started contributing test material is an encouragement: implementation is always far easier if there are test cases available before you start. 

POSTSCRIPT: 1 August 2011 I have now implemented a tuple stream pipeline that handles the for, let, where, order-by, group-by, and count clauses of XQuery 3.0 - everything, in fact, except windowing. It all works quite neatly. It's a pull pipeline only at the moment, and there is currently no optimization. Rather than re-implement all existing optimizations of FLWOR expressions, in the short term I shall probably take the new FLWOR expression and rewrite it where possible into familiar forms that the optimizer already knows how to handle, so that the new tuple stream evaluation strategy is used only in cases where tuples are actually necessary, for example anything involving "group-by", anything with "order-by" over more than one range variable, anything with a "count" clause, and so on. Adding a push version of the pipeline should not be difficult, and is certainly worth doing for cases where the results of the FLWOR expression are written straight to the serializer. Michael Kay

Bytecode generation

Since the beginning of the year a lot of our development effort has been going into bytecode generation. O'Neil has been working on this full-time, concentrating on the XQuery side, and I've been putting in a fair bit of time too, adding the things needed for XSLT. We're making good progress, with perhaps 80% of the tests currently running successfully. We're publishing a paper on the subject at Balisage, but I thought it would be a good idea to give a more informal progress report here.

Firstly, what are the benefits? For the user, it looks like a fairly uniform speed-up of about 25%. Where we get more than that, we can usually see how the interpreter is doing a poor job; if we get less than that, it's because we aren't compiling code carefully enough. Comparing code that is well interpreted against code that is well compiled, the difference seems to be around 25%. 

For Saxonica, this is a development that works well for our business model. We want users to pay us to take the commercial version of the product, and they are more likely to do so if there is an immediate measurable benefit as soon as you open the box: no need to write your code differently to exploit new features, no need to lock yourself into schema-aware processing (which has great benefits, but does require up-front investment). To some extent, the optimizer in Saxon-EE addresses the same business need, but the trouble is that the benefits are non-uniform: it achieves dramatic speed-up for a small number of queries and stylesheets, but for 90% of simple user code, especially XSLT code that has already been hand-optimized using keys, it makes no difference. As well as being the best way of getting a 25% performance boost, bytecode generation has the advantage for us that the performance boost is achieved by adding code that is naturally separate from and additional to the open source codebase, so we aren't doing anything technically artificial to ensure the feature is available only in the Enterprise edition. 

Saxon has had the ability to generate Java code for some time. In fact, my first foray into this area was with Saxon 4.1 back in 1999, before the XSLT 1.0 spec was even finalized. But that attempt was dropped because it seemed more important to focus on high-level algorithmic optimization of execution paths, which offered much better gains than the 25% achievable through code generation. More recently Java source generation found its way into Saxon-EE, but it hasn't been a conspicuous success (it certainly hasn't repaid the months of effort that went into doing it.) I think there are a number of reasons for this. One is that it only covers XQuery, whereas 90% of Saxon users, especially those requiring high performance, are writing in XSLT. Another is that it doesn't cover the whole of XQuery - there are various restrictions such as use of collations, extension functions, and the like. But the biggest drawback is that it's simply too much hassle: generating Java source code, compiling it, and then running it with the configuration set up correctly just takes too much effort and thought. 

So this time around we're generating bytecode directly rather than Java source code. This means that it's completely transparent to the user, you simply won't see it happening. We're not writing bytecode that can be serialized to .class files and executed independently; it lives in memory only. This means it doesn't have to be self-contained; expressions that have been compiled are linked from the expression tree and invoked when necessary by the interpreter; similarly compiled expressions can refer back to interpreted code or other constructs such as schema definitions, collations, and the like. So the compiled code doesn't live in a separate world from interpreted code, which means we can be very selective about what we compile and what we interpret. We're not doing it yet, but the architecture certainly allows us to do "just-in-time" compilation of the parts of the code that are getting heavily used. We haven't yet tried it, but the bytecode generation should also work on .NET, in conjunction with the IKVM technology which translates bytecode to IL instructions as soon as a class is loaded. 

The tool we are using to generate bytecode is ASM. We didn't do a detailed comparison against alternatives, but it seemed to tick the right boxes and we were able to get it to work very quickly. Its weakest point is probably diagnostics - it goes to enormous lengths to verify that the generated bytecode is correct, and then gives you very little information if it isn't. We've slowly been customizing the way we use the technology to improve this - and of course, we're making fewer mistakes in the code we generate as we become more experienced. Run-time diagnostics are still difficult - sadly, we can't use a Java debugger to single-step through the code we have generated, but again we've created a fair bit of infrastructure to allow diagnostics to be injected into the generated code.

My impression is that once you get used to it, generating bytecode is no more difficult than generating Java source (the only reason previous releases generated Java source is that I thought it would be easier - I was wrong). In some ways working at the bytecode level gives you more flexibility, because it is more composable; when generating Java you always have to worry about the fact that instructions can't appear inside an expression, which means you end up generating lots of unnecessary variables (which aren't necessarily optimized away when the code is compiled). Bytecode being a stack machine, it lends itself very well to compiling a composable expression language like XPath or XQuery. 
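Why a stack machine suits a composable expression language can be seen in a toy example. The sketch below is not JVM bytecode and does not use the ASM API: the instruction names ("push", "add", "mul") and the tree shape are invented for the illustration. The compiler simply emits code for the operands and then the operator, so nested expressions compose without any temporary variables.

```javascript
// Expression tree: { op: "num", value } or { op: "add" | "mul", left, right }.
function compile(expr, code = []) {
  if (expr.op === "num") {
    code.push(["push", expr.value]);
  } else {
    compile(expr.left, code);  // code for the operands first ...
    compile(expr.right, code);
    code.push([expr.op]);      // ... then the operator, which finds them on the stack
  }
  return code;
}

// A tiny interpreter for the generated instructions.
function run(code) {
  const stack = [];
  for (const [instr, arg] of code) {
    if (instr === "push") {
      stack.push(arg);
    } else {
      const b = stack.pop();
      const a = stack.pop();
      stack.push(instr === "add" ? a + b : a * b);
    }
  }
  return stack.pop();
}

// 1 + (2 * 3)
const tree = {
  op: "add",
  left: { op: "num", value: 1 },
  right: { op: "mul", left: { op: "num", value: 2 }, right: { op: "num", value: 3 } },
};
const program = compile(tree);
const value = run(program);
```

Generating source code for the same tree in a statement-oriented language tends to need a named temporary wherever a subexpression cannot be written inline, which is the composability problem described above.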

The decision to allow compiled and interpreted code to be mixed means that we're not making any changes to run-time data structures. In particular, XSLT/XQuery variables are still held in XPathContext objects on the Java heap; they aren't held as local variables on the Java stack. 

In deciding what should be compiled and what shouldn't, one criterion is obviously how heavily used the code is. But that's not the only criterion. Some code is very heavily used, but simply doesn't benefit from compilation. An example is conversion between doubles and strings. We could generate code to do this, but it would be no faster than calling the existing routine compiled from Java source. It's only possible for the compiled code to be faster if the interpreted code is doing things that don't need to be done. The interpreter in fact very rarely does things that don't need to be done, but it does spend a certain amount of time deciding whether it needs to do them, and that's the area where the 25% speed-up comes from. In fact, it's likely that nearly all the speed up comes from eliminating run-time testing of data that was placed in the expression tree during compilation (for example, type information) and from eliminating polymorphic method invocations (which is just another way in which run-time decisions are made). 

We're not compiling any code below the NodeInfo interface, which means that the compiled code isn't committed to a particular tree representation. It's quite possible that we could make further (small) savings by generating code that only works with (say) the TinyTree, but that would reduce flexibility. We've learnt the lesson that people appreciate improved performance, but they don't want to pay a heavy price for it, for example in a reduced ability for a compiled query to work with any input tree. 

It was definitely a good decision to delay doing bytecode generation until we had a mature stable product with advanced optimizations in place. The fact that we are generating bytecode makes it more difficult to change the repertoire of instructions on the expression tree, and thus to introduce new optimizations, but at this stage of development, we can live with that, since the major optimizations are already done.

COMMENTS
Re: Bytecode generation
by pvallone on Tue 17 May 2011 13:39 BST
Great article. I saw that Intel was developing an XSLT 2.0 processor a while ago. I would think they were using MASM, but I am not sure. I haven't seen any more info on that project. They are light years away from Saxon.
Re: Bytecode generation
by John Cowan on Tue 17 May 2011 18:00 BST
I'll just mention here, for the benefit of people who would rather generate Java source code (one thing's for sure, you get better diagnostics for it, and regression testing is trivial), that a good way to consume the source is with Janino <http://janino.net>, an embedded Java 1.4 compiler (it also understands static imports, autoboxing, and covariant returns). Like ASM, it generates classes directly in memory, and can compile isolated Java expressions, blocks, and class bodies as well as complete source "files".
Re: Bytecode generation
by Andriy Gerasika on Wed 18 May 2011 09:22 BST
will this be Saxon 9.4? 

are there any plans for parallel extensions to XSLT?
Re: Re: Bytecode generation
by Michael Kay on Wed 18 May 2011 14:52 BST
Current intent is that it will be in Saxon-EE 9.4, and will cover both XSLT and XQuery. Of course, plans can change.

Cutting Saxon down to size

It's clear that in cross-compiling Saxon to Javascript to run on the browser, reducing the size of the code could make a major impact on the performance and therefore usability of the resulting product. So I've been thinking about how one might achieve a radical reduction in the size. Rather like an incoming Chancellor faced with a record budget deficit, I've been trying to think the unthinkable to see where one might make savage cuts. 

It's instructive to compare Saxon 9.3 with Saxon 5.0, released in December 1999 as the first fully-compliant implementation of XSLT 1.0. Saxon 5.0 was 17K non-comment lines of code; Saxon-HE 9.3 is 143K, and Saxon-EE 9.3 is 210K. (The ratio of comment to non-comment lines, incidentally, has remained steady at roughly 1:1). 

Let's look at an example: the substring function, which is unchanged between XSLT 1.0 and XSLT 2.0. In Saxon 9.3, there's about 200 lines of Saxon-HE code to implement this function, plus 30 lines in EE to support compilation of XQuery to Java: there isn't any specific support for streaming, though there is for many other similar functions. In Saxon 5.0 the size of the code is 40 lines, plus 12 lines to support the rudimentary compile-to-Java facility that was present in that release, and later abandoned. In AJAXSLT, an abortive 2005 attempt by Steffen Meschkat to write an XSLT processor in Javascript, the substring function is implemented in 17 lines of code. In Henry Lindquist's 2008 XPath.js implementation of XPath 1.0, this is the implementation, in its entirety, formatted for readability: 

XPath.substring=function(s,i,l){ 
s=this.string(s); 
i=Math.round(this.number(i))-1; 
return(arguments.length==2)?s.substr(i<0?0:i):s.substr(i<0?0:i,Math.round(this.number(l))-Math.max(0,-i)); 
}; 

How do these implementations differ? 

Firstly, even the early 1999 version of Saxon (unlike either of the Javascript versions) supported Unicode astral planes (characters above 65535) properly; it also included rudimentary code to supply static type information. That meant it didn't use Java's substring() method. The code was also split into methods to support use in a run-time library called from compiled code. It could easily be reduced to 20 lines and still have full Unicode support. 
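For comparison, here is a hedged sketch of what a codepoint-aware substring might look like in present-day JavaScript. It is not taken from Saxon or from either of the libraries mentioned: the name xpathSubstring is hypothetical, and it relies on Array.from iterating a string by codepoint, which postdates the implementations discussed here. It follows the XPath rule that the result contains the characters at positions p (counting from 1) where round(start) <= p < round(start) + round(length).

```javascript
// Hypothetical codepoint-aware version of XPath substring(s, start, length?).
function xpathSubstring(s, start, length) {
  const chars = Array.from(s); // one array entry per Unicode codepoint
  const from = Math.round(start);
  const to = length === undefined ? Infinity : from + Math.round(length);
  // Keep the characters whose 1-based position p satisfies from <= p < to.
  return chars.filter((_, i) => i + 1 >= from && i + 1 < to).join("");
}

const withAstral = "a\u{1D11E}bc";                  // "a", U+1D11E (astral), "b", "c"
const byCodePoint = xpathSubstring(withAstral, 2, 2); // characters 2 and 3
const byCodeUnit = withAstral.substr(1, 2);           // two 16-bit units: the astral character only
```

The two results differ: counting by codepoint gives the astral character followed by "b", while counting by 16-bit units gives just the astral character, which is the kind of discrepancy the early Saxon code avoided by not using the platform's substring method.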

In the 9.3 version of the code, we see first some compile-time optimization, designed to avoid conversions of arguments from integer to double and then back to integers. Then we find that the 2-argument form of the function and the 3-argument form are separately implemented; and there are separate paths for the case when the arguments are integers rather than doubles, and for the case where the input string is known to contain no astral characters (in which case the function can be implemented using the Java substring() method). It is also careful in this case to mark the output string as containing no astral characters, so that the consumer of the string can make similar optimizations downstream.

In short, the code is ten times bigger than it needs to be, to achieve tiny speed improvements which in the case of client-side execution will never pay for the cost of the extra code that needs to be downloaded. (In Javascript, the assumption that integer arithmetic is faster than double probably needs to be reversed anyway.) Extrapolated across the product as a whole, this suggests that radical cuts are indeed possible. 

I have already cut the size of the code base for the GWT compilation from about 145K non-comment lines (in Saxon-HE 9.3) to under 80K, by cutting out unwanted functionality such as XQuery, updates, serialization, support for JAXP, Java extension functions, JDOM, etc etc. This study of the implementation of one function suggests that 40K ought to be achievable, given enough effort. That's consistent with the figure of 17K lines for Saxon 5.0, on the basis that XSLT 2.0 as a language is about twice the size of XSLT 1.0. 

Where might we look for these savings? 

The GWT compiler reports show that a significant proportion of the compiled Javascript contains string literals. A lot of these are data: character tables supporting Unicode normalization, case conversion, validation of XML characters, and regular expression handling. I've already ditched the case conversion code in favour of using Javascript's own version of the same - it's probably not 100% conformant to the XPath requirements in corner cases, but I doubt anyone will notice. The Unicode normalization tables are a vast amount of data to support one rarely-used function, normalize-unicode() - the obvious thing to do here is to put the data in a separate file on the server and download it only when it's needed. The data for validating XML characters can be halved by removing support for XML 1.1 (or for XML 1.0, take your pick). More radically, perhaps we can check whether a name is a valid name by exploiting functionality already available in the Javascript platform - if it's a valid name, then it's a valid XPath 1.0 expression, so we could throw it at an XPath 1.0 parser to find out. I'm not sure I would trust the answer (all the signs are that implementors take short cuts with conformance in this kind of environment), but perhaps I need to pick up a bit of that culture.

For regular expressions, Saxon currently parses the regex against the XPath rules, and translates it to a Java regex. The "obvious" thing to do is to change that code as necessary to generate a Javascript regex instead. (I've already retargeted the code to use the Javascript RegExp class). The JS RegExp library doesn't support Unicode properly ("." matches a 16-bit UCS code point, not a Unicode character), so I would have to resurrect the JDK1.4 version of this code which handled the differences. But I'm not sure this is the right thing to do. Most people wouldn't complain if I simply offered Javascript regular expressions instead of XPath regular expressions. I'm not comfortable with that level of non-conformance myself, but I do think there is room here to find a more pragmatic balance. 
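The difference in how "." treats characters outside the BMP is easy to demonstrate. The snippet below uses the "u" flag, which was added to JavaScript in ECMAScript 2015 and so was not an option when this was written; it is shown only to make the contrast between 16-bit matching and codepoint matching visible.

```javascript
const astralChar = "\u{1D11E}"; // one Unicode character, stored as two 16-bit units

// Without the "u" flag, "." matches a single 16-bit unit, so one "." cannot
// span the whole character.
const matchesAsUnits = /^.$/.test(astralChar);

// With the "u" flag, "." matches a whole codepoint.
const matchesAsCodePoint = /^.$/u.test(astralChar);

// Two dots are needed to cover the character when matching by 16-bit units.
const matchesAsTwoUnits = /^..$/.test(astralChar);
```

Here matchesAsUnits is false, while matchesAsCodePoint and matchesAsTwoUnits are both true.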

As regards optimization, it's quite hard to know what code one can reasonably discard. There's a lot of static type-checking logic, which has significant benefits in terms of diagnostics as well as optimization; I don't think it's right to discard all of this, but some of the more complex analysis can probably be simplified. The best thing is probably to chip away at it on a case-by-case basis. I want to reduce the "instruction set" of the compiled code, that is the number of different kinds of Expression that the compiler/optimizer generate. Some of these are used in only very specialized circumstances; some could be combined with other expressions. 

Saxon goes to a fair bit of trouble to ensure that the original stylesheet tree isn't needed at execution time. That involves copying data over from the stylesheet tree to the expression tree. This seems rather pointless in the browser environment. There's a lot of administrative overhead in maintaining things like line numbers for diagnostics that will be little use in the browser (because the XML parser doesn't supply line number information in the first place), so it should go. There's also code that is unnecessarily generalized because the requirements of XSLT and XQuery are different. The whole apparatus for function binding is looking vastly over-engineered for a world in which every function call is either a system function, a constructor function for a built-in type, or a user-defined stylesheet function.

So there are plenty of candidates. But as with the Chancellor's cuts, it's important to avoid becoming obsessive: there's a law of diminishing returns here. 

Also, the size of the download doesn't depend only on the size of the Saxon source code: it also depends on the libraries that are included. Adding in GWT's support for the HTML DOM (which is quite separate from the XML DOM) appears to have bloated the download size very substantially. I need to look at whether it's possible to avoid that or reduce its impact.
COMMENTS
Re: Cutting Saxon down to size
by thorn on Sun 21 Nov 2010 20:15 GMT
JavaScript doesn't have such thing as integer arithmetic.
Re: Re: Cutting Saxon down to size
by Michael Kay on Sun 21 Nov 2010 22:34 GMT
That's right. GWT simulates it using double arithmetic. That's why Saxon's previous assumptions that integer arithmetic is faster are probably wrong.
Re: Re: Re: Cutting Saxon down to size
by John Cowan on Mon 22 Nov 2010 14:42 GMT |  Profile |  Permanent Link
Do you actually have evidence for that view even on the JVM? Raw floating-point and integer arithmetic are usually about equally fast on modern CPUs, and Perl is certainly performant, even though it does all arithmetic in float unless integer processing is explicitly enabled. My guess would be that the conditional processing overwhelms any remaining difference between int and float.
Re: Re: Re: Re: Cutting Saxon down to size
by Michael Kay on Mon 22 Nov 2010 16:56 GMT |  Profile |  Permanent Link
Actually, I wasn't really thinking about the cost of floating point arithmetic, but the cost of repeated conversion between integers, floating-point, and strings, which can happen an awful lot if you're not careful. I've no idea of the overall significance: one of the problems is that Saxon has accumulated a vast number of small local optimizations, most of which were probably justified at some stage by measurements on some particular use case, but which may now contribute very little in the overall scheme of things.
Re: Cutting Saxon down to size
by trygve on Sun 21 Nov 2010 22:51 GMT |  Profile |  Permanent Link
Hi Michael 

I've seen your post and tweets over the last couple of weeks and the work you're doing on trying to bring Saxon to the browser. I'm not sure how aware you are of this, but JavaScript is not found only on the browser side, and at the moment it is really picking up on the server side as well. JavaScript is starting to be pretty powerful on the server, and there are many advantages to having one language on both the server and the client.

Again, I'm not sure how aware you are of this, but the popularity of JavaScript on the server is mainly due to the emergence of node.js: http://nodejs.org/

Node is a truly scalable, single-threaded, non-blocking event server, and it has gained a lot of interest because of this. Douglas Crockford (creator of JSON and architect at Yahoo) has a great talk in which he discusses what makes Node a good application server: http://www.yuiblog.com/blog/2010/08/30/yui-theater-douglas-crockford-crockford-on-javascript-scene-6-loopage-52-min/

I must say, I would love to see Saxon coming to Node. And maybe to Node first...
Re: Re: Cutting Saxon down to size
by Michael Kay on Mon 22 Nov 2010 17:01 GMT |  Profile |  Permanent Link
Thanks for the reference, but I don't think there's any good reason to move the server version of Saxon to a different programming language, and if there were a reason, I doubt Javascript would be high on my list - the thought of writing a 250K-line system software package in a language with no strong typing and no integer arithmetic horrifies me. Javascript has some very nice features and I'm sure I'll warm to it as I get to know it better, but I doubt it will ever be my first choice.
Re: Re: Re: Cutting Saxon down to size
by trygve on Tue 23 Nov 2010 00:42 GMT |  Profile |  Permanent Link
I was perhaps a bit vague in my initial post, but I do not suggest a rewrite of Saxon to JavaScript. Such a task does horrify me too. 

Node has support for bindings. For instance, there are bindings to several applications such as memcached (https://github.com/vanillahsu/node-memcache), MySQL, Postgres, etc.

What I intended to say was that I would love to see a Saxon binding to Node with a JavaScript API, so that one can pass XML and XSLT to Saxon from JavaScript in Node and get the transformed document back.

From what I've seen, most Node bindings are written in C, so I have no idea how it would be to add a binding to Java. Though this blog post shows an example of Node - Java communication: http://blog.james-carr.org/2010/09/09/rabbitmq-nodejs-and-java-goodness/

I hope this explains my initial suggestion.
Re: Re: Re: Re: Cutting Saxon down to size
by Christopher Sahnwaldt on Tue 23 Nov 2010 13:52 GMT |  Profile |  Permanent Link
In the RabbitMQ/Node.js example, the communication between Java and JavaScript is done over AMQP on top of TCP/IP. For a Saxon (or general Java) binding to Node.js, some other form of inter-process communication would probably be preferable. With the other bindings, library code written in C simply runs in the same process as Node.js, but that's probably not possible with Java code.
Re: Re: Re: Re: Re: Cutting Saxon down to size
by Sarah on Wed 15 Dec 2010 23:50 GMT |  Profile |  Permanent Link
For anyone who hasn't read it, I highly recommend "JavaScript: The Good Parts," also by Douglas Crockford and published by O'Reilly. It's a short read and looks good on a Kindle (or other e-reader). 

It's a great book for those who haven't quite decided whether they like the unique taste of JavaScript's linguistic Kool-Aid. (One of the surprising things about JavaScript is how successful programmers can be without really understanding the language.)

Besides being a fellow Yahoo, I first saw Mr. Crockford give a talk on his book at a functional programming interest group a few years back. At the time, I had the usual feelings about JavaScript: kinda hacky, weird typing, global variables, the DOM is terrible ... yuck. He changed my mind. 

JavaScript, for better or worse, is the most widely used programming language in the world. The work being done on JavaScript optimization is massive (and fairly interesting if you're the type of person who finds clever JIT techniques interesting). 

I think we're approaching a point where any engineer who doesn't really understand JavaScript is in danger of falling behind. 

PS: If you don't understand why JavaScript would be a featured topic at a functional programming interest group, you owe it to yourself to find out.
Re: Cutting Saxon down to size
by claudius on Tue 23 Nov 2010 18:36 GMT |  Profile |  Permanent Link
Hi,
It is very nice to hear that Saxon will also be available in JS!
I am also interested in reducing the size of Saxon HE. Would you be so kind as to share which classes can be removed (for tests, or related to Saxon EE/PE) while keeping Saxon HE fully functional for XQuery, XSLT, and XPath?
Thank you,
Claudius Teodorescu
Awesome work!
by Lech Rzedzicki on Wed 08 Dec 2010 15:47 GMT |  Profile |  Permanent Link
Hi Mike!

I really appreciate the effort and I'd really prefer to use a fat saxon.js than some other half-baked solution.
What I always treasured Saxon for is the completeness of the features, it really saves a lot of headaches to have a bug-free implementation.

The only other thing that I would recommend is that you release an alpha of your GWT build asap so that people can try it out.
To be honest, I haven't done much XSL client-side, but having a downloadable JS implementation would be awesome for many of my use cases.

Compiling Saxon using GWT

After my TPAC rant a couple of weeks ago (see http://saxonica.blogharbor.com/blog/_archives/2010/11/4/4671786.html), a number of people whom I respect said, in effect, "don't dismiss Javascript". So I decided to do some experiments that I've had in mind for quite a while, attempting to compile Saxon's Java source code using the Google Web Toolkit (GWT), which produces Javascript output capable of running in the browser. After much hacking, I've managed to get it to compile. The next stage is to try and get it to run, and after that, to see if it will perform. But meanwhile I thought I would try and report on what I've been doing. 

My first step was to take the Saxon-HE 9.3 source code and cut away whatever wasn't needed for a minimally-conforming XSLT 2.0 processor on the browser. The code base I started from was about 143K ncloc (non-comment lines of code). I reckoned 80K might be a reasonable target to aim for. As with incoming governments slashing overblown public expenditure, the first cuts were quite easy, but they became progressively harder. First to go were XQuery-specific code, post-XSLT 2.0 extensions such as higher-order functions, Saxon-specific extensions, support for XOM, DOM4J, and JDOM, support for JAXP APIs, miscellaneous code only there for diagnostics or testing, and so on. Although schema-awareness, updates, streaming and various other features aren't supported in Saxon-HE, there's a fair bit of code that is only there to support these features, and I hacked much of it away. I got rid of the tiny tree - the idea is that the client-side product will operate directly on the DOM, though I retained the linked tree because it's needed for stylesheets. I also threw out the serializer - the aim is to deliver the result tree as a DOM. (We'll talk later about how to deliver secondary result trees). 

This exercise got the size down to about 100K ncloc, and also got rid of a lot of the dependencies that would prevent the code running under GWT. The next phase was driven by the error messages obtained when compiling under GWT. These errors fell into a number of categories: 

* dependencies on classes not supported by GWT: Properties, URI, BitSet, WeakReference, Thread, StringTokenizer, File, ByteOutputStream. In many cases it was possible to rewrite the code to avoid the dependency; in other cases it was easy to produce a replacement for the class either from scratch or by borrowing from OpenJDK. 

* regular expressions. These fall into two groups: Saxon makes some internal use of regular expressions for things such as parsing dates and durations; these uses were easily replaced using String.matches(), which GWT supports. The more complex use of regexes is to support XSLT/XPath functionality like analyze-string. So far I have adapted this code to use GWT's wrapper around the Javascript RegExp class. I haven't yet modified it to handle the differences between Java regular expressions and JS regular expressions. I think I will probably need to revert to the JDK1.4 version of the XPath-to-Java regex translator, since this works on the basis of "." matching UTF-16 code units rather than Unicode code points. Alternatively, I could consider throwing conformance out of the window and simply supporting JS regular expressions instead of XPath regular expressions - I expect a lot of users would thank me for this.

* document input and output. All the functionality that handles principal and secondary input and output documents needed to be rewritten. The basic model is to use the Javascript DOM, as exposed by GWT, for both input and output. Fortunately the existing Saxon DOM wrapper code can be used almost without change. There are a few things that may cause hassle, however. The GWT DOM does not expose ID-typed attributes, so the id() function won't work except for attributes named xml:id. It's not possible to get from an attribute to its parent element. More particularly, the GWT DOM is namespace-unaware, so operations like matching the expanded name of a node could turn out to be quite expensive. It also has nothing to reduce the pain of sorting nodes into document order. I may consider doing a one-off conversion from an input DOM to a Saxon linked tree if this improves performance (which was one reason I retained the linked tree in the code base); I might also consider doing some kind of hybrid between the current alternatives of wrapping and conversion, where there's a single-pass operation over the DOM tree to collect and add additional information such as namespaces and position in document order - a kind of "eager wrapping" process in place of the current lazy wrapping of nodes that are actually visited. The other problem in this area is that GWT only supports asynchronous fetching of documents from the server (with a callback mechanism activated when the transfer is complete). That really won't work for calls on doc() and document(): fortunately Vojtech Toman (who wrote about his GWT port of the Calumet XProc engine at Balisage last year) showed me how to do synchronous access by making native calls on the underlying XMLHttpRequest object.

* dynamic loading. GWT can't do dynamic instantiation of classes. So I scrapped all the interfaces where Saxon dynamically instantiates user-supplied classes (such as URIResolvers and ErrorListeners), and I rewrote all the places where Saxon dynamically instantiates its own classes (such as system function calls). 
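The regular-expression point in the list above comes down to what "." counts as a single character. The following is a small sketch in plain Java (not GWT, and not Saxon code; the class and method names are made up for illustration) showing the two views of a string that holds one character outside the Basic Multilingual Plane. Java's own regex engine matches by code point, whereas an engine that works on UTF-16 code units sees two items in the same string.

```java
import java.util.regex.Pattern;

/** Illustrative only: code units versus code points for one non-BMP character. */
public class CodeUnitDemo {

    // U+1D538 (MATHEMATICAL DOUBLE-STRUCK CAPITAL A) needs a surrogate pair in UTF-16.
    static final String ASTRAL = new String(Character.toChars(0x1D538));

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the whole string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** True if the entire string is matched by a single "." in java.util.regex. */
    static boolean singleDotMatches(String s) {
        return Pattern.matches(".", s);
    }

    public static void main(String[] args) {
        System.out.println("code units:  " + codeUnits(ASTRAL));
        System.out.println("code points: " + codePoints(ASTRAL));
        System.out.println("'.' matches whole string: " + singleDotMatches(ASTRAL));
    }
}
```

A translator from XPath regular expressions to a code-unit-based engine therefore has to decide how "." and character classes should treat surrogate pairs, which is the difference the text attributes to the older translator.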

The code is now about 83K ncloc, still a little above my first target, but comfortably close. It compiles, and the compiler generates 5 files of Javascript (packaged in HTML for reasons I haven't fully understood), each around 1 MB in size. I think the files are variants of each other, one per supported browser, so you're basically faced with a 1 MB download the first time you use a page that invokes an XSLT 2.0 stylesheet. That's big, but not impossibly so; it would be nice to get it down further.

The next stage is to design and implement an API suitable for the client-side developer, and then to test that it works. Then comes performance work. Then, before we release, we'll have to think about such matters as licensing and pricing. My initial thought is some kind of model where it's free for non-commercial use but chargeable on a per-domain basis for large enterprises. But we don't have total freedom on licensing terms: we're constrained by the fact that there is some code in the product for which Saxonica does not own the copyright. 

Watch this space for further developments. 

====== 

Two further points: 

(a) the only thing I changed that explicitly causes a non-conformance is to remove the code for converting floating-point numbers to strings, using the underlying platform code instead. The existing code requires bit-twiddling on the internal representation of floating point, which isn't possible under GWT. What we lose is the precise rules for output of floating point - which given that GWT arithmetic isn't going to follow the XPath rules anyway, isn't a big loss. 

(b) I've done some compile reports on the size of the generated code. As one might expect, a large proportion of the code comes from the classes that are data-intensive, in particular those holding information about character properties. The class CaseVariants which holds upper/lower-case mappings is particularly heavy, and could probably be thrown out entirely, relying instead on whatever JS provides us with.
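To show the kind of bit-twiddling that point (a) refers to, here is a small sketch in standard Java that pulls apart the IEEE 754 representation of a double. It is not the Saxon code that was removed; the class and method names are hypothetical, and it handles only normal (non-zero, finite, non-subnormal) values when reporting the exponent. It relies on Double.doubleToLongBits, the standard JDK route to the internal representation.

```java
/** Illustrative only: direct access to the bit fields of an IEEE 754 double. */
public class DoubleBits {

    /** Sign bit: true for negative values, including negative zero. */
    static boolean isNegative(double d) {
        return (Double.doubleToLongBits(d) >>> 63) == 1L;
    }

    /** Exponent with the bias of 1023 removed (meaningful for normal values only). */
    static int unbiasedExponent(double d) {
        long bits = Double.doubleToLongBits(d);
        return (int) ((bits >>> 52) & 0x7FFL) - 1023;
    }

    /** The 52 stored fraction bits, without the implicit leading 1. */
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0x000FFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double value = 1.5;
        System.out.println("negative: " + isNegative(value));
        System.out.println("exponent: " + unbiasedExponent(value));
        System.out.println("fraction: 0x" + Long.toHexString(fraction(value)));
    }
}
```

Algorithms that print the shortest decimal string for a double typically start from exactly these fields, which is why losing access to them means falling back on the platform's own number-to-string conversion.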
COMMENTS
Re: Compiling Saxon using GWT
by Vyacheslav Zholudev on Tue 16 Nov 2010 13:28 GMT |  Profile |  Permanent Link
Sounds promising. Good luck! 
Would you be able to elaborate on this: 
"fortunately Vojtech Toman (who wrote about his GWT port of the Calumet XProc engine at Balisage last year) showed me how to do synchronous access by making native calls on the underlying XMLHttpRequest object."

Thanks
Re: Re: Compiling Saxon using GWT
by Michael Kay on Tue 16 Nov 2010 17:54 GMT |  Profile |  Permanent Link
Vojtech's recommendation was to fetch documents by making a native call on XMLHttpRequest with a synchronous request. The mechanisms packaged in GWT are all asynchronous - which would be nice in theory, but the callback mechanism used to notify completion means you have to turn your code inside out to make it work.
Re: Compiling Saxon using GWT
by John Cowan on Tue 16 Nov 2010 16:40 GMT |  Profile |  Permanent Link
Well, now that you've nicely told us how to do it, I daresay someone else can release Saxon-GWT under the MPL (or any other license).
Re: Re: Compiling Saxon using GWT
by Michael Kay on Tue 16 Nov 2010 17:51 GMT |  Profile |  Permanent Link
It's always been possible in theory. They just have two minor problems: (a) testing it, and (b) persuading the world they are capable of supporting it.
Re: Compiling Saxon using GWT
by Vladimir Nesterovsky on Thu 18 Nov 2010 18:02 GMT |  Profile |  Permanent Link
I've been thinking lately that you could generate JavaScript from XSLT, the way you generate Java from XQuery.

This would probably require some runtime, but one of much smaller size.
Re: Compiling Saxon using GWT
by Evan Lenz on Fri 19 Nov 2010 17:29 GMT |  Profile |  Permanent Link
Michael, this sounds way cool. I encourage you to keep working on it, and I look forward to reading further updates about it. 

Who knows, maybe someday you'll grow to love JavaScript and end up removing the GWT step. You already appear to be able to clone yourself anyway. :-)

Peer-to-Patent: should I take part?

I received last week an invitation to review a patent application, under the pilot (and strangely-named) peer-to-patent scheme whereby anyone who might be thought to be interested is invited to comment on patent applications during their examination by the patent office. The specific patent in question is at http://www.peertopatent.org/patent/20100250576/overview: it's a Microsoft application for an optimization technique for SPARQL queries. 

First question I have to ask myself is, should I take part? Intrinsically, it feels like sleeping with the enemy. It seems a very strange process, exploiting the willingness of the intellectual community to share knowledge in order to improve a process whose very purpose is the antithesis - to assign monopolies of knowledge. Do I want to participate in a process that might encourage the authorities to award fewer bad patents? Or do I think that bad patents are ultimately a good thing, because they will inevitably, in time, bring down the whole system? There are issues of tactics here, and issues of high moral principles. I'm going to compromise: I will respond telling them I refuse to take part in the process, but they can read my blog here for my views on the patent application if they want them. 

Another little tactical point: apparently in the US, software engineers are routinely advised not to read patents, on the grounds that the fine for infringing a patent is trebled if you knew about it. I don't know if that's really true - I find it hard to believe that in the land of the free, you can be fined for reading information that's in the public domain. If it is true, it makes me proud to be non-American. Anyway, I can hardly claim ignorance of a patent that I've been invited to review, so I'll take the plunge and click on the link; fortunately it turns out to be of little relevance to anything I'm doing in my products. 

First observation on the patent application (and others I have seen recently) is that it's much more readable than the applications of ten years ago. I don't think it uses the giveaway patent-speak word "plurality" even once. It's written by a software engineer, not a patent-lawyer trying to make it unreadable to other software engineers. That can only be a good thing. It also hardly tries to disguise the fact that it's a pure software patent. There's a bit of nonsense in the claims about patenting the media on which the software is written, but no serious pretence that the patent applies to a machine rather than to pure algorithms. It seems that in the US at least, you no longer have to pretend that it's not about software in order to get your patent application through. 

What about the substance? I'm told that the two ways a patent claim can be challenged are on the grounds of "obviousness" or on the grounds of "prior art" (i.e., someone else has done substantially the same thing before). I'm afraid I really don't know enough about how these tests are supposed to be applied. Wikipedia tells me that "obvious" means "obvious to one with ordinary skills in the art" and interprets that as someone with "the normal skills and knowledge in a particular technical field, without being a genius." Well, if that means obvious to the average programmer, then the stuff in this patent certainly isn't obvious. If it means obvious to the average person doing research or implementation in logic-based query languages, then it almost certainly is. I'm not working with logic-based query languages, but it looks to me like the kind of solution I would immediately come up with if I were faced with this problem. The "naive solution" to the problem that they claim to be improving on is a strawman; you would have to be very ordinary to think that was the best you could do.

As to prior art, again, I don't know enough about how this is evaluated. Does someone have to be tackling exactly the same problem and coming up with exactly the same solution to qualify as prior art? Or can one simply point out that this is just a specific example of a general approach that has been used by competent programmers for decades? 

This patent application points to all that is wrong with the patent system. It's a simple solution to a simple problem. It's impossible to say whether it's original, and it's impossible to say whether it's obvious. But what one can say with certainty is that there is no way it is in the public interest to give Microsoft a 25-year monolopy on solving this particular little problem in this particular way. Anyone who thinks doing that is going to encourage innovation needs their head examining.
COMMENTS
Re: Peer-to-Patent: should I take part?
by Ed Davies on Sun 14 Nov 2010 16:02 GMT |  Profile |  Permanent Link
I know what a "monopsony" is but not a "monolopy" though feel it must have something to do with rabbits, somehow. 

More seriously, it is at least good news that this patent is readable. I tried to read one from the late 1980s a year or so ago and couldn't make much sense of it. This is mildly embarrassing as I wrote it, at least the version before the patent lawyer mangled it. (I suspect our paths to work crossed at the time as I lived in Reading and worked on Nine Mile Ride.) 

Still, I think it's a pity not to comment formally on this patent application. If there are even one or two comments from reasonably knowledgeable people saying "obvious" it will be very difficult to sustain the patent. If a lot of applications of this sort get shot down then maybe the whole system can be tidied up eventually. 

Personally, I don't think software patents are necessarily evil: there might be a very few cases a decade where patents might actually be justified. I'm thinking of some cryptographic algorithms, for example, where there's some real insight and work in testing the results. On the other hand, if the system is incapable of distinguishing these from the mere "nuisance" patents then we'd probably be better off, on balance, with no patents at all.
Re: Peer-to-Patent: should I take part?
by John Cowan on Mon 15 Nov 2010 15:35 GMT |  Profile |  Permanent Link
Ed: A monolope is plainly an antelope with only one horn; and I'm sure you can figure out what a postlope is. 

Michael: I find your attitude disturbingly similar to that of 19th- and 20th-century revolutionary leftist parties in the West, who were always opposed to anything that could improve the lives of the workers, on the grounds that it would only delay the inevitable Revolution. Well, maybe so; but people must live in the meantime. If revolution or a collapse of the patent system were truly inevitable, there might be some justification (though I doubt it) for such an attitude. But the patent system, like the monarchy with which it is intimately connected historically, is old and strong, and has survived centuries of drastic changes by itself changing. 

In addition, there is an equivoque on "improve the process". In the sense of improving the *efficiency* of the process, it's already been improved immensely: getting a patent is far easier than before, though still slow and monstrously expensive. What is at stake here is improving the *rectitude* of the process, so that the only patents that are granted are those that are truly novel. 

As for the treble damages, they do indeed exist. The idea is that knowing and willful infringement is much worse than mere ignorant infringement, and you can't be a knowing and willful infringer if you didn't know the patent existed. When dealing with strategic behavior, you are often better off if it is known that you do not know certain things. If you drive an armored car carrying a safe full of valuables, for example, you will prefer not to know the combination of the safe, so that (a) you cannot be suspected if the safe is emptied, and (b) no hijacker can get the combination out of you by unpleasant means.
Re: Peer-to-Patent: should I take part?
by Bambax on Tue 23 Nov 2010 08:24 GMT |  Profile |  Permanent Link
> Anyone who thinks doing that is going to encourage innovation needs their head examining 

... "examined"?
Re: Re: Peer-to-Patent: should I take part?
by Michael Kay on Tue 23 Nov 2010 09:14 GMT |  Profile |  Permanent Link
Not sure if that was intended as a criticism of my grammar? I don't know how one justifies "I need my shoes mending", but as colloquial usage I hear it quite often. And in this kind of writing, the occasional colloquialism can make the piece less dry.

TPAC: The Morning After

I'm feeling very angry with myself this morning - angry that I wasn't angry yesterday, during the TPAC plenary meeting. Angry that no-one else was angry; as several people remarked from the platform, it was an incredibly quiet meeting with almost no public dissent. And yet we should all be furious. 

Yesterday the browser vendors showed us some cool stuff that we will soon be able to do on the web. Text that runs at an angle, boxes with round corners, logos that spin, graphics that pulse to the beat of the music, forms that (wow!) check that you've entered a valid date. And everyone tweeted "Wow, that's cool". Well, it's not cool. It's a tragedy, and I'm angry about it. 

What we saw yesterday was the crudest possible demonstration of the power base of the browser vendors. We don't get to decide whether users can see boxes with round corners or logos that spin - the browser vendors get to decide it. They get to decide what date formats we can use when filling in forms. They get to decide that we can use ABCnotation but not MusicXML, which means we can listen to folk music but not to classical music. Sure, the browser is programmable; but they've decided we can write any program we like so long as it's in Javascript. What about people who want to write client-side software in XSLT 2.0? The browser vendors get to decide whether that's cool or not. What about people who want to do linked data using RDF? The browser vendors decide. 

For twenty years this industry was dominated by IBM. Their dominance was broken by the open systems movement; but instead of the open systems movement taking control, they allowed Microsoft to take over the seat of power while no-one was watching. Then for twenty years Microsoft controlled our destiny, until the open source movement once again gave the community some influence over its own future. What we saw yesterday is that control is now in the hands of a cabal of four or five browser vendors who between them are going to decide the experience that billions of people will get from the web. They are the style police, they decide what's cool: they decide what kind of typographical effects we can see, what kind of music we can listen to, what kind of programming language we can write in. 

We should be angry about it, and I'm ashamed that yesterday, no-one was. Like crowds saluting a great dictator, everyone just parroted "Wow, cool, man!". We hummed their tune. W3C, who should be championing the openness of the web, instead allowed the junta to take the platform. I should have got up and had a rant from the microphone, and all I managed was a lonely tweet. Sorry: I let you down. 

Meanwhile, enjoy playing with your new toys. Those in charge are hoping they will keep you quiet.
COMMENTS
Re: TPAC: The Morning After
by Henry Story on Thu 04 Nov 2010 08:58 GMT |  Profile |  Permanent Link
There's no reason to limit yourself to browser vendors really. In my opinion every application should become a browser. See for example Beatnik, the Semantic Address Book https://sommer.dev.java.net/AddressBook.html 

Now what is interesting, if you look at apps such as Tweetdeck, Nambu, etc., is why they don't extend to become information browsers. Nambu had integration with identi.ca at one point. Then it dropped it. In the blogging space everyone made blog readers to read all blogs; now client vendors are locking themselves into one company's web site. Weird...

Could you provide a WebID enabled blog, so I don't have to create myself a username/password just to comment? 
http://esw.w3.org/foaf+ssl
Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 09:37 GMT |  Profile |  Permanent Link
I don't understand your argument. You seem to be saying that browser vendors are doing something evil by deciding what to pay attention to, because they leave out other technologies you may want to use.

You... do realize that there are a finite number of people working on browsers, with a finite amount of attention, right? There is more possible stuff to implement than they can ever actually do, so they *must* discriminate.

Does *somebody* want to do client-side software in XSLT2.0? Sure, it's likely there's at least one person who wants that. Is the reward from implementing the necessary support worth the effort of that implementation? So far, the answer is no. That's fine. There are simply more valuable things to spend time on.

You seem to be arguing that this prioritization of time is, itself, an evil thing. You even call browser vendors a "junta" for doing this. This seems incoherent (not to mention hyperbolic). What are they supposed to do? Work on everything at once? That's impossible. Make it possible for you to add anything you want? That can't get any easier than it already is, where you can write a plugin or hack the open-source code directly. Work on the specific things you want? Prove that enough other people care about it, and browsers will do so.

Is there something I'm missing here? Right now you seem to just be shouting that you want browsers to implement everything at once.

(Also, I second the call for some other commenting authentication. Forcing readers to actually register just to post a comment is obnoxious. I *almost* just quit, after having written my whole comment, just because I found it annoying.)
Re: Re: TPAC: The Morning After
by Michael Kay on Thu 04 Nov 2010 10:20 GMT |  Profile |  Permanent Link
Read it more carefully. I didn't say the browser vendors were evil or that they were adding the wrong features. I just bemoaned the fact that they have total control over our destiny. We need more openness and extensibility than they are giving us. Sure, we can write Javascript; but many of the things we would like to do for our users are too expensive to do in that way. And we can write our own browsers (or alternative clients), or we can write browser-specific plug-ins; but in most cases that doesn't deliver what our users need either. For most practical purposes, if we want to deliver significant new client-side functionality, we are at the mercy of the browser vendors.
Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 13:33 GMT |  Profile |  Permanent Link
Dude, you call them a junta. 

Anyway, yes, I still think your position is incoherent. Yes, browser vendors are in charge of what you use, by virtue of the fact that you use browsers, and they write them. "Openness" and "extensibility" are not magical. You would not be satisfied simply by having an XML parser around. You need actual *support*, which takes time and effort. 

Time and effort are finite. We have more specs than we have time. Thus, some specs won't be implemented. You're basically asking for browser vendors to implement everything at once, and then declaring them a junta when they say "Um, we don't have the resources to do that." 

If you just objected to the *particular* choices browser vendors made, that would be fine. It's cool if you think they prioritized incorrectly. Your argument, though, is that they shouldn't prioritize at all, and by prioritizing they are doing something evil. 

You have exactly as much openness and extensibility as can be offered. If you want support for a new technology in the browser, you have to *implement it in the browser*. Luckily you can do that, since half the browsers are open-source. This is difficult, sure, but by its very nature you can't make it any easier.
Re: Re: Re: Re: TPAC: The Morning After
by Michael Kay on Thu 04 Nov 2010 14:50 GMT |  Profile |  Permanent Link
I chose my words carefully. "Junta" refers to the exercise of power with no democratic legitimacy. It does not imply any judgement about whether the decisions being made are right or wrong, good or evil.
Re: Re: Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 15:50 GMT |  Profile |  Permanent Link
So, are you unhappy with the particular decisions being made, or with the fact that decisions were made at all? Your original post suggests the latter, which I find incoherent. 

You say "What if someone wants to develop a client-side app with XSLT2?". What if they do? This question isn't rhetorical. If enough people want to do this, then browser vendors will find it worthwhile to implement support for it. 

This is where I expect you'll object, given that this is the point of your original post. "Why do the browser vendors get to decide whether it's valuable enough to support?" The simple answer is that *someone* has to decide this. *You can't support everything.* There's no magic wand you can wave, no trivial little switch to implement in your code that will make XSLT2 be implemented, or RDFa, or MusicXML, or insert-your-favorite-spec-here. 

Someone has to spend a non-trivial amount of effort to make these technologies be supported in browsers. Who do you think does that effort? Browser vendors. You can do it too, if you want, if you have the necessary expertise. Unfortunately, there's no way to make this easier. There's no way to expose the necessary tools to authors to let them implement these things themselves without making it as difficult as just downloading the browser source and hacking on it directly. 

So, what's your alternative? You're angry about something, but unless I'm mistaken, it's about the fact that programming is hard and takes time. Sorry, that can't be fixed, and it's not the W3C's fault. 

(The only other thing I can see your argument meaning is that you're angry that browsers are insufficiently democratic. Presumably by "democratic" you mean something other than "whatever the most users want", since that's what they already do in order to compete with other browsers. I'm not sure what you could be meaning then, though.)
Re: Re: Re: Re: Re: Re: TPAC: The Morning After
by Michael Kay on Thu 04 Nov 2010 20:50 GMT |  Profile |  Permanent Link
>Who do you think does that effort? Browser vendors. You can do it too, if you want, if you have the necessary expertise. Unfortunately, there's no way to make this easier. There's no way to expose the necessary tools to authors to let them implement these things themselves without making it as difficult as just downloading the browser source and hacking on it directly. 

I don't believe that it can't be made easier. I don't believe it is impossible to design and deliver a cross-browser application programming interface that opens the platform to the world. With a will, it could be done. 

I think this is the essence of the problem. If it's not possible for third parties to add functionality to a browser without (a) getting the browser vendor to do the work, (b) hacking the browser source, or (c) writing the app in Javascript, then the architecture of the platform is badly broken - in fact, it can hardly be described as a platform at all. Yet that is the situation we are in.
I wouldn't dismiss JavaScript
by Simon St.Laurent on Thu 04 Nov 2010 20:53 GMT |  Profile |  Permanent Link
but otherwise, you're right.
Re: Re: Re: Re: Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 22:49 GMT |  Profile |  Permanent Link
" I don't believe it is impossible to design and deliver a cross-browser application programming interface that opens the platform to the world." 

Shrug. I disagree. The web and the browser are both immensely complex, and very security-sensitive. If the browser exposed an interface that allowed a page author to, say, define something equivalent to FileSaver, that would be *scary as hell*. Arbitrary, potentially evil websites could abuse that to save files to my system without my permission. 

This is, of course, precisely what every program you've ever installed on your computer can do as well. Normal people don't really understand security around installing normal programs; they have even less awareness of the security implications of web pages, and would be totally screwed if webpages started being literally as powerful as installed programs (rather than just having all the same useful powers). Hell, *I'd* be screwed in that case.
Re: Re: Re: Re: Re: Re: Re: Re: TPAC: The Morning After
by Alain Couthures on Sat 06 Nov 2010 16:59 GMT |  Profile |  Permanent Link
The draft FileWriter API (http://dev.w3.org/2009/dap/file-system/file-writer.html) is indeed very interesting for offline applications and many institutional users of XForms are already asking me for it in XSLTForms. 

Allowing this API to be limited through browser preferences is obviously required for security reasons. An XML document is always harmless... except when interpreted by a browser. Validation using a schema might help to limit risks. 

Nevertheless it's a great API required for real applications, and browser vendors have to understand this.
Re: Re: TPAC: The Morning After
by Dimitre Novatchev on Thu 04 Nov 2010 16:24 GMT |  Profile |  Permanent Link
<quote>Does *somebody* want to do client-side software in XSLT2.0?</quote> 

Such a question can only come from a person that hasn't done his homework. 

The fact is that there is a significant number of people asking questions on forums such as stackoverflow.com why no browsers support the use of XSLT 2.0 processors. It shouldn't be difficult to provide XSLT 2.0 support in a browser that already provides XSLT 1.0 support -- what is needed is just a switch to the newer (version) XSLT processor.
Re: TPAC: The Morning After
by Alain Couthures on Thu 04 Nov 2010 10:45 GMT |  Profile |  Permanent Link
While implementers know what's in their own implementation and what can be added more or less easily, they are not always good at deciding what is best for users. Market manipulation consists in presenting anything as the best product ever created, and some really good products or ideas cannot succeed because of that.

Should there be a maximum quota for implementers in W3C Working Groups?

I'm convinced that Web technologies are going to be the everyday technologies for any kind of software. Universities and software companies should contribute more actively. Why not have free members?
Re: TPAC: The Morning After
by Shelley on Thu 04 Nov 2010 14:02 GMT |  Profile |  Permanent Link
That the browser vendors are determined to be the sole authority on implementation has been a discussion in the past, but there's been little or no move within the W3C to check this. 

Consider the current WhatWG and W3C HTML WG parallel effort that causes confusion and more than a few problems. This could end, easily, if three of the browser vendors stated the WhatWG will be no more. That they don't is tantamount to a threat to the W3C: do what we say or we'll pick up our marbles and go elsewhere. 

Recently the HTML5 accessibility subgroup came forth with a list of video requirements, and the actual recording of these requirements was challenged. That's unheard of in the computing world. The users provide requirements, and afterward, work with the developers to determine what can, and cannot be done. 

If this were a business environment what is happening is the equivalent of the development staff going into a meeting with the users and the developers telling the users, keep your user requirements, we'll tell you what you'll get. 

However, the W3C enables this. Look at the leadership for the HTML5 working group: two browser representatives, and someone from IBM. The W3C has signaled exactly who matters, and it isn't thee, or me. 

Why would the organization do differently just because you're all face to face?
Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 15:58 GMT |  Profile |  Permanent Link
"That the browser vendors are determined to be the sole authority on implementation has been a discussion in the past, but there's been little or no move within the W3C to check this." 

The browser vendors *are* the implementors. I have no idea who else you could possibly think should be the authority on implementation than the implementors themselves. Are you suggesting that browser vendors should be forced to implement things based on someone else's judgement? 

"Recently the HTML5 accessibility subgroup came forth with a list of video requirements, and the actual recording of these requirements was challenged. That's unheard of in the computing world. The users provide requirements, and afterward, work with the developers to determine what can, and cannot be done." 
The HTML5 accessibility subgroup is not the users. Disabled users are the users. (The a11y subgroup does indeed contain many disabled people, of course.) 

The hope is that the accessibility subgroup *represents* disabled users, and produces a list detailing their requirements. The list they actually produced shows that they didn't succeed 100% at that goal - most of the goals are perfectly fine, but some are clearly not accessibility requirements. 

"If this were a business environment what is happening is the equivalent of the development staff going into a meeting with the users and the developers telling the users, keep your user requirements, we'll tell you what you'll get." 
It is a well-known truism that users rarely understand what their actual requirements are. As a software developer, your job is almost *never* to give the user what they ask for; it's to give the user what they *need*. You figure out what they need by looking at what they ask for. Look at basically any credible document giving advice on requirements gathering.
Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 16:04 GMT |  Profile |  Permanent Link
"As a software developer, your job is almost *never* to give the user what they ask for; it's to give the user what they *need*. You figure out what they need by looking at what they ask for. Look at basically any credible document giving advice on requirements gathering." 

Let me put that more strongly. If you give the user what they asked for instead of what they need, you have *failed*. It is a dereliction of your duty as a software developer to slavishly follow the precise requests of the user.
Re: Re: Re: Re: TPAC: The Morning After
by Shelley on Thu 04 Nov 2010 16:27 GMT |  Profile |  Permanent Link
Oh good lord, no. 

I've worked for a quarter of a century in a variety of industries--from creating a touch screen application for a door factory in Wisconsin to working on Saudi Arabia's air defense system. You never pre-determine user needs. You never do this, because you don't know the business, you don't necessarily know what the people want or need. 

Users put together requirements, and _from this_, you work with the users to determine what is feasible, and what isn't. There are restrictions, such as available technology, budget, government restrictions, and so on, but within these constraints, the users should have a significant say in the finished product. 

Web specifications are no different. How the standards are implemented, the exact algorithms to meet the end goal, should be up to the implementors, but the standards themselves should be driven by the users--of which the browser companies are just one of the many. The browser companies should say what is or is not doable, or is insecure, or inefficient. In addition, those creating other tools and libraries should also provide feedback about what works. This in addition to accessibility groups, designers, and web developers--all users, all with a stake in the final product. 

Anything else is the height of arrogance. Not only that, it's foolish in the extreme. If you put something into a standard that only browser companies want, other users aren't going to be interested. So then the browser companies have spent time implementing something no one needs, or wants. 

Re: Re: Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 23:10 GMT |  Profile |  Permanent Link
"You never pre-determine user needs. You never do this, because you don't know the business, you don't necessarily know what the people want or need." 

Luckily, that's not what I said. Try again?
Re: Re: Re: Re: Re: Re: TPAC: The Morning After
by Shelley on Thu 04 Nov 2010 23:28 GMT |  Profile |  Permanent Link
"It is a well-known truism that users rarely understand what their actual requirements are." 

Re: Re: Re: Re: TPAC: The Morning After
by Dimitre Novatchev on Thu 04 Nov 2010 16:35 GMT |  Profile |  Permanent Link
I wonder at this arrogance ...
It does explain a lot
by Simon St.Laurent on Thu 04 Nov 2010 16:37 GMT |  Profile |  Permanent Link
Too much, I fear.
Re: Re: Re: TPAC: The Morning After
by Shelley on Thu 04 Nov 2010 16:41 GMT |  Profile |  Permanent Link
"It is a well-known truism that users rarely understand what their actual requirements are." 

That's a very limited view of the users. I worked in one place where the company decided to locate the computing department people among the end users. You quickly learn that a) end users really don't like tech professionals condescending to them, and b) tech professionals don't have all the answers. 

You figure out what the users need by working _with_ them, not by taking what they give you, or what you think they've given you, and then using your own viewpoint to determine what they can or cannot have. 


Re: TPAC: The Morning After
by Simon St.Laurent on Thu 04 Nov 2010 14:17 GMT |  Profile |  Permanent Link
You're in a pretty unique position, as you've built amazing implementations of some of the more mind-boggling specifications the W3C has produced. You've certainly participated in the conversation leading to those specifications, and your implementation experience was critical to improving those specifications. 

To me, that seems like exactly how the relationship should work. You've actually lived the ideal, so I'm not surprised by your shock at how this process is working. 

While I like a lot of what's coming in and with HTML5, it's hard to have any faith at all in the process bringing it for the reasons you describe. My long-run hope is that HTML5 will be an energizing first draft, drawing attention and implementation experience, followed by a more serious conversation about how well these things work and how to refine them. 

(I suspect you're already aware of the utter lack of tolerance those driving the HTML5 process have for any criticism of it. "It may be a process trainwreck, but it's producing these shiny things, and how dare you criticize." Your comments thread already has a dose of that, but thanks for posting in a difficult environment.)
Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 16:01 GMT |  Profile |  Permanent Link
"You're in a pretty unique position, as you've built amazing implementations of some of the more mind-boggling specifications the W3C has produced. You've certainly participated in the conversation leading to those specifications, and your implementation experience was critical to improving those specifications. To me, that seems like exactly how the relationship should work. You've actually lived the ideal, so I'm not surprised by your shock at how this process is working." 
I'm sorry, but I don't understand what you're trying to say here. You just described the process as "specifications are created, people implement it and give feedback, specification is amended". That's precisely what's happening here. 

More importantly, what you just said seems like a non sequitur given the original post. In the OP, Kay is speaking not as an implementor, but as a user of the implementation, and he is angry that browsers haven't implemented every possible spec that he might want to use.
Re: Re: Re: TPAC: The Morning After
by Simon St.Laurent on Thu 04 Nov 2010 16:18 GMT |  Profile |  Permanent Link
Oh, good. Now I get to reply to the HTML5 auto-reply-to-criticism, complete with "I don't understand..." 

Mike Kay has been in these processes for years, in situations where they actually worked. Implementers were part of the conversation, not the dominant force. There was no WHATWG to jump up and down and insist that only its opinions really mattered. Those processes weren't perfect - I'm far from a W3C fan - and I know they included developer tantrums, balanced by other tantrums and some occasionally calm conversation. 

Pretty much every document-writing process is create, feedback, amend. The ugly questions here are the ones Kay raised - WHO creates, WHO gets listened to, and WHO amends. 

It's one thing not to implement specs - I think we're all aware how that works. It's another thing entirely to use that power of implementation to tell a specification group what it should specify. It's not at all to the W3C's credit that it's let what's left of its process get shat on this way.
Re: Re: Re: Re: TPAC: The Morning After
by Tab Atkins on Thu 04 Nov 2010 23:13 GMT |  Profile |  Permanent Link
"Oh, good. Now I get to reply to the HTML5 auto-reply-to-criticism, complete with "I don't understand..."" 

Sigh. 

" The ugly questions here are the ones Kay raised - WHO creates, WHO gets listened to, and WHO amends." 

In order: the browser vendors, the browser users, and the spec writers.
Thank you for hearing
by Simon St.Laurent on Thu 04 Nov 2010 23:33 GMT |  Profile |  Permanent Link
but alas, not listening. 

WHO creates the spec should not be identical to WHO creates the software. Period. That's at the heart of Kay's original complaint. 

A conversation with "browser users" is a difficult thing to imagine. So far as I can tell, WHATWG simply cherrypicks the users it cares to be interested in. 

And WHO amends is all too tightly in the hands of the writer who pretends his title is editor. Perhaps the most appalling part of this entire process is that the W3C hasn't removed Hixie from a position that's supposed to reflect consensus. 

Oh well. In a decade I'm sure we'll all laugh about this.
Re: Re: Re: TPAC: The Morning After
by Michael Kay on Thu 04 Nov 2010 20:55 GMT |  Profile |  Permanent Link
>he is angry that browsers haven't implemented every possible spec that he might want to use. 

No, no, no. I'm angry that browsers haven't made it possible for someone else to implement every spec that I might want to use. I'm angry that the platform is such a closed one. I'm angry that we as a community created such a closed platform in which we can only move forward as fast as the slowest browser vendor.
Re: Re: TPAC: The Morning After
by Gerrit Imsieke on Thu 04 Nov 2010 21:38 GMT |  Profile |  Permanent Link
One of the main issues is that the WHATWG people are focused on applications rather than content. Their view on content is either simplistic (like this blog posting's markup) or object model / UI centric. Despite the cool bells & whistles, parts of HTML5 are certainly reactionary, disregarding the markup and markup processing standardization efforts of the past 13 years. 

Maybe they don't think they're as reactionary as they are. They probably regard themselves as revolutionary because they don't need to rely on RDBMSs but on JSON or other object storage facilities. They don't see the benefit that XML and related standards have brought to content processing. They just see that for their purposes, XML-related standards are difficult to implement, and they (the WHATWG/HTML5 people) do not benefit from implementing XML-related standards because they manipulate, for example, SVG with a DOM or some other Javascript interface and not XSLT/XQuery. Their XML is JSON, their XSLT/XQuery is Javascript. 

Yesterday I was shocked to see a similar backlash movement in my own company. 

I think there's no other way than to get involved with the Mozilla project, convince them that it won't hurt them feature- or supportwise to include an advanced XSLT/XQuery processor, and create cool applications that run with client-side XSLT/XQuery/XPath 3.0, such as interactive books or balance sheets delivered as purely declarative markup together with sleek & concise stylesheets. 

What may be missing from XSLT's side in this application scenario: an event model, i.e., a standard way to bind user interactions to certain transformations on certain content, and a way to make sense of user-submitted values in XSLT templates. From what I saw at XML Prague 2010, some XQuery vendors are more advanced than XSLT in the field of interactive applications. But I think XQuery's capabilities are somewhat limited when it comes to performing complex content transformations, and so are Javascript's. I'd like to see the elegance and expressive power of XSLT/XPath 2.0/3.0 applied to content-based Web applications.
Re: Re: Re: TPAC: The Morning After
by Michael Kay on Thu 04 Nov 2010 21:54 GMT |  Profile |  Permanent Link
>I'd like to see the elegance and expressive power of XSLT/XPath 2.0/3.0 applied to content-based Web applications. 

Yes - there's enormous potential here to apply XSLT's event-based processing model to create an interactive user interface. I must take a closer look at what XSLTForms has already achieved.
Re: Re: Re: Re: TPAC: The Morning After
by Alain Couthures on Fri 05 Nov 2010 08:41 GMT |  Profile |  Permanent Link
About XSLTForms and browsers, there are two distinct parts to be considered: the XSLT initial step and the Javascript XForms engine. 

Browsers support XSLT 1.0 (except Konqueror) but vendors don't seem to be concerned by reported bugs. The most important problem is in Mozilla Firefox: the namespace axis is not supported... There is at least one issue with the Opera XSLT engine about passing parameters to a template... IE9 will use MSXML6 while previous versions still use MSXML3 (XPath not being the default query language!). 

Fortunately, the XSLT engines are always very fast compared to interpreted Javascript instructions. For XSLTForms, they really act as compilers, and that is one of the reasons for XSLTForms' success compared to pure Javascript solutions. XForms controls and events are translated into HTML controls and events plus Javascript objects, that's all! XPath expressions are translated into Javascript objects: they are not parsed and evaluated every time, they are compiled. 

But it is really a shame not to have XSLT 2.0 engines, because parsing strings with XSLT 1.0 is awful, and because the lack of sequences forced me to construct results as structured strings that then have to be analyzed with the substring-before() and substring-after() functions. 

The Javascript engine of XSLTForms doesn't depend on any Javascript framework. It's good because extensions can be done with any of them but it means that specific instructions for each browser have to be present. As usual, Internet Explorer is the most concerned by this problem but it is not the only one. 

XSLTForms first releases used a specific Javascript DOM implementation because of differences in browsers but it was too slow. Now, XSLTForms uses native XML documents. Still, there are major differences on the DOM API to be addressed by different Javascript instructions. 

XSLTForms still has its own XPath engine because there are XForms-specific XPath functions, but also because node dependencies have to be collected to reduce refreshing operations when necessary. This will make it possible to develop a specific XPath 2.0 engine, and I am still thinking of implementing a basic XQuery engine with the same approach. 

I have also translated one of the XSLT 2.0 stylesheets of the graph2svg project into an XSLT 1.0 stylesheet, to be launched by the browser to display SVG graphs from data. Again, the lack of sequences, and also of math functions, required a lot of complexity. 

Don't hesitate to contact me for more details about XSLTForms. 

For me, XML technologies are like LEGO bricks and I'm very glad that the W3C understood that, a long time ago. Obviously, at least some browser vendors don't have that in mind yet. 

Thanks! 

-Alain
Re: TPAC: The Morning After
by mnot on Thu 04 Nov 2010 23:38 GMT |  Profile |  Permanent Link
I guess I'm having trouble getting worked up about this. 

Considering how badly the Enterprise world has fucked up, the Mobile world has fucked up, and how every app store is turning into a private money machine, the fact that the browsers are working together to innovate and provide multiple implementations of a standard -- which granted they are driving -- doesn't seem so bad. 

Yes, this empowers them. If their failing is that they haven't figured out how to deliver the all-singing-all-dancing extensible application delivery platform that's interoperable across multiple uncoordinated implementations, I'll forgive them for that. But they better get their act together soon, dammit!
Re: TPAC: The Morning After
by Jeff-w3c on Fri 05 Nov 2010 11:58 GMT |  Profile |  Permanent Link
Michael, 

Good input about the Technical Plenary. 

As Chair of the Technical Plenary, I confess that I thought it would be a good idea to have a focus on HTML5, and that was part of the agenda. 

I also worked with the program committee to create a balanced agenda. We had talks about the Semantic Web, IETF, XML convergence with HTML, and accessibility. We had a session on different use cases and perspectives for television broadcast. There was also quite a diversity in Lightning Talks: Emotions, the Social Web, Speech, Semantic Web, Relational and RDF, 3D, Points of Interest, XML Performance, Privacy, and Philosophy. 

But I take your point. There is even more diversity in W3C activities and a big part of W3C's value is our effort to bridge multiple communities. I note that we have set up a survey for TPAC participants to share their thoughts on the meeting and I will include your suggestion in the set of recommendations we will seek to address. 

I also share some of your disappointment. In a 20 August message, I reached out to all W3C Members and Chairs for their best ideas. Nearly all requests, primarily in the form of Lightning Talks, were included. Let's use this thread as a prompt to work together so that the next TPAC agenda is more representative of all the work going on at the Consortium.
Re: TPAC: The Morning After
by Hans-Juergen on Thu 11 Nov 2010 09:13 GMT |  Profile |  Permanent Link
A vigorous exchange of arguments followed, but I am afraid the whole battle was fought on the wrong ground, one floor above the floor where the heart of the matter is found. The main question is perhaps not who should specify, who should select and who should implement what. It is about content. 

Your personal work is all about building, creating new perspectives and possibilities for other people to build and create. It is about enabling - not about impressing and entertaining. Quality, precision, responsibility - they are what you are committed to. You stand for a culture, and a spirit. What you saw and what upset you was the lack of culture and the lack of spirit. A taste for the puerile and playful in command of technological development. What you are upset about - is it not the apparent unconcern for the role a browser may play in building structures, and helping us to solve difficult and important problems? The irresponsible obsession with entertaining and impressing, rather than building and solving problems. The hollowness.


Parameterized validation

I've been implementing an enhancement to the Saxon schema processor that has been on my wish-list for a long time: it allows schema validation to be parameterized using parameter values supplied on the command line or via the validation API. The parameter values can be referenced from any XPath expression in the schema, for example in assertions or in conditional type assignment. 

For example, suppose you are validating invoices. Your schema might allow a wide range of currencies to be used, but for a particular batch of invoices, you want the currency to be USD, and in another batch, you want it to be EUR. Then you can write the schema with a parameter: <saxon:param name="currency" as="xs:string" select="'USD'"/>; in the definition of the simple type invoice:currency you can write the assertion <xs:assert test="$value = $currency"/>, and when invoking the validation from the command line you can set the parameter currency=EUR. The select attribute of <saxon:param>, as with stylesheet parameters in XSLT, defines a default value. Very simple. 
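
Put together, the invoice example might look roughly like the schema fragment below. This is only a sketch: the target namespace URI is hypothetical, and both the placement of saxon:param as a direct child of xs:schema and the choice of xs:string as the base type are assumptions rather than details given above. It also spells the simple-type facet xs:assertion, as XSD 1.1 does for assertions on simple types, where the text above writes xs:assert.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. Assumptions: the target namespace URI is hypothetical, and
     saxon:param is shown as a child of xs:schema, which the post does not specify. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxon="http://saxon.sf.net/"
           xmlns:invoice="http://example.com/ns/invoice"
           targetNamespace="http://example.com/ns/invoice">

  <!-- Validation parameter; select gives the default used when no value is supplied -->
  <saxon:param name="currency" as="xs:string" select="'USD'"/>

  <!-- Currency code restricted to whatever the current batch requires -->
  <xs:simpleType name="currency">
    <xs:restriction base="xs:string">
      <!-- $value is the value being validated; $currency is the parameter -->
      <xs:assertion test="$value = $currency"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

With no parameter supplied, only USD is accepted; validating with currency=EUR accepts only EUR, without any change to the schema document itself.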

There are of course many tricky details to be worked out. What should the scope of a parameter name be? (Currently, it's confined to the schema document in which it is declared.) But then, what if you want to reference the same parameter in more than one schema document? (Current thinking: you can declare it in both, and the declarations must be compatible.) Should parameters automatically be in the target namespace of their containing schema document, like other declarations in a schema? (Current thinking: no, follow the XSLT rules instead: no prefix means no namespace.) 

For free-standing validation from the command line or from the s9api API, it's fairly easy to devise an interface for supplying parameters. It's less easy when validation is invoked from XSLT or XQuery. For XSLT, a new <saxon:validate> instruction with <xsl:with-param> children seems appropriate. For XQuery, it probably needs a custom extension to the syntax of the validate{} expression. 

There are many use cases for such a feature. I've often seen cases, for example, where one wants to apply increasingly rigorous validation to a document at different stages in its life-cycle. Another example is parameterized code-lists. With some creativity, it can also be used to achieve cross-document validation: the parameters supplied when validating one instance document can be taken from another instance document (or indeed, an entire instance document can be supplied as a parameter value). Given higher-order functions in XPath 2.1, the validation parameter can even be a dynamic function that is invoked from within the assertions. 
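
As a rough illustration of the life-cycle use case, a schema could take a parameter naming the current stage and relax or tighten an assertion accordingly. This is only a sketch: the parameter name stage, its values, and the document, title and approvedBy elements are all invented for the example, and the xs:appinfo placement of saxon:param is an assumption.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxon="http://saxon.sf.net/">

  <!-- Hypothetical parameter: the life-cycle stage being validated,
       defaulting to the most lenient one. -->
  <xs:annotation>
    <xs:appinfo>
      <saxon:param name="stage" as="xs:string" select="'draft'"/>
    </xs:appinfo>
  </xs:annotation>

  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="approvedBy" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <!-- A draft may omit approvedBy; at any later stage it must be present. -->
      <xs:assert test="$stage = 'draft' or exists(approvedBy)"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The content model stays the same at every stage; only the assertion depends on the parameter, so one schema serves the whole life-cycle.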

Currently, parameterizing a schema is often achieved using xs:redefine, and in XSD 1.1, the xs:override feature is designed to make this approach more flexible. However, there's a big limitation with this, namely that overriding or redefining a type acts globally: it affects every place the type is used. This means, for example, that you can't use two different variants of a schema to validate the input and the output of a transformation (for example, a transformation whose purpose is to convert documents from one flavour of vocabulary X to a different flavour of the same vocabulary). 

The mechanism allows parameters to be used anywhere schemas use XPath expressions: primarily in assertions and in conditional type assignment; but also in identity constraints (for which I have yet to find any practical utility) and in the saxon:preprocess extension which allows you to vary the allowed lexical representations of a data type. (See saxon:preprocess blog article). For example, you can already write <saxon:preprocess action="if ($value='yes') then 'true' else if ($value='no') then 'false' else $value"/> to create a subtype of xs:boolean that permits the lexical forms "yes" and "no"; with this new feature you can parameterize this to make the strings "yes" and "no" parameters supplied at validation time rather than constants. 
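
A parameterized version of that boolean subtype might be sketched as follows. The action expression is the one quoted above with the two constants replaced by parameter references; the parameter names yes and no, the type name localBoolean, and the xs:appinfo placement of saxon:param are assumptions made for the example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:saxon="http://saxon.sf.net/">

  <!-- Hypothetical parameters holding the accepted lexical forms;
       the defaults reproduce the fixed "yes"/"no" behaviour. -->
  <xs:annotation>
    <xs:appinfo>
      <saxon:param name="yes" as="xs:string" select="'yes'"/>
      <saxon:param name="no" as="xs:string" select="'no'"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- Subtype of xs:boolean whose extra lexical forms are chosen at validation time. -->
  <xs:simpleType name="localBoolean">
    <xs:restriction base="xs:boolean">
      <saxon:preprocess
          action="if ($value = $yes) then 'true'
                  else if ($value = $no) then 'false'
                  else $value"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

Supplying, say, yes=oui and no=non at validation time would then let the same type accept French lexical forms without editing the schema.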

There's a drawback, of course, which is that this is a proprietary extension. However much one takes advantage of the extensibility points provided in the specifications (such as xs:appinfo in the XSD spec, or extension elements in XSLT), there's no hiding the fact that an application that takes advantage of a feature like this is locking itself in to Saxon as the validation technology. However, there's no harm in this: many features that found their way into the standards, and into other products, were first pioneered in Saxon; it's often better if products are one step ahead of the standards rather than one step behind. 

I think there would probably be some resistance to this feature in the XML Schema Working Group (and not just because of timescales): there are some strongly held philosophical views among WG members, and one of them is that validity is a context-free property. There are reasons for this: if you validate data before sending it to me, you don't want me to find that in my environment, it's not valid. Personally, I'm a pragmatist on this kind of issue. If you have to change the definition of the data interchange from "must conform to schema S" to "must conform to schema S with parameter settings P=X, Q=Y", that's not something that worries me. On the contrary, I think it creates a real opportunity for increased use of industry-standard schemas parameterized by local profiles that define additional constraints to those in the standard specification.