TEI Conference

By Michael Kay on November 11, 2008 at 02:18 p.m.

I spent a couple of days last week at the annual members meeting of the TEI (Text Encoding Initiative). It was good to meet so many Saxon users: a few familiar faces, rather more familiar names, and quite a few who introduced themselves as avid fans of the product, the book, or both.

This wasn't a technology conference: the speakers didn't spend their time talking about the merits of different XML schema languages or transformation languages. They talked about the problems they were trying to solve, which in most cases involve the digital capture, with as much fidelity as possible, of ancient (or in a few cases not-so-ancient) texts. It was very refreshing to spend a couple of days in the company of classicists, archaeologists, musicologists, literary scholars, librarians and archivists for whom the technology is just a tool.

In fact the toolset in this community seems to be pretty uniform: XSLT (very often Saxon), Relax NG, and oXygen. They were all very pleased by the announcement that oXygen now comes with Saxon-SA support built in, and I fielded quite a few questions about whether Saxon might support Relax NG. (Good question; I don't know the answer. Although it's true that you can't do type assignment with Relax NG, I would think that in principle a lot of the compile-time checking that Saxon-SA does could equally well be done against a Relax NG schema.) One technology that is conspicuously absent is XQuery - I had a couple of rather bemused users asking why anyone would want to use it when XSLT was so much more powerful.
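To make that thought a little more concrete, here is a rough sketch - Relax NG in its XML syntax, with element names borrowed from TEI purely for illustration, and certainly not working Saxon code. The structural facts that Saxon-SA's static checking relies on - which children an element may have, and in what order - are all present in such a grammar, even though validation against it assigns no types:

    <!-- A hedged sketch: this pattern says msIdentifier is required
         and history is optional inside msDesc, so a path such as
         msDesc/histry could in principle be flagged as a likely typo
         at compile time, with no type annotation involved. -->
    <element name="msDesc" xmlns="http://relaxng.org/ns/structure/1.0">
      <element name="msIdentifier"><text/></element>
      <optional>
        <element name="history"><text/></element>
      </optional>
    </element>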

The projects described involved the digitisation of everything from monumental inscriptions in the classical world (one of the many new words I learnt was "epigraphy"), through a 10th-century palimpsest containing the earliest known manuscript of Archimedes and the complete correspondence of the German composer Carl Maria von Weber, all the way to 1930s comic books. In all cases the challenge is to capture the detail - for example, the fact that several words in an inscription might now be illegible, but were recorded in the 18th century by the first antiquarian visitors to a site. Capturing different features of the text often leads to a need for parallel markup, with corresponding XSLT challenges - but as I say, we didn't get much technical detail.
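For a flavour of what capturing that detail looks like in TEI terms, here is a hedged sketch: the elements (w, supplied, gap, lb) are genuine TEI P5, but the inscription, the witness pointer and the attribute choices are invented for illustration:

    <!-- Two words now lost from the stone, supplied from an
         18th-century antiquarian's transcription (the @source
         value is an invented pointer); the physical line break
         is recorded as an empty lb milestone. -->
    <p>
      <w>IMP</w> <w>CAES</w>
      <supplied reason="lost" source="#Antiquarian1740">DIVI F</supplied>
      <lb n="2"/>
      <gap reason="illegible" quantity="3" unit="character"/>
    </p>

Empty milestone elements like lb are one of the standard TEI answers to the parallel markup problem: the physical line structure is recorded as point markers, so it can cross the grammatical hierarchy without the two trees colliding.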

Did you know that someone has built an XML database containing 250,000 personal names of people mentioned in the records of classical Greece? All geographically coded with links to Google Earth (though I dare say the satellite imagery is a bit "out of date").

One thing that did rather surprise me was that there is an immense amount of investment going into the capture of all this data, but no-one seems to have much idea how it is going to be used. It's obvious to everyone that it's an immensely valuable resource, but there didn't seem to be many reports of projects that are using the data rather than creating it.

Little chance of any Saxon-SA sales among this crowd, sadly. They might have some well-funded projects but most of them are quite happy with what they can achieve using open source software.