XSLT and e-publishing, past and future

One link led to another, and so I found myself reading Liza Daly on “The unXMLing of digital books”, from about a year ago (February 2013). She also links to a nice presentation by John Maxwell, “The Webby Future of Structured Markup: Not your father’s XML”, also worth a look.

Both of these fine considerations give me the crawlies. I mean, not only do I agree with them on many key points regarding appropriate and inappropriate uses of structured markup (see my 2011 Balisage paper on some of this), but also, they evidently represent a trend. Which scares me. It’s hard not to wonder how much of this, for us die-hard structured markup fanatics, should be seen as writing on the wall, even if the fact that XML has now become unfashionable was predictable as soon as it got hot back in the early twenty-aughts. What goes up must come down: for something to be fashionable is precisely for it to be given credit it does not deserve (no it’s not a panacea, we kept saying), and fashion inevitably fades into embarrassment, then nostalgia, if it does not harden into ideology. And I’ve never been interested in XML for the ideology of it. So is there anything left?

Yet peel away a layer, and what both Daly and Maxwell say isn’t as hostile to XML as you might think. On the contrary: both of them allow, more or less explicitly, that there is a nugget in the dross. The question is how to keep it, indeed what we might make it into if we recognize and take care with it.

And this is what leads me back to XSLT. Daly is pleased that in her publishing system, she can just drop in the HTML and go. Whee! CSS gives her all the layering she needs. (Except it doesn’t … she talks about processing requirements for filtering, indexing and aggregation that uncontrolled HTML can’t address.) “We have very little preprocessing necessary; XSLT, which is hard to learn and harder to master, is almost absent from our workflow.” Interesting … so even with her fault-tolerant HTML toolkit, she has some need for preprocessing and some XSLT. Maybe eventually they’ll find a way to mothball that stuff too, presumably as soon as they find something else that meets the need half as well.

XSLT is hard to learn and hard to master. Who am I to say it isn’t? (If that wasn’t my experience, I’m willing to admit that mine is a special case, and not because I’m so smart. I’m only lucky or unlucky as the case may be.) The flip side is — it can be learned. (I’m also willing to help. As a professor of mine once said, “In Deutschland auch die Kinder sprechen Deutsch” — in Germany, even the children speak German!) And nothing else (at least out here in the free market) comes close to XSLT in power, adaptability and fitness for the particular class of problems for which it is designed — a class of problems that is central in the publishing space. XSLT is necessary enough to have been reinvented any number of times, when people have decided they would prefer to do without it, for reasons of platforms, or markets, or culture, or aesthetics.

Given Daly’s complaints about XML, the irony is in how XSLT’s strength is in dealing with only poorly controlled data sets. I mean, if your data is well controlled and as granular as you need, by all means use RDF or RDBMS or OO: go to town, enjoy, and consider yourself lucky. But XSLT needs only a tree; if you can get one out of your HTML, however sloppy it is, that’s good enough to get you in. You can even start with a data brick if you think of it as a tree with just one node. Of course what you can do with that tree depends on your level of control; but one of the things you can do in any case is expose the issues and start to assert the control you need. XSLT isn’t just for transformation into presentation formats or even preprocessing and normalization. It’s also for diagnostics, validation and heuristics — even conversion into structured formats from messes of tags or plain text. And yes, I mean XSLT 2.0 here. If the XSLT you tried offered no temporary trees, native grouping, stylesheet functions and regular expressions, you have no idea what you’re missing.
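To make that less abstract, here is a minimal sketch of the kind of thing I mean, not anyone's production stylesheet. It assumes the sloppy HTML has already been coaxed into a well-formed tree in no namespace, with a flat run of h1 and p elements under html/body. The output element names (book, section, unsectioned, caption, para), the my: prefix and its URN, and the caption pattern are all made up for illustration. It uses three of the four XSLT 2.0 features just named: native grouping, a stylesheet function and a regular expression.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical stylesheet function: does this paragraph look like a
       figure caption? The regular expression is an assumption about the
       input, chosen only to show matches() at work. -->
  <xsl:function name="my:is-caption" as="xs:boolean">
    <xsl:param name="p" as="element()"/>
    <xsl:sequence select="matches(normalize-space($p), '^(Figure|Fig\.)\s+\d+')"/>
  </xsl:function>

  <!-- Start at the body; the head is ignored. -->
  <xsl:template match="/">
    <xsl:apply-templates select="html/body"/>
  </xsl:template>

  <!-- Native grouping: every h1 opens a new section containing
       whatever follows it, up to the next h1. -->
  <xsl:template match="body">
    <book>
      <xsl:for-each-group select="*" group-starting-with="h1">
        <xsl:choose>
          <xsl:when test="self::h1">
            <section>
              <title><xsl:value-of select="normalize-space(.)"/></title>
              <xsl:apply-templates select="current-group()[position() gt 1]"/>
            </section>
          </xsl:when>
          <xsl:otherwise>
            <!-- Diagnostics: content before the first h1 is wrapped
                 and exposed rather than silently dropped. -->
            <unsectioned>
              <xsl:apply-templates select="current-group()"/>
            </unsectioned>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </book>
  </xsl:template>

  <!-- Heuristic markup: promote caption-like paragraphs. -->
  <xsl:template match="p[my:is-caption(.)]">
    <caption><xsl:apply-templates/></caption>
  </xsl:template>

  <xsl:template match="p">
    <para><xsl:apply-templates/></para>
  </xsl:template>

</xsl:stylesheet>
```

Anything the stylesheet does not recognize falls through the built-in template rules, which keep the text and drop the tags. That is crude, but the unsectioned wrapper and the caption heuristic already show where the input departs from what you hoped it was, which is the first step toward asserting control over it.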

This is why, when you hang out with XSLT people these days, you pick up such mixed feelings. On the one hand, there is trepidation. We wonder if we should feel silly wearing last year’s hat, and we know we will be judged for it. On the other, we know that within organizations that use it, XSLT is known, if often grudgingly, as powerful ju-ju. For some, eager to reduce dependencies on skills that are hard to find, this will be an excellent reason to get rid of it. For others, with the skills or the sense to invest in them, it will continue to be a secret weapon as long as they have inputs that will benefit from transparency and control.

In this context, I try to tell myself the future actually looks bright. Daly reminds us that “books aren’t data”, and they aren’t wrong just because a file is invalid to some schema or other. I’ve said the same myself; but she’s not complaining about XML or even about schemas: she’s complaining about the insensitive and clumsy ways XML-based systems have been designed, built and used, about muddle-headedness and misleading promises. Yea, verily, yea. But when will structure (as loose or strict as the case demands), inspection, validation and transformation go out of style?