XSLT and e-publishing, past and future

One link led to another, and so I found myself reading Liza Daly on “The unXMLing of digital books”, from about a year ago (February 2013). She also links to a nice presentation by John Maxwell, “The Webby Future of Structured Markup: Not your father’s XML”, also worth a look.

Both of these fine considerations give me the crawlies. I mean, not only do I agree with them on many key points regarding appropriate and inappropriate uses of structured markup (see my 2011 Balisage paper on some of this), but also, they evidently represent a trend. Which scares me. It’s hard not to wonder how much of this, for us die-hard structured markup fanatics, should be seen as writing on the wall, even if the fact that XML has now become unfashionable was predictable as soon as it got hot back in the early twenty-aughts. What goes up must come down: for something to be fashionable is precisely for it to be given credit it does not deserve (no it’s not a panacea, we kept saying), and fashion inevitably fades into embarrassment, then nostalgia, if it does not harden into ideology. And I’ve never been interested in XML for the ideology of it. So is there anything left?

Yet peel away a layer, and what both Daly and Maxwell say isn’t as hostile to XML as you might think. On the contrary: both of them allow, more or less explicitly, that there is a nugget in the dross. The question is how to keep it, indeed what we might make it into if we recognize and take care with it.

And this is what leads me back to XSLT. Daly is pleased that in her publishing system, she can just drop in the HTML and go. Whee! CSS gives her all the layering she needs. (Except it doesn’t … she talks about processing requirements for filtering, indexing and aggregation that uncontrolled HTML can’t address.) “We have very little preprocessing necessary; XSLT, which is hard to learn and harder to master, is almost absent from our workflow.” Interesting … so even with her fault-tolerant HTML toolkit, she has some need for preprocessing and some XSLT. Maybe eventually they’ll find a way to mothball that stuff too, presumably as soon as they find something else that meets the need half as well.

XSLT is hard to learn and hard to master. Who am I to say it isn’t? (If that wasn’t my experience, I’m willing to admit that mine is a special case, and not because I’m so smart. I’m only lucky or unlucky as the case may be.) The flip side is — it can be learned. (I’m also willing to help. As a professor of mine once said, “In Deutschland auch die Kinder sprechen Deutsch” — in Germany, even the children speak German!) And nothing else (at least out here in the free market) comes close to XSLT in power, adaptability and fitness for the particular class of problems for which it is designed — a class of problems that is central in the publishing space. XSLT is necessary enough to have been reinvented any number of times, when people have decided they would prefer to do without it, for reasons of platforms, or markets, or culture, or aesthetics.

Given Daly’s complaints about XML, the irony is in how XSLT’s strength is in dealing with only poorly controlled data sets. I mean, if your data is well controlled and as granular as you need, by all means use RDF or RDBMS or OO: go to town, enjoy, and consider yourself lucky. But XSLT needs only a tree; if you can get one out of your HTML, however sloppy it is, that’s good enough to get you in. You can even start with a data brick if you think of it as a tree with just one node. Of course what you can do with that tree depends on your level of control; but one of the things you can do in any case is expose the issues and start to assert the control you need. XSLT isn’t just for transformation into presentation formats or even preprocessing and normalization. It’s also for diagnostics, validation and heuristics — even conversion into structured formats from messes of tags or plain text. And yes, I mean XSLT 2.0 here. If the XSLT you tried offered no temporary trees, native grouping, stylesheet functions and regular expressions, you have no idea what you’re missing.
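To make the diagnostic idea concrete, here is a minimal sketch. It is written in Python rather than XSLT, purely as a stand-in, and it assumes nothing beyond the standard library. It pulls a rough tree out of sloppy HTML and reports what is actually in there: which element paths occur, which class values are in use, and which end tags match nothing. The names `TreeAudit` and `audit` are hypothetical, invented for this illustration.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TreeAudit(HTMLParser):
    """Tally element paths and class values while walking sloppy HTML.

    Simplification (an assumption of this sketch): HTML's implied-end-tag
    rules are not applied, so a second <p> nests inside an unclosed first one.
    """

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements, outermost first
        self.paths = Counter()     # e.g. "div/p/span" -> occurrences
        self.classes = Counter()   # class token -> occurrences
        self.problems = []         # end tags with no matching open element

    def handle_starttag(self, tag, attrs):
        self.paths["/".join(self.stack + [tag])] += 1
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # e.g. <br/> or a stray </br>: nothing to close
        if tag in self.stack:
            # Close the nearest matching element, implicitly closing
            # anything left open inside it.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append(tag)


def audit(html_text):
    """Parse html_text and return the populated TreeAudit instance."""
    parser = TreeAudit()
    parser.feed(html_text)
    parser.close()
    return parser


if __name__ == "__main__":
    report = audit('<div class="note"><p>One<p class="note">Two</i></div>')
    print(report.paths.most_common())
    print(report.classes.most_common())
    print(report.problems)
```

Nothing here fixes the input. The point is only that once a tree exists, however rough, the mess becomes something you can count and query, which is the first step toward asserting control over it.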

This is why, when you hang out with XSLT people these days, you pick up such mixed feelings. On the one hand, there is trepidation. We wonder if we should feel silly wearing last year’s hat, and we know we will be judged for it. On the other, we know that within organizations that use it, XSLT is known, if often grudgingly, as powerful ju-ju. For some, eager to reduce dependencies on skills that are hard to find, this will be an excellent reason to get rid of it. For others, with the skills or the sense to invest in them, it will continue to be a secret weapon as long as they have inputs that will benefit from transparency and control.

In this context, I try to tell myself the future actually looks bright. Daly reminds us that “books aren’t data”, and they aren’t wrong just because a file is invalid to some schema or other. I’ve said the same myself; but she’s not complaining about XML or even about schemas: she’s complaining about the insensitive and clumsy ways XML-based systems have been designed, built and used, about muddle-headedness and misleading promises. Yea, verily, yea. But when will structure (as loose or strict as the case demands), inspection, validation and transformation go out of style?


Theory of generalized markup, 2014

On xml-dev, Arjun Ray posts a link to Charles Goldfarb’s seminal paper on the theory of generalized markup and its application in SGML, which was first published in 1981 and subsequently revised and included, in 1986, as Annex A to the SGML standard (ISO 8879).

Thirty years along we have the web, and everything has changed, yet nothing has changed. The core of Goldfarb’s argument is the same lesson taught daily to neophyte web developers on how much better things are when you hang your styles on “semantic” class attributes.

Such labels are useful today because the elements on which they sit (p, td, li, div, span, what have you) have almost no “descriptive” semantics of their own (a p marks a paragraph?!): they have been reduced to their operations, viz. their effects in the browser. (No, p only starts a new line, with some vertical white space. Or not, as the case may be). The pretense of HTML5 to mitigate this trend with new semantic elements like article and aside acquires a poignant irony when we reflect that it can last only as long as these elements are not used (or abused, if that’s how you look at it) to do stuff in the browser that has nothing to do with what they “are” or are “supposed to be”. At that point (which has undoubtedly already passed), the semantics of article, in HTML, become as vacuous as those of p. It means only what you say it means when it does what it does.

Similarly, I wryly note how the WordPress interface into which I type gives me an HTML strong element for its B button and an em for its I button, and how I then use strong to signal what might be term or gi in (“descriptive”) TEI, and em for what might be soCalled. (Someone somewhere has ruled that HTML b is bad and strong is good — it’s semantic! So I get strong whether I like it or not.) And yet, seeing only bold for my strong and italics for my em, you know well enough what I mean. Semantics are so sneaky!

This tug of war has gone on long enough to suggest that it cannot be won. Thus, having emerged as a de facto standard for formatting publications even off line — and accordingly reduced to the “presentational”, for good or ill — HTML kindly permits us, in order that we may do what we need, to sneak our semantics back into our markup. The fact that the application of class attribute values is so hard to constrain, particularly in comparison to the rigid document types imposed by SGML (Goldfarb calls them “rigorous”), is both a terrible weakness and a secret strength.

Does it seem paradoxical that an XML enthusiast should see any good in all this redoubled reversing? I hope not. Why we will never have a fully comprehensive descriptive markup language (after many valiant attempts) is more interesting than the simple fact of it. And the point of XML is not, it seems to me, what SGML so often presumed, to enable “true description” of our information. It is to achieve better layering in systems design, to be more flexible, more expressive, more graceful. As for HTML, if we didn’t have it, we’d have to invent it. And then we’d have to invent CSS to go with it.

Constantly revising

I can’t help myself: I tweak and tweak. I’m a little worried about it. Is the blog post you read anything like the blog post I wrote? That’s a problem with which I am well familiar, and I am willing to live with it. But how about the blog post I wrote: is that one anything like the one I wrote? That’s what’s got me nervous.

In my experience, under the pressure of revision, writing tends to set and harden, like plaster. Eventually there isn’t much more you can do with it without breaking it into pieces. (Maybe you can then make something of the salvage job; maybe not.) We’ll see if that happens here, if these posts eventually settle. If they do, maybe you and I have a chance.

The ungainliness of XML

XML and XSLT are ungainly because they are the products of evolutionary processes. Neither is a first-generation technology, but rather a refinement of something that had gone before. (XML was spawned from SGML; XSLT’s roots are in DSSSL with some Omnimark admixture.) You might think this should streamline them: and indeed it has, if you compare them to their progenitors. Yet they also have their histories written onto and into them: they are not pretty, but somewhat lopsided and peculiar.

(There are some smart people who try to avoid XML and XSLT completely, partly on the basis of their various oddnesses. These people may or may not be able on their own to make something work as well as XML does, but that is a separate question. Another separate question is whether they can help other people to use their thing for something just a little different from what it was designed for.)

Yet this ungainliness is also part of the strength and charm of XML/XSLT, once you learn to look past the flaws on the surface. It results from the fact that both have to address a wide range of conflicting requirements, well enough. And this, they do.

The more hidden strength — the way well-described, well-managed XML data can be kept safe away from the storms of technological change in the browser (or anywhere else) — only becomes evident over time.


Reflection on blogging platforms

To someone who doesn’t know coding, operating a blogging platform will be akin to divination. Placate a harsh and inscrutable god. Throw the dice and see how they land. Win or lose, the oracle tells you. Good luck! And keep in mind, even when you win — you never know when it will all be taken away!

To someone who does know coding? A test of patience, an exercise in compromise. Sort of like cooking in someone else’s kitchen. The results are not inevitably bad….

Inkblot here we come

Twenty Fourteen (what came with the installation) wasn’t half bad, but what tipped me over the edge was the hard-pixel encoding of the CSS. I’m a true believer in relative sizing. Maybe this had nothing to do with how frustrating I was finding it to make the modifications I needed, or maybe not.

The topic is WordPress themes, if the foregoing made no sense to you at all.

So I looked again, and as of now I’m customizing Michael Sisk’s Inkblot. This is solid stuff that reconciles me greatly to having to live in an HTML world: flexible, straightforward, clean and clear, and as old-fashioned as I want it to look without a lot of fuss. Fantastic work.

Styling experiments

One does want to know to what extent WordPress is flexible and transparent enough to support customization, and not only by an engineer familiar with its inner workings, but also by the poor, plain, pained user.

Custom tagging? This HTML ‘blockquote’ element has a @class attribute provided by hand in the source. It would be nifty if that gave me a handle for styling it.

If all were well, that would have come out purple … so, it breaks! Back to the drawing board … so it seems that blockquote/@class gets stripped in back. What about arbitrary spans? Or homemade @style values?

Early report … blockquote appears not to be safe. But the spans are pushed through. One can hope divs might also be.

Here’s my special div, to which I would like to assign a hanging indent, which I will do in the CSS….

Work at the house

There will be pictures. Today, Gary the floor guy is installing our new cork floors.

This all started because we decided we need windows upstairs that actually keep the weather out.

But it didn’t make much sense to get them fixed and then move to the floors without also giving the walls and ceilings fresh paint. Our new colors (from the Sherwin-Williams collection): Softened Green (a sagey color) in the office; Optimistic Yellow in the spare room (Daffodil was too bright); Daydream (a kind of light purplish blue) in the master bedroom; Medici Ivory in the hallways and for frame and baseboard trim (everywhere); and Copper Wire as an accent wall color next to the stairway down. We chose the colors, of course, for their names, just as one chooses wine for the design on the bottle.

The carpet being removed is colored Allergenic Grey. It was never very pretty and nothing in comparison to the brown-red cork flooring that is now replacing it.

Some accounting for the nothing of it

(The move to meta happens right away. No! Bad blog! Get it back on its leash.)

So finally I decided I needed a place to log and document current projects and interests, in the hope that some interaction with interested parties might offer me some guidance for the future. I need inputs; the only way to get them is to produce more outputs.

Then too, there’s the question of the “natural form” that any blog tends to take, whether professional or personal, and whether the lines between professional and personal are clear or blurry.

A major motivation for me is discovering that an 18-month lag time for major academic media publication makes for disjunction and asynchronization between levels of attention off and on the web. On the web, things come and go quickly: the half-life for attention is probably days. And I need that tighter feedback loop. Yet to meet needs other than simple diversion or entertainment, one needs things to be able to age. So one is writing always for the future as well as the present. This makes for bad writing. Probably the best productions on and for the web are written in the spirit of “here today, gone tomorrow”, even if the hope is that from all the dross, some metal might eventually be mined.

Rather than overthink it (always my tendency), I am going to try letting the strategy emerge. Welcome, dear reader, and please let me know what you think.