Microformat proving ground

From perfect grief there need not be

Wisdom or even memory:

One thing then learnt remains to me,—

The woodspurge has a cup of three.


As the Starved Maelstrom laps the Navies

As the Vulture teased

Forces the Broods in lonely Valleys

As the Tiger eased

By but a Crumb of Blood, fasts Scarlet

Till he meet a Man

Dainty adorned with Veins and Tissues

And partakes — his Tongue

Cooled by the Morsel for a moment

Grows a fiercer thing

Till he esteem his Dates and Cocoa

A Nutrition mean

I, of a finer Famine

Deem my Supper dry

For but a Berry of Domingo

And a Torrid Eye



WordPress, with the help of a couple of plugins, is … barely … able to add a layer on top for me to edit CSS, to drive the formatting. The worst problem is not actually that this depends on plugins (and is therefore prone to breakage), but rather that WordPress is unable to save the native HTML dependably. It is evident why this is (for all kinds of reasons WordPress will not allow random HTML injections), but it creates a problem for anyone who … needs more …

Oh! And here’s an Achilles’ heel – the CSS is easily lost. For example, on the site’s front page, the same code that comes out pretty and formatted on the blog post’s page is … busted.

SVG in WordPress

So … what I’ve learned is, with the help of an extension one can indeed get an SVG to appear under WordPress … barely. (Take a look at the little Irish Airman experiment.)

It drops in as media, and WordPress can only link to it, not embed it. I.e., via an img, not by including the SVG in the HTML. This is okay for some purposes, maybe not so great for others, since among other things, it makes controlling the scaling relative to the HTML page next to impossible.

The larger theme is the “too much to know” problem. Because people don’t know how to use SVG, support for it is slow to come. Since support for it is slow to come, no one explores it and learns how to use it. Folks like me (nothing special, I just came in through a side door) are outliers again.

Yeats’s Irish Airman (a visualization)

Yeats's Irish Airman (a programmatic rendering)

A fanciful interpretation of William Butler Yeats’s fantastic poem, “An Irish Airman Foresees his Death”, in SVG. This was drawn (some years ago now) using an XSLT stylesheet working over a rather plain XML version of this poem in four quatrains of tetrameter lines. The “fourness” of this poem suggests its structure might be taken to be that of … a biplane.

If you see nothing, it’s due to a failure either in this platform or in your browser. (I can see it in the preview, but one of the hazards of this kind of work is that I can’t control every link in the chain. And some of them can be rather weak.) Some reflections on SVG in WordPress are coming in another post….

Humanism as waking up to the uncanny and unaccountable

“Visions of the Impossible: How ‘fantastic’ stories unlock the nature of consciousness” is a really well done piece by Jeffrey J. Kripal, raising fundamental questions, to which I too would like some answers. I think the world is thirsty right now for a “humanistic” perspective, in the wider sense. Yet there is also great hesitancy even about asking the questions. No one wants to be made fun of; and this is Trickster stuff, likely to get you in trouble.

A graduate professor of mine once asked me whether I didn’t make the assumption the psyche exists. “Consciousness is primary”, I think, was the way he put it. I admitted I did. (And I do. I think it was an insightful question.)

Is Mary Shelley’s Frankenstein a documentary history?

Frankenstein’s creature, in his story-within-the-story of Mary Shelley’s masterpiece, says (along with much else) to Victor Frankenstein:

I have copies of these letters, for I found means, during my residence in the hovel, to procure the implements of writing; and the letters were often in the hands of Felix or Agatha. Before I depart I will give them to you …

Much later in the book, as Walton describes his transcription of Victor’s account to him, he vouches:

His tale is connected and told with an appearance of the simplest truth, yet I own to you that the letters of Felix and Safie, which he showed me, and the apparition of the monster seen from our ship, brought to me a greater conviction of the truth of his narrative than his asseverations, however earnest and connected. (Emphasis supplied.)

Just to state the obvious: the creature gives Victor copies of the letters of Felix and Safie (his correspondent), as evidence of the truth of his account. (Which is a bit odd, as we have no particular reason to doubt the creature’s story, once we have accepted his existence. Yet there it is.) Later Victor shows these to Walton, presumably to substantiate his own retelling of the creature’s story to him.

So, is the novel a history of how certain papers got to be in Walton’s hands? (No mention is made of what happens to them after Victor’s death.) Does Walton enclose them in his letters to his sister Margaret (presumably the source of the publication)?

How about the other letters described (and sometimes transcribed) in the course of the narrative? Do they also exist as documentary evidence? Is “Mary Shelley” a front for Margaret Saville?

Are XML tags sharp objects?

Start and end tags, no, they are not sharp, despite appearances. They will generally not poke or hurt you as long as you keep them properly closed (that is, every start has its end inside the same parent). Tags written with angle brackets indicate structure, bracing the XML document, holding everything in place. They are your friends.
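
A minimal sketch of what “properly closed” means in practice. The choice of Python's standard xml.etree.ElementTree parser is mine, and the sample strings are made up for illustration: a parser accepts tags that nest cleanly and rejects ones that overlap.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the string as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each start tag has its end tag inside the same parent.
nested = "<poem><line>I know that I shall meet my fate</line></poem>"

# Here <line> starts inside <poem> but ends outside it.
overlapping = "<poem><line>I know that I shall meet my fate</poem></line>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```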

The really bad tags in XML and the ones you have to watch out for are the entity references, the things that start with &. Think about what & means to an XML parser. It sees & and it doesn’t know what comes next. It looks for a name. (Let’s hope it finds a legal name before it hits ;.) Finding a name, it looks it up. (Let’s hope it is able to find someplace to do so.) It splices in what it finds there. It then goes back to where it left off.

This is a precarious operation. Stuff supposed to be “XML” fails to parse all the time, not because its element markup is awry, but because its entities are not resolving correctly, if at all. And if even a single entity reference fails, the document cannot be processed. Use entities only with care. Don’t assume they’re safe just because you’ve seen them a lot elsewhere (such as in HTML).

Note that XML character references look like entity references, but aren’t. It’s pretty safe in XML to refer to a character in Unicode by its number, such as &#10; (the LF character) or (its hexadecimal equivalent) &#xA;.

Watch out for your entity references! They can break your documents when they move across boundaries, if their declarations become lost. To have standalone XML (this means well-formed, but also entirely self-contained) you should avoid any entity references that have to be declared. Which is pretty much all of them: only the five predefined ones (&lt;, &gt;, &amp;, &apos; and &quot;) come for free.
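
To make the difference concrete, here is a small sketch using Python's standard xml.etree.ElementTree (my choice of parser; the sample strings and the helper name try_parse are made up). It feeds the parser a predefined entity, two character references, and a named entity borrowed from HTML with no declaration anywhere in the document.

```python
import xml.etree.ElementTree as ET


def try_parse(text: str) -> str:
    """Return the root element's text, or a short note if parsing fails."""
    try:
        return ET.fromstring(text).text
    except ET.ParseError as err:
        return f"parse error: {err}"


# Predefined entity: needs no declaration.
print(try_parse("<p>Dates &amp; Cocoa</p>"))

# Character references: resolved by number, no declaration needed.
print(try_parse("<p>caf&#233;</p>"))   # decimal
print(try_parse("<p>caf&#xE9;</p>"))   # hexadecimal

# Named entity familiar from HTML, but never declared here:
# the parser rejects the whole document, not just the reference.
print(try_parse("<p>caf&eacute;</p>"))
```

The first three calls come back with text; the last comes back with a parse error, even though the element markup around the reference is perfectly fine.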


“Strategic Reading”: Renear at JATS-Con

Video of JATS-Con 2013(4) is here: Day One; Day Two. My demo of visualizations of data encoded in NLM/JATS XML (or with adjustments, of any XML data) starts around 2:46 of Day Two (I went first in the open session).

But the reason for my post today is Allen Renear’s keynote. (Well worth the watch; it starts at Day Two, 4:58.) As always, Allen is both revelatory and provocative. Every time I hear him, things come into better focus: where all this is coming from, where it’s aiming, and where the rub is happening between fundamental principles and present-day exigencies.

His talk last week was on what he calls “strategic reading”, which I think is both profoundly incisive and, in its way, troubling. Incisive because this is, indeed, the shape of things to come. What Allen describes sounds correct, both as a description of what is happening, and as a tendency and a trend. Troubling because I, for one, can’t help being concerned about what we risk losing next to what we gain.

Don’t get me wrong: I am all in favor of “strategic reading”, and I do quite a bit of it myself. Allen is describing the way we now scan and read at once, dipping in and out, making assessments at a distance, making choices even before we read, before we commit time and effort to deeper engagement. Electronic media and (where we have it) the strong encoding behind it (behind them, I am quick to correct myself) facilitate this inasmuch as they allow us to aggregate and filter according to criteria selected in advance–to foreground, highlight and dramatize significant content before we even know it is there. (Back to my demo on visualization of document structures.) Especially in an age of information overload, when so much of what we see is only a distraction, this is necessary and inescapable. We will even have serendipity engines, or so they tell us. (Unless that’s another case of Artificial Intelligence Meets Natural Stupidity.)

Yet at the same time, the other voice says this isn’t really reading, but a strategic avoidance of reading. Not that that isn’t perfectly fine, in its way (we’re certainly not going to read all that stuff). But it doesn’t offer the rewards that I once learned can be won, with effort, from a well-wrought text, serving as an occasion for a kind of contest of mind, a discipline of attention with an unknown outcome.

So much of what we “learn” isn’t learning at all, but only reinforcement. We only become more like what we were already (as Gertrude Stein said of Americans between the wars). I can’t help but wonder whether this is enough. I also want to be changed by what I read.


XSLT and e-publishing, past and future

One link led to another, and so I found myself reading Liza Daly on “The unXMLing of digital books”, from about a year ago (February 2013). She also links to a nice presentation by John Maxwell, “The Webby Future of Structured Markup: Not your father’s XML”, also worth a look.

Both of these fine considerations give me the crawlies. I mean, not only do I agree with them on many key points regarding appropriate and inappropriate uses of structured markup (see my 2011 Balisage paper on some of this), but also, they evidently represent a trend. Which scares me. It’s hard not to wonder how much of this, for us die-hard structured markup fanatics, should be seen as writing on the wall, even if the fact that XML has now become unfashionable was predictable as soon as it got hot back in the early twenty-aughts. What goes up must come down: for something to be fashionable is precisely for it to be given credit it does not deserve (no, it’s not a panacea, we kept saying), and fashion inevitably fades into embarrassment, then nostalgia, if it does not harden into ideology. And I’ve never been interested in XML for the ideology of it. So is there anything left?

Yet peel away a layer, and what both Daly and Maxwell say isn’t as hostile to XML as you might think. On the contrary: both of them allow, more or less explicitly, that there is a nugget in the dross. The question is how to keep it, indeed what we might make it into if we recognize and take care with it.

And this is what leads me back to XSLT. Daly is pleased that in her publishing system, she can just drop in the HTML and go. Whee! CSS gives her all the layering she needs. (Except it doesn’t … she talks about processing requirements for filtering, indexing and aggregation that uncontrolled HTML can’t address.) “We have very little preprocessing necessary; XSLT, which is hard to learn and harder to master, is almost absent from our workflow.” Interesting … so even with her fault-tolerant HTML toolkit, she has some need for preprocessing and some XSLT. Maybe eventually they’ll find a way to mothball that stuff too, presumably as soon as they find something else that meets the need half as well.

XSLT is hard to learn and hard to master. Who am I to say it isn’t? (If that wasn’t my experience, I’m willing to admit that mine is a special case, and not because I’m so smart. I’m only lucky or unlucky as the case may be.) The flip side is — it can be learned. (I’m also willing to help. As a professor of mine once said, “In Deutschland auch die Kinder sprechen Deutsch” — in Germany, even the children speak German!) And nothing else (at least out here in the free market) comes close to XSLT in power, adaptability and fitness for the particular class of problems for which it is designed — a class of problems that is central in the publishing space. XSLT is necessary enough to have been reinvented any number of times, when people have decided they would prefer to do without it, for reasons of platforms, or markets, or culture, or aesthetics.

Given Daly’s complaints about XML, the irony is that XSLT’s strength lies in dealing with data sets that are only poorly controlled. I mean, if your data is well controlled and as granular as you need, by all means use RDF or RDBMS or OO: go to town, enjoy, and consider yourself lucky. But XSLT needs only a tree; if you can get one out of your HTML, however sloppy it is, that’s good enough to get you in. You can even start with a data brick if you think of it as a tree with just one node. Of course what you can do with that tree depends on your level of control; but one of the things you can do in any case is expose the issues and start to assert the control you need. XSLT isn’t just for transformation into presentation formats or even preprocessing and normalization. It’s also for diagnostics, validation and heuristics, even conversion into structured formats from messes of tags or plain text. And yes, I mean XSLT 2.0 here. If the XSLT you tried offered no temporary trees, native grouping, stylesheet functions and regular expressions, you have no idea what you’re missing.
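
The “expose the issues” step can be sketched without any XSLT at all. What follows is a Python stand-in for the idea, not XSLT and not anyone's production workflow: it uses the forgiving html.parser module from the standard library to take inventory of some deliberately sloppy markup. The class name, function name and sample are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagInventory(HTMLParser):
    """Tally start and end tags found in loosely controlled HTML."""

    def __init__(self):
        super().__init__()
        self.starts = Counter()
        self.ends = Counter()

    def handle_starttag(self, tag, attrs):
        self.starts[tag] += 1

    def handle_endtag(self, tag):
        self.ends[tag] += 1


def diagnose(html: str) -> dict:
    """Report element counts and which elements were opened but not closed."""
    inventory = TagInventory()
    inventory.feed(html)
    inventory.close()
    # Counter subtraction keeps only the positive differences.
    unclosed = inventory.starts - inventory.ends
    return {"elements": dict(inventory.starts), "unclosed": dict(unclosed)}


# Made-up sample: the second <p> and the <em> are never closed.
sloppy = "<div><p>One thing then learnt</p><p>remains <em>to me</div>"

report = diagnose(sloppy)
print(report["elements"])  # {'div': 1, 'p': 2, 'em': 1}
print(report["unclosed"])  # {'p': 1, 'em': 1}
```

This naive count would also flag void elements such as a bare br as unclosed, so treat the report as a prompt for inspection rather than a verdict. The point is only that even tag soup yields enough of a tree to start asking questions of it.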

This is why, when you hang out with XSLT people these days, you pick up such mixed feelings. On the one hand, there is trepidation. We wonder if we should feel silly wearing last year’s hat, and we know we will be judged for it. On the other, we know that within organizations that use it, XSLT is known, if often grudgingly, as powerful ju-ju. For some, eager to reduce dependencies on skills that are hard to find, this will be an excellent reason to get rid of it. For others, with the skills or the sense to invest in them, it will continue to be a secret weapon as long as they have inputs that will benefit from transparency and control.

In this context, I try to tell myself the future actually looks bright. Daly reminds us that “books aren’t data”, and they aren’t wrong just because a file is invalid to some schema or other. I’ve said the same myself; but she’s not complaining about XML or even about schemas: she’s complaining about the insensitive and clumsy ways XML-based systems have been designed, built and used, about muddle-headedness and misleading promises. Yea, verily, yea. But when will structure (as loose or strict as the case demands), inspection, validation and transformation go out of style?


Theory of generalized markup, 2014

On xml-dev, Arjun Ray posts a link to Charles Goldfarb’s seminal paper on the theory of generalized markup and its application in SGML, which was first published in 1981 and subsequently revised and included, in 1986, as Annex A to the SGML standard (ISO 8879).

Thirty years along we have the web, and everything has changed, yet nothing has changed. The core of Goldfarb’s argument is the same lesson taught daily to neophyte web developers on how much better things are when you hang your styles on “semantic” class attributes.

Such labels are useful today because the elements on which they sit (p, td, li, div, span, what have you) have almost no “descriptive” semantics of their own (a p marks a paragraph?!): they have been reduced to their operations, viz. their effects in the browser. (No, p only starts a new line, with some vertical white space. Or not, as the case may be.) The pretense of HTML5 to mitigate this trend with new semantic elements like article and aside acquires a poignant irony when we reflect that it can last only as long as these elements are not used (or abused, if that’s how you look at it) to do stuff in the browser that has nothing to do with what they “are” or are “supposed to be”. At that point (which has undoubtedly already passed), the semantics of article, in HTML, become as vacuous as those of p. It means only what you say it means when it does what it does.

Similarly, I wryly note how the WordPress interface into which I type gives me an HTML strong element for its B button and an em for its I button, and how I then use strong to signal what might be term or gi in (“descriptive”) TEI, and em for what might be soCalled. (Someone somewhere has ruled that HTML b is bad and strong is good — it’s semantic! So I get strong whether I like it or not.) And yet, seeing only bold for my strong and italics for my em, you know well enough what I mean. Semantics are so sneaky!

This tug of war has gone on long enough to suggest that it cannot be won. Thus, having emerged as a de facto standard for formatting publications even off line — and accordingly reduced to the “presentational”, for good or ill — HTML kindly permits us, in order that we may do what we need, to sneak our semantics back into our markup. The fact that the application of class attribute values is so hard to constrain, particularly in comparison to the rigid document types imposed by SGML (Goldfarb calls them “rigorous”), is both a terrible weakness and a secret strength.
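
Since HTML itself will not constrain class values, any constraint has to be bolted on from outside. Here is a rough Python sketch of one way that might look, assuming a hypothetical house vocabulary borrowed from the TEI names mentioned above. The vocabulary, the names ALLOWED, ClassAudit and audit, and the sample markup are all invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical house vocabulary; nothing in HTML enforces it.
ALLOWED = {"term", "gi", "soCalled"}


class ClassAudit(HTMLParser):
    """Collect class values that fall outside an agreed vocabulary."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.unexpected = []  # (element name, class token) pairs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may carry several space-separated tokens.
                for token in value.split():
                    if token not in self.allowed:
                        self.unexpected.append((tag, token))


def audit(html, allowed=ALLOWED):
    """Return the (element, class token) pairs not found in the vocabulary."""
    checker = ClassAudit(allowed)
    checker.feed(html)
    checker.close()
    return checker.unexpected


sample = (
    '<p>The <strong class="gi">p</strong> element is '
    '<em class="soCalled">semantic</em>, '
    'or <em class="so-called">presentational</em>.</p>'
)
print(audit(sample))  # [('em', 'so-called')]
```

The weakness and the strength show up together: a variant spelling slips into the markup without complaint, and yet a few lines of checking are enough to find it once you decide you care.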

Does it seem paradoxical that an XML enthusiast should see any good in all this redoubled reversing? I hope not. Why we will never have a fully comprehensive descriptive markup language (after many valiant attempts) is more interesting than the simple fact of it. And the point of XML is not, it seems to me, what SGML so often presumed, to enable “true description” of our information. It is to achieve better layering in systems design, to be more flexible, more expressive, more graceful. As for HTML, if we didn’t have it, we’d have to invent it. And then we’d have to invent CSS to go with it.