
Viral properties of schemas


Markup languages are memes as well as memetic, semantic and sometimes mimetic systems.

(Within XML, think TEI, Docbook, NLM/JATS, DITA, MODS/METS, vertical schemas. HTML, fer chrissake. Markdown. Grammar-based formats. Object property sets. Application binaries.)

I have written about markup languages considered as games; they can also be considered as genes. Indeed one might see a document (here in the very narrow, XML/SGML sense of that word) as phenotype to a schema’s genotype. The fact that formally, documents are always already valid to infinities of unspecified schemas (since an infinite number of valid schemas can be trivially and arbitrarily generated for a given document), obscures the fact that “in the real world” (that is, in the wider contexts of our knowing of it), a document does not stand merely as a formal abstraction (a collection of element types), but also has a history. As such, it is representative of its history as it is of its (purported) representations. That is, the aircraft maintenance manual is not only a description of an aircraft; it is also a maintenance manual. This history, often, implicates a particular schema.
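The formal point, that valid schemas can be generated trivially and arbitrarily for any given document, can be made concrete with a small sketch. It assumes a document without namespaces and uses DTD syntax only because it is compact; the function name `trivial_dtd`, the `padding` parameter and the sample document are all invented for illustration. Nothing here validates anything: it only emits declarations permissive enough that the document should be valid against them.

```python
import xml.etree.ElementTree as ET


def trivial_dtd(xml_text: str, padding: int = 0) -> str:
    """Build a permissive DTD that the given document should be valid against.

    Every element type found in the document is declared with content
    model ANY, and every attribute seen is declared as optional CDATA.
    `padding` adds that many declarations for element types the document
    never uses: each value yields a different schema, and the document
    stays valid against all of them. Assumes no namespaces.
    """
    root = ET.fromstring(xml_text)
    attributes = {}  # element name -> set of attribute names seen on it
    for element in root.iter():
        attributes.setdefault(element.tag, set()).update(element.attrib)

    lines = []
    for name in sorted(attributes):
        lines.append(f"<!ELEMENT {name} ANY>")
        for attr in sorted(attributes[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")

    # Pad with unused element types, skipping any name already declared.
    added, counter = 0, 0
    while added < padding:
        name = f"unused{counter}"
        counter += 1
        if name not in attributes:
            lines.append(f"<!ELEMENT {name} ANY>")
            added += 1
    return "\n".join(lines)


# Hypothetical document, invented for this sketch.
doc = '<manual id="m1"><task><step>Check the oil.</step></task></manual>'
print(trivial_dtd(doc))
print(trivial_dtd(doc, padding=2))
```

Since `padding` can be any non-negative integer, the family of schemas is unbounded, and none of them says anything about where the document came from.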


Schemas:

  • Replicate (are copied across file systems)
  • Are modified and adapted
  • May be identified with “species”, which evolve over time
  • Are embedded in local systems, but also enable discrete semantic “environments” (which may be portable)
  • Occasionally cross the barriers between systems (often to be adapted on the other side)
  • Sometimes include mechanisms for their own adaptation

(BTW these are also all true of documents, at a more discrete level. Indeed, schemas are among the mechanisms that documents provide for their own adaptation. Note how you can stretch the concepts of “document” and “schema” here very widely, to all kinds of media, especially if you allow for the idea that schemas may exist implicitly without being codified.)

Unlike living systems (but perhaps like viruses), schemas cannot be said to have volition of their own. They get their volition from the environments in which they are promulgated. Perhaps they are more plant-like than animal-like.

Also the question arises as to whether (and to what extent) they are parasitic or symbiotic. One suspects they have to be symbiotic in order to encourage adoption. However, they clearly get much of their power from their network effects (the more people using HTML, the more useful HTML becomes to use), and at a certain point, this may introduce stresses between the local goals (of HTML users themselves) and the interests that promote HTML despite poor fitness to local goals.

Schemas are also the deliberate, logical/social/linguistic creations of people and of organizations of people. Can they be this, and also “viral” at the same time?

Single-serving egg nog


Perfect! Especially in a household with only one egg consumer —


I happen to have simple syrup on hand (it’s a good cheat when making old-fashioned or Sazerac cocktails), which I’ll use instead of the fine sugar, or maybe I’ll find some of that, or even a little brown sugar. The nutmeg and vanilla extract will be premium stuff (Mexican vanilla from Penzeys I think). I suggest pre-mixing for up to a month (there really is science to show that the microbes die off in eggnog if the proof is what it should be). Yes, this means you must plan ahead.

I love the advice here about bourbon and rum etc. I am not above making a triple-barrel nog: rye or bourbon, rum and a teeny smidge of cognac. Not that you’ll be able to taste it. There is nothing wrong with straight rye or bourbon for this application.

The nice thing about making nog by the egg is that you know exactly what you’re getting — one egg, one shot (hefty or not), one tablespoon sugar etc. This is way better than having a quart pitcher of eggnog in front of you, with no idea of what’s in it — then finding the pitcher is half gone, you need a nap (have become tired and emotional) and you have no room for dinner.

In any case, it deserves a cheer either way. Assuming you use good ingredients (farm egg and milk, spirits that are not too good to share with a friend, quality sweetener, no chemicals), there is nothing so delicious and restorative in the winter dark as a good egg nog.


Model and Process, part I


Being called on to pinch-hit for a colleague at GSLIS (and seriously, it’s an honor to be asked), I am today pondering the relation between “document modeling” (especially but not only as a sort of “data modeling”) and the design and implementation of processes, that is, actual workflows that perform actual operations and achieve actual results.

To describe it thus is to see this immediately as a deep set of questions. (Not only are the questions themselves deep, but the set is deep.) Yet many or even most of us who are students of these dark arts rarely ponder them much, pretty much going on our ways developing models and deploying (and even operating) processes, without thinking much about how truly dependent on one another these are.

It is not that one must always devote a document model to a process: document models can be designed independently of any actual or particular process, and they have been, so much so that it puts me in mind of what Mark Twain is said to have said when asked if he believed in infant baptism: “Not only do I believe in it; I’ve seen it”. Indeed, this activity is theoretically necessary (or at least that argument can be made), and to design such models (to be “application independent”), and to design systems that support the design (and yes, ironically, the processing) of such models, is work of paramount importance. Yet at the same time, it is only when we actually try to do things with actual data (leveraging our modeling, that is to say, and capitalizing on and profiting from our investment in it) that we discover the true outlines and limits set by our models along with (and reflecting) their powers. (Well, that is not strictly true, as some people are possessed of enough perspicacity to envision and thus anticipate the limits of a design without actually running the thing. But these people are rare, and tend not to be listened to much in any case.)

Thus there is a theoretical as well as a practical interest in process, as well as model, as indeed there can be an abstraction of process too — models of process, as are specified in multiple layers of our system in any case, in its various software components designed to interface with each other in various ways. It’s models all the way down. But what enables the models to layer up is the processes in which they are expressed and transformed.

Maybe “model” and “process” are simply projections of “space” and “time” in the imaginal worlds we construct and actuate in our systems building and operation? Space can be thought of without time, if geometric relations can subsist among however many spatial directions there may be, straight or bent, outside of time. (Let no one come in without geometry, as it said over the door of Plato’s Academy.) But time moves, is not static, and in one direction only, even as it ebbs and flows and branches and aligns, as time lines (it may be, however we might define such a thing) cross and rejoin other threads or strands of themselves. With time, space is more than just a cold, abstract unchanging field of potential. It has energy, becomes capable of transformation, a setting for organic emergence.

Is this what we also see within the simulacra we build in the machine, whose laws of physics are so different from ours? Add process to model, that is, and you really get legs. A process must always have or work with a model, even if only implicitly, so the model comes first. But it is only when we put fuel in the line and start cranking that we find out how the engine runs.

XML vs/and HTML5


One thing we learned at the Balisage 2014 Workshop on “HTML5 and XML: Mending Fences” of two weeks ago is how vast a gulf is …

I thought the day went pretty well. Highlights included a knock-down survey by Steve DeRose on requirements for hypertext (not always so well addressed in the web stack); a demo by Phil Fearon showing the level of polish and performance that can be achieved today (at least by an engineer of his caliber) with XML/XSLT/SaxonCE in the browser; and the redoubtable Alex Miłowski reflecting ambivalence (or at least this is the way I took it, very sympathetically): regret for missed opportunities and concern for the future, mixed with an unshakeable sense of opportunities still in front of us.

Most of us in the room were ambivalent, probably, albeit all in our own ways. We were treated, by Robin Berjon (who did a fantastic job helping the Balisage conference committee organize and run the event) and by Google’s fearless and indomitable Domenic Denicola, to an examination of what HTML5 will offer us, including features (web components) that promise the capability of hiding “shadow DOMs” in the browser presenting arbitrary markup vocabularies (which in principle include descriptive markup languages) and binding them to browser semantics, allowing them to be handled and processed in CSS and from script using standard APIs. Awesome stuff.

On the other hand, web components (notwithstanding prototypes) are still a distance off, and no one knows like this crowd how the most well-meaning standards initiatives can go awry. (Can anyone say “XML namespaces”?) Plus, this was an audience already satisfied (at least for the most part) that traditional XML document processing architectures — layered systems often running where browsers never see them — are meeting our needs. Not only has XML not failed us, it is a phenomenal success; on top of the many excellent tools we have, all we really want is more and better tools (sure, we’ll take web components); better integration in application stacks of all kinds; and — above all — more curiosity, sympathy and maybe even understanding from ambitious hot-shot developers who aspire to make the world a better place.

I mean, we came to listen, and we did, but I’m not sure anyone was there to hear us but us.

I hasten to add, of course, that this isn’t (or isn’t just) a matter of feeling unappreciated. To be sure, an audience that included technical staff from organizations such as (just to name a few) the US House of Representatives, the Library of Congress, NCBI/NLM (publishers of PMC), NIST, ACS, OSA, and other sophisticated publishers both large and small — people who use XML daily to get the job done (and web browsers too, if only for the last mile) — found it a bit jarring to hear our tried-and-true solution (which indeed is such close kindred to the web) described as a “pet project”, and therefore deemed unworthy of the attention of an important company that builds browser software. But this wasn’t the issue. More than anything, we need not recognition or respect (heck, this is a crowd of introverts, happy not to get too much attention) — but support — young blood, new, active and imaginative developers who will help us not to retire and forget our working systems, but to maintain, extend and improve them.

Instead, we are being offered a new platform on which to relearn old lessons, however disguised they will be in new syntax and technical terminology. And what is worse — the message we hear being told to others is that the successful technologies and solutions we have built do not matter, will soon obsolesce, and deserve no further consideration in the wider economy, to say nothing of investment.

Yes, I exaggerate! I didn’t hear any of this said, at least in so many words, by anyone last August 4. These were just implications hanging in the air.

Yet the sense was unmistakeable that these two cultures were frankly baffled, each by the other. One culture (“the web”?) deliberately limits its scope of interest to the web itself – necessarily and rightly so – and so it must perforce assume that the HTML web and its browser (name your favorite vendor here) are the be-all-end-all, the only communications medium a civilization would ever need. (I know this is a caricature here. Feel free to argue against it.) Another culture (call it “XML”) values standards in text-based document markup not because they are a driver for adoption of anything in particular, but when and as they support adaptability and heterogeneity — of applications and of needs and purposes — on a robust, capable, versatile, open and non-proprietary technical ecosystem — that is, not on one platform or another, but on whatever platforms work best, today and then (differently) tomorrow.

So why are XML developers regarded now as lacking vision? Because we live in a world bigger than the web, whose edges we do not claim to see?

Set this aside, the little voice tells me: it doesn’t really matter. Instead, come back to that unshakeable sense of opportunity that Alex Miłowski communicated. This could work: this does work. We have XML, XSLT, XQuery: the tools are there, and the work is being done. There is no shortage of strong ideas circulating in the XML space. (Over the course of the next few days, Balisageurs including David Lee, Hans-Jürgen Rennau, John Lumley, Abel Braaksma and others showed this well enough.) And HTML5 does not have to be a competitor any more than other formats, both data sources and transformation targets: like PDF, HTML, CSV, you name it, HTML5 will be a tool for us to use, for the work it is good for.

Looking at the OHCO at Balisage 2014


The Balisage 2014 program is up, and I’m on it, Thursday Aug 7 at 9am. See

The LMNL project I started over a decade ago with Jeni Tennison (or the “rump version” thereof, since everyone else has moved on to other things) is still offering me intellectual rewards. It turns out a range model like LMNL’s is very useful for addressing the question of what one is describing when one marks up a document, because it forces no predisposition to one hierarchical rendition or another (or any) before the document description is mature. So the “ontology” of the markup can remain much looser, to develop iteratively.

This means that LMNL is useful not only for models of texts that have honest MCH (multiple concurrent hierarchies), such as models of poetry that present both verse and grammatical (sentence/phrase) hierarchies together, but also for examining texts that show structural anomalies — the kind of thing that makes us wonder whether and where the hierarchies and “containment” are even to be found.

Since Mary Shelley’s Frankenstein shows such a structural anomaly, it makes an interesting case study. Formally, it’s a hybrid between an epistolary novel (in its framing narrative) and a more conventional first-person narrative (with a long embedded narrative in the center). But overlap between these two structures (at least in almost every printed edition) gives pause: it frustrates a clean and simple representation in a hierarchical model such as a conventionally encoded XML version, and raises questions about the coherence of its representation. Is this a feature (of a gothic horror romance), or a bug?

Interestingly, LMNL offers a way to demarcate the parts of the book without organizing them into a single hierarchy or any hierarchy at all. This turns out to be useful for asking questions about this work and hence about the idea of the OHCO in general, as applied to texts that were not already committed to hierarchical forms in their composition.

Or were they? LMNL helps us ask.
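To make the idea of ranges that overlap rather than nest a little more tangible, here is a minimal sketch of the verse-versus-sentence case. It is not LMNL syntax or any LMNL tool's API: the `Range` type, the helper names and the two-line text are all invented for illustration, and the only assumption is that a range is a name plus a pair of character offsets.

```python
from itertools import combinations
from typing import NamedTuple


class Range(NamedTuple):
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a: Range, b: Range) -> bool:
    """True when a and b share text but neither contains the other."""
    shares_text = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return shares_text and not (a_in_b or b_in_a)


def overlapping_pairs(ranges):
    """Name every pair of ranges that cannot both fit in one hierarchy."""
    return [(a.name, b.name) for a, b in combinations(ranges, 2) if overlaps(a, b)]


# Invented two-line text: the second sentence runs across the line break.
text = "The sun went down. The moon\ncame up and stayed."


def span(name: str, fragment: str) -> Range:
    """Locate a fragment in the text and describe it as a range."""
    start = text.index(fragment)
    return Range(name, start, start + len(fragment))


ranges = [
    span("line 1", "The sun went down. The moon"),
    span("line 2", "came up and stayed."),
    span("sentence 1", "The sun went down."),
    span("sentence 2", "The moon\ncame up and stayed."),
]

print(overlapping_pairs(ranges))  # [('line 1', 'sentence 2')]
```

A flat list of ranges like this can hold both structures at once. Whether a hierarchy can be recovered from it (here it cannot, because of the one overlapping pair) is a question asked of the data afterward, not a commitment made before describing it.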

New Republic takedown of DH


Adam Kirsch’s takedown of digital humanities in the New Republic tries its best, but ultimately disappoints.

After taking a few easy potshots and tossing forward a few casual examples and over-generalizations, he concludes somewhat lamely with this: “The best thing that the humanities could do at this moment, then, is not to embrace the momentum of the digital, the tech tsunami, but to resist it and to critique it. This is not Luddism; it is intellectual responsibility.” But what digital humanist would want any less? Indeed, hasn’t he been listening: isn’t this the whole point?

(We pause to note the somewhat comical image of standing on a breakwall, Canute-like, resisting a tsunami by offering a critique of it.)

Similarly, his peroration is reduced to the usual highbrow clichés:

“These are the kinds of questions that humanists ought to be well equipped to answer. Indeed, they are just the newest forms of questions that they have been asking since the Industrial Revolution began to make our tools our masters. The posture of skepticism is a wearisome one for the humanities, now perhaps more than ever, when technology is so confident and culture is so self-suspicious. It is no wonder that some humanists are tempted to throw off the traditional burden and infuse the humanities with the material resources and the militant confidence of the digital. The danger is that they will wake up one morning to find that they have sold their birthright for a mess of apps.”

That bit too has another funny notion, namely that a posture should be a burden. One wonders why, if one’s posture were a burden, one wouldn’t straighten up or do whatever one should do to relieve it. Presumably the answer is “well, that depends”, and I take it that Kirsch feels the posture of skepticism is still worth undertaking. (It’s a tough job but someone’s got to do it.) But I think he’s got it exactly backward: if those who assume responsibility for “the humanities” don’t step up and engage directly with the digital, apps are exactly what we’ll be left with. The apps, after all, will be there in any case. It’s only a matter of whether there’s anything demonstrably better to offer against them. Like, using digital machinery not just to create game spaces, diversions and echo chambers, but also to help understand and document our actual world and its actual history.

But finally, one thing puzzles me. How does Adam Kirsch figure that DHers recommend that humanities students and scholars stop doing whatever it is that they’re doing (or are supposed to do, in his mind or anyone else’s)? What DHer has ever said that the old-fashioned work isn’t worth doing? Why can’t this be about expanding the pie (to use another cliché)?

Liking paper after all


Over on Wired is a smart piece by Brandon Keim on why we like reading paper after all. (I found it in the aggregator.)

My pattern these days is to peruse the tablet for morning newspaper time, which is now also magazine and aggregator time, and paper for bedtime reading. (I am reading Zite, which is called “the aggregator” in our house. For complicated reasons, I don’t much read in Flipboard, however pretty it is. Maybe I’ll write about that. I hope I like the new Zitified Flipboard when the day comes.)

But I still want paper whenever I want to dig in. Keim describes text on screen as “slippery” and more difficult to retain, and I can confirm this. He also cites research. (It’s always reassuring when Science corroborates what we knew anyway: this is science we know we don’t have to discount.) Or, I would qualify, it’s not so much that paper makes for better retention. It just makes for a safer space, allowing the mind to quiet enough to hear the quieter tones and inflections and feel the texture of the text. (In turn, I suppose these may be conducive to better retention.) A book is going to be what it is, while text on a screen, even on a tablet, is always offering, unsettlingly, to transmogrify into something else: if nothing else, then into another text. I suppose this “mind-quieting” theory also helps account for why this is so subjective. It’s related to the distractibility factor but not limited to it. The researchers cited by Keim remind us of how much information we are getting from the codex format, implicitly and passively, and this is important too: physical, tangible pages have a kind of grounding effect.

The same thing goes for a printed PDF of a research paper or scholarly article. One can better see it for what it is, and isn’t, when it is given space and material (paper!) of its own, even if it’s just a stapled set of 8½x11s.

This all bears on what I was thinking about in relation to Renear’s Strategic Reading. As long as my primary purpose with a text is to assess it and assimilate it, the screen is fine. But if I want to give the text a chance to write me (inscribe on me, change me), which requires a receptive mind as well as an active one, then having it printed on paper is a good first step. Paper is just a better instrument for that.



I promised the floors


Since the last set of shots didn’t actually show much in the way of floors.

The new floors are cork, the latest eco-friendly surface. It is a bit vulnerable to rough handling but otherwise perfect (quiet, soft, not cold to touch).
Heading into the master bedroom (still being unpacked).
Looking back from the master bedroom through to the spare room at the other end of the house. The floor is continuous (installer did a good job with that).