Viral properties of schemas

Markup languages are memes as well as memetic, semantic and sometimes mimetic systems.

(Within XML, think TEI, DocBook, NLM/JATS, DITA, MODS/METS, vertical schemas. HTML, fer chrissake. Markdown. Grammar-based formats. Object property sets. Application binaries.)

I have written about markup languages considered as games; they can also be considered as genes. Indeed, one might see a document (here in the very narrow, XML/SGML sense of that word) as phenotype to a schema’s genotype. Formally, a document is always already valid against infinitely many unspecified schemas, since any number of schemas it satisfies can be generated trivially and arbitrarily. But this formal fact obscures another: “in the real world” (that is, in the wider contexts of our knowing of it), a document does not stand merely as a formal abstraction (a collection of element types) but also has a history. As such, it is as representative of its history as it is of its (purported) representations. That is, the aircraft maintenance manual is not only a description of an aircraft; it is also a maintenance manual. This history often implicates a particular schema.
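The formal point, that a document is valid against endlessly many schemas, can be made concrete. What follows is a minimal Python sketch, offered only as an illustration: the function names `declared_vocabulary` and `trivial_schemas` are hypothetical, and it assumes a document that uses no namespaces. It derives a maximally permissive DTD from a document, then produces an endless series of variants by declaring element types the document never uses. Each DTD would accept the document (given a DOCTYPE naming its root element), and none of them says anything about the document's history.

```python
import itertools
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Set


def declared_vocabulary(xml_text: str) -> Dict[str, Set[str]]:
    """Map each element type used in the document to the attribute names it carries."""
    vocab: Dict[str, Set[str]] = {}
    for el in ET.fromstring(xml_text).iter():
        vocab.setdefault(el.tag, set()).update(el.attrib)
    return vocab


def trivial_schemas(xml_text: str) -> Iterator[str]:
    """Yield an endless series of distinct DTDs, each loose enough to accept xml_text.

    Assumption: the document uses no namespaces. ElementTree reports namespaced
    names as '{uri}local', which is not a legal DTD name.
    """
    vocab = declared_vocabulary(xml_text)
    names = sorted(vocab)
    # Maximally permissive mixed content: any text, plus any of the document's
    # own element types, in any order and number.
    model = "(#PCDATA|" + "|".join(names) + ")*"
    base = []
    for name in names:
        base.append(f"<!ELEMENT {name} {model}>")
        if vocab[name]:
            # Every attribute actually used is declared as optional character data.
            attrs = " ".join(f"{a} CDATA #IMPLIED" for a in sorted(vocab[name]))
            base.append(f"<!ATTLIST {name} {attrs}>")
    for i in itertools.count():
        # Variant i declares i extra element types the document never uses:
        # a different schema each time, with no effect on the document's validity.
        extras = [f"<!ELEMENT unused{j} EMPTY>" for j in range(i)]
        yield "\n".join(base + extras)


doc = '<doc><title>Viral schemas</title><p n="1">Text</p></doc>'
for dtd in itertools.islice(trivial_schemas(doc), 2):
    print(dtd, end="\n\n")
```

The generator never runs out, which is the point: formally, nothing picks out one of these schemas over the others. Only the document's history does that.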

Schemas:
  • Replicate (are copied across file systems)
  • Are modified and adapted
  • May be identified with “species”, which evolve over time
  • Are embedded in local systems, but also enable discrete semantic “environments” (which may be portable)
  • Occasionally cross the barriers between systems (often to be adapted on the other side)
  • Sometimes include mechanisms for their own adaptation

(BTW these are also all true of documents, at a more discrete level. Indeed, schemas are among the mechanisms that documents provide for their own adaptation. Note how you can stretch the concepts of “document” and “schema” here very widely, to all kinds of media, especially if you allow for the idea that schemas may exist implicitly without being codified.)

Unlike living systems (but perhaps like viruses), schemas cannot be said to have volition of their own. They get their volition from the environments in which they are promulgated. Perhaps they are more plant-like than animal-like.

Also the question arises as to whether (and to what extent) they are parasitic or symbiotic. One suspects they have to be symbiotic in order to encourage adoption. However, they clearly get much of their power from their network effects (the more people use HTML, the more useful HTML becomes), and at a certain point this may introduce stresses between the local goals of HTML users themselves and the interests that promote HTML despite its poor fit to those goals.

Schemas are also the deliberate, logical/social/linguistic creations of people and of organizations of people. Can they be this, and also “viral” at the same time?

Model and Process, part I

Being called on to pinch-hit for a colleague at GSLIS (and seriously, it’s an honor to be asked), I am today pondering the relation between “document modeling” (especially but not only as a sort of “data modeling”) and the design and implementation of processes, that is, actual workflows that perform actual operations and achieve actual results.

To describe it thus is to see this immediately as a deep set of questions. (Not only are the questions themselves deep, but the set is deep.) Yet many of us who are students of these dark arts, perhaps even most, hardly ever ponder them, pretty much going on our way developing models and deploying (and even operating) processes, without much thought about how truly dependent on one another these are.

It is not that one must always devote a document model to a process: document models can be designed independently of any actual or particular process, and have been, so much so that it puts me in mind of what Mark Twain is said to have said when asked if he believed in infant baptism: “Not only do I believe in it; I’ve seen it.” Indeed, this activity is theoretically necessary (or at least that argument can be made), and designing such models to be “application independent”, along with designing systems that support the design (and yes, ironically, the processing) of such models, is work of paramount importance. Yet at the same time, it is only when we actually try to do things with actual data (leveraging our modeling, that is to say, capitalizing on and profiting from our investment in it) that we discover the true outlines and limits set by our models, along with (and reflecting) their powers. (Well, that is not strictly true, as some people are possessed of enough perspicacity to envision and thus anticipate the limits of a design without actually running the thing. But these people are rare, and tend not to be listened to much in any case.)

Thus there is a theoretical as well as a practical interest in process, as well as in model, as indeed there can be an abstraction of process too: models of process, such as are specified in any case at multiple layers of our systems, in their various software components designed to interface with each other in various ways. It’s models all the way down. But what enables the models to layer up are the processes in which they are expressed and transformed.

Maybe “model” and “process” are simply projections of “space” and “time” in the imaginal worlds we construct and actuate in our systems building and operation? Space can be thought of without time, if geometric relations can subsist among however many spatial directions there may be, straight or bent, outside of time. (Let no one come in without geometry, as the inscription over the door of Plato’s Academy is said to have put it.) But time moves; it is not static, and it runs in one direction only, even as it ebbs and flows and branches and aligns, as time lines (however we might define such a thing) cross and rejoin other threads or strands of themselves. With time, space is more than just a cold, abstract, unchanging field of potential. It has energy and becomes capable of transformation, a setting for organic emergence.

Is this what we also see within the simulacra we build in the machine, whose laws of physics are so different from ours? Add process to model, that is, and you really get legs. A process must always have or work with a model, even if only implicitly, so the model comes first. But it is only when we put fuel in the line and start cranking that we find out how the engine runs.

XML vs/and HTML5

One thing we learned at the Balisage 2014 Workshop on “HTML5 and XML: Mending Fences” of two weeks ago is how vast a gulf is …

I thought the day went pretty well. Highlights included a knock-down survey by Steve DeRose on requirements for hypertext (not always so well addressed in the web stack); a demo by Phil Fearon showing the level of polish and performance that can be achieved today (at least by an engineer of his caliber) with XML/XSLT/SaxonCE in the browser; and the redoubtable Alex Miłowski reflecting ambivalence (or at least this is the way I took it, very sympathetically): regret for missed opportunities and concern for the future, mixed with an unshakeable sense of opportunities still in front of us.

Most of us in the room were ambivalent, probably, albeit all in our own ways. We were treated, by Robin Berjon (who did a fantastic job helping the Balisage conference committee organize and run the event) and by Google’s fearless and indomitable Domenic Denicola, to an examination of what HTML5 will offer us, including features (web components) that promise the capability of hiding “shadow DOMs” in the browser presenting arbitrary markup vocabularies (which in principle includes descriptive markup languages) and binding them to browser semantics, allowing them to be handled and processed in CSS and from script using standard APIs. Awesome stuff.

On the other hand, web components (notwithstanding prototypes) are still a distance off, and no one knows like this crowd how the most well-meaning standards initiatives can go awry. (Can anyone say “XML namespaces”?) Plus, this was an audience already satisfied (at least for the most part) that traditional XML document processing architectures — layered systems often running where browsers never see them — are meeting our needs. Not only has XML not failed us, it is a phenomenal success; on top of the many excellent tools we have, all we really want is more and better tools (sure, we’ll take web components); better integration in application stacks of all kinds; and — above all — more curiosity, sympathy and maybe even understanding from ambitious hot-shot developers who aspire to make the world a better place.

I mean, we came to listen, and we did, but I’m not sure anyone was there to hear us but us.

I hasten to add, of course, that this isn’t (or isn’t just) a matter of feeling unappreciated. To be sure, an audience that included technical staff from organizations such as (just to name a few) the US House of Representatives, the Library of Congress, NCBI/NLM (publishers of PMC), NIST, ACS, OSA, and other sophisticated publishers both large and small — people who use XML daily to get the job done (and web browsers too, if only for the last mile) — found it a bit jarring to hear our tried-and-true solution (which indeed is such close kindred to the web) described as a “pet project”, and therefore deemed unworthy of the attention of an important company that builds browser software. But this wasn’t the issue. More than anything, we need not recognition or respect (heck, this is a crowd of introverts, happy not to get too much attention) — but support — young blood, new, active and imaginative developers who will help us not to retire and forget our working systems, but to maintain, extend and improve them.

Instead, we are being offered a new platform on which to relearn old lessons, however disguised they will be in new syntax and technical terminology. And what is worse — the message we hear being told to others is that the successful technologies and solutions we have built do not matter, will soon obsolesce, and deserve no further consideration in the wider economy, to say nothing of investment.

Yes, I exaggerate! I didn’t hear any of this said, at least in so many words, by anyone last August 4. These were just implications hanging in the air.

Yet the sense was unmistakable that these two cultures were frankly baffled, each by the other. One culture (“the web”?) deliberately limits its scope of interest to the web itself (necessarily and rightly so), and so it must perforce assume that the HTML web and its browser (name your favorite vendor here) are the be-all and end-all, the only communications medium a civilization would ever need. (I know this is a caricature here. Feel free to argue against it.) Another culture (call it “XML”) values standards in text-based document markup not because they are a driver for adoption of anything in particular, but when and as they support adaptability and heterogeneity (of applications and of needs and purposes) on a robust, capable, versatile, open and non-proprietary technical ecosystem: that is, not on one platform or another, but on whatever platforms work best, today and then (differently) tomorrow.

So why are XML developers regarded now as lacking vision? Because we live in a world bigger than the web, whose edges we do not claim to see?

Set this aside, the little voice tells me: it doesn’t really matter. Instead, come back to that unshakeable sense of opportunity that Alex Miłowski communicated. This could work: this does work. We have XML, XSLT, XQuery: the tools are there, and the work is being done. There is no shortage of strong ideas circulating in the XML space. (Over the course of the next few days, Balisageurs including David Lee, Hans-Jürgen Rennau, John Lumley, Abel Braaksma and others showed this well enough.) And HTML5 does not have to be a competitor any more than other formats, both data sources and transformation targets: like PDF, HTML, CSV, you name it, HTML5 will be a tool for us to use, for the work it is good for.

Liking paper after all

Over on Wired is a smart piece by Brandon Keim on why we like reading paper after all. (I found it in the aggregator.)

My pattern these days is to peruse the tablet for morning newspaper time, which is now also magazine and aggregator time, and paper for bedtime reading. (I am reading Zite, which is called “the aggregator” in our house. For complicated reasons, I don’t much read in Flipboard, however pretty it is. Maybe I’ll write about that. I hope I like the new Zitified Flipboard when the day comes.)

But I still want paper whenever I want to dig in. Keim describes text on screen as “slippery” and more difficult to retain, and I can confirm this. He also cites research. (It’s always reassuring when Science corroborates what we knew anyway: this is science we know we don’t have to discount.) Or, I would qualify, it’s not so much that paper makes for better retention. It just makes for a safer space, allowing the mind to quiet enough to hear the quieter tones and inflections and feel the texture of the text. (In turn, I suppose these may be conducive to better retention.) A book is going to be what it is, while text on a screen, even on a tablet, is always offering, unsettlingly, to transmogrify into something else: if nothing else, then into another text. I suppose this “mind-quieting” theory also helps account for why this is so subjective. It’s related to the distractibility factor but not limited to it. The researchers cited by Keim remind us of how much information we are getting from the codex format, implicitly and passively, and this is important too: physical, tangible pages have a kind of grounding effect.

The same thing goes for a printed PDF of a research paper or scholarly article. One can better see it for what it is, and isn’t, when it is given space and material (paper!) of its own, even if it’s just a stapled set of 8½x11s.

This all bears on what I was thinking about in relation to Renear’s “strategic reading”. As long as my primary purpose with a text is to assess it and assimilate it, the screen is fine. But if I want to give the text a chance to write me (inscribe on me, change me), which requires a receptive mind as well as an active one, then having it printed on paper is a good first step. Paper is just a better instrument for that.



“Strategic Reading”: Renear at JATS-Con

Video of JATS-Con 2013(4) is here: Day One; Day Two. My demo of visualizations of data encoded in NLM/JATS XML (or with adjustments, of any XML data) starts around 2:46 of Day Two (I went first in the open session).

But the reason for my post today is Allen Renear’s keynote. (It is well worth watching; it starts at Day Two, 4:58.) As always, Allen is both revelatory and provocative. Every time I hear him, things come into better focus: where all this is coming from, where it’s aiming, and where the rub is happening between fundamental principles and present-day exigencies.

His talk last week was on what he calls “strategic reading”, which I think is both profoundly incisive and, in its way, troubling. Incisive because this is, indeed, the shape of things to come. What Allen describes sounds correct, both as a description of what is happening and as a tendency and a trend. Troubling because I, for one, can’t help but be concerned about what we risk losing next to what we gain.

Don’t get me wrong: I am all in favor of “strategic reading”, and I do quite a bit of it myself. Allen is describing the way we now scan and read at once, dipping in and out, making assessments at a distance, making choices even before we read, before we commit time and effort to deeper engagement. Electronic media and (where we have it) the strong encoding behind it (behind them, I am quick to correct myself) facilitate this inasmuch as they allow us to aggregate and filter according to criteria selected in advance, and to foreground, highlight and dramatize significant content before we even know it is there. (Back to my demo on visualization of document structures.) Especially in an age of information overload, when so much of what we see is only a distraction, this is necessary and inescapable. We will even have serendipity engines, or so they tell us. (Unless that’s another case of Artificial Intelligence Meets Natural Stupidity.)

Yet at the same time, the other voice says this isn’t really reading, but a strategic avoidance of reading. Not that that isn’t perfectly fine, in its way (we’re certainly not going to read all that stuff). But it doesn’t offer the rewards that I once learned can be won, with effort, from a well-wrought text, serving as an occasion for a kind of contest of mind, a discipline of attention with an unknown outcome.

So much of what we “learn” isn’t learning at all, but only reinforcement. We only become more like what we were already (as Gertrude Stein said of Americans between the wars). I can’t help but wonder whether this is enough. I also want to be changed by what I read.