Are XML tags sharp objects?

Start and end tags? No, they are not sharp, despite appearances. They will generally not poke or hurt you as long as you keep them properly closed (that is, every start tag has its matching end tag inside the same parent element). Tags written with angle brackets indicate structure, bracing the XML document and holding everything in place. They are your friends.

The really bad tags in XML, the ones you have to watch out for, are the entity references: the things that start with &. Think about what & means to an XML parser. When it sees &, it doesn't know what comes next. It looks for a name. (Let's hope it finds a legal name before it hits ;.) Finding a name, it looks up the declaration for it. (Let's hope there is a declaration somewhere for it to find.) It splices in the replacement text the declaration supplies. Then it goes back to where it left off and carries on parsing.
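To make those steps concrete, here is a toy sketch in Python of what happens at each &. It is an illustration only: expand_entities and its plain dictionary of declarations are made up for this post, names are matched with a simplified pattern, and a real parser does considerably more (character references, parsing the replacement text itself, external entities).

```python
import re

# Simplified stand-in for XML's Name production; real XML names allow far more characters.
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")

# The five entities every XML parser knows without a declaration.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}


def expand_entities(text, declarations):
    """Toy expander for &name; references (hypothetical helper, not a real XML parser).

    Character references (&#...;) are rejected as malformed, and replacement text
    is spliced in literally rather than parsed again as a real parser would.
    """
    out, pos = [], 0
    while (amp := text.find("&", pos)) != -1:
        out.append(text[pos:amp])
        # Look for a name right after the '&' ...
        m = NAME.match(text, amp + 1)
        # ... and hope it is a legal name followed by ';'.
        if m is None or not text.startswith(";", m.end()):
            raise ValueError(f"malformed reference at offset {amp}")
        name = m.group()
        # Look the name up; nowhere to find it means the whole parse fails.
        if name not in declarations:
            raise ValueError(f"undeclared entity &{name};")
        # Splice in the replacement text ...
        out.append(declarations[name])
        # ... then go back to just past the ';' and carry on.
        pos = m.end() + 1
    out.append(text[pos:])
    return "".join(out)


print(expand_entities("Fish &amp; chips", PREDEFINED))                 # Fish & chips
print(expand_entities("&co; ships", {**PREDEFINED, "co": "ACME Corp."}))  # ACME Corp. ships
```

Every one of the error branches corresponds to one of the "let's hope" moments above: no legal name, no closing ;, or no declaration to look the name up in.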

This is a precarious operation. Stuff that is supposed to be “XML” fails to parse all the time, not because its element markup is awry, but because its entities do not resolve correctly, if at all. And if even a single entity reference fails, the whole document cannot be processed. Use entities only with care. Don’t assume they’re safe just because you’ve seen them a lot elsewhere (such as in HTML): a browser knows &nbsp; and &copy; without being told, but an XML parser does not.
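You can watch this happen with Python's standard xml.etree.ElementTree module. A sketch, assuming Python 3; the helper parses is a name made up for this example. One reference to an entity that HTML knows but XML does not is enough to sink the whole document.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Hypothetical helper: True if ElementTree's expat-based parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(parses("<p>Fish &amp; chips</p>"))        # True: &amp; is predefined in XML
print(parses("<p>Fish&nbsp;&amp; chips</p>"))   # False: &nbsp; was never declared
```

Note that the failure is all or nothing: there is no partial tree with just the bad reference missing.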

Note that XML character references look like entity references, but aren’t. It’s pretty safe in XML to refer to a character in Unicode by its number, such as &#10; (the LF character) or its hexadecimal equivalent, &#xA;, because the parser needs no declaration to resolve them.
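Python's ElementTree shows the difference: character references resolve with no DTD in sight. (A small sketch; the sample document is made up.)

```python
import xml.etree.ElementTree as ET

# Decimal &#10; and hexadecimal &#xA; both name U+000A, the LF character.
doc = "<p>one&#10;two&#xA;three</p>"
text = ET.fromstring(doc).text
print(repr(text))  # 'one\ntwo\nthree'
```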

Watch out for your entity references! They can break your documents when those documents move across boundaries (from one file, system, or pipeline to another) and the declarations get left behind. To have standalone XML (this means well-formed, but also entirely self-contained), you should avoid any entity references that have to be declared. Which is all of them except the five that XML predefines: &lt;, &gt;, &amp;, &apos;, and &quot;.
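Here is a sketch of a declaration getting lost in transit, again with Python's ElementTree. The entity co and its value are invented for the example, and the "move" is simulated simply by dropping the DOCTYPE. The five predefined entities come through fine without any declaration at all.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Hypothetical helper: True if ElementTree's expat-based parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# "co" is a made-up entity declared in the document's internal DTD subset.
original = '<!DOCTYPE p [<!ENTITY co "ACME Corp.">]><p>&co; ships</p>'
# The same element copied into another document, declaration left behind.
moved = "<p>&co; ships</p>"
# Only the five predefined entities, which never need a declaration.
predefined_only = "<p>&lt;&gt;&amp;&apos;&quot;</p>"

print(ET.fromstring(original).text)         # ACME Corp. ships
print(parses(moved))                        # False
print(ET.fromstring(predefined_only).text)  # <>&'"
```

The markup in moved is byte for byte what it was in original; only the declaration is gone, and that alone is enough to make it fail.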
