arguing semantics
One of the catchphrases bandied about in the Web 2.0 internet world is “semantic markup”. Even people who know the word “semantics” tend to meet the term with a blank and slightly frightened stare when I bring it up. Semantic markup is an important cornerstone of modern web design and methodology, so it’s worth getting people to understand what it is and why it matters. The problem is that most of the time those who don’t understand the term are too intimidated or embarrassed to pipe up and ask for a clear definition, so the concept often becomes just another one of those techy terms that no-one gets, and that no-one therefore really values.
One example I’ve seen lots of times in web design books to illustrate semantic markup is the difference between using i and em to produce italicized text (well, basically — see below). Most modern web dudes argue that em should be used because rather than describing the appearance of the words in question it describes the meaning, or put another way, em defines why the enclosed text is differentiated from the other text around it while i simply differentiates it without indicating why.
To which lots of laypeople will reply with the argument, “if they’re both making the text italicized what’s the difference?”
The fact that they’ve answered this way means that they’re not thinking about the real scope and purpose of either markup language or the web, and if you’re not thinking about the big picture, you’ll never be able to figure out what place semantic markup has in that big picture. So let’s take a second to explore a bit of background before we consider the issue of semantic markup.
First though, a hypothetical dictionary definition: “Semantic markup is a methodology which dictates that all content of a document should be enclosed in the tags that best define that content’s meaning and function within the document.” That makes sense … unless you don’t understand what markup really is.
So let’s back up a bit and succinctly describe what our modern perceptions of markup languages, and of their role in communicating content across the internet and beyond, really are in this post-Web 2.0 world of ours. XHTML (which is really just HTML reformulated as XML) is supposed to be a device-independent language which uses tags to provide semantic structure to a web document. By semantic structure I mean tags which define the meaning and structural purpose of the enclosed text.
However, note the term “device-independent”. These documents are supposed to be written without any foreknowledge of the device on which they may ultimately be digested. Many of us are going to read XHTML with our eyes, in a web browser like Safari or Firefox. But device-independent documents may very well be read on portable devices, or not read at all but spoken, as with screen readers which speak the text aloud.
In other words, the real purpose of semantic markup is to provide meaningful structure to a document not just for people who read it, but for anyone or anything that might need to digest it in any way, even if that “reader” isn’t a human being at all but a machine, like a Googlebot or some other program consuming the page. Semantic markup is particularly important to bots visiting a site for search engine ranking.
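To make the machine reader a bit more concrete, here is a minimal sketch in Python using the standard library's html.parser module. The class name StructureReader and the sample markup are made up for illustration, and a real search bot is far more elaborate than this. All it shows is that a program can tell a heading from a paragraph from an emphasized phrase purely because the tags say so.

```python
from html.parser import HTMLParser


class StructureReader(HTMLParser):
    """Hypothetical 'bot' that records which tag each run of text sits in."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.chunks = []     # (innermost tag, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Assumes well-formed markup, which XHTML requires anyway.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.chunks.append((self.open_tags[-1], text))


# Made-up sample document, not taken from any real site.
SAMPLE = (
    "<h1>Arguing semantics</h1>"
    "<p>Markup should describe <em>meaning</em>, not looks.</p>"
)

reader = StructureReader()
reader.feed(SAMPLE)
reader.close()

for tag, text in reader.chunks:
    print(tag, "->", text)
```

Nothing in that loop knows or cares what the page looks like. It only ever sees tag names and text, which is the whole point.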
That is why the meaning of a document’s structure is important, and not its appearance, and why meaning and appearance should be treated as two separate concerns.
To return to the classic em versus i example, making text appear italicized has no meaning to someone who can’t see the italics. But ascribing emphasis to a segment of text communicates its meaning regardless of how the user is digesting the information. Communicating the meaning of the emphasis is what makes it semantic. How it is dealt with by the so-called user agent (i.e. browser or screen reader or whatever) is an entirely different matter.
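Here is a rough sketch of that difference from the point of view of a user agent that can't show italics at all. It is written in Python with the standard html.parser module; the SpeechSketch class and the speak helper are hypothetical names, and upper case is just my stand-in for vocal stress. Real screen readers vary in whether and how they announce emphasis, so treat this as an illustration of the idea and not a model of any actual product.

```python
from html.parser import HTMLParser


class SpeechSketch(HTMLParser):
    """Hypothetical non-visual user agent: turns markup into 'spoken' text."""

    def __init__(self):
        super().__init__()
        self.emphasis_depth = 0  # how many <em> tags we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "em":
            self.emphasis_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.emphasis_depth > 0:
            self.emphasis_depth -= 1

    def handle_data(self, data):
        # Upper case stands in for vocal stress (an assumption of this sketch).
        # <i> says nothing about meaning, so its text is read plainly.
        self.parts.append(data.upper() if self.emphasis_depth else data)

    def spoken(self):
        return "".join(self.parts)


def speak(markup):
    """Feed markup to a fresh SpeechSketch and return what it would 'say'."""
    agent = SpeechSketch()
    agent.feed(markup)
    agent.close()
    return agent.spoken()


print(speak("<p>You <em>must</em> back up first.</p>"))  # You MUST back up first.
print(speak("<p>You <i>must</i> back up first.</p>"))    # You must back up first.
```

Both sentences look identical in a typical browser, but only the first one tells this agent that the word matters.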
All web browsers have prebuilt styles for displaying HTML markup, from em to code. I’m pretty sure that all of the current crop use italicized text by default to render content enclosed in an em tag. This makes perfect sense because the de facto method for communicating emphasis in writing (in Western languages at least) is italicized text. By the same token, using the em tag to italicize things like corporate names probably breaks the rule of semantic markup, unless the author of the document specifically wants emphasis placed on those corporate names. I’m guilty of using the em tag for any and all italicized text, and I know a lot of other people are too. Perhaps I should revise this habit: use the em tag only for words and phrases that are meant to carry emphasis, and save the purely visual job of italicizing corporate names and the like for the i tag.
Anyway, to make a long story short, selling people on the notion of semantic markup is difficult unless they know that web sites are supposed to be device-independent. If they can learn to think of the content of web sites as independent from the style of web sites (a cornerstone of enlightened web design), they can begin to grasp why choosing the most appropriate tag to define the meaning of the structured content is a really valuable thing, and why the difference between em and i is a lot greater than appearances might suggest!