Why you shouldn’t create another markup language
A while back, Gareth Rushgrove quoted me in “DSLs for HTML and CSS - The Future, or Just Plain Wrong?”, where I said: “I’m not sure why everyone insists on clumsily reinventing HTML every few weeks (e.g. wiki syntaxes, of which there are hundreds).”
Gareth posted this in the context of Haml and Sass, Rubyist abstractions for HTML and CSS. People think that when I bitch about these things, I’m saying that they are bad. They aren’t intrinsically bad. The problem is that, taken as a set, they are a pain in the behind. I already know a language that allows me to express inline differentiations in a document. It’s called HyperText Markup Language, or HTML for short. I don’t need to abstract HTML, because HTML isn’t that complicated: a for links, em for emphasis, strong for strong emphasis, q for quotes, kbd and var for inputtable strings and variables, img for images and so on.
But, the thinking goes, that’s too complicated for Ordinary People (something of a myth really). So we reinvent it. The first clumsy reinvention of HTML I can remember is BBCode, which really just takes the angle brackets and replaces them with square brackets. If you need to remember the markup mapping, that’s not actually much of an improvement. Why is [url=http://example.org]Example[/url] any better than the same in HTML? It’s not. It’s slightly more convenient for programmers though.
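To make the point about the markup mapping concrete, here is a minimal sketch of a BBCode-to-HTML converter in Python. The function name bbcode_to_html and the tiny set of supported tags are my own assumptions for illustration, not any real forum's implementation. Each BBCode tag turns out to be a one-to-one rename of an HTML element.

```python
import html
import re

# Hypothetical, deliberately tiny mapping: only [b], [i] and [url=...] are handled.
SIMPLE_TAGS = {"b": "strong", "i": "em"}

# Only http(s) URLs without whitespace or a closing bracket are accepted.
URL_RE = re.compile(
    r"\[url=(https?://[^\]\s]+)\](.*?)\[/url\]",
    re.IGNORECASE | re.DOTALL,
)


def bbcode_to_html(text: str) -> str:
    """Convert a small BBCode subset to HTML (illustrative sketch only)."""
    # Escape first, so any raw HTML in the input is shown as text, not executed.
    result = html.escape(text, quote=True)
    for bb_tag, html_tag in SIMPLE_TAGS.items():
        result = re.sub(
            rf"\[{bb_tag}\](.*?)\[/{bb_tag}\]",
            rf"<{html_tag}>\1</{html_tag}>",
            result,
            flags=re.IGNORECASE | re.DOTALL,
        )
    # [url=X]label[/url] is just <a href="X">label</a> with different brackets.
    return URL_RE.sub(r'<a href="\1">\2</a>', result)


print(bbcode_to_html("[url=http://example.org]Example[/url]"))
```

The user still has to memorise url, b and i, so nothing has been saved on their side. The only real gain is for the programmer, who gets to escape everything and then allow a handful of patterns back in.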
Then more recently, we’ve started getting things like Textile and Markdown. Of the two, I think Markdown is preferable, although I’ve recently been trying out Textile and it’s just about okay.
But then there are wiki syntaxes. If you pick the main wiki engines, they all use different syntaxes. I’ve got the MediaWiki syntax burned into my brain. But try any other wiki engine and you get yet another syntax, and it gets very annoying, very quickly. All this means is that I have to remember ten different ways of making something italic or making a link. What’s the damn point? I know how to make a link. I use an a tag. At the very most, I can see space for a separate wiki syntax - only one though. My choice would be the MediaWiki syntax. I’ve actually gotten to the point of not contributing to wikis that use anything other than MediaWiki syntax, and I’d like it if other people were to follow the same rule. We should turn the MediaWiki syntax into a de facto standard for all wikis, just for pragmatic reasons. The vast majority of people likely to click ‘edit’ on a wiki article will probably start on Wikipedia or another MediaWiki server. And, well, the syntax has been pretty well tested - en.wikipedia.org has over eight million registered users and a shit ton of unregistered users. If you have a wiki that uses anything other than MediaWiki syntax, you’d better have a damn good reason. I haven’t yet seen a damn good reason.
There is maybe a case for Textile and Markdown. But no more of them. There are too many already.
And if you are thinking of allowing people to post rich text, your starting point ought to be HTML. Yes, people will have to learn HTML. So provide a link to a funky little pop-up with all the basics you need. And, yes, you’ll need to run the input through an HTML input filter like Tidy (be sure that you use one that is aware of XSS and other potential security hazards, and preferably one which has a comprehensive test suite). They exist in most languages you are likely to be building applications in, so what’s the problem?
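As a rough illustration of what such an input filter does, here is a minimal allowlist sketch using only Python’s standard library. The names AllowlistFilter and sanitize, the tag list and the permitted URL schemes are all assumptions of mine for the example. It is not Tidy and it has no comprehensive test suite, so treat it as a picture of the idea and use a maintained, well-tested filter in a real application.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed policy for this sketch: a few inline tags, href only on <a>.
ALLOWED_TAGS = {"a", "em", "strong", "q", "kbd", "var"}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
DROP_CONTENT = {"script", "style"}  # discard these elements and their text


class AllowlistFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of sanitized output
        self.open_tags = []  # allowed tags currently open
        self.skip_depth = 0  # greater than 0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped; their text is kept
        attr_text = ""
        if tag == "a":
            href = (dict(attrs).get("href") or "").strip()
            try:
                scheme = urlsplit(href).scheme.lower()
            except ValueError:
                scheme = ""
            # Relative and javascript: URLs fail this check and lose the href.
            if scheme in ALLOWED_SCHEMES:
                attr_text = f' href="{escape(href, quote=True)}"'
        self.out.append(f"<{tag}{attr_text}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in self.open_tags:
            return
        # Close anything left open inside this tag so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(fragment: str) -> str:
    """Return the fragment with only allowlisted tags and attributes kept."""
    parser = AllowlistFilter()
    parser.feed(fragment)
    parser.close()
    while parser.open_tags:  # close tags the user forgot to close
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('<a href="javascript:alert(1)" onclick="x()">hi</a> <em>ok</em>'))
```

The design choice that matters is the allowlist: anything not explicitly permitted is dropped, which is safer than trying to enumerate every dangerous tag and attribute.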
Just don’t make me learn yet another “lightweight markup language” or I’ll get a crane, wait until you are sitting on the toilet and drop on top of you a collection of all the books ever written about HTML, including my old copy of Teach Yourself HTML 4.