Erik Naggum's XML Rant from 2002: Disturbingly prescient (schnada.de)
86 points by batista on May 5, 2012 | 114 comments


The title of this post should be "how not to disagree with others in a public forum". The tone is alienating and disrespectful, so much so that the valid criticisms of XML as a technology are completely buried by the arrogance and off-topic political and social commentary: "the current American presidency and XML have much in common", "XML is the drug-addicted gang member who had committed his first murder before he had sex, which was rape."

If we want computing to be a friendly and welcoming field and contribute to the betterment of humanity, we can't tolerate people who behave like this in public.


"If we want computing to be a friendly and welcoming field and contribute to the betterment of humanity, we can't tolerate people who behave like this in public."

"Friendly and welcoming" is not a pre-condition for the "betterment of humanity". If it were, only smooth talkers would thrive(like lawyers) and not assholes like Steve Jobs etc...

I would rather computing be a field where good ideas win, and if the person with the good idea is nice and welcoming, well that's just a nice bonus but shouldn't affect the evaluation of an idea in my ideal world which I know doesn't really exist.

Innovation should be easier and the notion that a good idea requires it be presented within some tolerable/unnecessary social parameters is artificially limiting and only provides an environment where less innovation happens.


Even though he's gone, Naggum is not without his own defenses for his abrasiveness. This one of his is especially applicable to your own comment.

http://www.xach.com/naggum/articles/3250519673662774@naggum....

(And of course meta-discussion about Naggum was fairly common when he was alive and active, so there are many more defenses and attack-backs.)


I think the worst of it was Naggum's utter conviction of his own moral superiority... at the same time he was gleefully dumping shit on others.

He's dead, BTW. Although he was smart and sometimes made good points, I do not miss him.


>I think the worst of it was Naggum's utter conviction of his own moral superiority... at the same time he was gleefully dumping shit on others.

How is that "the worst"? One's feeling of moral superiority and dumping shit on others are not contradictory --except if one's conviction of his moral superiority is wrong. Dumping shit on others does not make you morally inferior (in the same way that they could be), if that's what you suggest. If one is bad-mouthing some rapist, it doesn't mean that he's as morally bad as the rapist because of doing so.

That said, all this "moral" thing is beside the point: what Naggun was utterly convinced of was his _technical_ correctness, which is something completely different.


Dumping shit on others does not make you morally inferior

You're missing the picture. Naggum dumped on practically everyone. And while he would always argue that he had been provoked because everyone else were such idiots, I think a dispassionate observer would reach a different conclusion.

all this "moral" thing is beside the point

Ironically, Naggum would not have agreed with you on this point. He made a big deal of it.

Oh, and it really is "Naggum" with an "m" -- the HN title is wrong.


Can you please stop calling him Naggun?


Well, if everybody understands that I refer to Naggum, where's the problem?

(That's essentially the argument of the "omit semicolons / pro ASI" people -- since browsers grok it, where's the problem with abusing an error checking mechanism with our syntax?)

That said, what is it to you? I thought that was his name, probably because phonetically it sounds better in my language, so I misspelled it 2-3 times. Only now did I find out it is Naggum.

It's not like I did it on purpose, or that I would continue doing it if I knew better, so I find your request absurd.


"I think the worst of it was Naggum's utter conviction of his own moral superiority... at the same time he was gleefully dumping shit on others."

You act as though he named people and dumped shit on them. He didn't. And data formats are not a "moral" issue.

The people who feel offended by this guy are clearly insecure as hell.


I disagree. Generally you are right, and discussions should be civil and measured, but some things are beyond the pale and deserve a proportionate response. XML is one of these. Obviously Naggum has written a rant rather than a technical explication, but he makes his point better than any white paper possibly could.

I find this to be one of the best pieces of polemic writing that I have ever seen. It's technically accurate, concentrates the abuse on the technology rather than the submitter, and is hilarious to read. It also perfectly captures my sentiments regarding XML.

I loved it when it was written, and I love it now. I'm sorry that it offends you, but if it gets even one more person to sit up and ask themselves whether they should choose XML for a new project they are writing, I feel it has served its purpose and justified the offense.


Indeed.

> Robbery is not just another way of making a living, rape is not just another way of satisfying basic human needs, torture is not just another way of interrogation. And XML is not just another way of writing S-exps.

>| I use XML on a daily basis and think it is a simple and intelligent way to represent data.

> A comment on this statement is by now entirely superfluous.

>| I would like to hear why you think it is so bad, can you be more specific please?

> If you really need more information, search the Net, please.

And condescension is not just another way of disagreement.


You may not have been present during XML's ascendency. Naggum's rhetorical voice was a totally appropriate response. Being shouted down by the XML goon squad got real old real fast.


Sometimes you get so frustrated with being polite and civil and have no patience but to let loose with as vivid a picture of the problem as you can possibly unleash.

Civility leads to complacency. That was the post of someone with some serious conviction, not a passive participant who was indifferent to standards.


>If we want computing to be a friendly and welcoming field and contribute to the betterment of humanity, we can't tolerate people who behave like this in public.

Other people can't tolerate overly-polite, overly-cautious, overly-friendly, overly-political-aware, overly-PC tone.

It makes us drowsy and keeps the discussion at a level better suited to enterprise brainstorming meetings and committees. Everybody loses.

Tech and science leaders were very frequently passionate and prone to rant. Just think of Torvalds's reactions to (what he thinks are) stupid ideas on the kernel list -- even if he's wrong, it helps keep everybody vigilant and agitated for his contributions.

>If we want computing to be a friendly and welcoming field and contribute to the betterment of humanity, we can't tolerate people who behave like this in public.

The aim is not to make computing a "friendly and welcoming field" (that would be the boy scouts), it's to make it an effective, productive and interesting field to work in.

If you start judging contributions to a field by politeness of "behavior in public" then you're doing the field a disservice.

The biggest scientist in some field could as well be a huge jerkoff -- for example, Dijkstra was known for his passionate rants and snarky tone against everything he considered a bad practice.


You're setting up a false dichotomy. There is a middle ground between behaving like an asshole and too much diplomacy. Do you really think respectful disagreement is the same as being "overly-PC"? Torvalds is a little harsh sometimes, but it's nothing compared to the disgusting rudeness of this rant.


Yes it's a great point. Too often we try and reduce things to black and white when nearly everything is a shade of gray.


I've never understood that "shades of gray" line. How about some color?

Edit: this seems to need clarifying. I understand what the line means. What I don't understand is why people use it so much when it repeats the error it purports to correct. In nearly every case where "reality is either 0 or 1" is wrong, "reality is some coefficient of a single variable between 0 and 1" is just as wrong. That is a poor way to champion the richness of reality. What's interesting is how it corresponds to the emotional crimpedness of a world in which everything is gray.


"black or white" is a dot. "shades of gray" is like an axis on which you can plot points.

"color" on the other hand is like a 3-dimensional space, and so points can no longer be compared and sorted. How do you compare green with blue? You cannot.

On the topic, you can compare the rudeness of two messages and you can also establish some threshold over which the rudeness is no longer acceptable. That's why people use the "shades of gray" metaphor, because it's still useful.

And indeed, life has color, but that's why life is complicated, which is why we feel the need to simplify its dynamics.


It's a gradient, not a binary.

Adding colors to the metaphor adds words with no increase of understanding.


Understanding is not disjoint from emotion.


What is this, arguing for the sake of arguing? He answered your question perfectly. Adding color for "emotions" makes absolutely no sense in this context.


No, but using a broken metaphor confuses things. The phrase:

"Don't see things as either black or white, the world has millions of hues"

conveys the message that situations are nuanced as well as being truer to how things are with respect to color.

So, it's better than some constrained metaphor about a BW gradient (much more since lots of people don't even know what a gradient is).


The metaphor isn't broken. What you propose is broken.

The word "gradient" isn't even part of the metaphor, so I don't see how that would cause confusion. What comes between black and white? Gray. Not millions of color hues. Shades of gray makes sense and fighting to try and get colors in the mix somewhere is only serving to muddy this up.

Edit: Actually, here's another perspective: millions of color hues come between black and white only if you're thinking about colors to begin with. (Black and white aren't strictly considered colors even.) So while I can see where you're getting this from, I still think it doesn't serve anyone well to try and spice it up from the very well understood meaning of the phrase.


How is he acting like an asshole or being rude? Torvalds is WAY ruder than this guy.

I don't know what you're projecting onto this essay to find it assholey or rude, but it's bunk.

Frankly, I'm disappointed that he failed to condemn namespaces.


To be fair, the rant was written in 2002. While namespaces existed as a W3C REC as of 1999, I don't recall them being used much in practice until significantly later than that (as is usual for anything going from committee spec to the real world). Had he written it later, he may very well have ripped on namespaces.


On the contrary, this kind of social incontinence inhibits the exchange of ideas. Most of the best people leave as the discourse degrades toward toxicity. Comp.lang.lisp was a dramatic example.

Diverting one's personal emotional issues into public technical debates is not courage.


It is important to have strong opinionated technical arguments unsullied by consensus building distraction. Naggum's writing here is not an example of that. He is opinionated about irrelevant non-technical issues. This type of behavior sabotages the efforts of those who might strive to actually win the technical argument on the merits.


How about this: Argue the point. Do it passionately or humorously or snarkily if it works for you. The problem with the OP is that his snark and sarcasm are disconnected from his technical points. His actual technical points are few and far between, and generally not argued very well at all.


It's about being objective.


I can't believe the discussion about this article has devolved into a discussion about whether a particular writing style, which has been a witty and entertaining vehicle for argument for thousands of years, should be expunged from the programming community as if it were a pathology. A preference for civility is one thing, but an inability to distinguish between different pieces of writing according to their intent, effect, and value is another thing entirely. Naggum's target here is XML (a juggernaut) and the entire community behind it (another juggernaut,) not an individual poster. He isn't bullying anyone or singling anyone out personally. He also writes with a great deal of experience and insight into this particular problem, and his tone is very well-suited to communicate his experience and insight.

The programming community has its share, and maybe more than its share, of people who hurt the community more than help it because of how they treat other people, and also people who could learn to participate in a much more constructive way. The matter can't be oversimplified, though. There are no simple rules that can be applied. Sure, you could describe the post as aggressive, intolerant, and irritable. Most writing that can be described as aggressive, intolerant, and irritable could be improved by making it less so. Then again, most things described as "pungent" -- rotting garbage, my feet -- could be improved by making them less so, but not Camembert.


What you find "witty", I find boring. What is so interesting about dozens of ad hominem attacks? They're not even clever insults... just long adverbs (staggeringly) attached to playground names (idiot) and misogynistic, nonsensical analogies (rape).


Well, that's one way of reacting to it. Many people who found Christopher Hitchens delightfully witty decided he was no longer so witty when he became a neocon and started writing things they disagreed with. That only means they never appreciated him as a writer, but only as a cheerleader for their particular side.

misogynistic, nonsensical analogies (rape)

Being offended at horrible things treated lightly in a piece like this is the surest sign one has misplaced one's sense of humor.


How can rape be misogynistic when men can be raped? I suppose it could be both misogynistic and misandric...


"his tone is very well-suited to communicate his experience and insight"

I find this statement fascinating. I was unaware that a discussion of, for example, the breast-size preferences of American males was relevant to a criticism of XML. Indeed, I did not consider that discussion, or others in his rant, to be an effective method of communicating his experience and insight regarding XML.


Anything can become relevant through analogy. Attributes, explicit end-tags, character entities, and validation before processing are acknowledged as worthwhile features in the contexts SGML was originally created for, but the desire to have them in the contexts XML is intended for is compared to an infantile longing to receive the comforts a child receives from a mother, when one is no longer a child and is relating to someone who is not one's mother. Abstractly speaking, an expectation that develops in one context (a tiny infant nursing from his mother's swollen breast) may become ridiculous if it is transferred unmodified into a different context (a grown man who expects women to have breasts larger than his head.)


Analogy works best when it's short and pithy. The amount of text devoted to the 'analogy' should definitely not be allowed to grow several times longer than the exposition it's supposed to illustrate. By then things have gone long past the point where useful rhetorical glossing morphs into incontinent self-absorbed dithering.


The heart of his suggestions:

Remove the syntactic mess that is attributes. (You will then find that you do not need them at all.) Enclose the /element/ in matching delimiters, not the tag. These simple things makes people think differently about how they use the language. Contrary to the foolish notion that syntax is immaterial, people optimize the way they express themselves, and so express themselves differently with different syntaxes. Next, introduce macros that look exactly like elements, but that are expanded in place between the reader and the "object model". Then, remove the obnoxious character entities and escape special characters with a single character, like \, and name other entities with letters following the same character. If you need a rich set of publishing symbols, discover Unicode. Finally, introduce a language for micro-parsers than can take more convenient syntaxes for commonly used elements with complex structure and make them /return/ element structures more suitable for processing on the receiving end, and which would also make validation something useful. The overly simple regular expression look-alike was a good idea when processing was expensive and made all decisions at the start-tag, but with a DOM and less stream-like processing, a much better language should be specified that could also do serious computation before validating a document -- so that once again processing could become cheaper because of the "markup", not more expensive because of it. But the one thing I would change the most from a markup language suitable for marking up the incidental instruction to a type-setter to the data representation language suitable for the "market" that XML wants, is to go for a binary representation.


He basically suggests turning XML into s-expressions. E.g.:

    <p>Hello, <a href="http://example.com">world</p>
Turns into something like:

   <p "Hello," <a <href "http://example.com">"world">>
Not an obvious win as far as I can tell, wouldn't like to fix an unmatched bracket in a document ending in >>>> rather than e.g. </form><div></body></html>. But your mileage may vary.

His other ideas have more or less been implemented, with varying success. XML Schemas have been introduced, which are more powerful than DTDs. Some schema languages can indeed do complex computations, for better or worse.

The macro-expansion approach is how XSL-formatting worked, and it turned out it was not as powerful or elegant as the CSS approach. It made sense when coming from a publishing background, but didn't predict dynamic interactive web pages.
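
For anyone who wants to play with the nested form, here is a minimal Python sketch that rewrites a small XML fragment into the angle-bracket notation used above. The function name `to_sexp` is made up for illustration, the input has the closing `</a>` supplied so that the standard parser accepts it, and no escaping of quotes inside text is attempted, so treat it as a toy rather than a converter.

```python
import xml.etree.ElementTree as ET

def to_sexp(elem):
    """Render an element as <tag <attr "value"> "text" <child ...>> (hypothetical notation)."""
    parts = [elem.tag]
    # Attributes become ordinary nested elements, as the rant suggests.
    for name, value in elem.attrib.items():
        parts.append(f'<{name} "{value}">')
    # Text before the first child.
    if elem.text:
        parts.append(f'"{elem.text}"')
    for child in elem:
        parts.append(to_sexp(child))
        # Text that follows a child inside the same parent.
        if child.tail:
            parts.append(f'"{child.tail}"')
    return "<" + " ".join(parts) + ">"

doc = ET.fromstring('<p>Hello, <a href="http://example.com">world</a></p>')
print(to_sexp(doc))
```

The run of closing brackets at the end of the output is exactly the readability concern raised above.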


"Not an obvious win as far as I can tell, wouldn't like to fix an ummatched bracket in a document ending in >>>> rather than e.g. </form><div></body></html>. But your mileage may vary."

HTML would then be an acronym for Lots of Irritating Single Angle Brackets.


This is an interesting issue

<p>blah</p> is easier to 'see' than <p"blah">

But in a big HTML you'll still be lost if you don't use indenting or a tool. At the end of the file you'll still have several close-tags one way or the other

One way would be to rewrite this in an easier way (but less canonical). In math there are several ways of doing that, for example:

Operator precedence and some defaults simplify, so you can write 3+2x(3+4) instead of (3+(2x(3+4)))

But it would probably make things worse for a text markup language.

Edit: changing asterisks for x


Hm, my immediate thought was that it would be written like TeX, e.g.:

    \p{Hello, \a[http://example.com]{world}}
This doesn't quite match what he's saying but I think it's a little more readable than your version.


I think he's more suggesting that XML is a fail because it attempts to take patterns that were designed (and work well) for document markup, and just mindlessly apply them to a very different task - data definition.

XML's totally fine for markup. It's also totally fine for situations where the primary content of the file is the XML tags themselves. For example, it's got its share of irritating gaffes but overall I'm rather fond of XAML as a way to define user interface layout.

But I've also seen a lot of monstrosities in XML. Situations where the tags themselves comprise 80% of the file, and yet are almost entirely redundant. Situations where the markup primarily serves to obfuscate the data itself. Situations where a file format changes to XML, bringing about a switch to routines that use an off-the-shelf XML parser. . . that still manage to consume twice as many lines of code as the from-scratch custom parser they replace.


> Not an obvious win as far as I can tell

How ironic, then, that you got the brackets right on the second one but not the first one.

Adding extra quotes and spacing confuses the issue needlessly, too.


The attack on attributes was my favorite part. Attributes are indeed a moronic mess.

Too bad he neglected to similarly attack namespaces.


Heh, Naggum. Why do we still talk about him? Did he build awesome things? Did he change the world? Affect so many lives by being a great teacher?

One thing he did do was hate a lot. You can find him throwing hate at whatever it is you hate, no problem. Perl, C++, XML... whatever (except Lisp), he probably wrote an article flaming it. So when you want to find a well-written article lambasting whatever it is you hate, you can look to Naggum.

And while he wrote those thousands of USENET posts, the rest of us were off building great things with the stuff he hated on. He was barely on my radar until he died, and looking through archives I realized he actually responded to me a couple times. I guess I was too busy making stuff to pay much attention.


I can't really be bothered to cite references since I'm typing this on an iPad and I doubt you'd care, but I would like to point out for others that he has built great things and worked on influential projects (take SGML for example).

I can only dream to attain 10% of the technical prowess he achieved in his short life.


Erik was a jerk who ruined comp.lang.lisp for years. I can't believe people still praise him.


It's certainly a poor way to argue when you set yourself up so that unless the opposing side agrees with you on politics, morality, the U.S. government, education, human development, and oh yeah, portable document formats, they're going to be forced to disagree with you.


Only if the opposing side is an idiot who can't separate beliefs about separate things from one another. Naggum himself has a great rant on such "one-bit" people: http://www.xach.com/naggum/articles/3225130472400367@naggum....

But then, if you're knowingly arguing with idiots, it's typically done more for some weird sense of fun than for the purpose of actually trying to change their mind. Even an argument between supposedly smart people (even those aware of Bayes' Theorem) doesn't result in mind changes all that often. Actually changing someone else's mind about something they already have a relatively firm belief on is at least as hard as changing your own mind.


So it's only a poor way to argue if the opposing side is a normal human, is what you're saying.

If you think you can fully separate beliefs about separate things from one another, there's a whole slew of psychology research that says you're deluded.


This has piqued my interest. What specific research are you referring to?


The point isn't that they are unable to separate their beliefs, it's that if you give me a slew of arguments (all of which are tangentially related) and I happen to disagree strongly with one of them, I'm likely to focus on that one and argue against that point. At worst, it causes me to reply about my opinion on George W. Bush (and not about XML) and at best it is a distraction and makes me feel like your assumptions are wrong (since I disagree with them on the political plane) and hence there must be a flaw in your higher level argument tying them together.

In my experience people who make such sweeping connections between disparate concepts, fields, events, etc. usually have a very shallow understanding of the things they are tying together and tend to be conspiracy theorists, seeing patterns where none exist. Scroll through any "new age" section in your local bookstore and you will find books tying together religion, quantum physics, numerology, human health, evolution, poetry, and other areas in the hope that they are building up a wall of irrefutable evidence by sheer breadth alone. In the end, by their lack of focus, they end up having no deep argument at all and are generally profoundly wrong in at least one (if not all) of the areas they use in their arguments, deflating their entire point.


It is usual to correlate technology preferences with enterpriseyness - BigCo developers, in general, prefer Java or C#, whereas startuppy devs, in general, prefer Ruby or Python, and nobody likes PHP except the millions of devs who use it.

Is there a similar correlation between technology preferences and political leaning, or religion? If I said I was Christian, or opposed abortion, or approved of gun ownership or of invading Iraq, would you assume, bayesianly, that I probably prefer XML to JSON?

Does a preference for lisp, for example, indicate a predisposition to atheism? What kinds of political or religious leanings might be inferred from one's position on the RDBMS-NoSQL spectrum?


Programming is a way of modeling reality and a preference might reflect a bit of your own mind. Then again, people aren't always consistent, so it might not necessarily correlate strongly. Nice remark though!


Yes, it's a rant. Why? Because probably the author is tired of explaining the same things over and over to the XML defenders

Really

JSON is better (and XML is crap) it's ridiculous, it's obvious, it's clear cut, and when people "don't get it" you start resorting to irony

Like this:

> Enclose the /element/ in matching delimiters, not the tag

Let's compare them

<blah>foo</blah>

To json

blah: foo

1 - size (json wins)

2 - matching. Here's an exercise, try splitting json with sed, and then try with XML. Sed is a good exercise, because it's limited to simple grammars, so it kind of gives an idea how much trouble it is to parse each one

3 - redundancy: what would it mean if the line was <blah>foo</lala>? This is a mistake, right? But I can assume the tag name is blah and the data is foo. This is a glaring redundancy in the grammar, bound to cause trouble.

Not to mention that working with JSON data (e.g. in Python) is natural compared to working with XML data.
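
To make the comparison concrete, here is a small sketch using only Python's standard `json` and `xml.etree.ElementTree` modules. The JSON text is written with the braces and quotes the format actually requires, which the shorthand above leaves out. The helper `parses_as_xml` is a made-up name for illustration.

```python
import json
import xml.etree.ElementTree as ET

xml_text = "<blah>foo</blah>"
json_text = '{"blah":"foo"}'

# 1 - size: compare the raw character counts of the two encodings.
xml_size = len(xml_text)
json_size = len(json_text)

# JSON decodes straight into a dict.
value_from_json = json.loads(json_text)["blah"]

# XML decodes into an element tree that has to be navigated.
value_from_xml = ET.fromstring(xml_text).text

def parses_as_xml(text):
    """Return True if text is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# 3 - redundancy: a conforming parser rejects the mismatched close tag
# instead of guessing that the element is "blah".
mismatched_ok = parses_as_xml("<blah>foo</lala>")

print(xml_size, json_size, value_from_json, value_from_xml, mismatched_ok)
```

Both decoders recover the same value here; the difference is in how much structure you have to walk to get at it.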


Now try to apply this to the real world. The most popular and widely used SGML-derived format by far is (X)HTML. Show me e.g. how a paragraph with a few embedded links is improved by translating to JSON.

It is not. Of course some kinds of data are indeed expressed simpler with JSON.

Another real-world example is RSS. More data-y than HTML, but it still may contain mixed content. Would it be better if it had been JSON from the beginning? The answer is not clear cut.

So maybe JSON is not always the right answer. Maybe it depends on what kind of data you want to express. But then it suddenly is not a debate with "XML defenders" who "don't get it". Then it is a discussion about which tool is appropriate in a given situation rather than a war between tribes. And how do you boost your ego and sense of superiority with a boring debate about tools?


Popular != Better

"Show me e.g. how a paragraph with a few embedded links is improved by translating to JSON."

Even though HTML and XML are related, something like a web page is so ingrained that it would be difficult to change

And JSON was never a markup language

The criticism of XML goes much more towards using it as a 'Key/Value' serialization format.

But you could imagine something like that (commas and colons would have to be replaced in a 'JSON' markup language)

{p: Read it on {href:news.ycombinator.com, a:Hacker News}}

Size gains, parsing speed gains.

RSS is not as clear cut, really, it's not a complicated structure, so it could be YAML for example

"So maybe JSON is not always the right answer. Maybe it depends on what kind of data you want to express"

XML has attributes, which I'm not against (unlike the article) and it has a more natural nesting of information. But for the majority of data, yes, JSON is better

This is about coder productivity (which is increased with JSON it's not even funny), either in Python/Java (even with the whole swiss army knife of XML libs they have)/C# or others.

Speed is also an issue. Encoding and decoding XML is slower, even with C speedups (example: Python lxml). And this is relevant for transformation of data or even sending it to the user.


This was exactly the point addressed in the OP. Quote:

* SGML is a good idea when the markup overhead is less than 2%. Even attributes is a good idea when the textual element contents is the "real meat" of the document and attributes only aid processing*

In your example where you want to add a small amount of markup to a large amount of text, SGML wins. JSON is a worse choice for such cases.

It seems to me that we have collectively solved these problems, which is why we have both JSON and XML. Naggum was right when he said that XML was being applied incorrectly, and Douglas Crockford solved the problem by introducing JSON.


"Yes, it's a rant. Why? Because probably the author is tired of explaining the same things over and over to the XML defenders"

If one is tired of talking, one's best option is to shut up.


"Yes, it's a rant. Why? Because probably the author is tired of explaining the same things over and over to the XML defenders"

If one is tired of talking, one's best option is to shut up.

Even better, if one's arguments are not being found persuasive, perhaps one should come up with better arguments, or reconsider their position...


He didn't say he was tired of talking. He said he was tired of REPEATING.

If one is too tired to read, one should be too tired to comment on the reading material.


> JSON is better (and XML is crap) it's ridiculous

Except that in JSON, strings need to be surrounded by double quotes. Why impose on human readers a syntactic requirement that only compilers/interpreters need?

From this respect, JSON is in the same mixed bag as XML is.

If you want to go down that slope, be consistent and push YAML instead of JSON.
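
The quoting rule is easy to check with Python's standard `json` module. This is just a quick sketch; `is_valid_json` is a made-up helper name.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

quoted_key = is_valid_json('{"a": 1}')        # key in double quotes
bare_key = is_valid_json('{a: 1}')            # key without quotes
bare_value = is_valid_json('{"blah": foo}')   # string value without quotes

print(quoted_key, bare_key, bare_value)
```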


"Except that in JSON, strings need to be surrounded by double quotes"

This is a funny limitation, since JSON came from Javascript and {a:1} is perfectly legal in Javascript

"Why impose on human readers a syntactic requirement that only compilers/interpreters need?"

Well, for human readers this can be bad, but <> can be bothersome as well.

YAML could be a better choice (or something similar to JSON but with better syntax)


> This is a funny limitation, since JSON came from Javascript and {a:1} is perfectly legal in Javascript.

That's only legal when the key is not a reserved word. I never use that form, as it's error-prone.


JSON has its fair share of shortcomings too, for example:

XML

    <p>Some <b>bold</b> text</p>
JSON

    { "p": ["Some ", { "b": "bold" }, " text"] }
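
A rough Python sketch of how mixed content ends up in that shape, assuming well-formed input and ignoring attributes entirely. The function name `to_json_shape` is invented for this example, and the "collapse a lone string" rule is just one possible convention.

```python
import json
import xml.etree.ElementTree as ET

def to_json_shape(elem):
    """Map an element to {tag: content}; mixed content becomes a list (hypothetical mapping)."""
    parts = []
    # Text before the first child element.
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(to_json_shape(child))
        # Text that follows the child inside the same parent.
        if child.tail:
            parts.append(child.tail)
    # A lone string is stored directly instead of in a one-item list.
    if len(parts) == 1 and isinstance(parts[0], str):
        return {elem.tag: parts[0]}
    return {elem.tag: parts}

doc = ET.fromstring("<p>Some <b>bold</b> text</p>")
print(json.dumps(to_json_shape(doc)))
```

Every run of plain text has to become its own quoted list item, which is where the overhead for document-style content comes from.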


I wouldn't ever mark up documents like that. JSON and XML are shitty languages for such things. So is html to be honest but we're stuck with it.

Markdown, rst or something are better. As long as it has a simple deterministic terse grammar that is... If you want you can stuff that inside XML or rst and attach metadata.


That was the point. Markdown too has its weakness: it doesn't scale very well - how do I markup chemical symbols, or gene names, or complex tables? We soon run out of keyboard symbols to denote many concepts...


I agree. I think we should create and use extensible languages (both written and spoken) that are better represented and understood by both machine and human.

Mathematics, English and most notations are too full of nuances to represent things unambiguously.


  > Yes, it's a rant. Why?
Actually, the rant's author just liked/felt compelled to communicate that way, pretty much all the time. It really hurt him socially.


And that's how we seem to have transitioned to JSON, at least for some information storage and transmission :) Remember when it was all the rage to use XML for RPC, and even in AJAX?


JSON and XML have two totally different application domains. For some, XML is better; for others, JSON. People who think JSON is always better have never had to deal with changing data schemas, multiple parties, and a common interface.

In short, XML is a lot more powerful than simple JSON. In expressive terms, JSON is a subset of XML with a different syntax.


I like XML for streams of data or for data that has to be stored for a long time (like logs). I like JSON for everything else.


How is XML any more immune to schema changes?


It is not immune, but you have an explicit contract in the form of an XML Schema Definition (XSD).


XSD is not part of XML, it's a totally separate specification. JSON likewise has schema specifications like JSON Schema.

Also, Protocol Buffers are by far a better schema language than XSD.


Hard to forget, given that you have to type "XMLHttpRequest" every time you do any AJAX (The 'X' in which stands for XML, iirc).


haha ah right! Excellent (and tragic) example! I've been using jQuery too much at work and been spoilt by $.get() ;)


No you don't, since 99% of the time you'd use some library for that, like jQuery.


My worst practical problem with XML is that it is kryptonite to conflict resolution in version control systems. I've eased my pain with one such document by translating the document elements to a directory full of files, each of which is named after its unique identifier attribute and has contents that look something like

  === attributes ===
  a=b
  === contained elements ===
  id_1234
  id_4567
  === associated text ===
  This is the text associated
  with this element
Those one-line tags, which are syntactically distinct from attributes or identifiers, serve as anchors for the diff program to "do the right thing". For this application, I can omit whitespace text elements, and there is at most one #text element per containing element, so I can conflate them. Were there more than one, I could generate IDs for the contained text elements.

Are there better ways of doing this? Tools?
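
For anyone wanting to try the layout described above, a minimal sketch in Python follows. It is not the poster's actual tool: the `explode` function is a hypothetical name, and the sketch assumes every element carries a unique `id` attribute, leaves `id` out of the attribute section, and joins an element's text pieces into a single line.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def explode(xml_text, out_dir):
    """Write one file per element, named after its (assumed unique) id attribute."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        lines = ["=== attributes ==="]
        # Sort attributes so the output is stable from run to run.
        for name, value in sorted(elem.attrib.items()):
            if name != "id":
                lines.append(f"{name}={value}")
        lines.append("=== contained elements ===")
        lines.extend(child.attrib["id"] for child in elem)
        lines.append("=== associated text ===")
        # Conflate the element's text pieces and drop whitespace-only ones.
        pieces = [elem.text] + [child.tail for child in elem]
        text = " ".join(p.strip() for p in pieces if p and p.strip())
        if text:
            lines.append(text)
        path = os.path.join(out_dir, elem.attrib["id"])
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")


sample = '<doc id="id_1"><item id="id_1234" a="b">This is the text</item></doc>'
out_dir = tempfile.mkdtemp()
explode(sample, out_dir)
with open(os.path.join(out_dir, "id_1234"), encoding="utf-8") as handle:
    item_file = handle.read()
print(item_file)
```

Because each element lands in its own small file with fixed section headers, a line-based diff only sees changes to the elements that actually changed.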


Eh, as XML rants go I prefer this one from Graydon Hoare (lead designer of Mozilla's Rust): http://www.rdb.com/demo/XML/

It colorfully beats around the bush a bit, but its primary points are:

1. any lossless coding of bits can be cajoled into representing anything, the important question is whether it's a convenient coding.

2. the fact that so many XML formats have sub-languages embedded into strings means that nearly all XML processing requires another, higher level of software to fully parse the document. (it also is evidence that XML wasn't that convenient of an encoding to begin with).

3. just because you can write a document of XML expressing some logical idea doesn't mean that the idea is implementable. For example, the "Spacecraft Markup Language" (which is a real thing, or was).
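
Point 2 is easy to demonstrate. The sketch below uses SVG path data as the embedded sub-language; that choice is my own illustration, not an example taken from the linked rant, and the regex is a deliberately simplified tokenizer rather than the full SVG path grammar.

```python
import re
import xml.etree.ElementTree as ET

svg = '<svg xmlns="http://www.w3.org/2000/svg"><path d="M 10 10 L 20 20 Z"/></svg>'

# Step 1: the XML parser hands back the path data as one opaque string.
root = ET.fromstring(svg)
path = root.find("{http://www.w3.org/2000/svg}path")
raw = path.get("d")

# Step 2: a second, hand-written tokenizer is needed to get at the structure.
# This regex covers only single-letter commands and plain numbers (simplified).
tokens = re.findall(r"[A-Za-z]|-?\d+(?:\.\d+)?", raw)

print(raw)
print(tokens)
```

The XML layer is finished after step 1, yet the drawing commands are still unparsed text.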


Unfortunately, I lose interest in this rant amongst all the inane references to other things in the world.

Kind of ironic that an argument in favor of less verbosity couldn't be more verbose itself.


I feel the same way, except it has been quite a while since I've read a good usenet-style rant, so I indulged.

The correctness of an idea vs the expression of the idea are not related. You can be "right" and horrible at explaining or "wrong" and very persuasive and eloquent.

The comparisons with comments from Linus are good examples. He can be very abrasive and condescending, but is usually right!


The author's technical criticisms are indeed valid. But Linus Torvalds doesn't compare things to murderous, rapist gang members, or offer offensive social commentary on the culture of an entire nation, and I suspect he wouldn't find your comparison of him to this author very flattering.


Naggum's the name.


This part felt particularly applicable to the intellectual (and emotional) environment of young companies and the products they bring forth:

Many an idea or concept not only looks, but /is/ good in its infancy, yet turns destructive later in life. Scaling and maturation are not the obvious processes they appear to be because they take so much time that the accumulated effort is easy to overlook. To be successful, they must also be very carefully guided by people who can envision the end result, but that makes it appear to many as if it merely "happens".

The idea and the people that carry it out are intertwined because neither are static or even have any "essential" material whatsoever.

Worth reading just to have that thought articulated so well (once again).


This is kind of off-topic, but whenever I see a spate of "Is RSS dead/dying?" posts, I wish they would focus more on the historical context around the W3C and XML standards, and on the reaction of developers and browser makers, who split from the W3C to focus on HTML/CSS/JS.

Are there any other end-user facing specs/apps built on XML besides RSS?

It always seemed redundant to me that the most popular RSS reader is Google's, which is re-wrapping stripped-down XML feeds back into another HTML/CSS/JavaScript presentation layer.

Maybe JSON readers would be more popular to implement.


I don't know what you mean by "end-user facing specs/apps".

But if enterprise apps where the content is formatted in XML count, then there are thousands (I'm working on OpenERP, for example, which uses XML both to define views and to transmit content to the client).

Then there's SVG. And the Office formats, both MS' and LibreOffice's. And obviously XHTML. And XMPP, which drives Google Talk and clients to FB Chat.

Just look at the list: https://en.wikipedia.org/wiki/List_of_XML_markup_languages


> (I note in passing that the stereotypical American male longs for much larger than natural female breasts, presumably to maintain the proportion to his own size from his infancy, which has caused the stereotypical American female to feel a need for breasts that will give the next generation a demand for even more disproportionally large breasts.)

Is this how Norwegians see Americans?!


Given that breastfeeding has been declining in America over the past couple generations, it reflects a misunderstanding of American sexual pathology, if, in fact, it was meant as more than a throwaway comment.


This is how much of the world sees Americans.


No, this is how European psychologists/psychoanalysis guys in general see Americans.

As a general notion it's not that far from the truth, if you ask me, even if the causal mechanism used to explain it is wrong.


> ...go for a binary representation. ... The question of what we humans need to read and write no longer has any bearing on what the computers need to work with.

We seem to be heading away from binary.

I think the bigger problem with XML is the XML stack. Many of the ideas are insightful, powerful, useful - but the embodiments are tedious (e.g. XML Schema, XSLT). OTOH, a problem is an opportunity


At first the hyperbole was somewhat amusing. But just as he criticizes XML for being too heavy for what it is frequently used for, his diversions into politics and human behaviour are too heavy for the text. They end up becoming an obstruction to the meat of it.


15 years ago I had Erik in my killfile. He's dead and I still can't escape him.


Wiki on Mr. Naggum for those curious about the man behind the rant: http://en.m.wikipedia.org/wiki/Erik_Naggum#_


This really shouldn't be news. The complaint seems to be that XML is no good for anything that isn't document/text heavy (i.e. doesn't fit in with the intent of SGML).

If you weren't already thinking this in 2002 then you probably weren't paying attention.


I noticed the reference to binary representations with some enthusiasm. For a variety of reasons, I find myself using HDF5 in instances where I might have chosen XML (or JSON for that matter) a few years ago.


"If GML was an infant, SGML is the bright youngster far exceeds expectations and made its parents too proud, but XML is the drug-addicted gang member who had committed his first murder before he had sex, which was rape."

I think he could have just left it there.


R.I.P.


Especially the part around 2/3rds of the text, about how to improve XML.

Also loved the jokes and metaphors. We need such people in tech discussions, most community leaders have turned teletubbies-nice to each other, to the detriment not only of a good flame-fight, but of the actual dismissal of brain dead ideas.

Instead of violently ridiculing proponents of bad ideas/software/etc to shame and (hopefully) seppuku, we tip-toe around them, or at best, just make light fun of them, like in the "Mongo is webscale" video.


I think rants like these make us more stupid. They turn a discussion about the pros and cons of various technologies into a pissing match. His colorful metaphors do not actually illuminate the issue at hand, but only serve to set up "stupid" versus "smart" and appeal to emotion, hoping to win the reader over to the side of the smart ones without actually arguing the specific points of the technology. (E.g. why is backslash obviously better than ampersand-semicolon for character escapes? If he has a valid argument for this, he certainly doesn't feel the need to disclose it to the reader. He would rather make an elaborate infantilization analogy about breasts.)

I guess this kind of rant appeals to people who have a deep need to feel part of a small elite surrounded by a sea of stupidity, but who don't have the capacity to understand how different tools may have various pros and cons in different specific situations.


>His colorful metaphors do not actually illuminate the issue at hand

I think quite the opposite: he makes a very compelling case, and even gets into the details. Even in the part you mention, you missed some stuff that answers your question (without hand-holding):

= = = Then, remove the obnoxious character entities and escape special characters with _a single character_, like \, and _name other entities with letters following the same character_. If you need a rich set of publishing symbols, _discover Unicode_. = = =

>I guess this kind of rant appeals to people who have a deep need to feel part of a small elite surrounded by a sea of stupidity, but who don't have the capacity to understand how different tools have pros and cons in specific situations.

It's not about "different tools have pros and cons in specific situations", it's about how some tools have defects that make them bad for EVERY SINGLE situation. CORBA comes to mind as another example.

It's not that something XML-like cannot be good for certain situations.

It's rather that XML as-it-was-designed has several flaws that don't NEED to be there, and don't benefit ANY use case.

For example, goto has some use cases in C that it is good for. NUL-terminated strings, on the other hand, are a bad idea, and have no place anywhere.


  > he makes a very compelling case, and even gets into the details
He always made a compelling case; I would go so far as to say that he was almost always right. The problem with this style of communication is that it hardens people's positions, because the criticism of the position comes wrapped up with the implication that anyone who holds it must be an asshole. That makes it much more difficult for most people to approach the question from a purely technical perspective.


The parts you quoted between === from the article aren't at all a compelling case to me, that's nothing more than a prescription. If he has some argument here, he needs to make it.


It is not "making a case" to state that character entities are "obnoxious" and backslash should be used instead of ampersand. It is stating a preference without providing any kind of justification.

And what do you suggest is the meaning of "discover Unicode" in the context of a criticism of XML? XML was Unicode from the beginning.
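
A quick sketch of what that means in practice, assuming Python's standard `xml.etree.ElementTree` parser as a stand-in for any conforming XML parser: a literal Unicode character and a numeric character reference are interchangeable, while a named entity such as `&alpha;` only works if a DTD declares it.

```python
import xml.etree.ElementTree as ET

# A literal Unicode character and a numeric character reference parse to the same text.
literal = ET.fromstring("<p>\u03b1</p>").text
numeric = ET.fromstring("<p>&#x3b1;</p>").text

# Only five named entities (lt, gt, amp, apos, quot) are predefined in XML itself;
# others, such as &alpha;, must be declared in a DTD.
try:
    ET.fromstring("<p>&alpha;</p>")
    named_ok = True
except ET.ParseError:
    named_ok = False

print(literal == numeric, named_ok)
```

So the large named-entity sets come from DTDs layered on top of XML, not from XML's own character model.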


@jebblue

>Your use of capitalization reflects your emotional state of mind on the topic.

No, my use of capitalization reflects my frustration with being unable to bold words for emphasis.

Your use of psychobabble on the other hand reflects a lot more on your cultural background.


Your use of capitalization reflects your emotional state of mind on the topic.


> Also loved the jokes and metaphors.

I did not: He tried to be clever, but only succeeded in looking like an asshole.

The technical criticism may have merit, but the presentation leaves a lot to be desired.

> We need such people in tech discussions

I don't. Thankfully, it's easy to ignore people on the internet.


I agree with you, and don't mind someone being blunt when they believe they're right and mainstream views are wrong, even when I disagree with the argument the person is making.

It often seems to result in a strong reaction in the opposite direction, however. I was amused when I read Ted Dziuba's "Node.js is Cancer" rant, and was able to separate the technical arguments from the invective and evaluate them on their merit. A lot of people, though, seemed to respond angrily with indignant forum or blog posts. So I suppose there's always the danger that being too blunt will result in your real message being lost in the resulting firestorm.


Dziuba's post has ended up the 3rd result on Google for node.js. Mission accomplished, I'm sure!


My favorite:

  Remove the syntactic mess that is attributes.
One of the many reasons I love JSON.



The parts of this that aren't about XML are stupid as hell.


While I generally dislike comments like this (and I wouldn't be surprised if it got downvoted on HN as lacking substance), this time, it fits. The "arguments" provided were inane, providing no justification for the very few actual suggestions contained therein.

What is particularly sad is that I completely agree with his comments on XML, and I try to teach others about the dangers of not using XML intelligently, but I would never refer to this author to back up my feelings.



