You know what I want? Schemas. And clear error messages.
I want to know beforehand what I can put in a config file and I want a fast and hard failure if what I put in there is not good.
And this should be implemented at the file format parser level, with hooks for apps to add on top of the default behavior, so that every app that implements this format gets these things almost for free.
Haven’t you just described Cap’n Proto, Protobuf, Thrift, FlatBuffers, etc.?
I know cap’n’proto also has fantastic support for using the schema for config files. You can just compile any constant as a stand-alone serialized message that you mmap into your code in a safe way. It can’t do complex math and things (at least yet) but you can express lists, dictionaries, and reference other constants, so as a config file replacement I love it. I’ve also found the format to be far more regular and consistent than you get with things like text protobuf (you’re still using the schema language instead of another format)
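For illustration, a constant in a Cap'n Proto schema looks roughly like this (the struct and values are invented for this sketch, not from any real deployment):

    # Hypothetical schema: a const like serverConfig can be compiled to a
    # standalone binary message and mmap'd safely at runtime.
    @0xdbb9ad1f14bf0b36;

    struct ServerConfig {
      host @0 :Text;
      port @1 :UInt16;
      backends @2 :List(Text);
    }

    const serverConfig :ServerConfig = (
      host = "localhost",
      port = 8080,
      backends = ["a.internal", "b.internal"]
    );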
You store your configuration as plain text in your repository and whatnot. When it comes to deployment you just compile it to a binary file.
Cap’n’proto also has plain text and JSON serialization formats if you really want to have your deployed config file be directly human-editable and deserialize from that. I was just noting a very cool feature of having your config written in cap’n’proto and it’s what Cloudflare uses to maintain a bunch of config internally if I read Kenton’s allusions to it correctly.
I think the parent is trying to say that the data is stored in a map which is read into a proto, etc. Kinda like what gRPC does over HTTP. Which kinda makes sense. The schema gives you a great idea of what "should be", and the typing/errors/etc are understood by the host language.
We had that a decade ago. It was called XML and XML Schema. All IDEs support it.
JSON was a huge step backwards in the name of simplicity. And now when we are going to add similar functionality to JSON, something else is going to come out in the name of simplicity (like NestedText).
If your child node has a unique name among its siblings and does not contain nested nodes, then it's an attribute. Otherwise it's an element. Seems pretty obvious to me.
The fundamental issue with XML is its impedance mismatch with common data structures which forces using Object to XML mappers (whether explicitly or implicitly). It's more or less solved with XML Schemas or DTDs, but if you're looking at just XML, you can't tell whether some element is an array or a single node. Thus JSON is better suited for serialization.
> If your child node has a unique name among its siblings and does not contain nested nodes, then it's an attribute. Otherwise it's an element. Seems pretty obvious to me.
That is really not what attributes are for. I feel a bit of a fraud posting that because I'm not an XML expert and so not really clear what they actually are for. (This reinforces the parent's point: you need to be an expert to know what such a fundamental feature is for.) I remember it's something like "something used to help interpret the actual value" e.g. units of measurement. But most of the time, even if it's non-repeating with no children, you're supposed to use elements rather than attributes.
One problem here is that attributes are so much more compact (and so often easier to read) than elements that it's tempting to use them in places where you ought to use an element (and many people over time have given in to that temptation). Another problem is that the distinction between attributes and elements is almost never useful. That was the parent comment's point by the looks of things.
> The fundamental issue with XML is its impedance mismatch with common data structures
That's probably part of it, but I think at least as problematic is that it has many features that most of the time you don't need and don't want to have to care about. Things like CDATA (also mentioned by the parent comment), custom entities, external entities, DTDs (which can be inline in XML files so you need to know all about DTDs to understand XML properly). That's why there are all sorts of weird XML vulnerabilities that JSON doesn't have. Did you know you can make an XML file that reads your /etc/passwd file when it's parsed? That is not an issue with JSON.
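The classic example is an external entity declaration, the well-known XXE pattern:

    <?xml version="1.0"?>
    <!DOCTYPE foo [
      <!ENTITY xxe SYSTEM "file:///etc/passwd">
    ]>
    <foo>&xxe;</foo>

A parser with external entity resolution enabled substitutes the file's contents for &xxe;, which is why most modern XML libraries ship with that feature disabled by default.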
HTML tags and attributes are markup. Strip them and the document is still legible to a human being. Markup is the non-human part: presentation, the semantic web.
Thanks, I found this explanation really helpful, and almost obvious in retrospect (as the best explanations often are!).
I had been thinking that all of these extra features that XML have are just a case of massive overengineering that no one would ever need. In fact it's a case of taking something fundamentally meant for text documents with extra markup, as the name implies, and misapplying it to config files and IPC messages which are just not the original domain at all.
The machine receives the data; the human receives an application with documentation, a builder. That's exactly what we have today, except the UI can be plugged into any stored document. Too good to be true.
I think XML was killed by poor usability. Plain text XML, XHTML and XSLT authoring is not fun.
I am trying to approach it from the DOM perspective [1]; so far I like it more than Markdown. XHTML and HTML are just serialization formats, and HTML is not a good one [2], [3], [4]. XSLT could have a nice GUI or a compact syntax like RELAX NG's.
> Did you know you can make an XML file that reads your /etc/passwd file when it's parsed?
Not only can SGML (but not XML on its own) read /etc/passwd, it can format it into fully-tagged markup and then render it into eg an HTML table. Demonstrating what SGML/XML is actually designed for: encoding and authoring semistructured text. This can't be overstated in discussions like these where use cases for config formats, service payload formats, and actual text authoring are all thrown into the same basket when they shouldn't.
Btw: you can parse and canonicalize this new config file format into markup using the same SGML mechanism you'd be using for CSVs like /etc/passwd, namely short references
Btw2: you can skip/ignore markup declarations in XML, including whole declaration sets (DTDs), since these can be recognized using plain greedy regexps, though you can't ignore entity declarations when they're actually used in your XML body text
> you need to be an expert to know what such a fundamental feature is for.
No you don't... the parent commenter explained to you what it's for in a simple and concise manner... you chose not to accept that even though you're not an expert in this, and then complain that you need to be an expert to do it?!?
The parent commenter gave an explanation that, yes, was simple and concise, and also good enough for you to believe it (or you already thought that way). But it's also wrong. That just reinforces my point.
(The true difference is explained in sibling comments to yours, by sergeykish and tannhaeuser, if you're interested.)
The parent commenter explained it in a somewhat obtuse way.
I don’t doubt they meant to be clear, but reading it they were not and raised more questions than were answered.
As an example:
Wouldn't attributes be better served as details about the current element?
Wouldn’t elements be better served as “I am a child of the parent”?
Why would I use an attribute as a “non-repeating child” when semantically that doesn’t make sense when looking at the document? The attribute is inside the element’s definition, and seems to me attributes should be used to further describe the element being presented itself, and not be structural or describe itself as a child in any way.
JSON Schema [1] is actually a mature standard now, with decent tooling support, mostly through OpenAPI (formerly Swagger), which extends it with support for endpoints.
It's much simpler to use than XML Schemas, and arguably results in cleaner data models, since it doesn't have anything analogous to XML namespaces that allow for arbitrary mixing of schemas.
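As a sketch, a minimal JSON Schema for a config file might look like this (the field names are invented for illustration):

    {
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "host": { "type": "string" },
        "port": { "type": "integer", "minimum": 1, "maximum": 65535 }
      },
      "required": ["host", "port"],
      "additionalProperties": false
    }

With "additionalProperties": false, a typo'd key is a validation error rather than a silently ignored setting.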
> We had that a decade ago. It was called XML and XML Schema.
It would be true if XML were not full of all this SGML debris like "entities" (really, uncontrolled macros), if the real schema formats were flexible enough (I needed <c> inside <a> and <c> inside <b> to be totally different things), etc.
But when a config reader tool has to deal with a 40+-year legacy of enterprise folks wanting to embrace the universe, and all of this still doesn't let you control the contents without external measures like regexp checking... it simply falls apart when facing the real world.
Magento is a popular codebase that made XML-based configuration a fundamental part of its architecture. The results were terrible and caused numerous headaches and countless hours lost to trying to troubleshoot inscrutable configuration issues. The Magento 2 codebase began a shift away from XML for configuration, although it still uses some.
There may be room for an argument that Magento did XML badly (it did many things badly), but I don't believe I've ever seen XML done well.
I don't get it. The @Configuration and @Bean annotations are at least 100 times more readable and powerful than whatever garbage people used to write into their xml files to define beans. 20 lines of xml are often equivalent to like 8 lines of Java and each of those Java lines is shorter than the xml equivalent. Repeating closing tags is not very interesting.
Exactly, JSON Schema allows one to describe exactly how the JSON should look, including inter-field validation. And with tools like react-jsonschema-form you can generate a UI on top of it for free.
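Inter-field validation can be expressed with keywords like if/then or dependentRequired; for example (fields invented for illustration):

    {
      "type": "object",
      "properties": {
        "auth": { "enum": ["none", "basic"] },
        "username": { "type": "string" },
        "password": { "type": "string" }
      },
      "if": { "properties": { "auth": { "const": "basic" } } },
      "then": { "required": ["username", "password"] }
    }

Here username and password are only mandatory when auth is "basic".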
I spent years working with xml, xslt, xml schema. Frankly when I first saw json I thought it was terrific. Nothing has changed my mind since. Why do you feel like it is a huge step backwards?
XML is fatally flawed because you can't safely put one XML doc inside another one. Because of this rather fundamental problem, it never was any good for anything, and it never will be.
Sure you can. At work we talk to a system that requires that we do exactly this. The solution they chose is entirely trivial and safe: include the embedded doc as a base64 encoded string...
SOAP was and is an epic disaster, so that hardly seems like a refutation. The known way to embed an entire XML doc into a SOAP message was to use CDATA, which isn't a general solution because it means the embedded doc can't have ]]> in it anywhere. You could also base64-encoded the included doc.
Both of these solutions and all other known solutions to this problem are, as I'm sure you can see, just awful.
You can't just paste XML in XML because of the <?xml?> thing, because of entities, and because of half a dozen other misfeatures of XML.
Roughly speaking, you can do things like the following:
    <!-- The special xmlns attribute binds a short alias to a long name -->
    <p:parent xmlns:p="urn:some:unique:string">
      <c:child xmlns:c="urn:some:other:child:name" x="3" y="5">
        <c:subchild> <!-- No need to repeat the fully qualified unique name -->
          <p:tada>You can even interleave!</p:tada>
        </c:subchild>
      </c:child>
    </p:parent>
Note that while this is possible to write by hand, typically namespaces are for documents generated and processed by tools. The XML Schema Definition (XSD) format has full support for namespaces, so you can define documents based on modular chunks. E.g.: you can "import" the SVG namespace into a diagramming XML document format namespace, but restrict its usage to only the child nodes of an "img" tag. Or MathML as the children of "graph" nodes. Both SVG and MathML can potentially import a shared "font" namespace. Or whatever.
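A rough sketch of that kind of import (the schema locations and names here are invented):

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="urn:example:diagram">
      <!-- Pull in the SVG schema under its own namespace -->
      <xs:import namespace="http://www.w3.org/2000/svg"
                 schemaLocation="svg.xsd"/>
      <!-- Only "img" may contain SVG content -->
      <xs:element name="img">
        <xs:complexType>
          <xs:sequence>
            <xs:any namespace="http://www.w3.org/2000/svg"
                    processContents="strict"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>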
In the XML Reader API, each element has a "fully qualified" name that includes the long namespace prefix. If you use the API correctly, your tool can handle nested documents, or gracefully ignore them if it's appropriate.
The fiddly part is making this efficient, i.e.: avoiding a full string comparison against a long URI or URN. You typically have to "register" the namespaces you're interested in, and the API gives you some sort of efficient token instead of a string to use from then on.
I'm not saying it's perfect. Nothing is in XML. It was designed by committee, it brought too much of the legacy SGML baggage with it, but its namespace capabilities are a lot better than nothing at all, in much the same way that C# or Java don't have perfect type systems, but they're superior to loosely typed languages.
You don't embed plain text XML in CDATA, right? You escape it:

    function escapeXml(unsafe) {
        return unsafe.replace(/[<>&'"]/g, function (c) {
            switch (c) {
                case '<': return '&lt;';
                case '>': return '&gt;';
                case '&': return '&amp;';
                case '\'': return '&apos;';
                case '"': return '&quot;';
            }
        });
    }
Or you convert to the same encoding, strip the XML declaration, and expand entities. In short: work with adequate tools.
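For instance, a round trip of escaping one document into another looks like this (a standalone sketch; escapeXml is repeated here, with the entity replacements spelled out, so the snippet runs on its own):

```javascript
// Escape one XML document so it can be embedded as text content
// inside an element of another document.
function escapeXml(unsafe) {
  return unsafe.replace(/[<>&'"]/g, (c) => ({
    '<': '&lt;',
    '>': '&gt;',
    '&': '&amp;',
    "'": '&apos;',
    '"': '&quot;',
  }[c]));
}

const inner = '<doc note="a&b">hi</doc>';
const outer = `<envelope><payload>${escapeXml(inner)}</payload></envelope>`;
// outer contains no stray markup from the inner document; a parser
// reading <payload> recovers the original string after entity expansion.
```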
Came here to say the same, Cuelang is by far the best config system and paradigm I have tried. All else seems so last century, though Cuelang has its foundation in NLP systems from last century :]
Slightly off-topic, but yes, having fail-fast deserialisation is great.
I wrote a json/kotlin-serialisation library once and purposely restricted some json-features to achieve that:
1. Fields can arrive in any order - this is standard
2. Field names are matched case-insensitively - so keyA and keya are the same, because who would use two variables differing only by case. Serialization keeps the original casing of the name.
3. Missing fields throw an error. if they are nullable, they have to be explicitly set to null - so that you can be sure the serialization side upgraded to the latest version of a protocol if a field was added, and things don't just work by chance.
4. Null is not coerced to an empty string or anything like it. Kotlin is null-safe, so if it's a (non-nullable) string, it has to be an actual string, even if that's just "". If it's, for whatever reason, a nullable string, you can set it to null.
5. Enums are also deserialized case-insensitively - so you can write "keyA": "eNumVaLuE" if you want. Typos should not break the code here, and no one would use two enum values differing only by case. IIRC booleans could also be TRUE, tRuE, truE, etc. (but NOT t or f, or yes or no, or 0 or 1, or empty).
6. Superfluous properties are silently ignored.
These rules were a great tradeoff for quick development, mixing languages and having fail-fast behavior with a stable protocol.
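A sketch of rules 2, 3, 4, and 6 in JavaScript rather than Kotlin (not the commenter's actual library; the spec shape and names are invented for illustration):

```javascript
// Strict JSON deserialization against a simple field spec:
//   spec = { fieldName: { nullable: true/false } }
function strictParse(json, spec) {
  const raw = JSON.parse(json);
  // Rule 2: match incoming field names case-insensitively.
  const byLower = {};
  for (const [k, v] of Object.entries(raw)) byLower[k.toLowerCase()] = v;
  const out = {};
  for (const [name, opts] of Object.entries(spec)) {
    if (!(name.toLowerCase() in byLower)) {
      // Rule 3: missing fields throw, even nullable ones --
      // the sender must write an explicit null.
      throw new Error(`missing field: ${name}`);
    }
    const value = byLower[name.toLowerCase()];
    if (value === null && !opts.nullable) {
      // Rule 4: no null-to-"" coercion; null is only legal if declared.
      throw new Error(`field ${name} must not be null`);
    }
    out[name] = value;
  }
  // Rule 6: properties not in the spec are silently ignored.
  return out;
}
```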
I can see this working perfectly fine in typed languages like C#: `NestedText.Deserialize<T>("nestedtext")`, where the deserialize method handles the actual mapping of nested text objects to `T` by providing the deserializer a class (or classes) that handles the string -> scalar(s) mapping for the given `T`. That would, sort of, function as a schema.
I think the only thing, from glancing over the project, that would need to be supported to make this really useful is nested lists/dictionaries. I don't see how this can be done but maybe I'm missing it.
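The idea can be sketched in JavaScript instead of C# (all names here are illustrative, not a real NestedText API): the parser yields only strings, lists, and dicts, and a per-field converter table plays the role of the typed schema.

```javascript
// Apply per-field string -> scalar converters to a parsed tree,
// failing fast on missing keys or unconvertible values.
function deserialize(tree, converters) {
  const out = {};
  for (const [key, convert] of Object.entries(converters)) {
    if (!(key in tree)) throw new Error(`missing key: ${key}`);
    out[key] = convert(tree[key]);
  }
  return out;
}

const toInt = (s) => {
  const n = Number(s);
  if (!Number.isInteger(n)) throw new Error(`not an integer: ${s}`);
  return n;
};

// E.g. a parsed document { port: "8080", host: "example.com" }:
const config = deserialize(
  { port: "8080", host: "example.com" },
  { port: toInt, host: (s) => s }
);
```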
You can always do that, defining the schema in the client to produce sensible checks, even with JSON. The problem is that wherever the spec is underspecified is another place where two different clients can deserialize differently, and both be correct.
And the problem with stringly typed systems is that everything is underspecified
Like in Windows, where you configure by clicking checkboxes that get disabled if invalid, with tooltips explaining what they do, additional help if you press F1, etc.?