JSON without quotes for keys

json linux serialization

I need a textual, human-readable format which is reasonably compact and version-control friendly to serialize a persistent memory heap. My Bismon system (GPLv3) has such a format: it is textual, human-readable, git-friendly, and occasionally editable under emacs, but it is specific to Bismon and is usually loaded and dumped by the bismon program. That format is documented in the Bismon technical draft report (please skip the first few pages for H2020 bureaucracy), chapter §2, "Data and its persistence in Bismon". For an example of a file using that format, look into Bismon's store1.bmon (and other store*.bmon files).

I am considering that such a format might better be JSON-like (but I am not sure), just because many developers are familiar with JSON.

The JSON format requires object keys to be quoted strings, e.g. { "x":1, "y":2 }.

I am thinking of an application (maybe RefPerSys, which conceptually could become a Bismon done right) where a JSON notation is very useful (and where a human-readable textual file format is essential), but where we deal only with JSON objects whose keys are always C-identifier-like (each starts with a Latin letter and contains only letters, digits, and underscores). However, that application may need to parse perhaps a million such objects, so parsing performance does matter a little, and, more significantly, file space will matter a lot (since {x:1,y:2} needs only 9 bytes, but {"x":1,"y":2} requires 13 bytes, i.e. about 44% more space). My exact goal is any textual, human-readable, quickly and easily machine-parsable, tree-structured, compact, version-controllable (i.e. git-friendly) format. Most of the time it is dumped and loaded by the same application. Occasionally, I may need to glance into it with some editor, and perhaps even to change a small bit of it with that editor. I do not imagine needing a generic JSON transformer or processor like jq.
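The byte counts above can be checked with a few lines of Python (used here only as a convenient calculator; the function name size_overhead_percent is made up for this illustration). Note that the overhead is always 2 bytes per key, so the relative cost shrinks as keys and values get longer.

```python
def size_overhead_percent(quoted: str, unquoted: str) -> float:
    """Extra bytes of the quoted form, as a percentage of the unquoted form."""
    q = len(quoted.encode("utf-8"))
    u = len(unquoted.encode("utf-8"))
    return 100.0 * (q - u) / u

quoted = '{"x":1,"y":2}'   # strict JSON
unquoted = '{x:1,y:2}'     # bare-key variant discussed in the question

print(len(quoted), len(unquoted))                          # 13 9
print(round(size_overhead_percent(quoted, unquoted), 1))   # 44.4
```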

But my feeling is that, when the keys are C-identifier-like (and different from the three JSON keywords: true, false, null), the quotes could be avoided, as for example in {x:1, y:2}. I also understand that some JavaScript implementations might be able to parse that.

I am obviously guessing that parsing {x:1,y:2} is faster than parsing { "x":1, "y":2 } or even {"x":1,"y":2} (simply because the textual representation is slightly shorter) especially when we deal with millions of such JSON objects.

In a Bismon-like or RefPerSys-like system, a possible example could be:

{ oid: _7T9OwSFlgov_0wVJaK1eZbn,
  name: word,
  mtime: 1502296590.98,
  class: _7T9OwSFlgov_0wVJaK1eZbn,
  attrs: [ { at:  _01h86SAfOfg_1q2oMegGRwW, va: "for words" } ]
}

(currently, in commit ff19f15ecd2f647d42 of Bismon, the equivalent is in lines 1011 and following of store1.bmon; the | there delimits comments and these comments like |=word| there could be removed, since skipped at parsing; the comments in these dumped and loaded *.bmon files will be removed once Bismon is stable enough)
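To make the "conversion to exact JSON" idea concrete, here is a minimal sketch, not a real library: a regex pass that quotes every bare identifier and then hands the result to a strict parser. The name relaxed_to_json is hypothetical, and treating bare values such as _7T9OwSFlgov_0wVJaK1eZbn as plain strings is an assumption made for the illustration (a Bismon-like system would presumably resolve them as object references).

```python
import json
import re

# One token is a JSON string literal, a JSON number, or a bare identifier.
_TOKEN = re.compile(
    r'"(?:[^"\\]|\\.)*"'                    # string literal, copied verbatim
    r'|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?'    # number, copied verbatim
    r'|[A-Za-z_][A-Za-z0-9_]*'              # bare identifier (key or value)
)
_JSON_KEYWORDS = {"true", "false", "null"}

def relaxed_to_json(text: str) -> str:
    """Quote every bare identifier that is not a JSON keyword (assumption:
    bare identifiers used as values are meant to become strings)."""
    def quote_if_bare(match):
        token = match.group(0)
        if (token[0].isalpha() or token[0] == "_") and token not in _JSON_KEYWORDS:
            return '"' + token + '"'
        return token
    return _TOKEN.sub(quote_if_bare, text)

sample = """{ oid: _7T9OwSFlgov_0wVJaK1eZbn,
  name: word,
  mtime: 1502296590.98,
  class: _7T9OwSFlgov_0wVJaK1eZbn,
  attrs: [ { at:  _01h86SAfOfg_1q2oMegGRwW, va: "for words" } ]
}"""
obj = json.loads(relaxed_to_json(sample))
print(obj["name"], obj["attrs"][0]["va"])
```

Matching string literals and numbers as whole tokens is what keeps text inside "for words" and exponents such as 1e5 from being mistaken for identifiers.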

In a few years, I could have many millions of such JSON objects. The bismon program is a server, started every morning (it then loads its persistent state in textual format) and ended every evening (it then dumps its persistent state in textual format). So taking one or a few minutes to load, and one or a few minutes to dump, a large persistent state is definitely acceptable. But the git-committed disk size of that textual persistent state is more of a concern (since both gitlab and github are unhappy with large textual files).

Since humans will very rarely look into the textual persistent store (as rarely as a compiler writer is looking into generated assembler, or as rarely as the sqlite team is looking into huge *.sql files), I value compactness and git-friendliness over readability. So I could even consider something as compact as:

 {oid:_7T9OwSFlgov_0wVJaK1eZbn,nam:word,mti:1502296590.98,
  cla:_7T9OwSFlgov_0wVJaK1eZbn,
  att:[{a:_01h86SAfOfg_1q2oMegGRwW,v:"for words"}]}

or even the same in a single line. However, being occasionally able to git diff is valuable.
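As a rough illustration of how such a compact dump could be produced, the sketch below writes keys bare when they look like C identifiers and falls back to quoted keys otherwise. The function name dump_relaxed is hypothetical, dictionary keys are assumed to be strings, and string values stay quoted; emitting bare oids as values, as in the example above, would need a dedicated object-reference type that is not shown here.

```python
import json
import re

_BARE_KEY = re.compile(r'[A-Za-z][A-Za-z0-9_]*')   # starts with a Latin letter
_JSON_KEYWORDS = {"true", "false", "null"}

def dump_relaxed(value) -> str:
    """Serialize dicts, lists and scalars compactly, with bare keys where safe."""
    if isinstance(value, dict):
        items = []
        for key, val in value.items():
            bare = _BARE_KEY.fullmatch(key) is not None and key not in _JSON_KEYWORDS
            items.append((key if bare else json.dumps(key)) + ":" + dump_relaxed(val))
        return "{" + ",".join(items) + "}"
    if isinstance(value, list):
        return "[" + ",".join(dump_relaxed(v) for v in value) + "]"
    return json.dumps(value)   # numbers, strings, true/false/null

print(dump_relaxed({"x": 1, "y": 2}))   # {x:1,y:2}
```

Writing one top-level object per line (for instance by joining the results with "\n") would keep the file compact while still letting git diff work line by line.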

In other words, the JSON model is very nice to me. But its concrete syntax less so. Patching most JSON libraries to such a simplified syntax is very probably trivial work.

This brings three questions:

  • what is the exact name of this common variant of JSON in which keys are only C identifiers? (It seems that the YAML specification still suggests it to be JSON, but it is not exactly JSON, only something very close to it.) While that format is not exactly JSON, it is very JSON-like (and the conversion to exact JSON is trivial, assuming a parsing library exists for it).

  • what are open source C or C++ libraries dealing with that format (for Linux/x86-64)? I am guessing that adapting the source code of JSON parsing libraries to that special case is trivial. But I really want to avoid forking one.

  • can recent Web browsers (Firefox or Chrome) efficiently parse {x:1,y:2} as JSON? I tend to believe that yes (since that notation is exactly compatible with JavaScript).

This GIT and YAML answer could be relevant.

And I just discovered HJSON which might be what I want.


PS. I can avoid any set of given C keywords or identifiers in the keys, if I have such a list of forbidden or reserved names. In particular, I will avoid every JavaScript or C++ keyword (like for or auto or while) for key names. The only platform I care about is Linux (currently x86-64).

PPS. Another application where a human-readable textual file format is essential is my Bismon project (a persistent reflexive monitor for static source code analysis, under GPLv3+ license), and I am explaining why in the Bismon draft report (that is a H2020 draft deliverable, so please skip the first few pages for H2020 bureaucracy). I have chosen in Bismon to have my own human-readable textual format, but that particular choice might have been a big mistake, and I probably should have used some JSON-like one (or even JSON itself), as suggested in this question. The RefPerSys project might become a "Bismon done right" project. And the persistent data of Bismon (e.g. its store2.bmon textual file) is git-version controlled and occasionally hand-edited (but most of the time, loaded and dumped by bismon itself). So, yes, there are cases where textual data cares about a 20% space difference: for both gitlab and github, a textual version-controlled file of 700 Kbytes or of 1.1 Mbytes is presented very differently: in Bismon, its store2.bmon file is already shown only in raw format.

Best Answer

Keys in a JSON dictionary are not quoted strings, they are strings. Strings in JSON start with a quote, continue with escaped or unescaped characters, and end with a quote. You can’t have different JSON. You can define a different exchange format, but it won’t be JSON and you are completely on your own.
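A quick way to see this in practice is to feed both forms to a strict parser. The sketch below uses Python's standard json module purely as a stand-in for any conforming JSON parser; the helper name is_strict_json is made up for the illustration.

```python
import json

def is_strict_json(text: str) -> bool:
    """Return True if a strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_strict_json('{"x":1,"y":2}'))   # True
print(is_strict_json('{x:1,y:2}'))       # False: keys must be quoted strings
```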