Parsing Techniques – Arguments Against the Cthulhu Way

language-agnostic parsing

I have been assigned the task of implementing a Domain Specific Language for a tool that may become quite important for the company. The language is simple but not trivial: it already allows nested loops, string concatenation, etc., and it is practically certain that other constructs will be added as the project advances.

I know from experience that writing a lexer/parser by hand, unless the grammar is trivial, is a time-consuming and error-prone process. So I was left with two options: a parser generator à la yacc or a combinator library like Parsec. The former would have been fine as well, but I picked the latter for various reasons and implemented the solution in a functional language.

The result is pretty spectacular to my eyes: the code is very concise, elegant and readable/fluent. I concede it may look a bit weird if you have never programmed in anything other than Java/C#, but then this would be true of anything not written in Java/C#.

At some point, however, I was literally attacked by a co-worker. After a quick glance at my screen he declared that the code is incomprehensible and that I should not reinvent parsing but just use a stack and String.Split like everybody does. He made a lot of noise, and I could not convince him, partly because I was taken by surprise and had no clear explanation, partly because his opinion was immutable (no pun intended). I even offered to explain the language to him, but to no avail.

I'm positive the discussion is going to re-surface in front of management, so I'm preparing some solid arguments.

These are the first few reasons that come to my mind to avoid a String.Split-based solution:

  • you need a lot of ifs to handle special cases, and things quickly spiral out of control
  • lots of hardcoded array indexes make maintenance painful
  • extremely difficult to handle things like a function call as a method argument (e.g. add( (add a, b), c))
  • very difficult to provide meaningful error messages in case of syntax errors (very likely to happen)
  • I'm all for simplicity, clarity and avoiding unnecessary smart-cryptic stuff, but I also believe it's a mistake to dumb down every part of the codebase so that even a burger flipper can understand it. It's the same argument I hear for not using interfaces, not adopting separation of concerns, copying-pasting code around, etc. A minimum of technical competence and willingness to learn is required to work on a software project after all. (I won't use this argument as it will probably sound offensive, and starting a war is not going to help anybody)
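To make the nested-call point concrete, here is a minimal sketch in Python. The helper `naive_args` is hypothetical and only stands in for the kind of String.Split-based code being proposed; it is not anybody's actual implementation.

```python
def naive_args(call: str) -> list[str]:
    # Hypothetical String.Split-style helper: take the text between the
    # outermost parentheses and split it on every comma.
    inner = call[call.index("(") + 1 : call.rindex(")")]
    return [part.strip() for part in inner.split(",")]


flat = naive_args("add(a, b)")            # ['a', 'b']
nested = naive_args("add(add(a, b), c)")  # ['add(a', 'b)', 'c']
```

The flat call comes out fine, but the nested one is cut in the middle of the inner call, because a plain split has no notion of parenthesis depth. Fixing that means tracking depth by hand, which is the start of the ifs and index juggling listed above.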

What are your favorite arguments against parsing the Cthulhu way?*

*of course if you can convince me he's right I'll be perfectly happy as well

Best Answer

The critical difference between the two approaches is that the one he considers to be the only correct way is imperative, whereas yours is declarative.

  • Your approach explicitly declares rules, i.e. the rules of the grammar are (almost) directly encoded in your code, and the parser library automatically transforms raw input into parsed output, while taking care of state and other things that are hard to handle. Your code is written within one single layer of abstraction, which coincides with the problem domain: parsing. It's reasonable to assume Parsec's correctness, which means the only room for error here is that your grammar definition is wrong. But then again, you have fully qualified rule objects, and they are easily tested in isolation. It is also worth noting that mature parser libraries ship with one important feature: error reporting. Decent error reporting and recovery when parsing goes wrong is not trivial. As proof, I invoke PHP's parse error, unexpected T_PAAMAYIM_NEKUDOTAYIM :D

  • His approach manipulates strings, explicitly maintains state and manually lifts the raw input into parsed output. You have to write everything yourself, including error reporting. And when something goes wrong, you are totally lost.
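As a rough illustration of what "declarative" means here, the following is a minimal hand-rolled combinator sketch in Python. It is not Parsec and not the asker's code: the helpers `token`, `seq`, `alt` and `sep_by`, the toy grammar and the `parse_program` entry point are all assumptions made up for this example.

```python
import re


class ParseError(Exception):
    pass


def token(pattern, what):
    """Parser that skips leading whitespace, then matches a regex."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = rx.match(text, pos)
        if m is None:
            raise ParseError(f"expected {what} at offset {pos}")
        return m.group(1), m.end()
    return parse


def seq(*parsers):
    """Run parsers one after another and collect their results."""
    def parse(text, pos):
        values = []
        for p in parsers:
            value, pos = p(text, pos)
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    """Try each alternative from the same position; re-raise the last failure."""
    def parse(text, pos):
        last_error = None
        for p in parsers:
            try:
                return p(text, pos)
            except ParseError as e:
                last_error = e
        raise last_error
    return parse


def sep_by(item, sep):
    """One or more items separated by sep."""
    def parse(text, pos):
        value, pos = item(text, pos)
        values = [value]
        while True:
            try:
                _, after_sep = sep(text, pos)
            except ParseError:
                return values, pos
            value, pos = item(text, after_sep)
            values.append(value)
    return parse


# The grammar itself (a toy one, assumed for illustration):
#   expr := call | ident
#   call := ident '(' expr (',' expr)* ')'
ident = token(r"[A-Za-z_]\w*", "an identifier")
lparen = token(r"\(", "'('")
rparen = token(r"\)", "')'")
comma = token(r",", "','")


def expr(text, pos):
    return alt(call, ident)(text, pos)


def call(text, pos):
    (name, _, args, _), pos = seq(ident, lparen, sep_by(expr, comma), rparen)(text, pos)
    return (name, args), pos


def parse_program(text):
    value, pos = expr(text, 0)
    if text[pos:].strip():
        raise ParseError(f"unexpected input at offset {pos}")
    return value
```

With these definitions, `parse_program("add(add(a, b), c)")` yields a nested tuple for the inner call instead of cutting it apart, and each rule (`ident`, `call`, `expr`) can be tested in isolation. The error messages in this sketch are crude; a mature library does considerably better, which is part of the argument for using one.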

The irony is that the correctness of a parser written with your approach is relatively easy to prove. In his case, it is almost impossible.

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

C. A. R. Hoare

Your approach is the simpler one. All it requires is for him to broaden his horizon a bit. The result of his approach will always be convoluted, no matter how broad your horizon.
To be honest, it sounds to me as if the guy is just an ignorant fool suffering from the Blub syndrome, arrogant enough to assume you are wrong and to yell at you when he doesn't understand you.

In the end, however, the question is: who is going to have to maintain it? If it's you, then it's your call, no matter what anybody says. If it's going to be him, then there are only two possibilities: find a way to make him understand the parser library, or write an imperative parser for him. I suggest you generate it from your parser structure :D
