Language Design – Handling Two Binary Operators of Same Precedence

language-designparsing

Are there any programming (or scripting) language (or some domain specific language) having two binary operators opl and opr of same precedence with opl being left-associative and opr being right-associative?

^{(I cannot find such an example, but I am trying to code some parser general enough to handle that weird case)}

How would expressions of the form x opl y opr z or x opr y opl z be parsed? And more generally with even more operands?

Best Answer

Here are three languages that let you define your own operators, which do two and a half different things! Haskell and Coq both disallow these sorts of shenanigans – but differently – while Agda allows this sort of mixing of associativities.

First, in Haskell, you simply aren't allowed to do this. You can define your own operators and give them the precedence (from 0–9) and associativity of your choice. However, the Haskell Report disallows you from mixing associativities:

Consecutive unparenthesized operators with the same precedence must both be either left or right associative to avoid a syntax error. [Haskell 2010 Report, Ch. 3]

So, in GHC, if we define a left-associative (infixl) operator <@ and right-associative operator @> at the same precedence level – let's say 0 – then evaluating x <@ y @> z gives the error

Precedence parsing error
cannot mix ‘<@’ [infixl 0] and ‘@>’ [infixr 0] in the same infix expression

(In fact, you can also declare an operator to be infix but non-associative, like ==, so that x == y == z is a syntax error!)

On the other hand, there's the dependently-typed language/theorem prover Agda (which, admittedly, is considerably less mainstream). Agda has some of the most malleable syntax of any language I know, supporting mixfix operators: the standard library contains the function

if_then_else_ : ∀ {a} {A : Set a} → Bool → A → A → A

which, when called, is written

if b then t else f

with the arguments filling in the underscores! I mention this because this means that it has to support incredibly flexible parsing. Naturally, Agda also has fixity declarations (although its precedence levels range over arbitrary natural numbers, and are typically in 0–100), and Agda does permit you to mix operators of the same precedence but different fixities. I can't, however, find information about this in the documentation, so I had to experiment.

Let's reuse our <@ and @> from above. In the two simple cases, we have

x <@ y @> z parsing as x <@ (y @> z); and
x @> y <@ z parsing as (x @> y) <@ z.

I think what Agda does is group the line into "left associative" and "right associative" chunks, and – unless I'm thinking about things wrong – the right-associative chunk gets "priority" in grabbing the adjacent arguments. So that gives us

a <@ b <@ c @> d @> e @> f <@ g

parsing as

(((a <@ b) <@ (c @> (d @> (e @> f)))) <@ g

However, despite my experiments, I guessed wrong the first time I wrote that out, which might be instructive :-)

(And Agda, like Haskell, has non-associative operators, which correctly give parse errors, so it would be possible for mixed associativities to result in a parse error too.)

Finally, there's the theorem-prover/dependently-typed language Coq, which has even more flexible syntax than Agda because its syntax extensions are actually implemented by giving specifications for the new syntactic constructs and then rewriting them into the core language (vaguely macro-like, I suppose). In Coq, the list syntax [1; 2; 3] is an optional import from the standard library. New syntaxes can even bind variables!

Once again, in Coq, we can define our own infix operators and give them precedence levels (from 0–99, mostly) and associativities. However, in Coq, each precedence level can only have one associativity. So if we define <@ as left-associative and then try to define @> as right-associative at the same level – say, 50 – we get

Error: Level 50 is already declared left associative while it is now expected to be right associative

Most operators in Coq are on levels that are divisible by 10; if I've had associativity issues (these level associativities are global), I've generally just bumped the level by one in either direction (usually up).

Best Answer

Related Solutions

Parsing Techniques – Arguments Against the Cthulhu Way

Best Way to Parse a File in PHP – Web Development Guide

Related Topic