Programming Languages – What is Left Recursion in Layman’s Terms

definitionparsingprogramming-languages

According to one page on code.google.com, "left recursion" is defined as follows:

Left recursion just refers to any recursive nonterminal that, when it produces a sentential form containing itself, that new copy of itself appears on the left of the production rule.

Wikipedia offers two different definitions:

  1. In terms of context-free grammar, a non-terminal r is left-recursive if the left-most symbol in any of r’s productions (‘alternatives’) either immediately (direct/immediate left-recursive) or through some other non-terminal definitions (indirect/hidden left-recursive) rewrites to r again.

  2. "A grammar is left-recursive if we can find some non-terminal A which will eventually derive a sentential form with itself as the left-symbol."

I'm just barely starting out with language creation here, and I'm doing it in my spare time. However when it comes down to selecting a language parser, whether left recursion is supported by this parser or that parser is an issue that immediately comes up front and center. Looking up terms like "sentential form" only leads to further lists of jargon, but the distinction of "left" recursion almost has to be something very simple. Translation please?

Best Answer

A rule R is left-recursive if, in order to find out whether R matches, you first have to find out whether R matches. This happens when R appears, directly or indirectly, as the first term in some production of itself.

Imagine a toy version of the grammar for mathematical expressions, with only addition and multiplication to avoid distraction:

Expression ::= Multiplication '+' Expression
            || Multiplication

Multiplication ::= Term '*' Term
                 || Term

Term ::= Number | Variable

As written, there's no left-recursion here — we could pass this grammar to a recursive-descent parser.

But suppose you tried to write it this way:

Expression ::= Expression '*' Expression
            || Expression '+' Expression
            || Term

Term ::= Number | Variable

This is a grammar, and some parsers can cope with it, but recursive descent parsers and LL parsers can't — because the rule for Expression begins with Expression itself. It should be obvious why in a recursive-descent parser this leads to unbounded recursion without actually consuming any input.

It doesn't matter whether the rule refers to itself directly or indirectly; if A has an alternative that starts with B, and B has an alternative that starts with A, then A and B are both indirectly left-recursive, and in a recursive-descent parser their matching functions would lead to endless mutual recursion.