What does “context-free” mean in the term “context-free grammar”?

compiler, grammar, programming-languages

Given the amount of material that tries to explain what a context-free grammar (CFG) is, I found it surprising that very few sources (in my sample, fewer than 1 in 20) explain why such grammars are called "context-free". And, to my mind, none succeeds in doing so.

My question is, why are context-free grammars called context-free? What is "the context"? I had an intuition that the context could be other language constructs surrounding the currently analyzed construct, but that seems not to be the case. Could anyone provide a precise explanation?

Best Answer

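It means all of its production rules have a single non-terminal, and nothing else, on their left-hand side. A non-terminal can therefore be rewritten the same way no matter which symbols surround it; those surrounding symbols are the "context".

For example, this grammar, which generates strings of matched parentheses ("()", "()()", "(())()", ...), is context-free: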

S → SS
S → (S)
S → ()

The left-hand side of every rule consists of a single non-terminal (in this case it's always S, but there could be more.)

Now consider this other grammar, which generates the language {a^n b^n c^n : n >= 1} (e.g. "abc", "aabbcc", "aaabbbccc") and is not context-free:

S  → abc
S  → aSBc
cB → WB
WB → WX
WX → BX
BX → Bc
bB → bb

Here several left-hand sides contain more than one symbol. If the non-terminal B is preceded by the terminal c, the pair cB is rewritten to WB, but if it's preceded by b, the pair bB is rewritten to bb instead. What happens to B depends on its neighbours, and this is presumably what the context-sensitivity of context-sensitive grammars is alluding to.
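
To make the role of context concrete, here is a minimal Python sketch that treats the seven rules above as plain string rewrites and searches for a derivation. The names `RULES` and `derives` are made up for this illustration. The search relies on the fact that none of these rules shortens the string, so any intermediate form longer than the target can be discarded.

```python
from collections import deque

# The seven productions from the grammar above, as (left-hand side, right-hand side).
# Lowercase letters are terminals; S, B, W and X are non-terminals.
RULES = [
    ("S", "abc"),
    ("S", "aSBc"),
    ("cB", "WB"),
    ("WB", "WX"),
    ("WX", "BX"),
    ("BX", "Bc"),
    ("bB", "bb"),
]


def derives(target, start="S"):
    """Return True if `target` can be derived from `start` using RULES.

    Breadth-first search over sentential forms. No rule shrinks the string,
    so forms longer than the target are pruned and the search terminates.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            # Try the rule at every position where its left-hand side occurs.
            pos = form.find(lhs)
            while pos != -1:
                new_form = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new_form) <= len(target) and new_form not in seen:
                    seen.add(new_form)
                    queue.append(new_form)
                pos = form.find(lhs, pos + 1)
    return False
```

With this sketch, `derives("aabbcc")` finds a derivation (S, aSBc, aabcBc, aabWBc, aabWXc, aabBXc, aabBcc, aabbcc), while `derives("aabcc")` does not. Note how the rule for B only fires when the symbol to its left is the right one.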

A context-free language can be recognized by a push-down automaton. Whereas a finite state machine makes use of no auxiliary storage, i.e. its decision is based only on its current state and input, a push-down automaton also has a stack at its disposal and can peek at the top of the stack when making decisions.

To see that in action, you can parse nested parentheses by moving left to right and pushing a left parenthesis onto a stack each time you encounter one, and popping each time you encounter a right parenthesis. If you never end up trying to pop from an empty stack, and the stack's empty at the end of the string, the string is valid.
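
The stack procedure just described can be sketched in a few lines of Python. The function name `is_balanced` is an illustrative choice, and rejecting any character other than a parenthesis is an assumption of this sketch.

```python
def is_balanced(text):
    """Check nested parentheses with an explicit stack, scanning left to right."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)       # push on every left parenthesis
        elif ch == ")":
            if not stack:          # trying to pop from an empty stack: invalid
                return False
            stack.pop()            # pop on every right parenthesis
        else:
            return False           # assumption: only parentheses are allowed
    return not stack               # valid only if the stack is empty at the end
```

One difference from the grammar above: this check also accepts the empty string, which the rules S → SS, S → (S), S → () never generate.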

For a context-sensitive language, a PDA isn't enough. You need a linear-bounded automaton (LBA), which is like a Turing machine whose tape isn't unlimited (the amount of tape available is proportional to the length of the input). Note that this describes real computers pretty well: we like to think of them as Turing machines, but in the real world you can't grab arbitrarily more RAM mid-program. If it's not obvious to you how an LBA is more powerful than a PDA: an LBA can emulate a PDA by using part of its tape as a stack, but it can also choose to use its tape in other ways.

(If you're wondering what a finite state machine can recognize, the answer is regular languages, the ones described by regular expressions. But not the regexes on steroids with capture groups and look-behind/look-ahead you see in programming languages; I mean the ones you can build with operators like [abc], |, *, +, and ?. You can see that abbbz matches the regex ab*z just by keeping your current position in the string and regex, no stack required.)
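
As a rough illustration of "no stack required", here is a small Python sketch of a finite state machine for the regex ab*z. The state names and the `TRANSITIONS` table are invented for this example; the only memory the matcher keeps is its current state.

```python
# Transition table for ab*z: (current state, input character) -> next state.
# Any pair missing from the table means the input is rejected.
TRANSITIONS = {
    ("start", "a"): "seen_a",   # the leading 'a'
    ("seen_a", "b"): "seen_a",  # zero or more 'b's loop on the same state
    ("seen_a", "z"): "done",    # the closing 'z'
}


def matches_ab_star_z(text):
    """Run the state machine over `text`; accept only if it ends in 'done'."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state == "done"
```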

Related Solutions

The difference between syntax and grammar

A grammar is a set of rules that define the syntax for a particular language.

When people are talking specifically about a parser (especially one generated with a parser generator like yacc, Byacc, ANTLR, etc.), they may do a bit more hair-splitting, and talk specifically about those syntactical rules that are encoded using the generator's rules, vs. those parts that are enforced separately by code attached to a rule. For example, in C when you define an array, the size you specify for the array must be strictly positive (not zero). The grammar rule might basically say something like:

typename var_name '[' unsigned_int ']'

...and then separately, there would be a bit of code to check that the unsigned_int was non-zero. In this case, it could make some sense to talk about the requirements of the syntax and the grammar separately from each other, with the two having slightly different requirements (that, enforced together, we presume fit the requirements of the language itself).
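
A minimal Python sketch of that split follows. It is not how yacc or ANTLR work internally; the regular expression `DECLARATION` merely stands in for the grammar rule above, and `check_declaration` is a hypothetical helper that plays the role of the code attached to the rule.

```python
import re

# Stand-in for the rule: typename var_name '[' unsigned_int ']'
DECLARATION = re.compile(
    r"\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*\[\s*(\d+)\s*\]\s*"
)


def check_declaration(text):
    """Return 'syntax error', 'semantic error' or 'ok' for an array declaration."""
    match = DECLARATION.fullmatch(text)
    if match is None:
        # Rejected by the rule itself, e.g. a missing or negative size.
        return "syntax error"
    _type_name, _var_name, size = match.groups()
    if int(size) == 0:
        # Accepted by the rule, rejected by the separate check.
        return "semantic error"
    return "ok"
```

Here `int a[0]` passes the rule but fails the separate check, while `int a[]` never gets past the rule.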

Java Parsing – Better Design Alternatives for Special Cases

The predictive parser you selected (LL(k)) means you will have to solve left-recursion problems. The algorithm for removing direct and indirect left recursion is clearly described on Wikipedia:

http://en.wikipedia.org/wiki/Left_recursion

Some info can be found in posts here on Stack Overflow:

https://stackoverflow.com/questions/2652060/removing-left-recursion
https://stackoverflow.com/questions/2999755/removing-left-recursion-in-antlr
https://stackoverflow.com/questions/4994036/left-recursion-elimination

In plain (non-scientific :) language, the "left-recursion problem" means the parser can't keep expanding a non-terminal into itself (A -> Ab) again and again without consuming any input. At some point you HAVE TO feed the parser algorithm a terminal to break the loop.

In BNF this could look like:

Recursion Problem:

NT: NT T
NT: T

One solution:

NT: T NT2
NT2: T NT2
NT2:
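
The rewrite shown above can be sketched as a small Python function. This is a simplified version that handles direct left recursion only, not the full algorithm from the Wikipedia article. The function name, the list-of-lists representation of alternatives, and the "2" suffix for the new non-terminal are assumptions made for the example.

```python
def eliminate_left_recursion(name, alternatives):
    """Remove direct left recursion from one non-terminal.

    `alternatives` is a list of productions, each a list of symbol names;
    an empty list stands for the empty production.
    """
    # Split A: A x | y into the tails x and the non-recursive alternatives y.
    recursive_tails = [alt[1:] for alt in alternatives if alt and alt[0] == name]
    others = [alt for alt in alternatives if not alt or alt[0] != name]
    if not recursive_tails:
        return {name: alternatives}   # nothing to do
    tail_name = name + "2"            # hypothetical name for the helper rule
    return {
        # A:  y A2
        name: [alt + [tail_name] for alt in others],
        # A2: x A2 | <empty>
        tail_name: [tail + [tail_name] for tail in recursive_tails] + [[]],
    }
```

Calling `eliminate_left_recursion("NT", [["NT", "T"], ["T"]])` yields the rules `NT: T NT2` and `NT2: T NT2 | <empty>`, matching the solution above.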

For your grammar this could look like:

DataType:
    PrimitiveDataType ArrayDimensions
 |  ComplexDataType ArrayDimensions

ArrayDimensions:
    [] ArrayDimensions
 |
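
Because this version is right-recursive, a predictive parser can handle it with one function per non-terminal. Below is a rough recursive-descent sketch in Python (chosen for brevity; the same shape carries over to Java). It assumes the input is already tokenized with "[]" as a single token, and `PRIMITIVE_TYPES` is a hypothetical set, since the grammar does not list the primitive type names.

```python
# Hypothetical primitive type names; the grammar above does not enumerate them.
PRIMITIVE_TYPES = {"int", "float", "boolean"}


def parse_array_dimensions(tokens, pos):
    """ArrayDimensions: [] ArrayDimensions | <empty>

    Returns (number of dimensions, position after the last one).
    """
    if pos < len(tokens) and tokens[pos] == "[]":
        count, pos = parse_array_dimensions(tokens, pos + 1)
        return count + 1, pos
    return 0, pos   # empty production: consume nothing


def parse_data_type(tokens):
    """DataType: PrimitiveDataType ArrayDimensions | ComplexDataType ArrayDimensions"""
    if not tokens or tokens[0] == "[]":
        raise ValueError("expected a type name")
    name = tokens[0]
    kind = "primitive" if name in PRIMITIVE_TYPES else "complex"
    dimensions, pos = parse_array_dimensions(tokens, 1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return kind, name, dimensions
```

Each call to `parse_array_dimensions` consumes a "[]" token before recursing, so the recursion always makes progress.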

If your parser generator doesn't allow empty productions and/or if you want to process array types separately, try something like this:

DataType:
    DataTypeName
 |  ArrayDataType

ArrayDataType:
    DataTypeName ArrayDimensions

DataTypeName:
    PrimitiveDataType
 |  ComplexDataType

ArrayDimensions:
    []
 |  [] ArrayDimensions
