Backus-Naur Form – Expressing Repetition with Recursive Production Definitions

designgrammarlanguage-agnostic

When reading grammars defined using Backus–Naur form (BNF), I've noticed that the grammars never seem to use an explicit repetition symbol, such as * or +. This is opposed to Extend Backus–Naur form however, which does seem to have explicit repetition symbols. The Wikipeida article on EBNF mentions this, specifically under the section Conventions:

  1. The normal character representing each operator of Extended BNF and its implied precedence is (highest precedence at the top):'

    * repetition-symbol
    - except-symbol
    , concatenate-symbol
    | definition-separator-symbol
    = defining-symbol
    ; terminator-symbol
    . terminator-symbol
    

Although I have studied BNF before, I've never truly understood how BNF expresses much of the functionality easily captured in EBNF, particularly repetition.

After observing BNF grammars more closely, I believe that repetition is captured by using recursive definitions in productions. For example:

<number> ::= <digit> | <number> <digit>;
<digit> ::= 1 | 2 ... | 8 | 9;

In the above grammar, a number is defined as "a single digit or a number followed by a digit", or in other words "one or more digits". And while the above works fine for simple repetition, it seems to become more challenging to express more complex repetition patterns. For example, take the file_input production definition from the Python (3.6) grammar:

file_input: (NEWLINE | stmt)* ENDMARKER

How exactly would the above be translated to "classic" BNF (disregarding the syntax differences) ? I'm unsure of how the above would be translated, while correctly preserving the meaning expressed.

Do BNF grammars use recursive production definitions to express repetition? If so, how does the method deal with examples like the one above?

Best Answer

EBNF and BNF are equivalent; any EBNF grammar can be converted to a BNF grammar.

For your example:

file_input: (NEWLINE | stmt)* ENDMARKER

file_input: <statements> ENDMARKER
<statements>: (<statements> (NEWLINE | stmt)) |
              empty

There are some stack overflow questions about it, but I found this helpful, especially the summary at the end.