Clarification about Grammars , Lexers and Parsers

grammarlanguage-designlexerparsing

Background info (May Skip): I am working on a task we have been set at uni in which we have to design a grammar for a DSL we have been provided with. The grammar must be in BNF or EBNF. As well as other thing we are being evaluated on the Lexical rules in the grammar and the Parsing rules – such as if rules are suitable for the language subset, how comprehensive these rules are, how clear the rules are ect.

What I don't understand is if these rules are covered in a grammar defined in BNF (it's a new topic for us).

The Question: Does a grammar for a given language that has been defined in either BNF or EBNF contain / provide rules for Lexical Analysis and/or Parsing? (or do these have to be specified else-where?)

Also what would be considered a lexical rule? And what would be considered a parsing rule?

Best Answer

Yes, a BNF grammar contains all the rules you need for lexical analysis and parsing. The difference between the two is a little fuzzy. A good example of a lexical rule in EBNF would be:

number = [ "-" ], digit, { digit } ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

Usually lexers can be implemented using relatively simple code. You can search a string for the next space, then see if your result starts with an optional "-", contains at least one digit after that, and contains only digits after that. Lexers used to almost always be a separate step, but are usually lumped together with the parser nowadays. Hence the fuzziness.

A parser rule would use the number non-terminal to make something larger, like the following addition expression.

add = number, "+", number

Even though they are mixed up in the same file, your professor is still going to want to see a clear distinction between "lexer" rules and "parser" rules. For example, don't do this:

add = {"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" }, "+",
      {"0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" }

Not only is that error prone, it's difficult to read and difficult to implement.

Related Topic