I don't think you'll find a simple solution to handling these types of parsing errors in the lexer.
I would keep the lexer (flex/lex) as dumb as possible: it should just provide a stream of basic tokens (identifiers, keywords, etc.) and leave the error detection to the parser (yacc/bison). In fact, the parser is set up for exactly what you want, with a little restructuring of your approach.
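In the lexer (parser.l), keep it simple (no special end-of-line handling), something like this (not the complete file):
%}
/* The backslash is required before " (which starts a quoted literal in flex) and optional before '. */
/* [^...]* stops at the first closing quote; .* would run on to the last quote on the line. */
SINGLE_QUOTE_STRING \'[^\'\n]*\'
DOUBLE_QUOTE_STRING \"[^\"\n]*\"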
%%
{SINGLE_QUOTE_STRING} {
yylval.charstr = copy_to_tmp_buffer(yytext); // implies a %union
return STRING;
}
{DOUBLE_QUOTE_STRING} {
yylval.charstr = copy_to_tmp_buffer(yytext); // implies a %union
return STRING;
}
\n return NEWLINE;
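The string patterns above are worth a second look, because a pattern of the form '.*' is greedy: on a line holding two quoted strings it matches from the first quote to the last one. The sketch below illustrates that with the POSIX regex API rather than flex itself (both prefer the longest match at a given position); match_length is a hypothetical helper written only for this illustration.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: length of the leftmost match of `pattern` in `text`,
   or -1 if the pattern does not compile or does not match. */
static int match_length(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m;
    int len = -1;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, text, 1, &m, 0) == 0)
        len = (int)(m.rm_eo - m.rm_so);
    regfree(&re);
    return len;
}
```

For the input 'a' 'b', match_length("'.*'", ...) covers the whole line, while match_length("'[^']*'", ...) covers only the first quoted string. Neither pattern matches an unterminated string such as 'a, which is the case the parser is meant to report.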
Then do all the real handling in your parser.y file (again, not the complete file):
command:
error NEWLINE
{ yyclearin; yyerrok; print_the_next_command_prompt(); }
| chdir_command STRING NEWLINE
{ do_the_chdir($<charstr>2); print_the_next_command_prompt(); }
| ... and so on ...
There are two things to note here:
- Things like NEWLINE move to the yacc side, so the parser can tell when the user has finished a command, clear things out, and start over (assuming you have "int yywrap() {return 1;}" somewhere). If you try to detect errors too early in flex, how do you know when to raise one?
- chdir is no longer a single command (unless it had a sub-rule that you just didn't show); it is now chdir_command followed by STRING (the argument to chdir). This lets the parser figure out what went wrong, and you can then call yyerror if that directory doesn't exist, etc.
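The grammar actions call copy_to_tmp_buffer and do_the_chdir without showing them. A minimal sketch of what they might look like follows; both bodies are assumptions made for illustration (the original does not define them), and perror stands in for yyerror so that the snippet compiles without the generated parser.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy a quoted token such as 'dir' or "dir" without
   its surrounding quotes. The caller owns the result and must free() it. */
static char *copy_to_tmp_buffer(const char *token)
{
    size_t len = strlen(token);
    char *copy;

    if (len >= 2 && (token[0] == '\'' || token[0] == '"')
        && token[len - 1] == token[0]) {
        token++;    /* skip the opening quote */
        len -= 2;   /* drop both quotes from the length */
    }
    copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, token, len);
    copy[len] = '\0';
    return copy;
}

/* Hypothetical helper: change directory and report a failure.
   Returns 0 on success and -1 if chdir() fails. */
static int do_the_chdir(const char *dir)
{
    if (chdir(dir) != 0) {
        perror(dir);    /* a real action might call yyerror here instead */
        return -1;
    }
    return 0;
}
```

Stripping the quotes in the helper keeps the lexer rule itself trivial, and the semantic check (does the directory exist?) stays in the action where the parser already knows the command was well formed.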
This way you should get something like (guessing what chdir might look like):
cd 'some_directory
syntax error
cd 'some_directory'
you are in the some_directory dude!
And it is all handled by the yacc grammar, not by the tokenizer.
I have found that keeping flex as simple as possible gives you the most ***flex***ibility. :)
I found a solution after tinkering a bit. The problems arise from a circular dependency between flex and bison.
The generated parser calls the flex routine like this:
yychar = yylex (&yylval, scanner);
So the bison input must include the scanner header file lex.yy.h, which declares the routine as:
int yylex (YYSTYPE * yylval_param ,yyscan_t yyscanner);
But YYSTYPE is defined inside the parser header parser.tab.h; in my case I told bison that my value type is double:
typedef double YYSTYPE;
Now the solution. Inside scanner.l you must include the parser header so that flex can return the correct tokens (nothing changes there). But inside parser.y you must include both header files; if you include only lex.yy.h, the compiler complains:
lex.yy.h:282:1: error: unknown type name ‘YYSTYPE’
because YYSTYPE is defined inside parser.tab.h. Finally, for some reason, the bison-generated parser doesn't know what yyscan_t is, even when the lexer header is included:
error: unknown type name ‘yyscan_t’
One workaround is to declare the parameter as void * instead:
%lex-param {void *scanner}
%parse-param {void *scanner}
See the definition of yyscan_t in the flex documentation.
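The workaround compiles because flex defines yyscan_t as a plain typedef for void *, so the two spellings name the same type. The sketch below shows that in isolation. The guard macro YY_TYPEDEF_YY_SCANNER_T is the one used in the flex-generated headers I have seen, so treat it as an assumption and check it against your own lex.yy.h; takes_void_pointer and scanner_handle_demo are hypothetical names used only here.

```c
#include <stddef.h>

/* Guarded typedef in the style of a flex-generated reentrant header.
   Assumption: verify the macro name against your own lex.yy.h. */
#ifndef YY_TYPEDEF_YY_SCANNER_T
#define YY_TYPEDEF_YY_SCANNER_T
typedef void *yyscan_t;
#endif

/* Hypothetical stand-in for a function declared with the void * workaround,
   in the way yyparse is when %parse-param is {void *scanner}. */
static int takes_void_pointer(void *scanner)
{
    return scanner != NULL;
}

/* A yyscan_t value passes straight through a void * parameter; no cast needed. */
static int scanner_handle_demo(void)
{
    int dummy_state = 0;                /* stands in for the scanner's state */
    yyscan_t scanner = &dummy_state;
    return takes_void_pointer(scanner);
}
```

If the guard is present in your header, declaring the typedef yourself before parser.tab.h is included is another possible way around the unknown-type error, since the later flex header would then skip its own copy.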
So here is the final result:
scanner.l
%{
#include <stdio.h>
#include "parser.tab.h"
%}
%option 8bit reentrant bison-bridge
%option warn noyywrap nodefault
%option header-file="lex.yy.h"
//rest of the scanner
parser.y
%{
#include <stdio.h>
#include "parser.tab.h"
#include "lex.yy.h"
void yyerror(yyscan_t scanner, char const *msg);
%}
%define api.value.type {double}
%define parse.error verbose
%define api.pure
%lex-param {void *scanner}
%parse-param {void *scanner}
//rest of the input
main.c
#include <stdio.h>
#include "parser.tab.h"
#include "lex.yy.h"
int main(void) {
yyscan_t scanner;
yylex_init(&scanner);
yyset_in(stdin, scanner);
yyparse(scanner);
yylex_destroy(scanner);
return 0;
}
Best Answer
There are some differences between Lex and Flex, but you have to be abusing Lex to run into the problems with Flex. (I have a program which abuses Lex in this way and therefore doesn't work under Flex.) The differences are primarily in the area of input lookahead: in Lex, you can provide your own input code and modify the character stream; Flex won't let you do that.
Yacc and Bison are pretty closely compatible, though Bison has some extra tricks it can do.
You probably can't find legitimate copies of (the original, AT&T versions of) Lex and Yacc to install on Ubuntu. I wouldn't necessarily say it is impossible, but I'm not aware of such. Flex and Bison are readily available and are equivalent for most purposes. You may also find various alternative and approximately equivalent programs from the BSD world.
Lex and Yacc are maintained by the Unix SVRx licensees: companies such as IBM (AIX), HP (HP-UX) and Sun (Solaris) have modified versions of Lex and Yacc at their command. MKS also provides MKS Lex and MKS Yacc; however, the Yacc at least has some non-standard extensions.
Flex and Bison are free. (AT&T) Lex and Yacc are not.