How to use indentation as block delimiters with bison and flex

Tags: bison, compiler-construction, flex-lexer

I wonder how to implement indentation as block delimiters in bison + flex, just like in Python. I'm writing my own programming language (mostly for fun, but I intend to use it together with a game engine). I'll try to come up with something special that minimizes boilerplate and maximizes dev speed.

I have already written a compiler (actually a `langToy` to NASM translator) in C, but it failed. For some reason it was only able to handle one string in the whole source file (well, I had been awake for more than 48 hours, so... you know, brain meltdown).

I don't know if curly brackets and/or begin -> end are easier to implement (I don't have a problem doing that) or if it's just my brain that locks up.

Thanks in advance!


Update: Okay, I have no clue how to do it with flex. I have problems returning multiple DEDENTs to the parser. Flex/Bison are relatively new to me.


Update 2:
This is the flex file I've come up with so far; it does not quite work:

%x t
%option noyywrap

%{
  int lineno = 0, ntab = 0, ltab = 0, dedent = 0;
%}

%%

<*>\n  { ntab = 0; BEGIN(t); }
<t>\t  { ++ntab; }
<t>.   { int i; /* my compiler complains not c99 if i use for( int i=0... */
         if( ntab > ltab )
           printf("> indent >\n");
         else if( ntab < ltab )
           for( i = 0; i < ltab - ntab; i++ )
             printf("< dedent <\n");
         else
           printf("=        =\n");

         ltab = ntab; ntab = 0;
         BEGIN(INITIAL);
         /* move to next rule */
         REJECT;}
.    /* ignore everything else for now */

%%

main()
{
  yyin = fopen( "test", "r" );
  yylex();
}

You can try playing around with it; maybe you can see what I'm missing. Returning multiple dedents would be easy in Haxe ( return t_dedent( num ); ).

This code doesn't always match the indents/dedents correctly.


Update 3: I think I will give up hope on flex and do it my own way. If anyone knows how to do it in flex, I would be happy to hear it anyway.

Best Answer

What you need to do is have flex count the amount of whitespace at the beginning of every line and insert an appropriate number of INDENT/UNINDENT tokens for the parser to use to group things. One question is what you want to do about tabs vs. spaces: do you just want them to be equivalent with fixed tab stops, or do you want to require indentation to be consistent (so if one line begins with a tab and the next with a space, you signal an error)? The second option is probably a little harder.
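The two policies can be sketched in plain C, outside of flex. This is only an illustration: `TAB_WIDTH`, `indent_width` and `indent_consistent` are hypothetical names, the tab width of 8 is an assumption, and the consistency check shown (one line's leading whitespace must be a prefix of the other's) is just one possible reading of "consistent".

```c
/* Assumed tab width; tabs advance to the next multiple of this. */
enum { TAB_WIDTH = 8 };

/* Policy 1: tabs and spaces are equivalent, with fixed tab stops.
   Returns the column at which the first non-whitespace character sits. */
int indent_width(const char *line)
{
    int col = 0;
    for (; *line == ' ' || *line == '\t'; line++) {
        if (*line == '\t')
            col = (col / TAB_WIDTH + 1) * TAB_WIDTH;  /* jump to next tab stop */
        else
            col++;
    }
    return col;
}

/* Policy 2: require consistent indentation between two lines.
   Returns 1 if the leading whitespace of one line is a prefix of the
   other's, 0 if they disagree (e.g. tab on one line, space on the next). */
int indent_consistent(const char *prev, const char *cur)
{
    while ((*prev == ' ' || *prev == '\t') && (*cur == ' ' || *cur == '\t')) {
        if (*prev != *cur)
            return 0;
        prev++;
        cur++;
    }
    return 1;
}
```

With the second policy the lexer has to remember the previous line's leading whitespace, not just its width, which is part of why it is more work.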

Assuming you want fixed 8-column tabstops, you can use something like

%{
/* globals to track current indentation */
int current_line_indent = 0;   /* indentation (in columns) of the current line */
int indent_level = 0;          /* indentation level passed to the parser */
%}

/* indent: start state for parsing the indentation */
%x indent
/* normal: start state for everything else */
%s normal

%%
<indent>" "      { current_line_indent++; }
<indent>"\t"     { current_line_indent = (current_line_indent + 8) & ~7; /* next 8-column tab stop */ }
<indent>"\n"     { current_line_indent = 0; /* ignoring blank line */ }
<indent>.        {
                   /* first non-whitespace character: push it back, then
                      emit one token per call until the levels match */
                   unput(*yytext);
                   /* NOTE: indent_level changes by 1 per token while
                      current_line_indent counts columns, so this emits
                      one INDENT/UNINDENT per column of indentation */
                   if (current_line_indent > indent_level) {
                       indent_level++;
                       return INDENT;
                   } else if (current_line_indent < indent_level) {
                       indent_level--;
                       return UNINDENT;
                   } else {
                       BEGIN normal;
                   }
                 }

<normal>"\n"     { current_line_indent = 0; BEGIN indent; }
... other flex rules ...

You do have to make sure you start the parse in indent mode (to get the indentation on the first line).
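The rules above move indent_level by one per token while current_line_indent counts columns. If you want one INDENT per block regardless of how many columns it is indented by, and several UNINDENTs when a line closes several blocks at once, one common approach (used by Python's tokenizer) is to keep a stack of indentation columns. The following is a minimal sketch in plain C, not part of the code above: `struct indent_state`, `indent_feed`, `indent_next_pending`, the `TOK_*` values and `MAX_DEPTH` are all hypothetical names, and `TOK_DEDENT` plays the role of UNINDENT.

```c
enum { TOK_NONE = 0, TOK_INDENT, TOK_DEDENT, TOK_ERROR };
enum { MAX_DEPTH = 64 };            /* assumed nesting limit */

struct indent_state {
    int stack[MAX_DEPTH];           /* columns of the open blocks; stack[0] is 0 */
    int depth;                      /* index of the innermost open block */
    int pending_dedents;            /* DEDENT tokens still owed to the parser */
};

void indent_init(struct indent_state *s)
{
    s->stack[0] = 0;
    s->depth = 0;
    s->pending_dedents = 0;
}

/* Hands out one owed DEDENT per call, or TOK_NONE when none remain. */
int indent_next_pending(struct indent_state *s)
{
    if (s->pending_dedents > 0) {
        s->pending_dedents--;
        return TOK_DEDENT;
    }
    return TOK_NONE;
}

/* Call once per non-blank line with that line's indentation column.
   Returns the first token to emit; after a TOK_DEDENT, keep calling
   indent_next_pending() until it returns TOK_NONE. */
int indent_feed(struct indent_state *s, int col)
{
    if (col > s->stack[s->depth]) {
        if (s->depth + 1 >= MAX_DEPTH)
            return TOK_ERROR;       /* nested too deeply */
        s->stack[++s->depth] = col; /* open exactly one new block */
        return TOK_INDENT;
    }
    while (col < s->stack[s->depth]) {
        s->depth--;                 /* close one block per stack entry */
        s->pending_dedents++;
    }
    if (col != s->stack[s->depth])
        return TOK_ERROR;           /* dedent to a column that was never opened */
    return indent_next_pending(s);
}
```

In a flex scanner, the `<indent>.` action would call `indent_feed` once per line and, while DEDENTs are pending, stay in the indent state and return one per call to yylex. Calling `indent_feed(&s, 0)` at end of input closes any blocks that are still open.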
