Php – How to write a command interpreter/parser

algorithmscommand lineparsingperlPHP

Problem: Run commands in the form of a string.

  • command example:

    /user/files/ list all;
    equivalent to:
    /user/files/ ls -la;

  • another one:

    post tw fb "HOW DO YOU STOP THE TICKLE MONSTER?;"

equivalent to:
post -tf "HOW DO YOU STOP THE TICKLE MONSTER?;"

Current solution:

tokenize string(string, array);

switch(first item in array) {
    case "command":
        if ( argument1 > stuff) {
           // do the actual work;
        }
}

The problems I see in this solution are:

  • No error checking other than nested ifs-else inside each case. The
    script becomes very big and hard to mantain.
  • Commands and responses are hardcoded.
  • No way of knowing if flags are correct or missing parameters.
  • Lack of intelligence to suggest "you might wanted to run $command".

And the last thing I can't address is synonyms in different encodings, example:

case command:
case command_in_hebrew:
    do stuff;
break;

The last one might be trivial, but well, what I want to see is the solid fundations of this kind of program.

I'm currently programming this in PHP but might do it in PERL.

Best Answer

Let me admit frankly, building parser is a tedious job and comes close to compiler technology but building one would turn out to be a good adventure. And a parser comes with interpreter. So you got to build both.

A quick introduction to parser and interpreters

This is not too technical. So experts don't fret at me.

When you feed some input into a terminal, the terminal splits the input into multiple units. The input is called expression and the multiple units are called tokens. These tokens can be operators or symbols. So if you enter 4+5 in a calculator, this expression gets split into three tokens 4,+,5. The plus is considered an operator while 4 and 5 symbols. This is passed to a program (consider this as an interpreter) which contains the definition for the operators. Based on the definition (in our case, add), it adds the two symbols and returns the result to the terminal. All compilers are based on this technology. The program that splits an expression into multiple tokens is called a lexer and the program that converts these tokens into tags for further processing and execution is called parser.

Lex and Yacc are the canonical forms for building lexers and parsers based on BNF grammar under C and it is the recommended option. Most parsers are a clone of Lex and Yacc.

Steps in building a parser/intrepreter

  1. Classify your tokens into symbols, operators and keywords (keywords are operators)
  2. Build your grammar using the BNF form
  3. Write parser functions for your operations
  4. Compile it an run as a program

So in the above case your of addition tokens would be any digits and a plus sign with definition of what to do with the plus sign in the lexer

Notes and Tips

  • Choose a parser technique that evaluates from left to right LALR
  • Read this dragon book on Compilers to get a feel of it. I personally haven't finished the book
  • This link would give a super-fast insight into Lex and Yacc under Python

A simple approach

If you just need a simple parsing mechanism with limited functions, turn your requirement into a Regular Expression and just create a whole bunch of functions. To illustrate, assume a simple parser for the four arithmetic functions. So you would be the calling the operator first and then the list of functions (similar to lisp) in the style (+ 4 5) or (add [4,5]) then you could use a simple RegExp to get the list of operators and the symbols to be operated upon.

Most common cases could be easily solved by this approach. The downside is you can't have a lot of nested expressions with a clear syntax and you can't have easy higher order functions.