Python – Using Python Yacc\Lex as a formula parser

pythonyacc

At the moment i'm working on using the python implementation of Yacc/Lex to build a formula parser for converting strings of formulae into a set of class defined operands. So far i've been mostly successful but i've come to an empasse in defining the parsing rules due to ambiguity with parentheses and several shift/reduce errors.

The Backus Naur Form for the formulae ive been working on is

phi ::= p ; !p ; phi_0 & phi_1 ; phi_0 | phi_1 ; AX phi ; AF phi ; AG phi ; AU phi_0 U phi_1.

Also i've been trying to allow arbitrary matched parentheses but this is also where a lot of the confusion is coming from and i'm thinking where the shift reduce errors are coming from. Its fairly necessary for the task i'm applying it to that parentheses are there to force specific evaluations on formulas, so I have to work that out.

Currently my parser is defined inside a class which builds its lexical analyser with

            tokens = (
            'NEGATION',
            'FUTURE',
            'GLOBAL',
            'NEXT',
            'CONJUNCTION',
            'DISJUNCTION',
            'EQUIVALENCE',
            'IMPLICATION',
            'PROPOSITION',             
            'LPAREN',
            'RPAREN',
            'TRUE',
            'FALSE',
            )

        # regex in order of parsing precedence
        t_NEGATION    = r'[\s]*\![\s]*'
        t_FUTURE      = r'[\s]*AF[\s]*'
        t_GLOBAL      = r'[\s]*AG[\s]*'
        t_NEXT        = r'[\s]*AX[\s]*'
        t_CONJUNCTION = r'[\s]*\&[\s]*'
        t_DISJUNCTION = r'[\s]*\|[\s]*'
        t_EQUIVALENCE = r'[\s]*\<\-\>[\s]*'
        t_IMPLICATION = r'[\s]*[^<]\-\>[\s]*'
        t_LPAREN      = r'[\s]*\([\s]*'
        t_RPAREN      = r'[\s]*\)[\s]*'
        t_PROPOSITION = r'[\s]*[a-z]+[-\w\._]*[\s]*'
        t_TRUE        = r'[\s]*TRUE[\s]*'
        t_FALSE       = r'[\s]*FALSE[\s]*'

        precedence = (
        ('left', 'ASSIGNMENT'),
        ('left', 'NEGATION'),
        ('left', 'GLOBAL','NEXT','FUTURE'),
        ('left', 'CONJUNCTION'), 
        ('left', 'DISJUNCTION'), 
        ('left', 'EQUIVALENCE'),
        ('left', 'IMPLICATION'),   
        ('left', 'AUB', 'AUM'),
        ('left', 'LPAREN', 'RPAREN', 'TRUE', 'FALSE'),
        )            

        lexer = lex.lex()
        lexer.input(formula)

And the parsing rules as

        def p_double_neg_paren(p):
            '''formula : NEGATION LPAREN NEGATION LPAREN PROPOSITION RPAREN RPAREN
            '''
            stack.append(p[5].strip())

        def p_double_neg(p):
            '''formula : NEGATION NEGATION PROPOSITION
            '''
            stack.append(p[3].strip())

        def p_double_neg_inner_paren(p):
            '''formula : NEGATION NEGATION LPAREN PROPOSITION RPAREN
            '''
            stack.append(p[4].strip())

        def p_double_neg_mid_paren(p):
            '''formula : NEGATION LPAREN NEGATION PROPOSITION RPAREN
            '''
            stack.append(p[4].strip())

        def p_groupAssignment(p):
            '''formula : PROPOSITION ASSIGNMENT ASSIGNVAL
            '''
            stack.append(p[1].strip() + p[2].strip() + p[3].strip())

        def p_neg_paren_take_outer_token(p):
            '''formula : NEGATION LPAREN PROPOSITION RPAREN
                       | NEGATION LPAREN TRUE RPAREN
                       | NEGATION LPAREN FALSE RPAREN
            '''
            stack.append(Neg(p[3]))

        def p_neg_take_outer_token(p):
            '''formula : NEGATION PROPOSITION
                       | NEGATION TRUE
                       | NEGATION FALSE
            '''

            stack.append(Neg(p[2].strip()))

        def p_neg_take_outer_token_paren(p):
            '''formula : LPAREN NEGATION PROPOSITION RPAREN
                       | LPAREN NEGATION TRUE RPAREN
                       | LPAREN NEGATION FALSE RPAREN
            '''
            stack.append(Neg(p[3].strip()))

        def p_unary_paren_nest_take_outer_token(p):
            '''formula : GLOBAL LPAREN LPAREN NEGATION formula RPAREN RPAREN
                       | NEXT LPAREN LPAREN NEGATION formula RPAREN RPAREN
                       | FUTURE LPAREN LPAREN NEGATION formula RPAREN RPAREN
            '''
            if len(stack) >= 1:
                if p[1].strip() == 'AG':
                    stack.append(['AG', ['!', stack.pop()]])
                elif p[1].strip() == 'AF':
                    stack.append(['AF', ['!', stack.pop()]])
                elif p[1].strip() == 'AX':    
                    stack.append(['AX', ['!', stack.pop()]])


        def p_unary_paren_take_outer_token(p):
            '''formula : GLOBAL LPAREN formula RPAREN
                       | NEXT LPAREN formula RPAREN
                       | FUTURE LPAREN formula RPAREN
            '''
            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_unary_take_outer_token(p):
            '''formula : GLOBAL formula
                       | NEXT formula
                       | FUTURE formula
            '''

            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_unary_take_outer_token_prop(p):
            '''formula : GLOBAL PROPOSITION
                       | NEXT PROPOSITION
                       | FUTURE PROPOSITION
            '''

            if len(stack) >= 1:
                if p[1].strip() == "AG":
                    stack.append(AG(stack.pop()))
                elif p[1].strip() == "AF":
                    stack.append(AF(stack.pop()))
                elif p[1].strip() == "AX":
                    stack.append(AX(stack.pop()))

        def p_binary_take_outer_token(p):
            '''formula : formula CONJUNCTION formula 
                       | formula DISJUNCTION formula 
                       | formula EQUIVALENCE formula
                       | formula IMPLICATION formula
            '''

            if len(stack) >= 2:
                a, b = stack.pop(), stack.pop()
                if self.IMPLICATION.search(p[2].strip()) and not self.EQUIVALENCE.search(p[2].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[2].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[2].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[2].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_paren_take_outer_token(p):
            '''formula : LPAREN formula RPAREN CONJUNCTION LPAREN formula RPAREN 
                       | LPAREN formula RPAREN DISJUNCTION LPAREN formula RPAREN
                       | LPAREN formula RPAREN EQUIVALENCE LPAREN formula RPAREN 
                       | LPAREN formula RPAREN IMPLICATION LPAREN formula RPAREN
            '''
            if len(stack) >= 2:
                a, b = stack.pop(), stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_lparen_take_outer_token(p):
            '''formula : LPAREN formula RPAREN CONJUNCTION formula 
                       | LPAREN formula RPAREN DISJUNCTION formula
                       | LPAREN formula RPAREN EQUIVALENCE formula 
                       | LPAREN formula RPAREN IMPLICATION formula
            '''
            if len(stack) >= 2:
                a = stack.pop()
                b = stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_binary_rparen_take_outer_token(p):
            '''formula : formula CONJUNCTION LPAREN formula RPAREN 
                       | formula DISJUNCTION LPAREN formula RPAREN
                       | formula EQUIVALENCE LPAREN formula RPAREN 
                       | formula IMPLICATION LPAREN formula RPAREN
            '''
            if len(stack) >= 2:
                a = stack.pop()
                b = stack.pop()
                if self.IMPLICATION.search(p[4].strip()) and not self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(Or(a, Neg(b)))
                elif self.EQUIVALENCE.search(p[4].strip()):
                    stack.append(And(Or(Neg(a), b), Or(Neg(b), a)))
                else:
                    if p[4].strip() == "|":
                        stack.append(Or(b, a))
                    elif p[4].strip() == "&":
                        stack.append(And(b, a))

        def p_proposition_take_token_paren(p):
            '''formula : LPAREN formula RPAREN
            '''
            stack.append(p[2].strip())

        def p_proposition_take_token_atom(p):
            '''formula : LPAREN PROPOSITION RPAREN
            '''
            stack.append(p[2].strip())

        def p_proposition_take_token(p):
            '''formula : PROPOSITION
            '''
            stack.append(p[1].strip())

        def p_true_take_token(p):
            '''formula : TRUE
                       '''
            stack.append(p[1].strip())

        def p_false_take_token(p):
            '''formula : FALSE
                       '''
            stack.append(p[1].strip())

        # Error rule for syntax errors
        def p_error(p):
            print "Syntax error in input!: " + str(p)
            os.system("pause")
            return 0

I can see the lex\yacc rules are fairly messy, i've removed much of the debugging code in each rule for brevity and tidiness but can anyone see where i'm going wrong here? Should I move handling of parentheses to another method or can it be done with what I have now? Is there some other way I can process these formula strings into the predefined class operations without getting all the shift/reduce errors?

Sorry for airing all my dirty code online but I could really use some help on something thats been bugging me for months. Thanks.

Best Answer

START SMALL!!! Regardless of the parser library you end up using, try doing just a simple binary operation like expr & expr, and get that working. Then add support for '|'. Now you have two different operators, and you have enough to represent precedence of operations, and parentheses actually play a part. This BNF would look something like:

atom := TRUE | FALSE | '(' expr ')'
and_op := atom '&' atom
or_op := and_op '|' and_op
expr = or_op

Do you see how this works? No explicit "take left paren", "pop right paren" stuff. Figure out what your precedence of your other operations is going to be, and then extend this recursive infix notation parser to reflect them. But DON'T DO ANYTHING ELSE until you get this minimal bit working first. Otherwise, you are just trying to solve too much at once.