Pack pPEG -- Examples/Expression

Using pPEG to parse expressions with operator precedence

Module expr_grammar in this directory contains an example of parsing arithmetic expressions starting with a simple expression parser which depends on an external table to define operator precedence and associativity and an alternative but equivalent grammar (recognizes same syntax) which uses a rule naming convention which can be used with the ptree_pratt/2 predicate in module pPEGutilities to apply precedence rules. In both cases, the mapping to a final expression is done in semantic analysis, i.e., after parsing the syntax using peg_parse.

The grammar rules defining the syntax of expressions in both grammars are:

        expr       = _ (prefixOp _)* term _ (postfixOp _)* (infixOp expr)? _

        term       = number / Pexpr
        number     = [0-9]+
        Pexpr      = '(' expr ')'

        _          = [ \t\n\r]*     # optional whitespace

So this grammar just defines operands in expressions as integers (rule 'number'). Expressions (rule 'expr') consist of operands connected with prefix, infix and postfix operands. Parenthesized expressions (rule 'Pexpr') are used to defined sub-expressions which can override any operator precedence definitions.

To complete the grammar the allowable operators must be defined. If operator precedence information is elsewhere, this is a fairly straightforward task; a small common set of defintions is:

        prefixOp   = unaryOp
        infixOp    = addOp / mulOp / expOp
        postfixOp  = postOp
  
        unaryOp    = [-+]
  
        addOp      = [-+]
        mulOp      = [*/]
        expOp      = '^'
  
        postOp     = '++' / '--'

This is now a complete pPEG grammar which can parse expressions and produce a ptree result: as done by the predicate parse_expr/2:

?- parse_expr("1+2*3+4",Tree).
Tree = expr([number("1"), addOp("+"), expr([number("2"), mulOp("*"), expr([number("3"), addOp("+"), number("4")])])]).
 
?- parse_expr("-1/2++",Tree).
Tree = expr([unaryOp("-"), number("1"), mulOp("/"), expr([number("2"), postOp("++")])]).

As this shows, it's a straight syntactic mapping to the ptree, i.e., both expressions have exactly the same form as there is no operator precedence enforced by the grammar. (See the pPEG_API_Guide in docs for examples of how to build precedence into the grammar using rule hierarchies.) Applying operator precedence rules must be done in subsequent semantic analysis using some external table of operator definitions. (See the SWIP-grammar example for an an example of how this is done in a Prolog parser using current_op/3.)

Decoupling operator precedence information from the operator syntax is a bit unfortunate. But using a rule naming convention to embed this precedence information in the grammar itself is a partial solution, and no additional external table is required. Module pPEGutilities provides ptree_pratt/2 to simplify the task of applying precedence information to parsed expressions using Pratt parsing techniques. This requires that the names of rules defining operators must have a suffix of the form "_PA" where "P" is any legal character in a pPEG rule name (`[0-9A-Za-z_]`). The character code of "P" is interpreted as a precedence value (unlike Prolog precedence values, higher values bind tighter) and "A" is defined by the set [LR] and specifies operator associativity (left or right respectively); the "A" value is used to break ties when precedence values are equal.

The second requirement for operator definition rules is that the right hand side of a Pratt operator rule defines just the operator symbol, i.e.,it's a terminal rule. Here's our example grammar with Pratt operator rules:

        prefixOp   = notOp_2R / unaryOp_7R
        infixOp    = addOp_4L / mulOp_5L / expOp_6R
        postfixOp  = postOp_7L

        unaryOp_7R = [-+]

        addOp_4L   = [-+]
        mulOp_5L   = [*/]
        expOp_6R   = '^'

        postOp_7L  = '++' / '--' # / '-'

Expressions in this grammar can be parsed using pratt_parse_expr/2 (defined with the grammar in expr_grammar.pl):

?- pratt_parse_expr("1+2*3+4",Tree).
Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]).

?- pratt_parse_expr("-1/2++",Tree).
Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]).

Note that the result trees are exactly the same as before other than the "semantic labels" (the rule names) now contain the precedence information in their suffixes.

The predicate ptree_pratt/2 maps a ptree generated by peg_parse to a so-called "Pratt tree" which is a ptree with the following properties:

The input ptree is re-structured by applying the precedence information in the rule names.
All operator terminal nodes have the original rule names replaced by the matched operator symbol.
Any sub-expressions with the same node name as the root are likewise mapped.

Operand nodes are preserved other than those mapped by 3. Examples:

?- pratt_parse_expr("1+2*3+4",Tree), ptree_pratt(Tree,Pratt).
Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]),
Pratt = +[+[number("1"), *([number("2"), number("3")])], number("4")].

?- pratt_parse_expr("-1/2++",Tree), ptree_pratt(Tree,Pratt).
Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]),
Pratt = /([-[number("1")], ++([number("2")])]).

Having produced a Pratt tree with explicit precedence, it's now fairly straight forward to map this to an arithmetic Prolog term which could be evaluated (with user defined arithmetic functions) for the example grammar:

:- arithmetic_function(user:(++ /1)).
++(N,R) :- arithmetic_expression_value(N,V), R is V+1.
  
:- arithmetic_function(user:(-- /1)).
--(N,R) :- arithmetic_expression_value(N,V), R is V-1.
  
% convert a pratt tree of an expression in pratt_expr_grammar/1 to an arithmetic term
pratt_to_term(number(S), N) :- !,            % terminal value
        number_string(N,S).
pratt_to_term('Pexpr'([Pratt]), Exp) :- !,   % parenthesized expression
        pratt_to_term(Pratt,Exp).
pratt_to_term(Func, Exp) :-                  % any operator node
        Func =.. [F,Args],
        pratt_termList(Args,Args1),
        Exp =.. [F|Args1].
  
pratt_termList([],[]).
pratt_termList([A|Args],[A1|Args1]) :-
        pratt_to_term(A,A1),
        pratt_termList(Args,Args1).

as demonstrated by the following examples:

?- pratt_parse_expr("1+2*3+4",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R).
Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]),
Pratt = +[+[number("1"), *([number("2"), number("3")])], number("4")],
Exp = 1+2*3+4,
R = 11.

?- pratt_parse_expr("-1/2++",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R).
Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]),
Pratt = /([-[number("1")], ++([number("2")])]),
Exp = - 1/ ++(2),
R = -0.3333333333333333.
  
?- pratt_parse_expr("-3^ -3--",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R).
Tree = expr([unaryOp_7R("-"), number("3"), expOp_6R("^"), expr([unaryOp_7R("-"), number("3"), postOp_7L("--")])]),
Pratt = ^([-[number("3")], -[--([number("3")])]]),
Exp =  (- 3)^ - --(3),
R = 0.1111111111111111.

(arithmetic_expression_value/2 is required to dynamically evaluate expressions with user defined artihmetic functions - see `library(arithmetic))`.)