Did you know ... | Search Documentation: |
Pack pPEG -- Examples/Expression_Grammars/README.md |
Module expr_grammar
in this directory contains an example of parsing arithmetic expressions starting with a simple expression parser which depends on an external table to define operator precedence and associativity and an alternative but equivalent grammar (recognizes same syntax) which uses a rule naming convention which can be used with the ptree_pratt/2 predicate in module pPEGutilities
to apply precedence rules. In both cases, the mapping to a final expression is done in semantic analysis, i.e., after parsing the syntax using peg_parse
.
The grammar rules defining the syntax of expressions in both grammars are:
expr = _ (prefixOp _)* term _ (postfixOp _)* (infixOp expr)? _ term = number / Pexpr number = [0-9]+ Pexpr = '(' expr ')' _ = [ \t\n\r]* # optional whitespace
So this grammar just defines operands in expressions as integers (rule 'number
'). Expressions (rule 'expr
') consist of operands connected with prefix, infix and postfix operands. Parenthesized expressions (rule 'Pexpr') are used to defined sub-expressions which can override any operator precedence definitions.
To complete the grammar the allowable operators must be defined. If operator precedence information is elsewhere, this is a fairly straightforward task; a small common set of defintions is:
prefixOp = unaryOp infixOp = addOp / mulOp / expOp postfixOp = postOp unaryOp = [-+] addOp = [-+] mulOp = [*/] expOp = '^' postOp = '++' / '--'
This is now a complete pPEG
grammar which can parse expressions and produce a ptree result: as done by the predicate parse_expr/2:
?- parse_expr("1+2*3+4",Tree). Tree = expr([number("1"), addOp("+"), expr([number("2"), mulOp("*"), expr([number("3"), addOp("+"), number("4")])])]). ?- parse_expr("-1/2++",Tree). Tree = expr([unaryOp("-"), number("1"), mulOp("/"), expr([number("2"), postOp("++")])]).
As this shows, it's a straight syntactic mapping to the ptree, i.e., both expressions have exactly the same form as there is no operator precedence enforced by the grammar. (See the pPEG_API_Guide
in docs
for examples of how to build precedence into the grammar using rule hierarchies.) Applying operator precedence rules must be done in subsequent semantic analysis using some external table of operator definitions. (See the SWIP-grammar
example for an an example of how this is done in a Prolog parser using current_op/3.)
Decoupling operator precedence information from the operator syntax is a bit unfortunate. But using a rule naming convention to embed this precedence information in the grammar itself is a partial solution, and no additional external table is required. Module pPEGutilities
provides ptree_pratt/2 to simplify the task of applying precedence information to parsed expressions using Pratt parsing techniques. This requires that the names of rules defining operators must have a suffix of the form "_PA" where "P" is any legal character in a pPEG
rule name (`[0-9A-Za-z_]`). The character code of "P" is interpreted as a precedence value (unlike Prolog precedence values, higher values bind tighter) and "A" is defined by the set [LR]
and specifies operator associativity (left or right respectively); the "A" value is used to break ties when precedence values are equal.
The second requirement for operator definition rules is that the right hand side of a Pratt operator rule defines just the operator symbol, i.e.,it's a terminal rule. Here's our example grammar with Pratt operator rules:
prefixOp = notOp_2R / unaryOp_7R infixOp = addOp_4L / mulOp_5L / expOp_6R postfixOp = postOp_7L unaryOp_7R = [-+] addOp_4L = [-+] mulOp_5L = [*/] expOp_6R = '^' postOp_7L = '++' / '--' # / '-'
Expressions in this grammar can be parsed using pratt_parse_expr/2 (defined with the grammar in expr_grammar.pl
):
?- pratt_parse_expr("1+2*3+4",Tree). Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]). ?- pratt_parse_expr("-1/2++",Tree). Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]).
Note that the result trees are exactly the same as before other than the "semantic labels" (the rule names) now contain the precedence information in their suffixes.
The predicate ptree_pratt/2 maps a ptree generated by peg_parse
to a so-called "Pratt tree" which is a ptree with the following properties:
?- pratt_parse_expr("1+2*3+4",Tree), ptree_pratt(Tree,Pratt). Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]), Pratt = +[+[number("1"), *([number("2"), number("3")])], number("4")]. ?- pratt_parse_expr("-1/2++",Tree), ptree_pratt(Tree,Pratt). Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]), Pratt = /([-[number("1")], ++([number("2")])]).
Having produced a Pratt tree with explicit precedence, it's now fairly straight forward to map this to an arithmetic Prolog term which could be evaluated (with user defined arithmetic functions) for the example grammar:
:- arithmetic_function(user:(++ /1)). ++(N,R) :- arithmetic_expression_value(N,V), R is V+1. :- arithmetic_function(user:(-- /1)). --(N,R) :- arithmetic_expression_value(N,V), R is V-1. % convert a pratt tree of an expression in pratt_expr_grammar/1 to an arithmetic term pratt_to_term(number(S), N) :- !, % terminal value number_string(N,S). pratt_to_term('Pexpr'([Pratt]), Exp) :- !, % parenthesized expression pratt_to_term(Pratt,Exp). pratt_to_term(Func, Exp) :- % any operator node Func =.. [F,Args], pratt_termList(Args,Args1), Exp =.. [F|Args1]. pratt_termList([],[]). pratt_termList([A|Args],[A1|Args1]) :- pratt_to_term(A,A1), pratt_termList(Args,Args1).
as demonstrated by the following examples:
?- pratt_parse_expr("1+2*3+4",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R). Tree = expr([number("1"), addOp_4L("+"), expr([number("2"), mulOp_5L("*"), expr([number("3"), addOp_4L("+"), number("4")])])]), Pratt = +[+[number("1"), *([number("2"), number("3")])], number("4")], Exp = 1+2*3+4, R = 11. ?- pratt_parse_expr("-1/2++",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R). Tree = expr([unaryOp_7R("-"), number("1"), mulOp_5L("/"), expr([number("2"), postOp_7L("++")])]), Pratt = /([-[number("1")], ++([number("2")])]), Exp = - 1/ ++(2), R = -0.3333333333333333. ?- pratt_parse_expr("-3^ -3--",Tree), ptree_pratt(Tree,Pratt), pratt_to_term(Pratt,Exp), arithmetic_expression_value(Exp,R). Tree = expr([unaryOp_7R("-"), number("3"), expOp_6R("^"), expr([unaryOp_7R("-"), number("3"), postOp_7L("--")])]), Pratt = ^([-[number("3")], -[--([number("3")])]]), Exp = (- 3)^ - --(3), R = 0.1111111111111111.
(arithmetic_expression_value/2 is required to dynamically evaluate expressions with user defined artihmetic functions - see `library(arithmetic))`.)