-
Notifications
You must be signed in to change notification settings - Fork 15
Parsing Expressions for Evaluation
The expressions supported by Kern make use of binary infix, and unary prefix and postfix operators. The predefined parsers come in two forms depending on the desired result: evaluation or data structures. The first group produce results by applying operators to operands like a calculator would. The second group return a data structure with operators and operands in a hierarchy that corresponds to the rules of precedence and associativity that the grammar defines (that is, an abstract syntax tree).
In this tutorial we'll cover parsers of the first form. There are two dimensions for evaluating an expression. One is precedence, which determines which operators evaluate first, the typical case being multiplication and division running before addition and subtraction. The other is associativity, which determines how the operations are grouped. We start with associativity as it is the simpler case.
Expression parsers are located in the blancas.kern.expr namespace. core is always needed; c-style provides lexer parsers that skip over whitespace and comments.
(use 'blancas.kern.core
'blancas.kern.lexer.c-style
'blancas.kern.expr)The combinator chainl1 takes two parsers as arguments; one is for reading operands and the other for reading operators. Actually, it can work on even a single operand, which will be important when we discuss precedence, since reading operands is delegated to sub-expressions.
(run (chainl1 dec-lit add-op) "99") ;; 99It starts paying off when we have a binary operator.
(run (chainl1 dec-lit add-op) "3 + 4") ;; 7In this case, operands are just decimal literals parsed with dec-lit. Parser add-op (for additive operators) accepts a plus or minus sign and returns the corresponding Clojure function to be applied on the operands. chainl1 owes its name to three reasons. One, it will chain operations as it find more to work on; two, associates to the left; three, it expects to read at least one operand. In the current example we can chain addition and subtraction:
(run (chainl1 dec-lit add-op) "15 + 20 + 4 - 13") ;; 26chainl1 associates to the left, thus the above input expression would be equivalent to:
((15 + 20) + 4) - 13
Another operator parser is mul-op for multiplication, division, and modulo (*, /, %).
(run (chainl1 dec-lit mul-op) "5 * 10 * 2 / 4") ;; 25In the above call, all operators have the same precedence and the associativity is left to right:
((5 * 10) * 2) / 4
The combinator chainl works like chainl1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.
(run (chainl dec-lit mul-op 737) "foo") ;; 737You may write your own parser for infix binary operators. For example, we may define %% as a percent operator; the first operand is the amount and the second is the ratio. (Note than % is already used by the predefined mul-op parser for modulo.)
(defn percent-fun [n x]
"Computes x percent of n."
(quot (* n x) 100))
(def percent-op
"Parses the percent operator %%."
(bind [c (times 2 (sym \%))] (return percent-fun)))
(run (chainl1 dec-lit percent-op) "300 %% 20") ;; 60The combinator chainr1 works like chainl1 but associates to the right. For associative operators, like addition and multiplication, this makes no difference.
(run (chainr1 dec-lit add-op) "5 + 8 + 10") ;; 23
(run (chainr1 dec-lit mul-op) "5 * 8 * 10") ;; 400chainr1 parsers the input expressions grouping operands starting from the right.
5 + (8 + 10)
5 * (8 * 10)
There is a big difference, however, for operators that are not associative, like subtraction and division. In the following examples chainr1 gives the wrong result:
(run (chainl1 dec-lit add-op) "15 - 8 - 2") ;; 5 -> (15 - 8) - 2 = 7 - 2 = 5
(run (chainr1 dec-lit add-op) "15 - 8 - 2") ;; 9 -> 15 - (8 - 2) = 15 - 6 = 9
(run (chainl1 dec-lit mul-op) "48 / 12 / 2") ;; 2 -> (48 / 12) / 2 = 4 / 2 = 2
(run (chainr1 dec-lit mul-op) "48 / 12 / 2") ;; 8 -> 48 / (12 / 2) = 48 / 6 = 8Thus we should be mindful of the rules or conventions regarding operator associativity. chainr1 is necessary for operators that are well-known to associate to the right, like power (^), assignment (=), and conditional expressions (?).
a ^ b ^ c == a ^ (b ^ c)
a = b = c == a = (b = c)
b1 ? v1 : b2 ? v2 : v3 == b1 ? v1 : (b2 ? v2 : v3)
The predefined pow-op parser supports the power-of (^) operator. According to convention we use chainr1 to make it associate to the right.
(run (chainr1 dec-lit pow-op) "10^2^3") ;; 1.0E8The combinator chainr works like chainr1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.
(run (chainr float-lit add-op 3.1416) "xyz") ;; 3.1416Parsing prefix and postfix operators works in a similar way to the binary operators, including the ability to chain them. The predefined operator parser uni-op supports prefix logical negation (!) and unary minus (-). There are no predefined postfix operator parsers.
The combinator prefix1 takes two parsers as arguments; one is for reading operands and the other for reading operators. Like chainl1, it can work on even a single operand:
(run (prefix1 dec-lit uni-op) "99") ;; 99The prefix operators can be parsed on single or chained applications.
(run (prefix1 dec-lit uni-op) "-100") ;; -100
(run (prefix1 dec-lit uni-op) "--100") ;; 100The combinator prefix works like prefix1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.
(run (prefix float-lit uni-op 3.1416) "xyz") ;; 3.1416To illustrate the use of postfix opertors, we can define one similar to C's increment. Note that postfix1 allows the chaining of operations.
(def plus-one (bind [op (times 2 (sym \+))] (return inc)))
(run (postfix1 dec-lit plus-one) "5++") ;; 6
(run (postfix1 dec-lit plus-one) "5++++") ;; 7The combinator postfix works like postfix1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.
(run (postfix dec-lit plus-one -1) "xyz") ;; -1We've seen how associativity determines how subexpressions are grouped. We'll now determine how some operators have precedence over others (i.e., bind the operands tighter). A typical example is multiplication having precedence over addition; thus we expect 3 * 4 + 2 * 5 to evaluate to 22. The way to specify that * evaluates first is to make it a subordinate expression. This way its result must be obtained before the main expression can be fully evaluated.
(def sub (chainl1 dec-lit mul-op))
(def main (chainl1 sub add-op))Parser main is the main parser, the one we use in the run function. It doesn't read operands directly, but relies on its subordinate parser sub, which will either read a single operand and return it, or read subexpression and evaluate them, and then return the result. Since the sub-parser is closer to the operands, its operands work first and therefore have higher precedence.
(run main "3 * 4 + 2 * 5") ;; 22From the above example we can extract some simple rules to enforce operator precedence (see the sample grammar in Expression evaluator.)
- Declare sub-parsers from highest to lower precedence.
- The parser with the highest precedence should read the operands.
- Each parser should reference the one with the next higher precedence.
- The parser with the lowest precedence as the main one.
What if we wanted to evaluate the above example with the same precedence? We'd use a single parser.
(def calc (chainl1 dec-lit (<|> add-op mul-op)))
(run calc "3 * 4 + 2 * 5") ;; 70 -> 3 * 4 = 12; 12 + 2 = 14; 14 * 5 = 70Expressions are inherently recursive and thus a common situation is to parse a full expression as part of the sub-expression with the highest precedence. Consider an extension to the example above that adds sub-expressions in parenthesis.
(declare main)
(def fact (<|> dec-lit (parens (fwd main))))
(def sub (chainl1 fact mul-op))
(def main (chainl1 sub add-op))
(run main "5 * (3 + 7) * 2") ;; 100Declaring main beforehand is not enough for the definition of fact; we must call main with (fwd main). That is because we are working with defs, which evaluate their expressions immediately. The fwd macro delays the call to main so that def uses its name but not its value.
An exception to this rule is when a forward-declared parser ends up in a nested function, which is the case whenever such parser is not the first one inside a bind macro. Thus we may write, for example, the Pair production in the JSON Sample:
(declare jvalue)
(def pair (bind [f string-lit _ colon v jvalue]
(return [f v])))Since the value v is bound as the third item, it's safe to just the json parser directly. That is not the case, though, for the JSON Array production:
(def array (brackets (comma-sep (fwd jvalue))))In this last case, fwd is necessary.