Skip to content

Parsing Expressions for Evaluation

Armando Blancas edited this page May 26, 2018 · 16 revisions

The expressions supported by Kern make use of binary infix, and unary prefix and postfix operators. The predefined parsers come in two forms depending on the desired result: evaluation or data structures. The first group produce results by applying operators to operands like a calculator would. The second group return a data structure with operators and operands in a hierarchy that corresponds to the rules of precedence and associativity that the grammar defines (that is, an abstract syntax tree).

In this tutorial we'll cover parsers of the first form. There are two dimensions for evaluating an expression. One is precedence, which determines which operators evaluate first, the typical case being multiplication and division running before addition and subtraction. The other is associativity, which determines how the operations are grouped. We start with associativity as it is the simpler case.

Expression parsers are located in the blancas.kern.expr namespace. core is always needed; c-style provides lexer parsers that skip over whitespace and comments.

(use 'blancas.kern.core
     'blancas.kern.lexer.c-style
     'blancas.kern.expr)

Chain Left

The combinator chainl1 takes two parsers as arguments; one is for reading operands and the other for reading operators. Actually, it can work on even a single operand, which will be important when we discuss precedence, since reading operands is delegated to sub-expressions.

(run (chainl1 dec-lit add-op) "99")  ;; 99

It starts paying off when we have a binary operator.

(run (chainl1 dec-lit add-op) "3 + 4")  ;; 7

In this case, operands are just decimal literals parsed with dec-lit. Parser add-op (for additive operators) accepts a plus or minus sign and returns the corresponding Clojure function to be applied on the operands. chainl1 owes its name to three reasons. One, it will chain operations as it find more to work on; two, associates to the left; three, it expects to read at least one operand. In the current example we can chain addition and subtraction:

(run (chainl1 dec-lit add-op) "15 + 20 + 4 - 13")  ;; 26

chainl1 associates to the left, thus the above input expression would be equivalent to:

((15 + 20) + 4) - 13

Another operator parser is mul-op for multiplication, division, and modulo (*, /, %).

(run (chainl1 dec-lit mul-op) "5 * 10 * 2 / 4")  ;; 25

In the above call, all operators have the same precedence and the associativity is left to right:

((5 * 10) * 2) / 4

The combinator chainl works like chainl1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.

(run (chainl dec-lit mul-op 737) "foo")  ;; 737

Custom Operator Parsers

You may write your own parser for infix binary operators. For example, we may define %% as a percent operator; the first operand is the amount and the second is the ratio. (Note than % is already used by the predefined mul-op parser for modulo.)

(defn percent-fun [n x]
  "Computes x percent of n."
  (quot (* n x) 100))

(def percent-op
  "Parses the percent operator %%."
  (bind [c (times 2 (sym \%))] (return percent-fun)))

(run (chainl1 dec-lit percent-op) "300 %% 20")  ;; 60

Chain Right

The combinator chainr1 works like chainl1 but associates to the right. For associative operators, like addition and multiplication, this makes no difference.

(run (chainr1 dec-lit add-op) "5 + 8 + 10")  ;; 23
(run (chainr1 dec-lit mul-op) "5 * 8 * 10")  ;; 400

chainr1 parsers the input expressions grouping operands starting from the right.

5 + (8 + 10)
5 * (8 * 10)

There is a big difference, however, for operators that are not associative, like subtraction and division. In the following examples chainr1 gives the wrong result:

(run (chainl1 dec-lit add-op) "15 - 8 - 2") ;; 5 -> (15 - 8) - 2 = 7 - 2 = 5
(run (chainr1 dec-lit add-op) "15 - 8 - 2") ;; 9 -> 15 - (8 - 2) = 15 - 6 = 9

(run (chainl1 dec-lit mul-op) "48 / 12 / 2") ;; 2 -> (48 / 12) / 2 = 4 / 2 = 2
(run (chainr1 dec-lit mul-op) "48 / 12 / 2") ;; 8 -> 48 / (12 / 2) = 48 / 6 = 8

Thus we should be mindful of the rules or conventions regarding operator associativity. chainr1 is necessary for operators that are well-known to associate to the right, like power (^), assignment (=), and conditional expressions (?).

a ^ b ^ c  ==  a ^ (b ^ c)
a = b = c  ==  a = (b = c)
b1 ? v1 : b2 ? v2 : v3  ==  b1 ? v1 : (b2 ? v2 : v3)

The predefined pow-op parser supports the power-of (^) operator. According to convention we use chainr1 to make it associate to the right.

(run (chainr1 dec-lit pow-op) "10^2^3")  ;; 1.0E8

The combinator chainr works like chainr1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.

(run (chainr float-lit add-op 3.1416) "xyz")  ;; 3.1416

Unary Prefix and Postfix Operators

Parsing prefix and postfix operators works in a similar way to the binary operators, including the ability to chain them. The predefined operator parser uni-op supports prefix logical negation (!) and unary minus (-). There are no predefined postfix operator parsers.

The combinator prefix1 takes two parsers as arguments; one is for reading operands and the other for reading operators. Like chainl1, it can work on even a single operand:

(run (prefix1 dec-lit uni-op) "99")  ;; 99

The prefix operators can be parsed on single or chained applications.

(run (prefix1 dec-lit uni-op) "-100")  ;; -100
(run (prefix1 dec-lit uni-op) "--100")  ;; 100

The combinator prefix works like prefix1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.

(run (prefix float-lit uni-op 3.1416) "xyz")  ;; 3.1416

To illustrate the use of postfix opertors, we can define one similar to C's increment. Note that postfix1 allows the chaining of operations.

(def plus-one (bind [op (times 2 (sym \+))] (return inc)))
(run (postfix1 dec-lit plus-one) "5++")  ;; 6
(run (postfix1 dec-lit plus-one) "5++++")  ;; 7

The combinator postfix works like postfix1 but takes a default value. This parser works in cases where the input expression is optional. If no operand is found in the input, the default value is returned.

(run (postfix dec-lit plus-one -1) "xyz")  ;; -1

Operator Precedence

We've seen how associativity determines how subexpressions are grouped. We'll now determine how some operators have precedence over others (i.e., bind the operands tighter). A typical example is multiplication having precedence over addition; thus we expect 3 * 4 + 2 * 5 to evaluate to 22. The way to specify that * evaluates first is to make it a subordinate expression. This way its result must be obtained before the main expression can be fully evaluated.

(def sub (chainl1 dec-lit mul-op))
(def main (chainl1 sub add-op))

Parser main is the main parser, the one we use in the run function. It doesn't read operands directly, but relies on its subordinate parser sub, which will either read a single operand and return it, or read subexpression and evaluate them, and then return the result. Since the sub-parser is closer to the operands, its operands work first and therefore have higher precedence.

(run main "3 * 4 + 2 * 5")  ;; 22

From the above example we can extract some simple rules to enforce operator precedence (see the sample grammar in Expression evaluator.)

  • Declare sub-parsers from highest to lower precedence.
  • The parser with the highest precedence should read the operands.
  • Each parser should reference the one with the next higher precedence.
  • The parser with the lowest precedence as the main one.

What if we wanted to evaluate the above example with the same precedence? We'd use a single parser.

(def calc (chainl1 dec-lit (<|> add-op mul-op)))

(run calc "3 * 4 + 2 * 5") ;; 70 -> 3 * 4 = 12; 12 + 2 = 14; 14 * 5 = 70

Forward References

Expressions are inherently recursive and thus a common situation is to parse a full expression as part of the sub-expression with the highest precedence. Consider an extension to the example above that adds sub-expressions in parenthesis.

(declare main)
(def fact (<|> dec-lit (parens (fwd main))))
(def sub  (chainl1 fact mul-op))
(def main (chainl1 sub add-op))

(run main "5 * (3 + 7) * 2")  ;; 100

Declaring main beforehand is not enough for the definition of fact; we must call main with (fwd main). That is because we are working with defs, which evaluate their expressions immediately. The fwd macro delays the call to main so that def uses its name but not its value.

An exception to this rule is when a forward-declared parser ends up in a nested function, which is the case whenever such parser is not the first one inside a bind macro. Thus we may write, for example, the Pair production in the JSON Sample:

(declare jvalue)
(def pair (bind [f string-lit _ colon v jvalue]
            (return [f v])))

Since the value v is bound as the third item, it's safe to just the json parser directly. That is not the case, though, for the JSON Array production:

(def array (brackets (comma-sep (fwd jvalue))))

In this last case, fwd is necessary.

Clone this wiki locally