This week, this article will be a little different. I want to show you that we can use recursive Active Pattern to create an expression parser. While I wouldn't recommend using this instead of a library such as FParsec for parsing, I think it's very cool that something like that is possible with quite a nice syntax, and also that it's a great showcase of the power of Active Patterns.
Goal
Our goal will be to parse a simple math expression that would consist of:
non-negative whole numbers
basic arithmetic operators:
+
, '-', '*', '/'parentheses
The output will be defined by this type:
type Op =
| Plus
| Minus
| Multiply
| Divide
[<RequireQualifiedAccess>]
type Ast =
| Number of int
| Op of Op * Ast * Ast
For example, 12/(1+2)
will be parsed as Op (Divide, Number 12, Op (Plus, Number 1, Number 2))
.
Tokenizer
To make parsing with active patterns easier, we will first transform the input to a list of tokens, representing one item in our language (math expression), and then throw away any whitespaces. This way, we can avoid the need for handling details like multiple spaces. In our case, only multiple characters that create tokens are numbers; any other non-whitespace character creates a new token. To do that, we define a helper function chunkBy
:
// split into chunks, where a chunk is where function f returns true for adjacent elements
let chunkBy f xs =
let rec loop prev acc xs =
match xs, acc with
| [], _ -> List.rev acc
| x :: rest, g :: accRest ->
if f prev x then loop x ((x :: g) :: accRest) rest
else loop x ([x] :: List.rev g :: accRest) rest
| _, [] -> loop prev [[]] xs
match Seq.toList xs with
| [] -> []
| [x] -> [[x]]
| x :: xs ->
loop x [[x]] xs
let tokens (s: string) =
s |> chunkBy (fun x y -> System.Char.IsDigit x && System.Char.IsDigit y)
|> List.map (Seq.filter (System.Char.IsWhiteSpace >> not) >> Seq.map string >> String.concat "")
|> List.filter (not << System.String.IsNullOrEmpty)
The chunkBy
function allows us to define tokens by a condition on successive characters. I won't go into the details of the chunkBy
function, as it is out of the scope of today's article. We use it in the tokens
function, and then delete all whitespaces from chunks.
Number
For a number, we can use the Int
pattern from tip #10:
let (|Int|_|) str =
match System.Int32.TryParse(str: string) with
| (true, x) ->
Some(x)
| _ ->
None
Operators
To parse a (sub)expression like 1+2
, we can make a Split
active pattern:
let (|TakeUntil|_|) x xs =
match List.takeWhile ((<>) x) xs with
| y when y = xs -> None
| y -> Some(y, List.skipWhile ((<>) x) xs)
let (|Split|_|) split xs =
match xs with
| TakeUntil split (x1, (_ :: x2)) -> Some(x1, x2)
| _ -> None
TakeUntil
is a variant of TakeN
from tip #11. Split
finds the first occurrence of the given item and returns the list before that and after.
We can use Split
like this:
match tokens "41+1" with
| Split "+" (s1, s2) -> printfn "%A" (s1, s2)
Parentheses
A pattern that detects a list that begins and ends with a given item is easy as:
let (|Surround|_|) before after xs =
match xs with
| b :: Split after (l, []) when b = before -> Some l
| _ -> None
But for supporting inner parentheses (e.g., expressions like 1 + (4 / (1 + 1))
), we need to count before
and after
items:
let (|Eq|_|) x y =
if x = y then Some()
else None
// detect a list surrounded by before and after items, allow inner surrounds, count pairs
let (|Surround|_|) before after xs =
let rec f d acc xs =
match xs with
| _ when d < 0 -> None // wrong pairing
| Eq after :: [] when d = 1 -> List.rev acc |> Some // closing pair on end -> success
| Eq before :: rest -> f (d + 1) (before :: acc) rest // inner "before" item -> increase depth
| Eq after :: rest -> f (d - 1) (after :: acc) rest // inner "after" item -> decrease depth
| x :: rest -> f d (x :: acc) rest // other item copy to output
| _ -> None
match xs with
| Eq before :: rest -> f 1 [] rest
| _ -> None
Eq
here is only replacement for ... when x = y
to make things clearer.
Operators again
There is a problem with the use of Split
for operators - it splits on the first occurrence, and if there are multiple of them, we don't consider the correct split. For example, for the expression (1 + 2) + 3
, we only consider splitting on the first +
, leading to incorrect subexpressions (1
and 2) + 3
, which are rejected by the Surround
pattern. To solve this, we can make a pattern that tries all possibilities until we find the correct split.
let (|SplitsPick|_|) split f xs =
let rec loop prev xs =
seq {
match xs with
| TakeUntil split (x1, (_ :: x2)) ->
yield (prev @ x1, x2)
yield! loop (prev @ x1 @ [ split ]) x2
| _ -> ()
}
loop [] xs |> Seq.tryPick f
Expr pattern
Putting it all together:
let rec (|BinaryOp|_|) s =
let twoExprs = function | (Expr e1, Expr e2) -> Some (e1, e2) | _ -> None
match s with
| SplitsPick "+" twoExprs (e1, e2) -> Ast.Op (Plus, e1, e2) |> Some
| SplitsPick "-" twoExprs (e1, e2) -> Ast.Op (Minus, e1, e2) |> Some
| SplitsPick "*" twoExprs (e1, e2) -> Ast.Op (Multiply, e1, e2) |> Some
| SplitsPick "/" twoExprs (e1, e2) -> Ast.Op (Divide, e1, e2) |> Some
| _ -> None
and (|Expr|_|) s =
match s with
| [ Int x ] -> (Ast.Number x) |> Some
| Surround "(" ")" (Expr e) -> Some e
| BinaryOp e -> Some e
| e ->
None
Here we are finally using recursive active patterns. Note that BinaryOp
uses Expr
pattern - that way SplitsPick
selects only the correct splitting to two valid expressions.
Now we have a working solution:
let parse s =
match tokens s with
| Expr e -> printfn "%A" e
| xs -> failwithf "cannot parse: %A" xs
parse "(10 * 4) / ((30/10) + 1)"
This will give us:
Op
(Divide, Op (Multiply, Number 10, Number 4),
Op (Plus, Op (Divide, Number 30, Number 10), Number 1))
Performance note
The way how SplitsPick
works is not good performance-wise because we're creating sub-lists for every possibility and running the Expr
pattern recursively on them. There are ways to fix it, for example, making SplitsPick
aware of Surround
logic and going for the top-level operator.
Conclusion
With some list patterns utils, we can get powerful tools for writing a readable parser with active patterns. Also, this example shows how we can transform a sequence into recursive data types and apply patterns recursively.
Full code is available as Gist.