Parsing Operators in Rust

The Advent of Code is just around the corner. This is an advent calendar of programming tasks, the solution of which can serve as a pastime for the fun of joy. After successfully processing a task, you receive a virtual golden star. My first participation was in 2020. At the time, I thought I’d try to do this directly in the Rust programming language, which I hadn’t used before. By the way, this article is not only intended to shed light on an interesting Advent of Code task, but also to highlight the flexibility of my Rust library Exmex, which is discussed below as part of the solution.

Task

We consider the task of the nineteenth day of Advent of Code 2020. The input data of the program are a set of rules and several test strings consisting only of as and bs. The program should determine which of the test strings match the rules. A set of rules looks as follows.

0: 1 2
1: "a"
2: 1 3 | 3 1
3: "b"

Each row represents a rule. To the left of the colon is the index of the rule. To the right of the colon is the actual rule. Numbers in rules denote indices of other rules. For example, 0: 1 2 means that the rules 1 and 2 must be matched. The rule 3: "b", on the other hand, means that one "b" is expected. The vertical lines are non-exclusive or-operators. So 2: 1 3 | 3 1 is fulfilled if 1 3 and/or 3 1 are fulfilled. The question now is which strings match the rule 0 exactly. For the small set of rules from above, you can manually write this down.

 1 2
 |  \
"a"  (1   3)  |   (3   1)
 |    |   |   or   |   |
"a" ("a" "b") or ("b" "a")

These are the strings aab and aba. A more detailed explanation can be found on the Advent of Code page1, from which I took the example. The rules and the test strings to obtain the golden star are way more complex.

Operator View

The task reminded me of a task from my 2nd semester at the university. We wrote an interpreter for movements of the robot Karel. In my memory, this interpreter consists of many different expressions, nodes, expression nodes, and inheritance trees and is terribly2The reader is encouraged to notice negative connotations to inheritance. It is the only raison d’être of the otherwise superfluous introduction of this section. class-oriented. In any case, I need a parser that creates an expression from the string that can be evaluated. Accordingly, I have created the Crate3Rust library that you can install with the standard package manager. Exmex, which provides functionality for parsing mathematical expressions consisting of numbers and operators. Yes, I work on a Crate for several months just to solve an Advent of Code task😜.

Credits to David W. for digging up this comic, https://imgs.xkcd.com/comics/automation.png


Anyway, with Exmex you can, e.g., evaluate the function $(x, y) \mapsto 2x^3-4/y$ through the Rust snippet

use exmex::prelude::*;
let expr = exmex::parse::<f64>("2*x^3-4/y")?;
expr.eval(&[0.5, 2.5])

at the point x=0.5 and y=2.5.

Moreover, Exmex users can define what a number and what an operator is. A number does not have to be a number. Therefore, we now call a number an operand. To solve the Advent of Code task of the 19th day, we define rule operators as operands. Rule operators or simply rules evaluate whether a rule is fulfilled. For the rules we need a type that implements the trait bounds of Exmex’ operands, namely Clone, FromStr, and Debug. We distinguish the following variants of rules:

  • Rules represent either the index of another rule,
  • one of the letters a or b ,
  • the composition of several rules as in 0: 4 1 5
  • or the union of several rules characterized by | as in 2: 1 3 | 3 1 .

According to the 4 cases, we define our type as an enumeration.

#[derive(Clone, Debug)]
enum RuleOp {
    Idx(usize),
    Char(char),
    Concatenate(Vec<Op>),
    Union(Vec<Op>),
}

To allow a rule to determine whether it is fulfilled by certain strings, we implement the method eval. The method looks for matches between rules and the beginning of strings in a list of input strings. For each match, a string consisting of the remaining part is returned.

impl RuleOp {
    pub fn eval<'a>(
        &self, 
        msgs: &Vec<&'a str>, 
        rules: &Vec<RuleOp>
    ) -> Vec<&'a str> { 
if msgs.len() == 0 { return msgs.clone(); } match self { RuleOp::Idx(idx) => rules[*idx].eval(msgs, rules), RuleOp::Char(c) => msgs .iter() .filter(|msg| msg.chars().nth(0) == Some(*c)) .map(|msg| &msg[1..]) // a match at the // beginning is cut .collect::<Vec<_>>(), RuleOp::Concatenate(ops) => { let mut res = msgs.clone(); for op in ops { res = op.eval(&res, rules); } res } RuleOp::Union(ops) => ops .iter() .flat_map(|op| op.eval(msgs, rules)) .collect::<Vec<_>>(), }
} }

To obey to Exmex’ interface, we simply derive Clone and Debug but have to implement FromStr. This enables us to convert literals such as a or 4 from the string into instances of the RuleOp struct.

impl FromStr for RuleOp {
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.chars().nth(1) {
            Some(letter) 
                if letter == 'a' || letter == 
                'b' => RuleOp::Char(letter),
            _ => RuleOp::Idx(s.parse::<usize>()?),
        })
    }
}

Next, let’s look at the Exmex-operators that map rules to new rules. Exmex fortunately provides a macro to create an operator factory. We representby "|" the or-operator as above. The composition operator is represented by "o" . To this end, we convert the input strings accordingly before parsing. For instance, 1 3 | 3 1 becomes 1o3|3o1. From two input operators, the or-operator creates a new operator in the variant Union and the composition operator creates a new operator in the Concatenate variant.

ops_factory! (
    OpsOpsFactory,
    RuleOp,
    Operator::make_bin(
        "|",
        BinOp {
            apply: |op1, op2| { 
                RuleOp::Union(vec![ op1, op2]) 
            },            
            prio: 0,  // note that or has a lower 
                      // priority than composition
            is_commutative: true
        } 
), Operator::make_bin( "o", BinOp { apply: |op1, op2| { RuleOp::Concatenate(vec![ op1, op2]) }, prio: 1, is_commutative: false }
) );

Check Rules

We can define a regular expression that corresponds to the literals at the beginning of the string. Here, we use ^([0-9]+|\"a\"|\"b\") that corresponds to the numbers 0-9 and the letters "a" as well as "b"4In the code below, we use the macro literal_matcher_from_pattern to create a type OpsMatcher that matches our literals. This is way Exmex ensures that serialization works correctly, even though we do not need serialization here.. We can now transform the right-hand side of the colon for each row of the form 0: 4 1 5 to 4o1o5 and parse and evaluate it with Exmex. The result is an operator that checks whether a given string satisfies the rule. Accordingly, the following snippet parses a rule for each string.

const LITERAL_PATTERN: &str = "^([0-9]+|\"a\"|\"b\")";
literal_matcher_from_pattern!(OpsMatcher, LITERAL_PATTERN);
type FlatExOps = FlatEx::<RuleOp, OpsOpsFactory, OpsMatcher>;
let rules = rules_strs
.iter()
.map(|s| -> ExResult<_> {
let flatex = FlatExOps::from_str(s)?;
flatex.eval(&[])
})
.collect::<ExResult<Vec<_>>>()
.ok()?;

In the last step, we evaluate the rule 0 for all string-messages hidden in the following snippets and earn a golden star.

messages
    .iter()
    .filter(|msg| {
         // evaluate rule 0 for each string
         let res = rules[0].eval(&vec![ msg.as_str()], &rules);
         we consider an evaluation a match if all letters have 
         // been consumed and precisely the empty string is left 
         // in the result vector.
         res.iter().filter(|s| s.len() == 0).count() == 1
     })

The complete code is on Github.

Leave a comment

Your email address will not be published.