The second part of my Scala compiler construction tutorial has been a long time coming. This post is, unfortunately, not the second part of the article — although that is coming soon. Honest.
Since part 1 was published, Scala 2.7 has been released, and among other things it changed the parser combinator library. Those changes mean the source code from part 1 will not compile in Scala 2.7. Sorry about that. Updated, working code can be found at the end of this post.
So what exactly has changed? Well, let’s see …
keyword() no longer discards its result
A bit of a refresher first:
In both Scala 2.6 and 2.7, the keyword implicit is called whenever you use a string in your “grammar rules”. For example:
def sum = expr ~ "+" ~ expr
is equivalent to:
def sum = expr.~(keyword("+")).~(expr)
In Scala 2.6's parser combinator library, the result of the keyword() call was actually a UnitParser, that is, a parser that discards its result. At the time, that meant we could use the tilde operator (“~”) to create a sequenced parser and everything was fine:
def sum = expr ~ "+" ~ expr ^^ ((left : Expr, right : Expr) => Sum(left, right))
In 2.7, the keyword() implicit returns a Parser[String] rather than a UnitParser. This means we either have to deal with the newly introduced string results ourselves, as follows:
def sum = expr ~ "+" ~ expr ^^ ((left : Expr, op : String, right : Expr) => Sum(left, right))
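If you are on a newer toolchain, here is a minimal sketch of the same rule. It is written against the standalone scala-parser-combinators module as I understand its current API (an assumption: the module has to be on your classpath, and it is not identical to the 2.7 library discussed in this post). There, the sequenced result is taken apart with a `case left ~ op ~ right` pattern rather than a three-argument function. `Num`, `Sum` and `parseSum` are hypothetical names standing in for the AST and entry point from part 1.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical AST, standing in for the Expr types from part 1.
sealed trait Expr
case class Num(value: Int) extends Expr
case class Sum(left: Expr, right: Expr) extends Expr

object SumParser extends StandardTokenParsers {
  // Tell the lexer that "+" is a delimiter token.
  lexical.delimiters ++= List("+")

  def num: Parser[Expr] = numericLit ^^ (s => Num(s.toInt))

  // "+" goes through the keyword implicit and yields a Parser[String],
  // so the operator text is part of the result and has to be matched
  // (here it is matched and ignored with `_`).
  def sum: Parser[Expr] = num ~ "+" ~ num ^^ {
    case left ~ _ ~ right => Sum(left, right)
  }

  def parseSum(input: String): ParseResult[Expr] =
    phrase(sum)(new lexical.Scanner(input))
}
```

Calling `SumParser.parseSum("1 + 2")` should succeed with a `Sum` of the two numbers, while input without the operator fails to parse.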
Or alternatively …
Use ~> and <~ to indicate the important parts of a parse rule
Let’s start with something simple based on the 2.6 combinators:
def bracketExpr = "(" ~ expr ~ ")" ^^ ((e : Expr) => e)
Again, in 2.7 we know that keyword() is no longer a UnitParser, so we have to deal with it like so:
def bracketExpr = "(" ~ expr ~ ")" ^^ ((l : String, e : Expr, r : String) => e)
Alright, this compiles and does what we expect. But why should we keep those strings around if we don’t need them? They sure do clutter up the code a whole bunch.
~> and <~ can be used to include or discard the result of a given parser. a ~> b yields a parser that runs both parsers and discards the result on the left; a <~ b does the same but discards the result on the right.
So, using these two operators we can rewrite bracketExpr as follows:
def bracketExpr = "(" ~> expr <~ ")" ^^ ((e : Expr) => e)
Ah. Much better. :)
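To see the two operators working together, here is a small self-contained sketch. As before, it assumes the standalone scala-parser-combinators module is on the classpath rather than the 2.7 library itself, and `Num`, `BracketParser` and `parseExpr` are hypothetical names I made up for the illustration.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical one-node AST, just enough to show the brackets vanish.
sealed trait Expr
case class Num(value: Int) extends Expr

object BracketParser extends StandardTokenParsers {
  lexical.delimiters ++= List("(", ")")

  def num: Parser[Expr] = numericLit ^^ (s => Num(s.toInt))

  def expr: Parser[Expr] = num | bracketExpr

  // ~> drops the "(" on its left and <~ drops the ")" on its right,
  // so only the inner Expr remains and no ^^ mapping is needed here.
  def bracketExpr: Parser[Expr] = "(" ~> expr <~ ")"

  def parseExpr(input: String): ParseResult[Expr] =
    phrase(expr)(new lexical.Scanner(input))
}
```

Because the brackets are discarded, nested input such as `((42))` collapses to the inner expression, and an unbalanced `(42` is rejected.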
The ^^^ operator
I had to go digging in the Scala source code to work out what exactly this one does.
First, let’s take a look at some 2.6 code:
def simpleExpr = term * ( "+" ^^ ((x : Expr, y : Expr) => Add(x, y)) | "-" ^^ ((x : Expr, y : Expr) => Sub(x, y)) )
In 2.6, this can parse a sequence of “term”s interleaved by “+” and “-” (check out part 1 if you need a refresher on how the * combinator works). In 2.7 it’s a compile-time error because keyword() is no longer a UnitParser (are you seeing a pattern here? :) ), and ^^ is trying to pass the resulting String on to the anonymous method in each case.
If we use the ^^^ operator here, we can effectively discard the result of the keyword parse and replace it with a fixed value: in this case, a simple anonymous method that the * combinator uses to combine the current pair of terms:
def simpleExpr = term * ( "+" ^^^ ((x : Expr, y : Expr) => Add(x, y)) | "-" ^^^ ((x : Expr, y : Expr) => Sub(x, y)) )
Exactly what we’re after. This compiles and behaves as expected.
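Here is a runnable sketch of that rule, again assuming the standalone scala-parser-combinators module rather than the 2.7 library. The AST classes and the `addOp` and `parseExpr` helpers are hypothetical names for this example. I pulled the operator alternatives out into `addOp` with an explicit type so the compiler does not have to infer a common function type for the two branches.

```scala
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical AST, standing in for the Expr types from part 1.
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Sub(left: Expr, right: Expr) extends Expr

object SimpleExprParser extends StandardTokenParsers {
  lexical.delimiters ++= List("+", "-")

  def term: Parser[Expr] = numericLit ^^ (s => Num(s.toInt))

  // ^^^ throws away the matched operator string and succeeds with the
  // given value instead: a function that combines two terms.
  def addOp: Parser[(Expr, Expr) => Expr] =
    "+" ^^^ ((x: Expr, y: Expr) => Add(x, y)) |
    "-" ^^^ ((x: Expr, y: Expr) => Sub(x, y))

  // term * addOp parses terms separated by operators and folds them
  // together from the left using the functions addOp produces.
  def simpleExpr: Parser[Expr] = term * addOp

  def parseExpr(input: String): ParseResult[Expr] =
    phrase(simpleExpr)(new lexical.Scanner(input))
}
```

Since the fold runs from the left, `1 + 2 - 3` comes out as a `Sub` whose left side is the `Add` of the first two terms.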
There may be more changes to the parser combinator library which I haven’t covered here, but I’m not going to go looking for any more changes since the updated code seems to work just fine. This should be enough to at least understand the updated code for the compiler described in part 1 without needing to deal with any cryptic compiler errors.
Finally, the new code!
Special thanks to Harshad for sending through working code for 2.7 ages ago which I never got around to posting here. This code, along with the Scala API docs, was used to figure out just what had changed since 2.6. The code below is derived from some code he sent to me a few months back.
I’m really sorry this has been so long in the making. I’ll try to get around to writing the “real” part two of this article. In the meantime, here’s the updated code. Thanks! Please post or email any comments or questions.