Treetop: Bringing the Elegance of Ruby to Syntactic Analysis

August 21, 2007 Pivotal Labs

(This is my proposal for this year’s RubyConf. Fingers crossed!)

Treetop is a parsing framework that brings the elegance and simplicity of Ruby to syntactic analysis. Rather than being just another copy of classic LALR/LR based generators like Lex and Yacc, Treetop blends the unique expressive power of Ruby with cutting edge parsing research. Its “packrat” implementation enables recognition of parsing expression grammars, which dispense with the need for lexical scanning and can take advantage Ruby’s mixin and inheritance model for composition.

I will begin with an overview of recursive descent parsing, explaining why the exponential time complexity of backtracking make naive implementations impractical. Packrat parsers solve this problem with memoization, exchanging heap space for linear time performance. In the process, they obviate the look-ahead techniques that have traditionally been used to reign in the time complexity of parsing.

This opens a new world for grammar definitions.

Without lookahead, a compositionality-destroying tokenization process is no longer necessary. Combined with the embrace of backtracking, this results in a grammar definition language that is more intuitive and free of the hidden obstacles that plague more restrictive conceptual frameworks. In the literature, these definitions are called parsing expression grammars. They are free of ambiguity and explicit operator precedence annotations. They can also be freely composed.

Imagine taking the Ruby grammar and the SQL grammar and combining them to yield a grammar for Ruby that recognizes the syntax of SQL statements embedded in strings. A token-free, compositional approach is the key to this sort of reusability.

The rest of the talk explores the manifestation of these expressive advantages in the Treetop framework. Treetop’s grammar definition language is a superset of Ruby, conservatively extending its syntax with two keywords (grammar and rule), along with inline parsing expressions. It blends parsing expression constructs with Ruby method declarations, which are automatically installed on nodes of the syntax tree produced during the parse. I will introduce these fundamental concepts by example, creating a toy parser that recognizes and evaluates arithmetic expressions. Here is a shortened sample:

module RubyConf
  grammar Arithmetic
    rule additive
      number '+' additive {
        def value
          number.value + additive.value

    rule number
      [1-9][0-9]* {
        def value

I will then discuss the system’s implementation. Important issues include:

  • The metagrammar and the system’s metacircular bootstrapping process
  • The code generator and the mapping of parsing expressions to Ruby code
  • Riding on Ruby’s semantics for grammar composition, including the use of mixins and the super keyword.

Time permitting, I will explore more advanced usage by writing a simple interpreter for the untyped lambda calculus atop the framework, which I will extend with facilities for arithmetic by mixing in the grammar from the first example.

About the Author


Ten Things I Hate about Proxy Objects, Part I
Ten Things I Hate about Proxy Objects, Part I

Has Many Through Has Many Through Has Many Through ... In which the author relates several things he hates...

How I Learned to Stop Hating and Love Action Mailer
How I Learned to Stop Hating and Love Action Mailer

My biggest gripe with ActionMailer is how difficult it is to generate URL's. It's common enough when sendin...


Subscribe to our Newsletter

Thank you!
Error - something went wrong!