npm install jison

Tinkering with new projects should be easy. You should be able to click a link or invoke a command and BAM, there goes your Sunday afternoon. I’m not sure how many people build parsers on the weekend, but hey, Jison is easy to install now anyway.

If you’re using Node.js you should already have the excellent npm installed. If not, that’s easy:

mkdir npm
cd npm
curl -L http://github.com/isaacs/npm/tarball/master | tar xz --strip 1
sudo make

Now install jison:

sudo npm install jison

Crazy easy. Now go read the docs for challenges.

Lambda Calculus Evaluator

After writing Jison, I began more experimentation with other facets of programming language design, namely interpreters. I was first exposed to their workings two years ago in a programming languages course at my university, where we wrote an interpreter for a minimal functional language using SML. As an experiment, I used the same technique to evaluate lambda calculus expressions.

Well, that was fun and all, but since I’m doing most of my language experimentation in JavaScript these days, I decided to port the lambda calculus evaluator to JavaScript (and put up a web interface). It uses call-by-value evaluation semantics, as did the SML original. I used Jison for parsing, naturally.

Sure, it’s Turing complete, but I doubt you would want to (or could) code your next blogging engine using it.
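
For anyone curious what call-by-value means in practice, here is a minimal sketch of an environment-based evaluator. It is not the code behind the web interface; the term shapes (`Var`, `Lam`, `App`) and the `evaluate` function are names made up for this illustration. The key point is that an application evaluates its argument before entering the function body.

```javascript
// Terms are plain objects: {type: 'var', name}, {type: 'lam', param, body},
// {type: 'app', fn, arg}. These shapes are assumptions for this sketch.
const Var = (name) => ({ type: 'var', name });
const Lam = (param, body) => ({ type: 'lam', param, body });
const App = (fn, arg) => ({ type: 'app', fn, arg });

// Evaluate a term in an environment (a Map from names to values).
// The only values are closures: a lambda paired with its defining environment.
function evaluate(term, env = new Map()) {
  switch (term.type) {
    case 'var': {
      if (!env.has(term.name)) throw new Error('unbound variable: ' + term.name);
      return env.get(term.name);
    }
    case 'lam':
      return { type: 'closure', param: term.param, body: term.body, env };
    case 'app': {
      const fn = evaluate(term.fn, env);
      // Call-by-value: the argument is fully evaluated before the call.
      const arg = evaluate(term.arg, env);
      const callEnv = new Map(fn.env);
      callEnv.set(fn.param, arg);
      return evaluate(fn.body, callEnv);
    }
    default:
      throw new Error('unknown term type: ' + term.type);
  }
}

// (\x. x) (\y. y) evaluates to the closure for \y. y
const identity = Lam('x', Var('x'));
const result = evaluate(App(identity, Lam('y', Var('y'))));
console.log(result.param); // prints "y"
```

Under call-by-name, the argument would instead be passed along unevaluated and only reduced if the body actually used it.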

Anyway, have fun!

Slides and code from “Build Your Own Programming Language with JavaScript”

I gave a Track B presentation this year at JSConf titled “Build Your Own Programming Language with JavaScript”. The conference was incredible, but I’ll post about that later, perhaps. I’ve pushed the code and slides from the talk to GitHub for people to play with.

CoffeeScript compiler now uses Jison

The new CoffeeScript self compiler uses Jison for generating the parser. The whole build script runs on Node.

Jeremy, the creator of CoffeeScript, spurred me to make some quick performance improvements (and bug fixes) to Jison so that the CoffeeScript grammar could build in a timely fashion. And while there are still many improvements to be made, handling such a large and complex grammar is quite a feat!

Jeremy also reported that parsing times are quick, though that’s more of a testament to V8, I’m sure.

Jison, now with more Bison flavor.

I pushed a new version of Jison (0.1.5) the other day that adds some interesting features, namely the ability to use Bison-like grammar definitions and Flex-like lexer definitions.

Bison flavored grammars

The following is a valid Bison grammar:

%token NUMBER

%%

E
    : E '+' T
    | T
    ;

T
    : NUMBER
    ;

Now it is also a valid Jison grammar. You can generate a parser using it as you would with the JSON format:

narwhal bin/jison examples/calculator.jison

This will generate just the parser, so you won’t be able to parse anything with it just yet (we’ll need a lexer too, which I detail later). And as you can see, the file uses a .jison extension to distinguish it, as the format is not fully compatible with Bison.

Jison will ignore %token declarations, as seen in the above grammar. It only recognizes %left, %right, %nonassoc, %start, and %prec declarations, which carry the same semantics as with Bison.
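
To make the associativity declarations concrete, here is a small sketch of how left and right associativity change the grouping of an expression. It uses precedence climbing, which is a different technique from the table-driven parsers Jison generates, and the `OPS` table and `group` function are invented for this example. In particular, `^` is marked right-associative here purely to show the contrast with the left-associative operators.

```javascript
// Operator table: higher `prec` binds tighter, like later %left/%right lines.
// This table and the function names are assumptions made for this sketch.
const OPS = {
  '+': { prec: 1, assoc: 'left' },
  '-': { prec: 1, assoc: 'left' },
  '*': { prec: 2, assoc: 'left' },
  '/': { prec: 2, assoc: 'left' },
  '^': { prec: 3, assoc: 'right' },
};

// Group a flat token list such as [8, '-', 3, '-', 2] into a parenthesized string.
function group(tokens) {
  let pos = 0;
  function parse(minPrec) {
    let left = String(tokens[pos++]);
    while (pos < tokens.length) {
      const op = tokens[pos];
      const info = OPS[op];
      if (!info || info.prec < minPrec) break;
      pos++;
      // Left-associative: the right side may only contain tighter operators.
      // Right-associative: the right side may contain the same operator again.
      const next = info.assoc === 'left' ? info.prec + 1 : info.prec;
      const right = parse(next);
      left = '(' + left + ' ' + op + ' ' + right + ')';
    }
    return left;
  }
  return parse(1);
}

console.log(group([8, '-', 3, '-', 2])); // prints "((8 - 3) - 2)"
console.log(group([2, '^', 3, '^', 2])); // prints "(2 ^ (3 ^ 2))"
console.log(group([1, '+', 2, '*', 3])); // prints "(1 + (2 * 3))"
```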

Semantic Actions

Semantic actions should look familiar:

/* description: Parses and executes mathematical expressions. */

%left '+' '-'
%left '*' '/'
%left '^'
%left UMINUS

%%

S
    : e EOF
        {print($1); return $1;}
    ;

e
    : e '+' e
        {$$ = $1+$3;}
    | e '-' e
        {$$ = $1-$3;}
    | e '*' e
        {$$ = $1*$3;}
    | e '/' e
        {$$ = $1/$3;}
    | e '^' e
        {$$ = Math.pow($1, $3);}
    | '-' e
        {$$ = -$2;} %prec UMINUS
    | '(' e ')'
        {$$ = $2;}
    | NUMBER
        {$$ = Number(yytext);}
    | E
        {$$ = Math.E;}
    | PI
        {$$ = Math.PI;}
    ;

This is valid for both Bison and Jison. The difference is that Jison does not attempt to match braces inside of actions like Bison does, so you’ll have to wrap your action in a different set of braces if they contain a closing brace, e.g.:

nonterminal
    : TOKEN
        {{$$ = '}';}}

In this case we wrapped double braces around the action so it knows the single brace is not the end of the action.
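
A rough sketch of why the extra braces help, assuming a scanner that simply ends an action at the first closing delimiter it sees. This is not Jison's actual code; `extractAction` is a hypothetical helper written only to show the effect.

```javascript
// Hypothetical scanner: returns the action body starting at `start`,
// ending at the first occurrence of the closing delimiter.
function extractAction(source, start) {
  const double = source.startsWith('{{', start);
  const open = double ? '{{' : '{';
  const close = double ? '}}' : '}';
  const bodyStart = start + open.length;
  const end = source.indexOf(close, bodyStart);
  if (end === -1) throw new Error('unterminated action');
  return source.slice(bodyStart, end);
}

// With single braces the brace inside the string literal ends the action early.
console.log(extractAction("{$$ = '}';}", 0));   // prints "$$ = '"
// With double braces the lone closing brace is just part of the body.
console.log(extractAction("{{$$ = '}';}}", 0)); // prints "$$ = '}';"
```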

The nicest thing about this format over JSON is that porting Bison grammars is much simpler, though you’ll still have to remove all of the C code and duplicate the logic in a separate JavaScript module. Jison also doesn’t yet handle semantic value data types (not really necessary with a dynamically typed language like JavaScript), mid-rule actions, or error recovery, among other things. Feel free to post an issue requesting your favorite missing feature.

A bit of an aside, but I found the process of bootstrapping this really interesting. You end up with a grammar that can parse itself.

Named Semantic Values

I decided to add some sugar for accessing semantic values within actions by name instead of position. Normally, you’d have to use the position of the corresponding nonterminal in the production, prefixed by a dollar sign $, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $2; }

Now, you can also access the value by using the name of the nonterminal instead of its position, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $exp; }

If the reference is ambiguous (the nonterminal appears more than once in the production), append a number to the end of the nonterminal name to pick the occurrence you want:

 exp:    ...
         | exp '+' exp
             { $$ = $exp1 + $exp2; }

Association by name leads to looser coupling (and is easier to grok).
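
One way to picture the sugar is as a rewrite from names back to positions. The sketch below is only an illustration of that idea, not how Jison implements it: `resolveNamedValues` is a hypothetical helper, it assumes symbol names contain only letters and underscores, and rejecting an unnumbered reference to a repeated symbol is a choice made for this sketch.

```javascript
// Rewrite named references ($exp, $exp1, $exp2) into positional ones ($2, $1, $3).
// `rhs` is the list of symbols on the right-hand side of a production.
// A hypothetical helper for illustration, not Jison's implementation.
function resolveNamedValues(rhs, action) {
  return action.replace(/\$([A-Za-z_]+)(\d*)/g, (match, name, suffix) => {
    // 1-based positions at which `name` occurs in the production.
    const positions = [];
    rhs.forEach((symbol, i) => {
      if (symbol === name) positions.push(i + 1);
    });
    if (positions.length === 0) return match; // not a symbol in this rule
    if (suffix === '' && positions.length > 1) {
      throw new Error('ambiguous reference ' + match + '; append a number');
    }
    const nth = suffix === '' ? 1 : Number(suffix);
    if (nth < 1 || nth > positions.length) {
      throw new Error('no occurrence ' + nth + ' of ' + name);
    }
    return '$' + positions[nth - 1];
  });
}

console.log(resolveNamedValues(["'('", 'exp', "')'"], '$$ = $exp;'));
// prints "$$ = $2;"
console.log(resolveNamedValues(['exp', "'+'", 'exp'], '$$ = $exp1 + $exp2;'));
// prints "$$ = $1 + $3;"
```

The looser coupling shows up when a production changes: inserting a new symbol at the front shifts every positional reference, while the named ones stay valid.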

Defining Lexers

Note that the new grammar format does not allow embedded lexer definitions. This is because there’s a new, separate format for defining lexers, which also serves to keep things more modular.

The format uses a more restricted regular expression syntax, but one that is no less expressive. The main difference is that strings that should match exactly must be wrapped in quotes. Here’s an example of the format:

%%
\s+                   {/* skip whitespace */}
[0-9]+("."[0-9]+)?\b  {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
"PI"                  {return 'PI';}
"E"                   {return 'E';}
<<EOF>>               {return 'EOF';}

You can also use macros:

D                     [0-9]

%%
\s+                   {/* skip whitespace */}
{D}+("."{D}+)?\b      {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
"PI"                  {return 'PI';}
"E"                   {return 'E';}
<<EOF>>               {return 'EOF';}

The format ultimately compiles to JavaScript regular expressions, so use the appropriate escape sequences and such. The file extension for this format is .jisonlex.
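
To give a feel for what "compiles to JavaScript regular expressions" could look like, here is a simplified sketch. It is not Jison's lexer generator: `compilePattern` and `tokenize` are hypothetical helpers, rules are given as plain pattern/token pairs instead of being read from a .jisonlex file, and the `<<EOF>>` rule is approximated by appending an EOF token at the end of input.

```javascript
// Escape characters that are special in JavaScript regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Translate one pattern from the lexer format into a JavaScript RegExp.
function compilePattern(pattern, macros) {
  const source = pattern
    // "text" matches exactly, so escape regex metacharacters inside quotes.
    .replace(/"([^"]*)"/g, (m, text) => escapeRegExp(text))
    // {NAME} expands to the macro definition, grouped to keep it atomic.
    .replace(/\{([A-Za-z_]\w*)\}/g, (m, name) =>
      Object.prototype.hasOwnProperty.call(macros, name)
        ? '(?:' + macros[name] + ')'
        : m);
  return new RegExp('^(?:' + source + ')');
}

// Split input into token names; rules are tried in order, first match wins.
function tokenize(input, rules, macros) {
  const compiled = rules.map(([pattern, token]) =>
    ({ regex: compilePattern(pattern, macros), token }));
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    const rule = compiled.find((r) => r.regex.test(rest));
    if (!rule) throw new Error('unrecognized text: ' + rest);
    const text = rule.regex.exec(rest)[0];
    if (text.length === 0) throw new Error('rule matched empty text');
    if (rule.token !== null) tokens.push(rule.token); // null means "skip"
    rest = rest.slice(text.length);
  }
  tokens.push('EOF'); // stands in for the <<EOF>> rule
  return tokens;
}

const macros = { D: '[0-9]' };
const rules = [
  ['\\s+', null],
  ['{D}+("."{D}+)?\\b', 'NUMBER'],
  ['"*"', '*'],
  ['"+"', '+'],
  ['"("', '('],
  ['")"', ')'],
  ['"PI"', 'PI'],
];

console.log(tokenize('2 * (PI + 1.5)', rules, macros).join(' '));
// prints "NUMBER * ( PI + NUMBER ) EOF"
```

Quoting is what lets "*" and "+" stand for themselves; unquoted, they would be read as regular expression operators.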

JSON-to-Jison

The update includes a utility for converting grammars from the old JSON format to the new Jison format. Just run your JSON file through json2jison to perform the conversion, e.g.:

narwhal bin/json2jison examples/basic.json

This will create a file called basic.jison in the current directory, which should be ready to run through Jison as-is.

If the JSON grammar has an embedded lex specification, json2jison will also create a .jisonlex file, though it will likely need correction (the conversion is lossy, to say the least). For example, running:

narwhal bin/json2jison examples/calculator.json

will create calculator.jison and calculator.jisonlex. Compare the generated calculator.jisonlex to the correct one found in examples/calculator.jisonlex to see the difference. Once the syntax of the lex specification is corrected, you can generate a complete parser like so:

narwhal bin/jison calculator.jison calculator.jisonlex

passing both the grammar and lexer specs.

Node Compatible

I should note that Jison is now compatible with Node.js when included using require from a module. The examples from the README should work on any CommonJS system that implements the Modules specification.

Tim Caswell used Jison as part of his CoffeeScript compiler that runs on Node.

In the future

There’s tons more to implement from Bison, and the grammar should also become more compatible over time, in terms of syntax and features that are supported by Jison.

I also have many topics to cover on using Jison, which I hope to write about more.

Until then, stay parsey, my friends.