shake, with fries

by Zach Carter

Posts tagged programming

Jan29

Jison, now with more Bison flavor.

javascript programming jison bison language parser lexer | comments

I pushed a new version of Jison (0.1.5) the other day that adds some interesting features, namely the ability to use Bison-like grammar definitions and Flex-like lexer definitions.

Bison

Bison flavored grammars

The following is a valid Bison grammar:

%token NUMBER

%%

E
    : E '+' T
    | T
    ;

T
    : NUMBER
    ;

Now it is also a valid Jison grammar. You can generate a parser using it as you would with the JSON format:

narwhal bin/jison examples/calculator.jison

This will generate just the parser, so you won't be able to parse anything with it just yet (we'll need a lexer too, which I detail later.) And as you can see, the file format ends with a .jison extension to distinguish it, as it is not fully compatible with Bison.

Jison will ignore %token declarations, as seen in the above grammar. It only recognizes %left, %right, %nonassoc, %start, and %prec declarations, which carry the same semantics as with Bison.

Semantic Actions

Semantic actions should look familiar:

/* description: Parses end executes mathematical expressions. */

%left '+' '-'
%left '*' '/'
%left '^'
%left UMINUS

%%

S
    : e EOF
        {print($1); return $1;}
    ;

e
    : e '+' e
        {$$ = $1+$3;}
    | e '-' e
        {$$ = $1-$3;}
    | e '*' e
        {$$ = $1*$3;}
    | e '/' e
        {$$ = $1/$3;}
    | e '^' e
        {$$ = Math.pow($1, $3);}
    | '-' e
        {$$ = -$2;} %prec UMINUS
    | '(' e ')'
        {$$ = $2;}
    | NUMBER
        {$$ = Number(yytext);}
    | E
        {$$ = Math.E;}
    | PI
        {$$ = Math.PI;}
    ;

This is valid for both Bison and Jison. The difference is that Jison does not attempt to match braces inside of actions like Bison does, so you'll have to wrap your action in a different set of braces if they contain a closing brace, e.g.:

nonterminal
    : TOKEN
        {{$$ = '}';}}

In this case we wrapped double braces around the action so it knows the single brace is not the end of the action.

The nicest thing about this format over JSON is that porting Bison grammars is much simpler, though you'll still have to remove all of the C code and duplicate the logic in a separate JavaScript module. Jison also doesn't handle semantic value data types (not really necessary with a dynamically typed language like JavaScript,) mid-rule actions and error recovery, yet, among other things. Feel free to post an issue requesting your favorite missing feature.

A bit of an aside, but I found the process of bootstrapping this really interesting. You end up with a grammar that can parse itself.

Named Semantic Values

I decided to add some sugar for accessing semantic values within actions by name instead of position. Normally, you'd have to use the position of the corresponding nonterminal in the production, prefixed by a dollar sign $, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $2; }

Now, you can also access the value by using the name of the nonterminal instead of its position, e.g.:

 exp:    ...
         | '(' exp ')'
             { $$ = $exp; }

If the rule is ambiguous (the nonterminal appears more than once,) append a number to the end of the nonterminal name to disambiguate the desired value:

 exp:    ...
         | exp '+' exp
             { $$ = $exp1 + $exp2; }

Association by name leads to a looser coupling (and is easier to grok.)

Defining Lexers

Note that the new grammar format does not allow embedded lexer definitions. This is because there's a new, separate format for defining lexers, which also serves to keep things more modular.

The format has a more restricted form of regular expression syntax, but no less expressive. The main difference is that strings that should match exactly must be wrapped in quotes. Here's an example of the format:

%%
\s+                   {/* skip whitespace */}
[0-9]+("."[0-9]+)?\b  {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
"PI"                  {return 'PI';}
"E"                   {return 'E';}
<<EOF>>               {return 'EOF';}

You can also use macros:

D                     [0-9]

%%
\s+                   {/* skip whitespace */}
{D}+("."{D}+)?\b      {return 'NUMBER';}
"*"                   {return '*';}
"/"                   {return '/';}
"-"                   {return '-';}
"+"                   {return '+';}
"^"                   {return '^';}
"("                   {return '(';}
")"                   {return ')';}
"PI"                  {return 'PI';}
"E"                   {return 'E';}
<<EOF>>               {return 'EOF';}

The format ultimately compiles to JavaScript regular expressions, so use the appropriate escape sequences and such. The file extension for this format is .jisonlex.

JSON-to-Jison

The update includes a utility for converting grammars from the old JSON format to the new Jison format. Just run your JSON file through json2jison to perform the conversion, e.g.:

narwhal bin/json2jison examples/basic.json

This will create a file called basic.jison in the current directory, which should be ready to run through Jison as-is.

If the JSON grammar has an embedded lex specification, json2jison will create a .jisonlex file also, though it will likely need correction (the conversion is lossy, to say the least.) E.g, running:

narwhal bin/json2jison examples/calculator.json

will create calculator.jison and calculator.jisonlex. Compare the generated calculator.jisonlex to the correct one found in examples/calculator.jisonlex to see the difference. Once the syntax of the lex specification is corrected, you can generate a complete parser like so:

narwhal bin/jison calculator.jison calculator.jisonlex

passing both the grammar and lexer specs.

Node Compatible

I should note that Jison is now compatible with node.js when included using require from a module. The examples from the README should work on any CommonJS system that implements the Module specification.

Tim Caswell used Jison as part of his CoffeeScript compiler that runs on Node.

In the future

There's tons more to implement from Bison, and the grammar should also become more compatible over time, in terms of syntax and features that are supported by Jison.

I also have many topics to cover on using Jison, which I hope to write about more.

Until then, stay parsey, my friends.

Continue reading »

Dec28

Jison: An API for creating parsers in JavaScript

javascript programming jison bison language parser lexer | comments

Bottom-up parsing

When you hear of someone writing a parser, it is usually of the top-down, recursive descent variety. Visualizing the source as an abstract syntax tree (AST), this means the parser tries to match the root, and by doing so, may recursively try to match branches and stems of the AST as it works its way down.

Bottom-up parsing is different:

Bottom-up parsing (also known as shift-reduce parsing) is a strategy for analyzing unknown data relationships that attempts to identify the most fundamental units first, and then to infer higher-order structures from them. It attempts to build trees upward toward the start symbol. It occurs in the analysis of both natural languages and computer languages. - Wikipedia

Because most bottom-up parsers require a parse table (a complex matrix of instructions for the parser PDA) they are usually not handwritten. Instead, a parser generator such as Yacc, Bison, ocamlyacc, etc are used to generate the parser in the target language. For Bison and Yacc, the language is C, for ocamlyacc, the language is, well, ocaml.

Jison

Jison generates bottom-up parsers in JavaScript. Its API is similar to Bison's, hence the name. It supports many of Bison's major features, plus some of its own. If you are new to parser generators such as Bison, and Context-free Grammars in general, a good introduction is found in the Bison manual. If you already know Bison, Jison should be easy to pickup.

A brief warning before proceeding: the API is ridiculously unstable right now. The goal is to mirror Bison where it makes sense, but we're not even there yet. Also, optimization has not been a main focus as of yet.

Briefly, Jison takes a JSON encoded grammar specification and outputs a JavaScript file capable of parsing the language described by that grammar specification. You can then use the generated script to parse inputs and accept, reject, or perform actions based on the input.

Installation

Prerequisite: To run Jison from the command line, you'll need to have Narwhal installed and available from your PATH.

Clone the github repository:

git clone git://github.com/zaach/jison.git

Using Jison from the command line

Now you're ready to generate some parsers:

cd jison
narwhal bin/jison examples/calculator.json

This will generate a calculator.js file in your current working directory. This file can be used to parse an input file, like so:

echo "2^32 / 1024" > testcalc
narwhal calculator.js testcalc

This will print out 4194304.

Using Jison from a CommonJS module

You can generate parsers programatically from JavaScript as well. Assuming Jison is in your commonjs environment's load path:

// mygenerator.js
var Parser = require("jison").Parser;

var grammar = {
    "lex": {
        "rules": [
           ["\\s+", "/* skip whitespace */"],
           ["[a-f0-9]+", "return 'HEX';"]
        ]
    },

    "bnf": {
        "hex_strings" :[ "hex_strings HEX",
                         "HEX" ]
    }
};

var parser = new Parser(grammar);

// generate source, ready to be written to disk
var parserSource = parser.generate();

// you can also use the parser directly from memory

// returns true
parser.parse("adfe34bc e82a");

// throws lexical error
parser.parse("adfe34bc zxg");

Using the generated parser

So, you have generated your parser through the command line or JavaScript API and have saved it to disk. Now it can be put to use.

As demonstrated before, the parser can be used from the command line:

narwhal calculator.js testcalc

Though, more ideally, the parser will be a dependency of another module. You can require it from another module like so:

// mymodule.js
var parser = require("./calculator").parser;

function exec (input) {
    return parser.parse(input);
}

var twenty = exec("4 * 5");

Or, more succinctly:

// mymodule.js
function exec (input) {
    return require("./calculator").parse(input);
}

var twenty = exec("4 * 5");

Specifying a language

The process of parsing a language involves two phases: lexical analysis (tokenizing) and parsing, which the Lex/Yacc and Flex/Bison combinations are famous for. Both lexing and parsing information can be included in the Jison grammar specification for a language.

For example, here is calculator.json:

{
    "comment": "Parses end executes mathematical expressions.",

    "lex": {
        "rules": [
           ["\\s+",                    "/* skip whitespace */"],
           ["[0-9]+(?:\\.[0-9]+)?\\b", "return 'NUMBER';"],
           ["\\*",                     "return '*';"],
           ["\\/",                     "return '/';"],
           ["-",                       "return '-';"],
           ["\\+",                     "return '+';"],
           ["\\^",                     "return '^';"],
           ["\\(",                     "return '(';"],
           ["\\)",                     "return ')';"],
           ["PI\\b",                   "return 'PI';"],
           ["E\\b",                    "return 'E';"],
           ["$",                       "return 'EOF';"]
        ]
    },

    "operators": [
        ["left", "+", "-"],
        ["left", "*", "/"],
        ["left", "^"],
        ["left", "UMINUS"]
    ],

    "bnf": {
        "S" :[[ "e EOF",   "print($1); return $1;"  ]],

        "e" :[[ "e + e",   "$$ = $1+$3;" ],
              [ "e - e",   "$$ = $1-$3;" ],
              [ "e * e",   "$$ = $1*$3;" ],
              [ "e / e",   "$$ = $1/$3;" ],
              [ "e ^ e",   "$$ = Math.pow($1, $3);" ],
              [ "- e",     "$$ = -$2;", {"prec": "UMINUS"} ],
              [ "( e )",   "$$ = $2;" ],
              [ "NUMBER",  "$$ = Number(yytext);" ],
              [ "E",       "$$ = Math.E;" ],
              [ "PI",      "$$ = Math.PI;" ]]
    }
}

This is all that is needed for Jison to generate a parser capable of recognizing the language and executing the semantic actions.

More examples can be found in the examples/ directory.

JSON isn't so bad, but a more author friendly format is in the works. It will be similar to Bison's .y files.

Sharing scope

In Bison, code is expected to be lexically defined within the scope of the semantic actions. E.g., chunks of code may be included in the generated parser source, which are available from semantic actions.

Jison is more modular. Instead of pulling code into the generated module, the generated module is expected to be required and used by other modules. This means that if you want to expose functionality to the semantic actions, you can't rely on it being available through lexical scoping. Instead, the parser has a yy property which is exposed to actions as the yy free variable. Any functionality attached to this property is available in both lexical and semantic actions through the yy free variable.

An example from orderly.js:

var parser = require("./orderly/parse").parser;

// set parser's shared scope
parser.yy = require("./orderly/scope");

// returns the JSON object
var parse = exports.parse = function (input) {
    return parser.parse(input);
};
...

The scope module contains logic for building data structures, which is used within the semantic actions.

Lexical Analysis

Jison includes a rather rudimentary lexer, though any module that supports the basic lexer API could be used in its place. Jison's lexer uses the lex key of the JSON grammar spec, where the rules for matching a token are defined along with the action to execute on a match. Usually, the action will return the token which is used by the Jison parser. A custom lexer could be used instead with it's own methods of tokenizing.

TODO: More on this.

Parsing algorithms

Like Bison, Jison can recognize languages described by LALR(1) grammars, though it also has modes for LR(0), SLR(1), and LR(1). It also has a special mode for generating LL(1) parse tables (requested by my professor,) and could be extended to generate a recursive descent parser for LL(k) languages in the future. But, for now, Jison is geared toward bottom-up parsing.

*LR(1) mode is currently not practical for use with anything other than toy grammars, but that is entirely a consequence of the algorithm used, and may change in the future.

Real world example

I wrote a parser for Orderly using Jison. The benefits I found were:

  • If modeled after the normative language grammar, it is guaranteed to recognize the correct language.
  • Adding new syntax is straight forward.
  • It was much faster to develop than if I were to attempt implementing a (top-down) parser from scratch. But for others not used to grammar specifications, this might not be the case.
  • Improvements to Jison mean you only have to regenerate your parsing module to see the benefits.

Pie-in-the-sky ideas for the future

  • Practical, minimal LR(1) parsing
  • GLR parsing, which can recognize ambiguous grammars and natural languages
Continue reading »

Nov23

Random Notes on Computability

compsci programming automata thinkingoutloud | comments

I felt like venting a bit. Let's take a brief stroll through computability land.

The difference is computability.

Computability is important to a programmer. It's fundamental to the act, even. Let's just say it's basically everything. Why waste cycles trying to compute the uncomputable! Or better yet, why introduce bugs trying to compute a problem using an insufficient computation model!

Computation models

You already know about these, even if you don't realize it. There are three we encounter frequently:

  1. DFAs
  2. PDAs
  3. Turing Machines

Deterministic Finite Automata

DFAs can be specified using regular expressions. Yes, a regular expression is just a convenient DSL for specifying a state machine. The regex engine in your favorite programming language adds on more power than a vanilla regular expression, but there are still some fundamental limitations to consider.

For instance, whenever you have to count, dynamically, you'll need something more powerful. You'll need some memory to go along with that state.

Say, you want to match all strings with the same number of leading 0s as trailing 1s. Informally specified as:

0n1n

Remember, n is dynamic, it depends on the string. Now, think of a simple regular expression to match that. No, DFAs are not powerful enough to recognize this pattern.

It also can't recognize HTML or XML, for this same reason. You can get away with a limited subset of HTML or XML, but not the full language.

For that, you need at least a PDA.

Push-Down Automata

A PDA is like a DFA, but it can track where it's been using a stack. The stack, along with the string input, is used to determine the next transition. A stack machine!

PDAs can be specified using a Context-free Grammer.

Okay, maybe you don't encounter these frequently. That's because a more powerful model is usually available.

Turing Machines

If you're using a programming language, one that is Turing complete, well, you're building Turing machines. A Turing Machine is an algorithm. A program is just a big (or little) algorithm. This is the most sophisticated level of computation you'll encounter.

Turing machines can be specified using, well, Turing-complete programming languages.

Compilers

The stages of a compiler require progressively higher levels of computation:

  • Lexer: DFA
  • Parser: PDA
  • Type Checker, Code Gen: Turing Machine

Of course, you'd only actually need a TM to implement all stages, but as good software engineering principles hold, less is more.

I'll work on writing more coherent blog posts in the future. Not saying that's my new years resolution though.

Jun21

Stop Using arguments.callee

programming javascript | comments

Please, it's no longer needed. It's also deprecated and uncool. I still see authors use it to recursively call an anonymous function when really, they should be using a named function.

E.g. instead of this (from MDC):

function makeFactorialFunc() {
   alert('making a factorial function!');
   return function(x) {
      if (x <= 1)
         return 1;
      return x * arguments.callee(x - 1);
   };
}

var result = makeFactorialFunc()(5); // returns 120 (5 * 4 * 3 * 2 * 1)

do this:

function makeFactorialFunc() {
   alert('making a factorial function!');
   return function innerFactorialFunc(x) {
      if (x <= 1)
         return 1;
      return x * innerFactorialFunc(x - 1);
   };
}

var result = makeFactorialFunc()(5); // returns 120 (5 * 4 * 3 * 2 * 1)

We give the anonymous function a name (innerFactorialFunc in this case) so it can be referenced in its function body. Makes sense.

arguments.callee is a remnant of a time when named functions couldn't be used this way. Please, let it die.

Edit: take heed, however, JScript has a number of bugs with named functions

May31

A Naive, JScript Compatible Getter Pattern

javascript programming jscript | comments

Modern JavaScript implementations let you define getters and setters for properties of objects. The new syntax involved will, of course, make your code incompatible with less capable JavaScript VMs if you chose to use it, but the behavior of the rest of your code will also be different, even when syntactically identical. This little experiment of mine was aimed to address the latter issue (well, at least for getters.)

The following is a naive implementation of getters that would be somewhat compatible with language supported getters in common usage (setters are just not possible!) Perhaps this technique could prove useful in some contexts but the reliance on type coercion makes it dubious...

Take, for example, the behavior of a rectangle:

var rect = new Rectangle(4,5);

rect.area; // returns 20

rect.width = 6;

rect.area; // returns 30

The area property updates dynamically, by way of a getter. With proper getters, you could implement the Rectangle class like so:

function Rectangle(width, height){
  this.width = width;
  this.height = height;

  this.__defineGetter__('area', function (){
    return this.width * this.height;
  });
}

Now let's see the naive, JScript compatible approach:

function Rectangle(width, height){
  this.width = width;
  this.height = height;

  var that = this;

  var area = function(){};
  area.valueOf = function(){
    return that.width * that.height;
  };

  this.area = area;
}

The key idea here is that area is a complex value that will be coerced into a primitive by most operations, calling its valueOf method. Let's see how well it works:

var rect = new Rectangle(4,5);

assert(typeof rect.area === 'function'); // it's really a function
assert(typeof +rect.area === 'number');  // but can be coerced into a number

assert(rect.area == 20);                 // type coercion
assert(rect.area !== 20);                // === and !== don't coerce type
assert(rect.area + 5 == 25);             // operator type coercion
assert(rect.area + 5 === 25);            // coercion happens before equality comparison

rect.width = 6;
assert(rect.area == 30);                 // area updates as expected

This is definitely a stretch of the intentions of the language, and as I mentioned, very dubious. But, if you could get away with it you would be able to utilize getter behavior on lesser JavaScript VMs without changing your API. Neat.

Edit: use valueOf, not toString. DUH

May21

A Matter of Philosophy (and necessity)

javascript programming oop dom jquery mootools | comments

The article comparing Mootools and jQuery is great -- go ahead and take a gander. The author definitely did his research. In fact, there are only a few nitpicks I had with it.

The author demonstrates well his awareness of the different philosophies of each project. The aim of jQuery is to provide a succinct API for dealing with browser shortcomings and inconsistencies, whereas Mootools aims to be a framework for building web applications. This means Mootools includes things such as OOP patterns, and jQuery does not (and I don't think that will change anytime soon.)

I happen to be a fan of small, modular pieces of functionality that can easily be plugged in when necessary, which is why the jQuery philosophy resonates more with me.*

Intermediate to Advanced Programmers

When the author mentions that intermediate to advanced JavaScript programmers would prefer Mootools, I believe he is speaking in the general sense, for there may also be cases where intermediate to advanced JavaScript programmers have their own methodologies to handle program structure -- whether with OOP or something else. In such cases, jQuery might be more appealing precisely because it lacks the extras Mootools provides.

The author understands this point, no doubt, but I missed a more direct articulation of it, as someone in the situation described above.

I can also see how jQuery would appeal to designers and beginners who haven't delt with, or had a need to deal with the complexities involved in larger application development. jQuery, with its succinct API, draws them in.

JavaScript Style

The author reiterates from time to time that Mootools has a more JavaScript-like style than jQuery. Well, that depends. JavaScript is a multi-paradigm language and depending on which paradigm you lean toward, that statement may prove true for you or it may not. If your style tends to be functional, you might find jQuery fits more snuggly alongside your braces and semicolons. I've gone through many stages as a JavaScript programmer, employing different styles, and I can't say that one style or the other is more JavaScript like. Both prototypal or functional styles feel comfortable to me, though functional style JavaScript is probably less understood or used in general.

The Great Debate - Monkey patching

And then there is the great debate over extending global objects. This can be a deal breaker for some.

Mootools extends global objects to achieve its power and elegance. jQuery uses monadic like wrappers to achieve its power and elegance.

My personal view is that frameworks and third party modules should do as little modification to the global namespace as possible, so as not to conflict with the main application or other modules. If the application developer wants to modify the global namespace, that is their business, but third party code should play nice with others.

Extending global namespaces certainly feels good, but is it more powerful? Not necessarily; you could just as easily pass an object to a function rather than attach the function as a method to the object. But it certainly doesn't look nearly as nifty.

If only there were a way to encapsulate changes to global objects to only within a certain scope...

That said, Mootools is intentionally exposing these language extensions as part of the framework it provides. If you choose Mootools then you don't mind those extensions, or would have done similar extensions your self. As long as you are mindful of the potential for conflict, trouble should be avoidable.

Okay, I think I've gone on long enough. Great article, Aaron.

*And yet, I like Rails. People are weird...

May15

RE: Null and Undefined as Objects in JavaScript

javascript programming | comments

I regret not making everything an object, FWIW. "Make it look like Java" + lazy/fast machine types for numbers + zero time to implement in May 1995.
- Brendan Eich (creator of JavaScript)

Wouldn't that have been something.

Mar17

Sweet, sweet Reia

programming ruby reia erlang | comments

Reia is sweet. At first glance, it looks like Ruby with parenthesis and pattern matching. Pattern matching! From my brief stints with SML (and Erlang,) I've longed to take pattern matching back with me to every language I use.

At second glance -- well, I'm still in the midst of taking a second glance, actually. But, here, some encouraging words:

Reia is its own language and is not a fork of Ruby. The syntax is heavily inspired by Ruby, however past the cosmetic level things get very, very different.

Reia runs on the Erlang VM and attempts to use the object system itself as the concurrency mechanism. All objects in Reia execute concurrently and synchronize with messaging.
- Tony Arcieri (Reia developer)

Concurrent objects? Huzzah! Jörg W Mittag, please tell us more:

Talk about coming full circle ...

In Smalltalk-71, Alan Kay experimented with having each object running in its own process and communicating with asynchronous message sends. There is an older quote from Alan Kay, where he says that one thing he very much regrets, is not emphasizing the message sending aspect of object-orientation enough. And just recently, he said that not running objects in parallel, was one of the biggest mistakes of Smalltalk.

...

Smalltalk-71 was what inspired Carl Hewitt to invent the Actor Model. ... Now, Ruby, a very faithful implementation of the Smalltalk object model, and Erlang, a very faithful implementation of the Actor Model, combine and produce Reia, an object-oriented parallel actor language, in which the concepts of process == actor == object have been unified, just like in Smalltalk-71.

Interesting.

I already had Erlang on my list of languages to learn, as the functional style woos (if you're into that sort of thing). Still, Reia looks comfortable and fun. I'll have to finish my current glance and perhaps take a few more at it...

Feb25

Don't Push Your OOP On Me

javascript programming oop | comments

Implementations of classical inheritance in JavaScript always leave a bad taste on my tongue. John Resig articulates why:

I'm not convinced by this argument. "Because it's what we're familiar with" does not imply that it's the best solution - or even a good solution - neither does "everyone else is doing it." I've been doing JavaScript development for a while now and I'm becoming less-and-less convinced that the traditional MVC/Classical inheritance styles of development (carried over from Java or Ruby, for example) do not translate to JavaScript/DOM well and can, in fact, be improved upon in some dramatic ways.

That said, I was working on a way to achieve multiple inheritance in JavaScript via mixins - but in a traditional prototypal manner! It's cool man, really!

Feb22

An Object That's Not

javascript programming | comments

In JavaScript, objects just love their prototypes. And why wouldn't they? Whenever they are in need of a property, hey, they can always check up the inheritance chain and try to find it. It's good to know someone always has your back!

But, what if no one had your back?

$ js
js> var obj = {};
js> obj.toString();
[object Object]
js> obj.__proto__ = null;
null
js> obj.toString();
typein:4: TypeError: obj.toString is not a function
js>

hwaaa!

js> obj+obj
typein:5: TypeError: can't convert obj to primitive type
js> typeof obj.__proto__
undefined
js>

Poor obj has no prototype to inherit from anymore - we set it to null. It knows nothing of the built in wonders inherited from Object.prototype. No toString, no hasOwnProprty, no nothing.

There's even a way to wipe out Object.prototype for everyone (I'll leave this as an exercise for the reader.)

Direct access to the inherited prototype can sure do some wacky things, indeed. I'll post on the more useful functionalities I've found in a bit.

Jan31

The Miller Device on null and Other Lowly Un-values

javascript programming | comments

null and undefined are weird. Normally, you would check for them directly by using x === null or typeof x === undefined, etc, but what if we use the Miller Device?

An example of usage:

var myArray = [];

if(Object.prototype.toString.apply(myArray) === '[object Array]') {
  // do array operations
}

Every browser gives consistent results for JavaScript's other top level objects:

object = [object Object]
array = [object Array]
string = [object String]
number = [object Number]
NaN = [object Number]
Infinity = [object Number]
function = [object Function]
boolean = [object Boolean]
date = [object Date]
regex = [object RegExp]
error = [object Error]

Testing for null and undefined is a bit more interesting:

Firefox, Opera:

null = [object Window]
undefined = [object Window]

IE 6-8:

null = [object Object]
undefined = [object Object]

Safari:

null = [object DOMWindow]
undefined = [object DOMWindow]

Chrome/V8:

null = [object builtins]
undefined = [object builtins]

Spidermonkey console:

null = [object global]
undefined = [object global]

In each case besides IE and Chrome, the string representation of the global object is being returned. IE returns the uninteresting [object Object] string, and Chrome hints again at why it should be distinguished from Safari, at least when testing JavaScript, by returning [object builtins].

If you were to use The Miller Device to detect null or undefined, you would simply use it on both the left and right sides of the comparison:

if( Object.prototype.toString.apply( MyObject ) ===
    Object.prototype.toString.apply( null ) ) {
  // FAIL
}

Jan31

jsUnity

javascript testing programming | comments

I was looking for a lightwieght, context-agnostic testing framework for JavaScript so I could describe the behavior of whatever script I was writing (and found one, thanks to Ates making it.) The particular script I was working on had no need for a browser or any of the extra environmental cruft - it was pure JavaScript, ready to be plugged in wherever it may go. Or perhaps it will never leave the comfort of a Spidermonkey shell, the point is I just wanted a quick run-down of what it should do, and not have to launch a browser just to see.

JavaScript decoupled from HTML? It is rare indeed. Even with server-side JavaScript you could still be dealing with a document.

Enter jsUnity (nice pun, by the way,) which lets you plug your test suite into any environment. You just supply a custom jsUnity.log function for logging to that environment.

jsUnity.log = function (s) { document.write(s + "</br>"); };

This rubs me the right way. This concept could be extended to allow other customizations for specific environments. The core test suite can stay light while modules are built around it.

The globally scoped assertions kind of bug me, but seem to be modeled after JsUnit style tests. Ideally, those assertions could reside in a module, making them as easy to use and safe to extend.

It's a start, a nod in the right direction. Plus, there's always github for forking.

Hey, JavaScript and HTML are a couple for now, but the way HTML has been flirting around, you never know... JavaScript needs to keep it's figure right and assets in order - mmhm.

Jan28

Odd: Using prototype as a Singleton

javascript programming | comments

I was perusing the JsUnit code examples and noticed how they used the object's prototype as a singleton:

var result = TextTestRunner.prototype.main( args );
JsUtil.prototype.quit( result );

Code like this appears throughout the page!

This strikes me with much fear and confusion. Surely, for use as a singleton, any arbitrary property name would be more appropriate than prototype, the home of instance method and property definitions. Why use a secondary namespace instead of the object itself, anyway?

The code base is old, so maybe conventions were not as strong then (though this seems largely fundamental,) or something was lost in translation.

In any case, I'm still looking for a good testing framework to use on the console, or in arbitrary contexts. Edit: found one.

Jan19

C File Descriptors: A Novice Dialogue

c tutorial programming dialogue | comments

In the Operating Systems class at my university, one of the projects we have to complete is a Unix shell. It's a basic shell which should support piping, redirection, inverse redirection, and background processes. Needless to say, this was an immensely interesting and useful project, though sometimes difficult for the unassuming undergrad.

I had completed it in a previous semester, but a friend called on my guidance last semester for implementing pipes and redirection. She contacted me via IM:

Qi: hey Zact
  Zach
 me: zact??
 Qi: lol
  sorry
 me: ha, np

What follows is my attempt at explaining file descriptors, forks, and redirection, without much formal knowledge and not having worked with C in a year. Dijkstra would certainly not approve of such malling of a radical novelty, but my friend was grateful nonetheless and the dialogue does indeed prove to be useful in achieving a basic understanding of the premises of the project (namely, forks and file descriptors.)

Read at your own peril.

To follow along, also have the tutorial code handy.

All of this trouble was not in vain, however:

Qi: THANKS MUCH MUCH MUCH
  i owe you one
  after exams are done
  i'll cook some noodles
  ;)
 me: !!!!!!!!!

I would have been fine with naught, for I gained much from this as well, if not more.

Jan11

Removing duplicate characters

ruby programming tips | comments

An interesting string method was brought to my attention on the ruby-talk mailing list: string.squeeze. It removes duplicate characters in a string, and performs better than regular expressions or splitting-joining, to boot. Craig posted some benchmarks:

require 'benchmark'

n = 1_000_000

Benchmark.bm(10) do |x|
 x.report("gsub") { n.times do
    "This     is  a test   string.".gsub(/ +/, ' ')
  end
 }
 x.report("gsub!") { n.times do
    "This     is  a test   string.".gsub!(/ +/, ' ')
  end
 }
 x.report("split.join") { n.times do
    "This     is  a test   string.".split.join(' ')
  end
 }
 x.report("squeeze.strip") { n.times do
    "This     is  a test   string.".squeeze(' ').strip
  end
 }
end

               user     system      total        real
gsub        4.470000   0.040000   4.510000 (  4.578173)
gsub!       4.390000   0.020000   4.410000 (  4.468695)
split.join  3.500000   0.020000   3.520000 (  3.556669)
squeeze.strip  1.980000   0.010000   1.990000 (  2.003622)

Nice!

Dec29

Committing to a better editor

vim programming | comments

Yeah, I use vim. You know, for those trivial edits or during remote sessions. Nevermore! I finally got tired of selling myself short. This past weekend I committed myself to use vim (mostly gvim) for any and all of my daily coding. This is something that was long overdue, and after getting real friendly with vim this weekend, I wish I would have done it much sooner. ...

Once I got comfortable with the basics by going through vimtutor, I started configuring my .vimrc file with some convenient mappings. I have a habit of pressing Crtl-S constantly while coding, so I mapped that in vim to save by putting this in my .vimrc file:

map <C-S> <Esc>:w<CR>
map! <C-S> <Esc>:w<CR>

Sweet! I then googled around a bit and picked out additional scripts and mappings that looked useful (most of them found on Vim Tip Wiki.) My .vimrc file is here.

Then it was on to plugins. This blog post by Jamis had a bunch of useful tips. I don't even need a project browser pane thanks to FuzzyFinderTextMate. I just launch gvim from the root of my project (I made a little convenience script to make this even easier) and every file is a few keystrokes away. I should note I used vim 7's new graphical tab feature the first day, but soon found buffers to be quicker and more capable/compatible with plugins, though tabs were a good way to ease into things.

I also love the snippetEmu plugin (the snippets for JavaScript were severly lacking, so I added some.)

I put my whole configuration directory including my .vimrc file on github so I can pull it down into whatever work environment I'm in. Feel free to peruse.

If you are still using Notepad++, Gedit, or some other not-bad-but-not-terribly-awesome editor, commit to something better! That is, if you really are serious about this. This is serious business, darn it!

Yeah, I know, it's awkward. Similar to learning Linux/git/anything worthwhile, it can be awkward at first, but the payoffs can be astronomical. That was the case for me after installing Ubuntu about, eh, two years ago. If I've gained this much from vim in just a weekend, I can't imagine how my work flow will be in two years.

Times are tough, invest in your future!