Wednesday, October 7, 2009

Details

Here's how I'm handling environments and variables in my version of MIT-Scheme. You can skip this post if you don't care about implementation details.

If you remember your beginning Scheme course, you know that an environment consists of a series of ‘frames’ where each frame contains a table of variable bindings and a pointer to the parent frame. When you look up a variable, you search for the binding starting at the nearest frame and work your way up to the root frame (the global environment, or some top-level environment). At least that's the model. Reality is a bit more complicated.

When you apply a closure to some arguments, you create a new environment frame that contains the bindings of the lambda-expression associated with the closure. The frame for a standard environment has a pointer to the closure that was invoked and a vector of value cells for the arguments.
class StandardEnvironmentFrame
{
    StandardClosure closure;
    ValueCell [] bindings;
}

class StandardClosure
{
    StandardLambda lambdaExpression;
    StandardEnvironmentFrame environment;
}

class StandardLambda
{
    Symbol [] parameterList;
    SCode body;
}
Notice the StandardEnvironmentFrame is missing both the table of bindings and the pointer to the parent frame. You can get the parent frame by dereferencing the closure and dereferencing its frame. We're only storing the binding cells in the environment, but you can figure out what names are associated with them by looking at the parameter list of the lambda expression (again, via the closure). Why the roundabout way? Two reasons. It is parsimonious in storage, and there is a legacy reason. Suppose you are evaluating a function call expression. If you are using a stack machine, the natural strategy is to evaluate each subexpression in turn and push the value on the stack. If you do this in the right order, you'll have the closure and the argument values in a contiguous chunk of stack. This is (almost) exactly the layout of the environment frame.

Variable lookup is simple, but tedious. First we scan the parameter list of the lambda. If the variable name is found, the offset in the lambda list will match the offset in the bindings vector, so we just go fetch it. Otherwise, we search the next frame. This is almost the correct code:
object DeepSearch (Symbol name)
{
    int offset = this.envClosure.FormalOffset (name);
    if (offset == -1) {
        return this.envClosure.Environment.DeepSearch (name);
    }
    return this.bindings [offset];
}
It's not quite correct because we didn't take incremental definitions into account. The environment of the closure could be a first-class environment. If this is the case, the user could evaluate a new definition in the environment. The StandardFrame needs a place to store these extra definitions.
class StandardEnvironmentFrame
{
    StandardClosure closure;
    ValueCell [] bindings;
    Dictionary  incrementalDefinitions;
}
And we have to modify DeepSearch to check the incrementals.
object DeepSearch (Symbol name)
{
    int offset = this.envClosure.FormalOffset (name);
    if (offset == -1) {
       if (this.incrementalDefinitions.ContainsKey(name))
           return this.incrementalDefinitions[name];
       else
           return this.envClosure.Environment.DeepSearch (name);
    }
    return this.bindings [offset];
}
As I've mentioned before, the speed of variable lookup is one of the most important factors that determines interpreter performance. The performance of the naive code above is pretty bad. In debugging mode, I sample the top of stack every few microseconds and record what the interpreter is doing. The number one item on the top of the stack is variable lookup.

At this point it is worth running a couple of benchmarks to establish baseline performance.

1 comment:

kbob said...

Incremental definitions. Just to be clear, you mean symbols defined inside a lambda?

(lambda (a b)
(define c ...)
...)

c is an incremental definition?