Sunday, October 11, 2009

To make a short story a bit longer

Optimizing argument lookup helps quite a bit, but looking up the non-argument variables still takes a lot of time. We waste time by walking the environment chain and scanning the lambda arguments of each frame. There are a couple of ways of speeding this up. The first thing is to notice that if there are no incrementals in any of the frames in the lookup path, then the location of a lexical variable is fixed relative to the current environment. You can use a lexical address of the form ‘n frames back, offset within that frame’. This gives you a lot of bang for the buck, and is what the MIT-Scheme interpreter used to do. (Now that they have a compiler, they just punt and deep search each time.) jrmscheme did the same thing, but despite the optimizations, it simply takes time to walk up the environment chain. And time isn't the only problem; the storage requirements are an issue, too. The deeper frames hold bindings that are no longer in use, and these are retained along with the bindings of interest. I decided to change to a ‘flat’ environment implementation.
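To make the lexical-address idea concrete, here is a minimal sketch in Scheme. The representation is my own choice for brevity (a chained environment as a list of frames, each frame a vector of values), not the one jrmscheme actually uses, but it shows why lookup becomes a fixed walk followed by an indexed fetch:

;; A lexical address is ‘frames back’ plus ‘offset within that frame’.
(define (lexical-lookup environment frames-back offset)
  (if (zero? frames-back)
      (vector-ref (car environment) offset)
      (lexical-lookup (cdr environment) (- frames-back 1) offset)))

;; Example: a variable known to live two frames back, in slot 1:
;; (lexical-lookup environment 2 1)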

A flat environment does not have a chain of frames. It has a single frame with two components: the binding vector that is common to all environment structures, and a vector of pointers to the lexically visible binding cells. The lexically visible cells are indirect because the bindings may be shared among several closures. For example, look at this code:
(let ((counter 0) 
      (increment 1))
  ...
  (lambda () (set! counter (+ counter increment)))
  (lambda () counter)
  ...)
We have two lambda expressions that must be closed over the same variable, counter. In a chained environment model, invoking either closure would extend the common shared environment where the counter and increment variables were bound. In a flat environment model, invoking the second closure would lead to an environment structure like this:
bindings: #()  ;; empty argument bindings
lexical: #(<pointer to counter>)
while invoking the first closure would lead to this:
bindings: #() ;; empty argument bindings
lexical: #(<pointer to counter> <pointer to increment>)
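Here is a sketch of how lookup goes in such a structure. The representation (a pair of vectors, with each shared binding in a one-element ‘cell’ vector) is mine, chosen for brevity, but it shows the essential point: a lexical reference is one vector-ref plus one indirection, no matter how deeply the original code was nested:

;; A cell holds one binding so that several closures can share (and
;; mutate) it.
(define (make-cell value)      (vector value))
(define (cell-ref cell)        (vector-ref cell 0))
(define (cell-set! cell value) (vector-set! cell 0 value))

;; A flat environment: a vector of argument bindings and a vector of
;; pointers to the lexically visible cells.
(define (make-flat-environment bindings lexical) (cons bindings lexical))
(define (argument-ref environment offset) (vector-ref (car environment) offset))
(define (lexical-ref environment offset)  (cell-ref (vector-ref (cdr environment) offset)))

;; Invoking the second closure above would build, roughly,
;;   (make-flat-environment (vector) (vector <cell for counter>))
;; and a reference to counter is just (lexical-ref environment 0).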
This is fairly straightforward in principle, but the devil is in the details. It took me a fair amount of time to fiddle around with the code and come up with something that worked.

The first step was to add a new phase to the interpreter. A partial evaluation walk is done on the code before calling the actual eval. In the partial evaluation phase, the code tree is rebuilt and certain nodes in the tree are replaced. A partial environment is kept as the code is walked. When a variable is encountered, it is looked up in the partial environment. If it is found in the immediately enclosing frame, it is turned into an Argument variable with the appropriate offset. If it is not found at all, it is turned into a FreeVariable. If it is found in an intermediate frame, things get tricky. We have to consider the lambda expressions that are associated with the intervening frames and decide how the variable will be accessed.
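Sketched in Scheme, with the partial environment simplified to a list of frames (each frame a list of symbols) and tagged lists standing in for the rebuilt node objects, the classification looks roughly like this:

;; Classify a variable reference against a partial environment.
;; Returns (argument <offset>) if bound in the innermost frame,
;; (free) if bound nowhere, and (lexical <depth> <offset>) otherwise,
;; which is the tricky case discussed above.
(define (classify-variable name partial-environment)
  (let loop ((frames partial-environment) (depth 0))
    (cond ((null? frames) '(free))
          ((position name (car frames))
           => (lambda (offset)
                (if (zero? depth)
                    (list 'argument offset)
                    (list 'lexical depth offset))))
          (else (loop (cdr frames) (+ depth 1))))))

(define (position item items)
  (let loop ((items items) (index 0))
    (cond ((null? items) #f)
          ((eq? item (car items)) index)
          (else (loop (cdr items) (+ index 1))))))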

When we partially evaluate a lambda expression, we first have to check whether we'll need a first-class environment for it. Fortunately, this is an easy test: we just have to walk the body of the expression and look for a use of the the-environment special form. If we need a first-class environment, then we simply punt on any optimization other than Argument variables, because we'll just do a deep search every time we access a lexical variable. On the other hand, if we don't need a first-class environment, we construct a PartialStaticClosure for the lambda expression. When we partially evaluate the PartialStaticClosure, we construct a PartialStaticEnvironment that we use for partially evaluating the body of the lambda expression.
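The test can be a simple recursive walk over the body. Here is a sketch over raw s-expressions; the real interpreter walks its rebuilt node objects, and a production version would also skip quoted data, so treat this as illustrative only:

;; Does this expression, or any subexpression, mention the
;; the-environment special form?  If so, the enclosing lambda needs a
;; first-class environment and we fall back to deep search.
(define (needs-first-class-environment? expression)
  (and (pair? expression)
       (or (eq? (car expression) 'the-environment)
           (needs-first-class-environment? (car expression))
           (needs-first-class-environment? (cdr expression)))))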

The entire process of partial evaluation is very similar to that of ‘normal’ evaluation, but there are a couple of differences. Partial evaluation produces a PartialResult that contains the rebuilt code tree. Conditionals are partially evaluated along both branches and the PartialResults are combined into a new conditional node. Partial closures are partially evaluated right away (instead of being returned) as if they had been applied to arguments, but of course the resulting PartialEnvironment doesn't contain runtime values; it only contains binding information.
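For instance, the handling of a conditional might look roughly like the sketch below. The names and the list-based stand-ins for PartialResult and the rebuilt node are mine, not lifted from the actual code; the point is only that both arms get walked and the results are reassembled into a new node:

;; A PartialResult wraps the rebuilt code for a subexpression.
(define (make-partial-result code) (list 'partial-result code))
(define (partial-result-code result) (cadr result))

;; Both arms are partially evaluated; neither branch is ‘taken’,
;; because no runtime values exist yet.
(define (partial-eval-conditional partial-eval predicate consequent alternative environment)
  (make-partial-result
   (list 'conditional
         (partial-result-code (partial-eval predicate environment))
         (partial-result-code (partial-eval consequent environment))
         (partial-result-code (partial-eval alternative environment)))))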

A PartialClosure leads to the construction of a PartialEnvironment. Variable lookup in a PartialEnvironment either returns the argument offset or an indication that a deep search is necessary. The rebuilt lambda expression becomes a StandardLambda; when it is actually evaluated, a StandardClosure is built. When the StandardClosure is applied to arguments, the bindings are placed in a StandardEnvironment. Variable lookup in a StandardEnvironment uses deep search, and in this way we retain the old chained environment model for first-class environments when necessary.
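Deep search itself is the familiar scan-and-climb. A sketch, again with my own frame representation (each frame a pair of a name vector and a value vector, the environment a list of frames):

;; Deep search: scan each frame's names for the variable, moving
;; outward through the chain until it is found.
(define (deep-search environment name)
  (if (null? environment)
      (error "Unbound variable:" name)
      (let* ((frame (car environment))
             (names (car frame))
             (values (cdr frame))
             (count (vector-length names)))
        (let scan ((index 0))
          (cond ((= index count) (deep-search (cdr environment) name))
                ((eq? (vector-ref names index) name) (vector-ref values index))
                (else (scan (+ index 1))))))))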

But if we don't need a first-class environment, we do things a bit differently. The lambda expression becomes a StaticLambda (the word static in this case meaning that the locations of the bindings never change). When a StaticLambda is partially evaluated, a PartialStaticClosure is created. This leads to the construction of a PartialStaticEnvironment, where variable lookup happens differently. At regular eval time, a StaticLambda creates a StaticClosure, and when a StaticClosure is applied, it creates a StaticEnvironment.
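Roughly, and again with my own stand-ins rather than the actual classes: evaluating a StaticLambda copies the cell pointers its body needs out of the current flat environment into the new StaticClosure, and applying that closure pairs fresh argument bindings with the captured cells, giving the flat StaticEnvironment shown earlier. (The sketch glosses over how a binding in the current frame gets turned into a shared cell in the first place, which is one of the fiddly details.)

;; Evaluating a StaticLambda: grab the cells the body will need from
;; the current flat environment (using the representation from the
;; earlier sketch).
(define (close-static-lambda body wanted-offsets environment)
  (vector 'static-closure
          body
          (list->vector
           (map (lambda (offset) (vector-ref (cdr environment) offset))
                wanted-offsets))))

;; Applying a StaticClosure: fresh argument bindings plus the
;; captured cells make up the new flat environment.
(define (apply-static-closure closure arguments)
  (cons (list->vector arguments) (vector-ref closure 2)))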

Alas, I've arrived at work. More details later...

1 comment:

kbob said...

In your profiling, have you measured the distribution of frame sizes? It would be interesting to see how small the average frame is and how common much larger frames are.

As always, thanks for the update.