Thursday, April 8, 2010

At this point, we can save and restore primitive objects, but restoring is somewhat painful because we have to remember the address that the durable storage layer returned to us. We need a way to associate an identifier of our own choosing with the durable address. We'll define a structure called the object-map that is basically a big lookup table that maps logical object ids to the addresses that the durable storage layer uses.

The object-map will be used a lot, so it is important to come up with an implementation that has decent performance. The object-map will also have to be stored durably, so it had better not get unwieldy when it is full of objects. The choice of implementation affects the performance of the store as a whole, so there are dozens of variations on several different themes. I'll discuss just a couple.

Perhaps the simplest and most obvious implementation would be a simple vector of durable addresses. Object N's address would be stored in the Nth entry in the vector. Naturally, this is very fast. Unfortunately, every time we make a logical change to the table, we have to write the entire table back out to the durable storage. (Our durable storage layer has no mechanism for `update in place'. This is by design, but the advantages are not apparent yet. One particular disadvantage is quite obvious now.) If object IDs are permitted to be removed from the table, the table could become sparse. In that case, a simple vector would be a tremendous waste of time.

Another possible implementation would be to represent the object-map as an association list. This would greatly improve the performance of persistent object allocation because only the new association list entry would have to be written to durable storage. Lookup now becomes linear, however. Removing an object from the association list would require writing back every entry between the object and the most recently allocated object.

Another possibility is using a vector as before, but rather than saving the entire vector just save the changes to the table. This technique degenerates into the association list case if the table is never written back.

A popular implementation technique is to use a tree of some sort. A binary tree has the advantage of logarithmic lookup time and, if done cleverly, log time and space insertion and deletion. A tree with a larger branching factor would be faster on lookup, although it would require larger update writes. If the durable storage is structured in physical blocks, it can greatly improve performance if a tree-structured object map uses nodes that correspond to the physical block size.

Regardless of the underlying implementation technique, the API is simple:

object-map/add object-map object-id object
  => new object map

Returns a new object-map that contains all the mappings of the
argument object-map in addition to a new mapping from
object-id to object.  If object-id had a
previous mapping, that mapping is not accessible in the result. 

Object will be serialized into the durable store associated with the
object-map.  The object-map itself will also be serialized into the
store.


object-map/address object-map
  => a durable address

Returns the durable address at which the object-map was serialized.


object-map/create durable-store
  => new object map with no entries

Argument durable-store is forever associated with this object
map.


object-map/durable-store object-map
  => a durable store

Returns the durable-store is associated with this object map.


object-map/lookup object-map object-id
  => an object previously added
     Error if object-id was never added.

Returns the deserialized object associated with object-id.
The object is deserialized from the durable-store associated with the
object-map.


object-map/recover durable-store address
  => object-map or
     Error if address does not conain an object map.

Deserialize an object-map from a durable store at the given address.


;; Optional API items.
object-map/lookup-address object-map object-id
  => the durable address of an object previously added
     Error if object-id was never added.

Returns the durable-address of the serialized object associated with
object-id.  This function will no doubt exist in any
implementation, but the question is whether to export it.


object-map/remove object-map object-id
  => new object map

Creates a new object map that is like the argument
object-map, but does not contain a mapping for
object-id.

Exercise 3: Implement the object-map API on top of the durable storage and serialization APIs.


As an aside. This random excursion into persistent storage has nothing to do with the “Professor and Students A, B, and C” thread. I'm trying to think of the right was to continue that thread and started this unrelated thread as an amusement.

1 comment:

tonyg said...

I'm with you so far.

Had you noticed the similarity between your development and the principles underlying Git? Specifically, blobs and trees, so far. I look forward to seeing what analogues to refs and commits you come up with :-)