Tuesday, August 30, 2011

Measuring GC times

MIT/GNU Scheme has a command-line parameter, --heap, that allows one to specify the size of the heap in blocks of 1024 words. On a 32-bit machine, the heap size is limited because MIT/GNU Scheme stores the type tag in the upper bits of each word. On a 64-bit machine the words are wider, so it is possible to specify extremely large heaps without worrying that an address will overflow into the type tag.

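For concreteness, each --heap unit is a block of 1024 machine words, so on a 64-bit build a block is 8 KiB, and an invocation like mit-scheme --heap 131072 asks for roughly a 1 GiB heap. Here is a minimal sketch of that size arithmetic in Scheme (the 8-byte word is an assumption about the 64-bit build):

;; Size arithmetic behind the MiB/GiB annotations in the table below.
;; Assumption: 8-byte machine words, as on a 64-bit build.
(define bytes-per-word 8)

(define (heap-blocks->bytes blocks)
  (* blocks 1024 bytes-per-word))

(heap-blocks->bytes 8192)    ; => 67108864   (64 MiB)
(heap-blocks->bytes 131072)  ; => 1073741824 (1 GiB)
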
For kicks, I decided to vary the heap size and see what effect that had on the task of MIT/GNU Scheme compiling itself.
Heap (1024-word blocks)     GC Count    Cumulative GC Time (ms)
6000                        1056        25800
8192 (64 MiB)               764         19120
10000                       623         17910
16384 (128 MiB)             377         10980
32768 (256 MiB)             188         7350
50000                       124         5560
65536 (512 MiB)             94          5000
100000                      63          4410
131072 (1 GiB)              48          3910
150000                      42          4160
200000                      32          3800
250000                      26          3560
262144 (2 GiB)              25          3050
300000                      22          3360
350000                      19          3260
393216 (3 GiB)              18          2740

Collecting this data was tedious, so I spent a fair amount of time thinking about a model that could explain the results. At first glance, everything looks about right: the more memory, the fewer GCs there are and the less total time is spent garbage collecting. The problem is that the curve predicted by my model doesn't fit the empirical data at all. Of course this means that my model is either wrong or incomplete.
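
For anyone who wants to puzzle along, here is one back-of-the-envelope check that can be run in MIT/GNU Scheme against the numbers above (a naive sketch of the obvious first guess, not necessarily my model): if the number of collections were simply inversely proportional to the heap size, and every collection cost about the same, then heap × count and time ÷ count should each come out roughly constant across the table.

;; The table above as (heap-blocks gc-count cumulative-gc-ms) triples.
(define gc-data
  '((6000 1056 25800) (8192 764 19120) (10000 623 17910)
    (16384 377 10980) (32768 188 7350) (50000 124 5560)
    (65536 94 5000) (100000 63 4410) (131072 48 3910)
    (150000 42 4160) (200000 32 3800) (250000 26 3560)
    (262144 25 3050) (300000 22 3360) (350000 19 3260)
    (393216 18 2740)))

;; If the GC count were simply inversely proportional to heap size, then
;; heap * count would be roughly constant; if every collection cost the
;; same, ms / count would be roughly constant as well.
(define (naive-check row)
  (let ((heap (car row))
        (count (cadr row))
        (ms (caddr row)))
    (list heap
          (* heap count)                    ; ~constant if count ~ 1/heap
          (exact->inexact (/ ms count)))))  ; average ms per collection

(for-each (lambda (row) (write-line (naive-check row))) gc-data)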

I have since figured out the problem, but I thought I'd let people puzzle over it themselves for a bit, first.