I committed almost nothing to the Thousand Parsec repository this week. That’s because I’m mainly working on skime — a pure-Python VM for Scheme. After one-week hard working, the basic shape of the VM is already there.
Although there are still many work (e.g. the macro system) to do before it can be a really useful VM, I decide to write a simple layer to fit the Schemepy API and run the benchmarks to see whether the time spent on a new fallback is worth.
The result looks promising. Firstly, there’s the time spent to load the VM (mzscheme is not included in the benchmarks shown below. The mzscheme package in Debian sid upgraded to version 4.x, but even the mzscheme executable failed to start in my local box. I still haven’t got time to inspect the reason.):
Time to load the VM ------------------------------------------------------------ user system real total guile 0.040000 0.070000 0.170000 0.280000 skime 0.000000 0.010000 0.010000 0.020000 pyscheme 0.020000 0.040000 0.050000 0.110000
Unlike other benchmarks, pure-Python implementation always wins when it comes to loading the VM because loading a shared library is generally more expensive than importing several Python modules. skime is even faster than pyscheme here. But that might be because skime is still not very complete currently.
However, besides the load-vm benchmarks, skime also beats pyscheme in other benchmarks. One example is the compiling benchmark to measure the performance gained by compiling the code before running it (and the compiled code can be run many times). Both guile and pyscheme do not support compiling. The compiling process of pyscheme backend is simply parsing, while nothing is done in the compiling process of guile backend. However, skime compiles the Scheme code to bytecode and the skime VM is in fact a bytecode interpreter. Consider the following simple insertion-sort example:
(begin
(define (insertion-sort lst)
(define (insert-in-list new lst)
(if (null? lst)
(list new)
(if (< new (car lst))
(cons new lst)
(cons (car lst) (insert-in-list new (cdr lst))))))
(define (helper unsorted sorted)
(if (null? unsorted)
sorted
(helper (cdr unsorted)
(insert-in-list (car unsorted) sorted))))
(helper lst '()))
(define (sorted? lst)
(if (null? lst)
#t
(if (null? (cdr lst))
#t
(if (< (car lst)
(car (cdr lst)))
(sorted? (cdr lst))
#f))))
(define lst '(5 9 2 4 7 6 3 1 8 10 -5 83))
(define sorted (insertion-sort lst))
(sorted? sorted))We compile the code, execute the compiled form 100 times and measure the time:
performance improvements by compiling (sort) ---------------------------------– user system real total guile 0.000300 0.000000 0.000400 0.000700 skime 0.015900 0.000100 0.016000 0.032000 pyscheme 0.107300 0.000200 0.107600 0.215100
It seems that skime can be almost an order of magnitude faster than pyscheme. However, that in fact depends on how many times we repeat the execution. When we repeat 1000 times at another example, the result is even more fantastic:
performance improvements by compiling (calc) ---------------------------------– user system real total guile 0.000250 0.000000 0.000250 0.000500 skime 0.000360 0.000000 0.000360 0.000720 pyscheme 0.010900 0.000000 0.010930 0.021830
The performance of skime is within the same order of magnitude of guile, and about 30 times faster than pyscheme. The test case involves mostly simple calculations:
(begin
(define a 5)
(define b 10)
(define c (- a b))
(+ a
(* b 10 11)
(apply + (list (* b c) 1 2 (- 3 a) b 5))
9
b))Not all benchmarks are supported. Some primitives are needed for skime to run the strcat benchmark (so does pyscheme). However, the simple but famous tak benchmark can also show us something:
tak benchmark ------------------------------------------------------------------ user system real total guile 1.030000 0.010000 1.050000 2.090000 skime 254.350000 0.060000 254.920000 509.330000 pyscheme 1310.230000 0.360000 1312.500000 2623.090000
The benchmark code is shown below:
(define (tak x y z)
(if (not (< y x))
z
(tak (tak (- x 1) y z)
(tak (- y 1) z x)
(tak (- z 1) x y))))
I didn’t run all the test cases because it takes too long. Mainly recursive-call is measured in this benchmark. pyscheme uses Python stack to execute Scheme procedures, so it is relatively expensive to make context switching. And since tail-call is not supported in Python, pyscheme uses trampolined-style Programming to avoid stack overflow, which may cause significant performance issue. On the other hand, skime use light-weighted context switch along with spaghetti stack, which does not suffer from deep recursion and may be trivial to implement continuation.
More over, tail-call is supported by skime. The result of the tail-call benchmark can confirm this:
Tail call performance ---------------------------------------------------------- user system real total guile 0.007000 0.000000 0.007000 0.014000 skime 1.578000 0.002000 1.602000 3.182000 pyscheme 8.498000 0.002000 8.553000 17.053000
In general, skime looks promising. Besides its performance advantage over pyscheme, the feature is (already) more complete. So I’m sure skime will take the place of pyscheme as the default fallback of Schemepy and further development is needed.
Although the task is a little difficult, I enjoy very much developing the skime VM. I’ve been looking forward to have a chance to write a language VM for a long time and this is my first try. I found something completely wrong and rewrote large piece of code several times, but I really learned a lot during the process, and the result is not too bad.
I use a bottom up way to write the bytecode compiler. But after totally confused with context, environment, operand stack and related things, I finally managed to develop the VM with a top down method. Although things can be more fast with a lower-level language, writing the whole thing in Python really helps a lot by letting me concentrate on the system other than tricky language issues. ![]()