Last week, Scott Petersen from Adobe gave a talk at Mozilla
on a toolchain he's been creating (soon to be open-sourced) that
allows C code to be targeted to the Tamarin virtual machine.
Aside from being a really interesting piece of technology, I
thought its implications for the web were pretty
impressive.
Before reading this post, readers who aren't familiar with
Tamarin may want to read Frank Hecker's excellent Adobe,
Mozilla, and Tamarin post from 2006 for some background on its
goals and why it's relevant to Mozilla and the open-source
community in general.
If I followed his presentation right, Petersen's toolchain
works something like this:
1. A special version of the GNU C Compiler (possibly
   llvm-gcc) compiles C code into instructions for the Low Level
   Virtual Machine (LLVM).
2. The LLVM instructions are converted into opcodes for a custom
   virtual machine that runs in ActionScript, a variant of
   ECMAScript and sibling of JavaScript.
3. The ActionScript is automatically compiled into Tamarin
   bytecode by Adobe Flash, which may be further compiled into
   native machine language by Tamarin's Just-in-Time (JIT)
   compiler.
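To make the second step a bit more concrete, here is a toy sketch of what "opcodes for a custom virtual machine" can look like. It is written in Python rather than ActionScript, and everything in it is an assumption made for illustration: the opcode names (`PUSH`, `ADD`, `MUL`, `HALT`), the stack-based design, and the `run` function are hypothetical and are not taken from Petersen's toolchain, whose actual opcode set I don't know.

```python
# Toy stack-based VM: a hypothetical opcode set, not Petersen's actual design.
PUSH, ADD, MUL, HALT = range(4)

def run(program):
    """Interpret a flat list of opcodes and operands; return the top of stack."""
    stack = []
    pc = 0  # program counter: index of the next opcode to execute
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# Roughly what a compiler back end might emit for the C expression (2 + 3) * 4.
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
```

The point is only that once C has been lowered to a flat instruction stream, the interpreter loop that executes it can be written in any language, including a dynamic one that a JIT can later speed up.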
The toolchain includes lots of other details, such as a custom
POSIX system call API and a C multimedia library that provides
access to Flash. There are also some things that Petersen had to
add to Tamarin, such as a native byte array that maps directly
to RAM, thereby allowing the VM's emulation of memory to have
only a minor overhead over the real thing.
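The byte-array idea can be sketched in a few lines. This is a minimal illustration in Python, assuming a little-endian, 32-bit layout; the `Memory` class and its `store_i32` and `load_i32` methods are hypothetical names of my own, not Tamarin's or Flash's API. It shows how C-style loads and stores reduce to reads and writes at byte offsets in one contiguous buffer.

```python
import struct

class Memory:
    """Emulated flat RAM backed by one contiguous byte array (hypothetical sketch)."""

    def __init__(self, size):
        self.ram = bytearray(size)  # zero-filled, addressable byte by byte

    def store_i32(self, addr, value):
        # Write a little-endian signed 32-bit integer at a byte address.
        struct.pack_into("<i", self.ram, addr, value)

    def load_i32(self, addr):
        # Read it back; unpack_from returns a 1-tuple.
        return struct.unpack_from("<i", self.ram, addr)[0]

mem = Memory(64)
mem.store_i32(8, -12345)   # roughly `*(int *)(base + 8) = -12345;` in C
value = mem.load_i32(8)
low_byte = mem.ram[8]      # byte-level view of the same storage
```

When the buffer is a native structure rather than an array of boxed numbers, each emulated pointer dereference can come close to a real memory access, which is presumably where the low overhead comes from.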
The end result is the ability to run a wide variety of
existing C code in Flash at
acceptable speeds. Petersen demonstrated a version of Quake
running in a Flash app, as
well as a C-based Nintendo emulator running Zelda; both were
eminently playable, and included sound effects and music.
So, once Petersen's modifications to Tamarin make their way
into the next version of Adobe Flash, we can expect to see
older commercial games running in the browser. Even more
impressive, though, is the sheer volume of existing code that
can be made to run inside the browser: Petersen showed us the
C-compiled versions of Lua, Ruby, Perl, and Python all running
on the web in secure Flash
sandboxes.
What this means for Python
The potential implications this has for Python are
particularly interesting to me. The ability to run Python on
the web is exciting, to say the least; also interesting is the
fact that by sandboxing CPython in a virtual machine, we solve
a lot of the security issues that currently face the language
when it comes to running untrusted code.
Petersen's work also resonates with a few goals of another
project called PyPy. I'm going to try to explain the idea behind
PyPy in a later post; for the time being, the slides from my
April 2007 ChiPy presentation on PyPy may serve as a passable
introduction.
In a nutshell, the difference in mindset between PyPy and
Petersen's work is that the former is radically innovative in
scope and mission, while the latter is pragmatic. PyPy's goal is
essentially to move the canonical implementation of Python from
C to Python itself, and then use a pluggable toolchain to
translate the Python interpreter to any platform with a
configurable set of language and implementation features. In
one fell swoop, this modularizes the composition of the Python
interpreter in such a way that innovating and maintaining
different ports and variants of Python like IronPython, Jython,
and Stackless no longer requires either writing an entire copy
of the same interpreter in a different language or branching
the CPython source code and making pervasive changes to it.
Rather than focusing on innovation, Petersen's work focuses
on code reuse. Instead of moving a canonical interpreter
implementation from C to a dynamic language, his strategy is to
simply compile the existing C code to run in a virtual machine
that's implemented in a dynamic language. Both approaches aim to
remove the need to port interpreters to different
platforms, and as such their purposes intersect at a common
subset of functionality. But Petersen's work can't be used to
facilitate innovation in the Python language and its
implementation, while PyPy offers few or no tools to reuse
existing non-Python code. Perhaps it's possible to combine the
best of both worlds by taking PyPy's generated C interpreter and
using Petersen's toolchain to make it usable on the web
and other places that Tamarin runs.
What this means for the Open Web
To be honest, I'm not quite sure where the dividing line is
between which parts of Petersen's work are Flash-specific and which can
be reused to benefit the Open Web. Since ActionScript is a
sibling language to JavaScript, it's possible that the custom VM
he created can be run in a browser with relatively few
modifications, albeit much more slowly in Firefox for the time
being, since SpiderMonkey-Tamarin integration is not yet
complete. Once that's further along, though, I imagine it should
be possible to create C libraries that can be used in the
toolchain to allow sandboxed C code to interact with web pages
rather than Flash apps.
Should this be feasible, I think it will possibly be the
ultimate in a relatively recent string of next-generation JavaScript
virtual machines that allow existing code to run safely in
browsers.
Also, in the context of the web, download size is a
significant concern because applications are essentially
streamed to clients. While Petersen's toolchain means that it's
possible to instantly inherit most of CPython's benefits on the
web, it also means that we get all of its flaws along with
it, such as the fact that the standard CPython distribution is a
few megabytes in size. But there are ways to get around this.
In any case, I'm really excited to see how both Petersen's
work and PyPy proceed. I just hope I haven't misrepresented
either one of them here due to a lack of understanding; I'll try
to correct this blog post as I become aware of my
mistakes.