Wednesday, January 21, 2009

Extending PHP in Project Zero

The de facto open source implementation of PHP is written almost entirely in C. Under the hood, it really consists of two fairly distinct parts. The first is the Zend Engine, the core language engine that parses PHP scripts and interprets the resulting opcodes.

The Zend Engine also contains the implementation of PHP types like variables (zval) and arrays (HashTable). So where are all the PHP functions and classes implemented then?

The answer is that the Zend Engine provides an extension API. This allows anyone to write classes, functions, constants and much more that plug in to the runtime. Taking a peek at the source code in CVS reveals just how many extensions there are. 

This C API is well documented in Sara Golemon's book, which explains in detail how to extend PHP and indeed how to embed PHP in your own applications.

Zero has a very similar architecture that separates engine and extension. The main difference is that the core PHP language engine is written in Java. Zero also has a comprehensive API called XAPI-J that allows Java extensions to be written for the Zero PHP runtime. 
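To make the engine/extension split concrete, here is a minimal sketch in Java. It is not the XAPI-J API: the names PhpFunction, Engine, register and loadStringExtension are all hypothetical, invented only to show an engine dispatching by name to functions that an extension contributes.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionSketch {
    /** Hypothetical contract for an extension function; NOT the real XAPI-J interface. */
    interface PhpFunction {
        Object call(Object[] args);
    }

    /** Stand-in for the engine: it knows names and dispatch, not implementations. */
    static final class Engine {
        private final Map<String, PhpFunction> functions = new HashMap<>();

        void register(String name, PhpFunction fn) {
            // PHP function names are case-insensitive, so normalise the key.
            functions.put(name.toLowerCase(Locale.ROOT), fn);
        }

        Object invoke(String name, Object... args) {
            PhpFunction fn = functions.get(name.toLowerCase(Locale.ROOT));
            if (fn == null) {
                throw new IllegalArgumentException("Call to undefined function " + name + "()");
            }
            return fn.call(args);
        }
    }

    /** A hypothetical "extension" that contributes functions without touching the engine. */
    static void loadStringExtension(Engine engine) {
        engine.register("strrev", a -> new StringBuilder((String) a[0]).reverse().toString());
        engine.register("strlen", a -> ((String) a[0]).length());
    }

    public static void main(String[] args) {
        Engine engine = new Engine();
        loadStringExtension(engine);
        System.out.println(engine.invoke("strrev", "zero"));
        System.out.println(engine.invoke("STRLEN", "zero"));
    }
}
```

The point of the design is that the engine never needs recompiling when a new extension appears; it only needs the extension to register its functions at load time.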

We use XAPI-J to implement virtually all the PHP extension functions supported in Zero (listed here). But there is a bit more to this story than meets the eye. The slides Rob Nicholson presented at last year's PHP Quebec shed some light on this.

Zero actually ships a pretty large number of the C extensions. We do this by re-compiling the source code against a different implementation of the C API. The Zero implementation of the C API (called XAPI-C) knows how to talk to the Zero PHP runtime through JNI.

There are several benefits to this solution. First and foremost, we maintain compatibility with the open source PHP. It is really important to PHP developers on Zero that we are as close to 100% compatible with it as possible, and using the same code certainly helps!

Secondly, we avoid re-implementing all those extension functions from scratch. Why bother re-writing everything in Java when there is a perfectly good implementation already available? There are in fact a few performance-critical functions that we do implement in Java; the array functions are one good example.

If you are interested in writing extensions for Zero you will find the documentation and examples for XAPI-J on the Zero web site.


  1. What about speed and memory footprint of using JNI vs. native C linking? Is there any noticeable difference?

  2. Thanks for the writeup. Providing PHP's extension API is also a goal for Pipp, which is PHP on Parrot.

  3. Hi, yes, there is quite a big difference between native linking and JNI calls. Native linking is by far the fastest, so the challenge in XAPI-C has been to minimise the overhead wherever possible.

    The key is to use native extension functions when they are relatively chunky, by which I mean the amount of work done inside the function is considerably larger than the time it takes to call it! For lightweight functions we generally re-implement in Java.

    We also do a bunch of caching on the native side to avoid calling back into the JVM. For example, all the arguments to the extension function are transferred in a Java direct byte buffer. This is way more efficient than calling back into the JVM for each individual argument.

    Another example might be where an extension creates and populates an output array. In this case we can do all the array manipulation on the native side, and only at the end, copy the array back into the JVM in a single optimised call.
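The direct byte buffer idea from the reply above can be sketched from the Java side. This is only an illustration under assumptions: the type tags and byte layout here are hypothetical (the actual XAPI-C wire format is not described in this post), and unpack is a Java stand-in for the native reader, which in C would typically obtain the buffer's address with JNI's GetDirectBufferAddress.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ArgBufferSketch {
    // Hypothetical type tags; the real XAPI-C format is not documented here.
    static final byte TYPE_LONG = 1;
    static final byte TYPE_STRING = 2;

    /** Packs every argument into one direct buffer, so one hand-off replaces N callbacks. */
    static ByteBuffer pack(Object... args) {
        int size = Integer.BYTES; // leading argument count
        byte[][] encoded = new byte[args.length][];
        for (int i = 0; i < args.length; i++) {
            if (args[i] instanceof Long) {
                size += 1 + Long.BYTES;
            } else if (args[i] instanceof String) {
                encoded[i] = ((String) args[i]).getBytes(StandardCharsets.UTF_8);
                size += 1 + Integer.BYTES + encoded[i].length;
            } else {
                throw new IllegalArgumentException("unsupported argument type");
            }
        }
        // Direct buffers live outside the Java heap, so native code can read them in place.
        ByteBuffer buf = ByteBuffer.allocateDirect(size).order(ByteOrder.nativeOrder());
        buf.putInt(args.length);
        for (int i = 0; i < args.length; i++) {
            if (args[i] instanceof Long) {
                buf.put(TYPE_LONG).putLong((Long) args[i]);
            } else {
                buf.put(TYPE_STRING).putInt(encoded[i].length).put(encoded[i]);
            }
        }
        buf.flip();
        return buf;
    }

    /** Java stand-in for the native side: decodes all arguments from the single buffer. */
    static Object[] unpack(ByteBuffer buf) {
        // duplicate() resets byte order to big-endian, so restore the original order.
        ByteBuffer in = buf.duplicate().order(buf.order());
        Object[] out = new Object[in.getInt()];
        for (int i = 0; i < out.length; i++) {
            if (in.get() == TYPE_LONG) {
                out[i] = in.getLong();
            } else {
                byte[] bytes = new byte[in.getInt()];
                in.get(bytes);
                out[i] = new String(bytes, StandardCharsets.UTF_8);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] roundTrip = unpack(pack(42L, "hello"));
        System.out.println(roundTrip[0] + " " + roundTrip[1]);
    }
}
```

The same batching principle applies in the other direction for output arrays: build the result on one side of the boundary and cross it once, rather than once per element.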