Thursday, January 29, 2009

Charles Darwin and PHP

This year is the 150th anniversary of the publication of On the Origin of Species (24 November 1859) and the 200th anniversary of Darwin's birth (12 February 1809). The BBC recently screened an absolutely fascinating programme called "What Darwin Didn't Know". It is on iPlayer here, although I'm not sure it is available outside the UK; sorry if not!

What really struck me was not only his insight, but also how Darwin devoted an entire chapter to describing the difficulties of the theory! It is also remarkable how he accepted that the theory was incomplete and asked future generations to fill in the gaps.

All this got me thinking about applying the idea of natural selection to open source software. I was sure there must be an analogy for how open source projects grow and die that mimics the world of evolutionary biology. 

I also think natural selection provides a great explanation for why PHP is so hugely successful. I don't often hear people describe PHP as a pure and elegant programming language; indeed, that is a criticism purists in the OOP world sometimes direct at us.

But I think that rather misses the point! PHP succeeds because it has evolved over a relatively long period of time to do things quickly, easily and well.


Anyway, I set about finding some prior art on this, but my Googling didn't turn up any great matches. So my next thought was to ferret around for some obsolete programming skills to show what has actually died out.

Alas, as before, nothing quite hit the mark. I did, however, dig up this web site where people are documenting obsolete skills. The problem is that most of the skills listed turn out either to be still in use or to be staging comebacks in one form or another (such as putting a needle on a vinyl record)!

So if you do know of somewhere that talks about natural selection and open source software, please let me know.

COM/.NET Interop in Zero PHP

Zero doesn't currently support PHP's COM/.NET extension.

No matter, though: there is a handy open source project called JACOB that bridges between Java and COM/.NET. Better still, there is an easier way to use it through a Groovy library called Scriptom, which is really just a friendly wrapper around JACOB that provides a nicer syntax for calling methods and accessing properties.

The aim in this example is to list the processes on a Windows machine. Here are the steps to do it in PHP. I'm using the latest Zero download of Sebring, with Eclipse PDT as the IDE.

1. Download Scriptom and put the JAR and DLL in the Zero lib directory.

2. Run a resolve step in Eclipse so that the JAR file is added to the class path.

3. Add the following extension to php.ini (if not already there):
extension=com.ibm.p8.engine.xapi.groovy.GroovyExtensionLibrary

4. Copy and paste this code into a PHP script in the public folder:

<?php
// WMI flag: ask for a forward-only enumerator
define("wbemFlagForwardOnly", 32);

// Initialise COM for this thread before making any COM calls
java_import("com.jacob.com.ComThread");
ComThread::InitMTA(true);
groovy_import("org.codehaus.groovy.scriptom.Scriptom");

// Connect to WMI on the local machine ('.')
$locator = new ActiveXObject("WbemScripting.SWbemLocator");
$services = $locator->ConnectServer('.');

$processes = $services->ExecQuery(
  'SELECT * FROM Win32_Process',
  'WQL', wbemFlagForwardOnly);

$total = 0;

// Iterate through the COM collection
foreach ($processes as $current) {
  if (Scriptom::isNull($current->CreationDate) == FALSE) {
    // Convert the WMI date string into a COM date
    // ($creation holds the start time; it is not displayed below)
    $date = new ActiveXObject("WbemScripting.SWbemDateTime");
    $date->Value = $current->CreationDate;
    $creation = $date->GetVarDate(TRUE);

    $command = $current->CommandLine;
    $command = (Scriptom::isNull($command) ? "None" : $command);
    echo "Process found [".$current->Name." - ".$command."]<br/>\n";
    $total++;
  }
}

// Release COM for this thread so that Zero doesn't hang
ComThread::Release();
echo "Found $total processes in all";
?>

Before you say it: yes, I agree, the code is really yuck, but that's COM for you.

5. Run the application and point a browser at your script.

If you are curious about what all these crazy objects are, there is lots more information on the Microsoft web site. The example uses Windows Management Instrumentation (WMI) to enumerate the processes.

The calls to ComThread::InitMTA() and ComThread::Release() are important to make sure Zero doesn't hang. COM has some weird rules to do with apartments and threading: each thread must initialise COM before using it and release it when it is done.

Tuesday, January 27, 2009

The Long Road to 64 Bits

There is an interesting story in January's Communications of the ACM called "The Long Road to 64 Bits". The magazine is free online here. It charts the long and often twisted road from 32- to 64-bit architectures.

From a software perspective, the challenge is writing C code that compiles and runs cleanly on both 32- and 64-bit platforms. The issue is that compilers on different platforms adopt different sizes for the int, long and pointer types.

So, for example, on 32-bit Windows (and many Unix systems from the 1990s) all three were 32 bits wide, hence the tag ILP32. On 64-bit Windows, integers and longs remained 32 bits wide while pointers necessarily increased to 64 bits, so that goes by the tag IL32P64 (better known as LLP64, or just P64 for short). Meanwhile many 64-bit Unix systems adopted I32LP64 (usually shortened to LP64), meaning integers remained 32 bits while longs and pointers widened to 64 bits.
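To make the tags concrete, here is a small Java sketch that tabulates the three data models, with widths in bits for C's int, long and pointer types. The DataModel enum and its helper methods are illustrative names only, not part of any library.

```java
// Illustrative table of the C data models described above (widths in bits).
enum DataModel {
    ILP32(32, 32, 32),  // 32-bit Windows and many 32-bit Unix systems
    LLP64(32, 32, 64),  // 64-bit Windows (IL32P64)
    LP64(32, 64, 64);   // many 64-bit Unix systems (I32LP64)

    final int intBits;
    final int longBits;
    final int pointerBits;

    DataModel(int intBits, int longBits, int pointerBits) {
        this.intBits = intBits;
        this.longBits = longBits;
        this.pointerBits = pointerBits;
    }

    // True if a C long can hold a pointer without truncation on this model,
    // a common (and non-portable) assumption in older C code.
    boolean longHoldsPointer() {
        return longBits >= pointerBits;
    }

    // The narrowest Java integral type that can carry a C long on this model.
    String javaTypeForCLong() {
        return longBits <= Integer.SIZE ? "int" : "long";
    }
}
```

For example, DataModel.LLP64.longHoldsPointer() is false, which is why C code that stores pointers in longs breaks on 64-bit Windows.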

PHP has to deal with this issue since it is written in C and is ported to many platforms. The issue also comes up when mixing Java and C code, because Java takes a different approach: it set out to simplify the programming model by fixing the sizes of all data types. An integer in Java is 32 bits, always 32 bits and never anything apart from 32 bits. Java also has a long data type, which is always 64 bits, and no unsigned integer types at all (apart from the 16-bit char).



Problems arise when Java code (with a fixed size data type) interacts with C code (that varies the data type size depending on the platform). Java has no pre-processor like C so it's not easy to write Java code that would, for example, use a Java integer on 32 bit machines, and magically use a Java long on 64 bit machines (and could therefore match the C code). So instead we generally use the wider data type (a Java long) on both 32 and 64 bit machines (and down cast where necessary).
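The widen-then-down-cast approach can be sketched in a few lines of Java. NativeValues and its methods are hypothetical helpers for illustration, not part of Zero or JNI; Math.toIntExact and Integer.toUnsignedLong are standard library methods (Java 8 and later).

```java
// Hypothetical helpers: carry C int/long/pointer values in a Java long on every
// platform, and narrow explicitly only where a Java int is really needed.
final class NativeValues {
    private NativeValues() {}

    // Java's sizes are fixed, whatever the platform.
    static final int JAVA_INT_BITS = Integer.SIZE;  // always 32
    static final int JAVA_LONG_BITS = Long.SIZE;    // always 64

    // Narrow a value that came from a C long (32 or 64 bits wide) to a Java int,
    // throwing ArithmeticException rather than silently truncating.
    static int toJavaInt(long cLongValue) {
        return Math.toIntExact(cLongValue);
    }

    // Java has no unsigned int: reinterpret the 32 bits of a C unsigned int
    // (held in a Java int) as a non-negative Java long.
    static long fromCUnsignedInt(int bits) {
        return Integer.toUnsignedLong(bits);
    }
}
```

Failing loudly on overflow is a design choice; a plain (int) cast would quietly drop the top 32 bits of a 64-bit value.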

At this point I should also mention z/OS, which for its own honourable reasons supports 24-, 31- and 64-bit addressing modes...!

Wednesday, January 21, 2009

Extending PHP in Project Zero

The de facto open source implementation of PHP is written almost entirely in C. Looking under the hood shows that it is really two fairly distinct parts. First there is the Zend Engine. This is the core language engine that parses PHP scripts and interprets the resulting opcodes. 

The Zend Engine also contains the implementation of PHP types like variables (zval) and arrays (HashTable). So where are all the PHP functions and classes implemented then?

The answer is that the Zend Engine provides an extension API. This allows anyone to write classes, functions, constants and much more that plug into the php.net runtime. Taking a peek at the php.net source code in CVS reveals just how many extensions there are.

This C API is well documented in Sara Golemon's book, Extending and Embedding PHP. It talks in detail about how to extend PHP and indeed how to embed PHP in your own applications.

Zero has a very similar architecture that separates engine and extension. The main difference is that the core PHP language engine is written in Java. Zero also has a comprehensive API called XAPI-J that allows Java extensions to be written for the Zero PHP runtime. 
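The engine/extension split can be illustrated with a deliberately tiny Java sketch: the engine keeps a registry of functions, and each extension registers its own. All the types here (PhpFunction, Extension, FunctionRegistry, StringExtension) are hypothetical and are not the XAPI-J interfaces.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of an engine/extension split (NOT the XAPI-J API).
interface PhpFunction {
    Object invoke(Object... args);
}

// An extension contributes named functions to the runtime.
interface Extension {
    void register(FunctionRegistry registry);
}

// Engine side: PHP function names are case-insensitive, so key on lower case.
final class FunctionRegistry {
    private final Map<String, PhpFunction> functions = new HashMap<>();

    void define(String name, PhpFunction fn) {
        functions.put(name.toLowerCase(Locale.ROOT), fn);
    }

    // What the engine would do when a script calls a function by name.
    Object call(String name, Object... args) {
        PhpFunction fn = functions.get(name.toLowerCase(Locale.ROOT));
        if (fn == null) {
            throw new IllegalStateException("Call to undefined function " + name + "()");
        }
        return fn.invoke(args);
    }
}

// A toy extension providing a strtoupper() function.
final class StringExtension implements Extension {
    @Override
    public void register(FunctionRegistry registry) {
        registry.define("strtoupper",
                args -> String.valueOf(args[0]).toUpperCase(Locale.ROOT));
    }
}
```

After registering StringExtension, registry.call("STRTOUPPER", "zero") returns "ZERO", because names are matched case-insensitively as in PHP.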

We use XAPI-J to implement virtually all the PHP extension functions supported in Zero (listed here). But there is a bit more to this story than meets the eye. The slides Rob Nicholson presented at last year's PHP Quebec cast some light on this. 

Zero actually ships a pretty large number of the php.net C extensions. We do this by re-compiling the php.net source code against a different implementation of the php.net C API. The Zero implementation of the C API (called XAPI-C) knows how to talk to the Zero PHP runtime through JNI (the Java Native Interface).

There are several benefits to this solution. First and foremost we maintain compatibility with the open source PHP. It is really important to PHP developers on Zero that we are as close to 100% compatible with php.net as possible, and using the php.net code certainly helps! 

Secondly, we avoid re-implementing all those extension functions from scratch. Why bother rewriting everything in Java when there is a perfectly good implementation already available? There are in fact a few performance-critical functions that we do implement in Java; the array functions are one good example.

If you are interested in writing extensions for Zero you will find the documentation and examples for XAPI-J on the Zero web site.

PHP and Java Collections: More PHP, Less Java

We've added some nice integration with the Java collection classes in Zero.

Java has a rich set of collection classes. The three really important ones are Set, Map and List. We use the Java collections quite a bit when integrating Java libraries in PHP applications, so we got to thinking about whether we could provide some good PHP syntax for these classes. It turns out there are quite a few things we could do to improve working with them in PHP.

For example, Java lists now support the following:

<?php
java_import("java.util.ArrayList");
$list = new ArrayList();
$list->add("Hello World!");

$list[0] = "Hello Again!"; 
$value = $list[0]; 
$array = (array) $list;
$check = isset($list[0]);
unset($list[0]);

// Append some items to list
$list[] = "Hello Again!";
$list[] = TRUE;
$list[] = "Updated!";
?>
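Under the covers, each piece of array syntax above has to become an ordinary java.util.List call. The hypothetical ListAccess helper below sketches one plausible mapping, borrowing the method names of PHP's ArrayAccess interface. It is not Zero's implementation; in particular, treating unset as "remove and shift the rest down" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mapping from PHP array syntax to java.util.List calls.
final class ListAccess {
    private ListAccess() {}

    // $value = $list[0];
    static Object offsetGet(List<Object> list, int index) {
        return list.get(index);
    }

    // $list[0] = "..."; or $list[] = "..."; (a null index means append)
    static void offsetSet(List<Object> list, Integer index, Object value) {
        if (index == null) {
            list.add(value);
        } else {
            list.set(index, value);
        }
    }

    // isset($list[0]): in range and not null, mirroring PHP's isset().
    static boolean offsetExists(List<Object> list, int index) {
        return index >= 0 && index < list.size() && list.get(index) != null;
    }

    // unset($list[0]): assumed to remove the element and shift later ones down.
    static void offsetUnset(List<Object> list, int index) {
        list.remove(index);  // remove(int), not remove(Object)
    }

    // (array) $list: an independent copy.
    static List<Object> toPhpArray(List<Object> list) {
        return new ArrayList<>(list);
    }
}
```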

Java maps and sets also get the full treatment. The reason I like this is that it avoids lots of Java API calls. Some example code for hacking around with a Map using the Java APIs is shown below. Notice all the method calls: just so much more long-winded than PHP's nice expressive syntax!

<?php
$map = new Java("java.util.HashMap");
$map->put("title", "Java Bridge!");
$value = $map->get("title");
$check = $map->containsKey("title");
$map->remove("stuff");
?>
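With array syntax, the same Map operations have to be translated back into these calls by the bridge. The MapAccess helper below is a hypothetical sketch of that translation, not Zero's code. The one subtlety worth noting is that PHP's isset() is false for a null value, so containsKey() alone would not match PHP semantics.

```java
import java.util.Map;

// Hypothetical mapping from PHP array syntax to java.util.Map calls.
final class MapAccess {
    private MapAccess() {}

    // $map["title"] = "Java Bridge!";
    static void offsetSet(Map<Object, Object> map, Object key, Object value) {
        map.put(key, value);
    }

    // $value = $map["title"];
    static Object offsetGet(Map<Object, Object> map, Object key) {
        return map.get(key);
    }

    // isset($map["title"]): false for a missing key AND for a null value.
    static boolean offsetExists(Map<Object, Object> map, Object key) {
        return map.get(key) != null;
    }

    // unset($map["stuff"]): removing a missing key is harmless, as in PHP.
    static void offsetUnset(Map<Object, Object> map, Object key) {
        map.remove(key);
    }
}
```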

More information is on the Project Zero PHP/Java Bridge reference page here.

Creative Commons picture attribution: MicMac.

Monday, January 12, 2009

PHP/Groovy Bridge

Ah, something new and shiny in Project Zero to talk about.

Zero supports three languages: Java, PHP and Groovy. It also supports a workflow runtime called Assemble; since that stores its flows as XML, and workflows are kind of like code, I suppose it could be considered an unofficial fourth language.

Anyway, I digress. Until recently the PHP runtime treated Groovy code just the same as Java. Groovy classes could be compiled to Java class files with the Groovy compiler, and those class files could then be dropped into a Zero application and used in PHP through the PHP/Java Bridge.

The problem is that Groovy isn’t the same as Java. Groovy adds lots of scripting features over and above Java’s current capabilities, closures being one obvious highlight. None of this shines through when Groovy classes are used via the PHP/Java Bridge; it all just looks like Java. Take, for example, the following Groovy class, which makes use of Groovy’s interceptors:

class Dynamic {
  def storage = [:]

  def invokeMethod(String name, args) {
    return "Hello World!"
  }

  def getProperty(String name) {
    storage[name]
  }

  void setProperty(String name, value) {
    storage[name] = value
  }
}

Another piece of Groovy can use this as follows:

foo = new Dynamic();
foo.bar();
foo.guff = "Hello World!";
print foo.guff;

Now see how this looks from a Java client:

public class Test {
  public static void main(String[] args) {
    Dynamic foo = new Dynamic();
    String result = (String) foo.invokeMethod("sayHello", null);

    foo.setProperty("guff", "Hello World!");
    result = (String) foo.getProperty("guff");
  }
}

Yuck! And the same applies if we use this class in PHP through the Java Bridge. Well, the PHP/Groovy Bridge solves this problem by integrating PHP and Groovy as first-class citizens. See how it looks with the Groovy Bridge:

<?php

groovy_import("dynamic.groovy");

$foo = new Dynamic();
$foo->bar();
$foo->guff = "Hello World!";
echo $foo->guff;

?>

That’s better! The Groovy class is used intuitively, as intended. The groovy_import function is just one aspect of the PHP/Groovy Bridge; it also supports a range of other nice Groovy interop features, including closures, currying and range objects. With this in mind, I’ve put a slidecast up here where I ramble my way through an explanation of the PHP/Groovy Bridge.
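Conceptually, the difference is that PHP method calls and property accesses are routed through Groovy's interception methods rather than through plain Java reflection. The sketch below illustrates the idea with a stand-in interface mirroring the invokeMethod, getProperty and setProperty methods of groovy.lang.GroovyObject, so it compiles without Groovy; DynamicObject, DynamicStandIn and GroovyDispatch are hypothetical names, and this is not the PHP/Groovy Bridge's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the interception methods of groovy.lang.GroovyObject.
interface DynamicObject {
    Object invokeMethod(String name, Object args);
    Object getProperty(String name);
    void setProperty(String name, Object value);
}

// A Java equivalent of the Dynamic Groovy class above.
final class DynamicStandIn implements DynamicObject {
    private final Map<String, Object> storage = new HashMap<>();

    @Override public Object invokeMethod(String name, Object args) { return "Hello World!"; }
    @Override public Object getProperty(String name) { return storage.get(name); }
    @Override public void setProperty(String name, Object value) { storage.put(name, value); }
}

// Hypothetical bridge-side dispatch for PHP operations on a Groovy object.
final class GroovyDispatch {
    private GroovyDispatch() {}

    // $foo->bar(...)
    static Object callMethod(DynamicObject target, String name, Object... args) {
        return target.invokeMethod(name, args);
    }

    // echo $foo->guff;
    static Object readProperty(DynamicObject target, String name) {
        return target.getProperty(name);
    }

    // $foo->guff = "Hello World!";
    static void writeProperty(DynamicObject target, String name, Object value) {
        target.setProperty(name, value);
    }
}
```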

The product documentation for the PHP/Groovy Bridge is also available here.

Wednesday, January 7, 2009


I've put some slides up here that Andy Coleman presented at the WebSphere Technical Conference last November. They discuss how sMash PHP fits into Message Broker and some of the really sweet integration points with the PHP language. Our inspiration for this was the SimpleXML extension, which provides a fantastically easy way to navigate XML documents.

Message Broker has an existing Java API, which we made heavy use of for this work. Much of the work involved catching method and field accesses in the PHP script and turning them into Java API calls on Message Broker. For example, take a message tree in which the XMLNSC folder contains a document element with several chapter elements, each carrying a title attribute.


Here's a first attempt at navigating through this message tree to the title of the second chapter:

$xml = $input->getLastChild();
$doc = $xml->getFirstChild();
$ch1 = $doc->getFirstChild();
$ch2 = $ch1->getNextSibling();
$title = $ch2->getFirstChild();

Yuck! This is really horrible: lots of method calls and very little fluency. OK, we can do better; let's try again:

$xml = $input->getChild('XMLNSC');
$doc = $xml->getChild('document');
$ch2 = $doc->getChild('chapter', 1);
$title = $ch2->getAttribute('title');

Well it's a bit better but still far too much like hard work.

$title =
    $input->XMLNSC->document->chapter[1]['title'];

That's better! A really expressive way of navigating through messages. This is only scratching the surface of the PHP integration and the slides cover many other examples such as array integration, message shredding and content based routing.
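The fluent form works by turning each property access into a child lookup by name, each [n] into an index among same-named siblings, and the final ['title'] into an attribute lookup. A toy Java tree makes that mapping explicit; Node and its methods are hypothetical and are not the Message Broker Java API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy message tree illustrating how ->name[index]['attr'] could resolve.
final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    final Map<String, String> attributes = new HashMap<>();

    Node(String name) { this.name = name; }

    Node add(Node child) { children.add(child); return this; }

    Node attr(String key, String value) { attributes.put(key, value); return this; }

    // ->name[index]: the index-th (zero-based) child with that name, or null.
    Node child(String childName, int index) {
        int seen = 0;
        for (Node c : children) {
            if (c.name.equals(childName) && seen++ == index) {
                return c;
            }
        }
        return null;
    }

    // ->name with no index means the first matching child.
    Node child(String childName) { return child(childName, 0); }

    // ['title']: attribute lookup.
    String attribute(String key) { return attributes.get(key); }
}
```

So $input->XMLNSC->document->chapter[1]['title'] corresponds to input.child("XMLNSC").child("document").child("chapter", 1).attribute("title") in this sketch.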

Tuesday, January 6, 2009

Static Fields and Methods

One nice feature of PHP is its support for magic methods. These allow a class or object to catch a range of events such as method calls and property accesses. This kind of interception is really powerful and allows classes to implement all sorts of dynamic behaviour. For example, a database wrapper could intercept property access to get a property value from a database row. Many scripting languages offer something similar, for example Groovy has its own set of interceptors.

php.net is implemented in C and under the covers it implements magic methods by attaching a table of function pointers to each object (called the object handlers). In most cases the default implementation of an object handler is to simply call the appropriate PHP method on the class. 

For example, there is an object handler for method calls that invokes the __call PHP magic method if it exists. Built-in classes like SimpleXML and MySQLi use object handlers to implement whatever logic they need. All of this, and a lot more besides, is documented in Sara Golemon's book.

In Project Zero we added magic methods like __call, __get and __set a long time ago. Over the Christmas break we finally got round to adding their static equivalents: __callStatic, __getStatic, __setStatic and __issetStatic.

Here's an example of how __callStatic can be used:

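<?php

class Foo {
  // Called for any inaccessible or undefined static method
  public static function __callStatic($name, $arguments) {
    print "__callStatic [$name]\n";
  }
}

Foo::Test(); // prints: __callStatic [Test]

?>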

We have also surfaced this feature into the Java Bridge:

<?php

java_import("java.lang.Integer", NULL, FALSE);
var_dump(Integer::$MIN_VALUE);
var_dump(Integer::$MAX_VALUE);

var_dump(Integer::toHexString("1234567890"));
var_dump(Integer::toOctalString("1234567890"));

// Signatures also work as normal!
$signature = new JavaSignature(JAVA_STRING);
var_dump(Integer::parseInt($signature, "1234567890"));

?>

This makes for a much cleaner syntax than using JavaClass. JavaClass was always something of an anomaly (er, hack) as it required the script to create an instance of an object just to access static class methods and properties. 
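In plain Java reflection terms, static members need no instance at all: a static field is read, or a static method invoked, with null as the target object. The StaticAccess helper below is a hypothetical sketch of that idea rather than the Java Bridge's code; Class.getField, Class.getMethod, Field.get and Method.invoke are standard java.lang.reflect APIs.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical helper: what Integer::$MAX_VALUE and Integer::toHexString(...)
// boil down to with reflection. No instance is created.
final class StaticAccess {
    private StaticAccess() {}

    static Object readStaticField(Class<?> cls, String name)
            throws ReflectiveOperationException {
        Field field = cls.getField(name);
        return field.get(null);  // static field: the object argument is ignored
    }

    static Object callStatic(Class<?> cls, String name, Class<?>[] types, Object... args)
            throws ReflectiveOperationException {
        Method method = cls.getMethod(name, types);
        return method.invoke(null, args);  // static method: no receiver needed
    }
}
```

For example, callStatic(Integer.class, "toHexString", new Class<?>[] {int.class}, 255) returns "ff".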

More information about this feature will be available just as soon as the Project Zero documentation has rebuilt tonight [:o)

Thursday, January 1, 2009

Thoughts on Bridging PHP and Java

I thought it might be interesting to talk a little about how we bridge between Java and PHP in Project Zero. So to start with I'm going to explain some background to the problem and then in a follow up post talk more about how the Java Bridge works in practice.

Java and PHP
I'm not going to enumerate every difference between the languages, but I will talk a little about the main issues that affect the Java Bridge. First of all, I should say that Java and PHP share much common ground in their object-oriented design: objects, interfaces, classes, methods, fields, static members, exceptions, visibility (public, private etc) and so on.

They also have some differences, and this is where problems often arise when integrating the two languages. For example, PHP method, class and function names are case-insensitive, while Java's are case-sensitive. So it is possible, although unlikely, that a PHP script could invoke a method foo() that has several possible target methods on a Java object (Foo, FOO, foo etc).
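Here is a minimal sketch of that lookup problem, assuming a bridge that collects every public method whose name matches ignoring case and puts exact-case matches first. CaseInsensitiveLookup and MixedCase are hypothetical names for illustration; Java really does allow foo, Foo and FOO to coexist on one class.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// A Java class with three methods that PHP would consider the same name.
class MixedCase {
    public void foo() {}
    public void Foo() {}
    public void FOO() {}
}

final class CaseInsensitiveLookup {
    private CaseInsensitiveLookup() {}

    // All public methods matching phpName ignoring case, exact-case matches first.
    static List<Method> candidates(Class<?> cls, String phpName) {
        List<Method> exact = new ArrayList<>();
        List<Method> others = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(phpName)) {
                exact.add(m);
            } else if (m.getName().equalsIgnoreCase(phpName)) {
                others.add(m);
            }
        }
        exact.addAll(others);
        return exact;
    }
}
```

A call such as $obj->fOo() would find three candidates here, so a bridge has to decide whether to prefer one or warn about the ambiguity.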

Overloaded Methods
PHP supports overloading of neither constructors nor methods, whereas Java supports both. Overloading in Java means two or more methods or constructors on the same class that have the same name but different argument types. Overloading on the return type alone is not allowed.

The most difficult issue with overloading is selecting the right overload. This has to be done at runtime for reasons explained later.

Static and Instance Methods
PHP also allows instance methods on an object to be called statically, consider the following code:

class Foo {
    function bar() { }
}

Foo::bar();


This runs fine, albeit with the following Strict Standards notice highlighting the potential problem:

Strict Standards: Non-static method Foo::bar() should not be called statically in foo.php on line 7

In case you are wondering, if the method tries to access an instance field as follows:

class Foo {
  function bar() {
    $this->a = "Hello World!";
  }
}

Foo::bar();

Then a fatal error terminates the script with this output:

Fatal error: Using $this when not in object context in foo.php on line 4

In the context of the Java Bridge, we cannot call an instance method on a Java object statically, because this is simply not possible in Java. We can, however, do the reverse and allow a PHP script to call a static method through an instance of the object. As the Java API documentation for Method.invoke() shows, the obj argument is ignored if the method is static.
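Both halves of that rule can be seen directly with java.lang.reflect.Method.invoke: a static method runs whatever object is passed as the target, while an instance method invoked with a null target throws a NullPointerException. InvokeDemo is just an illustrative wrapper.

```java
import java.lang.reflect.Method;

final class InvokeDemo {
    private InvokeDemo() {}

    // Static method "through an instance": Method.invoke ignores the obj argument.
    static Object staticViaInstance() throws ReflectiveOperationException {
        Method valueOf = String.class.getMethod("valueOf", int.class);
        return valueOf.invoke("any instance at all", 42);  // same as String.valueOf(42)
    }

    // Instance method "statically": there is no receiver, so Java refuses.
    static boolean instanceViaNullFails() throws ReflectiveOperationException {
        Method length = String.class.getMethod("length");
        try {
            length.invoke(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```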

Dynamic Typing (Duck Typing)
In my experience, the biggest difficulty by far is with data typing. In PHP, variables have no declared data type: one minute a variable might be an integer and the next a string. In Java all variables have a fixed type, albeit with some latitude granted by polymorphism. Furthermore, PHP does an amazing job of coercing types to match the circumstances:

echo strlen(array(1,2,3));

In this snippet I'm passing an array to a function clearly expecting a string. PHP politely informs me via a notice that it had to do something strange:

Notice: Array to string conversion in foo.php on line 3

And outputs 5 (why 5? Because it converted the array to the string Array).

Here is another simple example of type coercion:

echo abs("123456789");

In this snippet the abs function converts the string to an integer. 

So what does this mean? 
It means selecting the Java method to invoke is often the most difficult (and time-consuming) part of the Java Bridge. This is especially true for overloaded constructors and methods. Perhaps an argument is a string and no candidate Java method accepts a string, yet the string could be coerced to an integer, as in the abs snippet above. The permutations are many and varied, and that's what makes bridging languages such an interesting problem.
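The selection problem can be sketched as a scoring pass over the candidate methods. Everything here is an illustrative assumption rather than the Java Bridge's algorithm: the OverloadSelector class, the 2/1/-1 scores, the deliberately simplified numeric-string rule, and the assumption that PHP values arrive as String, Long, Double, Boolean or null.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical runtime overload selection. Each argument is scored against
// each parameter: 2 = direct match, 1 = needs a PHP-style coercion, -1 = no.
final class OverloadSelector {
    private OverloadSelector() {}

    static Method select(Class<?> cls, String name, Object... args) {
        List<Method> best = new ArrayList<>();
        int bestScore = -1;
        for (Method m : cls.getMethods()) {
            // PHP names are case-insensitive; arity must match exactly here.
            if (!m.getName().equalsIgnoreCase(name) || m.getParameterCount() != args.length) {
                continue;
            }
            Class<?>[] params = m.getParameterTypes();
            int total = 0;
            for (int i = 0; i < args.length && total >= 0; i++) {
                int s = score(args[i], params[i]);
                total = (s < 0) ? -1 : total + s;
            }
            if (total > bestScore) {
                bestScore = total;
                best.clear();
                best.add(m);
            } else if (total >= 0 && total == bestScore) {
                best.add(m);
            }
        }
        if (best.isEmpty()) {
            throw new IllegalArgumentException("No suitable overload of " + name + "()");
        }
        if (best.size() > 1) {
            // Cannot be sure which one the script meant: a signature hint is needed.
            throw new IllegalStateException("Ambiguous call to " + name + "(): " + best);
        }
        return best.get(0);
    }

    static int score(Object arg, Class<?> param) {
        Class<?> type = box(param);
        if (arg == null) {
            return param.isPrimitive() ? -1 : 2;
        }
        if (type.isInstance(arg)) {
            return 2;
        }
        boolean numericParam = Number.class.isAssignableFrom(type);
        if (arg instanceof Number && numericParam) {
            return 1;  // e.g. a PHP integer (Long) passed to an int parameter
        }
        if (arg instanceof String && numericParam && isNumeric((String) arg)) {
            return 1;  // e.g. abs("123456789")
        }
        if (type == String.class && (arg instanceof Number || arg instanceof Boolean)) {
            return 1;  // scalars can be turned into strings
        }
        return -1;
    }

    // Simplified: PHP's real numeric-string rules are broader than this.
    static boolean isNumeric(String s) {
        return s.trim().matches("[+-]?\\d+(\\.\\d+)?");
    }

    static Class<?> box(Class<?> c) {
        if (!c.isPrimitive()) return c;
        if (c == int.class) return Integer.class;
        if (c == long.class) return Long.class;
        if (c == double.class) return Double.class;
        if (c == float.class) return Float.class;
        if (c == boolean.class) return Boolean.class;
        if (c == short.class) return Short.class;
        if (c == byte.class) return Byte.class;
        if (c == char.class) return Character.class;
        return Void.class;
    }
}
```

With a PHP integer (a Long) as the argument, Math.abs(long) wins outright; with the string "123456789", all four Math.abs overloads score equally and the sketch reports an ambiguity. The exception only keeps the sketch short: as described above, the bridge instead picks its best guess and prints a message.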

The key thing is that we always follow PHP's example of being incredibly tolerant and, wherever possible, doing the best we can. In some cases we simply cannot be absolutely sure we have chosen the right method or constructor. In those cases we pick what we think is best and output a message saying the script needs to give us a better clue (for example, by using the JavaSignature class).

In many regards the Java Bridge acts much like a compiler does in selecting methods and fields. The big difference is that the Java Bridge does the selection at runtime, based on the actual data types of the arguments, whereas a compiler does it at compile time, based on their declared data types. And of course the Java Bridge has to be quick, really quick. Java's reflection API is much faster than it used to be, but it is still relatively slow to look up classes, methods and parameter types.
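One common way to keep such a bridge quick is to remember each decision, so that repeated calls with the same class, name and argument types skip the reflective search. This is a sketch assuming a hypothetical MethodResolver interface and CachingResolver decorator; it is not Zero's implementation.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The expensive step: some reflective search that picks a method for a call.
interface MethodResolver {
    Method resolve(Class<?> cls, String name, Class<?>[] argTypes);
}

// Caching decorator keyed on class, case-folded name and argument classes.
final class CachingResolver implements MethodResolver {
    private final MethodResolver delegate;
    private final Map<List<Object>, Method> cache = new ConcurrentHashMap<>();

    CachingResolver(MethodResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public Method resolve(Class<?> cls, String name, Class<?>[] argTypes) {
        List<Object> key = new ArrayList<>();
        key.add(cls);
        key.add(name.toLowerCase(Locale.ROOT));  // PHP names are case-insensitive
        for (Class<?> t : argTypes) {
            key.add(t);  // a null entry stands for a PHP null argument
        }
        return cache.computeIfAbsent(key, k -> delegate.resolve(cls, name, argTypes));
    }
}
```

This only works if the decision depends solely on the argument classes. A rule that inspects values, such as whether a string looks numeric, would need that property folded into the cache key as well.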