Thursday, January 1, 2009

Thoughts on Bridging PHP and Java

I thought it might be interesting to talk a little about how we bridge between Java and PHP in Project Zero. To start with, I'm going to explain some background to the problem, and then in a follow-up post I'll talk more about how the Java Bridge works in practice.

Java and PHP
I'm not going to enumerate every difference between the languages, but I will talk a little about the main issues that affect the Java Bridge. First of all, I should say that Java and PHP share much common ground in their object-oriented design: objects, interfaces, classes, methods, fields, static members, exceptions, visibility (public, private, etc.) and so on.

They also have some differences, and this is where the problems often arise when integrating the two languages. For example, PHP method, class and function names are case-insensitive, while in Java they are case-sensitive. So it is possible, although unlikely, that a PHP script could invoke a method foo() which has several possible target methods on a Java object (Foo, FOO, foo, etc.).
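To make the ambiguity concrete, here is a minimal sketch of a case-insensitive lookup using Java reflection. The class names CaseTarget and CaseInsensitiveLookup and the helper candidates() are invented for illustration; this is not the actual Java Bridge code.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical target class: three methods whose names differ only by case.
class CaseTarget {
    public String foo() { return "foo"; }
    public String Foo() { return "Foo"; }
    public String FOO() { return "FOO"; }
    public String bar() { return "bar"; }
}

// Illustrative helper, not the real Project Zero implementation.
class CaseInsensitiveLookup {
    // Collect every public method whose name matches the PHP name, ignoring case.
    static List<Method> candidates(Class<?> type, String phpName) {
        List<Method> matches = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.getName().equalsIgnoreCase(phpName)) {
                matches.add(m);
            }
        }
        return matches;
    }
}
```

With these assumed classes, candidates(CaseTarget.class, "foo") finds three methods while candidates(CaseTarget.class, "BAR") finds exactly one, so a bridge would need some rule for choosing among the three.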

Overloaded Methods
PHP does not support overloading of constructors or methods. Java supports both. Overloading in Java means two or more methods or constructors on the same class that have the same name but different argument types. Overloading on the return type alone is not allowed.
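As a small illustration of what a bridge sees, the sketch below declares three overloads and lists them through reflection. Greeter, OverloadListing and signatures() are hypothetical names chosen for this example only.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class with three overloads of the same method name.
class Greeter {
    public String greet(String name) { return "Hello " + name; }
    public String greet(int times) { return "Hello x" + times; }
    public String greet(String name, int times) { return "Hello " + name + " x" + times; }
}

class OverloadListing {
    // Return the signatures of all methods declared on the class with the given name.
    static Set<String> signatures(Class<?> type, String name) {
        Set<String> result = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (!m.getName().equals(name)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            Class<?>[] params = m.getParameterTypes();
            for (int i = 0; i < params.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(params[i].getSimpleName());
            }
            result.add(sb.append(')').toString());
        }
        return result;
    }
}
```

Calling signatures(Greeter.class, "greet") yields greet(String), greet(String, int) and greet(int): one name, three distinct targets.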

The most difficult issue with overloading is selecting the right overload. This has to be done at runtime for reasons explained later.

Static and Instance Methods
PHP also allows instance methods on an object to be called statically. Consider the following code:

class Foo {
    function bar() { }
}

Foo::bar();

This runs fine albeit with the following strict coding notice to highlight the potential problem:

Strict Standards: Non-static method Foo::bar() should not be called statically in foo.php on line 7

In case you are wondering, suppose the method tries to access an instance field as follows:

class Foo {
    function bar() {
        $this->a = "Hello World!";
    }
}

Foo::bar();

Then a fatal error terminates the script with this output:

Fatal error: Using $this when not in object context in foo.php on line 4

In the context of the Java Bridge, we cannot call an instance method on a Java object in a static way because this is simply not possible in Java. We can, however, do the reverse, which is to allow a PHP script to call a static method through an instance of the object. As the Java API documentation for reflective invocation (Method.invoke) shows, the obj argument is ignored if the method is static.
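The behaviour of Method.invoke described above can be checked with a short sketch. Counter, StaticInvokeDemo and call() are made-up names for illustration; only java.lang.reflect.Method is the real API.

```java
import java.lang.reflect.Method;

// Hypothetical class with one static and one instance method.
class Counter {
    public static String describe() { return "static"; }
    public String name() { return "instance"; }
}

class StaticInvokeDemo {
    // Invoke a no-argument public method by name on the given target (which may be null).
    static Object call(Object target, Class<?> type, String methodName) {
        try {
            Method m = type.getMethod(methodName);
            // For a static method the target is ignored, so an instance or null both work.
            return m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Here call(new Counter(), Counter.class, "describe") and call(null, Counter.class, "describe") both return "static". Calling the instance method name() with a null target instead fails with a NullPointerException, which is the Java-side reason the bridge cannot mimic PHP's static call of an instance method.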

Dynamic Typing (Duck Typing)
In my experience, the biggest difficulty by far is with data typing. In PHP, variables have no declared data type. One minute a variable might be an integer and the next a string. In Java all variables have a fixed type, albeit with some latitude granted by polymorphism. Furthermore, PHP does an amazing job of coercing types to match the circumstances:

echo strlen(array(1,2,3));

In this snippet I'm passing an array to a function clearly expecting a string. PHP politely informs me via a notice that it had to do something strange:

Notice: Array to string conversion in foo.php on line 3

It then outputs 5. Why 5? Because it converted the array to the string "Array", which is five characters long.

Here is another simple example of type coercion:

echo abs("123456789");

In this snippet the abs function converts the string to an integer. 

So what does this mean? 
It means selecting the Java method to invoke is often the most difficult (and time-consuming) part of the Java Bridge. This is especially true for overloaded constructors and methods. Perhaps an argument is a string and no candidate Java method accepts a string argument. But what if the string can be coerced to an integer, as in the abs snippet above? The permutations are many and varied, and that's what makes bridging languages such an interesting problem.

The key thing is that we always follow PHP's example of being incredibly tolerant and, wherever possible, do the best thing we can. In some cases we simply cannot be absolutely sure we have chosen the right method or constructor. When that happens, we pick what we think is best and output a message saying the script needs to give us a better clue (for example, by using the JavaSignature class).
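One way to picture this kind of tolerant selection is a scoring scheme: an exact type match beats a coercion, and a coercion beats no match at all. The sketch below is only an assumption about how such a selector could look, limited to int and String parameters. Account, OverloadSelector and the scoring values are invented for this example and are not the Java Bridge's actual algorithm.

```java
import java.lang.reflect.Method;

// Hypothetical Java class exposed to PHP scripts.
class Account {
    public String deposit(int amount) { return "int:" + amount; }
    public String deposit(String memo) { return "String:" + memo; }
    public String withdraw(int amount) { return "int:" + amount; }
}

class OverloadSelector {
    // 2 = exact match, 1 = usable after coercion, 0 = incompatible (assumed scale).
    static int score(Class<?> param, Object arg) {
        if (param == String.class && arg instanceof String) return 2;
        if (param == int.class && arg instanceof Integer) return 2;
        // A short numeric string can be coerced to an int, much as PHP's abs("123") does.
        if (param == int.class && arg instanceof String
                && ((String) arg).matches("-?\\d{1,9}")) return 1;
        if (param == String.class && arg instanceof Integer) return 1;
        return 0;
    }

    static Object coerce(Class<?> param, Object arg) {
        if (param == int.class && arg instanceof String) return Integer.valueOf((String) arg);
        if (param == String.class && arg instanceof Integer) return arg.toString();
        return arg;
    }

    // Pick the best-scoring public method with a matching name and arity, then invoke it.
    static Object invoke(Object target, String name, Object... args) {
        Method best = null;
        int bestScore = -1;
        for (Method m : target.getClass().getMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (!m.getName().equalsIgnoreCase(name) || params.length != args.length) continue;
            int total = 0;
            boolean compatible = true;
            for (int i = 0; i < params.length; i++) {
                int s = score(params[i], args[i]);
                if (s == 0) { compatible = false; break; }
                total += s;
            }
            if (compatible && total > bestScore) { best = m; bestScore = total; }
        }
        if (best == null) throw new IllegalArgumentException("No matching method: " + name);
        Class<?>[] params = best.getParameterTypes();
        Object[] converted = new Object[args.length];
        for (int i = 0; i < args.length; i++) converted[i] = coerce(params[i], args[i]);
        try {
            return best.invoke(target, converted);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With these assumed classes, invoke(account, "deposit", "42") selects deposit(String) because the exact match wins, while invoke(account, "withdraw", "42") falls back to withdraw(int) by coercing the string. A real bridge would have to handle many more types and would also need to report the ambiguous cases.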

In many regards the Java Bridge acts much like a compiler does in selecting methods and fields. The big difference is that the Java Bridge does the selection at runtime based on the actual data types of the arguments, whereas a compiler does the selection at compile time based on their declared data types. And of course the Java Bridge has to be quick, really quick. Java's reflection API is much faster than it used to be, but it is still relatively slow to look up classes, methods and parameter types.
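The difference between the two kinds of selection shows up even in a tiny example. In the sketch below, Printer and DispatchDemo are hypothetical names, and runtimeChoice() is only a rough stand-in for what a bridge might do, not the real implementation.

```java
import java.lang.reflect.Method;

// Hypothetical class with two overloads that differ only in parameter type.
class Printer {
    public static String show(Object value) { return "Object"; }
    public static String show(String value) { return "String"; }
}

class DispatchDemo {
    // The compiler chooses the overload from the declared type of the variable.
    static String compileTimeChoice() {
        Object value = "hello";      // declared as Object, holds a String at runtime
        return Printer.show(value);  // binds to show(Object)
    }

    // A bridge-like lookup chooses from the runtime class of the argument instead.
    static String runtimeChoice(Object value) {
        try {
            Method m = Printer.class.getDeclaredMethod("show", value.getClass());
            return (String) m.invoke(null, value);
        } catch (NoSuchMethodException e) {
            // No overload for this exact runtime type: fall back to show(Object).
            return Printer.show(value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the same string value, compileTimeChoice() returns "Object" while runtimeChoice("hello") returns "String". The reflective path also pays for a method lookup on every call, which is where the speed concern comes from.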
