UPDATE: Facebook updated HipHop PHP to enable optimization of the C++ compiled code. The benchmark results were updated to show the performance of the optimized C++ code generated by HipHop PHP.
PHP is a compiled language since PHP 4.0
Native machine code PHP compilers
Non-native PHP compilers
PHP compiler building tips
PHP compiler benchmarks
Introduction
The introduction of HipHop, the PHP compiler from Facebook developers, sparked sudden interest in compiling PHP into code that executes faster.
Several PHP compilers have existed for many years, but the fact that Facebook itself is releasing a compiler makes this a very relevant matter for PHP developers, as Facebook is currently the busiest site in the world that is developed mostly in PHP.
PHP is a compiled language since PHP 4.0
The notion of what a compiler is seems to cause great confusion. Some people assume that a compiler is a program that converts source code in one language into an executable program. The definition of a compiler is actually broader than that.
A compiler is a program that transforms source code into another representation of the code. The target representation is often machine code, but it may as well be source code in another language or even in the same language.
PHP became a compiled language in the year 2000, when PHP 4 was released for the first time. Until version 3, PHP source code was parsed and executed right away by the PHP interpreter.
PHP 4 introduced the Zend engine. This engine splits the processing of PHP code into several phases. The first phase parses PHP source code and generates a binary representation of the PHP code known as Zend opcodes. Opcodes are sets of instructions similar to Java bytecodes. These opcodes are stored in memory. The second phase of Zend engine processing consists of executing the generated opcodes.
The Zend engine was built in such a way that right after the first phase, the opcodes may be stored in the server shared memory space. This is done by special PHP extensions known as opcode caching extensions. There are several PHP caching extensions, also known as PHP accelerator extensions.
The purpose of these extensions is to skip the initial compilation step. If a PHP script was previously compiled and stored in shared memory, next time the same script is executed, the caching extension just loads the compiled opcodes from the shared memory very quickly. This way PHP gains a lot of time by skipping the initial opcode compilation step.
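The effect of skipping the compilation phase can be illustrated outside PHP. The following is a minimal Python sketch, not the Zend Engine nor any real caching extension: it uses Python's built-in compile() as a stand-in for the opcode compilation phase and a plain dictionary as a stand-in for shared memory. The names OpcodeCache and run are hypothetical, and cache invalidation when a script changes is not modeled.

```python
class OpcodeCache:
    """Toy stand-in for an opcode caching extension (hypothetical name)."""

    def __init__(self):
        self._compiled = {}      # script name -> compiled code object
        self.compile_count = 0   # how many times phase one actually ran

    def run(self, name, source, variables=None):
        """Execute a script, compiling it only on the first request."""
        code = self._compiled.get(name)
        if code is None:
            # Phase one: parse the source and produce bytecode.
            code = compile(source, name, "exec")
            self._compiled[name] = code
            self.compile_count += 1
        # Phase two: execute the (possibly cached) bytecode.
        scope = dict(variables or {})
        exec(code, scope)
        return scope


cache = OpcodeCache()
script = "total = sum(range(n))"
first = cache.run("sum_script", script, {"n": 10})
second = cache.run("sum_script", script, {"n": 100})
print(first["total"], second["total"], cache.compile_count)
```

Both requests execute the same bytecode, but the compilation step runs only once, which is the saving that caching extensions provide.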
Keep in mind that, despite the name, accelerator extensions do not make compiled PHP opcodes actually execute faster. Any acceleration of the execution of PHP opcodes may be achieved with optimizer extensions. These are special extensions that analyze the compiled opcodes and rearrange them so that the same code executes faster.
This diagram shows the flow of PHP code compilation steps that happen when using the Zend Engine based PHP distributions.
Additionally, you may use PHP encoder extensions. These are PHP extensions that take the compiled PHP opcodes from memory and store them in files. These files are usually encoded in an encrypted format using a proprietary algorithm to prevent easy decoding.
These extensions allow distributing PHP code in a platform independent format without distributing the original source code. PHP encoder extensions are usually meant for selling proprietary PHP libraries.
Native machine code PHP compilers
Another form of PHP compiler is the kind that generates native machine code files. Those files contain machine code that is executed by the target machine CPU. This type of compiler is more recent. There are a few native PHP compilers. Let me give you an overview of the native PHP compilers that were evaluated.
Roadsend developed and released the first native PHP compiler, known as pcc. Its development started in 2002. Initially it was launched as a commercial product, but in 2007 it was turned into an Open Source project.
The original version used a Scheme language compiler named Bigloo to generate native executable code. It can generate either standalone binaries or extensions for the Apache Web server. The resulting binaries or Web server extensions replace the code of whole PHP applications. The code generated by Roadsend PHP does not use any code or runtime libraries from the Zend Engine.
Roadsend PHP also provides its own Web server, so you can generate standalone Web server executables without relying on Apache or any other Web server.
Its runtime engine also comes with an interpreter that can execute dynamically loaded or dynamically generated PHP code during the execution of a compiled PHP script. This way it can support mixing compiled with non-compiled PHP scripts.
In late 2008, Roadsend PHP developers started the Raven project, also known as rphp. It is basically a rewrite of the original PHP compiler in C++, using LLVM as the code generator.
PHC was developed at Trinity College Dublin.
PHC can generate a PHP extension with the code of a compiled PHP script. Alternatively, it can also generate standalone binary executables by linking against the PHP embed SAPI.
HipHop transforms PHP source code into C++ code. The resulting C++ code can then be compiled into native binary executable code with a C++ compiler, such as GCC.
HipHop was developed to make more efficient use of the Facebook servers. Currently, over 90% of Facebook Web traffic is handled by PHP code compiled by the HipHop compiler.
HipHop compiles the code of a set of PHP scripts into an executable program that works as a multi-threaded Web server. This way, the compiled PHP code not only runs faster but also makes more efficient use of server machine memory.
A multi-threaded Web server uses less memory because it uses a single memory pool for all simultaneous requests. This is different from using multi-process Web servers, like when you use Apache in pre-fork mode. This is the mode that most Apache based PHP sites use. In this mode, each request is handled by a separate OS process. Each process has its own memory pool.
The main problem of using multi-process Web servers is that when a process uses a large chunk of memory, that memory is not returned to the OS until the process exits. Even if subsequent requests handled by the same process do not need so much memory, the unused memory space cannot be reused by other processes that may be handling other simultaneous requests.
Apache can be forced to kill pre-forked processes once in a while in order to perform memory recycling, but there is always a waste of memory until the process exit happens.
The memory usage details are very important for sites that require more than one server machine. The memory used by the Web server processes determines how many simultaneous requests each Web server machine can handle.
For instance, if you have a Web server with 1GB of RAM and each PHP request takes 10MB of RAM, in theory it can handle fewer than 100 simultaneous requests. If all programs running on the same machine use more than the physically available RAM, the OS has to use virtual memory and the machine starts slowing down, as it has to swap physical memory blocks with virtual memory segments on disk.
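The arithmetic behind that estimate can be written down as a small Python sketch. It assumes every request uses the same amount of memory, and the function name and the reserved_mb parameter (memory set aside for the OS and other programs) are hypothetical, introduced only for illustration.

```python
def max_simultaneous_requests(total_ram_mb, per_request_mb, reserved_mb=0):
    """Upper bound on requests that fit in physical RAM without swapping."""
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    available_mb = max(total_ram_mb - reserved_mb, 0)
    return available_mb // per_request_mb


# 1 GB taken as 1000 MB for round numbers, 10 MB per request.
theoretical = max_simultaneous_requests(1000, 10)
# Hypothetical 200 MB reserved for the OS and other programs.
realistic = max_simultaneous_requests(1000, 10, reserved_mb=200)
print(theoretical, realistic)
```

Any memory used by anything other than PHP requests pushes the real limit below the theoretical ceiling, which is why the practical number is lower than 100.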
This is why a server that takes an excessive number of simultaneous requests can be halted, causing what is known as a DoS: Denial of Service. With Apache, you can use the configuration directive MaxClients to limit the number of simultaneous requests. That will prevent the machine from halting, but the Web server starts queuing incoming requests. This means that the requests may be handled with a great delay or even be ignored.
The new aspect of the HipHop PHP compiler is that Facebook developers have made an effort to convert PHP extensions in such a way that they are thread-safe. This means that they can run without crashing multi-threaded Web server programs, which is what the HipHop compiler generates.
Non-native PHP compilers
There are also non-native PHP compilers that transform PHP code into Java bytecode, like Quercus and Project Zero, or into .NET assemblies, like Phalanger. However, the performance of these compilers was not evaluated for this article.
PHP compiler building tips
Before moving to the actual results, I suppose you may want to try building these compilers yourself to run your own performance tests. Here are some tips to help you build them while avoiding some of the trouble.
Zend Server
Zend Server is available in precompiled packages, which are very easy to install and configure. So, there are no specific building tips for it. The only detail worth noting is that the version with PHP 5.3 crashed in the evaluation environment. So I just installed the version that comes with PHP 5.2.11.
PHP as static Apache module
PHP as a static module of Apache is very easy to build from source in most Linux distributions. First you need to download and unpack the Apache 1.3 source code and run its initial configure script. Then download and unpack the PHP source code and follow the usual ./configure; make; make install procedure. Finally, go back to the Apache source directory to make and install it.
You can use Zend Optimizer from the Zend Server distribution when running PHP as an Apache module. So you do not have to perform any additional installations to use Zend Optimizer. You can also copy php.ini and other configuration files from the Zend Server installation.
PHC
PHC is also very easy to build from source in any Linux distribution. Just follow the regular ./configure; make; make install steps. You may need to disable the requirement of the libgc library, as it may crash the PHC executable in a 64 bit environment.
Also keep in mind that if you want to build standalone PHP executables, you also have to build PHP with the embed SAPI from source. Just download and install the regular PHP source archives using the --enable-embed parameter of the configure command.
HipHop PHP
The HipHop PHP source code was released to the public just a few days ago. It was very hard to build. It only builds well on some 64 bit Linux distributions. It requires building specific versions of certain libraries that are usually not available in all Linux distributions.
It takes a lot of time, patience and knowledge to build HipHop PHP. Also, many bugs are being fixed based on the feedback of the PHP community. If you do not have much time or enough knowledge to build packages from source code, it is recommended that you wait until binary packages are made available, so you can test it yourself with much less time and patience.
If you insist, be ready to spend a lot of time building all that is necessary. Preferably, try building it on Ubuntu, Fedora or CentOS.
PHP compiler benchmarks
Several tests were conducted to evaluate the performance of PHP code using different compiler environments. A script named bench.php was used for the actual benchmark evaluation. It is available from the official PHP development repository.
This script only performs tasks that do not execute any I/O operations, like accessing files, networks or database servers. Although real world PHP scripts often need to perform I/O operations, the bench.php script is ideal to evaluate the raw performance of compiled code, as the speed of the compiled code only depends on the compiler approach and any PHP runtime libraries that may be needed.
The method used was to execute the bench.php script 5 times for each type of PHP compiler environment that was evaluated. Only the shortest time that the script took to execute was considered.
The reason for this approach is that the benchmark was performed on Linux. Since Linux is a preemptive multi-tasking environment, each time the script executes, it is affected by other processes competing for the CPU on the machine. The shortest execution time should be the closest to the time the benchmarked program would take if it had the CPU all to itself.
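The best-of-5 method can be sketched in a few lines of Python. This is only an illustration of the measurement approach, not the harness used for the article: the function names best_time and cpu_bound_task are hypothetical, and the workload is a stand-in for bench.php.

```python
import time


def best_time(task, runs=5):
    """Run task several times; return the shortest time and all timings."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return min(timings), timings


def cpu_bound_task():
    # Stand-in workload with no I/O; the real benchmark ran bench.php.
    return sum(i * i for i in range(200_000))


best, all_timings = best_time(cpu_bound_task)
print(f"best of {len(all_timings)} runs: {best:.4f}s")
```

Taking the minimum rather than the average discards the runs that were slowed down by other processes, since interference from the rest of the system can only add time to a run.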
Below follows the table of results for the different PHP compiler environments that were evaluated.
|Compiler environment|Name of the compiler environment that was evaluated|
|Time|Best time it took to execute from the 5 times it was evaluated|
|Relative speed|Speed improvement in percentage relative to the reference PHP environment, which is using a shared library based Web server, Zend Server in this case.|
|PHP static|PHP 5.2.11 compiled as static module of Apache 1.3 Web server|
|PHP static + Zend Optimizer|Same as PHP static but using Zend Optimizer shipped with Zend Server Community Edition 4.0.6|
|Zend Server|PHP 5.2.11 linked dynamically as module of Apache shipped with Zend Server Community Edition 4.0.6|
|Zend Server + Zend Optimizer|Same as Zend Server but with Zend Optimizer extension enabled|
|PHC|PHP compiled as standalone executable with PHC|
|PHC optimized|Same as PHC but optimized with option -O3|
|HipHop|PHP compiled by the Facebook HipHop PHP compiler|

|Compiler environment|Time|Relative speed|
|Zend Server + Zend Optimizer|5.649|1.28%|
|PHP static + Zend Optimizer|5.427|5.16%|
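The article does not spell out how the relative speed percentage is computed. One plausible reading, assumed here, is the share of the reference environment's time that the evaluated environment saves. The Python sketch below uses that assumed formula with hypothetical timings that are not taken from the results; the function name relative_speed is also hypothetical.

```python
def relative_speed(reference_time, measured_time):
    """Percentage of the reference time saved by the measured environment.

    Assumed formula: (reference - measured) / reference * 100.
    """
    if reference_time <= 0:
        raise ValueError("reference_time must be positive")
    return (reference_time - measured_time) / reference_time * 100.0


# Hypothetical timings in seconds, not taken from the results table.
faster = relative_speed(10.0, 8.0)    # environment that saves time
slower = relative_speed(10.0, 10.5)   # environment slower than the reference
print(round(faster, 2), round(slower, 2))
```

Under this reading, a positive value means the environment finished sooner than the reference, and a negative value means it took longer.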
The best performers
As you may see above, the fastest code was generated by the PHC and HipHop PHP compilers. Currently, the main difference between these compilers is that HipHop generates whole Web server executables and PHC generates PHP extensions that can be used in a regular PHP installation based on the Zend Engine.
This means that PHC not only generates very fast code, but can also be used to distribute PHP scripts as proprietary extensions, while the HipHop compiler generates monolithic PHP application executables meant only for final deployment.
PHC compiled code only seems to perform well when enabling optimizer options. HipHop PHP also has optimizer level options but no big difference was noted when trying different optimizer level values.
PHP static beats PHP dynamically loaded
Next you may see that PHP compiled as a static Web server module gives better results than dynamically loaded PHP.
The apparent reason why it runs faster than PHP with the Zend Server Community Edition is that, in Zend Server, PHP is compiled as a dynamically linked module of Apache 2.2. This was believed to make PHP run slower than when it is compiled and linked statically with Apache 1.3. This was noted by Ilia Alshanetsky, a PHP contributor, in an article that compares PHP running in Apache versions 1 and 2.
However, that is an old article. So I contacted Ilia recently, asking if PHP still suffers from performance degradation when compiled as a dynamically linked Web server module. Ilia told me that it was a problem in the way PHP symbols are exported. He also said that the problem was fixed some time ago.
So the current reason for the difference in performance between static and dynamically linked modules is uncertain. Maybe the problem above was not completely fixed, or it was fixed in PHP 5.3, which was not evaluated in these tests. The fact is that statically linked PHP still runs faster.
Unfortunately, many hosting providers are not aware of these details and end up using the PHP installations of Linux distributions, which are based on dynamically linked libraries. This means that PHP will necessarily run slower.
If you control your own Web servers, it is better to compile PHP yourself and link it statically against Apache 1.3. That will give you a good speed improvement without needing to change your PHP scripts.
APC does not make PHP code run faster
Several people who evaluated these results earlier asked whether PHP static and Zend Server were executed with the APC opcode caching extension enabled. The fact is that the use of opcode caching extensions, such as APC, is irrelevant in the evaluation of the performance of the compiled code.
By the time PHP starts executing the benchmark script, the opcodes have already been compiled and loaded in memory. Caching extensions act before the actual execution, so they do not influence the speed of execution of the PHP code.
Opcode optimizers can improve results a bit
It is also interesting that Zend Optimizer provides a slight improvement to the speed of execution of PHP scripts. This means that it can rewrite the PHP opcodes generated by the Zend Engine opcode compiler in a way that makes them execute faster. Over time, the optimizations that Zend Optimizer performs may end up being integrated in the regular PHP distribution, so Zend Optimizer may no longer be necessary.
Anyway, keep in mind that using an optimizer like Zend's just by itself may not be a good idea, unless you are also using a compatible opcode caching extension. The reason is that the optimizer takes some time to evaluate and rewrite the opcodes. If you do not use an opcode caching extension, the optimization step may slow down PHP compilation by more than what you gain in the execution of the optimized opcodes.
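The trade-off between optimization cost and execution gain can be made concrete with a simple cost model. The Python sketch below is an illustration under stated assumptions, not a measurement: all the millisecond figures are hypothetical, the function name total_ms is invented for this example, and the model assumes that a cache lets compilation and optimization be paid only once.

```python
def total_ms(requests, compile_ms, optimize_ms, run_ms, cached):
    """Total time to serve a number of requests to one script."""
    # With a cache, compile and optimize happen once; without it,
    # they are repeated on every request.
    setups = 1 if cached else requests
    return setups * (compile_ms + optimize_ms) + requests * run_ms


# Hypothetical costs: compiling takes 4 ms, optimizing takes 3 ms,
# and optimization cuts execution from 20 ms to 19 ms.
plain = total_ms(1000, 4, 0, 20, cached=False)
optimized = total_ms(1000, 4, 3, 19, cached=False)
optimized_cached = total_ms(1000, 4, 3, 19, cached=True)
print(plain, optimized, optimized_cached)
```

With these made-up numbers, optimizing without a cache is slower overall than not optimizing at all, because the 3 ms optimization cost is paid on every request while only 1 ms is gained. Once the optimized opcodes are cached, the optimization cost is paid once and the gain applies to every request.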
Roadsend PHP was not evaluated
Unfortunately, it was not possible to fully evaluate the Roadsend pcc compiler. A bug in the current version prevents the whole bench.php script from running to completion. Using the pcc interpreter mode, the bench.php script runs, but the interpreter performs poorly, so its results are not useful for making a fair comparison.
Anyway, the pcc compiler is no longer being actively developed. It will be replaced by the Raven project compiler, rphp. This compiler uses C++ and will borrow some code from the PHC project. However, since it is still being developed, it was not possible to evaluate it now.
Object Oriented code was not evaluated
Other than that, not all compilers support all PHP language constructs. To make a fair comparison, PHP 5.2.11 was used to avoid distortions caused by the use of PHP versions that provide different levels of PHP execution efficiency.
The bench.php script itself only uses procedural code, so support for object oriented programming features in each compiler was not evaluated. Anyway, since this article was meant just to evaluate raw performance, the level of compatibility with the official PHP distribution provided by each compiler was not evaluated either.
I/O operations may eliminate compiled code execution gains
To conclude, the fact that some compilers generate code that runs 79% and 35% faster does not mean that it will proportionally reduce the number of servers that busy sites need to handle many simultaneous requests.
As mentioned above, PHP spends most of the time performing I/O operations, like accessing files, network and databases. What really matters in determining how many simultaneous users a Web server can handle is the amount of RAM that the Web server takes on average to handle simultaneous requests.
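A rough calculation shows why a large CPU speedup shrinks when most of a request is I/O. The Python sketch below assumes that a gain such as 79% means 79% of the CPU time is saved, and that I/O time is unaffected by compilation. The CPU shares of 20% and 100% are hypothetical, as is the function name overall_time_saved.

```python
def overall_time_saved(cpu_fraction, cpu_time_saved):
    """Fraction of total request time saved when only the CPU part speeds up."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    return cpu_fraction * cpu_time_saved


# Hypothetical request that spends 20% of its time on CPU work and
# the rest waiting on files, network or databases.
mostly_io = overall_time_saved(0.20, 0.79)
# Pure CPU workload, like the bench.php script.
pure_cpu = overall_time_saved(1.00, 0.79)
print(f"{mostly_io:.1%} vs {pure_cpu:.1%}")
```

Under these assumptions, the same compiler gain that removes most of the run time of a pure CPU benchmark removes only a small part of the time of an I/O bound request.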
That said, compiling PHP code to make pure CPU tasks (non-I/O dependent) be finished faster is probably not what you need to do first to optimize your sites. You may want to take a look at this past article that provides several tips on making your PHP sites handle high traffic loads.
Which compiler is best for who?
Given the above, the most important factor for very busy sites that need many servers to handle all user requests is the memory usage efficiency provided by compilers that generate thread safe code that can be linked statically to a Web server. Currently only the HipHop compiler seems to provide this feature.
On the other hand, if you have sites that are not so busy, the PHC approach seems to be a much better one, as it provides a greater increase in execution speed while generating PHP extensions that can be run with a regular Zend Engine based PHP installation.
The Roadsend rphp compiler also seems promising in terms of performance, especially given that it is borrowing code from the PHC compiler optimizer engine. It will also feature a JIT (Just In Time) compiler, so both static and dynamically loaded or generated code will run fully optimized and compiled to native machine code. rphp is also targeting PHP 6 language features and already has Unicode support.
Since the Roadsend PHP approach is different, for now it only supports a limited number of PHP extensions, while PHC can be used with any of the extensions available in the current PHP installation.
Cooperation between projects will benefit the PHP community
As you may have noticed, there is no single approach that is ideal for all types of PHP deployments. I think it would probably be better to merge the efforts and combine the strengths of each approach for the benefit of a wider PHP community. But that cooperation effort depends mostly on the will of the developers responsible for each approach.
Thank you to some special collaborators
Before ending this article, I would like to mention that it is the result of several weeks of research. However, it would not have been possible without the help of several PHP developers, who mainly helped with building and testing PHP in the different PHP compiler environments that were evaluated.
I will just mention the most important ones, in no particular order: Cesar Rodas, who provided a server machine with a CentOS 64 bit distribution and helped build many of the tested packages; Paul Biggar of the PHC project; Shannon Weyrick of the Roadsend project; Haiping Zhao, HipHop project lead from Facebook; Scott MacVicar, a PHP core developer who is now working for Facebook and led the HipHop project Open Source release; and many others whose real names I do not know but who helped with tips shared on IRC.