distcc's pump mode: A New Design for Distributed C/C++ Compilation
Thursday, August 7, 2008
For a while now, Google has been using distcc, a distributed C/C++ compilation system, to speed up building software made of millions of lines of code. With distcc, we can build code an order of magnitude faster than we could if everyone had to compile on their own workstation. But even with distcc, compiles could take a long time: compiling the Google Webserver might take 20 minutes. We started looking at distcc to see if we could make it even faster.
We're proud to report that we've succeeded: we've developed an algorithm we call "pump mode", which can be added to distcc to speed it up by a factor of 3. Pump mode works by pushing even more processing onto the servers. Based on an incremental static analysis of the source code, pump mode is able to quickly identify the sets of files needed for the preprocessing phase of compiling C/C++ programs and send them to the compilation servers for preprocessing. This achieves a dramatic decrease in the CPU load of the workstation and of course much better build speed. We have tested pump mode on some open source software and seen improvements in build speed between 50% (the Linux kernel) and 200% (Samba). With simple changes to the project Makefiles, most projects we have looked at would be even faster!
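To make the idea of "identifying the set of files needed for preprocessing" concrete, here is a minimal sketch in Python. It is not distcc's actual implementation: the function name `include_closure` and the in-memory `SOURCES` project are hypothetical, and the sketch assumes that only quoted `#include "..."` lines matter. It ignores conditional compilation, computed includes, and include search paths, all of which a real analysis has to handle.

```python
import re

# Matches lines such as: #include "foo.h"
# Angle-bracket includes (e.g. <stdio.h>) are deliberately ignored in this sketch.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_closure(root, sources):
    """Return the set of files reachable from `root` via quoted #include lines.

    `sources` maps a file name to its text (an assumption made so the example
    is self-contained); names that are not in `sources` are skipped.
    """
    needed = set()
    pending = [root]
    while pending:
        name = pending.pop()
        if name in needed or name not in sources:
            continue
        needed.add(name)
        # Queue every header this file includes directly.
        pending.extend(INCLUDE_RE.findall(sources[name]))
    return needed


# Hypothetical four-file project; unused.h is never included by anything.
SOURCES = {
    "main.c": '#include "util.h"\n#include <stdio.h>\nint main(void) { return 0; }\n',
    "util.h": '#include "types.h"\nint helper(void);\n',
    "types.h": "typedef int id_t;\n",
    "unused.h": "int never_included;\n",
}

files_to_send = include_closure("main.c", SOURCES)
```

In this toy project, `files_to_send` contains `main.c`, `util.h`, and `types.h`, but not `unused.h`. The point of the sketch is that only the source file and the headers it can reach need to travel to the compilation server, which can then run the preprocessor itself.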
The pump mode extension has been Google's main C/C++ build system for over a year now.
Distcc's pump mode was developed by a small team at Google that included me, Manos Renieris, Fergus Henderson, and Craig Silverstein. The pump mode extension complements the recently released open source gold linker, which addresses the other basic bottleneck for fast building of C/C++ software.
Distcc's pump mode is included in release 3.0 of distcc. This is the first release since 2004, when Martin Pool, the original author of the code base, released version 2.18.3. Distcc 3.0 contains many other contributions from a variety of contributors, including Avahi Zeroconf support by Lennart Poettering, "lsdistcc" by Dan Kegel, and bug fixes and portability improvements by Nadim Khemir, Maks Verver, Niklaus Giger, Sascha Demetrio, Alex Besogonov, Ben Skeggs, Lisa Seelye, Lei Zhang, Michael Moss, Dongmin Zhang, and others. Distcc is now maintained by Fergus Henderson.