XRay: a function call tracing system

Tuesday, May 3, 2016

At Google we spend a lot of time debugging and tuning the performance of our production systems. Standard practices for this include using profilers and debuggers, and analyzing logs and execution traces. Doing this at scale, in production, is difficult. One way to get high-fidelity data from production systems is to build applications with instrumentation, and then reconstruct the instrumentation data into a form humans can consume (summary statistics, reports, etc.). Instrumentation comes at a cost, though, and that cost is sometimes too high to make deploying it in production feasible.

Getting this balance right is hard. This is why we've developed XRay, a function call tracing system that has very little overhead when not enabled and imposes only moderate costs once it is dynamically turned on. XRay combines compiler-inserted instrumentation points that functionally do nothing (called "nop sleds") with a library that can be enabled and disabled at runtime. When enabled, the library replaces the nop sleds with the appropriate instrumentation instructions.
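To make the idea concrete, here is a minimal C++ sketch of instrumentation points that can be switched on and off at runtime. It is a portable stand-in, not XRay itself: it does not patch machine code, so each disabled point still costs an atomic load and a branch, whereas a real nop sled costs only the nops. Every name in it (handler_slot, InstrumentationPoint, enable_tracing, and so on) is hypothetical and chosen for illustration only.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// All names here are hypothetical; none of them are XRay's actual API.
enum class EventKind { Entry, Exit };

struct TraceEvent {
  std::int32_t function_id;
  EventKind kind;
};

using Handler = void (*)(std::int32_t function_id, EventKind kind);

// The currently installed handler. Null means "tracing disabled".
inline std::atomic<Handler>& handler_slot() {
  static std::atomic<Handler> slot{nullptr};
  return slot;
}

// In-memory event log. Not thread-safe; kept simple for the sketch.
inline std::vector<TraceEvent>& trace_log() {
  static std::vector<TraceEvent> log;
  return log;
}

inline void record_event(std::int32_t function_id, EventKind kind) {
  trace_log().push_back({function_id, kind});
}

// Runtime switch, playing the role of patching and unpatching the sleds.
inline void enable_tracing() {
  handler_slot().store(&record_event, std::memory_order_release);
}
inline void disable_tracing() {
  handler_slot().store(nullptr, std::memory_order_release);
}

// RAII guard standing in for the entry and exit points that a compiler
// would insert into each instrumented function.
class InstrumentationPoint {
 public:
  explicit InstrumentationPoint(std::int32_t function_id)
      : function_id_(function_id),
        handler_(handler_slot().load(std::memory_order_acquire)) {
    if (handler_) handler_(function_id_, EventKind::Entry);
  }
  ~InstrumentationPoint() {
    // Use the handler captured at entry so every Entry has a matching Exit.
    if (handler_) handler_(function_id_, EventKind::Exit);
  }
  InstrumentationPoint(const InstrumentationPoint&) = delete;
  InstrumentationPoint& operator=(const InstrumentationPoint&) = delete;

 private:
  std::int32_t function_id_;
  Handler handler_;
};

// An "instrumented" function: the guard is the only added code.
inline int square(int x) {
  InstrumentationPoint point(1);  // 1 is an arbitrary function id
  return x * x;
}
```

With tracing disabled, calling square records nothing. After enable_tracing(), each call appends one Entry and one Exit event to trace_log(), and disable_tracing() returns the function to its quiet state without rebuilding or restarting the program.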

We've been using XRay to debug internal systems, from core infrastructure services like Bigtable to ad serving systems. XRay's detailed function tracing has enabled several teams at Google to debug issues that would have been very hard to solve without it.

We think XRay is an important piece of technology, not only at Google, but for developers around the world. That is why we're working on making XRay open source. To kick-start that process, we're releasing a white paper describing the technical details of XRay. In the coming weeks, we will be engaging the LLVM community, where we are committed to making XRay available for wide use and distribution.

We hope that by open-sourcing XRay we can contribute to the advancement of debugging real-world applications. We're looking forward to working with the LLVM community and other projects to make the data XRay generates useful for debugging a wide variety of applications.

By Dean Michael Berris, Google Engineering