opensource.google.com


Advanced TPU optimization with XProf: Continuous profiling, utilization insights, and LLO bundles

Monday, March 23, 2026


In our previous post, we introduced the updated XProf and the Cloud Diagnostics XProf library, which are designed to help developers identify model bottlenecks and optimize memory usage. As machine learning workloads on TPUs continue to grow in complexity—spanning both massive training runs and large-scale inference—developers require even deeper visibility into how their code interacts with the underlying hardware.
Today, we are exploring three advanced capabilities designed to provide "flight recorder" visibility and instruction-level insights: Continuous Profiling Snapshots, the Utilization Viewer, and LLO Bundle Visualization.

Continuous Profiling Snapshots: The "Flight Recorder" for ML

Standard profiling often relies on "sampling mode," where users manually trigger high-fidelity traces for short, predefined durations. While effective for general optimization, this traditional approach can miss transient anomalies, intermittent stragglers, or unexpected performance regressions that occur during long-running training jobs.
To address this visibility gap, XProf is introducing Continuous Profiling Snapshots. This feature functions as an "always-on" flight recorder for your TPU workloads.
How it works: Continuous Profiling Snapshots (Google Colab) operate quietly in the background with minimal system overhead (approximately 7µs of CPU overhead per packet). The feature uses a host-side circular buffer of roughly 2GB to retain the last ~90 seconds of performance data. This architecture allows developers to snapshot performance data programmatically, precisely when an anomaly occurs, bypassing the overhead and unpredictability of traditional one-shot profiling.
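The circular-buffer mechanics described above can be sketched in a few lines of Python. This is a toy illustration of the "flight recorder" concept, not the XProf implementation; the class name, event format, and capacity are hypothetical:

```python
from collections import deque
import time

class FlightRecorder:
    """Toy circular trace buffer: keeps only the most recent events,
    so a snapshot always contains the moments leading up to an anomaly."""

    def __init__(self, capacity=1000):
        # A deque with maxlen evicts the oldest entry automatically,
        # mimicking the host-side circular buffer.
        self.buffer = deque(maxlen=capacity)

    def record(self, event):
        # Tag each event with a monotonic timestamp as it arrives.
        self.buffer.append((time.monotonic(), event))

    def snapshot(self):
        # Freeze the current window, e.g. when an anomaly is detected.
        return list(self.buffer)

rec = FlightRecorder(capacity=3)
for step in range(5):
    rec.record(f"step-{step}")

# Only the most recent 3 events survive: step-2, step-3, step-4.
print([e for _, e in rec.snapshot()])
```

The key property is that recording is cheap and unconditional, while the expensive decision of what to keep is deferred to snapshot time.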

A diagram illustrating the limitation of traditional trace capturing, where a transient performance anomaly is missed because the trace capture was manually triggered before or after the anomaly occurred.
Figure 1: Traditional trace capturing without Continuous Profiling Snapshots.

A diagram showing how Continuous Profiling Snapshots capture comprehensive context. A performance anomaly occurs, and the 'always-on' circular buffer allows the user to snapshot the performance data, capturing the anomaly and the preceding 90 seconds of context.
Figure 2: Comprehensive context captured via Continuous Profiling Snapshots.

Key technical features include:

  • Circular Buffer Management: Continuously holds recent trace data to ensure you can capture the exact moments leading up to an anomaly or regression.
  • Out-of-band State Tracking: A lightweight service polls hardware registers for P-state (voltage and frequency) and trace-drop counters, ensuring the snapshot contains the necessary environmental context for accurate analysis.
  • Context Reconstruction: The system safely decouples state capture from the trace stream. This ensures that any arbitrary snapshot retains the ground truth required for precise, actionable debugging.

Visualizing Hardware Efficiency with the Utilization Viewer

Raw performance counters are powerful, but interpreting thousands of raw hardware metrics can be a daunting, time-consuming process. The new Utilization Viewer bridges the gap between raw data streams and actionable optimization strategies.
This tool translates raw performance counter values into easily understandable utilization percentages for specific hardware components, such as the TensorCore (TC), SparseCore (SC), and High Bandwidth Memory (HBM).

A screenshot or visualization of raw performance counter data, presented as a long, detailed list of thousands of uninterpreted hardware metrics and event counts.
Figure 3: Deriving actionable insights from raw performance counters.

From Counters to Insights: Instead of requiring developers to manually analyze a raw list of event counts, the Utilization Viewer automatically derives high-level metrics. For example, it can translate raw bus activity into a clear utilization percentage (e.g., displaying an average MXU bus utilization of 7.3%). This immediate clarity allows you to determine at a glance whether your model is compute-bound or memory-bound.
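The derivation from raw counters to a utilization percentage is conceptually simple, which is exactly why automating it across thousands of counters pays off. A minimal sketch (the function and counter names are illustrative, not the Utilization Viewer's internals):

```python
def utilization_pct(busy_cycles, total_cycles):
    """Translate a raw busy-cycle counter into a utilization percentage
    over a sampling window."""
    if total_cycles == 0:
        # An empty window means the unit reported no activity at all.
        return 0.0
    return 100.0 * busy_cycles / total_cycles

# Example: 73 busy cycles observed over a 1,000-cycle window
# corresponds to 7.3% average bus utilization.
print(utilization_pct(73, 1000))  # 7.3
```

Comparing such percentages across units (TC vs. HBM, for instance) is what lets you judge at a glance whether a workload is compute-bound or memory-bound.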

A visualization from the Utilization Viewer showing automatically derived high-level metrics, displaying clear utilization percentages for key hardware components like TensorCore (TC), SparseCore (SC), and High Bandwidth Memory (HBM), to help determine if a model is compute-bound or memory-bound.
Figure 4: Perf Counters Visualization in Utilization Viewer

Inspecting the Metal: Low-Level Operations (LLO) Bundles

For advanced users and kernel developers utilizing Pallas, we are now exposing Low-Level Operations (LLO) bundle data. LLO bundles represent the specific machine instructions issued to the TPU's functional units during every clock cycle.

This feature is critical for "Instruction Scheduling" verification—ensuring that the compiler is honoring your programming intentions and correctly re-ordering instructions to maximize hardware performance.

New Visualizations via Trace View Integration: You can now visualize LLO bundles directly within the trace viewer. Through dynamic instrumentation, XProf inserts traces exactly when a bundle executes. This provides exact execution times and block utilization metrics, rather than relying on static compiler estimates.

Why it matters: Accessing this level of granularity enables hyper-specific bottleneck analysis. For instance, developers can now identify idle cycles within the Matrix Multiplication Unit (MXU) pipeline, making it easier to spot and resolve latency between vmatmul and vpop instructions.
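The kind of idle-cycle analysis described above can be sketched as a scan over per-unit instruction timelines. This is a simplified illustration of the idea, assuming hypothetical `(start_cycle, end_cycle, instruction)` tuples rather than the actual LLO bundle format:

```python
def find_idle_gaps(bundle_events, min_gap_cycles=1):
    """Given (start_cycle, end_cycle, instruction) tuples for one
    functional unit, report idle gaps between consecutive instructions --
    the kind of pipeline bubble LLO bundle visualization surfaces."""
    events = sorted(bundle_events)
    gaps = []
    for (_, end_a, op_a), (start_b, _, op_b) in zip(events, events[1:]):
        gap = start_b - end_a
        if gap >= min_gap_cycles:
            gaps.append((op_a, op_b, gap))
    return gaps

# Hypothetical MXU timeline: 4 idle cycles between vmatmul and vpop.
trace = [(0, 8, "vmatmul"), (12, 13, "vpop")]
print(find_idle_gaps(trace))  # [('vmatmul', 'vpop', 4)]
```

Because XProf records exact execution times rather than static compiler estimates, gaps like this reflect what the hardware actually did on a given run.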

Conclusion

Whether you are trying to capture a fleeting performance regression with Continuous Profiling, verifying kernel efficiency with LLO Bundles, or assessing overall hardware saturation with the Utilization Viewer, these new features bring internal-grade Google tooling directly to the open-source community. These tools are engineered to provide the absolute transparency required to optimize high-scale ML workloads.

Get started by checking out the updated resources.

Open Source, Open Doors, Apply Now for Google Summer of Code!

Monday, March 16, 2026

Join Google Summer of Code (GSoC) and start contributing to the world of open source development! Applications for GSoC are open from now until March 31, 2026 at 18:00 UTC.

Google Summer of Code is celebrating its 22nd year in 2026! GSoC started back in 2005 and has brought over 22,000 new contributors from 123 countries into the open source community. This is an exciting opportunity for students and open source beginners (18+) to gain real-world experience during the summer. You will spend 12+ weeks coding, learning about open source development, and earning a stipend under the guidance of experienced mentors.

Apply and get started!

Please remember that mentors are volunteers and they are being inundated with hundreds of requests from interested participants. It may take time for them to respond to you. Follow their Contributor Guidance instructions exactly. Do not just start submitting PRs without reading their guidance section first.

Complete your registration and submit your project proposals on the GSoC site before the deadline on Tuesday, March 31, 2026 at 18:00 UTC.

We wish all our applicants the best of luck!

OpenTitan shipping in production

Wednesday, March 4, 2026

Last year, we shared the exciting news that fabrication of production OpenTitan silicon had begun. Today, we're proud to announce that OpenTitan® is now shipping in commercially available Chromebooks.

The first OpenTitan part is being produced by Nuvoton, a leader in silicon security.

a close up of a blue circuit board focused on an IC

What is OpenTitan?

Over the past seven years, Google has worked with the open source communities to build OpenTitan, the first open source silicon Root of Trust (RoT). The RoT is the foundation upon which all other security properties of a device are derived, and anchoring this in silicon provides the strongest possible security guarantees that the code being executed is authorized and verified.

The OpenTitan project and its community are actively supported and maintained by lowRISC C.I.C., an independent non-profit.

OpenTitan provides the community with a high-quality, low-cost, commoditized hardware RoT that can be used across the Google ecosystem and also facilitates the broader adoption of Google-endorsed security features across the industry. Because OpenTitan is open source, you can choose to buy it from a commercial partner or manufacture it yourself based on your use case. In any of these scenarios, you can review and test OpenTitan's capabilities with a degree of transparency never afforded before in security silicon. This allows optimization for the use case at hand, whether it is having multiple reliable suppliers or ensuring the complete end-to-end control of the manufacturing process.

With OpenTitan, we are pushing the boundaries of what can be expected from a silicon RoT. For example, OpenTitan is the first commercially available open source RoT to support post-quantum cryptography (PQC) secure boot based on SLH-DSA. This helps future proof the security posture of these devices against potential adversaries with the capability to break classical public-key cryptography (e.g., RSA) via quantum computing. In addition, by applying commercial-grade design verification (DV) and top-level testing to an open source design, we have pushed for the highest quality while still allowing these chips to be transparent and independently verifiable. An added advantage of this approach is that we expect the high quality IP developed for OpenTitan to be re-usable in other projects going forward.

In addition to delivering this first instance of OpenTitan silicon as a product, we are proud of the processes that we have collaboratively developed along the way. In particular, both individual IP blocks and the top-level Earl Grey design have functional and code coverage above 90%—to the highest industry standards—with 40k+ tests running nightly. Regressions are caught and resolved quickly, ensuring design quality is maintained over the long term. Ownership transfer gives confidence that the silicon is working for you and helps to move away from co-signing so that you are in full control of your own update schedule. And since any IP is of little value without the ability to navigate and deploy it, we've prioritized thorough and accurate documentation, together with onboarding materials to streamline welcoming new developers to the project.

With lowRISC CIC and in collaboration with our OpenTitan partners, we pioneered open source security silicon development. While challenges are expected when doing something for the first time, the benefits of working in open source have been clear: fast and efficient cross-organizational collaboration, retention of expertise regardless of employer, shared maintenance burdens, and high levels of academic research engagement.

What's next?

Firstly, bring-up to deploy OpenTitan in Google's datacenters is underway and expected to land later this year.

Secondly, while we're thrilled about the advantages that this first generation OpenTitan part brings to Google's security posture, we have more on our roadmap, and have already begun work on a second generation part that will support lattice-based PQC (e.g., ML-DSA and ML-KEM) for secure boot and attestation. Stay tuned – more info on this coming soon!

Thirdly, OpenTitan started with the security use case because it is the hardest to get right. Having successfully demonstrated that we are able to deliver secure open silicon, we're confident that the same methodology can be used to develop additional open source designs targeting a wide range of use cases (whether the focus is on security, safety, or elsewhere). We're excited to see re-use of IP that was developed for OpenTitan being adapted for Caliptra, a RoT block that can be integrated into datacenter-class SoCs.

Getting Involved

OpenTitan shipping in production is a defining milestone for us and all contributors to the project. We're excited to see more open source silicon developed for commercial use cases in the future, and to see this ecosystem grow with lowRISC's introduction of new membership tiers.

As the following metrics show (baselined from the project's public launch in 2019), the OpenTitan community is rapidly growing:

  • Over ten times the number of commits at launch: from 2,500 to over 29,200
  • 275+ contributors to the code base
  • 3.2k GitHub stars

If you are interested in learning more, contributing to OpenTitan, or using OpenTitan IP in one of your projects, visit the open source GitHub repository or reach out to the OpenTitan team.

Announcing CEL-expr-python: the Common Expression Language in Python, now open source

Tuesday, March 3, 2026

We're excited to announce the open source release of CEL-expr-python, a Python implementation of the Common Expression Language (CEL)! CEL (cel.dev) is a powerful, non-Turing complete expression language designed for simplicity, speed, safety, and portability. CEL is designed to be embedded in an application, and you can use CEL to make decisions, validate data, or apply rules based on the information your application has.

What is CEL-expr-python?

CEL-expr-python provides a native Python API for compiling and evaluating CEL expressions that's maintained by the CEL team. We'd like to acknowledge the fantastic work already done by the open source communities around support for CEL in Python, and look forward to your contributions to help us further enrich the CEL ecosystem.

The CEL team has chosen to develop CEL-expr-python by wrapping our official C++ implementation to ensure maximum consistency with CEL semantics while enabling Python users to extend and enrich the experience on top of this production-ready core in Python directly. Additionally, new features and optimizations implemented in CEL C++ will automatically and immediately become available in CEL-expr-python.

Who is it for?

If you're working on a Python project that needs to:

  • Evaluate expressions defined dynamically (e.g., loaded from a database, configuration, or user input).
  • Implement and enforce policies in a clear, concise, and secure manner.
  • Validate data against a set of rules.

...then CEL-expr-python is for you!

Why use CEL-expr-python?

CEL has become a prevalent technology for applications like policy enforcement, data validation, and dynamic configuration. CEL-expr-python allows Python developers to leverage the same benefits, including:

  • Safety: CEL expressions are side-effect free and guaranteed to terminate.
  • Speed: Designed for efficient evaluation.
  • Portability: Expressions are language-agnostic.
  • Familiarity: Builds upon established CEL concepts.

With CEL-expr-python, you can now seamlessly integrate this technology within your Python stack.

Get Started!

Check out the CEL-expr-python Repository here: https://github.com/cel-expr/cel-python

We're thrilled to bring CEL-expr-python to the open source communities and can't wait to see what you build with it!

Here's a code snippet demonstrating how to initialize CEL-expr-python and evaluate an expression.

from cel_expr_python import cel

# Declare the variables the expression may reference
cel_env = cel.NewEnv(variables={"who": cel.Type.STRING})
expr = cel_env.compile("'Hello, ' + who + '!'")

# Evaluate the compiled expression with concrete data
print(expr.eval(data={"who": "World"}))  # Hello, World!

For a more in-depth tutorial, check out our codelab here: https://github.com/cel-expr/cel-python/blob/main/codelab/index.lab.md

The CEL-expr-python repository will initially be available as read-only. We encourage you to try it out in your projects and share your experiences. Feel free to leave feedback in our GitHub issue queue; we are eager to hear from you and will work promptly to address any issues or suggestions.

While we are not accepting external contributions at this moment, we are committed to building a community around CEL-expr-python and plan to open up for contributions in the future. Stay tuned for updates.

This Week in Open Source #15

Friday, February 20, 2026

This Week in Open Source for February 20th, 2026

A look around the world of open source

We're preparing for a busy conference season, with events like SCALE 23x and KubeCon + CloudNativeCon Europe on the horizon. A core part of our mission is "learning and sharing what we learn" so that our communities can continue to thrive together. Conferences are a great place to fulfill that mission.

This week, we're highlighting a few "Open Source Reads" that tackle some of the biggest questions facing our ecosystem today—from the complex ethics of AI-generated content to the global impact of open AI models. We hope these links provide valuable context as we work together to sustain the critical infrastructure we all rely on.

Upcoming Events

  • February 24 - 25: The Linux Foundation Member Summit is happening in Napa, California. It is the annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies.
  • March 5 - 8: SCALE 23x is happening in Pasadena, California. It is North America's largest community-run open source conference and includes four days of sessions, workshops, and community activities focused on open source, security, DevOps, cloud native, and more.
  • March 9 - 10: FOSSASIA Summit 2026 is happening in Bangkok, Thailand. It will be a two-day hybrid event that showcases the latest in open technologies, fostering collaboration across enterprises, developers, educators, and communities.
  • March 16 - 17: FOSS Backstage is happening in Berlin, Germany. This conference brings together the brightest minds in the industry to discuss and explore all about FOSS community, management and compliance.
  • March 22: Maintainer Summit EU is happening just before CloudNativeCon in Amsterdam, The Netherlands. This is an exclusive event for the people behind our projects to gather face-to-face, collaborate, and celebrate cloud first projects.
  • March 23 - 26: Kubecon + CloudNativeCon Europe is happening in Amsterdam, The Netherlands. This is the flagship conference for the Cloud Native Computing Foundation (CNCF) and brings together adopters and technologists from leading open source and cloud first communities.
  • March 26 - 29: ATmosphereConf is happening in Vancouver, British Columbia. The AT Protocol community has extended its two-day conference by booking the venue for the two days prior (March 26th & 27th), adding smaller theaters and breakout rooms for everything from extended events to developer training to building together.
  • April 7 - 8: PyTorch Conference EU is happening in Paris, France. Hosted by the PyTorch Foundation, this conference gathers top-tier AI pioneers, researchers, and developers to explore the future of AI.

Open Source Reads and Links

  • [Blog] An AI Agent wrote a hit piece on me - An AI agent wrote a harmful article about a maintainer after he rejected its code for a popular Python library. This shows a new risk where AI can attack people to get what it wants. We must be careful, as AI misbehavior can hurt reputations and trust in software. A part 2 followed as the story continued to develop.
  • [Blog] Stop closing the door; fix the house - A different take on the crossover between AI and open source. Instead of closing contributions due to poor AI generated code, maintainers should guide contributors and AI tools with clear rules and automation. This helps keep quality high and keeps the community open.
  • [Article] Everyone uses open source, but patching moves too slowly - "Maintenance is the highest form of creation." Open source requires maintenance, especially when 60% of security incidents hit unpatched code. How can we work together to keep our communities healthy and secure?
  • [Paper] AI-powered open-source infrastructure for accelerating materials discovery and advanced manufacturing - Gen AI isn't the only type of AI in the game. This paper explains how AI and open-source tools help speed up the discovery of new materials. Through using data, simulations, and machine learning together we can build efficient and sustainable platforms.
  • [Blog] AI is destroying open source and it's not even good yet - Here's another post about how maintainers face more work and frustration because AI often makes mistakes and doesn't help fix problems. This growing issue could get worse as AI becomes more widely used without careful oversight. So, what types of conversations should we be having to create that oversight?
  • [Article] What's next for Chinese open source AI? - Chinese companies are creating and sharing powerful open AI models that anyone can use and modify. Because these models are cheaper and widely adopted globally, they challenge the western AI models. This open approach is changing how AI innovation happens and who controls its future.

Which of these stories will you be chatting about at your next meetup or conference? Let us know! Share with us on our @GoogleOSS X account or our @opensource.google Bluesky account.
