Unlocking TPU performance: Deep kernel profiling with XProf

Monday, June 8, 2026

As machine learning workloads scale to unprecedented heights, developers are increasingly writing highly specialized Tensor Processing Unit (TPU) kernels using frameworks like Pallas, Mosaic, and Triton to maximize hardware performance.

However, customizing high-performance kernels has historically introduced a major engineering challenge: optimization blind spots. To legacy performance profilers, custom compilation paths appear as opaque execution paths. Developers are left with single, massive execution blocks in their trace captures, lacking granular visibility into what is actually occurring inside the chip's internal components. Did a vector processing instruction stall? Was matrix math idle due to data loading bottlenecks?

Traditional profiling relies heavily on compile-time static cost models to estimate kernel efficiency. While helpful for standard operations, these models cannot capture dynamic runtime realities like instruction execution stalls, memory subsystem congestion, or hardware scheduling conflicts.

To open this opaque execution path, we are excited to introduce the Kernel Profiling suite in XProf—a low-level hardware debugging suite engineered specifically for Pallas kernel authoring and optimization on Google TPUs. By combining static compilation tracking with dynamic, sub-microsecond hardware telemetry, XProf Kernel provides the deep transparency required to optimize high-scale ML workloads.

Deep visibility: HLO Graphs & MLIR Inspection

The first step in debugging any custom kernel is understanding how your high-level code is translated by the compiler. When compiling a JAX or PyTorch model, the compiler generates a High-Level Optimizer (HLO) graph. Previously, custom calls inside these graphs remained completely obscured.

XProf's updated Graph Viewer resolves this by exposing the internal compilation logic of these custom regions directly. To unlock this deep visibility, developers must pass the appropriate debug flags to the XLA compilation environment.
--xla_enable_custom_call_region_trace=true
--xla_xprof_register_llo_debug_info=true
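
One way to apply these flags from Python is to place them in an environment variable before the TPU runtime initializes. The sketch below is a minimal illustration: the variable name `LIBTPU_INIT_ARGS` and the helpers `build_flag_string` and `apply_flags` are assumptions made for this example, so check the XProf documentation for the mechanism your runtime actually expects.

```python
import os

# The two flags quoted in the text above.
KERNEL_PROFILING_FLAGS = {
    "xla_enable_custom_call_region_trace": True,
    "xla_xprof_register_llo_debug_info": True,
}


def build_flag_string(flags):
    """Render {"name": True} as space-separated "--name=true" pairs."""
    parts = []
    for name, value in flags.items():
        rendered = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"--{name}={rendered}")
    return " ".join(parts)


def apply_flags(env_var="LIBTPU_INIT_ARGS", environ=None):
    """Append the flags to env_var, keeping anything already set there.

    env_var is an assumption for this sketch. Call this before the
    framework initializes the TPU, or the flags will not be picked up.
    """
    environ = os.environ if environ is None else environ
    existing = environ.get(env_var, "")
    environ[env_var] = f"{existing} {build_flag_string(KERNEL_PROFILING_FLAGS)}".strip()
    return environ[env_var]
```

Passing a plain dictionary as `environ` lets you inspect the resulting string without touching the real process environment.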

Once these flags are active, any trace captured via XProf includes comprehensive compiler metadata. In the XProf Graph Viewer, clicking on a custom-call block reveals an interactive panel titled "Custom Call Text." This displays the raw, lowered MLIR (Multi-Level Intermediate Representation) code generated by the compiler.

Figure 1: XProf interface displaying an HLO graph, with a "Custom Call Text" panel to reveal raw MLIR code

By displaying the MLIR text side-by-side with high-level source-code representations, developers can immediately verify whether the compiler is correctly fusing operations and structuring memory tiles as intended.

Tracing Instrumented Low-Level Operations (LLO) Analysis

To provide cycle-level execution visibility, XProf exposes Low-Level Operations (LLO) bundle data directly inside the Trace Viewer. An LLO bundle represents the actual machine instructions issued to the TPU core's functional units during every clock cycle.

Through dynamic instrumentation, XProf inserts hardware markers exactly when an LLO bundle region executes. Within the Trace Viewer, this manifests as dedicated, time-aligned execution tracks representing the TPU bundle's slot utilization metrics from static analysis:

  • MXU (Matrix Multiply Unit): Tracks active, busy cycles of high-throughput matrix-multiplication pipelines.
  • Scalar and Vector ALUs: Displays the execution profile of mathematical operations, letting you spot pipeline imbalances.
  • Vector Fills, Loads, Spills, and Stores: Exposes HBM-to-register data movement, critical for identifying bandwidth-throttling bottlenecks.
  • XLU (Cross-Lane Unit): Monitors collective communications and data shuffling across physical TPU cores.
Figure 2: XProf Capture Profile trace viewer interface showing dynamic hardware execution tracks

Runtime Performance Counter Sampling

While static analysis effectively verifies instruction counts or vector store logic, it remains detached from the dynamic realities of runtime execution. To bridge this gap, XProf introduces fine-grained, periodic performance counter sampling—available starting with TPU v7 (Ironwood). This capability empowers developers to move beyond static estimation and measure precisely how hardware blocks are utilized in real-time, providing the empirical ground truth needed to identify whether compute units are truly active or stalled by memory subsystems.

Consider the optimization of a tiled matrix multiplication (Matmul) kernel. While a static trace might indicate a logically perfect sequence of operations, real-world performance often falters if the Matrix Multiply Unit (MXU) sits idle while awaiting data from High-Bandwidth Memory (HBM). To diagnose and resolve such bottlenecks, developers can utilize a structured three-step profiling workflow:

  1. Set up the Profiling Environment: Configure the TPU v7 (Ironwood) runtime by defining specific hardware counters—such as scalar issues or synchronization waits.
  2. Capture a Kernel Profile: Use the XProf request interface to capture fine-grained performance counters, which can then be visualized as a time-series within the Trace Viewer.
  3. Interpret the Data: Analyze the resulting counters to distinguish between a Memory-Bound Scenario (characterized by massive spikes in sync_wait) and an Optimized Scenario. For instance, implementing triple buffering to overlap memory loads with MXU compute can reduce runtime from 125.5µs to 88µs—a ~30% performance gain validated by a drastic reduction in synchronization events.
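
The "~30%" figure in step 3 follows directly from the two runtimes quoted there. As a quick check of the arithmetic (the function names are only for illustration):

```python
baseline_us = 125.5   # runtime before triple buffering, from the text
optimized_us = 88.0   # runtime after triple buffering, from the text


def runtime_reduction(before, after):
    """Fraction of the original runtime that was removed."""
    return (before - after) / before


def speedup(before, after):
    """How many times faster the optimized kernel runs."""
    return before / after


print(f"runtime reduction: {runtime_reduction(baseline_us, optimized_us):.1%}")  # 29.9%
print(f"speedup: {speedup(baseline_us, optimized_us):.2f}x")                     # 1.43x
```

Note that a ~30% shorter runtime corresponds to a roughly 1.43x speedup; the two numbers describe the same change from different angles.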

By shifting from static code inspection to empirical runtime telemetry, developers can validate optimization strategies against observed hardware behavior and confirm that cycles on the silicon are spent productively. For a hands-on example of these techniques, explore our Pallas Matmul w/ Perf Counters demo.

Figure 3: XProf timeline highlighting a comparison between a detailed "Runtime Perf Counter" section sampling at a 1-microsecond frequency and a "Static LLO Region" track below it

Visualizing the "Utilization Gap"

This dynamic tracking exposes the significant gap left by traditional static analysis tools. A static tool analyzes instructions linearly, completely ignoring time. It might flag an MXU instruction block as "100% Utilized."

In contrast, XProf plots actual hardware execution over time. You might discover that a long-running Scalar ALU operation is stalling the entire execution pipeline, leaving the powerful MXU completely idle. By visualizing these temporal idle gaps, developers can adjust data shapes, memory alignments, and instruction sequencing to maximize compute density.

STATIC ESTIMATION:
[========== Block Execution: MXU Flagged 100% Utilized ==========]

XPROF REAL-WORLD TIMELINE:
├─ [Scalar ALU (Active)] ─┼─ [MXU (Active)] ─┼── [MXU (Idle / Memory Stall)] ──┤
│ Stalling pipeline...     │ Compute phase     │ Starved; waiting for HBM Load    │
Figure 4: The UI shows the active TPU Core functional unit tracks (MXU, Scalar ALU, Vector ALU, and memory data pipelines) aligned side-by-side with the active framework Ops, exposing exact execution times and real-time idle cycles.
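
The gap illustrated in Figure 4 can be put into numbers by summing how long a unit was actually active. The sketch below uses a made-up timeline: the segment durations are illustrative values chosen for this example, not measurements, and `active_fraction` is a hypothetical helper rather than an XProf API.

```python
# Hypothetical timeline segments: (unit, state, duration in microseconds).
timeline = [
    ("Scalar ALU", "active", 30.0),  # scalar work stalls the pipeline
    ("MXU", "active", 50.0),         # compute phase
    ("MXU", "idle", 45.0),           # starved, waiting for an HBM load
]


def active_fraction(segments, unit):
    """Share of the block's total wall time during which `unit` was active."""
    total = sum(duration for _, _, duration in segments)
    active = sum(
        duration
        for name, state, duration in segments
        if name == unit and state == "active"
    )
    return active / total if total else 0.0


mxu = active_fraction(timeline, "MXU")
print(f"MXU active for {mxu:.0%} of the block")  # MXU active for 40% of the block
```

With these illustrative numbers, a block that a static estimate would label fully utilized keeps the MXU busy for well under half of its wall time.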

Overall Utilization from Performance Counters

Navigating profiling metrics can be daunting. Relying on metrics calculated via compile-time cost models often misrepresents performance when applied to custom compilation paths. To solve this, XProf establishes a clear Hierarchy of Trust:

                  ┌───────────────────────────────┐
                  │     Absolute Ground Truth     │
                  │  (HBM, Hardware Registers,    │ (100% Trustworthy)
                  │       TPO Metrics, CSRs)      │
                  └───────────────┬───────────────┘
                                  ▼
                  ┌───────────────────────────────┐
                  │       Estimated Metrics       │
                  │   (Program Optimal FLOPs,     │ (Requires caution with
                  │      Goodput Efficiency)      │  custom compiling paths)
                  └───────────────────────────────┘
Figure 5: Hierarchy of Metrics
  1. The Absolute Ground Truth (100% Trustworthy): Metrics derived directly from physical hardware registers (HBM utilization, TPO metrics, unprivileged hardware stats). When profiling custom kernels, these represent physical reality and should be your primary optimization anchors.
  2. Estimated Metrics (Use with Caution): Metrics like "Compared to program optimal FLOPS" or "Goodput efficiency" rely on XLA cost models. Because custom compilation paths bypass standard passes, these metrics can be highly skewed or outright non-functional.

For the unvarnished truth, XProf exposes the Perf Counters View, providing direct, tabular access to over 16,000 raw hardware counters read straight from the TPU silicon.

A screenshot of the XProf Perf Counters tabular view, displaying a list of unprivileged hardware counters alongside their corresponding raw decimal and hexadecimal values
Figure 6: XProf Perf Counters Tabular View

Understanding Trace Tracks: The height of a trace track does not represent a normalized 0-100% value. It represents the maximum raw counter value observed in that interval. For example, if a counter increments by 100 cycles over a 500-nanosecond trace window (1,000 clock cycles on a 2.0 GHz core), it indicates 10% physical utilization of that unit.
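
The conversion from a raw counter delta to a utilization figure is simple enough to script. This is a minimal sketch of the arithmetic in the example above; the function names are illustrative, and the clock frequency must be supplied for your actual hardware.

```python
def cycles_in_window(window_ns, clock_ghz):
    """Clock cycles elapsed in a window (1 GHz is one cycle per nanosecond)."""
    return window_ns * clock_ghz


def utilization(counter_delta, window_ns, clock_ghz):
    """Fraction of the window's cycles during which the counter incremented."""
    return counter_delta / cycles_in_window(window_ns, clock_ghz)


# The example from the text: 100 cycles counted in 500 ns on a 2.0 GHz core.
print(utilization(100, 500, 2.0))  # 0.1
```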

To configure and profile the runtime performance counters sampling method, please follow the instructions from <openxla.org/xprof/kernel-profiling.html>.

Advanced Sampling: Event-Triggered Profiling

Previously, dynamic capturing was limited to Periodic Sampling Mode—polling counters based on a host-level timer, which hit a physical resolution floor of 1 microsecond.

           CORE 0           CORE 1           CORE 2           CORE 3
      ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
      │  28 Counters │ │  28 Counters │ │  28 Counters │ │  28 Counters │
      └──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
      └─────────────────────────────────────────────────────────────────┘
                            4 x 28 Sparse Matrix
Figure 7: Sparse Matrix Configuration

To capture lightning-fast hardware cycles, XProf now supports External Event-Triggered Mode. The dynamic sampler intercepts physical TPU trace instructions and boundary triggers (such as entering/exiting custom call scopes), allowing for sub-microsecond capture latency and precise attribution.

Developers can configure up to 28 hardware counters per core, distributed across up to four active SparseCores, creating a 4 x 28 profiling matrix that maximizes data variety while protecting workload performance.

Activating this mode is straightforward through the standard JAX profiler options:

import jax

options = jax.profiler.ProfileOptions()

# Example request for externally triggered collection
options.advanced_configuration = {
    "tpu_enable_periodic_counter_sampling": True,
    # Adjacent string literals are joined into one option string.
    "tpu_tc_perf_counter_sampling_options": (
        "is_external_trigger:true scaling:0 counter_size_bits:1 "
        "indices:10 indices:11 indices:56 indices:57 indices:58"
    ),
}

# For periodic sampling, please use interval_us instead of is_external_trigger.
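
Writing the option string by hand is error-prone, so a small builder can help. The sketch below is an illustration only: `sampling_options` is a hypothetical helper, the field names are taken from the snippet above, and the assumptions that field order does not matter and that `interval_us` takes an integer number of microseconds are not confirmed by the text. The 28-counter cap mirrors the per-core limit mentioned earlier.

```python
MAX_COUNTERS_PER_CORE = 28  # per-core limit stated in the text


def sampling_options(indices, *, external_trigger=False, interval_us=None,
                     scaling=0, counter_size_bits=1):
    """Build the space-separated option string for counter sampling.

    Exactly one of external_trigger or interval_us must be chosen.
    """
    indices = list(indices)
    if external_trigger == (interval_us is not None):
        raise ValueError("choose exactly one of external_trigger or interval_us")
    if len(indices) > MAX_COUNTERS_PER_CORE:
        raise ValueError(f"at most {MAX_COUNTERS_PER_CORE} counters per core")

    if external_trigger:
        fields = ["is_external_trigger:true"]
    else:
        fields = [f"interval_us:{interval_us}"]
    fields.append(f"scaling:{scaling}")
    fields.append(f"counter_size_bits:{counter_size_bits}")
    fields.extend(f"indices:{index}" for index in indices)
    return " ".join(fields)


# Reproduces the externally triggered request shown above.
print(sampling_options([10, 11, 56, 57, 58], external_trigger=True))
```

The returned string can be assigned to the `tpu_tc_perf_counter_sampling_options` key of `advanced_configuration`.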

Getting Started

Ready to transition from guessing performance to measuring and optimizing the physical limits of your ML silicon? Explore these open-source resources to get started with XProf Kernel today:

Journey to JPEG XL: How open source experiments shaped the future of image coding

Wednesday, June 3, 2026

Building the Next Generation Image Standard

The internet runs on images. Since the early days of the web, there has been a relentless tension between visual fidelity and bandwidth. For decades, the industry relied on the venerable JPEG standard to keep images loading fast. It served us remarkably well, but as displays moved to High Dynamic Range (HDR) and Wide Color Gamut (WCG), the format began to show its limits.

The road to JPEG XL (JXL) wasn't a straight line. It was a decade-long exploration that produced a series of milestone projects testing radical ideas in psychovisual modeling, entropy coding, and optimization. Today, as JPEG XL sees rapid adoption across operating systems and professional standards, we're looking back at the experiments that made it possible.


The Early Foundation: 2011–2017

Our study began with a focus on understanding the limits of existing technology. We didn't start by trying to write a new standard; we started by trying to make the current ones better, and learning their limitations. This allowed us to make the new formalism more flexible and efficient in the right places.

  • WebP Lossless and Brotli: While lossy WebP drew its lineage from video technology, WebP Lossless (2011) represented an architectural and scoping departure. We debuted the entropy image concept, an innovative method that uses a secondary image to orchestrate the selection of static entropy codes for the primary visual data. We later reapplied this approach with data-driven context modeling in the Brotli compression format, enabling rich context modeling without slowing decoding.
  • Butteraugli: Around 2014, we realized that raw mathematical compression (PSNR) wasn't enough, and simple psychovisual approximations (SSIM and similar) failed in color-rich environments. We built Butteraugli and the XYB color space to mimic the human visual system's edge detection and opponent-color processes in varying scale, allowing us to compress images more effectively.
  • Guetzli and Brunsli: We pushed the legacy JPEG 1 standard (ISO/IEC 10918, introduced in 1992) to its absolute limits through two key projects, Guetzli and Brunsli. These initiatives provided invaluable insights into the strengths and limitations of traditional JPEG compression methods. Guetzli (2016) is a slow, high-density perceptual encoder that uses Butteraugli to find the optimal quantization tables, making legacy JPEGs 20-30% smaller. Brunsli (2015), meanwhile, focuses on lossless recompression, allowing users to repack existing JPEGs into a smaller footprint without losing a single bit of original data. After finishing JPEG XL standardization, we returned to Guetzli's scope in 2024 and made the encoding much faster and HDR-compatible in Jpegli.

The feedback from these launches, ranging from the technical details of WebP Lossless to the psychovisual audits of Guetzli, proved indispensable. While we already targeted the highest visual fidelity, feedback from detail-critical e-commerce helped us to refine the requirements.


The Convergence: 2017–2019 PIK Era and the 2019 FUIF Integration

By 2017 we had powerful separate tools and it was time to fuse them. In open sourcing PIK we combined the efficiency of Brunsli with the psychovisual optimizations of Guetzli. Further, PIK introduced a real adaptive quantization field and other optimizations. PIK formed our proposal to the ISO standardization body. The committee's final call for proposals pushed toward extreme density, requiring bit rates as low as 0.06 BPP, equivalent to 35 times the compression of internet-quality images and 80 times that of camera output. This expansion of scope necessitated a significant complexification of the format and the encoder, leading to the Variable-block-size Discrete Cosine Transform (VarDCT) architecture that remains central to JPEG XL today.
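
To get a feel for what 0.06 BPP means in practice, the bit rate can be converted into a file size. The sketch below is plain arithmetic; the 1920 x 1080 resolution is an arbitrary example chosen for illustration, not a figure from the call for proposals.

```python
def compressed_bytes(width, height, bits_per_pixel):
    """File size in bytes for an image coded at the given bits per pixel."""
    return width * height * bits_per_pixel / 8


size = compressed_bytes(1920, 1080, 0.06)
print(f"{size:,.0f} bytes")  # 15,552 bytes
```

At that rate, a full-HD image has to fit in roughly 15 kB, which helps explain why the format needed additional coding tools.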

We proposed to merge our PIK proposal with the FUIF (Free Universal Image Format) proposal from Cloudinary. PIK used Brotli-style distribution selection at encoding time, while FUIF refined codes incrementally during decoding. The final JPEG XL standard became a best-of-both-worlds compromise: we used PIK's faster-to-decode distribution selection with FUIF's sophisticated context trees. The merger departed from conventional single-platform-driven standardization and prioritized technical synergy and collaboration.

A flowchart titled 'Building Blocks of the JPEG XL Standard' showing a left-to-right progression across three periods. The first period, 'Early Building Blocks (2011-2017)', contains four boxes: WebP Lossless & Brotli, Butteraugli & XYB, Guetzli, and Brunsli. Arrows point from these early technologies into the second period, 'The Convergence (2017-2019)', which consists of two main boxes: PIK and FUIF. Finally, multiple lines flow from both PIK and FUIF, converging into the third period, 'Final Standard'. This final section features a large orange box labeled 'JXL: JPEG XL Standard', which is described as merging PIK's distribution selection with FUIF's context trees.

JPEG XL Today: An Ecosystem Takes Root

JPEG XL's efficiency, psychovisually optimized quality, file size, and coding speed are being noticed. We are seeing bottom-up adoption in various industries, with the most demanding fields leading the way. Because of its ability to handle high bit-depth, high-quality, and even lossless data efficiently and robustly, JPEG XL has become foundational in several fields:

  • Photography: Used in Digital Negative (DNG 1.7), Apple's ProRAW, and others.
  • Medical: Adopted by DICOM, the international standard for medical images.
  • Publishing: Integration into future versions of the PDF and EPUB standards.

The ecosystem has been maturing rapidly. Adobe's photography software, Apple's iOS, macOS, and visionOS have native support, as do Linux distributions like Ubuntu and Microsoft's JPEG XL Image Extension for Windows. Our libjxl-tiny inspired Shikino High-Tech, Inc. and CAST to release the first commercial JPEG XL encoder IP core for ASIC and FPGA designs, aimed at real-time, low-power image capture. Safari (2023) led among major browsers, while Firefox and Chrome currently maintain experimental support.

Two men in a bright office collaborating at a whiteboard. The board contains a hand-drawn flowchart titled 'VARDCT BLOCK JOINING STRATEGY'. The diagram illustrates small square blocks combining into larger patterned rectangles, connected by arrows. Text labels in the flowchart include 'Decision Logic: Rate-Distortion Cost', 'Merging Criteria', 'Entropy Coding Efficiency', 'Neighboring Blocks', and 'Variable Block Sizes'. The man on the left is pointing to the bottom left of the diagram, while the man on the right, who has long hair and a beard, is writing a mathematical equation on the board with a marker.
JPEG XL design was not only countless hours of optimization, experimentation and eye-balling the results, but also creative discussions at a whiteboard. In this Gemini-reconstructed scene, Luca Versari and Jyrki Alakuijala (left-to-right) debate VarDCT block selection heuristics.

Looking Forward

The story of JPEG XL stands as a testament to the efficacy of long-horizon planning validated by intermediate functional milestones—with minimum-viable prototypes like Guetzli and practical tools like Brunsli and Brotli—that invite feedback from the open-source community. A small research team can innovate by crystallizing solutions through quick iterations, with thousands, if not tens of thousands, of experiments in psychovisual modeling, entropy, coding speed and complexity, and the entire industry can eventually navigate toward a more efficient, beautiful future.

We started by trying to squeeze a few more bytes out of a 1992 JPEG 1 standard; with JPEG XL we hope to have established a foundation for digital imaging that can last for the next three decades.

Announcing Apache Iceberg 1.11.0

Wednesday, May 27, 2026

The Apache Iceberg project has just released version 1.11.0! A lot has happened since the last version.

Iceberg 1.11.0 adds support for Apache Spark 4.1 and Apache Flink 2.1, the latest releases of the two engines, and makes both the default build targets.

The rest are more structural. The REST catalog learns to plan scans server-side, shifting metadata work off the query engine. A new partition statistics scan API gives optimizers a clean, supported way to read a table's shape. Built-in table encryption arrives with envelope encryption and Google KMS support. And Google Storage Analytics library integration makes your Iceberg workloads faster than before.

Let's take a look at some of the biggest changes.

Spark & Flink Updates

As Spark and Flink move forward, the 1.11.0 release adds support for the new versions of both.

  • Spark 4.1 & DSv2 Migration: What Spark 4.1 unlocks is MERGE INTO with automatic schema evolution. Spark's newer MERGE syntax accepts a WITH SCHEMA EVOLUTION clause, so a MERGE whose source carries columns the target table lacks can add those columns to the table within the same statement, with no separate ALTER TABLE round trip. Beyond the version bump, the 1.11 Spark connector also modernizes against Spark's newer DataSource V2 APIs and adds an asynchronous micro-batch planner that speeds up Structured Streaming.
  • Flink Ecosystem Updates: Initial work for Flink 2.1 support has landed in the core repository, continuing Iceberg's promise of providing first-class, low-latency streaming sink capabilities. The centerpiece of the Flink work is the DynamicIcebergSink, an experimental sink that breaks the old one-sink-per-table model: a single sink routes each record to a table chosen at runtime, creating tables on demand and evolving their schemas and partition specs on the fly as the input changes, including dropping columns once you opt in with dropUnusedColumns. In addition to the DynamicIcebergSink work, Flink now supports the nanosecond, variant, and unknown types from the V3 spec.

Server-side scan planning

In previous versions of Iceberg, the client handled the heavy lifting of scan orchestration. The engine's driver would traverse the table's metadata tree, retrieving manifest lists and files from object storage to filter data against specific partition requirements. Iceberg 1.11.0 shifts this computational burden into the catalog through server-side scan planning.

Instead of manually traversing manifests, the engine submits a single POST …/plan request detailing the scan, allowing the REST catalog to return optimized FileScanTasks.

The API is designed to handle data at any scale: smaller scans return immediate results, extensive operations return a plan-id for polling, and massive datasets are retrieved via parallel plan-tasks through POST …/tasks.

Planning moves off the query engine and into the catalog — the driver no longer touches metadata in object storage.
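
To make the request flow concrete, here is a small in-memory simulation of a client handling the response modes described above. It is a sketch under stated assumptions, not a real Iceberg client: `FakeCatalog`, `plan_scan`, the size thresholds, and the dictionary field names are all hypothetical stand-ins, no network calls are made, and the plan tasks are fetched one after another here even though a real engine could fetch them in parallel.

```python
import itertools


class FakeCatalog:
    """In-memory stand-in for a REST catalog (hypothetical, for illustration)."""

    def __init__(self, files, inline_limit=2, task_size=2):
        self.files = list(files)
        self.inline_limit = inline_limit  # scans this small are answered inline
        self.task_size = task_size        # files per plan task for larger scans
        self._plans = {}
        self._chunks = {}
        self._ids = itertools.count(1)

    def plan(self):
        """Models the plan request: answer inline or hand back a plan id."""
        if len(self.files) <= self.inline_limit:
            return {"status": "completed",
                    "file-scan-tasks": list(self.files),
                    "plan-tasks": []}
        plan_id = f"plan-{next(self._ids)}"
        self._plans[plan_id] = self.files
        return {"status": "submitted", "plan-id": plan_id}

    def poll(self, plan_id):
        """Models polling a plan id; this fake completes on the first poll."""
        files = self._plans.pop(plan_id)
        tokens = []
        for start in range(0, len(files), self.task_size):
            token = f"{plan_id}/task-{start // self.task_size}"
            self._chunks[token] = files[start:start + self.task_size]
            tokens.append(token)
        return {"status": "completed", "file-scan-tasks": [], "plan-tasks": tokens}

    def fetch_tasks(self, plan_task):
        """Models the tasks request for a single plan task token."""
        return {"file-scan-tasks": self._chunks[plan_task]}


def plan_scan(catalog):
    """Client-side loop: plan, poll while pending, then expand plan tasks."""
    result = catalog.plan()
    while result["status"] == "submitted":
        result = catalog.poll(result["plan-id"])
    tasks = list(result["file-scan-tasks"])
    for token in result["plan-tasks"]:
        tasks.extend(catalog.fetch_tasks(token)["file-scan-tasks"])
    return tasks


print(plan_scan(FakeCatalog(["a.parquet"])))
print(plan_scan(FakeCatalog(["a.parquet", "b.parquet", "c.parquet"])))
```

The point of the sketch is the control flow: the engine never reads a manifest itself, it only reacts to what the catalog returns.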

Built-in table encryption

As data lakes increasingly serve as the central hub for sensitive PII and financial data, relying solely on bucket-level storage encryption is no longer enough. Iceberg 1.11.0 introduces built-in table encryption, bringing fine-grained, KMS-backed security directly to the table level.

This provides data platform teams with robust capabilities for security and compliance:

  • Zero-Trust Storage Security: Even if a malicious actor gains direct access to your underlying object storage bucket, the data remains completely unreadable.
  • Total Index Protection: It isn't just the raw data that is protected; Iceberg encrypts the manifest lists as well, preventing attackers from inferring sensitive information from table statistics.
  • Tamper-Proof Data: Built-in authentication tags guard against unauthorized modifications, ensuring data integrity.
  • Effortless Key Rotation: Keys are rotated automatically as they age, satisfying strict compliance mandates without requiring you to rewrite massive datasets.

Iceberg achieves this using envelope encryption with a three-tier key hierarchy. A table master key lives securely in your KMS and never touches Iceberg storage. This master key wraps key-encryption keys (KEKs), which are stored safely inside the table metadata. Finally, each KEK wraps a unique, per-file data-encryption key (DEK).

Every data file and manifest list is then encrypted with AES-GCM under its own unique DEK. This decoupled architecture ensures maximum security while maintaining the high performance expected of Iceberg workloads.
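
The three-tier hierarchy is easier to follow when the wrapping steps are written out. The sketch below illustrates only the key-wrapping structure: to stay within the Python standard library it uses a toy HMAC-based authenticated cipher (`toy_seal` / `toy_open`) as a stand-in for AES-GCM, which is not secure and is not what Iceberg uses. All function names are hypothetical, and the re-wrapping step at the end is an assumption about how a hierarchy like this can rotate keys without rewriting data files.

```python
import hashlib
import hmac
import secrets


def _subkey(key, label):
    """Derive separate encryption and MAC keys from one key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key, nonce, length):
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def toy_seal(key, plaintext):
    """Toy authenticated encryption (stand-in for AES-GCM, NOT secure)."""
    nonce = secrets.token_bytes(12)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def toy_open(key, blob):
    """Verify the authentication tag, then decrypt; raises on tampering."""
    nonce, ciphertext, tag = blob[:12], blob[12:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Tier 1: table master key (would live in the KMS, never in table storage).
master_key = secrets.token_bytes(32)
# Tier 2: key-encryption key, kept in table metadata only in wrapped form.
kek = secrets.token_bytes(32)
wrapped_kek = toy_seal(master_key, kek)
# Tier 3: per-file data-encryption key, wrapped by the KEK.
dek = secrets.token_bytes(32)
wrapped_dek = toy_seal(kek, dek)
encrypted_file = toy_seal(dek, b"row data")


def read_file(master, wrapped_kek, wrapped_dek, blob):
    """Unwrap top-down: master key -> KEK -> DEK -> file contents."""
    file_kek = toy_open(master, wrapped_kek)
    file_dek = toy_open(file_kek, wrapped_dek)
    return toy_open(file_dek, blob)


# Rotation sketch (assumption): re-wrap the DEK under a new KEK.
# The encrypted file itself is left untouched.
new_kek = secrets.token_bytes(32)
rewrapped_kek = toy_seal(master_key, new_kek)
rewrapped_dek = toy_seal(new_kek, dek)

print(read_file(master_key, wrapped_kek, wrapped_dek, encrypted_file))
print(read_file(master_key, rewrapped_kek, rewrapped_dek, encrypted_file))
```

Because only small wrapped keys change during rotation, the cost is independent of how much data the table holds.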

File Format API

Historically, Iceberg's format-handling code was tightly coupled, growing organically around Parquet, Avro, and ORC. Adding a new format or enforcing consistent feature support (like V3 default values or new column types) across all formats meant duplicating complex engine-specific switch/case code paths.

Iceberg 1.11.0 introduces the finalized File Format API, bringing a consistent API to reading and writing all of these file formats.

Instead of hardcoded engines handling binary extraction, the architecture introduces:

  • FormatModel: A standardized implementation defining how a file format handles reader/writer construction and its specific capabilities.
  • FormatModelRegistry: A central directory where query engines fetch appropriate read and write builders.

This API (which is already seeing adoption around other Apache Iceberg implementations) provides a significant code cleanup for the future of the project. It also opens the door for more file formats as time goes on.
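
The registry idea can be illustrated with a small sketch. The actual File Format API is part of Iceberg's Java codebase; the Python classes below only borrow the names `FormatModel` and `FormatModelRegistry` from the text, and their fields and methods (`read`, `write`, `register`, `lookup`) are assumptions invented for this example. Two toy formats stand in for Parquet, Avro, and ORC.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FormatModel:
    """Hypothetical stand-in: how one format reads and writes rows."""
    name: str
    read: Callable[[bytes], list]
    write: Callable[[list], bytes]


class FormatModelRegistry:
    """Central lookup so engine code never switches on the format itself."""

    def __init__(self):
        self._models: Dict[str, FormatModel] = {}

    def register(self, model):
        self._models[model.name] = model

    def lookup(self, name):
        try:
            return self._models[name]
        except KeyError:
            raise ValueError(f"no format model registered for {name!r}") from None


registry = FormatModelRegistry()
registry.register(FormatModel(
    name="json",
    read=lambda data: json.loads(data),
    write=lambda rows: json.dumps(rows).encode(),
))
registry.register(FormatModel(
    name="lines",
    read=lambda data: data.decode().splitlines(),
    write=lambda rows: "\n".join(rows).encode(),
))


def round_trip(registry, format_name, rows):
    """Engine-side code: identical for every registered format."""
    model = registry.lookup(format_name)
    return model.read(model.write(rows))


print(round_trip(registry, "json", ["a", "b"]))
print(round_trip(registry, "lines", ["a", "b"]))
```

Adding a format in this sketch means registering one more model; `round_trip` does not change, which is the kind of decoupling the paragraph above describes.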

Moreover, this new interface facilitates the implementation of Column Families, enabling vertical partitioning of storage. This advancement lets teams perform targeted updates or rewrites on isolated columns—such as recalculating vector embeddings—while leaving the remaining table data undisturbed.

SQL UDF Specification

1.11.0 includes the SQL UDF specification, which adds a brand new metadata format for both Scalar and Table Functions:

  • Immutable Versioning and Rollback: UDF metadata is written as self-contained, versioned JSON files stored right in the object store. If a data engineer deploys a buggy UDF update, administrators can execute an atomic rollback to a previous version log state.
  • Standardized Schema Typings: Parameters and return types map cleanly to Iceberg Type JSON representations, directly accommodating complex nested maps, structs, and the upcoming Iceberg V3 variant type.
  • Engine Specific Execution: Each SQL UDF has a function implementation for each engine, allowing users to leverage engine-specific functionality in their UDFs.

GCS Analytics Core Library Integration

For Google Cloud customers, version 1.11.0 delivers substantial throughput gains by embedding the GCS Analytics Core library into GCSFileIO (Issue #14326, PR #14333).

This integration introduces Footer Prefetching, which optimizes Parquet length checks by caching object suffixes to remove network overhead. Combined with threaded VectoredIO for concurrent multi-range operations and specialized small object caching for sub-1MB files, these enhancements eliminate persistent I/O bottlenecks. Initial benchmarks indicate that these architectural improvements can reduce Parquet metadata parsing latency and boost total record processing speeds, empowering high-scale Spark, Flink, and Trino workloads to run with improved efficiency on Google Cloud Storage.

Getting Started with 1.11.0

We are excited to be part of the Apache Iceberg community and innovating together. As a compliant Iceberg REST Catalog, Lakehouse for Apache Iceberg (formerly BigLake) already has support for version 1.11.0.

To upgrade your environment, update your build dependencies to version 1.11.0. Remember to review your deployment runtimes to ensure compatibility with the new JDK 17 baseline, and test your workloads if you are transitioning from Spark 3.4.

For a full breakdown of every bug fix, contributor attribution, and dependency bump, check out the official Apache Iceberg Releases Page.

TestParameterInjector introduces an idiomatic Kotlin API

Monday, May 25, 2026

In March 2021, we announced the open source release of TestParameterInjector: a simple but powerful parameterized test runner for JUnit4. In September 2022, we followed up with JUnit5 support, bringing our framework to developers who had moved on to the Jupiter API.

We're excited to announce our biggest update yet for our Kotlin users: KotlinTestParameters.

The de facto standard for parameterized testing

When we first introduced TestParameterInjector, we shared a graph showing its rapid adoption within Google. Over the past few years, that trajectory has continued to a point where TestParameterInjector is the de facto parameterized test framework.

Graph of the different parameterized test frameworks in Google

Usage of all other alternative frameworks continues to steadily decline, while TestParameterInjector's adoption keeps growing rapidly. It has fundamentally lowered the barrier to writing data-driven unit tests, empowering Googlers and open source developers alike to maximize test coverage with minimal boilerplate. We believe its ubiquity internally is a strong testament to its reliability and utility for the broader developer communities.

The Kotlin challenge

As Kotlin's popularity has surged, developers have naturally been writing more of their TestParameterInjector tests in Kotlin. However, specifying explicit test values in Kotlin historically meant falling back to Java-centric paradigms.

If you wanted to provide specific values to a test, you typically had three options, none of which felt truly idiomatic in Kotlin:

  1. @TestParameter({"123", "456"}): This relies on string arrays, limiting you to a subset of types that the string parsing supports.
  2. @TestParameters: This allows for more complex sets of data, but relies on YAML strings (e.g., "{age: 17, expectIsAdult: false}"). These strings, however, are not type-safe and are completely ignored by IDE refactoring tools.
  3. Provider classes: For complex types that couldn't be easily represented in strings, you had to write Provider classes, adding a bit of boilerplate code and indirection.

Enter KotlinTestParameters

To bring a seamless experience to Kotlin, we are introducing a significant new Kotlin-only feature: KotlinTestParameters.

By leveraging Kotlin's default function arguments, you can now define parameterized tests in a fully type-safe, concise, and refactor-friendly way using the testValues() function (and friends).

Here is what it looks like in practice:

import com.google.testing.junit.testparameterinjector.TestParameterInjector
import com.google.testing.junit.testparameterinjector.TestParameter
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.testValues
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.namedTestValues
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(TestParameterInjector::class)
class MyTest {

  // Testing simple types directly
  @Test
  fun simpleTest(@TestParameter limit: Int = testValues(20, 100)) {
    // This test method is run twice: once for limit=20 and once for limit=100
  }

  // Testing complex types without YAML strings or Provider classes!
  data class TestCase(val age: Int, val expectIsAdult: Boolean)

  @Test
  fun complexTest(
    @TestParameter testCase: TestCase = namedTestValues(
      "teenager" to TestCase(age = 17, expectIsAdult = false),
      "young adult" to TestCase(age = 22, expectIsAdult = true)
    )
  ) {
    // This test method is run twice with fully typed data class instances
  }
}

Why we recommend making the switch

Because testValues() seamlessly integrates with Kotlin's language features, any type is supported. This completely eliminates the need for stringly-typed YAML maps or verbose Provider classes.

We firmly believe that KotlinTestParameters is a massive leap forward in readability and maintainability. Moving forward, this should be the default way of specifying test values for all new Kotlin tests, replacing the older @TestParameters, @TestParameter({"..."}), and Provider class patterns.

Try it out!

You can read more and start using KotlinTestParameters today over on our GitHub repository.

If you have any questions, comments, or feature requests, let us know on GitHub!

Disrupting the presentation layer using autonomous workflows

Thursday, May 21, 2026

Empowering every engineer to do more with Kubernetes

Kubernetes is the gold standard for container orchestration. Its power, flexibility, and rich API surface are exactly why it has become the foundation of modern cloud-first infrastructure. Today, engineers express that power through the K8s API, declarative YAML manifests, and cloud consoles: a remarkably expressive toolbox.

We believe the next step is to expand how engineers interact with that toolbox. A Kubernetes expert should be able to converse with a deep-domain peer that speaks fluent control-plane and can reason about cluster state in real time. An engineer who isn't a Kubernetes specialist should be able to express higher-order intent, such as "deploy my application" or "rebalance this workload," and have it carried out safely against the same powerful APIs. Both audiences get more leverage out of the platform they already trust.

This is the vision behind Kube-Agents: a system of intelligent, autonomous, and human-in-the-loop agents that act as a new, intent-driven presentation layer for Kubernetes. We are moving from declarative intent via API to higher-order, human intent-driven operations while preserving everything that makes Kubernetes great underneath.

The Vision: Expanding the Presentation Layer

Today, engineers do impressive work stitching together metrics, alerts, and multi-step commands to keep clusters healthy. Agents extend that work rather than replace it. By complementing existing interfaces with autonomous agents, engineers can choose the level of abstraction that fits the task: drop down to kubectl and YAML when precision matters, or describe intent in plain language when speed and clarity matter more. The agents continuously observe system state and can execute complex operations in real time on the engineer's behalf. This isn't about hiding Kubernetes. It's about giving every engineer a more capable collaborator on top of it.

Meet the Agents: A Specialized Team

Our architecture is currently built upon three core specialized agents, each acting as a new kind of intent-driven collaborator for different stakeholders:

  1. The Platform Agent
    • Role: A partner for the central governance layer, your management plane.
    • Focus: Codifying best practices and keeping platform blueprints evergreen and synchronized across the entire fleet.
    • Example: When a new egress policy is defined at the org level, the Platform Agent propagates it to the Dev Team agents and confirms enforcement, giving platform teams confidence in compliance while letting developers stay focused on their applications.
  2. The Cluster Operator Agent
    • Role: A trusted teammate for your infrastructure operators.
    • Focus: Global concerns like multi-cluster balancing, automated provisioning, security patching, and zero-downtime version upgrades.
    • Example: It can detect a degrading node and proactively migrate workloads before application latency spikes, expanding what a single operator can safely manage at scale.
  3. The Development Team Agent
    • Role: A production-savvy peer for developers.
    • Focus: The full workload lifecycle: reconciling manifest drift, right-sizing resources, and assisting with real-time debugging.
    • Example: When a developer asks "Why is my service failing?" in chat, the agent responds with relevant logs, correlated metrics, and a diagnosis of recent config changes. This meets a Kubernetes expert at depth and a less specialized developer at the level of intent.

Leveraging Industry Benchmarks

DevOps Bench is a comprehensive suite of benchmarks. The specialized agents learn from its results so that they have the context to make well-reasoned decisions when autonomously supporting infrastructure work.

The First Demo

To be truly useful at the presentation layer, these agents can't be short-lived request/response scripts. They need to be persistent, long-running "team members" capable of continuous learning and collaboration.

As a first step, we've launched a demo: a set of OpenClaw-compatible workspaces that you can install into your OpenClaw environment and that leverage existing out-of-the-box capabilities around identity, storage persistence, and memory. The demo includes the Platform Agent, the Cluster Operator Agent, and the Development Team Agent.

  • Autonomous GitOps & JIT Probing (Dev Team Agent): Demonstrates prompt-driven staging deployments and dynamically generated, context-aware probers. The agent adheres strictly to GitOps workflows by opening PRs for infrastructure updates (such as node failure tolerance) and actively prevents configuration drift by reconciling manual manifest edits upon merge.
  • Self-Healing Infrastructure (Dev Team Agent): Showcases automated troubleshooting when a manifest is deployed with an image name typo. The agent executes a complete, autonomous 5-step remediation loop—Notification, Learning, Recommendation, Mutation, and Validation—to detect, fix, and verify the deployment without human intervention.
  • Multi-Agent Governance & Policy Coordination (Cluster Operator & Dev Team Agents): Highlights cross-agent negotiation when the Cluster Operator attempts to downscale underutilized resources for cost savings. The Dev Team Agent steps in to enforce minimum capacity policies, successfully prioritizing application reliability and governance over financial savings.
Animated walkthrough showcasing three OpenClaw agent demos in sequence: first, the Platform Agent configuring core infrastructure and identity; second, the Cluster Operator Agent managing cluster health and scaling; and finally, the Development Team Agent deploying applications and managing developer workflows.
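
The five-step remediation loop from the self-healing demo can be sketched as a small Kotlin program. This is a hypothetical illustration, not code from the Kube-Agents repo: the Step, Deployment, knownImages, recommendImage, and remediate names are all invented here, the in-memory image set stands in for a real registry lookup, and what each step does is an interpretation of the step names, which the post does not spell out.

```kotlin
// Hypothetical sketch: none of these types come from the Kube-Agents project.
enum class Step { NOTIFICATION, LEARNING, RECOMMENDATION, MUTATION, VALIDATION }

data class Deployment(val name: String, val image: String)

// Stand-in for a registry lookup; a real agent would query a registry API.
val knownImages = setOf("nginx:1.27", "redis:7")

// A deployment counts as healthy here if its image can be resolved.
fun isHealthy(d: Deployment): Boolean = d.image in knownImages

// Suggests the known image sharing the longest common prefix with the typo,
// or null when nothing overlaps at all.
fun recommendImage(badImage: String): String? =
    knownImages
        .maxByOrNull { it.commonPrefixWith(badImage).length }
        ?.takeIf { it.commonPrefixWith(badImage).isNotEmpty() }

// Runs the loop once and returns the resulting deployment plus the steps taken.
fun remediate(deployment: Deployment): Pair<Deployment, List<Step>> {
    val trace = mutableListOf<Step>()
    if (isHealthy(deployment)) return deployment to trace // nothing to fix

    trace += Step.NOTIFICATION   // 1. a failure signal arrives
    trace += Step.LEARNING       // 2. gather context: the image cannot be resolved
    val badImage = deployment.image

    trace += Step.RECOMMENDATION // 3. propose a candidate fix
    val suggestion = recommendImage(badImage) ?: return deployment to trace

    trace += Step.MUTATION       // 4. apply the fix to the manifest
    val patched = deployment.copy(image = suggestion)

    trace += Step.VALIDATION     // 5. verify the patched deployment
    return if (isHealthy(patched)) patched to trace else deployment to trace
}

fun main() {
    val (fixed, trace) = remediate(Deployment("web", "ngnix:1.27"))
    println("${fixed.image} via $trace")
}
```

In a GitOps setup such as the one described in the first demo, the mutation step would presumably open a PR rather than patch the manifest directly; the sketch collapses that into a single copy for brevity.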

Going forward, we will further productize this pattern, building on open standards to define agents and their capabilities (AGENTS.md, skills, MCP), and provide an out-of-the-box harness to orchestrate these agents.

Redefining the Stack

An intent-driven presentation layer is just the beginning. With an agentic interface in place, we can keep evolving the underlying infrastructure — adopting new components or integrating directly with additional infrastructure APIs — while engineers continue to interact with the system the way they already do. The interface stays intent-driven and stable; the agents adapt to the evolving stack underneath, so investment in how teams work today carries forward.

Call to Action

We're building Kube-Agents in the open because we believe the best infrastructure solutions are built collaboratively. Our goal is to use our expertise to give back to the open source community, while also actively learning from the ecosystem's real-world challenges. By working together, we can define best practices that benefit everyone.
If you're interested in helping shape the future of Kubernetes management, check out the Kube-Agents repo.

We are seeking engagement on two fronts:

  • Share your use cases: What would you most like an autonomous teammate to help with? We want to learn from your unique operational needs—whether it's multi-cluster balancing, specific debugging scenarios, or policy enforcement—to ensure we're building tools that provide real leverage.
  • Define future roles: What new specialized agents should exist? We value your input on the roles these agents should fulfill to best serve diverse team structures and operational requirements.

Join the conversation, contribute your ideas, and help us build a self-driving cloud that works for everyone. Check out the Kube-Agents project to open an issue or start a discussion.
