opensource.google.com

TestParameterInjector introduces an idiomatic Kotlin API

Monday, May 25, 2026

In March 2021, we announced the open source release of TestParameterInjector: a simple but powerful parameterized test runner for JUnit4. In September 2022, we followed up with JUnit5 support, bringing our framework to developers who had moved on to the Jupiter API.

We're excited to announce our biggest update yet for our Kotlin users: KotlinTestParameters.

The de facto standard for parameterized testing

When we first introduced TestParameterInjector, we shared a graph showing its rapid adoption within Google. Over the past few years, that trajectory has continued to the point where TestParameterInjector is now the de facto parameterized test framework at Google.

Graph of the different parameterized test frameworks in Google

Usage of the alternative frameworks continues to decline steadily, while TestParameterInjector's adoption keeps growing rapidly. It has fundamentally lowered the barrier to writing data-driven unit tests, empowering Googlers and open source developers alike to maximize test coverage with minimal boilerplate. We believe its ubiquity internally is a strong testament to its reliability and utility for the broader developer community.

The Kotlin challenge

As Kotlin's popularity has surged, developers have naturally been writing more of their TestParameterInjector tests in Kotlin. However, specifying explicit test values in Kotlin historically meant falling back to Java-centric paradigms.

If you wanted to provide specific values to a test, you typically had three options, none of which felt truly idiomatic in Kotlin:

  1. @TestParameter({"123", "456"}): This relies on string arrays, limiting you to the subset of types that the string parsing supports.
  2. @TestParameters: This allows for more complex sets of data, but relies on YAML strings (e.g., "{age: 17, expectIsAdult: false}"). These strings are not type-safe, however, and IDE refactoring tools ignore them completely.
  3. Provider classes: For complex types that couldn't easily be represented as strings, you had to write Provider classes, which added boilerplate and indirection.

Enter KotlinTestParameters

To bring a seamless experience to Kotlin, we are introducing a significant new Kotlin-only feature: KotlinTestParameters.

By leveraging Kotlin's default function arguments, you can now define parameterized tests in a fully type-safe, concise, and refactor-friendly way using the testValues() function (and friends).

Here is what it looks like in practice:

import com.google.testing.junit.testparameterinjector.TestParameterInjector
import com.google.testing.junit.testparameterinjector.TestParameter
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.testValues
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.namedTestValues
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(TestParameterInjector::class)
class MyTest {

  // Testing simple types directly
  @Test
  fun simpleTest(@TestParameter limit: Int = testValues(20, 100)) {
    // This test method is run twice: once for limit=20 and once for limit=100
  }

  // Testing complex types without YAML strings or Provider classes!
  data class TestCase(val age: Int, val expectIsAdult: Boolean)

  @Test
  fun complexTest(
    @TestParameter testCase: TestCase = namedTestValues(
      "teenager" to TestCase(age = 17, expectIsAdult = false),
      "young adult" to TestCase(age = 22, expectIsAdult = true)
    )
  ) {
    // This test method is run twice with fully typed data class instances
  }
}

Why we recommend making the switch

Because testValues() seamlessly integrates with Kotlin's language features, any type is supported. This completely eliminates the need for stringly-typed YAML maps or verbose Provider classes.

We firmly believe that KotlinTestParameters is a massive leap forward in readability and maintainability. Moving forward, this should be the default way of specifying test values for all new Kotlin tests, replacing the older @TestParameters, @TestParameter({"..."}), and Provider class patterns.

Try it out!

You can read more and start using KotlinTestParameters today over on our GitHub repository.

Let us know what you think on GitHub if you have any questions, comments, or feature requests!

Disrupting the presentation layer using autonomous workflows

Thursday, May 21, 2026

Empowering every engineer to do more with Kubernetes

Kubernetes is the gold standard for container orchestration. Its power, flexibility, and rich API surface are exactly why it has become the foundation of modern cloud-first infrastructure. Today, engineers express that power through the K8s API, declarative YAML manifests, and cloud consoles: a remarkably expressive toolbox.
We believe the next step is to expand how engineers interact with that toolbox. A Kubernetes expert should be able to converse with a deep-domain peer that speaks fluent control-plane and can reason about cluster state in real time. An engineer who isn't a Kubernetes specialist should be able to express higher-order intent, such as "deploy my application" or "rebalance this workload," and have it carried out safely against the same powerful APIs. Both audiences get more leverage out of the platform they already trust.
This is the vision behind Kube-Agents: a system of intelligent, autonomous, and human-in-the-loop agents that act as a new, intent-driven presentation layer for Kubernetes. We are moving from declarative intent via API to higher-order, human intent-driven operations while preserving everything that makes Kubernetes great underneath.

The Vision: Expanding the Presentation Layer

Today, engineers do impressive work stitching together metrics, alerts, and multi-step commands to keep clusters healthy. Agents extend that work rather than replace it. By complementing existing interfaces with autonomous agents, engineers can choose the level of abstraction that fits the task: drop down to kubectl and YAML when precision matters, or describe intent in plain language when speed and clarity matter more. The agents continuously observe system state and can execute complex operations in real time on the engineer's behalf. This isn't about hiding Kubernetes. It's about giving every engineer a more capable collaborator on top of it.

Meet the Agents: A Specialized Team

Our architecture is currently built upon three core specialized agents, each acting as a new kind of intent-driven collaborator for different stakeholders:

  1. The Platform Agent
    • Role: A partner for the central governance layer, your management plane.
    • Focus: Codifying best practices and keeping platform blueprints evergreen and synchronized across the entire fleet.
    • Example: When a new egress policy is defined at the org level, the Platform Agent propagates it to the Dev Team agents and confirms enforcement, giving platform teams confidence in compliance while letting developers stay focused on their applications.
  2. The Cluster Operator Agent
    • Role: A trusted teammate for your infrastructure operators.
    • Focus: Global concerns like multi-cluster balancing, automated provisioning, security patching, and zero-downtime version upgrades.
    • Example: It can detect a degrading node and proactively migrate workloads before application latency spikes, expanding what a single operator can safely manage at scale.
  3. The Development Team Agent
    • Role: A production-savvy peer for developers.
    • Focus: Supporting the full workload lifecycle, including reconciling manifest drift, right-sizing resources, and assisting with real-time debugging.
    • Example: When a developer asks "Why is my service failing?" in chat, the agent responds with relevant logs, correlated metrics, and a diagnosis of recent config changes — meeting a Kubernetes expert at depth and meeting a less specialized developer at intent.

Leveraging Industry Benchmarks

DevOps Bench is a comprehensive suite of benchmarks. The specialized agents learn from its results, so they are equipped with the context to make well-reasoned decisions when autonomously supporting infrastructure work.

The First Demo

To be truly useful at the presentation layer, these agents can't be short-lived request/response scripts. They need to be persistent, long-running "team members" capable of continuous learning and collaboration.
As a first step, we've launched a set of demo workspaces that are compatible with OpenClaw and installable into your OpenClaw environment. They leverage existing out-of-the-box capabilities around identity, storage persistence, and memory. The agents included are the Platform Agent, the Cluster Operator Agent, and the Development Team Agent.

  • Autonomous GitOps & JIT Probing (Dev Team Agent): Demonstrates prompt-driven staging deployments and dynamically generated, context-aware probers. The agent adheres strictly to GitOps workflows by opening PRs for infrastructure updates (such as node failure tolerance) and actively prevents configuration drift by reconciling manual manifest edits upon merge.
  • Self-Healing Infrastructure (Dev Team Agent): Showcases automated troubleshooting when a manifest is deployed with an image name typo. The agent executes a complete, autonomous 5-step remediation loop—Notification, Learning, Recommendation, Mutation, and Validation—to detect, fix, and verify the deployment without human intervention.
  • Multi-Agent Governance & Policy Coordination (Cluster Operator & Dev Team Agents): Highlights cross-agent negotiation when the Cluster Operator attempts to downscale underutilized resources for cost savings. The Dev Team Agent steps in to enforce minimum capacity policies, successfully prioritizing application reliability and governance over financial savings.

Animated walkthrough showcasing three OpenClaw agent demos in sequence: first, the Platform Agent configuring core infrastructure and identity; second, the Cluster Operator Agent managing cluster health and scaling; and finally, the Development Team Agent deploying applications and managing developer workflows.
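
To give a feel for the five-step remediation loop in the self-healing demo, here is a minimal sketch in Python. It is an illustration only: the post does not describe how the agent implements each step, so the function names, the in-memory registry list, and the fuzzy-matching heuristic are all hypothetical stand-ins.

```python
from difflib import get_close_matches

# Hypothetical registry contents; a real agent would query the cluster and registry.
KNOWN_IMAGES = ["nginx:1.27", "redis:7.2", "postgres:16"]


def notify(manifest):
    """Notification: return the image name if it cannot be found, else None."""
    image = manifest["image"]
    return None if image in KNOWN_IMAGES else image


def learn(bad_image):
    """Learning: collect evidence, here the closest known image names."""
    return get_close_matches(bad_image, KNOWN_IMAGES, n=1, cutoff=0.6)


def recommend(candidates):
    """Recommendation: pick the best candidate fix, if there is one."""
    return candidates[0] if candidates else None


def mutate(manifest, image):
    """Mutation: return a patched copy of the manifest."""
    return {**manifest, "image": image}


def validate(manifest):
    """Validation: confirm that the patched manifest now resolves."""
    return manifest["image"] in KNOWN_IMAGES


def remediate(manifest):
    """Run the five steps in order and return a manifest that validates."""
    bad_image = notify(manifest)
    if bad_image is None:
        return manifest  # nothing to fix
    fix = recommend(learn(bad_image))
    if fix is None:
        raise ValueError(f"no candidate fix for image {bad_image!r}")
    patched = mutate(manifest, fix)
    if not validate(patched):
        raise RuntimeError("patched manifest failed validation")
    return patched


fixed = remediate({"name": "web", "image": "ngnix:1.27"})  # note the typo
print(fixed["image"])
```

Keeping each step as its own function mirrors the way the demo names the stages, and makes it easy to picture a human approval gate between Recommendation and Mutation.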

Going forward, we will further productize this pattern, building on open standards to define agents and their capabilities (AGENTS.md, skills, MCP), and provide an out-of-the-box harness to orchestrate these agents.

Redefining the Stack

An intent-driven presentation layer is just the beginning. With an agentic interface in place, we can keep evolving the underlying infrastructure — adopting new components or integrating directly with additional infrastructure APIs — while engineers continue to interact with the system the way they already do. The interface stays intent-driven and stable; the agents adapt to the evolving stack underneath, so investment in how teams work today carries forward.

Call to Action

We're building Kube-Agents in the open because we believe the best infrastructure solutions are built collaboratively. Our goal is to use our expertise to give back to the open source community, while also actively learning from the ecosystem's real-world challenges. By working together, we can define best practices that benefit everyone.
If you're interested in helping shape the future of Kubernetes management, check out the Kube-Agents repo.

We are seeking engagement on two fronts:

  • Share your use cases: What would you most like an autonomous teammate to help with? We want to learn from your unique operational needs—whether it's multi-cluster balancing, specific debugging scenarios, or policy enforcement—to ensure we're building tools that provide real leverage.
  • Define future roles: What new specialized agents should exist? We value your input on the roles these agents should fulfill to best serve diverse team structures and operational requirements.

Join the conversation, contribute your ideas, and help us build a self-driving cloud that works for everyone. Check out the Kube-Agents project to open an issue or start a discussion.

The Journey Begins: Meet the 2026 GSoC Contributors!

Thursday, April 30, 2026

A warm welcome to the 1,141 Contributors of Google Summer of Code (GSoC) 2026! We are excited to start this new edition alongside our 184 mentoring orgs. Organizations reviewed a record-breaking 23,371 proposals to find the best matches for their communities.

2026 Application Statistics:

  • 15,245 applicants from 131 countries submitting a total of 23,371 proposals
  • Over 2,000 mentors and org admins

What's Next?

Before the first line of code is written, there is Community Bonding. This 3.5-week GSoC tradition is about more than just tool configuration; it's about immersion. It's a dedicated space for Contributors to master the codebase, align with community standards, and understand the 'why' behind their projects. By the time the coding period begins, every Contributor is ready to turn project fundamentals into real-world impact.

The official coding period begins on May 25. For our contributors, this period represents a deep dive into collaborative development, offering the chance to learn new tools and contribute to the heartbeat of open source projects.

Thank you, Mentors!

Finally, we want to express our deepest gratitude to our phenomenal Mentors and Org Admins. As AI profoundly shifts the landscape of open source communities, GSoC is no exception. Your patience, grit, and tireless volunteer efforts are the heartbeat of this program, ensuring its continued success as we welcome a new generation of contributors into the open source ecosystem.

Introducing AMS: Activation-based model scanner for open-weight LLM safety verification

Monday, April 27, 2026

The open-weight model ecosystem is thriving—and so is its shadow. A 2025 study identified over 8,000 safety-modified model repositories on Hugging Face alone, with modified models complying with unsafe requests at rates of 74% compared to 19% for their original instruction-tuned counterparts.

For organizations deploying open-weight models, a critical question emerges: how do you know the model you downloaded is safe to run?

We believe defensive security tools should be widely available. AMS represents our contribution to a safer AI ecosystem—one where developers everywhere can verify model integrity before deployment.

Today we're releasing AMS (Activation-based Model Scanner), an open source tool that answers this question in 10–40 seconds—without sending a single prompt.

The Problem with Behavioral Testing

Traditional safety verification relies on behavioral testing: send harmful prompts, check if the model refuses. This approach has three fundamental limitations.

It's slow. Comprehensive benchmarks like HarmBench require hundreds of queries. For organizations running continuous integration pipelines or screening large model registries, this can be impractical.

It's incomplete. No benchmark covers every harmful behavior. Models can exhibit safe behavior on known test sets while remaining unsafe on novel or out-of-distribution prompts.

It's gameable. Models can be fine-tuned to refuse benchmark prompts while complying with novel attacks—a known limitation of purely behavioral evaluation approaches.

A Structural Approach

AMS scanner validating clean and tampered models at select layers of the model stack, using activation geometry comparisons to detect anomalies
Clean vs Tampered Models

AMS takes a different approach entirely. Instead of testing what a model says, it measures how a model thinks.

Safety training creates measurable geometric structure in a model's activation space. Instruction-tuned models develop internal "direction vectors"—representations that separate harmful content from benign content with high statistical confidence (4–8σ separation). When safety training is removed—through fine-tuning, abliteration, or training on unfiltered data—this geometric structure collapses.

AMS measures this collapse directly. The approach is grounded in recent research on representation engineering, which demonstrates that high-level concepts are encoded linearly in LLM activation space and can be reliably extracted via simple linear probes on intermediate-layer hidden states.

git clone https://github.com/GoogleCloudPlatform/activation-model-scanner.git
cd activation-model-scanner && pip install -e .

# Standard scan (3 concepts: harmful_content, injection_resistance, refusal_capability)
ams scan ./my-model

# Quick scan (2 concepts, ~40% faster)
ams scan ./my-model --mode quick

# Full scan (4 concepts including truthfulness)
ams scan ./my-model --mode full

# JSON output for CI/CD pipelines
ams scan ./my-model --json

What AMS Detects

AMS operates as a two-tier scanner. Tier 1 measures whether safety-relevant activation structure exists at all—no baseline required. Tier 2 compares a model's activation fingerprint against a verified baseline to detect subtle modifications, including supply chain substitution.

In our validation across 14 model configurations:

  • Instruction-tuned models (Llama, Gemma, Qwen) show 3.8–8.4σ separation—consistent with strong safety training
  • Uncensored variants (Dolphin, Lexi) show collapsed separation at 1.1–1.3σ—flagged as CRITICAL
  • Abliterated models show partial degradation at 3.3σ—flagged as WARNING
  • Base models (no safety training) show 0.69σ—confirming the absence of safety structure
  • Quantized models (INT4/INT8) show less than 5% separation drift—safe to scan production deployments
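
To show how σ scores like these could map onto scan verdicts, here is a small Python sketch. The cut-off values are hypothetical and are not taken from AMS: they were picked only so that the figures quoted above land on the labels this post reports for them, with PASS assumed for instruction-tuned models.

```python
# Hypothetical thresholds, NOT the ones AMS uses.
PASS_MIN_SIGMA = 3.5
WARNING_MIN_SIGMA = 2.0


def classify(sigma: float) -> str:
    """Map a class-separation score to a verdict label."""
    if sigma >= PASS_MIN_SIGMA:
        return "PASS"
    if sigma >= WARNING_MIN_SIGMA:
        return "WARNING"
    return "CRITICAL"


# Separation figures quoted in the validation results above.
reported = {
    "instruction-tuned (low end)": 3.8,
    "instruction-tuned (high end)": 8.4,
    "abliterated": 3.3,
    "uncensored (high end)": 1.3,
}

for name, sigma in reported.items():
    print(f"{name}: {sigma} sigma -> {classify(sigma)}")
```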

Use Cases

Diagram showing three threat vectors: fine-tuned backdoors (hidden trigger behaviors), weight poisoning (direct parameter edits), and supply chain swap (substituted checkpoint)
Threat Landscape

CI/CD Safety Gates

Integrate AMS into your model deployment pipeline to block unsafe models before they reach production. An example GitHub Actions workflow:

jobs:
  model-safety-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install AMS
        run: pip install "ams-scanner[cli]"

      - name: Scan model
        run: |
          ams scan ./model \
            --verify meta-llama/Llama-3-8B-Instruct \
            --json > scan-results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: ams-scan-results
          path: scan-results.json

Supply Chain Verification

Confirm that downloaded weights match their claimed identity using Tier 2 fingerprint comparison.

# First, create a baseline from the official model
ams baseline create ./my-model

# Then verify an unknown model against it
ams scan ./suspicious-model --verify ./my-model

Registry Screening

Automatically screen models at upload or download time to flag degraded safety structure before deployment.

# Standard scan (3 concepts: harmful_content, injection_resistance, refusal_capability)
ams scan ./my-model

# Quick scan (2 concepts, ~40% faster)
ams scan ./my-model --mode quick

# Full scan (4 concepts including truthfulness)
ams scan ./my-model --mode full

# JSON output for CI/CD pipelines
ams scan ./my-model --json

How It Works

AMS processes a set of contrastive prompt pairs—examples that differ only in whether they contain harmful content—through the model under inspection. It extracts hidden states at an intermediate layer (typically 35–40% depth), computes a direction vector that separates the two classes, and measures class separation as a σ score.
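
To make the measurement concrete, here is a minimal sketch in Python. It is not AMS's implementation: it assumes the direction vector is the normalized difference of class means and that the σ score is the gap between projected class means divided by the pooled standard deviation. Synthetic Gaussian vectors stand in for real hidden states, and all function names are illustrative.

```python
import math
import random
from statistics import mean, pvariance


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]


def direction_vector(harmful, benign):
    """Unit vector pointing from the benign class mean to the harmful class mean."""
    diff = [h - b for h, b in zip(mean_vector(harmful), mean_vector(benign))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project(rows, direction):
    """Scalar projection of each vector onto the direction."""
    return [sum(x * d for x, d in zip(row, direction)) for row in rows]


def separation_sigma(harmful, benign, direction):
    """Gap between projected class means, in units of pooled standard deviation."""
    proj_h, proj_b = project(harmful, direction), project(benign, direction)
    pooled = math.sqrt((pvariance(proj_h) + pvariance(proj_b)) / 2)
    return abs(mean(proj_h) - mean(proj_b)) / pooled


def synthetic_states(n, dim, shift, rng):
    """Toy stand-in for hidden states: unit Gaussian noise, first coordinate shifted."""
    return [[rng.gauss(shift if i == 0 else 0.0, 1.0) for i in range(dim)]
            for _ in range(n)]


rng = random.Random(0)
benign = synthetic_states(200, 8, 0.0, rng)
structured = synthetic_states(200, 8, 6.0, rng)  # classes far apart
collapsed = synthetic_states(200, 8, 0.2, rng)   # classes nearly overlapping

sigma_structured = separation_sigma(structured, benign, direction_vector(structured, benign))
sigma_collapsed = separation_sigma(collapsed, benign, direction_vector(collapsed, benign))
print(f"structured: {sigma_structured:.2f} sigma, collapsed: {sigma_collapsed:.2f} sigma")
```

In this toy setup the well-separated classes score several σ while the overlapping classes score well under one, which is the kind of contrast the scanner relies on.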

Flowchart illustrating AMS scanning process: contrastive prompt pairs enter the model, hidden states are extracted at an intermediate layer, direction vectors are computed, and class separation is measured to produce PASS, WARNING, or CRITICAL results
How it Works

The key insight is that this measurement requires no generation, no benchmark queries, and no ground-truth labels. The entire scan completes in a single forward pass per prompt pair, typically 10–40 seconds on GPU hardware.

The probe consists of a single direction vector (~16KB for standard 4096-dimensional models). No model weights are modified. The tool works with any Hugging Face-compatible model.
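
The ~16KB figure follows from simple arithmetic, sketched below in Python. The one assumption is that each of the 4096 components is stored as a 32-bit float (4 bytes); the post does not state the storage format, and the function name is illustrative.

```python
def probe_size_bytes(hidden_dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one direction vector with one value per hidden dimension."""
    return hidden_dim * bytes_per_value


size = probe_size_bytes(4096)  # float32 assumed: 4 bytes per value
print(size, "bytes =", size // 1024, "KiB")  # prints "16384 bytes = 16 KiB"
```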

Get Started

AMS is available now under Apache 2.0 on GitHub: https://github.com/GoogleCloudPlatform/activation-model-scanner

We welcome contributions, baseline additions for new model families, and feedback from the community. See the contributing guide in the repository for details.

Meet the A2Family

Thursday, April 23, 2026

At Google, we know that building on open source gives teams the freedom and flexibility to use meaningful technologies faster. Openness drives innovation and security, and it is core to our mission. As we look toward the future of computing, we want to ensure that developers across all open source communities have the foundational tools they need to build secure and collaborative AI systems.

That is why we are excited for you to get to know the "A2Family"—a suite of open source protocols and tools designed to help you build, connect, and scale your AI agents.

A2A: The cornerstone of agent interoperability

The Agent2Agent (A2A) Protocol is an open standard designed to enable seamless communication and collaboration between AI agents. It provides the definitive common language for agent interoperability in a world where agents are built using diverse frameworks and by different vendors.

Originally developed by Google, A2A has now been donated to the Linux Foundation. As a saying popular in open source circles reminds us: "If you want to go fast, go alone. If you want to go far, go together." A2A brings this collaborative philosophy to AI, allowing agents to delegate sub-tasks, exchange information, and coordinate actions to solve complex problems that a single agent cannot.

MCP & Skills: Agents need tools and skills

Since day one, A2A has loved MCP, and we love skills too ♥️. Agents discover, negotiate, converse, make plans, and adapt when those plans don't work out. That's a different interaction pattern from a tool call, and it's what A2A was built for. But for your agents to function, they need access to tools, and instructions on how to use those tools safely and securely. While MCP and A2A might not share an origin story, they are a family that works better together.

When you're not sure which one you need: if it's a quick, deterministic resource or action, it's a tool; if you may end up in a conversation, it's an agent. Another good mental model is to ask "are you the expert agent that uses tools?" (MCP) or "is there some other expert agent you are collaborating with?" (A2A).

A2UI: A protocol for agent-driven interfaces

When agents need to communicate with humans, how can they safely send rich interfaces across trust boundaries? Instead of relying on text-only responses or risky code execution, we use A2UI.

A2UI enables AI agents to generate rich, interactive user interfaces that render across web, mobile, and desktop platforms—without executing arbitrary code. It is secure by design, allowing agents to use only pre-approved components from your catalog through declarative component descriptions.

You may also have heard of MCP Apps (formerly MCP UI). It is a complementary alternative to A2UI that ships your agent-driven widget inside an iframe, orchestrated with MCP events and tool calls. There are some interesting ways of combining A2UI and MCP Apps, such as generative UI inside an iframe or generative UI driving the iframe.

The AG UI protocol, developed by CopilotKit, is a standard for connecting agents to front ends with low latency. It makes developers' lives much easier, with integrations to most agent frameworks and front ends. If you are using AG UI, you already have both A2UI and A2A support!

AP2: Securing the agent economy

When an autonomous agent initiates a payment, current systems struggle with questions of authorization, authenticity, and accountability. To solve this, we introduced the Agent Payments Protocol (AP2), an open protocol for the emerging Agent Economy.

Available as an open extension for the A2A protocol, AP2 is designed to enable secure, reliable, and interoperable agent commerce for developers, merchants, and the payments industry. The protocol engineers trust into the system using verifiable digital credentials (VDCs), which are tamper-evident, cryptographically signed digital objects that serve as the building blocks of a transaction.
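
To illustrate what "tamper-evident" means in practice, here is a minimal Python sketch. It is not the AP2 credential format: it swaps in HMAC-SHA256 with a shared demo key as a stand-in for whatever signature scheme a real VDC uses, and the payload fields and function names are hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical shared key


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically so equal payloads always sign identically."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict) -> dict:
    """Wrap a payload together with a signature computed over its contents."""
    tag = hmac.new(SECRET_KEY, canonical_bytes(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def verify(credential: dict) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(
        SECRET_KEY, canonical_bytes(credential["payload"]), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, credential["signature"])


mandate = sign({"item": "running shoes", "max_price_usd": 120})
tampered = {
    "payload": {**mandate["payload"], "max_price_usd": 1200},  # edited after signing
    "signature": mandate["signature"],
}
print(verify(mandate), verify(tampered))  # prints "True False"
```

Any change to the signed fields invalidates the signature, which is the property that lets a merchant or payment provider trust that an agent's instructions were not altered in transit.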

UCP: The common language for agentic commerce

While AP2 secures the transaction, the Universal Commerce Protocol (UCP) defines the building blocks for the entire shopping journey, from discovering and buying to post-purchase experiences. UCP provides a common language for platforms, agents, and businesses, allowing the diverse commerce ecosystem to interoperate through a single standard without the need for custom builds.

UCP seamlessly connects different systems using open industry standards, featuring built-in support for both the A2A and AP2 protocols. It empowers retailers to meet customers wherever they are, ensuring that businesses retain control of their own rules and remain the Merchant of Record with full ownership of the customer relationship.

Bringing it all together with ADK

Protocols need a solid foundation to run on. Enter the Agent Development Kit (ADK).

Technically not part of the A2Family, ADK is an open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale. Available in Python, TypeScript, Go, and Java, ADK helps you build production agents, not just prototypes. It connects everything together, allowing you to easily equip your agents with tools, integrate them with the A2A protocol, and scale them globally on your infrastructure of choice.

Google champions collaboration, transparency, and shared progress to build a better future for everyone through open technologies. We are thrilled to share these tools with you and cannot wait to see what we can build together.

What kind of multi-agent workflows are you planning to build with the A2Family? Let us know in the comments below or tag us on social media!
