opensource.google.com


Posts from 2026

Announcing Apache Iceberg 1.11.0

Wednesday, May 27, 2026

The Apache Iceberg project has just released version 1.11.0! A lot has happened since the last release.

Iceberg 1.11.0 adds support for Apache Spark 4.1 and Apache Flink 2.1, the latest releases of the two engines, and makes both the default build targets.

The rest of the changes are more structural. The REST catalog can now plan scans server-side, shifting metadata work off the query engine. A new partition statistics scan API gives optimizers a clean, supported way to read a table's shape. Built-in table encryption arrives with envelope encryption and Google KMS support. And integration with the GCS Analytics Core library makes Iceberg workloads on Google Cloud Storage faster than before.

Let's take a look at some of the biggest changes.

Spark & Flink Updates

As Spark and Flink move forward, the 1.11.0 release adds support for the newest versions of both.

  • Spark 4.1 & DSv2 Migration: The headline capability Spark 4.1 unlocks is MERGE INTO with automatic schema evolution. Spark's newer MERGE syntax accepts a WITH SCHEMA EVOLUTION clause, so a MERGE whose source carries columns the target table lacks can add those columns to the table within the same statement, with no separate ALTER TABLE round trip. Beyond the version bump, the 1.11 Spark connector also modernizes against Spark's newer DataSource V2 APIs and adds an asynchronous micro-batch planner that speeds up Structured Streaming.
  • Flink Ecosystem Updates: Initial work for Flink 2.1 support has landed in the core repository, continuing Iceberg's commitment to first-class, low-latency streaming sinks. The centerpiece of the Flink work is the DynamicIcebergSink, an experimental sink that breaks the old one-sink-per-table model: a single sink routes each record to a table chosen at runtime, creates tables on demand, and evolves their schemas and partition specs on the fly as the input changes, including dropping columns once you opt in with dropUnusedColumns. Beyond the DynamicIcebergSink work, the Flink connector now supports the nanosecond timestamp, variant, and unknown types from the V3 spec.
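To make the one-sink, many-tables idea concrete, here is a toy, in-memory sketch of the routing behavior described above. It is not the DynamicIcebergSink Java API: `Table`, `DynamicSinkSketch`, and the `route` callback are hypothetical names, and schema evolution is reduced to appending or dropping column names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Table:
    """In-memory stand-in for an Iceberg table: ordered columns plus rows."""
    columns: List[str]
    rows: List[dict] = field(default_factory=list)


class DynamicSinkSketch:
    """Toy model of one sink writing to many tables (NOT the Flink Java API)."""

    def __init__(self, route: Callable[[dict], str], drop_unused_columns: bool = False):
        self.route = route                          # picks the target table per record
        self.drop_unused_columns = drop_unused_columns
        self.tables: Dict[str, Table] = {}

    def write(self, record: dict) -> None:
        name = self.route(record)
        table = self.tables.get(name)
        if table is None:
            # Create the table on demand, using the first record's fields as its schema.
            table = self.tables[name] = Table(columns=list(record))
        # Evolve the schema: append any column this table has not seen yet.
        for col in record:
            if col not in table.columns:
                table.columns.append(col)
        # Opt-in column dropping, loosely mirroring dropUnusedColumns.
        if self.drop_unused_columns:
            table.columns = [c for c in table.columns if c in record]
        table.rows.append({c: record.get(c) for c in table.columns})
```

For example, `DynamicSinkSketch(route=lambda r: f"events_{r['type']}")` sends click and view events to two tables it creates itself, and a later click record carrying a new field widens only the click table's schema.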

Server-side scan planning

In previous versions of Iceberg, the client handled the heavy lifting of scan planning. The engine's driver traversed the table's metadata tree, retrieving manifest lists and manifest files from object storage and filtering data files against the query's partition predicates. Iceberg 1.11.0 shifts this computational burden into the catalog through server-side scan planning.

Instead of traversing manifests itself, the engine submits a single POST …/plan request describing the scan, and the REST catalog returns the planned FileScanTasks.

The API is designed to handle scans at any scale: small scans return results immediately, larger ones return a plan-id that the client polls, and the largest are split into plan-tasks that the client fetches in parallel through POST …/tasks.
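The client side of this three-way flow can be sketched as follows. This is an illustration under stated assumptions: `PlanResponse` and its fields are a simplified, hypothetical shape rather than the exact REST catalog schema, and the three callables stand in for the HTTP calls (submitting the plan, polling by plan-id, and fetching plan-tasks), so no real endpoint is contacted.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PlanResponse:
    """Hypothetical, simplified planning response (not the exact REST schema)."""
    status: str                                   # "completed" or "submitted"
    plan_id: Optional[str] = None                 # set while planning is still running
    file_scan_tasks: List[str] = field(default_factory=list)
    plan_tasks: List[str] = field(default_factory=list)  # opaque handles for large plans


def plan_scan(
    submit: Callable[[], PlanResponse],           # stands in for POST …/plan
    poll: Callable[[str], PlanResponse],          # stands in for polling by plan-id
    fetch_tasks: Callable[[str], List[str]],      # stands in for POST …/tasks
    max_polls: int = 10,
) -> List[str]:
    resp = submit()
    polls = 0
    # Larger scans come back "submitted" with a plan-id: poll until planning finishes.
    while resp.status == "submitted":
        if polls >= max_polls:
            raise TimeoutError(f"plan {resp.plan_id} not finished after {max_polls} polls")
        resp = poll(resp.plan_id)
        polls += 1
    if resp.status != "completed":
        raise RuntimeError(f"scan planning ended with status {resp.status!r}")
    tasks = list(resp.file_scan_tasks)
    # The largest scans are split into plan-tasks; fetch them concurrently, keeping order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch in pool.map(fetch_tasks, resp.plan_tasks):
            tasks.extend(batch)
    return tasks
```

The point of the design is visible in the sketch: the engine never opens a manifest; it only handles task lists that the catalog has already produced.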

Planning moves off the query engine and into the catalog — the driver no longer touches metadata in object storage.

Built-in table encryption

As data lakes increasingly serve as the central hub for sensitive PII and financial data, relying solely on bucket-level storage encryption is no longer enough. Iceberg 1.11.0 introduces built-in table encryption, bringing fine-grained, KMS-backed security directly to the table level.

This provides data platform teams with robust capabilities for security and compliance:

  • Zero-Trust Storage Security: Even if a malicious actor gains direct access to your underlying object storage bucket, the data remains unreadable without the keys held in your KMS.
  • Total Index Protection: It isn't just the raw data that is protected; Iceberg encrypts the manifest lists as well, preventing attackers from inferring sensitive information from table statistics.
  • Tamper-Proof Data: Built-in authentication tags guard against unauthorized modifications, ensuring data integrity.
  • Effortless Key Rotation: Keys are rotated automatically as they age, satisfying strict compliance mandates without requiring you to rewrite massive datasets.

Iceberg achieves this using envelope encryption with a three-tier key hierarchy. A table master key lives securely in your KMS and never touches Iceberg storage. This master key wraps key-encryption keys (KEKs), which are stored safely inside the table metadata. Finally, each KEK wraps a unique, per-file data-encryption key (DEK).

Every data file and manifest list is then encrypted with AES-GCM under its own unique DEK. This decoupled architecture ensures maximum security while maintaining the high performance expected of Iceberg workloads.
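The three-tier hierarchy (master key in KMS, wrapped KEK in table metadata, wrapped per-file DEK) can be illustrated with a standard-library-only sketch. Important assumption: Python's standard library has no AES-GCM, so `seal`/`open_sealed` use a SHA-256 counter-mode keystream plus an HMAC tag (encrypt-then-MAC) as a stand-in. That swap is for illustration only and is not production cryptography; `FakeKms`, `encrypt_file`, and the `wrapped_kek`/`wrapped_dek` fields are hypothetical names, not Iceberg's API.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 12, 32


def _subkeys(key: bytes):
    # Derive separate encryption and MAC keys from one key.
    return hashlib.sha256(key + b"enc").digest(), hashlib.sha256(key + b"mac").digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode: an illustrative stand-in for the AES keystream.
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt-then-MAC stand-in for AES-GCM; returns nonce || ciphertext || tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    return nonce + ct + hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()


def open_sealed(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    # The tag check is what makes tampering detectable.
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication tag mismatch: data was modified")
    return bytes(c ^ k for c, k in zip(ct, _keystream(enc_key, nonce, len(ct))))


class FakeKms:
    """Hypothetical KMS: the table master key never leaves this object."""

    def __init__(self) -> None:
        self._master_key = secrets.token_bytes(32)

    def wrap(self, key: bytes) -> bytes:
        return seal(self._master_key, key)

    def unwrap(self, wrapped: bytes) -> bytes:
        return open_sealed(self._master_key, wrapped)


def encrypt_file(kms: FakeKms, table_metadata: dict, data: bytes) -> dict:
    # Tier 2: a KEK, persisted only in wrapped form in the table metadata.
    if "wrapped_kek" not in table_metadata:
        table_metadata["wrapped_kek"] = kms.wrap(secrets.token_bytes(32))
    kek = kms.unwrap(table_metadata["wrapped_kek"])
    # Tier 3: a fresh DEK for every file, wrapped by the KEK.
    dek = secrets.token_bytes(32)
    return {"wrapped_dek": seal(kek, dek), "ciphertext": seal(dek, data)}


def decrypt_file(kms: FakeKms, table_metadata: dict, encrypted: dict) -> bytes:
    kek = kms.unwrap(table_metadata["wrapped_kek"])
    dek = open_sealed(kek, encrypted["wrapped_dek"])
    return open_sealed(dek, encrypted["ciphertext"])
```

Note that nothing stored next to the data (the wrapped KEK and wrapped DEKs) is usable without a call to the KMS, which is what keeps a leaked bucket unreadable.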

File Format API

Historically, Iceberg's format-handling code was tightly coupled, growing organically around Parquet, Avro, and ORC. Adding a new format or enforcing consistent feature support (like V3 default values or new column types) across all formats meant duplicating complex engine-specific switch/case code paths.

Iceberg 1.11.0 introduces the finalized File Format API, bringing a consistent API to reading and writing all of these file formats.

Instead of each engine hardcoding its own format-specific read and write paths, the architecture introduces:

  • FormatModel: A standardized implementation defining how a file format handles reader/writer construction and its specific capabilities.
  • FormatModelRegistry: A central directory where query engines fetch appropriate read and write builders.
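The registry pattern described above can be sketched in a few lines. This is a hypothetical Python rendering of the idea, not Iceberg's Java API: the method names `register`, `model_for`, `write`, and `read` are assumptions, and a JSON-lines "format" stands in for Parquet, Avro, or ORC.

```python
import json
from typing import Dict, List, Protocol, Tuple


class FormatModel(Protocol):
    """Hypothetical shape: one model per (file format, in-memory row type) pair."""
    format_name: str
    row_type: str

    def write(self, rows: List[dict]) -> bytes: ...

    def read(self, data: bytes) -> List[dict]: ...


class JsonLinesModel:
    """Toy format standing in for Parquet, Avro, or ORC."""
    format_name = "jsonl"
    row_type = "dict"

    def write(self, rows: List[dict]) -> bytes:
        return "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode()

    def read(self, data: bytes) -> List[dict]:
        return [json.loads(line) for line in data.decode().splitlines() if line]


class FormatModelRegistry:
    """Central lookup, so engines never switch on the file format themselves."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, str], FormatModel] = {}

    def register(self, model: FormatModel) -> None:
        key = (model.format_name, model.row_type)
        if key in self._models:
            raise ValueError(f"a model for {key} is already registered")
        self._models[key] = model

    def model_for(self, format_name: str, row_type: str) -> FormatModel:
        try:
            return self._models[(format_name, row_type)]
        except KeyError:
            raise LookupError(f"no format model for {format_name}/{row_type}") from None
```

An engine asks the registry for `model_for("jsonl", "dict")` and uses the returned object; supporting a new format means registering one more model rather than editing every engine's switch statement.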

This API (which is already seeing adoption in other Apache Iceberg implementations) is a significant code cleanup for the future of the project, and it opens the door to additional file formats over time.

Moreover, this new interface facilitates the implementation of Column Families, enabling vertical partitioning of storage. This advancement lets teams perform targeted updates or rewrites on isolated columns—such as recalculating vector embeddings—while leaving the remaining table data undisturbed.

SQL UDF Specification

1.11.0 includes the SQL UDF specification, which adds a brand new metadata format for both Scalar and Table Functions:

  • Immutable Versioning and Rollback: UDF metadata is written as self-contained, versioned JSON files stored directly in the object store. If a data engineer deploys a buggy UDF update, administrators can atomically roll back to a previous version.
  • Standardized Schema Typings: Parameters and return types map cleanly to Iceberg Type JSON representations, directly accommodating complex nested maps, structs, and the upcoming Iceberg V3 variant type.
  • Engine-Specific Execution: Each SQL UDF can carry a separate implementation for each engine, allowing users to take advantage of engine-specific functionality in their UDFs.
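Immutable versioning with rollback can be illustrated with plain JSON. Assumptions: the field names below (`current-version-id`, `versions`, `representations`) and the helper functions are hypothetical and only loosely modeled on the description above; consult the SQL UDF specification for the real layout. Each change returns the text of a new metadata file and never edits an old one.

```python
import json
from typing import Dict, List


def create_udf(name: str, params: List[dict], return_type: str, sql_by_engine: Dict[str, str]) -> str:
    """Return the first metadata file for a hypothetical SQL UDF as JSON text."""
    meta = {
        "name": name,
        "current-version-id": 1,
        "versions": [{"version-id": 1, "parameters": params,
                      "return-type": return_type, "representations": sql_by_engine}],
    }
    return json.dumps(meta, sort_keys=True)


def add_version(metadata_json: str, sql_by_engine: Dict[str, str]) -> str:
    """Produce a NEW metadata file; the previous JSON text is left untouched."""
    meta = json.loads(metadata_json)
    latest = meta["versions"][-1]
    new_id = max(v["version-id"] for v in meta["versions"]) + 1
    meta["versions"].append({**latest, "version-id": new_id, "representations": sql_by_engine})
    meta["current-version-id"] = new_id
    return json.dumps(meta, sort_keys=True)


def rollback(metadata_json: str, version_id: int) -> str:
    """A rollback is just another new file whose pointer targets an older version."""
    meta = json.loads(metadata_json)
    if version_id not in {v["version-id"] for v in meta["versions"]}:
        raise KeyError(f"unknown version {version_id}")
    meta["current-version-id"] = version_id
    return json.dumps(meta, sort_keys=True)


def current_sql(metadata_json: str, engine: str) -> str:
    """Look up the engine-specific body of the current version."""
    meta = json.loads(metadata_json)
    current = next(v for v in meta["versions"] if v["version-id"] == meta["current-version-id"])
    return current["representations"][engine]
```

Because every state is a complete file, swapping the catalog's pointer from one file to another is what makes the rollback atomic.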

GCS Analytics Core Library Integration

For Google Cloud customers, version 1.11.0 delivers substantial throughput gains by embedding the GCS Analytics Core library into GCSFileIO (Issue #14326, PR #14333).

This integration introduces Footer Prefetching, which speeds up Parquet footer and length checks by fetching and caching the tail of each object up front, avoiding extra network round trips. Combined with threaded VectoredIO for concurrent multi-range reads and a dedicated cache for small objects under 1 MB, these enhancements remove common I/O bottlenecks. Initial benchmarks indicate that these improvements can reduce Parquet metadata parsing latency and increase overall record processing throughput, helping high-scale Spark, Flink, and Trino workloads run more efficiently on Google Cloud Storage.
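Why footer prefetching saves round trips follows from the Parquet layout: a file ends with the footer, a 4-byte little-endian footer length, and the magic bytes `PAR1`. A naive reader needs one request for those last 8 bytes and another for the footer itself. The sketch below, which is not the GCS Analytics Core implementation, fetches a larger suffix once and serves both reads from it. `CountingObjectStore`, `SuffixPrefetchingReader`, and the 64 KiB default are hypothetical, and object size is assumed to come from metadata without a ranged read.

```python
import struct
from typing import Dict

PARQUET_MAGIC = b"PAR1"


class CountingObjectStore:
    """Hypothetical in-memory object store that counts ranged reads."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        self.blobs = blobs
        self.range_reads = 0

    def size(self, name: str) -> int:
        return len(self.blobs[name])

    def read_range(self, name: str, start: int, end: int) -> bytes:
        self.range_reads += 1
        return self.blobs[name][start:end]


class SuffixPrefetchingReader:
    """Fetches the last `prefetch_bytes` of an object once; tail reads hit the cache."""

    def __init__(self, store: CountingObjectStore, name: str, prefetch_bytes: int = 64 * 1024) -> None:
        self.store, self.name = store, name
        self.size = store.size(name)
        self._suffix_start = max(0, self.size - prefetch_bytes)
        self._suffix = store.read_range(name, self._suffix_start, self.size)  # one request

    def read(self, start: int, end: int) -> bytes:
        if start >= self._suffix_start:  # fully inside the cached tail
            return self._suffix[start - self._suffix_start:end - self._suffix_start]
        return self.store.read_range(self.name, start, end)


def read_parquet_footer(reader: SuffixPrefetchingReader) -> bytes:
    # Layout of a Parquet tail: footer bytes, 4-byte little-endian length, b"PAR1".
    footer_len, magic = struct.unpack("<I4s", reader.read(reader.size - 8, reader.size))
    if magic != PARQUET_MAGIC:
        raise ValueError("not a Parquet file")
    footer_end = reader.size - 8
    return reader.read(footer_end - footer_len, footer_end)
```

With a footer smaller than the prefetch window, reading the length and the footer costs a single ranged request instead of two; a footer larger than the window simply falls back to a normal read.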

Getting Started with 1.11.0

We are excited to be part of the Apache Iceberg community and innovating together. As a compliant Iceberg REST Catalog, Lakehouse for Apache Iceberg (formerly BigLake) already has support for version 1.11.0.

To upgrade your environment, update your build dependencies to version 1.11.0. Remember to review your deployment runtimes to ensure compatibility with the new JDK 17 baseline, and test your workloads if you are transitioning from Spark 3.4.

For a full breakdown of every bug fix, contributor attribution, and dependency bump, check out the official Apache Iceberg Releases Page.

TestParameterInjector introduces an idiomatic Kotlin API

Monday, May 25, 2026

In March 2021, we announced the open source release of TestParameterInjector: a simple but powerful parameterized test runner for JUnit4. In September 2022, we followed up with JUnit5 support, bringing our framework to developers who had moved on to the Jupiter API.

We're excited to announce our biggest update yet for our Kotlin users: KotlinTestParameters.

The de facto standard for parameterized testing

When we first introduced TestParameterInjector, we shared a graph showing its rapid adoption within Google. Over the past few years, that trajectory has continued to a point where TestParameterInjector is the de facto parameterized test framework.

Graph of the different parameterized test frameworks in Google

Usage of all alternative frameworks continues to decline steadily, while TestParameterInjector's adoption keeps growing rapidly. It has fundamentally lowered the barrier to writing data-driven unit tests, helping Googlers and open source developers alike maximize test coverage with minimal boilerplate. We believe its ubiquity internally is a strong testament to its reliability and usefulness for the broader developer community.

The Kotlin challenge

As Kotlin's popularity has surged, developers have naturally been writing more of their TestParameterInjector tests in Kotlin. However, specifying explicit test values in Kotlin historically meant falling back to Java-centric paradigms.

If you wanted to provide specific values to a test, you typically had three options, none of which felt truly idiomatic in Kotlin:

  1. @TestParameter({"123", "456"}): This relies on string arrays, limiting you to a subset of types that the string parsing supports.
  2. @TestParameters: This allows for more complex sets of data, but relies on YAML strings (e.g., "{age: 17, expectIsAdult: false}"). However, these strings are not type-safe and are completely ignored by IDE refactoring tools.
  3. Provider classes: For complex types that couldn't easily be represented as strings, you had to write Provider classes, adding boilerplate code and indirection.

Enter KotlinTestParameters

To bring a seamless experience to Kotlin, we are introducing a significant new Kotlin-only feature: KotlinTestParameters.

By leveraging Kotlin's default function arguments, you can now define parameterized tests in a fully type-safe, concise, and refactor-friendly way using the testValues() function (and friends).

Here is what it looks like in practice:

import com.google.testing.junit.testparameterinjector.TestParameterInjector
import com.google.testing.junit.testparameterinjector.TestParameter
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.testValues
import com.google.testing.junit.testparameterinjector.KotlinTestParameters.namedTestValues
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(TestParameterInjector::class)
class MyTest {

  // Testing simple types directly
  @Test
  fun simpleTest(@TestParameter limit: Int = testValues(20, 100)) {
    // This test method is run twice: once for limit=20 and once for limit=100
  }

  // Testing complex types without YAML strings or Provider classes!
  data class TestCase(val age: Int, val expectIsAdult: Boolean)

  @Test
  fun complexTest(
    @TestParameter testCase: TestCase = namedTestValues(
      "teenager" to TestCase(age = 17, expectIsAdult = false),
      "young adult" to TestCase(age = 22, expectIsAdult = true)
    )
  ) {
    // This test method is run twice with fully typed data class instances
  }
}

Why we recommend making the switch

Because testValues() seamlessly integrates with Kotlin's language features, any type is supported. This completely eliminates the need for stringly-typed YAML maps or verbose Provider classes.

We firmly believe that KotlinTestParameters is a massive leap forward in readability and maintainability. Moving forward, this should be the default way of specifying test values for all new Kotlin tests, replacing the older @TestParameters, @TestParameter({"..."}), and Provider class patterns.

Try it out!

You can read more and start using KotlinTestParameters today over on our GitHub repository.

Let us know what you think on GitHub if you have any questions, comments, or feature requests!

Disrupting the presentation layer using autonomous workflows

Thursday, May 21, 2026

Empowering every engineer to do more with Kubernetes

Kubernetes is the gold standard for container orchestration. Its power, flexibility, and rich API surface are exactly why it has become the foundation of modern cloud-first infrastructure. Today, engineers express that power through the K8s API, declarative YAML manifests, and cloud consoles, a remarkably expressive toolbox.
We believe the next step is to expand how engineers interact with that toolbox. A Kubernetes expert should be able to converse with a deep-domain peer that speaks fluent control-plane and can reason about cluster state in real time. An engineer who isn't a Kubernetes specialist should be able to express higher-order intent, such as "deploy my application" or "rebalance this workload," and have it carried out safely against the same powerful APIs. Both audiences get more leverage out of the platform they already trust.
This is the vision behind Kube-Agents: a system of intelligent, autonomous, and human-in-the-loop agents that act as a new, intent-driven presentation layer for Kubernetes. We are moving from declarative intent via API to higher-order, human intent-driven operations while preserving everything that makes Kubernetes great underneath.

The Vision: Expanding the Presentation Layer

Today, engineers do impressive work stitching together metrics, alerts, and multi-step commands to keep clusters healthy. Agents extend that work, not replace it. By complementing existing interfaces with autonomous agents, engineers can choose the level of abstraction that fits the task: drop down to kubectl and YAML when precision matters, or describe intent in plain language when speed and clarity matter more. The agents continuously observe system state and can execute complex operations in real time on the engineer's behalf. This isn't about hiding Kubernetes. It's about giving every engineer a more capable collaborator on top of it.

Meet the Agents: A Specialized Team

Our architecture is currently built upon three core specialized agents, each acting as a new kind of intent-driven collaborator for different stakeholders:

  1. The Platform Agent
    • Role: A partner for the central governance layer, your management plane.
    • Focus: Codifying best practices and keeping platform blueprints evergreen and synchronized across the entire fleet.
    • Example: When a new egress policy is defined at the org level, the Platform Agent propagates it to the Dev Team agents and confirms enforcement, giving platform teams confidence in compliance while letting developers stay focused on their applications.
  2. The Cluster Operator Agent
    • Role: A trusted teammate for your infrastructure operators.
    • Focus: Global concerns like multi-cluster balancing, automated provisioning, security patching, and zero-downtime version upgrades.
    • Example: It can detect a degrading node and proactively migrate workloads before application latency spikes, expanding what a single operator can safely manage at scale.
  3. The Development Team Agent
    • Role: A production-savvy peer for developers.
    • Focus: The primary collaborator for developers. It supports the full workload lifecycle — reconciling manifest drift, right-sizing resources, and assisting with real-time debugging.
    • Example: When a developer asks "Why is my service failing?" in chat, the agent responds with relevant logs, correlated metrics, and a diagnosis of recent config changes — meeting a Kubernetes expert at depth and meeting a less specialized developer at intent.

Leveraging Industry Benchmarks

DevOps Bench is a comprehensive suite of benchmarks. These specialized agents learn from its results, so they have the context to make well-reasoned decisions when autonomously supporting infrastructure work.

The First Demo

To be truly useful at the presentation layer, these agents can't be short-lived request/response scripts. They need to be persistent, long-running "team members" capable of continuous learning and collaboration.
As a first step, we've released a demo: a set of OpenClaw-compatible workspaces that you can install into your OpenClaw environment, leveraging its existing out-of-the-box capabilities for identity, storage persistence, and memory. The included agents are the Platform Agent, the Cluster Operator Agent, and the Development Team Agent.

  • Autonomous GitOps & JIT Probing (Dev Team Agent): Demonstrates prompt-driven staging deployments and dynamically generated, context-aware probers. The agent adheres strictly to GitOps workflows by opening PRs for infrastructure updates (such as node failure tolerance) and actively prevents configuration drift by reconciling manual manifest edits upon merge.
  • Self-Healing Infrastructure (Dev Team Agent): Showcases automated troubleshooting when a manifest is deployed with an image name typo. The agent executes a complete, autonomous 5-step remediation loop—Notification, Learning, Recommendation, Mutation, and Validation—to detect, fix, and verify the deployment without human intervention.
  • Multi-Agent Governance & Policy Coordination (Cluster Operator & Dev Team Agents): Highlights cross-agent negotiation when the Cluster Operator attempts to downscale underutilized resources for cost savings. The Dev Team Agent steps in to enforce minimum capacity policies, successfully prioritizing application reliability and governance over financial savings.
Animated walkthrough showcasing three OpenClaw agent demos in sequence: first, the Platform Agent configuring core infrastructure and identity; second, the Cluster Operator Agent managing cluster health and scaling; and finally, the Development Team Agent deploying applications and managing developer workflows.
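The self-healing demo's five-step loop (Notification, Learning, Recommendation, Mutation, Validation) can be illustrated with a toy sketch of the image-typo scenario. Everything here is hypothetical: `Deployment`, `remediate`, and the `registry` list are not part of Kube-Agents or OpenClaw, no Kubernetes API is called, and `difflib` fuzzy matching stands in for whatever reasoning the real agent uses to spot the typo.

```python
import difflib
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Deployment:
    name: str
    image: str


def remediate(dep: Deployment, registry: List[str]) -> Tuple[Deployment, List[str]]:
    """Toy five-step loop; `registry` stands in for the images that actually exist."""
    log: List[str] = []
    # 1. Notification: the image cannot be pulled (think ImagePullBackOff).
    if dep.image in registry:
        return dep, log
    log.append(f"notification: {dep.name} cannot pull {dep.image}")
    # 2. Learning: gather context, here the closest known image names.
    candidates = difflib.get_close_matches(dep.image, registry, n=3, cutoff=0.8)
    log.append(f"learning: candidates {candidates}")
    # 3. Recommendation: take the best-scoring candidate, or escalate to a human.
    if not candidates:
        log.append("recommendation: none found, escalating to a human")
        return dep, log
    fix = candidates[0]
    log.append(f"recommendation: use {fix}")
    # 4. Mutation: apply the change (in a GitOps setup this could be a pull request).
    fixed = replace(dep, image=fix)
    log.append("mutation: image updated")
    # 5. Validation: confirm the new image resolves before declaring success.
    if fixed.image not in registry:
        raise RuntimeError("validation failed")
    log.append("validation: ok")
    return fixed, log
```

The escalation branch matters as much as the happy path: when the agent cannot form a confident recommendation, it stops before mutating anything.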

Going forward, we will further productize this pattern, building on open standards to define agents and their capabilities (AGENTS.md, skills, MCP), and provide an out-of-the-box harness to orchestrate these agents.

Redefining the Stack

An intent-driven presentation layer is just the beginning. With an agentic interface in place, we can keep evolving the underlying infrastructure — adopting new components or integrating directly with additional infrastructure APIs — while engineers continue to interact with the system the way they already do. The interface stays intent-driven and stable; the agents adapt to the evolving stack underneath, so investment in how teams work today carries forward.

Call to Action

We're building Kube-Agents in the open because we believe the best infrastructure solutions are built collaboratively. Our goal is to use our expertise to give back to the open source community, while also actively learning from the ecosystem's real-world challenges. By working together, we can define best practices that benefit everyone.
If you're interested in helping shape the future of Kubernetes management, check out the Kube Agents repo.

We are seeking engagement on two fronts:

  • Share your use cases: What would you most like an autonomous teammate to help with? We want to learn from your unique operational needs—whether it's multi-cluster balancing, specific debugging scenarios, or policy enforcement—to ensure we're building tools that provide real leverage.
  • Define future roles: What new specialized agents should exist? We value your input on the roles these agents should fulfill to best serve diverse team structures and operational requirements.

Join the conversation, contribute your ideas, and help us build a self-driving cloud that works for everyone. Check out the Kube Agents project to open an issue or start a discussion.

The Journey Begins: Meet the 2026 GSoC Contributors!

Thursday, April 30, 2026

A warm welcome to the 1,141 Contributors of Google Summer of Code (GSoC) 2026! We are excited to start this new edition alongside our 184 mentoring orgs. Organizations reviewed a record-breaking 23,371 proposals to find the best matches for their communities.

2026 Application Statistics:

  • 15,245 applicants from 131 countries submitting a total of 23,371 proposals
  • Over 2,000 mentors and org admins

What's Next?

Before the first line of code is written, there is Community Bonding. This 3.5-week GSoC tradition is about more than just tool configuration; it's about immersion. It's a dedicated space for Contributors to master the codebase, align with community standards, and understand the 'why' behind their projects. By the time the coding period begins, every Contributor is ready to turn project fundamentals into real-world impact.

The official coding period begins on May 25. For our contributors, this period represents a deep dive into collaborative development, offering the chance to learn new tools and contribute to the heartbeat of open source projects.

Thank you, Mentors!

Finally, we want to express our deepest gratitude to our phenomenal Mentors and Org Admins. AI is profoundly reshaping open source communities, and GSoC is no exception. Your patience, grit, and tireless volunteer efforts are the heartbeat of this program, ensuring its continued success as we welcome a new generation of contributors into the open source ecosystem.

Introducing AMS: Activation-based model scanner for open-weight LLM safety verification

Monday, April 27, 2026

The open-weight model ecosystem is thriving, and so is its shadow. A 2025 study identified over 8,000 safety-modified model repositories on Hugging Face alone, with modified models complying with unsafe requests 74% of the time, compared with 19% for their original instruction-tuned counterparts.

For organizations deploying open-weight models, a critical question emerges: how do you know the model you downloaded is safe to run?

We believe defensive security tools should be widely available. AMS represents our contribution to a safer AI ecosystem—one where developers everywhere can verify model integrity before deployment.

Today we're releasing AMS (Activation-based Model Scanner), an open source tool that answers this question in 10–40 seconds, without asking the model to generate a single response.

The Problem with Behavioral Testing

Traditional safety verification relies on behavioral testing: send harmful prompts, check if the model refuses. This approach has three fundamental limitations.

It's slow. Comprehensive benchmarks like HarmBench require hundreds of queries. For organizations running continuous integration pipelines or screening large model registries, this can be impractical.

It's incomplete. No benchmark covers every harmful behavior. Models can exhibit safe behavior on known test sets while remaining unsafe on novel or out-of-distribution prompts.

It's gameable. Models can be fine-tuned to refuse benchmark prompts while complying with novel attacks—a known limitation of purely behavioral evaluation approaches.

A Structural Approach

AMS scanner validating clean and tampered models at select layers of the model stack, using activation geometry comparisons to detect anomalies
Clean vs Tampered Models

AMS takes a different approach entirely. Instead of testing what a model says, it measures how a model thinks.

Safety training creates measurable geometric structure in a model's activation space. Instruction-tuned models develop internal "direction vectors"—representations that separate harmful content from benign content with high statistical confidence (4–8σ separation). When safety training is removed—through fine-tuning, abliteration, or training on unfiltered data—this geometric structure collapses.

AMS measures this collapse directly. The approach builds on recent research in representation engineering, which shows that many high-level concepts are encoded approximately linearly in LLM activation space and can be extracted reliably with simple linear probes on intermediate-layer hidden states.

git clone https://github.com/GoogleCloudPlatform/activation-model-scanner.git
cd activation-model-scanner && pip install -e .

# Standard scan (3 concepts: harmful_content, injection_resistance, refusal_capability)
ams scan ./my-model

# Quick scan (2 concepts, ~40% faster)
ams scan ./my-model --mode quick

# Full scan (4 concepts including truthfulness)
ams scan ./my-model --mode full

# JSON output for CI/CD pipelines
ams scan ./my-model --json

What AMS Detects

AMS operates as a two-tier scanner. Tier 1 measures whether safety-relevant activation structure exists at all—no baseline required. Tier 2 compares a model's activation fingerprint against a verified baseline to detect subtle modifications, including supply chain substitution.

In our validation across 14 model configurations:

  • Instruction-tuned models (Llama, Gemma, Qwen) show 3.8–8.4σ separation—consistent with strong safety training
  • Uncensored variants (Dolphin, Lexi) show collapsed separation at 1.1–1.3σ—flagged as CRITICAL
  • Abliterated models show partial degradation at 3.3σ—flagged as WARNING
  • Base models (no safety training) show 0.69σ—confirming the absence of safety structure
  • Quantized models (INT4/INT8) show less than 5% separation drift, so quantized production deployments can be scanned reliably
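A Tier 2 comparison against a verified baseline might look like the sketch below. This is an assumption about how such a check could work, not AMS's actual fingerprint format: here a fingerprint is just a direction vector plus its σ score, similarity is cosine similarity, and drift is the relative change in σ. The 5% drift tolerance echoes the quantization result above; the 0.9 cosine threshold is purely hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def relative_drift(baseline_sigma: float, candidate_sigma: float) -> float:
    """Fractional change in separation relative to the baseline."""
    return abs(candidate_sigma - baseline_sigma) / baseline_sigma


def matches_baseline(
    baseline_dir: Sequence[float], baseline_sigma: float,
    candidate_dir: Sequence[float], candidate_sigma: float,
    min_cosine: float = 0.9,      # hypothetical threshold
    max_drift: float = 0.05,      # mirrors the "<5% drift" quantization figure
) -> bool:
    return (cosine(baseline_dir, candidate_dir) >= min_cosine
            and relative_drift(baseline_sigma, candidate_sigma) <= max_drift)
```

Checking both quantities catches two different failures: a collapsed σ (safety training stripped) and a rotated direction (a different checkpoint substituted under the same name).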

Use Cases

Diagram showing three threat vectors: fine-tuned backdoors (hidden trigger behaviours), weight poisoning (direct parameter edits), and supply chain swaps (substituted checkpoints)
Threat Landscape

CI/CD Safety Gates

Integrate AMS into your model deployment pipeline to block unsafe models before they reach production. An example GitHub Actions workflow:

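# The trigger below is illustrative; adapt it to your pipeline.
name: model-safety-check
on: [pull_request]

jobs:
  model-safety-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install AMS
        run: pip install "ams-scanner[cli]"

      - name: Scan model
        run: |
          ams scan ./model \
            --verify meta-llama/Llama-3-8B-Instruct \
            --json > scan-results.json

      # Upload the report even if the scan step fails the job.
      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ams-scan-results
          path: scan-results.json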

Supply Chain Verification

Confirm that downloaded weights match their claimed identity using Tier 2 fingerprint comparison.

# First, create a baseline from the official model
ams baseline create ./my-model

# Then verify an unknown model against it
ams scan ./suspicious-model --verify ./my-model

Registry Screening

Automatically screen models at upload or download time to flag degraded safety structure before deployment.

How It Works

AMS processes a set of contrastive prompt pairs—examples that differ only in whether they contain harmful content—through the model under inspection. It extracts hidden states at an intermediate layer (typically 35–40% depth), computes a direction vector that separates the two classes, and measures class separation as a σ score.
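The measurement described above can be sketched on plain Python lists. Assumptions: the direction is taken as the normalized difference of class means (a common linear-probe choice), σ is the gap between projected class means divided by their pooled standard deviation, and the probe layer is picked at 37.5% depth, inside the stated 35–40% range. AMS's exact estimator and layer choice may differ, and in practice the input vectors would be hidden states extracted from the model rather than toy numbers.

```python
from statistics import fmean, pstdev
from typing import List, Sequence

Vector = Sequence[float]


def probe_layer(num_layers: int, depth_fraction: float = 0.375) -> int:
    """Pick an intermediate layer index at a fixed fraction of model depth."""
    return int(num_layers * depth_fraction)


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    return [fmean(column) for column in zip(*vectors)]


def direction_vector(harmful: Sequence[Vector], benign: Sequence[Vector]) -> List[float]:
    """Unit vector from the benign class mean toward the harmful class mean."""
    diff = [h - b for h, b in zip(mean_vector(harmful), mean_vector(benign))]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0:
        raise ValueError("class means coincide; no direction to extract")
    return [d / norm for d in diff]


def separation_sigma(harmful: Sequence[Vector], benign: Sequence[Vector]) -> float:
    """Gap between projected class means, in units of pooled standard deviation."""
    direction = direction_vector(harmful, benign)

    def project(v: Vector) -> float:
        return sum(x * d for x, d in zip(v, direction))

    ph = [project(v) for v in harmful]
    pb = [project(v) for v in benign]
    pooled = ((pstdev(ph) ** 2 + pstdev(pb) ** 2) / 2) ** 0.5
    if pooled == 0:
        raise ValueError("zero variance along the direction")
    return (fmean(ph) - fmean(pb)) / pooled
```

On well-separated toy activations the score is large, while overlapping classes (the "collapsed" case) score well below 1, which is the qualitative pattern the validation results above report for safety-tuned versus uncensored models.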

Flowchart illustrating AMS scanning process: contrastive prompt pairs enter the model, hidden states are extracted at an intermediate layer, direction vectors are computed, and class separation is measured to produce PASS, WARNING, or CRITICAL results
How it Works

The key insight is that this measurement requires no generation, no benchmark queries, and no labeling of model outputs. The entire scan takes a single forward pass per prompt pair, typically 10–40 seconds on GPU hardware.

The probe consists of a single direction vector (~16KB for standard 4096-dimensional models). No model weights are modified. The tool works with any Hugging Face-compatible model.

Get Started

AMS is available now under Apache 2.0 at https://github.com/GoogleCloudPlatform/activation-model-scanner.

We welcome contributions, baseline additions for new model families, and feedback from the communities. See the contributing guide in the repository for details.

Meet the A2Family

Thursday, April 23, 2026

At Google, we know that building on open source gives teams the freedom and flexibility to use meaningful technologies faster. Openness drives innovation and security, and it is core to our mission. As we look toward the future of computing, we want to ensure that developers across all open source communities have the foundational tools they need to build secure and collaborative AI systems.

That is why we are excited for you to get to know the "A2Family"—a suite of open source protocols and tools designed to help you build, connect, and scale your AI agents.

A2A: The cornerstone of agent interoperability

The Agent2Agent (A2A) Protocol is an open standard designed to enable seamless communication and collaboration between AI agents. It provides a common language for agent interoperability in a world where agents are built with diverse frameworks and by different vendors.

Originally developed by Google, A2A has now been donated to the Linux Foundation. As a famous open source aphorism reminds us: "If you want to go fast, go alone. If you want to go far, go together." A2A brings this collaborative philosophy to AI, allowing agents to delegate sub-tasks, exchange information, and coordinate actions to solve complex problems that a single agent cannot.

MCP & Skills: Agents need tools and skills

Since day one A2A has loved MCP, and we love skills too ♥️. Agents discover, negotiate, converse, make plans, adapt when those plans don't work out – that's a different interaction pattern than a tool and that's what A2A was built for. But for your agents to function, they need access to tools, and instructions on how to use those tools safely and securely. While MCP and A2A might not be from the same origin story, they are a family that works better together.

When you're not sure – if it's a quick deterministic resource or action, it's a tool, but if you may end up with a conversation, it's an agent. Another good mental model is "are you the expert agent which uses tools" (MCP) or "is there some other expert agent you are collaborating with" (A2A).

A2UI: A protocol for agent-driven interfaces

When agents need to communicate with humans, how can they safely send rich interfaces across trust boundaries? Instead of relying on text-only responses or risky code execution, we use A2UI.

A2UI enables AI agents to generate rich, interactive user interfaces that render across web, mobile, and desktop platforms—without executing arbitrary code. It is secure by design, allowing agents to use only pre-approved components from your catalog through declarative component descriptions.
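The "pre-approved components only" idea can be illustrated with a client-side check that accepts an agent's declarative UI description only if every node uses a catalog component with known properties. The JSON shape, the `CATALOG` contents, and `validate` are hypothetical and are not the A2UI wire format; the point is that the agent sends data, never code, and the client decides what it is willing to render.

```python
import json
from typing import Dict, Set

# Hypothetical catalog: the only components this client renders, with allowed properties.
CATALOG: Dict[str, Set[str]] = {
    "Column": {"children"},
    "Text": {"text"},
    "Button": {"label", "action"},
}


def validate(node: object, catalog: Dict[str, Set[str]] = CATALOG, path: str = "root") -> bool:
    """Reject anything that is not a known component with known properties."""
    if not isinstance(node, dict) or "type" not in node:
        raise ValueError(f"{path}: expected an object with a 'type' field")
    kind = node["type"]
    if not isinstance(kind, str) or kind not in catalog:
        raise ValueError(f"{path}: component {kind!r} is not in the catalog")
    unknown = set(node) - {"type"} - catalog[kind]
    if unknown:
        raise ValueError(f"{path}: unsupported properties {sorted(unknown)} on {kind}")
    children = node.get("children", [])
    if not isinstance(children, list):
        raise ValueError(f"{path}.children: expected a list")
    for i, child in enumerate(children):
        validate(child, catalog, f"{path}.children[{i}]")
    return True


message = json.loads(
    '{"type": "Column", "children": ['
    '{"type": "Text", "text": "Pick a time"},'
    '{"type": "Button", "label": "Book", "action": "book_slot"}]}'
)
```

Calling `validate(message)` succeeds, while a payload containing an unknown component such as a `Script` node is rejected before anything is rendered.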

You may also have heard of MCP Apps (formerly MCP UI), a complementary alternative to A2UI that ships your agent-driven widget inside an iframe orchestrated with MCP events and tool calls. There are some interesting ways to combine A2UI and MCP Apps, for example generative UI inside an iframe or generative UI driving the iframe.

The AG UI protocol, developed by CopilotKit, is a standard for connecting agents to front ends with low latency. It makes developer lives much easier, with integrations to most agent frameworks and front ends. If you are using AG UI, you already have both A2UI and A2A support!

AP2: Securing the agent economy

When an autonomous agent initiates a payment, current systems struggle with questions of authorization, authenticity, and accountability. To solve this, we introduced the Agent Payments Protocol (AP2), an open protocol for the emerging Agent Economy.

Available as an open extension for the A2A protocol, AP2 is designed to enable secure, reliable, and interoperable agent commerce for developers, merchants, and the payments industry. The protocol engineers trust into the system using verifiable digital credentials (VDCs), which are tamper-evident, cryptographically signed digital objects that serve as the building blocks of a transaction.
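The "tamper-evident, cryptographically signed" property can be illustrated with a short, stdlib-only Python sketch. It is not the AP2 credential format: the mandate fields are hypothetical, and an HMAC-SHA256 tag stands in for the public-key signatures that real verifiable credentials typically use (so that anyone can verify without holding the signer's secret). What carries over is the idea that changing any signed field invalidates the credential.

```python
import hashlib
import hmac
import json

# Hypothetical shared key. Real VDCs use public-key signatures instead;
# HMAC keeps this sketch dependency-free.
SIGNING_KEY = b"demo-key-not-for-production"


def canonical(payload: dict) -> bytes:
    # Sorted keys and fixed separators: identical content always yields identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign(payload: dict, key: bytes = SIGNING_KEY) -> dict:
    tag = hmac.new(key, canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": tag}


def verify(credential: dict, key: bytes = SIGNING_KEY) -> bool:
    expected = hmac.new(key, canonical(credential["payload"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, credential["signature"])


mandate = sign({"merchant": "example-store", "amount_cents": 4999, "currency": "USD"})
tampered = {"payload": {**mandate["payload"], "amount_cents": 99999},
            "signature": mandate["signature"]}
print(verify(mandate), verify(tampered))
```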

UCP: The common language for agentic commerce

While AP2 secures the transaction, the Universal Commerce Protocol (UCP) defines the building blocks for the entire shopping journey, from discovering and buying to post-purchase experiences. UCP provides a common language for platforms, agents, and businesses, allowing the diverse commerce ecosystem to interoperate through a single standard without the need for custom builds.

UCP seamlessly connects different systems using open industry standards, featuring built-in support for both the A2A and AP2 protocols. It empowers retailers to meet customers wherever they are, ensuring that businesses retain control of their own rules and remain the Merchant of Record with full ownership of the customer relationship.

Bringing it all together with ADK

Protocols need a solid foundation to run on. Enter the Agent Development Kit (ADK).

Technically not part of the A2Family, ADK is an open-source agent development framework that lets you build, debug, and deploy reliable AI agents at enterprise scale. Available in Python, TypeScript, Go, and Java, ADK helps you build production agents, not just prototypes. It connects everything together, allowing you to easily equip your agents with tools, integrate them with the A2A protocol, and scale them globally on your infrastructure of choice.

Google champions collaboration, transparency, and shared progress to build a better future for everyone through open technologies. We are thrilled to share these tools with you and cannot wait to see what we can build together.

What kind of multi-agent workflows are you planning to build with the A2Family? Let us know in the comments below or tag us on social media!

A year of open collaboration: Celebrating the anniversary of A2A

Thursday, April 16, 2026

The A2A logo wearing a birthday hat

One year ago, on April 9, 2025, Google announced the Agent2Agent (A2A) protocol. We saw the need for a "common language" that allows AI agents built on different frameworks to collaborate across diverse systems. Then, on June 23, 2025, at the Open Source Summit North America in Denver, Mike Smith took the stage for a pivotal moment in AI interoperability: Google officially donated the A2A protocol to the Linux Foundation, establishing it as a vendor-neutral, community-governed standard.

This move was driven by a core belief: for AI agents to truly transform how we work and live, they must be able to communicate across framework boundaries and organizational silos without being locked into a single provider's ecosystem. By placing A2A under the neutral stewardship of the Linux Foundation, we opened the doors for the entire industry to build, contribute, and innovate together.

A Foundation of Partners

The formation of the A2A Project was made possible through the support of our founding members, including Amazon Web Services, Cisco, Microsoft, Salesforce, SAP, and ServiceNow. Over the past twelve months, this coalition has grown, with over 100 technology companies now supporting the project.

From Prototype to Production

The momentum since the donation has been remarkable. What began as a Google-led initiative has evolved into critical infrastructure for horizontal, peer-to-peer collaboration. Just one month ago, in March, the project reached a major milestone with the release of A2A Protocol v1.0, the first stable, fully production-ready version of the standard.

Key achievements from the community this year include:

  • Enhanced Security: The implementation of Signed Agent Cards for cryptographic identity verification, ensuring trust in multi-agent workflows.
  • Web-Aligned Architecture: Refined specifications that support familiar load-balancing and security patterns for enterprise-scale deployments.
  • Ecosystem Interoperability: Demonstrating how diverse agents built with ADK, LangGraph, AG2 and CrewAI can delegate tasks and coordinate complex workflows seamlessly.
  • Experts teaching experts: We have learned from our open collaboration and have shared our knowledge.

Looking Ahead

This flourishing ecosystem of agent protocols helps standardize how agents communicate, interact with the world, and solve real-world problems. The A2Family includes AP2 (Agent Payments Protocol), A2UI (Agent to User Interface), and UCP (Universal Commerce Protocol), which are examples of new protocols created using A2A's open extensibility model for agent communication.

As we celebrate this first anniversary, we are more committed than ever to the "A2Family." The A2A protocol is designed to be complementary to existing standards like the Model Context Protocol (MCP); while MCP manages internal tool integration, A2A handles the vital external coordination between autonomous entities.

We want to thank the vibrant ecosystem of developers, contributors, and partners who have helped harden this protocol into a world-class standard over the last year.

Join the A2April Celebration!

We're celebrating the first anniversary of A2A all month long with "A2April". You can join the fun by sharing a photo of yourself in the community using the hashtag #A2April. To help you get festive, we've put together a commemorative party hat template with full assembly instructions.

Here's to many more years of innovation and open collaboration!

Acknowledgements

Thank you to the following contributors: Mike Smith, Alan Blount, Kassandra Dhillon, Daryl Ducharme, and April Kyle Nassi

Jaspr: Why web development in Dart might just be a good idea

Wednesday, April 15, 2026

Jaspr, the open source web framework, is built on Dart

Most developers know Dart as the language that powers Flutter, the multi-platform app framework. But the Dart ecosystem has so much more to offer. For example: Jaspr, a web framework that provides a familiar Flutter-like experience, but is made for building fast, SEO-friendly, and dynamic websites natively in Dart.

Dart on the web is not a new idea. Initially, Dart was designed to run natively in browsers, similar to JavaScript. Google even developed AngularDart, a pure-Dart version of the popular JS framework. And although this is no longer supported, it resulted in some surprisingly powerful web tooling for Dart. Back in 2016, teams at Google chose Dart for its strong type safety and excellent development experience, and it has only improved since then.

However, all of this was unknown to me when I started building Jaspr in 2022. As a web developer who had transitioned to Flutter, I had grown to love Dart and wanted to explore using it for web development. So Jaspr started as a personal challenge: What would a modern web framework look like if it was built entirely in Dart?

Creating Jaspr as an open source project has been one of the most challenging, but also rewarding journeys of my career. Starting out as a solo maintainer is definitely hard work, but it comes with absolute creative freedom. I can explore unconventional ideas, design APIs exactly how I envision them, and integrate modern features seen in other frameworks. All without being slowed down by processes or roadmaps. I poured more than three years of late nights and weekends into the framework. That dedication finally paid off in a way I had never imagined: Google selected Jaspr to completely rebuild and power the official Dart and Flutter websites.

Architecture & design

To understand how Jaspr actually works, let's look at its underlying design. Jaspr is primarily targeted at Flutter developers venturing into web development. Having a clearly defined niche like this greatly helped me shape the framework and prioritize features, while not getting spread too thin as a maintainer.

One of Jaspr's core design principles is that it should look and feel familiar to Flutter developers while relying on native web technologies like HTML and CSS. This sets it apart from Flutter, which has been able to target the web since 2021 but optimizes for rendering consistency across platforms instead. Flutter on the web relies fully on the Canvas API for rendering, which comes at the cost of slower load times and weaker SEO. Jaspr is therefore the missing piece for Flutter developers who want to build fast, optimized websites with great SEO.

The result is a syntax that is remarkably close to Flutter's, with behavior much closer to a framework like React, built on an efficient, DOM-based rendering algorithm.

Example: Jaspr component | Flutter widget | React component

As you can see, Jaspr's StatelessComponent mirrors Flutter's StatelessWidget, but constructs HTML similar to React with JSX. Jaspr also provides a type-safe API for writing CSS rules directly in Dart.

Client-side rendering is only one aspect of what Jaspr can do. Jaspr is built as a full-stack general purpose framework supporting both Server-Side Rendering (SSR) and Static Site Generation (SSG). In the JavaScript ecosystem, you usually find a hard split between rendering libraries (React, Vue) and meta-frameworks (Next, Nuxt, Astro). Jaspr combines these concepts into one versatile and coherent framework.

In order to achieve this wide range of features with the limited resources I had, I naturally had to make compromises. Since I didn't want to limit the quality of any feature, my strategy focuses more on limiting features to what's important. I also learned to prioritize simple solutions and to design APIs that are flexible and composable.

For instance, I built jaspr_content as a plugin for developing content-driven sites from Markdown and other sources, similar to Astro or VitePress. It provides all the core features needed to build massive documentation websites, and instead of serving every use case out of the box, it is flexible and open enough to be fully customizable. In fact, jaspr_content is what currently powers the new flutter.dev and dart.dev documentation, which contain over 3,900 pages.

Tooling and developer experience

In my opinion, a framework is only as good as its tooling, and this is where Dart truly shines and has provided Jaspr developers with a great developer experience. For example, Flutter is known for its stateful hot-reload, enabling you to swap out code instantly without losing client-side state. But hot-reload is actually a Dart feature, enabled by its unique compiler architecture.

For browser development, the dartdevc compiler performs modular and incremental compilation to JavaScript. It supports stateful hot-reload and provides a seamless debugging experience. By cleverly leveraging source-maps, you can step through native Dart code right in the browser, complete with breakpoints, value inspection, and runtime expression evaluation.

Debugging Jaspr / Dart code using Chrome DevTools

For production builds, Dart uses the dart2js compiler to generate a heavily optimized, tree-shaken JavaScript bundle, or the newer dart2wasm compiler for even better runtime performance through WebAssembly. On the server side, Dart's JIT compiler provides the same hot-reload and debugging capabilities, while its AOT compiler compiles your server code to optimized, platform-specific native binaries for production environments.

Jaspr builds on top of these and other capabilities, for example by giving developers full-stack debugging, custom lints and code assists, and something I call component scopes. This is a neat editor feature that adds inline hints to your components, showing whether they are rendered on the server, the client, or both. When building full-stack apps, this makes it much easier to reason about which platform APIs or libraries you can safely use in a specific file. I'm also working on more features to make full-stack development even smoother. One example is full-stack hot-reload: on any server-side change, whether to code or to content such as a Markdown file, the newly pre-rendered HTML is "hot-reloaded" into the page while all client-side state is preserved. Features like these are only possible because Jaspr combines server- and client-side rendering in one framework.

Impact and outlook

Last year, Google selected Jaspr for the Dart and Flutter websites, including dart.dev, flutter.dev, and docs.flutter.dev (repo), which are used by over a million monthly active users. The sites were migrated from JS- and Python-based static site generators to Jaspr and jaspr_content, resulting in a unified setup with less context switching and an easier contribution experience. The move to Jaspr also streamlined the development of brand-new interactive tutorials on dart.dev/learn and docs.flutter.dev/learn. For me this is not only an incredible vote of trust in the capabilities of Jaspr, but also a great way to dogfood Jaspr at scale; it allowed me to invest more time and resources into improving Jaspr.

With AI constantly shifting the scope of software development, I believe the concept of being a strict "domain expert" (a purely mobile or purely web developer) will matter less. However, developers and teams will increasingly value coherent tech stacks to reduce context-switching and leverage unified tooling. Just as React Native became massively popular because it allowed web developers to reuse their skills for mobile (or for companies to "reuse" their developers), Jaspr is a great option for teams working with both Flutter and the web. Apart from using existing skills, Jaspr and Flutter projects can also share up to 100% of their business logic, models, and validation code.

Dart's type safety and high-quality tooling position it well for modern web development. Jaspr evolved to be the missing piece, a cohesive framework with modern features and a great development experience.

I personally see Jaspr as an antithesis to the trend of AI causing everyone to converge onto the same stack, especially in web development. While this also has some benefits, I believe there is immense value in exploring alternative ecosystems. This can push boundaries, surface new ideas, and keep our industry vibrant.

If there's one takeaway from my journey, it's this: Don't be afraid to build the tools you want to use. You never know where that codebase will take you, and it can be incredibly rewarding.

If you're a Dart or Flutter developer curious about building websites with the skills you already have, there's never been a better time to start. Try out Jaspr now on its online playground (which is also built with Jaspr!) or by following the Jaspr quickstart.

Learn more about Flutter's migration in We rebuilt Flutter's websites with Dart and Jaspr.

Oh, and if you're wondering where the name "Jaspr" came from — it's named after my dog, Jasper. If you ever find yourself wandering around jaspr.site and want to Meet Jasper, keep an eye out… you just might find a little easter egg tribute to him.

Leveraging CPU memory for faster, cost-efficient TPU LLM training

Friday, April 10, 2026

Intel Xeon 6 Processor

Host offloading with JAX on Intel® Xeon® processors

As Large Language Models (LLMs) continue to scale into the hundreds of billions of parameters, device memory capacity has become a major limiting factor in training, because the intermediate activations produced by every layer in the forward pass are needed again in the backward pass. To reduce device memory pressure, these activations can be rematerialized (recomputed) during the backward pass, trading extra compute for lower memory use. While rematerialization enables larger models to fit within limited device memory, it significantly increases training time and cost.
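A minimal illustration of rematerialization in JAX, assuming a toy two-matmul `block` with hypothetical shapes: wrapping a function in `jax.checkpoint` (also available as `jax.remat`) tells autodiff not to keep the function's intermediates for the backward pass, but to recompute them from the inputs instead. The gradients are the same; only the memory/compute trade-off changes.

```python
import jax
import jax.numpy as jnp


def block(x, w):
    # The tanh output h is an intermediate activation the backward pass needs.
    h = jnp.tanh(x @ w)
    return jnp.sum(jnp.tanh(h @ w))


# Under jax.checkpoint, h is not stored during the forward pass; it is
# recomputed from (x, w) when the gradient is evaluated.
remat_block = jax.checkpoint(block)

x = jnp.ones((8, 16))
w = jnp.full((16, 16), 0.01)

g_plain = jax.grad(block, argnums=1)(x, w)   # stores intermediates
g_remat = jax.grad(remat_block, argnums=1)(x, w)  # recomputes them
print(jnp.max(jnp.abs(g_plain - g_remat)))
```

For a model this small the saving is negligible; the technique pays off when the stored activations of many large layers would otherwise exceed device memory.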

Intel® Xeon® processors (5th and 6th Gen) with Advanced Matrix Extensions (AMX) enable practical host offloading of selected memory- and compute-intensive components in JAX training workflows. This approach can help teams train larger models, relieve accelerator memory pressure, improve end-to-end throughput, and reduce total cost of ownership—particularly on TPU-based Google Cloud instances.

By publishing these results and implementation details, Google and Intel aim to promote transparency and share practical guidance with the community. This post describes how to enable activation offloading for JAX on TPU platforms and outlines considerations for building scalable, cost-aware hybrid CPU–accelerator training workflows.

Figure 1. Google Cloud TPU Pod commonly used in LLM training.

Host offloading

Traditional LLM training is usually done on device accelerators alone. However, modern host machines have much more memory than accelerators (512 GB or more) and can offer extra compute power, e.g., on the order of TFLOPS for Intel® Xeon® Scalable Processors with AMX. Leveraging host resources can be a great alternative to rematerialization. Host offloading selectively moves computation or data between host and device to optimize performance and memory usage.

Host memory offloading keeps frequently accessed tensors on the device and spills the rest to CPU memory, which acts as an extra level of cache. Activation offloading transfers activations computed on-device in the forward pass to the host, stores them in host memory, and brings them back to the device in the backward pass for gradient computation. This unlocks the ability to train larger models, use bigger batch sizes, and improve throughput.

Figure 2: Memory offloading during forward and backward pass

In this blog post, we provide a practical guide to offload activations through JAX to efficiently train larger models on TPUs with an Intel® Xeon® Scalable Processor.

Enabling memory offloading in JAX

JAX offers multiple strategies for offloading activations, model parameters, and optimizer states to the host. Users can call checkpoint_name() to attach a name to an intermediate tensor so that checkpoint policies can refer to it. The snippet below labels the tensor x:

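from jax.ad_checkpoint import checkpoint_name

def layer_name(x, w):
  w1, w2 = w
  # Label x so checkpoint policies can refer to it by the name "x".
  x = checkpoint_name(x, "x")
  y = x @ w1
  # Return a (carry, output) pair, the shape jax.lax.scan expects of a loop body.
  return y @ w2, None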

Users can pass a policy from jax.checkpoint_policies to jax.checkpoint (also known as jax.remat) to select the appropriate memory optimization strategy for intermediate values. There are three strategies:

  1. Recomputing during backward pass (default behavior)
  2. Storing on device
  3. Offloading to host memory after forward pass and loading back during backward pass

The code below defines a policy that, when passed to jax.checkpoint, moves x from device memory to pinned host memory after the forward pass and brings it back for the backward pass.

from jax import checkpoint_policies as cp

policy = cp.save_and_offload_only_these_names(
  names_which_can_be_saved=[],         # No values stored on device
  names_which_can_be_offloaded=["x"],  # Offload activations labeled "x"
  offload_src="device",                # Move from device memory
  offload_dst="pinned_host"            # To pinned host memory
)
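Putting the pieces together, the sketch below applies a policy to a scanned stack of layers and takes a gradient. It follows the pattern of the snippets above, but the layer body, shapes, and the `pick_policy` helper are hypothetical. As an assumption of this sketch, the offload policy is only selected when JAX reports a TPU backend; on other backends it falls back to `save_only_these_names("x")`, which keeps the labeled activation in device memory, so the example still runs on a CPU-only machine. `prevent_cse=False` is passed because the body already runs inside `jax.lax.scan`.

```python
import jax
import jax.numpy as jnp
from jax import checkpoint_policies as cp
from jax.ad_checkpoint import checkpoint_name


def layer(x, w):
    w1, w2 = w
    x = checkpoint_name(x, "x")  # label the activation for the policy
    y = jnp.tanh(x @ w1)
    return y @ w2, None          # (carry, per-step output) for jax.lax.scan


def pick_policy():
    # Hypothetical helper: offload "x" to pinned host memory on TPU,
    # otherwise keep "x" on the device instead of recomputing it.
    if jax.default_backend() == "tpu":
        return cp.save_and_offload_only_these_names(
            names_which_can_be_saved=[],
            names_which_can_be_offloaded=["x"],
            offload_src="device",
            offload_dst="pinned_host",
        )
    return cp.save_only_these_names("x")


def model(x, ws):
    body = jax.checkpoint(layer, policy=pick_policy(), prevent_cse=False)
    out, _ = jax.lax.scan(body, x, ws)  # one scan step per stacked layer
    return jnp.sum(out)


n_layers, d = 4, 16
k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
x = jax.random.normal(k1, (8, d))
ws = (0.1 * jax.random.normal(k2, (n_layers, d, d)),
      0.1 * jax.random.normal(k3, (n_layers, d, d)))
grads = jax.grad(model, argnums=1)(x, ws)
print([g.shape for g in grads])
```

Whichever policy is chosen, the gradients match an un-checkpointed model; the policy only decides where the labeled activation lives between the forward and backward passes.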

Measuring Host Offloading Benefits on TPU v5p

We evaluated TPU host offloading with JAX on both fine-tuning and training workloads. All experiments ran on Google Cloud Platform, using a single v5p-8 TPU instance whose single host uses a 4th Gen Intel® Xeon® Scalable Processor.

Fine-tuning PaliGemma2: Using the base PaliGemma2 28B model for vision-language tasks, we fine-tuned the attention layers of the language model (Gemma2 27B) while keeping all other parameters frozen. During fine-tuning, we set the LLM sequence length to 256 and the batch size to 256.

The default checkpoint policy is nothing_saveable, which does not keep any activations on-device during the forward pass. The activations are rematerialized during the backward pass for gradient computation. While this approach reduces memory pressure on the TPU, it increases compute time. To apply host offloading, we offload the Q, K, and V projection activations using save_and_offload_only_these_names. These activations are transferred to host memory (D2H) during the forward pass and fetched back during the backward pass (H2D), so the device neither stores nor recomputes them. Figure 3 shows a 10% reduction in training time from host offloading. This translates directly into a similar reduction in TPU core-hours, yielding meaningful cost savings. The complete fine-tuning recipe is available at [JAX host offloading].

Figure 3: (Top) Training time comparison between full rematerialization and host offloading.
(Bottom) Memory analysis with and without host offloading.

Training Llama2-13B using MaxText: MaxText offers several rematerialization strategies that can be specified in the training configuration file. We used the policy remat_policy: 'qkv_proj_offloaded' to offload the Q, K, and V projection activations. Figure 4 shows a ~5% reduction in per-step training time compared to fully rematerializing all activations (remat_policy: 'full').

Figure 4: MaxText Llama2-13B training statistics with and without host offloading.
The step time was 5% faster with host offloading.

When to offload activations

Activation offloading is beneficial when the time to transfer activations between host and device is lower than the time to recompute them. The timing depends on multiple factors such as PCIe bandwidth, model size, batch size, sequence length, activation tensor sizes, and the compute capabilities of the device. An additional factor is how much of the data movement can be overlapped with computation to keep the device busy. Figure 5 demonstrates an efficient overlap of the host-to-device transfer with compute during the backward pass in PaliGemma2 28B training.

Figure 5: A JAX trace of PaliGemma2 training viewed on Perfetto.
Memory offloading overlaps with compute effectively during backward pass host to device.

Smaller model variants such as PaliGemma2 3B and 9B did not see benefits from host offloading because it is faster to rematerialize all tensors than to transfer them to and from the host. Therefore, identifying the appropriate workload and offloading policy is crucial to realizing performance gains from host offloading.
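The break-even rule above can be turned into a rough back-of-envelope check. The sketch below is a simplified model, not the methodology behind the measurements in this post: it considers a single projection output of shape (batch, seq, d_out) stored in bf16, counts one round trip over the host link (D2H in the forward pass, H2D in the backward pass), and compares the exposed transfer time against the time to recompute the projection. The bandwidth, TFLOPS, utilization, and hidden sizes used below are hypothetical placeholders, not TPU v5p or PCIe specifications.

```python
def projection_cost(batch, seq, d_in, d_out, bytes_per_elem=2):
    """Bytes of one projection output and the FLOPs needed to recompute it."""
    n_bytes = batch * seq * d_out * bytes_per_elem
    flops = 2 * batch * seq * d_in * d_out  # one multiply-add per weight per token
    return n_bytes, flops


def transfer_seconds(n_bytes, link_gb_per_s):
    # The activation crosses the link twice: device->host, then host->device.
    return 2 * n_bytes / (link_gb_per_s * 1e9)


def recompute_seconds(flops, device_tflops, utilization=0.5):
    return flops / (device_tflops * 1e12 * utilization)


def offload_wins(n_bytes, flops, link_gb_per_s, device_tflops,
                 overlap=0.0, utilization=0.5):
    # overlap: fraction of the transfer hidden behind compute (0 = none, 1 = all).
    exposed = (1.0 - overlap) * transfer_seconds(n_bytes, link_gb_per_s)
    return exposed < recompute_seconds(flops, device_tflops, utilization)


# Hypothetical numbers: 25 GB/s host link, 200 TFLOPS device, batch 256, seq 256.
big = projection_cost(256, 256, 4096, 4096)
small = projection_cost(256, 256, 1024, 1024)
print(offload_wins(*big, 25, 200), offload_wins(*big, 25, 200, overlap=0.6))
print(offload_wins(*small, 25, 200, overlap=0.6))
```

Recompute cost grows with the hidden size while the bytes to move grow only with the output width, so in this toy model wider layers favor offloading and narrower ones favor rematerialization, and overlap with compute can decide the outcome.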

Call to Action

If you train on TPUs and are limited by device memory, consider evaluating activation offloading. Start by labeling candidate activations (for example, Q/K/V projections) and compare step time, memory headroom, and overall cost across representative workloads.

In our experiments, we observed up to ~10% improvement in end-to-end training time for larger workloads, which can reduce total cost of ownership (TCO) by shortening time-to-train or enabling the same workload on smaller instances.

Acknowledgments

Emilio Cota and Karlo Basioli from Google, and Eugene Zhulenev (formerly at Google).

Celebrate A2April!

Thursday, April 9, 2026

Happy 1st Birthday to A2A! Join the community in celebrating the first anniversary of A2A and its recent 1.0 release. April 9th marks the official birthday, and we're celebrating all month long with #A2April. To help you celebrate, we've used Gemini to make a party hat.

Use the template and instructions below to create your commemorative party hat.

Assembly Instructions

  1. Print: Print this document on heavy cardstock for the best results.
  2. Cut: Carefully cut along the solid outer border of the semi-circle template.
  3. Fold: Gently curve the template into a cone shape, overlapping the "Glue/Tape Tab" underneath the opposite edge.
  4. Secure: Use double-sided tape or a glue stick along the tab to hold the cone shape.
  5. Finish: Punch two small holes on opposite sides of the base and thread through an elastic string or ribbon to secure the hat to your head.

Party Hat Visualization

Make sure to print in landscape mode

Ways to Celebrate

  • Social Media: Share a photo of yourself wearing your hat with the tag #A2April to help generate that social media buzz.
  • Blog Series: Keep an eye out for the upcoming A2April blog series featuring quotes from the team and stories from the open source community.
  • Community Quotes: If you're using A2A in production, reach out to us via social media and share your story for the birthday post.

Kubernetes goes AI-First: Unpacking the new AI conformance program

Monday, April 6, 2026

As AI workloads move from experimental notebooks into massive production environments, the industry is rallying around a new standard to ensure these workloads remain portable, reliable, and efficient.

At the heart of this shift is the launch of the Certified Kubernetes AI Conformance program.

This initiative represents a significant investment in common, accessible, industry-wide standards, ensuring that the benefits of AI-first Kubernetes are available to everyone.

How Kubernetes is Evolving for an AI-First World

Traditional Kubernetes was built for stateless, cloud-first applications. However, AI workloads introduce unique complexities that standard conformance doesn't fully cover:

  • Specific Hardware Demands: AI models require precise control over accelerators like GPUs and TPUs.
  • Networking and Latency: Inference and distributed training require low-latency networking and specialized configurations.
  • Stateful Nature: Unlike traditional web apps, AI often relies on complex, stateful data pipelines.

The AI Conformance program acts as a superset of standard Kubernetes conformance. To be AI-conformant, a platform must first pass all standard Kubernetes tests and then meet additional requirements specifically for AI.

Key Pillars of the AI Conformance Program

The Kubernetes AI Conformance program is being developed in the open. This cross-company effort is led by industry experts Janet Kuo (Google), Mario Fahlandt (Kubermatic GmbH), Rita Zhang (Microsoft), and Yuan Tang (Red Hat), and involves multiple organizations and individuals across the open source ecosystem. By developing this program in the open, the community ensures the standard is built on trust and directly addresses the diverse needs of the global ecosystem. The program establishes a verified set of capabilities that platforms across the industry, like Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS), are already adopting.

Dynamic Resource Allocation (DRA)

DRA is the cornerstone of the new standard. It shifts resource allocation from simple accelerator quantity to fine-grained hardware control via attributes. For data scientists, this means they can now request specific hardware based on characteristics such as memory capacity or specialized capabilities, ensuring the environment perfectly matches the model's needs.
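The shift from "give me N accelerators" to "give me a device with these characteristics" can be sketched as attribute matching. This is a deliberately simplified Python model, not the Kubernetes DRA API: real DRA expresses requests in ResourceClaims with CEL selectors evaluated against device attributes published by drivers. The `Device`/`Request` types, the attribute names, and the device inventory here are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Device:
    name: str
    attributes: dict[str, object]                       # e.g. model, vendor
    capacity: dict[str, int] = field(default_factory=dict)  # e.g. memory_gib


@dataclass
class Request:
    # Hypothetical request shape: exact-match attributes plus minimum capacities.
    match: dict[str, object] = field(default_factory=dict)
    min_capacity: dict[str, int] = field(default_factory=dict)
    count: int = 1


def allocate(request: Request, devices: list[Device]) -> Optional[list[str]]:
    candidates = [
        d for d in devices
        if all(d.attributes.get(k) == v for k, v in request.match.items())
        and all(d.capacity.get(k, 0) >= v for k, v in request.min_capacity.items())
    ]
    if len(candidates) < request.count:
        return None  # the request cannot be satisfied as a whole
    return [d.name for d in candidates[: request.count]]


devices = [
    Device("gpu-0", {"model": "accel-a"}, {"memory_gib": 40}),
    Device("gpu-1", {"model": "accel-a"}, {"memory_gib": 80}),
    Device("gpu-2", {"model": "accel-b"}, {"memory_gib": 80}),
]
print(allocate(Request(match={"model": "accel-a"}, min_capacity={"memory_gib": 80}), devices))
```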

All-or-Nothing Scheduling

Distributed training jobs often face "deadlocks" where some pods start while others wait for resources, wasting expensive GPU time. AI Conformance mandates support for solutions like Kueue, allowing developers to ensure a job only begins when all required resources are available, improving cluster efficiency.
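The all-or-nothing idea fits in a few lines. The Python sketch below treats the cluster as a single pool of free resources and admits a job only if the summed requests of all its pods fit; otherwise nothing is reserved, so no pod starts and holds GPUs while waiting for its peers. The `admit` helper and the resource names are hypothetical, and the sketch ignores per-node placement; it illustrates the admission rule, not how Kueue is implemented.

```python
from typing import Optional


def admit(job_pods: list[dict[str, int]], free: dict[str, int]) -> Optional[dict[str, int]]:
    """Reserve every pod's request or none of them.

    Returns the new free-resource map if the whole job fits, else None.
    The input map is never mutated, so a rejected job leaves no partial reservation.
    """
    total: dict[str, int] = {}
    for pod in job_pods:
        for res, n in pod.items():
            total[res] = total.get(res, 0) + n
    if any(free.get(res, 0) < n for res, n in total.items()):
        return None  # not enough for the whole gang: start nothing
    return {res: avail - total.get(res, 0) for res, avail in free.items()}


free = {"gpu": 8, "cpu": 64}
print(admit([{"gpu": 2, "cpu": 8}] * 4, free))  # fits exactly
print(admit([{"gpu": 2, "cpu": 8}] * 5, free))  # one pod too many: rejected whole
```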

Intelligent Autoscaling for AI Workloads

Conformant clusters must support Horizontal Pod Autoscaling (HPA) based on custom AI metrics, such as GPU or TPU utilization, rather than just standard CPU/memory. This allows clusters to scale up for heavy inference demand and scale down to save costs when idle.
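The Kubernetes documentation describes the HPA scaling rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default) and the result bounded by the configured minimum and maximum replica counts. The Python sketch below applies that rule to an accelerator metric; the GPU-utilization values, replica bounds, and the `desired_replicas` name are hypothetical, and the real controller adds further behavior (stabilization windows, handling of missing or not-ready pods) that is omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave replicas alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical: target 60% average GPU utilization across inference replicas.
print(desired_replicas(4, 90, 60))  # heavy load: scale up
print(desired_replicas(4, 63, 60))  # within tolerance: no change
print(desired_replicas(4, 15, 60))  # idle: scale down
```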

Standardized Observability for High Performance

To manage AI at scale, you need deep visibility. The program requires platforms to expose rich accelerator performance metrics directly, enabling teams to monitor inference latency, throughput, and hardware health in a standardized way.

What's Next?

The launch of AI Conformance is just the beginning. As we head further into 2026, the community is adding automated testing for certification and expanding the standard to include more advanced inference patterns and stricter security requirements.

The ultimate goal? Making "AI-readiness" an inherent, invisible part of the Kubernetes standard.

To get involved and help shape the future of AI on Kubernetes, consider joining AI Conformance in Open Source Kubernetes. We welcome diverse perspectives, as your expertise and feedback are crucial to building a robust and inclusive standard for all.

Gemma 4: Expanding the Gemmaverse with Apache 2.0

Thursday, April 2, 2026


For over 20 years, Google has maintained an unwavering commitment to the open-source community. Our belief has been simple: open technology is good for our company, good for our users, and good for our world. This commitment to fostering collaborative learning and rigorous testing has consistently proven more effective than pursuing isolated improvements. It's been our approach ever since the 2005 launch of Google Summer of Code, and through our open-sourcing of Kubernetes, Android, and Go, and it remains central to our ongoing, daily work alongside maintainers and organizations.

Today, we are taking a significant step forward in that journey. Since first launch, the community has downloaded Gemma models over 400 million times and built a vibrant universe of over 100,000 inspiring variants, known in the community as the Gemmaverse.

The release of Gemma 4 under the Apache 2.0 license — our most capable open models ranging from edge devices to 31B parameters — provides cutting-edge AI models for this community of developers. The industry-standard Apache license broadens the horizon for Gemma 4's applicability and usefulness, providing well-understood terms for modification, reuse, and further development.

A long legacy of open research

We are committed to making helpful, accessible AI technology and research so that everyone can innovate and grow. That's why many of our innovations are freely available, easy to deploy, and useful to developers across the globe. We have a long history of making our foundational machine-learning research, including word2vec, JAX, and the seminal Transformer paper, publicly available for anyone to use and study.

We accelerated this commitment last year. By sharing models that interpret complex genomic data and identify tumor variants, we contributed to the "magic cycle" of research breakthroughs that translate into real-world impact. This week, however, marks a pivotal moment — Gemma 4 models are the first in the Gemmaverse to be released under the OSI-approved Apache 2.0 license.

Empowering developers and researchers to deliver breakthrough innovations

Since we first launched Gemma in 2024, the community of early adopters has grown into a vast ecosystem of builders, researchers, and problem solvers. Gemma is already supporting sovereign digital infrastructure, from automating state licensing in Ukraine to scaling Project Navarasa across India's 22 official languages. And we know that developers need autonomy, control, and clarity in licensing for further AI innovation to reach its full potential.

Gemma 4 brings three essential elements of free and open-source software directly to the community:

  • Autonomy: By letting people build on and modify the Gemma 4 models, we are empowering researchers and developers with the freedom to advance their own breakthrough innovations however they see fit.
  • Control: We understand that many developers require precise control over their development and deployment environments. Gemma 4 allows for local, private execution that doesn't rely on cloud-only infrastructure.
  • Clarity: By applying the industry-standard Apache 2.0 license terms, we are providing clarity about developers' rights and responsibilities so that they can build freely and confidently from the ground up without the need to navigate prescriptive terms of service.

Building together to drive real-world impact

Gemma 4, as a release, is an invitation. Whether you are a scientific researcher exploring the language of dolphins, an industry developer building the next generation of open AI agents, or a public institution looking to provide more effective, efficient, and localized services to your citizens, Google is excited to continue building with you. The Gemmaverse is your playground, and with Apache 2.0, the possibilities are more boundless than ever.

We can't wait to see what you build.

Google Cloud: Investing in the future of PostgreSQL

Tuesday, March 31, 2026

At Google Cloud, we are deeply committed to open source, and PostgreSQL is a cornerstone of our managed database offerings, including Cloud SQL and AlloyDB.

Continuing our work with the PostgreSQL community, we've been contributing to the core engine and participating in the patch review process. Below is a summary of that technical activity, highlighting our efforts to enhance the performance, stability, and resilience of the upstream project. By strengthening these core capabilities, we aim to drive innovation that benefits the entire global PostgreSQL ecosystem and its diverse user base.

Our investments in PostgreSQL logical replication aim to unlock critical capabilities for all users. By enhancing conflict detection, we are paving the way for robust active-active replication setups, increasing write scalability and high availability. We are also focused on extending logical replication to objects it does not yet replicate, such as sequences. This is key to enabling major version upgrades with minimal downtime, offering a more flexible alternative to pg_upgrade. Our ongoing bug fixes also improve the overall stability and resilience of PostgreSQL for everyone in the community.

Technical contributions: July 2025 – December 2025

The following sections detail technical enhancements and bug fixes contributed to the PostgreSQL open source project between July 2025 and December 2025. Primary engineering efforts were dedicated to advancing logical replication toward active-active capabilities, implementing missing features, optimizing pg_upgrade, and fixing bugs.

Logical Replication Enhancements

Logical replication is a critical PostgreSQL feature. It enables capabilities such as major version upgrades with near-zero downtime, selective replication, and active-active replication. We have been working toward closing some of its key gaps.

Automatic Conflict Detection

Active-active replication is a mechanism for increasing PostgreSQL write scalability. One of the most significant hurdles for active-active PostgreSQL setups is handling row-level conflicts when the same data is modified on two different nodes. Historically, these conflicts could stall replication, requiring manual intervention.

In this cycle, the community committed Automatic Conflict Detection, the first phase of Automatic Conflict Detection and Resolution. This foundation allows the replication worker to automatically detect when an incoming change (Insert, Update, or Delete) conflicts with the local state.
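To make "conflicts with the local state" concrete, here is a minimal Python model of how an apply worker might classify an incoming change against the local copy of a table. The classes, the dictionary-backed table and the origin strings are hypothetical stand-ins. The labels are modeled on the conflict type names in PostgreSQL's logical replication documentation, but this is a sketch and not the server's actual code path.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    value: str
    origin: str  # node that last modified this row (hypothetical model)


@dataclass
class Change:
    op: str                # "INSERT", "UPDATE" or "DELETE"
    key: int               # primary key of the affected row
    value: Optional[str]   # new value; None for deletes
    origin: str            # node that produced the incoming change


def detect_conflict(local: dict[int, Row], change: Change) -> Optional[str]:
    """Return a conflict label for an incoming change, or None if it applies cleanly."""
    existing = local.get(change.key)
    if change.op == "INSERT":
        # Inserting a key that already exists locally is a conflict.
        return "insert_exists" if existing is not None else None
    if change.op in ("UPDATE", "DELETE"):
        prefix = change.op.lower()
        if existing is None:
            # The row to modify is not present locally.
            return f"{prefix}_missing"
        if existing.origin != change.origin:
            # The local row was last changed by a different node.
            return f"{prefix}_origin_differs"
        return None
    raise ValueError(f"unknown operation: {change.op}")
```

For example, `detect_conflict({1: Row("a", "node_b")}, Change("INSERT", 1, "x", "node_a"))` yields `"insert_exists"`. Detection only labels the situation. Deciding what to do about it is the later "resolution" phase.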

Contributors: Dilip Kumar helped by performing code and design reviews. He is currently advancing the project's second phase, which focuses on logging conflicts to a dedicated log table.

Logical replication of sequences

Until recently, logical replication in PostgreSQL was primarily limited to table data. Sequences did not synchronize automatically. This meant that during a migration or a major version upgrade, DBAs had to manually sync sequence values to prevent "duplicate key" errors on the new primary node. Since many databases rely on sequences, this was a significant hurdle for logical replication.
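The manual step described above amounts to copying each sequence's last value from the old primary to the new one, so that the next generated ID does not collide with rows that were already replicated. The Python sketch below models sequences as a name-to-last-value dictionary. That representation and the "never move backwards" rule are assumptions made for illustration.

```python
def sync_sequences(publisher: dict[str, int], subscriber: dict[str, int]) -> dict[str, int]:
    """Copy each sequence's last value from the publisher.

    A subscriber sequence is never moved backwards, because a lower value
    could hand out IDs that already exist in replicated rows.
    """
    synced = dict(subscriber)
    for name, last_value in publisher.items():
        synced[name] = max(last_value, synced.get(name, 0))
    return synced
```

Suppose the subscriber's `orders_id_seq` were left at 1 while the replicated table already holds IDs up to 1042. Its next values would then collide with existing keys. After the sync, the next value it hands out is 1043.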

Contributors: Dilip Kumar helped by performing code and design reviews.

Drop subscription deadlock

The DROP SUBSCRIPTION command previously held an exclusive lock on a system catalog while connecting to the publisher to delete a replication slot.

If the publisher was another database on the same server, the new connection would stall while trying to access the catalog that DROP SUBSCRIPTION had locked.

This created a "self-deadlock," in which the command was effectively waiting for itself to finish.
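The self-deadlock pattern can be reproduced in a few lines of Python. A thread stands in for the new publisher connection, and a `threading.Lock` stands in for the catalog lock. The `release_before_connect` switch shows one general way to break the cycle: stop holding the lock before waiting on the connection. This is a hypothetical illustration, not necessarily what the committed PostgreSQL patch does. A timeout keeps the demo from hanging forever.

```python
import threading

catalog_lock = threading.Lock()  # stands in for the locked system catalog


def publisher_connection(result: list, timeout: float) -> None:
    """Simulated new backend that must read the locked catalog to start up."""
    acquired = catalog_lock.acquire(timeout=timeout)
    if acquired:
        catalog_lock.release()
    result.append(acquired)


def drop_subscription(release_before_connect: bool, timeout: float = 0.2) -> bool:
    """Return True if the simulated publisher connection managed to start."""
    result: list = []
    catalog_lock.acquire()  # the command locks the catalog
    held = True
    try:
        if release_before_connect:
            catalog_lock.release()  # hypothetical fix: stop holding the lock first
            held = False
        worker = threading.Thread(target=publisher_connection, args=(result, timeout))
        worker.start()
        worker.join()  # the command waits for the connection it started
    finally:
        if held:
            catalog_lock.release()
    return result[0]
```

`drop_subscription(False)` returns `False` because the connection can never obtain the lock its own caller holds. Without the timeout, it would wait forever. `drop_subscription(True)` returns `True`.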

Contributors: Dilip Kumar analyzed and authored the fix.

Upgrade Resilience

Operational ease of use and frictionless upgrades are important to PostgreSQL users, and we have been working to improve the upgrade experience.

pg_upgrade optimization for Large Objects

For databases with very large numbers of Large Objects, upgrades could previously take several days. The fix removes this bottleneck by exporting the underlying table directly instead of executing a separate command for each Large Object, which makes the upgrade several orders of magnitude faster.
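The speedup comes from changing how the work scales. The old path issues statements for every object, while the new path transfers the whole table in one operation. The Python sketch below generates both kinds of script so the difference can be counted. The per-object statements are modeled loosely on what pg_dump emits for Large Objects. The `lo_metadata` table name and its columns are hypothetical.

```python
def per_object_script(oids: list[int], owner: str) -> list[str]:
    """Old approach: statements for every Large Object, so the script grows with the object count."""
    statements = []
    for oid in oids:
        statements.append(f"SELECT pg_catalog.lo_create('{oid}');")
        statements.append(f"ALTER LARGE OBJECT {oid} OWNER TO {owner};")
    return statements


def bulk_script(oids: list[int], owner: str) -> list[str]:
    """New approach: one COPY that carries every metadata row (hypothetical table name)."""
    rows = "\n".join(f"{oid}\t{owner}" for oid in oids)
    return [f"COPY lo_metadata (oid, owner) FROM stdin;\n{rows}\n\\."]
```

With 1,000 objects, the first function produces 2,000 statements and the second produces one. When the restore cost is dominated by per-statement overhead, as it is with millions of objects, that gap accounts for the dramatic difference.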

Contributors: Hannu Krosing, Nitin Motiani, and Saurabh Uttam highlighted the severity of the issue, proposed the initial fix, and drove it to resolution.

Prevent logical slot invalidation during upgrade

Previously, an upgrade to PG17 could fail if max_slot_wal_keep_size was not set to -1. This fix makes pg_upgrade more resilient by eliminating the need for users to set max_slot_wal_keep_size to -1 manually. The server now automatically retains the WAL needed to upgrade logical replication slots, which simplifies the upgrade process and reduces the risk of errors.
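The behavior described above can be summarized as a decision rule. Normally, a slot becomes invalid once the WAL it retains exceeds max_slot_wal_keep_size (in megabytes, where -1 means no limit). During an upgrade, invalidation is suppressed. In the Python sketch below, the function and its `binary_upgrade` flag are hypothetical. They model the rule, not the server's implementation.

```python
def should_invalidate_slot(
    retained_wal_bytes: int,
    max_slot_wal_keep_size_mb: int,
    binary_upgrade: bool,
) -> bool:
    """Decide whether a replication slot would be invalidated (simplified model)."""
    if binary_upgrade:
        # Slots must survive pg_upgrade, whatever the setting is.
        return False
    if max_slot_wal_keep_size_mb == -1:
        # -1 means slots may retain unlimited WAL.
        return False
    return retained_wal_bytes > max_slot_wal_keep_size_mb * 1024 * 1024
```

For example, a slot retaining 2 GB of WAL with a 1024 MB limit would be invalidated in normal operation, but not while `binary_upgrade` is true.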

Contributors: Dilip Kumar analyzed and authored the fix.

pg_upgrade NOT NULL constraint related bug fix

Previously, a bug caused pg_dump to drop non-inherited NOT NULL constraints on inherited columns during upgrades from version 17 or older.

The fix updates the underlying query to ensure these specific schema constraints are correctly identified and migrated during the pg_upgrade process.
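The shape of the bug is a filter that is too strict. It skipped constraints on inherited columns even when the constraint itself was declared locally on the child table. The Python sketch below is a simplified, hypothetical model of that selection logic. The field names are assumptions, and this is not the actual pg_dump query.

```python
from dataclasses import dataclass


@dataclass
class NotNullConstraint:
    table: str
    column: str
    column_inherited: bool   # column comes from a parent table
    locally_defined: bool    # constraint declared on this table, not inherited


def constraints_to_dump_buggy(cons: list[NotNullConstraint]) -> list[NotNullConstraint]:
    # Drops local constraints whenever the column itself is inherited.
    return [c for c in cons if c.locally_defined and not c.column_inherited]


def constraints_to_dump_fixed(cons: list[NotNullConstraint]) -> list[NotNullConstraint]:
    # Keeps every locally declared constraint, including those on inherited columns.
    return [c for c in cons if c.locally_defined]
```

Take a child table that inherits column `id` from its parent and adds its own NOT NULL constraint on it. That constraint appears only in the fixed selection.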

Contributors: Dilip Kumar analyzed and authored the fix.

Miscellaneous Bug Fixes

We continue to contribute bug fixes to help improve the stability and quality of PostgreSQL.

Make pgstattuple more robust about empty or invalid index pages

pgstattuple is a PostgreSQL extension that analyzes the physical storage of tables and indexes at the row (tuple) level to determine whether a table needs maintenance. However, pgstattuple raised errors when it encountered empty or invalid pages in hash and GiST indexes. The fix handles empty and invalid index pages gracefully, making pgstattuple more robust.
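The robustness change follows a common pattern: a statistics scan should account for pages it cannot interpret rather than abort. The Python sketch below uses a hypothetical `IndexPage` model and counts such pages separately. The real patch may classify them differently, for example as free space.

```python
from dataclasses import dataclass


@dataclass
class IndexPage:
    is_new: bool      # all-zero page that was never initialized
    valid: bool       # page passed its structural sanity checks
    live_items: int
    dead_items: int


def scan_index(pages: list[IndexPage]) -> dict[str, int]:
    """Aggregate tuple statistics, tolerating empty or invalid pages."""
    stats = {"live_items": 0, "dead_items": 0, "skipped_pages": 0}
    for page in pages:
        if page.is_new or not page.valid:
            # Previously this situation raised an error; now it is counted and skipped.
            stats["skipped_pages"] += 1
            continue
        stats["live_items"] += page.live_items
        stats["dead_items"] += page.dead_items
    return stats
```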

Contributors: Nitin Motiani and Dilip Kumar participated as author and reviewer.

Loading extensions from a different path

A bug incorrectly stripped the prefix from nested module paths when shared library files were loaded dynamically, so libraries in subdirectories failed to load. The fix removes the prefix only for simple file names, which allows the dynamic library expander to find nested paths correctly.
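The rule in the fix is "strip the prefix only when what remains is a bare file name." Here it is as a small Python function. The sketch assumes the prefix in question is `$libdir/`, the placeholder PostgreSQL uses for its library directory. The function name is hypothetical.

```python
LIBDIR_PREFIX = "$libdir/"  # assumed prefix for this sketch


def strip_libdir_prefix(name: str) -> str:
    """Strip the prefix only from simple file names so nested module paths stay intact."""
    if name.startswith(LIBDIR_PREFIX):
        rest = name[len(LIBDIR_PREFIX):]
        if "/" not in rest:
            # A simple file name such as "$libdir/foo" becomes "foo".
            return rest
    # Nested paths such as "$libdir/sub/foo" keep the prefix so they still resolve.
    return name
```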

Contributors: Dilip Kumar reported and co-authored the fix for this bug.

WAL flush logic hardening

XLogFlush() and XLogNeedsFlush() are internal PostgreSQL functions that protect durability. XLogFlush() ensures that WAL records up to a given location have been flushed, and XLogNeedsFlush() reports whether such a flush is still required. In certain edge cases, such as the end-of-recovery checkpoint, the two functions used inconsistent criteria to decide which code path to follow. This inconsistency posed a risk for upcoming features, such as asynchronous I/O for writes, that require XLogNeedsFlush() to work reliably.
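The hardening boils down to an invariant: once a flush up to some location completes, asking whether that location needs a flush must return "no". One way to guarantee this is for both functions to share a single criterion. The Python sketch below models that idea. The `WalState` fields, the helper names and the choice of "insertion allowed" as the criterion are all assumptions for illustration, not PostgreSQL's code.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    insert_allowed: bool     # e.g. true during the end-of-recovery checkpoint
    flushed_upto: int        # WAL position known to be durable
    min_recovery_point: int  # position tracked instead while replaying WAL


def use_recovery_path(state: WalState) -> bool:
    # Single shared criterion, so both functions always agree on the path.
    return not state.insert_allowed


def needs_flush(state: WalState, lsn: int) -> bool:
    if use_recovery_path(state):
        return lsn > state.min_recovery_point
    return lsn > state.flushed_upto


def flush(state: WalState, lsn: int) -> None:
    if not needs_flush(state, lsn):
        return
    if use_recovery_path(state):
        state.min_recovery_point = lsn
    else:
        state.flushed_upto = lsn  # stands in for writing and syncing WAL
```

Because `needs_flush` and `flush` consult the same helper, `flush(state, lsn)` is always followed by `needs_flush(state, lsn) == False`. If each function used its own test, an edge case could make them disagree.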

Contributors: Dilip Kumar co-authored the fix for this bug.

Major Features in Development

Beyond our recent commits, the team is actively working on several high-impact proposals to further strengthen the PostgreSQL ecosystem.

  • Conflict Log Table for Detection: Dilip Kumar is developing a proposal for a conflict log table designed to offer a queryable, structured record of all logical replication conflicts. This feature would include a configuration option to determine whether conflict details are recorded in the history table, server logs, or both.
  • Adding a pg_dump flag for parallel export to pipes: Nitin Motiani is working on a flag that lets users supply pipe commands when running parallel export and import with pg_dump and pg_restore in directory format.
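The proposed conflict log table would include a configuration option that chooses where conflict details go: the table, the server log, or both. The Python sketch below models that routing. The option values `log`, `table` and `all`, the record fields and the function name are hypothetical placeholders, since the proposal is still in development.

```python
import logging
from enum import Enum


class ConflictDestination(Enum):
    LOG = "log"      # server log only
    TABLE = "table"  # conflict log table only
    ALL = "all"      # both destinations


def record_conflict(
    conflict: dict,
    destination: ConflictDestination,
    conflict_table: list,
    logger: logging.Logger,
) -> None:
    """Route one conflict record to the configured destination(s)."""
    if destination in (ConflictDestination.TABLE, ConflictDestination.ALL):
        # A queryable, structured row, modeled here as a list append.
        conflict_table.append(conflict)
    if destination in (ConflictDestination.LOG, ConflictDestination.ALL):
        logger.warning("conflict %s on %s", conflict["type"], conflict["relation"])
```

A structured table makes conflicts easy to query and audit after the fact, while the server log keeps them visible in existing monitoring pipelines. Offering both options avoids forcing a choice between the two.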

Leadership

Beyond code, our team supports the ecosystem through community leadership. We are pleased to share that Dilip Kumar has been selected for the PGConf.dev 2026 Program Committee to help shape the project's premier developer conference.

Community Roadmap: Your Feedback Matters

We encourage you to use the comments area to propose new capabilities or refinements you would like to see in future releases, and to identify key areas where the PostgreSQL open-source community should focus its investments.

Acknowledgement

We want to thank our open source contributors for their dedication to improving the upstream project.

Dilip Kumar: PostgreSQL significant contributor

Hannu Krosing: PostgreSQL significant contributor

Nitin Motiani: Contributing features and bug fixes

Saurabh Uttam: Contributing bug fixes

We also extend our sincere gratitude to the wider PostgreSQL open source members, especially the committers and reviewers, for their guidance, reviews, and for collaborating with us to make PostgreSQL the most advanced open source database in the world.
