Posts from May 2025

Introducing New Open Source Documentation Resources

Wednesday, May 28, 2025

[Illustration: pie charts, a circuit board, and text marked up in red]

Today we're introducing two new open source documentation resources for open source software maintainers, a Docs Advisor guide and a set of Documentation Project Archetypes. These tools are intended to help maintainers make effective use of limited resources when it comes to planning and executing open source documentation work.

The Docs Advisor is a guide intended to demystify documentation work, with help on picking a documentation approach, understanding your audience and available resources, and writing, revising, evaluating, and maintaining your documentation.

Documentation Project Archetypes are a set of thirteen project field guides. Each archetype describes a different type of documentation project, the problems it can solve, and how to bring the right collaborators together to create great docs.

Origin story

More than 130 open source projects wrote 200+ case studies and project reports as a part of their participation in the Google Season of Docs program from 2019 to 2024. These case studies and project reports represent a variety of documentation projects from a wide range of open source groups. In these wrap-ups, project maintainers and technical writers describe how they approached their documentation projects, capturing many successes and more than a few challenges.

These reports are a treasure trove of lessons learned, but it's unrealistic to expect time-crunched open source maintainers to read through them all. So we got in touch with Daniel Beck and Erin Kissane to chat about ways to help organize and summarize some of these lessons learned.

These conversations turned into the Docs Advisor guide ('like having an experienced technical writer hanging over your shoulder') and the thirteen Documentation Project Archetypes.

Our goal with these resources was to turn all of the hard-won experience of the Google Season of Docs participants into explicit documentation advice and guidance for open source maintainers.

More about the Docs Advisor

The Docs Advisor guide is intended to demystify the work of good documentation. It collects practices and processes from within technical writing and docs communities and from user experience, information architecture, and content strategy.

  • In Part 1, you'll pick an overall approach that suits the needs of your project.
  • In Part 2, you'll learn enough about your community and their needs to ensure that your hard work will be helping real people.
  • In Part 3, you'll assess your existing resources and pull together everything you need to move quickly and confidently through the work of creating and revising your docs.
  • In Part 4, you'll get to work writing and revising your docs and set yourself up to evaluate and maintain your work over time.

The Docs Advisor guide also includes a docs plan template to help you work through your planning, including:

  • What approach will you take to your documentation work, as a whole?
  • What risks do you need to mitigate?
  • Are there any documents to make or steps to perform to increase your chances of success?

The Docs Advisor incorporates guidance from interviews with open source maintainers and technical writers as well as from the Google Season of Docs case studies, and integrates the Documentation Project Archetypes into the recommendations for maintainers planning docs work.

More about the Archetypes

Documentation Project Archetypes are meant to help you recognize common types of documentation work (whether you're writing a new user guide or replatforming your docs site), understand the situations in which each applies, and organize yourself to bring the work to completion.

The archetypes cover the following areas:

  • Planning and evaluating your docs: Experiment and analysis archetypes support future docs work by helping you learn more about your existing docs, your audience, and your capacity to deliver meaningful change.
  • Producing new docs: Creation archetypes make new docs that directly help your audience complete tasks and achieve their goals.
  • Revising and transforming existing docs: Revision archetypes modify existing docs to improve quality, reduce maintenance costs, and reach wider audiences.
  • Equipping yourself with docs tools and processes: Tool and process archetypes adopt new tools or practices that help you make more, and higher-quality, docs.

All of the archetypes are available on GitHub.

[Illustrations: The Edit, a secretary bird holding a red pencil over copy marked up for editing; The Audit, an otter holding an abacus and a red pie-shaped wedge against a background of pie charts and line charts; The Factory, robot arms holding a red angled block against abstract circuitry in green and black]

Doc tools in the wild

We are excited to share these tools and are looking forward to seeing how they are used and evolve.

Daniel demoed the concept and first completed archetype, The Migration, at FOSDEM 2025 in his talk Patterns for maintainer and tech writer collaboration. He also talked about the work on the API Resilience Podcast episode "Patterns in Documentation."

We hope to get valuable feedback during a proposed Doc Archetypes session at Open Source Summit Europe 2025 (acceptance pending).

We are also excited to be developing some Doc Archetype illustration cards with Heather Cummings — a few previews are already live on The Edit, The Audit, and The Factory.

If you have questions or suggestions, please raise an issue in the Open Docs repo.

By Elena Spitzer & Erin McKean, Google Open Source Programs Office

Transforming Kubernetes and GKE into the leading platform for AI/ML

Wednesday, May 21, 2025

The world is rapidly embracing the power of AI/ML, from training cutting-edge foundation models to deploying intelligent applications at scale. As these workloads become more sophisticated and demanding, the infrastructure required to support them must evolve. Kubernetes has emerged as the standard for container orchestration, but AI/ML introduces unique challenges that push traditional infrastructure to its limits.

AI training jobs often require massive scale, coordinating thousands of specialized accelerators such as GPUs and TPUs. Reliability is critical, as failures can be costly for long-running, large-scale training jobs. Efficient resource sharing across teams and workloads is essential given the expense of accelerators. Furthermore, deploying and scaling AI models for inference demands low latency and fast startup times for large container images and models.

At Google, we are deeply invested in the AI/ML revolution. This is why we are doubling down on our commitment to advancing Kubernetes as the foundational open standard for these workloads. Our strategy centers on evolving the core Kubernetes platform to meet the needs of the "next trillion core hours," specifically focusing on batch and AI/ML. We then bring these advancements, alongside enterprise-grade management and optimizations, to users through Google Kubernetes Engine (GKE).

Here's how we are transforming Kubernetes and GKE:

Redefining Kubernetes' relationship with specialized hardware

Kubernetes was initially designed for more uniform CPU compute. The surge of AI/ML brought new requirements for seamless integration and efficient management of expensive, sparse, and diverse accelerators. To support these new demands, Google has been a key investor in upstream Kubernetes to offer robust support for a diverse portfolio of the latest accelerators, including multiple generations of TPUs and a wide range of NVIDIA GPUs.

A core Kubernetes enhancement driven by Google and the community to better support AI/ML workloads is Dynamic Resource Allocation (DRA). This framework, developed in the heart of Kubernetes, provides a more flexible and extensible way for workloads to request and consume specialized hardware resources beyond traditional CPU and memory, which is crucial for efficiently managing accelerators. Building on such foundational open-source capabilities, GKE can then offer features like Custom Compute Classes, which improve the obtainability of these resources through intelligent fallback priorities across different capacity types like reservations, on-demand, and Spot instances. Google's active contributions to advanced resource management and scheduling capabilities within the Kubernetes community ensure that the platform evolves to meet the sophisticated demands of AI/ML, making efficient use of these specialized hardware resources more broadly accessible.
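
To make the DRA model concrete, here is a minimal sketch of how a workload might request an accelerator through the resource.k8s.io/v1beta1 API available in Kubernetes 1.33. It is illustrative only: the device class name, image, and claim names are placeholders, and real clusters use whatever classes their DRA driver publishes.

```python
# Minimal, hypothetical sketch of a DRA request. "gpu.example.com" stands in
# for a device class published by an installed DRA driver; apply the printed
# manifests with kubectl.
import yaml

claim_template = {
    "apiVersion": "resource.k8s.io/v1beta1",
    "kind": "ResourceClaimTemplate",
    "metadata": {"name": "single-gpu"},
    "spec": {
        "spec": {
            "devices": {
                "requests": [
                    # Ask for one device from the named class.
                    {"name": "gpu", "deviceClassName": "gpu.example.com"}
                ]
            }
        }
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "trainer"},
    "spec": {
        # Each Pod created from this spec gets its own claim from the template.
        "resourceClaims": [
            {"name": "gpu-claim", "resourceClaimTemplateName": "single-gpu"}
        ],
        "containers": [{
            "name": "train",
            "image": "example.com/trainer:latest",
            # The container consumes whatever device is allocated to the claim.
            "resources": {"claims": [{"name": "gpu-claim"}]},
        }],
    },
}

print(yaml.safe_dump_all([claim_template, pod], sort_keys=False))
```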

Unlocking scale and reliability

AI/ML workloads demand unprecedented scale and have new failure modes compared to traditional applications. GKE is built to handle this, supporting up to 65,000 nodes in a single cluster. We've demonstrated the ability to run the largest publicly announced training jobs, coordinating 50,000 TPU chips with near-ideal scaling efficiency.

Critically, we are enhancing core Kubernetes capabilities to support the scale and reliability needed for AI/ML. For instance, to better manage distributed AI workloads like serving large models split across multiple hosts, Google has been instrumental in developing features like JobSet (emerging from earlier concepts like LeaderWorkerSet) within the Kubernetes community (SIG Apps). This provides robust orchestration for co-scheduled, interdependent groups of Pods. We are also actively working upstream to improve Kubernetes reliability and stability through initiatives like Production Readiness Reviews, promoting safer upgrade paths, and enhancing etcd stability for the benefit of all Kubernetes users.
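
As a rough illustration of the orchestration JobSet provides (a hypothetical sketch, not a GKE-specific configuration; the image and sizes are placeholders), a group of interdependent worker Pods can be declared and managed as a single unit:

```python
# Minimal, hypothetical JobSet (jobset.x-k8s.io/v1alpha2): one replicated Job
# of four co-scheduled worker Pods treated as a unit. Requires the JobSet
# controller to be installed; apply the printed manifest with kubectl.
import yaml

jobset = {
    "apiVersion": "jobset.x-k8s.io/v1alpha2",
    "kind": "JobSet",
    "metadata": {"name": "distributed-training"},
    "spec": {
        "replicatedJobs": [{
            "name": "workers",
            "replicas": 1,             # one Job instance of this template
            "template": {              # a standard batch/v1 Job template
                "spec": {
                    "parallelism": 4,  # four interdependent worker Pods
                    "completions": 4,
                    "backoffLimit": 0,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{
                                "name": "worker",
                                "image": "example.com/trainer:latest",
                            }],
                        }
                    },
                }
            },
        }]
    },
}

print(yaml.safe_dump(jobset, sort_keys=False))
```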

Optimizing Kubernetes performance for efficient inference

Low-latency and cost-efficient inference is critical for AI applications. For serving, the GKE Inference Gateway routes requests based on model server metrics like KVCache utilization and pending queue length, reducing serving costs by up to 30% and tail latency by 60% compared to traditional load balancing. We've even achieved vLLM fungibility across TPUs and GPUs, allowing users to serve the same model on either accelerator without incremental effort.

To address slow startup times for large AI/ML container images (often 20GB+), GKE offers rapid scale-out features. Secondary boot disks allow preloading container images and data, resulting in up to 29x faster container mounting time. GCS FUSE enables streaming data directly from Cloud Storage, leading to faster model load times. Furthermore, GKE Inference Quickstart provides data-driven, optimized Kubernetes deployment configurations, saving extensive benchmarking effort and enabling up to 30% lower cost, 60% lower tail latency, and 40% higher throughput.

Simplifying the Kubernetes experience and enhancing observability for AI/ML

We understand that data scientists and ML researchers may not be Kubernetes experts. Google aims to simplify the setup and management of AI-optimized Kubernetes clusters. This includes contributions to Kubernetes usability efforts and SIG-Usability. Managed offerings like GKE provide multiple paths to set up AI-optimized environments, from default configurations to customizable blueprints. Offerings like GKE Autopilot further abstract away infrastructure management, aiming for the ease of use that benefits all users.

Ensuring visibility into AI/ML workloads is paramount. Google actively supports and contributes to the integration of standard open-source observability tools within the Kubernetes ecosystem, such as Prometheus, Grafana, and OpenTelemetry. Building on this open foundation, GKE then provides enhanced, out-of-the-box observability integrated with popular AI frameworks & tools, including specific insights into workload startup latency and end-to-end tracing.

Looking ahead: continued investment in Open Source Kubernetes for AI/ML

The transformation continues. Our roadmap includes exciting developments in upstream Kubernetes for easily deploying and managing large-scale clusters, support for new GPU & TPU generations integrated through open-source mechanisms, and continued community-driven innovations in fast startup, reliability, and ease of use for AI/ML workloads.

Google is committed to making Kubernetes the premier open-source platform for AI/ML, pushing the boundaries of scale, performance, and efficiency while maintaining stability and ease of use. By driving innovation in core Kubernetes and building powerful, deeply integrated capabilities in our managed offering, GKE, we are empowering organizations to accelerate their AI/ML initiatives and unlock the next generation of intelligent applications built on an open foundation.

Come explore the possibilities with Kubernetes and GKE for your AI/ML workloads!

By Francisco Cabrera & Federico Bongiovanni, GCP Google Kubernetes Engine

Announcing LMEval: An Open Source Framework for Cross-Model Evaluation

Wednesday, May 14, 2025

Authors: Elie Bursztein - Distinguished Research Scientist & David Tao - Software Engineer, Applied Security and Safety Research

Simplifying Cross-Provider Model Benchmarking

At InCyber Forum Europe in April, we open sourced LMEval, a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets. This announcement coincided with a joint talk with Giskard about our collaboration to increase trust in model safety and security. Giskard uses LMEval to run the Phare benchmark, which independently evaluates popular models' security and safety.

[Image: results from the Phare benchmark, which leverages LMEval for evaluation]

Rapid Changes in the Landscape of Large Models

New Large Language Models (LLMs) are released constantly, often promising improvements and new features. To keep up with this fast-paced lifecycle, developers, researchers, and organizations must quickly and reliably evaluate whether those newer models are better suited for their specific applications. So far, rapid model evaluation has proven difficult, as it requires scalable, accurate, easy-to-use, cross-provider benchmarking tools.

Introducing LMEval: Simplifying Cross-Provider Model Benchmarking

To address this challenge, we are excited to introduce LMEval (Large Model Evaluator), an open source framework that Google developed to streamline the evaluation of LLMs across diverse benchmark datasets and model providers. LMEval is designed from the ground up to be accurate, multimodal, and easy-to-use. Its key features include:

  • Multi-Provider Compatibility: Evaluating models shouldn't require wrestling with different APIs for each provider. LMEval leverages the LiteLLM framework to offer out-of-the-box compatibility with major model providers including Google, OpenAI, Anthropic, Ollama, and Hugging Face. You can define your benchmark once and run it consistently across various models with minimal code changes (a rough sketch of this provider abstraction appears after this list).
  • Incremental & Efficient Evaluation: Re-running an entire benchmark suite every time a new model or version is released is slow, inefficient and costly. LMEval's intelligent evaluation engine plans and executes evaluations incrementally. It runs only the necessary evaluations for new models, prompts, or questions, saving significant time and compute resources. Its multi-threaded engine further accelerates this process.
  • Multimodal & Multi-Metric Support: Modern foundation models go beyond text. LMEval is designed for multimodal evaluation, supporting benchmarks that include text, images, and code, and adding new modalities is straightforward. It also supports a variety of scoring metrics covering a wide range of benchmark formats, from boolean questions to multiple choice to free-form generation, and provides safety/punting detection.
  • Scalable & Secure Storage: To store benchmark results securely and efficiently, LMEval uses a self-encrypting SQLite database. This approach protects benchmark data and results from inadvertent crawling and indexing while keeping them easily accessible through LMEval.
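
The provider abstraction in the first bullet comes from LiteLLM rather than anything LMEval-specific. As a rough sketch of what that layer buys (this is plain LiteLLM usage, not LMEval's own API; the model names and prompt are examples, and API keys are assumed to be set in the environment), the same call can target models from several providers:

```python
# Illustrative LiteLLM usage only (not LMEval's API): one completion() call
# works across providers. Keys such as GEMINI_API_KEY, OPENAI_API_KEY, and
# ANTHROPIC_API_KEY are expected in the environment.
from litellm import completion

PROMPT = [{"role": "user", "content": "Is the sky blue? Answer yes or no."}]

for model in [
    "gemini/gemini-1.5-flash",        # Google
    "gpt-4o-mini",                    # OpenAI
    "claude-3-5-sonnet-20240620",     # Anthropic
]:
    response = completion(model=model, messages=PROMPT)
    print(model, "->", response.choices[0].message.content)
```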

Getting Started with LMEval

Creating and running evaluations with LMEval is designed to be intuitive. Here's a simplified example demonstrating how to evaluate two Gemini model versions on a benchmark:

[Image: example of LMEval running on a multimodal benchmark across two models]

The LMEval GitHub repository includes example notebooks to help you get started.

Visualizing Results with LMEvalboard

Understanding benchmark results requires more than just summary statistics. To help with this, LMEval includes LMEvalboard, a companion dashboard tool that offers an interactive visualization of how models stack up against each other. LMEvalboard provides valuable insights into model strengths and weaknesses, complementing traditional raw evaluation data.

[Image: the LMEvalboard UI, which lets you quickly analyze how models compare on a given benchmark]

LMEvalboard allows you to:

  • View Overall Performance: Quickly compare all models' accuracy across the entire benchmark.
  • Analyze a Single Model: Dive deep into a specific model's performance characteristics across different categories using radar charts, and drill down on specific examples of failures.
  • Perform Head-to-Head Comparisons: Directly compare two models, visualizing their performance differences across categories and examining specific questions where they disagree.

Try LMEval Today!

We invite you to explore LMEval, use it for your own evaluations, and contribute to its development by heading to the LMEval GitHub repository: https://github.com/google/lmeval

Acknowledgements

LMEval would not have been possible without the help of many people, including: Luca Invernizzi, Lenin Simicich, Marianna Tishchenko, Amanda Walker, and many other Googlers.

Kubernetes 1.33 is available on GKE!

Friday, May 9, 2025

Kubernetes 1.33 is now available in the Google Kubernetes Engine (GKE) Rapid Channel! For more information about the content of Kubernetes 1.33, read the official Kubernetes 1.33 Release Notes and the specific GKE 1.33 Release Notes.

Enhancements in 1.33:

In-place Pod Resizing

Workloads can be scaled horizontally by updating the Pod replica count, or vertically by updating the resources required by the Pod's container(s). Before this enhancement, container resources defined in a Pod's spec were immutable, and updating any of these values in a Pod template would trigger Pod replacement, impacting the service's reliability.

In-place Pod Resizing (IPPR, Public Preview) allows you to change the CPU and memory requests and limits assigned to containers within a running Pod through the new /resize pod subresource, often without requiring a container restart, which reduces service disruption.

This opens up various possibilities: vertical scale-up of stateful processes without any downtime, seamless scale-down when traffic is low, and even allocating larger resources during startup that can be reduced once the initial setup is complete.

Review Resize CPU and Memory Resources assigned to Containers for detailed guidance on using the new API.
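
As a minimal sketch (the pod and container names, image, and sizes below are placeholders), a container opts into in-place resizing through its resizePolicy, and a later resize is just a patch against the Pod's new resize subresource:

```python
# Hypothetical example: a Pod whose CPU can be resized in place without a
# restart, while memory changes restart the container. Apply the printed
# manifest with kubectl, then resize the running Pod with something like:
#   kubectl patch pod web --subresource resize \
#     --patch '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"}}}]}}'
import yaml

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [{
            "name": "app",
            "image": "nginx:1.27",
            "resources": {"requests": {"cpu": "500m", "memory": "256Mi"}},
            "resizePolicy": [
                {"resourceName": "cpu", "restartPolicy": "NotRequired"},
                {"resourceName": "memory", "restartPolicy": "RestartContainer"},
            ],
        }]
    },
}

print(yaml.safe_dump(pod, sort_keys=False))
```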

DRA

Kubernetes Dynamic Resource Allocation (DRA), currently in beta as of v1.33, offers a more flexible API for requesting devices than the device plugin framework (see the instructions for opting in to beta features in GKE).

Recent updates include the promotion of driver-owned resource claim status to beta. New alpha features introduced are partitionable devices, device taints and tolerations for managing device availability, prioritized device lists for versatile workload allocation, and enhanced admin access controls. Preparations for general availability include a new v1beta2 API to improve user experience and simplify future feature integration, alongside improved RBAC rules and support for seamless driver upgrades. DRA is anticipated to reach general availability in Kubernetes v1.34.

containerd 2.0

With GKE 1.33, we are excited to introduce support for containerd 2.0. This marks the first major version update for the underlying container runtime used by GKE. Adopting this version ensures that GKE continues to leverage the latest advancements and security enhancements from the upstream containerd community.

It's important to note that as a major version update, containerd 2.0 introduces many new features and enhancements while also deprecating others. To ensure a smooth transition and maintain compatibility for your workloads, we strongly encourage you to review your Cloud Recommendations. These recommendations will help identify any workloads that may be affected by these changes. Please see "Migrate nodes to containerd 2" for detailed guidance on making your workloads forward-compatible.

Multiple Service CIDRs

This enhancement introduced a new implementation of allocation logic for Service IPs. The updated IP address allocator logic uses two newly stable API objects: ServiceCIDR and IPAddress. Now generally available, these APIs allow cluster administrators to dynamically increase the number of IP addresses available for Services by creating new ServiceCIDR objects.
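
For example (a minimal sketch; the name and CIDR below are placeholders), an administrator can add a second Service IP range to a running cluster simply by creating another ServiceCIDR object:

```python
# Hypothetical example of extending the Service IP space with an extra
# ServiceCIDR (networking.k8s.io/v1, GA in 1.33). Apply the printed manifest
# with kubectl; new Services can then receive ClusterIPs from the added range.
import yaml

service_cidr = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "ServiceCIDR",
    "metadata": {"name": "extra-service-range"},
    "spec": {"cidrs": ["10.96.128.0/20"]},
}

print(yaml.safe_dump(service_cidr, sort_keys=False))
```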

Highlights of Googlers' contributions in the 1.33 cycle:

Coordinated Leader Election

The Coordinated Leader Election feature progressed to beta, introducing significant enhancements in how a lease candidate's availability is determined for an election. Specifically, the ping-acknowledgement checking process has been optimized to be fully concurrent instead of the previous sequential approach, ensuring faster and more efficient detection of unresponsive candidates. This is essential for promptly identifying truly available lease candidates and maintaining the reliability of the leader election process.

Compatibility Versions

New CLI flags were added to the kube-apiserver for adjusting API enablement relative to the apiserver's emulated version:

  • --emulation-forward-compatible implicitly enables all APIs that were introduced after the emulation version and have higher priority than APIs of the same group resource enabled at the emulation version.
  • --runtime-config-emulation-forward-compatible explicitly enables specific APIs introduced after the emulation version through the runtime-config.

zPages

The ComponentStatusz and ComponentFlagz alpha features can now be turned on for all control plane components. Components then expose two new HTTP endpoints, /statusz and /flagz, providing enhanced visibility into their internal state: /statusz details the component's uptime and its Go, binary, and emulation version info, while /flagz reveals the command-line arguments used at startup.

Streaming List Responses

To improve cluster stability when handling large datasets, streaming encoding for List responses was introduced as a new Beta feature. Previously, serializing entire List responses into a single memory block could strain kube-apiserver memory. The new streaming encoder processes and transmits each item in a list individually, preventing large memory allocations. This significantly reduces memory spikes, improves API server reliability, and enhances overall cluster performance, especially for clusters with large resources, all while maintaining backward compatibility and requiring no client-side changes.

Snapshottable API server cache

Further enhancing API server performance and stability, a new Alpha feature introduces snapshotting to the watchcache, allowing LIST requests for historical or paginated data to be served directly from its in-memory cache. Previously, these requests would query etcd directly, requiring the data to be piped through multiple encoding, decoding, and validation stages. This process often led to increased memory pressure, unpredictable performance, and potential stability issues, especially with large resources. By leveraging efficient B-tree based snapshotting within the watchcache, this enhancement significantly reduces direct etcd load and minimizes memory allocations on the API server. The result is more predictable performance, increased API server reliability, and better overall resource utilization, with mechanisms in place to ensure data consistency between the cache and etcd.

Declarative Validation

Kubernetes thrives on its large, vibrant community of contributors. We're constantly looking for ways to make the project easier to maintain and contribute to. For years, one area that posed challenges was how the Kubernetes API itself was validated: using hand-written Go code. This traditional method has proven difficult to author, challenging to review, and cumbersome to document, impacting overall maintainability and the contributor experience. To address these pain points, the declarative validation project was initiated.

In 1.33, the foundational infrastructure was established to transition Kubernetes API validation from handwritten Go code to a declarative model using IDL tags. This release introduced the validation-gen code generator, designed to parse these IDL tags and produce Go validation functions.

Ordered Namespace Deletion

The current namespace deletion process is semi-random, which can lead to security gaps or unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies. By implementing an opinionated deletion mechanism, Pods are deleted before other resources, respecting logical and security dependencies. This design enhances the security and reliability of Kubernetes by mitigating risks arising from a non-deterministic deletion order.

Acknowledgements

As always, we want to thank all the Googlers who provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would especially like to mention the Googlers who helped drive the contributions mentioned in this blog: Tim Allclair, Natasha Sarkar, Vivek Bansal, Anish Shah, Dawn Chen, Tim Hockin, John Belamaric, Morten Torkildsen, Yu Liao, Cici Huang, Samuel Karp, Chris Henzie, Luiz Oliveira, Piotr Betkier, Alex Curtis, Jonah Peretz, Brad Hoekstra, Yuhan Yao, Ray Wainman, Richa Banker, Marek Siarkowicz, Siyuan Zhang, Jeffrey Ying, Henry Wu, Yuchen Zhou, Jordan Liggitt, Benjamin Elder, Antonio Ojea, Yongrui Lin, Joe Betz, Aaron Prindle and the Googlers who helped bring 1.33 to GKE!

- Benjamin Elder & Sen Lu, Google Kubernetes Engine

GSoC 2025: We have our Contributors!

Thursday, May 8, 2025

[Image: Google Summer of Code logo]

Congratulations to the 1272 Contributors from 68 countries accepted for GSoC 2025! Our 185 Mentoring Orgs have been very busy this past month - reviewing 23,559 proposals, having countless discussions with applicants, and finally, completing the rigorous selection process to find the right Contributors for their community.

Here are some highlights of the 2025 GSoC applicants:

  • 15,240 applicants from 130 countries submitting 23,559 proposals
  • Over 2,350 mentors and organization administrators
  • 66.3% of applicants have no prior open source experience

Now that the 2025 GSoC Contributors have been announced, the Organizations and Contributors will spend three weeks together in the Community Bonding period. This period, designed to get new contributors up to speed quickly, is a very important part of the GSoC program. Mentors will use the next three weeks to introduce GSoC Contributors to their community, helping them understand the codebase and norms of their project, adjust deliverables, and understand the impact and reach of their summer project.

Contributors will begin writing code for Organizations on June 2nd - the official beginning of a totally new adventure! We're absolutely delighted to kick off another year alongside our amazing community.

A huge thanks to all the enthusiastic applicants who participated and, of course, to our phenomenal volunteer Mentors and Organization Administrators. Your weeks of thoughtful proposal reviews and proactive engagement with participants have been invaluable in introducing them to the world of open source.

And congratulations once again to our 2025 GSoC Contributors! Our goal is that GSoC serves as the catalyst for Contributors to become long term participants (and maybe even maintainers!) of open source communities of every shape and size. Now is their chance to dive in and learn more about open source and connect with these amazing communities.


By Stephanie Taylor, Mary Radomile and Lucila Ortiz, Google Open Source
