opensource.google.com

Menu

Posts from 2025

Introducing New Open Source Documentation Resources

Wednesday, May 28, 2025

shapes representing pie charts, a circuit board, and text edited with red markings

Today we're introducing two new open source documentation resources for open source software maintainers, a Docs Advisor guide and a set of Documentation Project Archetypes. These tools are intended to help maintainers make effective use of limited resources when it comes to planning and executing open source documentation work.

The Docs Advisor is a guide intended to demystify documentation work, including help picking a documentation approach, understanding your audience and available resources, and how to write, revise, evaluate, and maintain your documentation.

Documentation Project Archetypes are a set of thirteen project field guides. Each archetype represents a different type of documentation project, the problems it can solve, and how to bring the right collaborators together on the project to create great docs.

Origin story

More than 130 open source projects wrote 200+ case studies and project reports as a part of their participation in the Google Season of Docs program from 2019 to 2024. These case studies and project reports represent a variety of documentation projects from a wide range of open source groups. In these wrap-ups, project maintainers and technical writers describe how they approached their documentation projects, capturing many successes and more than a few challenges.

These reports are a treasure trove of lessons learned–but it's unrealistic to expect time-crunched open source maintainers to read through them all. So we got in touch with Daniel Beck and Erin Kissane to chat about ways to help organize and summarize some of these lessons learned.

These conversations turned into the Docs Advisor guide (‘like having an experienced technical writer hanging over your shoulder') and the thirteen Documentation Project Archetypes.

Our goal with these resources was to turn all of the hard-won experience of the Google Season of Docs participants into explicit documentation advice and guidance for open source maintainers.

More about the Docs Advisor

The Docs Advisor guide is intended to demystify the work of good documentation. It collects practices and processes from within technical writing and docs communities and from user experience, information architecture, and content strategy.

  • In Part 1, you'll pick an overall approach that suits the needs of your project.
  • In Part 2, you'll learn enough about your community and their needs to ensure that your hard work will be helping real people.
  • In Part 3, you'll assess your existing resources and pull together everything you need to move quickly and confidently through the work of creating and revising your docs.
  • In Part 4, you'll get to work writing and revising your docs and set yourself to successfully evaluate your work and maintain it.

The Docs Advisor guide also includes a docs plan template to help you accomplish your docs plan work, including:

  • What approach will you take to your documentation work, as a whole?
  • What risks do you need to mitigate?
  • Are there any documents to make or steps to perform to increase your chances of success?

The Docs Advisor incorporates guidance from interviews with open source maintainers and technical writers as well as from the Google Season of Docs case studies, and integrates the Documentation Project Archetypes into the recommendations for maintainers planning docs work.

More about the Archetypes

Documentation Project Archetypes are meant to help you recognize common types of documentation work (whether you're writing a new user guide or replatforming your docs site), the situations in which they apply, and organize yourself to bring the work to completion.

The archetypes cover the following areas:

  • Planning and evaluating your docs: Experiment and analysis archetypes support future docs work, by learning more about your existing docs, your audience, and your capacity to deliver meaningful change.
  • Producing new docs: Creation archetypes make new docs that directly help your audience complete tasks and achieve their goals.
  • Revising and transforming existing docs: Revision archetypes modify existing docs, to improve quality, reduce maintenance costs, and reach wider audiences.
  • Equipping yourself with docs tools and process: Tool and process archetypes adopt new tools or practices that help you make more, better, or higher quality docs.

All of the archetypes are available on GitHub.

The Edit: a secretary bird holding a red pencil and a doc showing copy marked up for editing The Audit: an otter holding an abacus and a red pie-shaped wedge against a background of pie charts and line charts The Factory: robot arms holding a red angled block against a backdrop of abstract circuitry in green and black

Doc tools in the wild

We are excited to share these tools and are looking forward to seeing how they are used and evolve.

Daniel demoed the concept and first completed archetype, The Migration, at FOSDEM 2025 in his talk Patterns for maintainer and tech writer collaboration. He also talked about the work on the API Resilience Podcast episode "Patterns in Documentation."

We hope to get valuable feedback during a proposed Doc Archetypes session at Open Source Summit Europe 2025 (acceptance pending).

We are also excited to be developing some Doc Archetype illustration cards with Heather Cummings — a few previews are already live on The Edit, The Audit, and The Factory.

If you have questions or suggestions, please raise an issue in the Open Docs repo.

By Elena Spitzer & Erin McKean, Google Open Source Programs Office

Transforming Kubernetes and GKE into the leading platform for AI/ML

Wednesday, May 21, 2025

The world is rapidly embracing the power of AI/ML, from training cutting-edge foundation models to deploying intelligent applications at scale. As these workloads become more sophisticated and demanding, the infrastructure required to support them must evolve. Kubernetes has emerged as the standard for container orchestration, but AI/ML introduces unique challenges that push traditional infrastructure to its limits.

AI training jobs often require massive scale, needing to coordinate thousands of specialized hardware like GPUs and TPUs. Reliability is critical, as failures can be costly for long running, large-scale training jobs. Efficient resource sharing across teams and workloads is essential given the expense of accelerators. Furthermore, deploying and scaling AI models for inference demands low latency and faster startup times for large container images and models.

At Google, we are deeply invested in the AI/ML revolution. This is why we are doubling down on our commitment to advancing Kubernetes as the foundational open standard for these workloads. Our strategy centers on evolving the core Kubernetes platform to meet the needs of the "next trillion core hours," specifically focusing on batch and AI/ML. We then bring these advancements, alongside enterprise-grade management and optimizations, to users through Google Kubernetes Engine (GKE).

Here's how we are transforming Kubernetes and GKE:

Redefining Kubernetes' relationship with specialized hardware

Kubernetes was initially designed for more uniform CPU compute. The surge of AI/ML brought new requirements for seamless integration and efficient management of expensive, sparse, and diverse accelerators. To support these new demands, Google has been a key investor in upstream Kubernetes to offer robust support for a diverse portfolio of the latest accelerators, including multiple generations of TPUs and a wide range of NVIDIA GPUs.

A core Kubernetes enhancement driven by Google and the community to better support AI/ML workloads is Dynamic Resource Allocation (DRA). This framework, developed in the heart of Kubernetes, provides a more flexible and extensible way for workloads to request and consume specialized hardware resources beyond traditional CPU and memory, which is crucial for efficiently managing accelerators. Building on such foundational open-source capabilities, GKE can then offer features like Custom Compute Classes, which improve the obtainability of these resources through intelligent fallback priorities across different capacity types like reservations, on-demand, and Spot instances. Google's active contributions to advanced resource management and scheduling capabilities within the Kubernetes community ensure that the platform evolves to meet the sophisticated demands of AI/ML, making efficient use of these specialized hardware resources more broadly accessible.

Unlocking scale and reliability

AI/ML workloads demand unprecedented scale and have new failure modes compared to traditional applications. GKE is built to handle this, supporting up to 65,000 nodes in a single cluster. We've demonstrated the ability to run the largest publicly announced training jobs, coordinating 50,000 TPU chips with near-ideal scaling efficiency.

Critically, we are enhancing core Kubernetes capabilities to support the scale and reliability needed for AI/ML. For instance, to better manage distributed AI workloads like serving large models split across multiple hosts, Google has been instrumental in developing features like JobSet (emerging from earlier concepts like LeaderWorkerSet) within the Kubernetes community (SIG Apps). This provides robust orchestration for co-scheduled, interdependent groups of Pods. We are also actively working upstream to improve Kubernetes reliability and stability through initiatives like Production Readiness Reviews, promoting safer upgrade paths, and enhancing etcd stability for the benefit of all Kubernetes users.

Optimizing Kubernetes performance for efficient inference

Low-latency and cost-efficient inference is critical for AI applications. For serving, the GKE Inference Gateway routes requests based on model server metrics like KVCache utilization and pending queue length, reducing serving costs by up to 30% and tail latency by 60% compared to traditional load balancing. We've even achieved vLLM fungibility across TPUs and GPUs, allowing users to serve the same model on either accelerator without incremental effort.

To address slow startup times for large AI/ML container images (often 20GB+), GKE offers rapid scale-out features. Secondary boot disks allow preloading container images and data, resulting in up to 29x faster container mounting time. GCS FUSE enables streaming data directly from Cloud Storage, leading to faster model load times. Furthermore, GKE Inference Quickstart provides data-driven, optimized Kubernetes deployment configurations, saving extensive benchmarking effort and enabling up to 30% lower cost, 60% lower tail latency, and 40% higher throughput.

Simplifying the Kubernetes experience and enhancing observability for AI/ML

We understand that data scientists and ML researchers may not be Kubernetes experts. Google aims to simplify the setup and management of AI-optimized Kubernetes clusters. This includes contributions to Kubernetes usability efforts and SIG-Usability. Managed offerings like GKE provide multiple paths to set up AI-optimized environments, from default configurations to customizable blueprints. Offerings like GKE Autopilot further abstract away infrastructure management, aiming for the ease of use that benefits all users.
Ensuring visibility into AI/ML workloads is paramount. Google actively supports and contributes to the integration of standard open-source observability tools within the Kubernetes ecosystem, such as Prometheus, Grafana, and OpenTelemetry. Building on this open foundation, GKE then provides enhanced, out-of-the-box observability integrated with popular AI frameworks & tools, including specific insights into workload startup latency and end-to-end tracing.

Looking ahead: continued investment in Open Source Kubernetes for AI/ML

The transformation continues. Our roadmap includes exciting developments in upstream Kubernetes for easily deploying and managing large-scale clusters, support for new GPU & TPU generations integrated through open-source mechanisms, and continued community-driven innovations in fast startup, reliability, and ease of use for AI/ML workloads.

Google is committed to making Kubernetes the premier open-source platform for AI/ML, pushing the boundaries of scale, performance, and efficiency while maintaining stability and ease of use. By driving innovation in core Kubernetes and building powerful, deeply integrated capabilities in our managed offering, GKE, we are empowering organizations to accelerate their AI/ML initiatives and unlock the next generation of intelligent applications built on an open foundation.

Come explore the possibilities with Kubernetes and GKE for your AI/ML workloads!

By Francisco Cabrera & Federico Bongiovanni, GCP Google Kubernetes Engine

Announcing LMEval: An Open Source Framework for Cross-Model Evaluation

Wednesday, May 14, 2025

Announcing LMEval: An Open Source Framework for Cross-Model Evaluation

Authors: Elie Bursztein - Distinguished Research Scientist & David Tao - Software Engineer, Applied Security and Safety Research

Simplifying Cross-Provider Model Benchmarking

At InCyber Forum Europe in April, we open sourced LMEval, a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets. This announcement coincided with a joint talk with Giskard about our collaboration to increase trust in model safety and security. Giskard uses LMeval to run the Phare benchmark that independently evaluates popular models' security and safety.

Results from the Phare benchmark that leverages LMEval for evaluation
Example of LMEval running on a multimodal benchmark across two models.

Rapid Changes in the Landscape of Large Models

New Large Language Models (LLMs) are released constantly, often promising improvements and new features. To keep up with this fast-paced lifecycle, developers, researchers, and organizations must quickly and reliably evaluate if those newer models are better suited for their specific applications. So far, rapid model evaluation has proven difficult, as it requires tools that allow scalable, accurate, easy-to-use, cross-provider benchmarking.

Introducing LMEval: Simplifying Cross-Provider Model Benchmarking

To address this challenge, we are excited to introduce LMEval (Large Model Evaluator), an open source framework that Google developed to streamline the evaluation of LLMs across diverse benchmark datasets and model providers. LMEval is designed from the ground up to be accurate, multimodal, and easy-to-use. Its key features include:

  • Multi-Provider Compatibility: Evaluating models shouldn't require wrestling with different APIs for each provider. LMEval leverages the LiteLLM framework to offer out-of-the-box compatibility with major model providers including Google, OpenAI, Anthropic, Ollama, and Hugging Face. You can define your benchmark once and run it consistently across various models with minimal code changes.
  • Incremental & Efficient Evaluation: Re-running an entire benchmark suite every time a new model or version is released is slow, inefficient and costly. LMEval's intelligent evaluation engine plans and executes evaluations incrementally. It runs only the necessary evaluations for new models, prompts, or questions, saving significant time and compute resources. Its multi-threaded engine further accelerates this process.
  • Multimodal & Multi-Metric Support: Modern foundation models go beyond text. LMEval is designed for multimodal evaluation, supporting benchmarks that include text, images and code. Adding new modalities is straightforward. Furthermore, it supports various scoring metrics to support a wide range of benchmark formats from boolean questions, to multi-choices, to free form generation. Additionally, LMEval provides support for safety/punting detection.
  • Scalable & Secure Storage: To store benchmark results in a secure and efficient manner, LMEval utilizes a self-encrypting SQLite database. This approach protects benchmark data and results from inadvertent crawling/indexing while they stay easily accessible through LMEval.

Getting Started with LMEval

Creating and running evaluations with LMEval is designed to be intuitive. Here's a simplified example demonstrating how to evaluate two Gemini model versions on a benchmark:

 Example of LMEval running on a multimodal benchmark across two models.
Results from the Phare benchmark that leverages LMEval for evaluation

The LMEval GitHub repository includes example notebooks to help you get started.

Visualizing Results with LMEvalboard

Understanding benchmark results requires more than just summary statistics. To help with this, LMEval includes LMEvalboard, a companion dashboard tool that offers an interactive visualization of how models stack up against each other. LMEvalboard provides valuable insights into model strengths and weaknesses, complementing traditional raw evaluation data.

LMEvalboard UI allows to quickly analyze how models compares on a given benchmark
LMEvalboard UI allows to quickly analyze how models compares on a given benchmark

LMEvalboard allows you to:

  • View Overall Performance: Quickly compare all models' accuracy across the entire benchmark.
  • Analyze a Single Model: Dive deep into a specific model's performance characteristics across different categories using radar charts and drill down on specific examples of failures
  • Perform Head-to-Head Comparisons: Directly compare two models, visualizing their performance differences across categories and examine specific questions where they disagree.

Try LMEval Today!

We invite you to explore LMEval, use it for your own evaluations, and contribute to its development by heading to the LMEval GitHub repository: https://github.com/google/lmeval

Acknowledgements

LMEval would not have been possible without the help of many people, including: Luca Invernizzi, Lenin Simicich, Marianna Tishchenko, Amanda Walker, and many other Googlers.

Kubernetes 1.33 is available on GKE!

Friday, May 9, 2025

Kubernetes 1.33 is now available in the Google Kubernetes Engine (GKE) Rapid Channel! For more information about the content of Kubernetes 1.33, read the official Kubernetes 1.33 Release Notes and the specific GKE 1.33 Release Notes.

Enhancements in 1.33:

In-place Pod Resizing

Workloads can be scaled horizontally by updating the Pod replica count, or vertically by updating the resources required in the Pods container(s). Before this enhancement, container resources defined in a Pod's spec were immutable, and updating any of these details within a Pod template would trigger Pod replacement impacting service's reliability.

In-place Pod Resizing (IPPR, Public Preview) allows you to change the CPU and memory requests and limits assigned to containers within a running Pod through the new /resize pod subresource, often without requiring a container restart decreasing service's disruptions.

This opens up various possibilities for vertical scale-up of stateful processes without any downtime, seamless scale-down when the traffic is low, and even allocating larger resources during startup, which can then be reduced once the initial setup is complete.

Review Resize CPU and Memory Resources assigned to Containers for detailed guidance on using the new API.

DRA

Kubernetes Dynamic Resource Allocation (DRA), currently in beta as of v1.33, offers a more flexible API for requesting devices than Device Plugin. (Instructions for opt-in beta features in GKE)

Recent updates include the promotion of driver-owned resource claim status to beta. New alpha features introduced are partitionable devices, device taints and tolerations for managing device availability, prioritized device lists for versatile workload allocation, and enhanced admin access controls. Preparations for general availability include a new v1beta2 API to improve user experience and simplify future feature integration, alongside improved RBAC rules and support for seamless driver upgrades. DRA is anticipated to reach general availability in Kubernetes v1.34.

containerd 2.0

With GKE 1.33, we are excited to introduce support for containerd 2.0. This marks the first major version update for the underlying container runtime used by GKE. Adopting this version ensures that GKE continues to leverage the latest advancements and security enhancements from the upstream containerd community.

It's important to note that as a major version update, containerd 2.0 introduces many new features and enhancements while also deprecating others. To ensure a smooth transition and maintain compatibility for your workloads, we strongly encourage you to review your Cloud Recommendations. These recommendations will help identify any workloads that may be affected by these changes. Please see "Migrate nodes to containerd 2" for detailed guidance on making your workloads forward-compatible.

Multiple Service CIDRs

This enhancement introduced a new implementation of allocation logic for Service IPs. The updated IP address allocator logic uses two newly stable API objects: ServiceCIDR and IPAddress. Now generally available, these APIs allow cluster administrators to dynamically increase the number of IP addresses available for Services by creating new ServiceCIDR objects.

Highlight of Googlers' contributions in 1.33 cycle:

Coordinated Leader Election

The Coordinated Leader Election feature progressed to beta, introducing significant enhancements in how a lease-candidate's availability is determined for an election. Specifically, the ping-acknowledgement checking process has been optimized to be fully concurrent instead of the previous sequential approach ensuring faster and more efficient detection of unresponsive candidates, which is essential for promptly identifying truly available lease candidates and maintaining the reliability of the leader election process.

Compatibility Versions

New CLI flags were added to apiserver as options for adjusting API enablement wrt an apiserver's emulated version. --emulation-forward-compatible is an option to implicitly enable all APIs which are introduced after the emulation version and have higher priority than APIs of the same group resource enabled at the emulation version.
--runtime-config-emulation-forward-compatible is an option to explicit enable specific APIs introduced after the emulation version through the runtime-config

zPages

ComponentStatusz and ComponentFlagz alpha features are now available to be turned on for all control plane components.
Components now expose two new HTTP endpoints, /statusz and /flagz, providing enhanced visibility into their internal state. /statusz details the component's uptime, golang, binary and emulation versions info, while /flagz reveals the command-line arguments used at startup.

Streaming List Responses

To improve cluster stability when handling large datasets, streaming encoding for List responses was introduced as a new Beta feature. Previously, serializing entire List responses into a single memory block could strain kube-apiserver memory. The new streaming encoder processes and transmits each item in a list individually, preventing large memory allocations. This significantly reduces memory spikes, improves API server reliability, and enhances overall cluster performance, especially for clusters with large resources, all while maintaining backward compatibility and requiring no client-side changes.

Snapshottable API server cache

Further enhancing API server performance and stability, a new Alpha feature introduces snapshotting to the watchcache. This allows serving LIST requests for historical or paginated data directly from its in-memory cache. Previously, these types of requests would query etcd directly, requiring to pipe the data through multiple encoding, decoding, and validation stages. This process often led to increased memory pressure, unpredictable performance, and potential stability issues, especially with large resources. By leveraging efficient B-tree based snapshotting within the watchcache, this enhancement significantly reduces direct etcd load and minimizes memory allocations on the API server. This results in more predictable performance, increased API server reliability, and better overall resource utilization, while incorporating mechanisms to ensure data consistency between the cache and etcd.

Declarative Validation

Kubernetes thrives on its large, vibrant community of contributors. We're constantly looking for ways to help make it easier to maintain and contribute to this project. For years, one area that posed challenges was how the Kubernetes API itself was validated: using hand-written Go code. This traditional method has proven to be difficult to authors, challenging to review and cumbersome to document, impacting overall maintainability and the contributor experience. To address these pain points, the declarative validation project was initiated.
In 1.33, the foundational infrastructure was established to transition Kubernetes API validation from handwritten Go code to a declarative model using IDL tags. This release introduced the validation-gen code generator, designed to parse these IDL tags and produce Go validation functions.

Ordered Namespace Deletion

The current namespace deletion process is semi-random, which may lead to security gaps or unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies. By implementing an opinionated deletion mechanism, the Pods will be deleted before other resources with respect to logical and security dependencies. This design enhances the security and reliability of Kubernetes by mitigating risks arising from the non-deterministic deletion order.

Acknowledgements

As always, we want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would like to mention especially Googlers who helped drive the contributions mentioned in this blog: Tim Allclair, Natasha Sarkar, Vivek Bansal, Anish Shah, Dawn Chen, Tim Hockin, John Belamaric, Morten Torkildsen, Yu Liao,Cici Huang, Samuel Karp, Chris Henzie, Luiz Oliveira, Piotr Betkier, Alex Curtis, Jonah Peretz, Brad Hoekstra, Yuhan Yao, Ray Wainman, Richa Banker, Marek Siarkowicz, Siyuan Zhang, Jeffrey Ying, Henry Wu, Yuchen Zhou, Jordan Liggitt, Benjamin Elder, Antonio Ojea, Yongrui Lin, Joe Betz, Aaron Prindle and the Googlers who helped bring 1.33 to GKE!

- Benjamin Elder & Sen Lu, Google Kubernetes Engine

GSoC 2025: We have our Contributors!

Thursday, May 8, 2025

Google Summer of Code logo. A simplistic sun made of 2 orange square on top of each other, one rotated. In the middle is the XML opening and closing brackets in white

Congratulations to the 1272 Contributors from 68 countries accepted for GSoC 2025! Our 185 Mentoring Orgs have been very busy this past month - reviewing 23,559 proposals, having countless discussions with applicants, and finally, completing the rigorous selection process to find the right Contributors for their community.

Here are some highlights of the 2025 GSoC applicants:

  • 15,240 applicants from 130 countries submitting 23,559 proposals
  • Over 2,350 mentors and organization administrators
  • 66.3% of applicants have no prior open source experience

Now that the 2025 GSoC Contributors have been announced, the Organizations and Contributors will be spending 3 weeks together in the Community Bonding period. This time is a very important part of the GSoC program. Designed to get new contributors quickly up to speed, Mentors will use the next three weeks to introduce GSoC Contributors to their community, helping them understand the codebase and norms of their project, adjusting deliverables for the project and understanding the impact and reach of their summer project.

Contributors will begin writing code for Organizations on June 2nd - the official beginning of a totally new adventure! We're absolutely delighted to kick off another year alongside our amazing community.

A huge thanks to all the enthusiastic applicants who participated and, of course, to our phenomenal volunteer Mentors and Organization Administrators. Your weeks of thoughtful proposal reviews and proactive engagement with participants have been invaluable in introducing them to the world of open source.

And congratulations once again to our 2025 GSoC Contributors! Our goal is that GSoC serves as the catalyst for Contributors to become long term participants (and maybe even maintainers!) of open source communities of every shape and size. Now is their chance to dive in and learn more about open source and connect with these amazing communities.


Stephanie Taylor, Mary Radomile and Lucila Ortiz, Google Open Source

Get ready for Google I/O: Program lineup revealed

Wednesday, April 23, 2025

The Google I/O agenda is live. We're excited to share Google’s biggest announcements across AI, Android, Web, and Cloud May 20-21. Tune in to learn how we’re making development easier so you can build faster.

We'll kick things off with the Google Keynote at 10:00 AM PT on May 20th, followed by the Developer Keynote at 1:30 PM PT. This year, we're livestreaming two days of sessions directly from Mountain View, bringing more of the I/O experience to you, wherever you are.

Here’s a sneak peek of what we’ll cover:

    • AI advancements: Learn how Gemini models enable you to build new applications and unlock new levels of productivity. Explore the flexibility offered by options like our Gemma open models and on-device capabilities.
    • Build excellent apps, across devices with Android: Crafting exceptional app experiences across devices is now even easier with Android. Dive into sessions focused on building intelligent apps with Google AI and boosting your productivity, alongside creating adaptive user experiences and leveraging the power of Google Play.
    • Powerful web, made easier: Exciting new features continue to accelerate web development, helping you to build richer, more reliable web experiences. We’ll share the latest innovations in web UI, Baseline progress, new multimodal built-in AI APIs using Gemini Nano, and how AI in DevTools streamline building innovative web experiences.

Plan your I/O

Join us online for livestreams May 20-21, followed by on-demand sessions and codelabs on May 22. Register today and explore the full program for sessions like these:

We're excited to share what's next and see what you build!

By the Google I/O team

Google Summer of Code 2025 Contributor Applications Now Open!

Monday, March 24, 2025


Join Google Summer of Code (GSoC) and contribute to the world of open source! Applications for GSoC are open from March 24 to April 8, 2025.

Since 2005, GSoC has successfully brought over 21,000 new contributors from 123 countries into the open source community. This is an exciting opportunity for students and beginners to open source (18+) to gain real-world experience this summer. You will spend 12+ weeks coding, learning about open source development, and earn a competitive stipend under the guidance of experienced mentors.



Learn More and Apply

  • Make your proposal stand out! Read the Writing a proposal doc written by former contributors.
  • Reach out now to your preferred organizations via their contact methods listed on the GSoC site

Interested contributors may register and submit project proposals on the GSoC site from now until Tuesday, April 8th at 18:00 UTC.

Best of luck to all our applicants!

By Stephanie Taylor, Mary Radomile and Lucila Ortíz – GSoC Program Admins

Meet the Mentoring organizations of GSoC 2025!

Thursday, February 27, 2025

We are thrilled to share that we have selected 185 open source projects for the 21st year of Google Summer of Code (GSoC)

Get to know more about each organization via their individual GSoC program page. There you will find the best way to engage with each community, view project ideas, and read their contributor guidance for applying to their organization.


Applications for the GSoC Contributors are open March 24 - April 8, 2025

The 2025 GSoC program is open to students and to beginners in open source software development. If you are eager to enhance your chances of becoming a GSoC contributor this year, we highly recommend following these steps:

  • Get inspired by watching the 'Introduction to GSoC' video for a quick overview and the Community Talks to learn more about past projects that contributors completed.
  • Review the Contributor Guide and Advice for Applying to GSoC.
  • Review the list of accepted organizations.
    • We recommend finding two to three Orgs that interest you and reading through their project ideas. Use the filters on the site to help you narrow down based on the programming languages you are familiar with or categories that interest you.
  • Once you find an idea that excites you, reach out to the organization right away via their preferred communication methods. Communicating early and often will increase your chances of being accepted.
    • Introduce yourself to the mentors and community and ask questions to determine if this project idea is a good fit for your skill set and interests.
    • Use the information you received to write up your proposal.

Join us in our upcoming Info session!


Finally, you can find more information about the program on our website which includes the full 2025 timeline. You’ll also find the FAQ, Program Rules and some videos with more details about GSoC for both contributors and mentors.

Welcome aboard 2025 Mentoring Organizations! We are looking forward to an amazing year!

By Stephanie Taylor, Mary Radomile & Lucila Ortíz – GSoC Program Admins

Tag-Based Protection Made Easy

Tuesday, February 18, 2025


Scalable, Customizable, and Automated Backup Management

Managing backups across cloud environments can be challenging, particularly when dealing with large-scale infrastructure and constantly evolving workloads. With tag-based protection, organizations can automate backup assignments, ensure broad resource protection, and tailor policies to fit their needs – all through an open-source approach designed for flexibility and scalability leveraging Google Cloud Backup and DR.


Why Open Source? Flexibility and Customization

Traditional backup management often requires manual configurations, making it difficult to scale. By leveraging open-source automation, this solution allows users to:

  • Customize backup policies using VM tags that align with business needs (e.g., application, environment, or criticality).
  • Eliminate manual effort with automated backup assignments and removals.
  • Ensure bulk resource protection, dynamically adjusting backup coverage as infrastructure scales.
  • Integrate seamlessly with existing Google Cloud workflows, APIs, and automation tools.

With open-source flexibility, users can tailor backup strategies to fit their exact needs – automating, scaling, and adapting in real-time.


Scalable and Dynamic Backup Management

This approach provides:

  • Bulk inclusion/exclusion of projects and folders, simplifying administration.
  • Dynamic adjustments based on real-time tag updates.
  • Cloud Run automation to execute backups at scheduled intervals (hourly, daily, weekly, etc.).
  • Comprehensive protection reports, ensuring visibility into backup coverage.

Seamless Google Cloud Integration

To maximize efficiency, this open-source backup automation ensures:

  • Role-based access through predefined Google Cloud permissions (Tag Viewer, Backup, and DR Backup User).
  • Enhanced security by ensuring only authorized VMs are included in backup plans.

Get Started with the Open-Source Script

The backup automation script is available on GitHub, allowing users to customize and contribute to its development:

🔗 Explore the repository

By leveraging Google Cloud’s open-source backup automation, teams can effortlessly scale, automate, and customize their backup strategies – reducing operational overhead while ensuring critical resources remain protected.

By Ashika Ganesh – Product Manager, Google Cloud

Fabrication begins for production OpenTitan silicon

Thursday, February 6, 2025

With malicious software on the rise, how can you be certain that a computer, server, or mobile device is running the code (and provisioning data) that was intended? You can't just ask the code itself, so where do you start? The answer is deceptively simple – start where you have certainty and build up a chain of trust. For communication on the web, we rely on Certificate Authorities (CAs) to ensure the security of web content before it reaches the user. In products composed of an interconnected jungle of hardware and software, like Chromebooks and our Cloud infrastructure, we rely on a small dedicated secure microcontroller called a Root of Trust (RoT). And, some devices even have several RoTs for specialized needs.

Over the past six years, Google has been working with the open source community to build OpenTitan, the first open source silicon RoT. Today, we are excited to announce that we have started fabrication of the first production-ready OpenTitan silicon by Nuvoton. This silicon will be the first broadly used RoT chip at Google with a fully transparent design and origin. We have production OpenTitan chips available for lab testing and evaluation with larger volumes available from Nuvoton starting in Spring 2025.

ALT TEXT

History of RoTs and OpenTitan at Google

In 2009, Google began shipping devices with dedicated off-the-shelf RoTs. By 2014, it became clear that higher levels of assurance would only be attainable by investing in a first party RoT solution. A first party solution enabled Google to have full visibility and control over the security of its products throughout their life cycles. Previous off-the-shelf parts were black- or gray-box solutions where vendors are responsible for designing their own hardware and software – all with limited or no access to the source. Without full transparency, it is impossible to completely understand the security assurances for products using these proprietary parts. In addition, it was becoming harder to meet product needs with off-the-shelf RoT solutions, from footprint to function to cost – we needed a better solution for Chromebooks, Cloud, and later, Pixel.

Today, open source software powers nearly every consumer experience, from open source operating systems like Linux, to web browsers like Chromium. Open source is often the most economically efficient solution for developing foundational technology: it enables companies to work together and pool resources to build common, compatible products. Until now, this development approach has not been demonstrated in a commercially relevant setting for silicon.

OpenTitan is the first open-source silicon project to reach commercial availability based on the engineering samples we released last year. The OpenTitan project started from scratch in 2018 with a coalition of commercial, academic, and not-for-profit partners. The OpenTitan project is hosted by lowRISC CIC in Cambridge, UK. Google and project partners – Nuvoton, ETH Zurich, G+D Mobile Security, lowRISC, Rivos, Seagate, Western Digital, Winbond, zeroRISC, and a number of independent contributors – provide open source hardware register-transfer level (RTL) and design verification (DV) code, along with integration guidelines, and reference firmware to drive adoption throughout industry.


The Future

With the introduction of production-ready OpenTitan chips, we are excited to welcome an era where security is based on transparency from the very beginning of the stack. OpenTitan is the first commercially available open source RoT to support PQC secure boot based on SLH-DSA (formerly known as SPHINCS+). Our vision is that these chips will help drive broader industry adoption not only of open designs and their security properties, but also of this innovative method of open source collaboration between organizations.

Samples of production OpenTitan silicon are now available, with reference provisioning and application-level firmware releases coming soon. Product integrations have begun to intercept Chromebooks shipping later this year, with datacenter integrations following shortly after.


Getting Involved

With OpenTitan, we’ve introduced brand new methodologies for how commodity chips get designed that are increasingly economical moving forward. OpenTitan provides Google with a high-quality, low-cost, commoditized hardware RoT that can be used across the Google ecosystem. This will also facilitate the broader adoption of Google-endorsed security features across the industry.

The fabrication of production OpenTitan silicon is the realization of many years of dedication and hard work from our team. It is a significant moment for us and all contributors to the project. OpenTitan’s broad community has been critical to its success. As the following metrics show (baselined from the project’s public launch in 2019), the OpenTitan community is rapidly growing:

  • Almost nine times the number of commits at launch: from 2,500 to over 24,200.
  • 176 contributors to the code base
  • 17k+ merged pull requests
  • 1.5M+ LoC, including 500k LoC of HDL
  • 2.5k Github stars

If you are interested in learning more or contributing to OpenTitan, visit the open source GitHub repository or reach out to the OpenTitan team.

By Cyrus Stoller and Miguel Osorio – OpenTitan

The New Frontier of Security: Creating Safe and Secure AI Models

Wednesday, January 29, 2025


Are you looking to safely create the next state-of-the-art AI model? Today we’re sharing a list of recommendations on how to create and distribute your models securely.


Choosing the Right Foundation: Safe Model Formats

Before you start building your model, consider using a safe file format, as it can influence your development tool options. However, if you've already created a model, you can also convert it to a safe format before sharing it.

Once trained, models are saved and distributed as binary files. Common formats include PyTorch (Pickles usually with .pt, .pth extensions), TensorFlow SavedModel (.pb), GGUF (.gguf), and Safetensors (.safetensors). However, binary files are dangerous because it's hard to verify if their content is safe. This is especially true with formats such as Pickles and SavedModels, which are designed to include arbitrary code, raising the risk of remote code execution (RCE) on users' machines.

To mitigate these risks:

  • If sharing only model weights: Consider formats such as Safetensors. These formats only contain model weights, and are therefore safe from RCE.
  • If sharing weights and metadata: Consider formats like GGUF, which include weights and additional metadata but not executable code configurations.
  • For any format, but especially if your model requires custom code: Keep reading to see how to help users verify that they're getting the correct model.

Secure Releases and Verification Methods

To ensure your users are getting the model you originally deployed, consider automating your releases, making them transparent and auditable. Instead of training your model on your local machine, consider using a predefined script to train your model within an isolated environment. When building smaller models, using GitHub Actions can be a good option. However, for larger models, GitHub Actions might not have the necessary hardware capabilities or availability. In that case, and if budget allows, consider using other platforms with proper security safeguards, such as Google Cloud Platform (GCP).

If building your model on a cloud platform is not an option, you can sign the model on your local machine to give your users confidence it was created by you.

However, if building your model on a cloud platform, sign the model and generate a provenance attestation for the release. This allows users to not only confirm that the model was created by you, but that it came from your approved infrastructure, was trained following the specific instructions defined in your training script, and wasn't tampered with by a malicious actor.

While signatures and provenance do not guarantee the absence of malicious intent from the developer, they provide users with a means to verify the integrity of the model they downloaded.

On GitHub, signing and provenance can be easily achieved using GitHub Artifact Attestations. For the general use case, tools like Sigstore and SLSA are also available to sign and attest provenance to your deployments.

For a few examples, check out this workflow to build a model with SLSA on GitHub and on GCP, and an example of how to sign models.


Educate Your Users

After sharing your model with the world, it is essential to educate users on the safety and security concerns surrounding model consumption. You should therefore:

  • Document potential biases in your models and datasets.
  • Clearly display all licenses associated with the model and datasets.
  • Benchmark your model, assessing and disclosing metrics around hallucinations, prompt injection risk and fairness.

With this information users can make informed decisions and abide by the ethical and technical guidelines associated with your model. For instance, they might choose to implement an input sanitization layer to enhance their software security.


Establish a Security Policy

Your model’s privacy, safety, and security guarantees can be documented in separate files or in a security policy. A security policy helps you address users' safety and security concerns regarding your models. It is a dedicated file that instructs users on how to privately report vulnerabilities, such as prompt injection strategies or potential out-of-memory (OOM) errors, allowing you to investigate and address the potential vulnerabilities before they become public knowledge. It is also a good place to define the scope of what your project considers a vulnerability.

In summary, considering model security from the outset of development is crucial. Additionally, ensuring safe distribution and informing users of potential risks is essential. It's important to remember that security is a continuous process – more like a marathon than a sprint – and constant vigilance is necessary to mitigate potential threats.


Keep Improving

The steps above will put your models’ security on a solid footing, but there is always more you can learn and do. Please take a look at Google's Secure AI Framework for a deeper dive in this subject, and take its risk self assessment to better understand which risks are most important for you.

By Gabriela Gutierrez and Pedro Nacht – GOSST Upstream Team

Producer java library for Data Lineage is now open source

Tuesday, January 28, 2025

Integrating OpenLineage producers with GCP Lineage just got a lot easier


What is Data Lineage

Data Lineage is a GCP feature that allows tracking data movement. This tool helps data owners and analysts detect anomalies in data flows, find connections between data sources and verify the potential consequences of planned changes in data pipelines.

Lineage is injected automatically for some Google Cloud products (BigQuery, Cloud Data Fusion, Cloud Composer, Dataproc, Vertex AI). That means, if Lineage integration with any of those products is enabled in the projects, data movements coming from executing jobs by these products will be reported to GCP Lineage.

For custom integrations, the API can be used to report and fetch lineage.

After injecting, lineage can be viewed in the Google Cloud console (available from DataCatalog UI, BigQuery UI, Vertex UI). There are two representations: graph view, with data sources as nodes and data movements as edges, and list view, a tabular representation. Lineage information can also be fetched from the API.

More information is available in the documentation.


GCP Lineage information model

We describe data flows using the following concepts:

  • Process is a definition of some data transformation. For example, a SQL or Spark script.
  • Run is an execution of a Process.
  • Lineage Event is a data transformation event. It is reported in context of a Run.
  • A Link represents a connection between two data sources, when data in the link’s Target depends on its Source. A Lineage Event contains a list of Links.

OpenLineage support

OpenLineage is an open standard for reporting lineage information. It unifies lineage reporting between systems, which means the events generated in this format can be consumed by any product supporting it. This leads to more flexibility: adding or replacing a lineage producer does not imply changing the consumer, and vice versa.

OpenLineage format is adopted by a number of lineage producers and consumers, meaning there is already tooling available to report lineage from/to those systems. GCP Lineage is one of those consumers: users can report events in OpenLineage format, see the resulting lineage on the UI, and query it via the API.

OpenLineage is the preferred method for reporting lineage in GCP Lineage. It is used by the Dataproc lineage integration. To find out more about sending OpenLineage events to GCP Lineage refer to the documentation.

After injecting lineage in OpenLineage format, it can be accessed in the same way as if it was injected via other API methods or automatically: from the Google Cloud console or the API.


Why producer library

The GCP Lineage producer library is an extension of the client library. Client libraries are recommended for calling Cloud APIs programmatically. They handle low level API call details, leaving the necessary user code simpler and shorter.

The producer library further simplifies integration by providing ready to use code needed to call the API from Java. It adds additional functionality such as synchronous and asynchronous clients, translating OpenLineage JSON messages to the API friendly format, error handling etc.

Using the producer library, all the code needed to send a request to GCP Lineage API is:

SyncLineageProducerClient client = SyncLineageProducerClient.create();
ProcessOpenLineageRunEventRequest request =
        ProcessOpenLineageRunEventRequest.newBuilder()
            .setParent(parent)
            .setOpenLineage(openLineageMessage)
            .build();
client.processOpenLineageRunEvent(request);

The field openLineageMessage here is a protobuf Struct that includes information about job execution, inputs and outputs and other metadata. The object model is described in the documentation. An example message is:

{
  "eventType": "START",
  "eventTime": "2023-04-04T13:21:16.098Z",
  "run": {
    "runId": "502483d6-3e3d-474f-9380-da565eaa7516",
    "facets": {
       "spark_properties": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "properties": {
          "spark.master": "yarn",
          "spark.app.name": "sparkJobTest.py"
        }
      }
    }
  },
  "job": {
    "namespace": "project-name",
    "name": "cluster-name",
    "facets": {
    "jobType": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/2-0-3/JobTypeJobFacet.json#/$defs/JobTypeJobFacet",
        "processingType": "BATCH",
        "integration": "SPARK",
        "jobType": "SQL_JOB"
      },

    }
  },
  "inputs": [
    {
      "namespace": "bigquery",
      "name": "project.dataset.input_table",
    }],
  "outputs": [
   {
      "namespace": "bigquery",
      "name": "project.dataset.output_table",
    }],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/spark",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"
}

Learn more about building an OpenLineage message.


Best Practices for Constructing OpenLineage Messages

The openLineageMessage should follow the OpenLineage format. The fields that are required for correct parsing by the GCP Lineage API are:

job

mapped to Process

job.namespace

used to construct Process name

job.name

used to construct Process name

run

mapped to Run

run.runId

used to construct Run name

producer

URI identifying the producer of this metadata

eventTime

time of the data movement

schemaURL

URL pointing to the schema definition for this message

In addition to those, the fields used to create lineage are:

eventType

corresponds to the status of the Run

inputs

mapped to sources of links. Must be specified according to the naming conventions

outputs

mapped to targets of links. Must be specified according to the naming conventions

The GCP Lineage API supports OpenLineage major versions 1 and 2. For more information please refer to the documentation.


How to access GCP Lineage?

The code is now publicly available on GitHub. The library is also published to Maven.


GcpLineageTransport

To simplify integration with GCP Lineage, we offer GcpLineageTransport. It is available on the OpenLineage GitHub repository and is built to a separate maven artifact. It is built on top of the producer library mentioned above.

Using the transport minimises the code for sending events to GCP Lineage. The GcpLineageTransport can be configured as the event sink for any existing OpenLineage producer such as Airflow, Spark, and Flink. Find more information and examples on GCP Lineage.

By Mary Idamkina – Data Lineage

.