
Posts from 2025

Announcing LMEval: An Open Source Framework for Cross-Model Evaluation

Wednesday, May 14, 2025

Authors: Elie Bursztein - Distinguished Research Scientist & David Tao - Software Engineer, Applied Security and Safety Research

Simplifying Cross-Provider Model Benchmarking

At InCyber Forum Europe in April, we open sourced LMEval, a large model evaluation framework, to help others accurately and efficiently compare how models from various providers perform across benchmark datasets. The announcement coincided with a joint talk with Giskard about our collaboration to increase trust in model safety and security: Giskard uses LMEval to run its Phare benchmark, which independently evaluates the security and safety of popular models.

Results from the Phare benchmark, which leverages LMEval for evaluation

Rapid Changes in the Landscape of Large Models

New Large Language Models (LLMs) are released constantly, often promising improvements and new features. To keep up with this fast-paced lifecycle, developers, researchers, and organizations must quickly and reliably evaluate whether newer models are better suited for their specific applications. So far, rapid model evaluation has proven difficult, as it requires tooling for scalable, accurate, easy-to-use, cross-provider benchmarking.

Introducing LMEval: Simplifying Cross-Provider Model Benchmarking

To address this challenge, we are excited to introduce LMEval (Large Model Evaluator), an open source framework that Google developed to streamline the evaluation of LLMs across diverse benchmark datasets and model providers. LMEval is designed from the ground up to be accurate, multimodal, and easy-to-use. Its key features include:

  • Multi-Provider Compatibility: Evaluating models shouldn't require wrestling with different APIs for each provider. LMEval leverages the LiteLLM framework to offer out-of-the-box compatibility with major model providers including Google, OpenAI, Anthropic, Ollama, and Hugging Face. You can define your benchmark once and run it consistently across various models with minimal code changes.
  • Incremental & Efficient Evaluation: Re-running an entire benchmark suite every time a new model or version is released is slow, inefficient and costly. LMEval's intelligent evaluation engine plans and executes evaluations incrementally. It runs only the necessary evaluations for new models, prompts, or questions, saving significant time and compute resources. Its multi-threaded engine further accelerates this process.
  • Multimodal & Multi-Metric Support: Modern foundation models go beyond text. LMEval is designed for multimodal evaluation, supporting benchmarks that include text, images, and code, and adding new modalities is straightforward. It also supports a variety of scoring metrics covering a wide range of benchmark formats, from boolean questions to multiple choice to free-form generation, and it provides support for safety/punting detection.
  • Scalable & Secure Storage: To store benchmark results securely and efficiently, LMEval uses a self-encrypting SQLite database. This protects benchmark data and results from inadvertent crawling/indexing while keeping them easily accessible through LMEval.

Getting Started with LMEval

Creating and running evaluations with LMEval is designed to be intuitive. Here's a simplified example demonstrating how to evaluate two Gemini model versions on a benchmark:

Example of LMEval running on a multimodal benchmark across two models
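In rough Python pseudocode, the flow looks like the following. The class and method names here are illustrative assumptions rather than LMEval's confirmed API, so refer to the example notebooks in the repository for actual usage:

# Hypothetical sketch of an LMEval evaluation run. The names below are
# illustrative assumptions, not LMEval's confirmed API; see the example
# notebooks in the GitHub repository for actual usage.
from lmeval import Benchmark, Evaluator, LiteLLMModel  # assumed imports

# Define (or load) the benchmark once; it can then run against any
# LiteLLM-supported provider.
benchmark = Benchmark.load("multimodal_demo_benchmark")  # assumed loader

# Two Gemini versions, addressed via LiteLLM-style model strings.
models = [
    LiteLLMModel("gemini/gemini-1.5-flash"),
    LiteLLMModel("gemini/gemini-2.0-flash"),
]

# The engine plans incrementally: on re-runs, only missing
# (model, prompt, question) combinations are executed.
evaluator = Evaluator(benchmark)
results = evaluator.evaluate(models)

# Results are persisted in a self-encrypting SQLite database.
results.save("eval_results.db")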

The LMEval GitHub repository includes example notebooks to help you get started.

Visualizing Results with LMEvalboard

Understanding benchmark results requires more than just summary statistics. To help with this, LMEval includes LMEvalboard, a companion dashboard tool that offers an interactive visualization of how models stack up against each other. LMEvalboard provides valuable insights into model strengths and weaknesses, complementing traditional raw evaluation data.

The LMEvalboard UI allows you to quickly analyze how models compare on a given benchmark

LMEvalboard allows you to:

  • View Overall Performance: Quickly compare all models' accuracy across the entire benchmark.
  • Analyze a Single Model: Dive deep into a specific model's performance characteristics across different categories using radar charts, and drill down on specific examples of failures.
  • Perform Head-to-Head Comparisons: Directly compare two models, visualizing their performance differences across categories and examining specific questions where they disagree.

Try LMEval Today!

We invite you to explore LMEval, use it for your own evaluations, and contribute to its development by heading to the LMEval GitHub repository: https://github.com/google/lmeval

Acknowledgements

LMEval would not have been possible without the help of many people, including: Luca Invernizzi, Lenin Simicich, Marianna Tishchenko, Amanda Walker, and many other Googlers.

Kubernetes 1.33 is available on GKE!

Friday, May 9, 2025

Kubernetes 1.33 is now available in the Google Kubernetes Engine (GKE) Rapid Channel! For more information about the content of Kubernetes 1.33, read the official Kubernetes 1.33 Release Notes and the specific GKE 1.33 Release Notes.

Enhancements in 1.33:

In-place Pod Resizing

Workloads can be scaled horizontally by updating the Pod replica count, or vertically by updating the resources required in the Pod's container(s). Before this enhancement, container resources defined in a Pod's spec were immutable, and updating any of these details within a Pod template would trigger Pod replacement, impacting the service's reliability.

In-place Pod Resizing (IPPR, Public Preview) allows you to change the CPU and memory requests and limits assigned to containers within a running Pod through the new /resize pod subresource, often without requiring a container restart, which reduces service disruption.

This opens up various possibilities: vertical scale-up of stateful processes without any downtime, seamless scale-down when traffic is low, and even allocating larger resources during startup that can be reduced once the initial setup is complete.

Review Resize CPU and Memory Resources assigned to Containers for detailed guidance on using the new API.
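As a sketch of what an in-place resize looks like from a client, here is an example with the official Kubernetes Python client. The patch_namespaced_pod_resize method name is an assumption about how recent client versions expose the /resize subresource; verify it against your client version before relying on it.

# Sketch: resize a running Pod's CPU in place via the /resize subresource.
# Assumes a recent kubernetes Python client; the method name below is an
# assumption, not a confirmed API.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

patch = {
    "spec": {
        "containers": [
            {
                "name": "app",  # container name inside the target Pod
                "resources": {
                    "requests": {"cpu": "800m"},
                    "limits": {"cpu": "800m"},
                },
            }
        ]
    }
}

# Patch the resize subresource rather than the Pod spec itself, so the
# Pod is updated in place instead of being replaced.
core.patch_namespaced_pod_resize(name="my-pod", namespace="default", body=patch)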

DRA

Kubernetes Dynamic Resource Allocation (DRA), currently in beta as of v1.33, offers a more flexible API for requesting devices than the Device Plugin framework. (See the instructions for opting in to beta features in GKE.)

Recent updates include the promotion of driver-owned resource claim status to beta. New alpha features introduced are partitionable devices, device taints and tolerations for managing device availability, prioritized device lists for versatile workload allocation, and enhanced admin access controls. Preparations for general availability include a new v1beta2 API to improve user experience and simplify future feature integration, alongside improved RBAC rules and support for seamless driver upgrades. DRA is anticipated to reach general availability in Kubernetes v1.34.

containerd 2.0

With GKE 1.33, we are excited to introduce support for containerd 2.0. This marks the first major version update for the underlying container runtime used by GKE. Adopting this version ensures that GKE continues to leverage the latest advancements and security enhancements from the upstream containerd community.

It's important to note that as a major version update, containerd 2.0 introduces many new features and enhancements while also deprecating others. To ensure a smooth transition and maintain compatibility for your workloads, we strongly encourage you to review your Cloud Recommendations. These recommendations will help identify any workloads that may be affected by these changes. Please see "Migrate nodes to containerd 2" for detailed guidance on making your workloads forward-compatible.

Multiple Service CIDRs

This enhancement introduced a new implementation of allocation logic for Service IPs. The updated IP address allocator logic uses two newly stable API objects: ServiceCIDR and IPAddress. Now generally available, these APIs allow cluster administrators to dynamically increase the number of IP addresses available for Services by creating new ServiceCIDR objects.
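As a sketch, a cluster administrator could add Service IP space with the Kubernetes Python client's generic object API; the object name and CIDR below are illustrative placeholders, so pick a range that fits your network plan:

# Sketch: expand the Service IP space by creating a ServiceCIDR object.
# The object name and CIDR below are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

service_cidr = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "ServiceCIDR",
    "metadata": {"name": "extra-service-range"},
    "spec": {"cidrs": ["10.96.100.0/24"]},
}

# ServiceCIDR is cluster-scoped, so use the cluster-scoped call.
api.create_cluster_custom_object(
    group="networking.k8s.io",
    version="v1",
    plural="servicecidrs",
    body=service_cidr,
)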

Highlights of Googlers' contributions in the 1.33 cycle:

Coordinated Leader Election

The Coordinated Leader Election feature progressed to beta, introducing significant enhancements in how a lease candidate's availability is determined for an election. Specifically, the ping-acknowledgement checking process has been optimized to be fully concurrent instead of sequential, ensuring faster and more efficient detection of unresponsive candidates. This is essential for promptly identifying truly available lease candidates and maintaining the reliability of the leader election process.

Compatibility Versions

New CLI flags were added to the apiserver as options for adjusting API enablement with respect to an apiserver's emulated version. --emulation-forward-compatible implicitly enables all APIs that are introduced after the emulation version and have higher priority than APIs of the same group resource enabled at the emulation version. --runtime-config-emulation-forward-compatible explicitly enables specific APIs introduced after the emulation version through --runtime-config.

zPages

The ComponentStatusz and ComponentFlagz alpha features can now be turned on for all control plane components.
Components expose two new HTTP endpoints, /statusz and /flagz, providing enhanced visibility into their internal state: /statusz details the component's uptime and its Go, binary, and emulation version info, while /flagz reveals the command-line arguments used at startup.

Streaming List Responses

To improve cluster stability when handling large datasets, streaming encoding for List responses was introduced as a new Beta feature. Previously, serializing entire List responses into a single memory block could strain kube-apiserver memory. The new streaming encoder processes and transmits each item in a list individually, preventing large memory allocations. This significantly reduces memory spikes, improves API server reliability, and enhances overall cluster performance, especially for clusters with large resources, all while maintaining backward compatibility and requiring no client-side changes.

Snapshottable API server cache

Further enhancing API server performance and stability, a new Alpha feature introduces snapshotting to the watchcache, allowing LIST requests for historical or paginated data to be served directly from its in-memory cache. Previously, these types of requests would query etcd directly, requiring the data to be piped through multiple encoding, decoding, and validation stages. This process often led to increased memory pressure, unpredictable performance, and potential stability issues, especially with large resources. By leveraging efficient B-tree based snapshotting within the watchcache, this enhancement significantly reduces direct etcd load and minimizes memory allocations on the API server. The result is more predictable performance, increased API server reliability, and better overall resource utilization, with mechanisms in place to ensure data consistency between the cache and etcd.

Declarative Validation

Kubernetes thrives on its large, vibrant community of contributors. We're constantly looking for ways to make the project easier to maintain and contribute to. For years, one area that posed challenges was how the Kubernetes API itself was validated: using hand-written Go code. This traditional method has proven difficult to author, challenging to review, and cumbersome to document, impacting overall maintainability and the contributor experience. To address these pain points, the declarative validation project was initiated.
In 1.33, the foundational infrastructure was established to transition Kubernetes API validation from handwritten Go code to a declarative model using IDL tags. This release introduced the validation-gen code generator, designed to parse these IDL tags and produce Go validation functions.

Ordered Namespace Deletion

The namespace deletion process was previously semi-random, which could lead to security gaps or unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies. With an opinionated deletion mechanism, Pods are deleted before other resources, in keeping with logical and security dependencies. This design enhances the security and reliability of Kubernetes by mitigating risks arising from a non-deterministic deletion order.

Acknowledgements

As always, we want to thank all the Googlers who provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would like to mention especially the Googlers who helped drive the contributions mentioned in this blog: Tim Allclair, Natasha Sarkar, Vivek Bansal, Anish Shah, Dawn Chen, Tim Hockin, John Belamaric, Morten Torkildsen, Yu Liao, Cici Huang, Samuel Karp, Chris Henzie, Luiz Oliveira, Piotr Betkier, Alex Curtis, Jonah Peretz, Brad Hoekstra, Yuhan Yao, Ray Wainman, Richa Banker, Marek Siarkowicz, Siyuan Zhang, Jeffrey Ying, Henry Wu, Yuchen Zhou, Jordan Liggitt, Benjamin Elder, Antonio Ojea, Yongrui Lin, Joe Betz, Aaron Prindle and the Googlers who helped bring 1.33 to GKE!

By Benjamin Elder & Sen Lu, Google Kubernetes Engine

GSoC 2025: We have our Contributors!

Thursday, May 8, 2025

Google Summer of Code logo: a stylized sun made of two orange squares stacked on top of each other, one rotated, with white angle brackets at the center

Congratulations to the 1272 Contributors from 68 countries accepted for GSoC 2025! Our 185 Mentoring Orgs have been very busy this past month - reviewing 23,559 proposals, having countless discussions with applicants, and finally, completing the rigorous selection process to find the right Contributors for their community.

Here are some highlights of the 2025 GSoC applicants:

  • 15,240 applicants from 130 countries submitting 23,559 proposals
  • Over 2,350 mentors and organization administrators
  • 66.3% of applicants have no prior open source experience

Now that the 2025 GSoC Contributors have been announced, the Organizations and Contributors will spend three weeks together in the Community Bonding period. This time is a very important part of the GSoC program: designed to get new contributors quickly up to speed, it gives Mentors three weeks to introduce GSoC Contributors to their community, help them understand the codebase and norms of their project, adjust deliverables for the project, and convey the impact and reach of their summer project.

Contributors will begin writing code for Organizations on June 2nd - the official beginning of a totally new adventure! We're absolutely delighted to kick off another year alongside our amazing community.

A huge thanks to all the enthusiastic applicants who participated and, of course, to our phenomenal volunteer Mentors and Organization Administrators. Your weeks of thoughtful proposal reviews and proactive engagement with participants have been invaluable in introducing them to the world of open source.

And congratulations once again to our 2025 GSoC Contributors! Our goal is that GSoC serves as the catalyst for Contributors to become long term participants (and maybe even maintainers!) of open source communities of every shape and size. Now is their chance to dive in and learn more about open source and connect with these amazing communities.


By Stephanie Taylor, Mary Radomile and Lucila Ortiz, Google Open Source

Get ready for Google I/O: Program lineup revealed

Wednesday, April 23, 2025

The Google I/O agenda is live. We're excited to share Google's biggest announcements across AI, Android, Web, and Cloud on May 20-21. Tune in to learn how we're making development easier so you can build faster.

We'll kick things off with the Google Keynote at 10:00 AM PT on May 20th, followed by the Developer Keynote at 1:30 PM PT. This year, we're livestreaming two days of sessions directly from Mountain View, bringing more of the I/O experience to you, wherever you are.

Here’s a sneak peek of what we’ll cover:

    • AI advancements: Learn how Gemini models enable you to build new applications and unlock new levels of productivity. Explore the flexibility offered by options like our Gemma open models and on-device capabilities.
    • Build excellent apps across devices with Android: Crafting exceptional app experiences across devices is now even easier with Android. Dive into sessions focused on building intelligent apps with Google AI, boosting your productivity, creating adaptive user experiences, and leveraging the power of Google Play.
    • Powerful web, made easier: Exciting new features continue to accelerate web development, helping you build richer, more reliable web experiences. We'll share the latest innovations in web UI, Baseline progress, new multimodal built-in AI APIs using Gemini Nano, and how AI in DevTools streamlines building innovative web experiences.

Plan your I/O

Join us online for livestreams May 20-21, followed by on-demand sessions and codelabs on May 22. Register today and explore the full program.

We're excited to share what's next and see what you build!

By the Google I/O team

Google Summer of Code 2025 Contributor Applications Now Open!

Monday, March 24, 2025


Join Google Summer of Code (GSoC) and contribute to the world of open source! Applications for GSoC are open from March 24 to April 8, 2025.

Since 2005, GSoC has brought over 21,000 new contributors from 123 countries into the open source community. This is an exciting opportunity for students and beginners to open source (18+) to gain real-world experience this summer. You will spend 12+ weeks coding, learning about open source development, and earning a competitive stipend under the guidance of experienced mentors.



Learn More and Apply

  • Make your proposal stand out! Read the Writing a proposal doc written by former contributors.
  • Reach out now to your preferred organizations via their contact methods listed on the GSoC site.

Interested contributors may register and submit project proposals on the GSoC site from now until Tuesday, April 8th at 18:00 UTC.

Best of luck to all our applicants!

By Stephanie Taylor, Mary Radomile and Lucila Ortíz – GSoC Program Admins

Meet the Mentoring organizations of GSoC 2025!

Thursday, February 27, 2025

We are thrilled to share that we have selected 185 open source projects for the 21st year of Google Summer of Code (GSoC)!

Get to know more about each organization via their individual GSoC program page. There you will find the best way to engage with each community, view project ideas, and read their contributor guidance for applying to their organization.


Applications for GSoC Contributors are open March 24 - April 8, 2025.

The 2025 GSoC program is open to students and to beginners in open source software development. If you are eager to enhance your chances of becoming a GSoC contributor this year, we highly recommend following these steps:

  • Get inspired by watching the 'Introduction to GSoC' video for a quick overview and the Community Talks to learn more about past projects that contributors completed.
  • Review the Contributor Guide and Advice for Applying to GSoC.
  • Review the list of accepted organizations.
    • We recommend finding two to three Orgs that interest you and reading through their project ideas. Use the filters on the site to help you narrow down based on the programming languages you are familiar with or categories that interest you.
  • Once you find an idea that excites you, reach out to the organization right away via their preferred communication methods. Communicating early and often will increase your chances of being accepted.
    • Introduce yourself to the mentors and community and ask questions to determine if this project idea is a good fit for your skill set and interests.
    • Use the information you received to write up your proposal.

Join us in our upcoming Info session!


Finally, you can find more information about the program on our website, which includes the full 2025 timeline. You'll also find the FAQ, Program Rules, and some videos with more details about GSoC for both contributors and mentors.

Welcome aboard 2025 Mentoring Organizations! We are looking forward to an amazing year!

By Stephanie Taylor, Mary Radomile & Lucila Ortíz – GSoC Program Admins

Tag-Based Protection Made Easy

Tuesday, February 18, 2025


Scalable, Customizable, and Automated Backup Management

Managing backups across cloud environments can be challenging, particularly when dealing with large-scale infrastructure and constantly evolving workloads. With tag-based protection, organizations can automate backup assignments, ensure broad resource protection, and tailor policies to fit their needs, all through an open-source approach built on Google Cloud Backup and DR and designed for flexibility and scalability.


Why Open Source? Flexibility and Customization

Traditional backup management often requires manual configurations, making it difficult to scale. By leveraging open-source automation, this solution allows users to:

  • Customize backup policies using VM tags that align with business needs (e.g., application, environment, or criticality).
  • Eliminate manual effort with automated backup assignments and removals.
  • Ensure bulk resource protection, dynamically adjusting backup coverage as infrastructure scales.
  • Integrate seamlessly with existing Google Cloud workflows, APIs, and automation tools.

With open-source flexibility, users can tailor backup strategies to fit their exact needs – automating, scaling, and adapting in real-time.


Scalable and Dynamic Backup Management

This approach provides:

  • Bulk inclusion/exclusion of projects and folders, simplifying administration.
  • Dynamic adjustments based on real-time tag updates.
  • Cloud Run automation to execute backups at scheduled intervals (hourly, daily, weekly, etc.).
  • Comprehensive protection reports, ensuring visibility into backup coverage.

Seamless Google Cloud Integration

To maximize efficiency, this open-source backup automation ensures:

  • Role-based access through predefined Google Cloud permissions (Tag Viewer and Backup and DR Backup User).
  • Enhanced security by ensuring only authorized VMs are included in backup plans.

Get Started with the Open-Source Script

The backup automation script is available on GitHub, allowing users to customize and contribute to its development:

🔗 Explore the repository

By leveraging Google Cloud’s open-source backup automation, teams can effortlessly scale, automate, and customize their backup strategies – reducing operational overhead while ensuring critical resources remain protected.

By Ashika Ganesh – Product Manager, Google Cloud

Fabrication begins for production OpenTitan silicon

Thursday, February 6, 2025

With malicious software on the rise, how can you be certain that a computer, server, or mobile device is running the code (and provisioning data) that was intended? You can't just ask the code itself, so where do you start? The answer is deceptively simple – start where you have certainty and build up a chain of trust. For communication on the web, we rely on Certificate Authorities (CAs) to ensure the security of web content before it reaches the user. In products composed of an interconnected jungle of hardware and software, like Chromebooks and our Cloud infrastructure, we rely on a small dedicated secure microcontroller called a Root of Trust (RoT). And, some devices even have several RoTs for specialized needs.

Over the past six years, Google has been working with the open source community to build OpenTitan, the first open source silicon RoT. Today, we are excited to announce that we have started fabrication of the first production-ready OpenTitan silicon by Nuvoton. This silicon will be the first broadly used RoT chip at Google with a fully transparent design and origin. We have production OpenTitan chips available for lab testing and evaluation, with larger volumes available from Nuvoton starting in spring 2025.

History of RoTs and OpenTitan at Google

In 2009, Google began shipping devices with dedicated off-the-shelf RoTs. By 2014, it became clear that higher levels of assurance would only be attainable by investing in a first party RoT solution. A first party solution enabled Google to have full visibility and control over the security of its products throughout their life cycles. Previous off-the-shelf parts were black- or gray-box solutions where vendors are responsible for designing their own hardware and software – all with limited or no access to the source. Without full transparency, it is impossible to completely understand the security assurances for products using these proprietary parts. In addition, it was becoming harder to meet product needs with off-the-shelf RoT solutions, from footprint to function to cost – we needed a better solution for Chromebooks, Cloud, and later, Pixel.

Today, open source software powers nearly every consumer experience, from open source operating systems like Linux, to web browsers like Chromium. Open source is often the most economically efficient solution for developing foundational technology: it enables companies to work together and pool resources to build common, compatible products. Until now, this development approach has not been demonstrated in a commercially relevant setting for silicon.

OpenTitan is the first open-source silicon project to reach commercial availability based on the engineering samples we released last year. The OpenTitan project started from scratch in 2018 with a coalition of commercial, academic, and not-for-profit partners. The OpenTitan project is hosted by lowRISC CIC in Cambridge, UK. Google and project partners – Nuvoton, ETH Zurich, G+D Mobile Security, lowRISC, Rivos, Seagate, Western Digital, Winbond, zeroRISC, and a number of independent contributors – provide open source hardware register-transfer level (RTL) and design verification (DV) code, along with integration guidelines, and reference firmware to drive adoption throughout industry.


The Future

With the introduction of production-ready OpenTitan chips, we are excited to welcome an era where security is based on transparency from the very beginning of the stack. OpenTitan is the first commercially available open source RoT to support PQC secure boot based on SLH-DSA (formerly known as SPHINCS+). Our vision is that these chips will help drive broader industry adoption not only of open designs and their security properties, but also of this innovative method of open source collaboration between organizations.

Samples of production OpenTitan silicon are now available, with reference provisioning and application-level firmware releases coming soon. Product integrations have begun to intercept Chromebooks shipping later this year, with datacenter integrations following shortly after.


Getting Involved

With OpenTitan, we’ve introduced brand new methodologies for how commodity chips get designed that are increasingly economical moving forward. OpenTitan provides Google with a high-quality, low-cost, commoditized hardware RoT that can be used across the Google ecosystem. This will also facilitate the broader adoption of Google-endorsed security features across the industry.

The fabrication of production OpenTitan silicon is the realization of many years of dedication and hard work from our team. It is a significant moment for us and all contributors to the project. OpenTitan’s broad community has been critical to its success. As the following metrics show (baselined from the project’s public launch in 2019), the OpenTitan community is rapidly growing:

  • Almost ten times the number of commits since launch: from 2,500 to over 24,200.
  • 176 contributors to the code base
  • 17k+ merged pull requests
  • 1.5M+ LoC, including 500k LoC of HDL
  • 2.5k GitHub stars

If you are interested in learning more or contributing to OpenTitan, visit the open source GitHub repository or reach out to the OpenTitan team.

By Cyrus Stoller and Miguel Osorio – OpenTitan

The New Frontier of Security: Creating Safe and Secure AI Models

Wednesday, January 29, 2025


Are you looking to safely create the next state-of-the-art AI model? Today we’re sharing a list of recommendations on how to create and distribute your models securely.


Choosing the Right Foundation: Safe Model Formats

Before you start building your model, consider using a safe file format, as it can influence your development tool options. However, if you've already created a model, you can also convert it to a safe format before sharing it.

Once trained, models are saved and distributed as binary files. Common formats include PyTorch (pickles, usually with .pt or .pth extensions), TensorFlow SavedModel (.pb), GGUF (.gguf), and Safetensors (.safetensors). However, binary files are dangerous because it's hard to verify whether their content is safe. This is especially true with formats such as pickles and SavedModels, which are designed to include arbitrary code, raising the risk of remote code execution (RCE) on users' machines.

To mitigate these risks:

  • If sharing only model weights: Consider formats such as Safetensors. These formats contain only model weights and are therefore safe from RCE (see the sketch after this list).
  • If sharing weights and metadata: Consider formats like GGUF, which include weights and additional metadata but no executable code.
  • For any format, but especially if your model requires custom code: Keep reading to see how to help users verify that they're getting the correct model.
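To make the first option concrete, here is a minimal sketch of saving and reloading weights with the safetensors library, assuming PyTorch tensors:

# Minimal sketch: store weights in Safetensors format, which holds raw
# tensors only -- no pickled, executable code.
import torch
from safetensors.torch import save_file, load_file

weights = {
    "embedding.weight": torch.zeros((1024, 768)),
    "classifier.weight": torch.zeros((768, 2)),
}
save_file(weights, "model.safetensors")

# Loading parses tensor data and never executes code from the file.
restored = load_file("model.safetensors")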

Secure Releases and Verification Methods

To ensure your users are getting the model you originally deployed, consider automating your releases, making them transparent and auditable. Instead of training your model on your local machine, consider using a predefined script to train your model within an isolated environment. When building smaller models, using GitHub Actions can be a good option. However, for larger models, GitHub Actions might not have the necessary hardware capabilities or availability. In that case, and if budget allows, consider using other platforms with proper security safeguards, such as Google Cloud Platform (GCP).

If building your model on a cloud platform is not an option, you can sign the model on your local machine to give your users confidence it was created by you.

However, if building your model on a cloud platform, sign the model and generate a provenance attestation for the release. This allows users to not only confirm that the model was created by you, but that it came from your approved infrastructure, was trained following the specific instructions defined in your training script, and wasn't tampered with by a malicious actor.

While signatures and provenance do not guarantee the absence of malicious intent from the developer, they provide users with a means to verify the integrity of the model they downloaded.

On GitHub, signing and provenance can be easily achieved using GitHub Artifact Attestations. For the general use case, tools like Sigstore and SLSA are also available to sign and attest provenance to your deployments.

For a few examples, check out this workflow to build a model with SLSA on GitHub and on GCP, and an example of how to sign models.
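Alongside signatures and provenance (not as a replacement for them), you can also publish a checksum next to the released weights so users have a quick way to verify their download. A minimal sketch:

# Minimal integrity-check sketch: compute a SHA-256 digest to publish with
# the model. This complements, but does not replace, Sigstore signatures
# or provenance attestations.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

print(sha256_of("model.safetensors"))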


Educate Your Users

After sharing your model with the world, it is essential to educate users on the safety and security concerns surrounding model consumption. You should therefore:

  • Document potential biases in your models and datasets.
  • Clearly display all licenses associated with the model and datasets.
  • Benchmark your model, assessing and disclosing metrics around hallucinations, prompt injection risk and fairness.

With this information users can make informed decisions and abide by the ethical and technical guidelines associated with your model. For instance, they might choose to implement an input sanitization layer to enhance their software security.


Establish a Security Policy

Your model’s privacy, safety, and security guarantees can be documented in separate files or in a security policy. A security policy helps you address users' safety and security concerns regarding your models. It is a dedicated file that instructs users on how to privately report vulnerabilities, such as prompt injection strategies or potential out-of-memory (OOM) errors, allowing you to investigate and address the potential vulnerabilities before they become public knowledge. It is also a good place to define the scope of what your project considers a vulnerability.
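As a starting point, the policy can live in a SECURITY.md file at the root of your repository. The contact address and response time in this skeleton are placeholder assumptions:

Security Policy (hypothetical SECURITY.md skeleton)

Reporting a vulnerability: please do not open public issues for security
reports. Instead, email security@example.com (a placeholder address) with a
description and reproduction steps; we aim to acknowledge reports within
five business days.

Scope: in scope are prompt injection strategies, out-of-memory (OOM) errors
triggered by crafted inputs, and tampering with released artifacts. Out of
scope are model hallucinations and issues in third-party dependencies.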

In summary, considering model security from the outset of development is crucial. Additionally, ensuring safe distribution and informing users of potential risks is essential. It's important to remember that security is a continuous process – more like a marathon than a sprint – and constant vigilance is necessary to mitigate potential threats.


Keep Improving

The steps above will put your models' security on a solid footing, but there is always more you can learn and do. Take a look at Google's Secure AI Framework for a deeper dive into this subject, and take its risk self-assessment to better understand which risks are most important for you.

By Gabriela Gutierrez and Pedro Nacht – GOSST Upstream Team

Producer Java library for Data Lineage is now open source

Tuesday, January 28, 2025

Integrating OpenLineage producers with GCP Lineage just got a lot easier


What is Data Lineage?

Data Lineage is a GCP feature that lets you track data movement. It helps data owners and analysts detect anomalies in data flows, find connections between data sources, and verify the potential consequences of planned changes in data pipelines.

Lineage is injected automatically for some Google Cloud products (BigQuery, Cloud Data Fusion, Cloud Composer, Dataproc, Vertex AI). That means that if Lineage integration with any of those products is enabled in a project, data movements from jobs executed by those products are reported to GCP Lineage.

For custom integrations, the API can be used to report and fetch lineage.

Once injected, lineage can be viewed in the Google Cloud console (available from the Data Catalog UI, BigQuery UI, and Vertex AI UI). There are two representations: a graph view, with data sources as nodes and data movements as edges, and a list view, a tabular representation. Lineage information can also be fetched from the API.

More information is available in the documentation.


GCP Lineage information model

We describe data flows using the following concepts:

  • Process is a definition of some data transformation. For example, a SQL or Spark script.
  • Run is an execution of a Process.
  • Lineage Event is a data transformation event. It is reported in the context of a Run.
  • A Link represents a connection between two data sources, where data in the link's Target depends on its Source. A Lineage Event contains a list of Links.

OpenLineage support

OpenLineage is an open standard for reporting lineage information. It unifies lineage reporting between systems, which means the events generated in this format can be consumed by any product supporting it. This leads to more flexibility: adding or replacing a lineage producer does not imply changing the consumer, and vice versa.

The OpenLineage format is adopted by a number of lineage producers and consumers, meaning there is already tooling available to report lineage from and to those systems. GCP Lineage is one of those consumers: users can report events in OpenLineage format, see the resulting lineage in the UI, and query it via the API.

OpenLineage is the preferred method for reporting lineage in GCP Lineage, and it is used by the Dataproc lineage integration. To find out more about sending OpenLineage events to GCP Lineage, refer to the documentation.

After injecting lineage in OpenLineage format, it can be accessed in the same way as if it was injected via other API methods or automatically: from the Google Cloud console or the API.


Why a producer library?

The GCP Lineage producer library is an extension of the client library. Client libraries are recommended for calling Cloud APIs programmatically; they handle low-level API call details, leaving the necessary user code simpler and shorter.

The producer library further simplifies integration by providing ready-to-use code for calling the API from Java. It adds functionality such as synchronous and asynchronous clients, translation of OpenLineage JSON messages into the API-friendly format, error handling, etc.

Using the producer library, all the code needed to send a request to the GCP Lineage API is:

// Create a client that sends lineage events synchronously.
SyncLineageProducerClient client = SyncLineageProducerClient.create();

// Wrap the OpenLineage message in a request addressed to the parent
// resource (the project and location that receive the lineage).
ProcessOpenLineageRunEventRequest request =
        ProcessOpenLineageRunEventRequest.newBuilder()
            .setParent(parent)
            .setOpenLineage(openLineageMessage)
            .build();

// Report the run event to the GCP Lineage API.
client.processOpenLineageRunEvent(request);

The field openLineageMessage here is a protobuf Struct that includes information about the job execution, its inputs and outputs, and other metadata. The object model is described in the documentation. An example message is:

{
  "eventType": "START",
  "eventTime": "2023-04-04T13:21:16.098Z",
  "run": {
    "runId": "502483d6-3e3d-474f-9380-da565eaa7516",
    "facets": {
      "spark_properties": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunFacet",
        "properties": {
          "spark.master": "yarn",
          "spark.app.name": "sparkJobTest.py"
        }
      }
    }
  },
  "job": {
    "namespace": "project-name",
    "name": "cluster-name",
    "facets": {
      "jobType": {
        "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.22.0/integration/spark",
        "_schemaURL": "https://openlineage.io/spec/facets/2-0-3/JobTypeJobFacet.json#/$defs/JobTypeJobFacet",
        "processingType": "BATCH",
        "integration": "SPARK",
        "jobType": "SQL_JOB"
      }
    }
  },
  "inputs": [
    {
      "namespace": "bigquery",
      "name": "project.dataset.input_table"
    }
  ],
  "outputs": [
    {
      "namespace": "bigquery",
      "name": "project.dataset.output_table"
    }
  ],
  "producer": "https://github.com/OpenLineage/OpenLineage/tree/0.18.0/integration/spark",
  "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/$defs/RunEvent"
}

Learn more about building an OpenLineage message.


Best Practices for Constructing OpenLineage Messages

The openLineageMessage should follow the OpenLineage format. The fields that are required for correct parsing by the GCP Lineage API are:

  • job: mapped to Process
  • job.namespace: used to construct the Process name
  • job.name: used to construct the Process name
  • run: mapped to Run
  • run.runId: used to construct the Run name
  • producer: URI identifying the producer of this metadata
  • eventTime: time of the data movement
  • schemaURL: URL pointing to the schema definition for this message

In addition to those, the fields used to create lineage are:

  • eventType: corresponds to the status of the Run
  • inputs: mapped to the sources of links; must be specified according to the naming conventions
  • outputs: mapped to the targets of links; must be specified according to the naming conventions

The GCP Lineage API supports OpenLineage major versions 1 and 2. For more information please refer to the documentation.


How to access GCP Lineage?

The code is now publicly available on GitHub. The library is also published to Maven.


GcpLineageTransport

To simplify integration with GCP Lineage, we offer GcpLineageTransport. It is available in the OpenLineage GitHub repository and is published as a separate Maven artifact. It is built on top of the producer library mentioned above.

Using the transport minimizes the code needed to send events to GCP Lineage. GcpLineageTransport can be configured as the event sink for any existing OpenLineage producer, such as Airflow, Spark, or Flink. Find more information and examples on GCP Lineage.

By Mary Idamkina – Data Lineage

See the code that powered the Pebble smartwatches

Monday, January 27, 2025

We are excited to announce that the source code that powered Pebble smartwatches is now available for download.

This is part of an effort from Google to help and support the volunteers who have come together to maintain functionality for Pebble watches after the original company ceased operations in 2016.


A quick look back

Pebble was initially launched through a very successful Kickstarter project. Pebble’s first Kickstarter was the single most funded at the time, and its successor Kickstarter for the Pebble Time repeated that feat – and remains the second most funded today! Over the course of four years, Pebble sold over two million smartwatches, cultivating a thriving community of thousands of developers who created over ten thousand Pebble apps and watchfaces.

In 2016, Fitbit acquired Pebble, including Pebble’s intellectual property. Later on, Fitbit itself was acquired by Google, taking the Pebble OS with it.

Despite the Pebble hardware and software support being discontinued eight years ago, Pebble still has thousands of dedicated fans.


What is being released

We are releasing most of the source code for the Pebble operating system. This repository contains the entire OS, which provides all the standard smartwatch functionality – notifications, media controls, fitness tracking, and support for custom apps and watchfaces – on tiny ARM Cortex-M microcontrollers. Built on FreeRTOS, it contains multiple modules for memory management, graphics, and timekeeping, as well as an extensive framework to load and run custom applications written in C or in JavaScript via the JerryScript engine. The Pebble architecture allowed for a lightweight system delivering a rich user experience and a very long battery life.

It's important to note that some proprietary code was removed from this codebase, particularly for chipset support and the Bluetooth stack. This means the code being released contains all the build system files (using the waf build system), but it will not compile or link as released.


The path forward

From here, we hope this release will help the dedicated community and volunteers from the Rebble project carry forward support for the Pebble watches that users still love. For someone to build a new firmware update, there is a non-trivial amount of work in finding replacements for the pieces that were stripped out of this code, as well as in updating source code that has not been maintained for several years.

By Matthieu Jeanson, Katharine Berry, and Liam McLoughlin

Introducing Eclipsa Audio: immersive audio for everyone

Wednesday, January 15, 2025

In the real world, we hear sounds from all around us. Some sounds are ahead of us, some are to our sides, some are behind us, and, yes, some are above or below us. Spatial audio technology brings an immersive audio experience that goes beyond traditional stereo sound. It creates a 3D soundscape, making you feel like sounds are coming from all around you, not just from the left and right speakers.

Spatial audio technologies were first developed over 50 years ago, and playback has been available to consumers for over a decade, but creating spatial audio has been mostly limited to professionals in the movie or music industries. That’s why Google and Samsung are releasing Eclipsa Audio, an open source spatial audio format for everyone.


From Creation to Distribution to Experience

Eclipsa Audio is based on Immersive Audio Model and Formats (IAMF), an audio format developed by Google, Samsung, and other key contributors within the Alliance for Open Media (AOM), and released under the AOM royalty-free license. Because IAMF is open source, Eclipsa Audio files can be created by anyone using freely available audio tools, which support a wide variety of workflows:

A diagram shows three different workflows for encoding video and audio using `iamf_tools` and `ffmpeg` to create MP4 files with IAMF audio and video. Each workflow handles a different input type, including ADM-BWF, WAV files, Textproto, and video.

An open source reference renderer [1] is freely available for standalone spatial audio playback, or you can test your Eclipsa Audio files right in your browser at the Binaural Web Demo Application.

Starting in 2025, creators will be able to upload videos with Eclipsa Audio tracks to YouTube. As the first in the industry to adopt Eclipsa Audio, Samsung is integrating the technology across its 2025 TV lineup — from the Crystal UHD series to the premium flagship Neo QLED 8K models — to ensure that consumers who want to experience this advanced technology can choose from a wide range of options. Google and Samsung will be launching a certification and brand licensing program in 2025 to provide quality assurance to manufacturers and consumers for products that support Eclipsa Audio.


Next Steps

To simplify the creation of Eclipsa Audio files, later this spring we will release a free Eclipsa Audio plugin for AVID Pro Tools Digital Audio Workstation. We also plan to bring native Eclipsa Audio playback to the Google Chrome browser as well as to TVs and Soundbars from multiple manufacturers later in 2025. Eclipsa Audio support will also arrive in an upcoming Android AOSP release; stay tuned for more information.

We believe that Eclipsa Audio has the potential to change the way we experience sound. We are excited to see how it is used to create new and innovative audio experiences.

By Matt Frost, Jani Huoponen, Jan Skoglund, Roshan Baliga – the Open Audio team

[1] Special thanks to Arm for providing high performance optimizations to the IAMF reference software.
