Google Open Source Blog: September 2024

Posts from September 2024

Keys to a resilient Open Source future

Wednesday, September 18, 2024

In today’s world, free and open source software is a critical component of almost every software solution. Research shows that 70% of modern software relies on open source components, with over 97% of applications leveraging open source code. Unsurprisingly, 90% of companies are using or applying open source code in some form.

These statistics highlight the importance of open source software in modern technology and software development. At the same time, they demonstrate that as its relevance grows, so do the challenges associated with keeping it safe. At Open Source Summit EU, we discussed these challenges and how open source security could be improved. Let’s begin by breaking down the landscape of open source.

The open source ecosystem is fragmented, with diverse languages, build systems, and testing pipelines, making it difficult to maintain consistent security standards. This fragmentation forces developers to juggle multiple roles, such as managing security vulnerabilities, often without adequate tools and support. As a result, inconsistencies and security gaps arise, leaving open source projects vulnerable to attacks. Creating consistent security practices across the board is key to addressing vulnerabilities, which standardization helps to minimize while streamlining the development process.

Google’s SLSA (Supply Chain Levels for Software Artifacts) framework and OSV (Open Source Vulnerabilities) schemas are prime examples of how de facto standardization can transform open source security. SLSA has united several companies to create a standard that enables developers to improve their supply chain security posture, helping prevent attacks like those experienced by SolarWinds and Codecov.

The OSV schema has also been successful, with more than 20 language ecosystems adopting it. This schema allows vulnerabilities to be exported in a precise, machine-readable format, making them easier to manage and address. Thanks to its standardized format, over 150,000 vulnerabilities in open source software have been aggregated and made accessible to anyone in the world via a single API call.

However, many tasks remain manual, making them time-consuming and more prone to human error. Developers must integrate multiple tools at different stages of the software development cycle. The future of open source security lies in creating a fully integrated platform—a tool suite that integrates the best-in-industry tools and solutions, and provides simple hooks for continuous operation in the CI/CD system. Automation is crucial.

The key to revolutionizing open source security is AI, as it can automate manual and error-prone tasks, and reduce the burden on developers.

Google has already started leveraging AI in open source security by successfully using it to write and improve fuzzer unit tests. Google's OSS-Fuzz has been a game changer with a 40% increase in code coverage for over 160 projects. Since its inception, it has identified over 12,000 vulnerabilities with a 90% fix rate. Its effectiveness is due to its close integration with the developer’s workflow, testing the latest commits and helping to fix regressions quickly.

While AI remains an area of active research, and it has not yet solved all security challenges, Google is eager to collaborate with the community to push the boundaries of what AI can achieve in open source security.

Google's approach to open source security is now focused on long-term thinking and scalable solutions. To make a meaningful difference at scale, it is focusing on three key aspects:

Simplifying and applying security best practices consistently: Common, usable standards are key to reducing vulnerabilities and maintaining a secure ecosystem.

Developing an intelligent and integrated platform: A seamless, integrated platform that automates security tasks and naturally integrates into the developer workflow.

Leveraging AI to accelerate and enhance security: Reducing the workload on developers and catching vulnerabilities that might go undetected.

By maintaining this focus and continuing to collaborate with the community, Google and the open source ecosystem can ensure that FOSS remains a secure, reliable foundation for the software solutions of tomorrow.

By Abhishek Arya – Principal Engineer, Open Source and Supply Chain Security

Google Open Sources Smart Buildings Simulator and Dataset to Accelerate Sustainable Innovation

Tuesday, September 17, 2024

In our ongoing commitment to sustainability and technological advancement, Google is excited to announce a significant step forward in the realm of smart buildings. Today, we are open-sourcing two invaluable resources:

1. TensorFlow Smart Buildings Simulator: A powerful tool designed to train reinforcement learning agents to optimize energy consumption and minimize carbon emissions in buildings.

2. Smart Buildings Dataset: A comprehensive collection of six years of telemetry data from three Google buildings, providing real-world insights for developing and validating optimal control solutions.

Empowering the Future of Smart Buildings

Buildings account for a substantial portion of global energy consumption and greenhouse gas emissions. As we strive to create a more sustainable future, optimizing the energy efficiency of buildings is paramount. Artificial intelligence and machine learning offer promising solutions, and Google is dedicated to accelerating progress in this field.

The TensorFlow Smart Buildings Simulator provides researchers and developers with a realistic and customizable environment to train reinforcement learning agents. These agents can learn to make intelligent decisions about heating, cooling, ventilation, and lighting systems, balancing occupant comfort with energy efficiency and carbon reduction goals. By open-sourcing this simulator, we aim to empower the community to develop innovative control strategies that can be applied to real-world buildings.

Complementing the simulator, the Smart Buildings Dataset offers a wealth of real-world data collected from three Google buildings over six years. This dataset encompasses a wide range of telemetry, including temperature, humidity, occupancy, lighting levels, and energy consumption. By making this data available, we hope to enable researchers to develop data-driven models, validate their simulations, and gain deeper insights into the complex dynamics of building systems.

Collaboration for a Sustainable Future

We believe that open collaboration is key to driving innovation and progress in the smart buildings domain. By open-sourcing these resources, Google aims to foster a vibrant ecosystem of researchers, academics, and industry professionals working together to enhance sustainability and advance the field of smart buildings.

We envision universities leveraging these resources to conduct cutting-edge research, develop new algorithms, and train the next generation of engineers. Industry partners can utilize the simulator and dataset to test and validate their solutions, accelerate development cycles, and bring more efficient and sustainable products to market.

Google's Commitment to Sustainability

This open-source initiative aligns with Google's broader commitment to sustainability. We have set ambitious goals to operate on 24/7 carbon-free energy by 2030 and achieve net-zero emissions across all our operations and value chain by 2040. By sharing our tools and data, we hope to contribute to a global effort to reduce the environmental impact of buildings and create a more sustainable future for all.

Get Involved

We invite researchers, developers, and industry professionals to explore these open-source resources and join us in our mission to build a more sustainable world. Together, we can harness the power of AI and data to transform the way we design, operate, and interact with buildings, creating a future where energy efficiency, occupant comfort, and environmental responsibility go hand in hand.

Let's collaborate, innovate, and build a brighter future for smart buildings!

By John Sipple – Google Core Enterprise Machine Learning Team

Empowering etcd Reliability: New Downgrade Support in Version 3.6

Thursday, September 12, 2024

In the world of distributed systems, reliability is paramount. etcd, a widely used key-value store often critical to infrastructure, has made strides in enhancing this aspect. While etcd's reliability has been robust thanks to the Raft consensus protocol, the same couldn't be said for upgrades/downgrades – until now.

The Challenge of etcd Downgrades

Historically, downgrading etcd has been a complex and unsupported process. There is no way to safely downgrade etcd data after it was touched by a newer version. Upgrades, while reliable, weren't easily reversible, often requiring external tools and backups. This lack of flexibility posed a significant challenge for users who encountered issues after upgrading.

Enter etcd 3.6: A New Era of Downgrade Support

etcd 3.6 introduces a groundbreaking solution: built-in downgrade support. This innovation not only simplifies the upgrade and downgrade processes but also significantly enhances etcd's reliability.

How Does It Work?

Storage Versioning: A new storage version (SV) is persisted within the etcd data file. This version indicates compatibility, ensuring safe upgrades and downgrades.

Schema Evolution: A comprehensive schema tracks all fields in the data file and acts as a source of truth about which version a particular was introduced in, allowing etcd to understand and manipulate data across versions.

etcdutl migrate: A dedicated command-line tool, etcdutl migrate, streamlines skip-level upgrade and downgrade process, eliminating the need for complex manual steps.

Benefits for Users

The introduction of downgrade support in etcd 3.6 offers a range of benefits for users:

Improved Reliability: Upgrades can be safely reverted, reducing the risk of data loss or operational disruption.

Simplified Management: The upgrade and downgrade processes are streamlined, reducing the complexity of managing etcd clusters.

Increased Flexibility: Users have greater flexibility in managing their etcd environments, allowing them to experiment with new versions and roll back if necessary.

Under the Hood: Technical Details

To achieve downgrade support, etcd 3.6 implements a strict storage versioning policy. This means that etcd data is versioned, etcd will no longer be allowed to load data generated by version higher than its own, and must rely on cluster downgrade process instead. This ensures that all the DB and WAL files would not have any information that could be incorrectly interpreted.

During the downgrade process, new fields from the higher version in DB files will be cleaned up. The etcd protocol version will be lowered to allow older versions to join. All new features, rpcs and fields would not be used thus preventing older members from interpreting replicated logs differently. This also means that entries added to the Wal log file should be compatible with lower versions. When a wal snapshot happens, all older incompatible entries should be applied, so they no longer need to be read and the storage version can be downgraded.

The etcdutl migrate command tool is added to simplify etcd data upgrade and downgrade process on 2+ minor version upgrades/downgrades scenarios, by validating the WAL log compatibility with the target version, and executing any necessary schema changes to the DB file and updating the storage version.

Implementation Milestones

The rollout of downgrade support is planned in three milestones:

Snapshot Storage Versions: Storage versioning is implemented for snapshots.

Version Annotations: etcd code is annotated with versions, and a schema is created for the data file.

Full Downgrade Support: Downgrades can be fully implemented using the established storage versioning and schema.

We are currently working on finishing the third milestone.

Looking Ahead

etcd 3.6 marks a significant step forward in the reliability and manageability of etcd clusters. The introduction of downgrade support empowers users with greater flexibility and control over their etcd environments. As etcd continues to evolve, we can expect further enhancements to the upgrade and downgrade processes, further solidifying its position as a critical component in modern distributed systems.

By Siyuan Zhang – Software Engineer