Google Open Source Blog: 2021

Posts from 2021

Season of Docs announces results of 2021 program

Tuesday, December 14, 2021

Season of Docs has announced the 2021 program results for all projects. You can view a list of successfully completed projects on the website along with their case studies.

In 2021, the Season of Docs program allowed open source organizations to apply for a grant based on their documentation needs. Selected open source organizations then used their grant to hire a technical writer directly to complete their desired documentation project. Organizations then had six months to complete their documentation project. (In previous years, Google matched technical writers to projects and paid the technical writers directly.)

The 2021 Season of Docs documentation development phase began on April 16 and ended November 16, 2021 for all projects:

30 open source organizations finished their projects (100% completion)
93% of organizations had a positive experience
96% of the technical writers had a positive experience

Take a look at the list of completed projects to see the wide range of subjects covered!

What is next?

Stay tuned for information about Season of Docs 2022—watch for posts on this blog and sign up for the announcements email list. We’ll also be sharing information about best practices in open source technical writing derived from the Season of Docs case studies.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Boosting Machine Learning with tailored accelerators: Custom Function Units in Renode

Thursday, December 9, 2021

Development of Machine Learning algorithms which enable new and exciting applications is progressing at a breakneck pace, and given the long turnaround time of hardware development, the designers of dedicated hardware accelerators are struggling to keep up. FPGAs offer an interesting alternative to ASICs, enabling a much faster and more flexible environment for such HW-SW co-development, and with projects such as the FPGA interchange format (now part of CHIPS Alliance), Google and Antmicro have been making the FPGA ecosystem ever more open and software driven.

The open RISC-V ISA was built with Machine Learning in mind, with its configurable and adaptable nature, flexible vector extensions and a rich ecosystem of open source implementations which can serve as an excellent starting point for new R&D projects.

Given their wide-ranging interests in edge AI, both Google and Antmicro have embraced RISC-V as Founding members as far back as 2015. Among many other open source tools and building blocks that Antmicro is creating, we have invested heavily into enabling HW/SW co-development of ML solutions using RISC-V in our open source simulation framework, Renode.

RISC-V is also excellent for FPGA-based ML development. It offers a multitude of FPGA-friendly softcore options (such as VexRiscv and specialized ML-oriented extensions called CFU) which you can experiment with in cheap, easily accessible hardware and Renode, using Verilator co-simulation capabilities.

In this note, we will describe the CFU and the CFU playground ML experimentation project that Antmicro and Google have been collaborating on to push forward FPGA acceleration of AI, and how to get started quickly with your very own hardware-assisted ML pipeline.

About the CFU

A “CFU”, or a “Custom Function Unit,” is an accelerator tightly coupled with the CPU. It adds a custom instruction to the ISA using a standardized format defined by the CFU working group of RISC-V International.

CFUs are easy to design, write, and experiment with given the reprogrammable nature of FPGAs. When working with a CFU, you are encouraged to identify blocks to be accelerated iteratively, measure your payload after each iteration and, above all, prepare custom CFUs for each payload (potentially using the capabilities of most FPGAs to be reprogrammed on the fly, or just holding several CFUs in store side by side, to be executed depending on the payload in question).

CFU execution is triggered by one of the standard instructions, with arguments passed via registers. The CPU can handle many different CFUs with various functions; their IDs are retrieved from the `funct7` and `funct3` operands of the decoded instruction. The only interaction between the CPU and the CFU is via registers and immediate values provided in the instruction itself: there is no direct memory access, nor any interaction between different CFUs.
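To make the role of `funct7` and `funct3` concrete, here is a minimal sketch of pulling those fields out of a 32-bit R-type instruction word. The bit positions follow the standard RISC-V R-type layout. Everything else is an assumption made for illustration: that the CFU instruction uses the `custom-0` opcode, the names `CfuFields`, `decode`, `encode` and `function_id`, and the way the two fields are packed into a single ID.

```cpp
#include <cassert>
#include <cstdint>

// RISC-V "custom-0" major opcode (assumed here to carry the CFU instruction).
constexpr uint32_t kCustom0Opcode = 0x0B;

// Fields of an R-type instruction that matter for a CFU request.
struct CfuFields {
    uint32_t funct7;  // bits 31..25
    uint32_t rs2;     // bits 24..20
    uint32_t rs1;     // bits 19..15
    uint32_t funct3;  // bits 14..12
    uint32_t rd;      // bits 11..7
};

// Extract `width` bits of `word`, starting at bit `lsb`.
constexpr uint32_t field(uint32_t word, unsigned lsb, unsigned width) {
    return (word >> lsb) & ((1u << width) - 1u);
}

constexpr bool is_custom0(uint32_t insn) {
    return field(insn, 0, 7) == kCustom0Opcode;
}

// Build an R-type custom-0 instruction word (handy for experiments).
constexpr uint32_t encode(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                          uint32_t funct3, uint32_t rd) {
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) |
           (rd << 7) | kCustom0Opcode;
}

inline CfuFields decode(uint32_t insn) {
    assert(is_custom0(insn));
    CfuFields f;
    f.funct7 = field(insn, 25, 7);
    f.rs2 = field(insn, 20, 5);
    f.rs1 = field(insn, 15, 5);
    f.funct3 = field(insn, 12, 3);
    f.rd = field(insn, 7, 5);
    return f;
}

// Hypothetical packing: funct7 in the high bits, funct3 in the low three.
inline uint32_t function_id(const CfuFields& f) {
    return (f.funct7 << 3) | f.funct3;
}
```

With this sketch, `decode(encode(1, 11, 10, 2, 12))` yields `funct7 == 1` and `funct3 == 2`, which the hypothetical `function_id` packs into the value 10.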

Figure 1

CFU Playground

Google’s CFU Playground provides an open source framework which offers a handy methodology for reasoning about ML acceleration and developing your own Custom Function Units using FPGAs and simulation. Various CFU examples and demos are available, and you can also add a project with your sources and modified TFLite Micro code (one of the results of our collaboration with the TF Lite Micro team). An overlay mechanism lets you override every part of code that you need.

A CFU may be written in Verilog or any language/framework that outputs Verilog. In the CFU Playground demos, CFUs are mostly written in nMigen, which allows you to write code in Python and then generates Verilog output. The Python-based flow simplifies development for software engineers who may not be familiar with writing Verilog code. Since it’s generated from Python, it is also very easy to upgrade in small steps in a structured way until you reach your expected acceleration targets.

Co-simulation in Renode

Renode has been supporting co-simulation of various buses since the 1.7.1 release, and support for CFU was also added recently. CFU support is provided via the Renode Integration Layer plugin. It essentially consists of two parts: first, a C# class called `CFUVerilatedPeripheral`, which manages the Verilator simulation process, and second, an integration library written in C++. The integration library, together with the ‘verilated’ hardware code (i.e. HDL compiled into C++ via Verilator), is then built into a binary, which in turn is imported by the `CFUVerilatedPeripheral`. It is possible to install up to four different CFUs under one RISC-V CPU. Each of them will be executed based on the opcode received from the CPU.

Since the hardware is translated into C++ via Verilator, you can also enable tracing which dumps CFU waveforms into a file to later analyze.

How to ‘verilate’ your own CFU

Basic examples of verilated CFUs are available on Antmicro’s GitHub. You can use this repository to ‘verilate’ your own custom CFU.

In the `main.cpp` of your verilated model, you need to include C++ headers from the Renode Verilator Integration Library.

#include "src/renode_cfu.h"
#include "src/buses/cfu.h"

Next, you need to initialize the `RenodeAgent` and the model’s `top` instance along with the `eval()` function that will evaluate the model during simulation.

RenodeAgent *cfu;
Vcfu *top = new Vcfu;

void eval() {
   top->eval();
}

Now add an `Init()` function that will initialize a bus along with its signals, and the `eval()` function. It should also initialize and return the `RenodeAgent` connected to a bus.

RenodeAgent *Init() {
   Cfu* bus = new Cfu();

   //=================================================
   // Init CFU signals
   //=================================================
   bus->req_valid = &top->cmd_valid;
   bus->req_ready = &top->cmd_ready;
   bus->req_func_id = (uint16_t *)&top->cmd_payload_function_id;
   bus->req_data0 = (uint32_t *)&top->cmd_payload_inputs_0;
   bus->req_data1 = (uint32_t *)&top->cmd_payload_inputs_1;
   bus->resp_valid = &top->rsp_valid;
   bus->resp_ready = &top->rsp_ready;
   bus->resp_ok = &top->rsp_payload_response_ok;
   bus->resp_data = (uint32_t *)&top->rsp_payload_outputs_0;
   bus->rst = &top->reset;
   bus->clk = &top->clk;

   //=================================================
   // Init eval function
   //=================================================
   bus->evaluateModel = &eval;

   //=================================================
   // Init peripheral
   //=================================================
   cfu = new RenodeAgent(bus);

   return cfu;
}
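To illustrate why `Init()` hands the bus a set of signal pointers plus an `eval` callback, the following self-contained sketch drives a valid/ready handshake through such pointers. It does not use Verilator or the Renode integration library: `MockCfu`, `MockBus`, `init_mock_bus`, `tick` and `execute` are hypothetical names, and the mock model (an adder that answers one clock after accepting a command) is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Verilator-generated model: it adds its two
// inputs and presents the sum after the clock edge that accepts the command.
struct MockCfu {
    uint8_t cmd_valid = 0, cmd_ready = 0;
    uint16_t cmd_payload_function_id = 0;
    uint32_t cmd_payload_inputs_0 = 0, cmd_payload_inputs_1 = 0;
    uint8_t rsp_valid = 0, rsp_ready = 0;
    uint32_t rsp_payload_outputs_0 = 0;
    uint8_t clk = 0, prev_clk = 0;

    void eval() {
        const bool rising = clk && !prev_clk;
        prev_clk = clk;
        if (rising) {
            if (rsp_valid && rsp_ready) {
                rsp_valid = 0;  // response consumed by the CPU side
            } else if (cmd_valid && cmd_ready) {
                rsp_payload_outputs_0 = cmd_payload_inputs_0 + cmd_payload_inputs_1;
                rsp_valid = 1;
            }
        }
        cmd_ready = !rsp_valid;  // accept a new command only when idle
    }
};

// Hypothetical bus: pointers into the model plus an evaluation callback.
struct MockBus {
    uint8_t *req_valid, *req_ready;
    uint16_t *req_func_id;
    uint32_t *req_data0, *req_data1;
    uint8_t *resp_valid, *resp_ready;
    uint32_t *resp_data;
    uint8_t *clk;
    void (*evaluateModel)();
};

static MockCfu mock_top;
static void eval_mock() { mock_top.eval(); }

inline MockBus init_mock_bus() {
    MockBus bus;
    bus.req_valid = &mock_top.cmd_valid;
    bus.req_ready = &mock_top.cmd_ready;
    bus.req_func_id = &mock_top.cmd_payload_function_id;
    bus.req_data0 = &mock_top.cmd_payload_inputs_0;
    bus.req_data1 = &mock_top.cmd_payload_inputs_1;
    bus.resp_valid = &mock_top.rsp_valid;
    bus.resp_ready = &mock_top.rsp_ready;
    bus.resp_data = &mock_top.rsp_payload_outputs_0;
    bus.clk = &mock_top.clk;
    bus.evaluateModel = &eval_mock;
    return bus;
}

// One full clock cycle: rising edge, then falling edge.
inline void tick(MockBus& bus) {
    *bus.clk = 1; bus.evaluateModel();
    *bus.clk = 0; bus.evaluateModel();
}

// Issue one request and wait (up to max_cycles) for the response.
inline bool execute(MockBus& bus, uint16_t func_id, uint32_t a, uint32_t b,
                    uint32_t& result, int max_cycles = 16) {
    *bus.req_func_id = func_id;
    *bus.req_data0 = a;
    *bus.req_data1 = b;
    *bus.req_valid = 1;
    *bus.resp_ready = 0;
    bus.evaluateModel();  // let combinational outputs such as ready settle
    for (int cycle = 0; cycle < max_cycles; ++cycle) {
        tick(bus);
        if (*bus.resp_valid) {
            result = *bus.resp_data;
            *bus.req_valid = 0;
            *bus.resp_ready = 1;
            tick(bus);  // acknowledge the response
            *bus.resp_ready = 0;
            return true;
        }
    }
    *bus.req_valid = 0;
    return false;
}
```

Because the driver only ever touches the model through pointers and the callback, the same driving code could sit in front of any model that exposes equivalent signals, which is the property the `Init()` function above relies on.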

To compile your project, you must first export three environment variables:

`RENODE_ROOT`: path to Renode source directory
`VERILATOR_ROOT`: path to the directory where Verilator is located (this is not needed if Verilator is installed system-wide)
`SRC_PATH`: path to the directory containing your `main.cpp`

With the variables above now set, go to `SRC_PATH` and build your CFU:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release "$SRC_PATH"
make libVtop

If you need more details about creating your own ‘verilated’ peripheral, visit the chapter in Renode documentation about co-simulation.

To attach a verilated CFU to a Renode platform, add `CFUVerilatedPeripheral` to your `RISC-V` CPU.

cpu: CPU.VexRiscv @ sysbus
    cpuType: "rv32im"

cfu0: Verilated.CFUVerilatedPeripheral @ cpu 0
    frequency: 100000000

As the last step, provide a path to a compiled verilated CFU. You can do it either in `.repl` platform as a CFU constructor or in `.resc` script.

cpu.cfu0 SimulationFilePath @libVtop.so

To see how it works without building your own project, run the built-in Renode demo script called litex_vexriscv_verilated_cfu.resc in Renode’s monitor CLI:

(monitor) s @scripts/single-node/litex_vexriscv_verilated_cfu.resc

CFU Playground Integration

CFU Playground makes use of a Continuous Integration mechanism to make sure new changes don’t break anything. Since the project mostly targets real hardware, a simulator like Antmicro’s open source Renode framework is indispensable. A large number of varied tests are executed with every change in the mainline CFU Playground repository, building the CFU software, and then running it in Renode with hardware co-simulation or with a software CFU reimplementation.

In the CI tests, Renode uses scripts which are generated for each specific build target. This makes it possible to generate the exact same scripts locally and run them in Renode to enable a step-by-step assessment of what is happening in the code.

What’s next?

CFU integration in Renode is already used in practice, among other places in the EU-funded project called VEDLIoT, for which Antmicro also implemented the Kenning framework. VEDLIoT will use Renode to develop and test a soft-SoC based system aimed to drive Tiny ML workloads.

Renode’s use in CFU Playground is yet another outcome of Antmicro’s long partnership with Google. Along with the testing and development work we did for the TensorFlow Lite Micro team, this shows that Renode is and will continue to be a go-to framework for embedded ML developers.

By guest author Michael Gielda – Antmicro

#BazelCon 2021 Wrap Up

Friday, December 3, 2021

The apps, platforms, and systems that the Bazel community builds with Bazel touch the lives of people around the world in ways we couldn’t have imagined. Through BazelCon, we aim to connect Bazel enthusiasts, the Bazel team, maintainers, contributors, users, and friends in an inclusive and welcoming environment. At BazelCon, the community demonstrates the global user impact of the community—with some quirky and carefully crafted talks, a readout on the State-of-Bazel, an upfront discussion on “Implicit Bias Mitigation,” and community sharing events that remind us that we are not alone in our efforts to build a better world, one line of code at a time.

At BazelCon, the community shared over 24 technical sessions with the 1400+ registrants, which you can watch here at your own pace. Make sure you check out:

“Reproducible builds with Bazel” — Stories about the meaning of "hermetic" and how to achieve it in the context of builds and a meditation on the aesthetic aspects of build reproducibility.

“Embedded Platform Testing with Remote Execution” — Leverage Remote Execution to speed up and simplify builds while testing on embedded hardware.

“Bazelizing a Gigantic iOS App” — Lessons learned from migrating a decade-old iOS app with millions of lines of code to Bazel.

“Build Custom Silicon with Bazel” — Design custom silicon with the same workflow you use to build software with Bazel

“Implicit Bias Mitigation: Making the Unconscious Conscious” — Reviews the concept of implicit bias and tactics for mitigating it both personally and professionally in the broader context of diversity, equity, inclusion, and belonging.

“Streamlining VMware's Open Source License Compliance” — Solving the complexities of identifying and tracking open-source software (OSS) to comply with license requirements by using Bazel to create an accurate bill of materials containing OSS and third-party packages during a build.

Attendees were able to interact with the community and engage with the Bazel team through a series of “Birds of a Feather” (BoF) sessions and a live Q&A session. You can find all of the BoF presentations and notes here.

As announced, soon we will be releasing Bazel 5.0, the updated version of our next generation, multi-language, multi-platform build functionality that includes a new external dependency system, called bzlmod, for you to try out.

We’d like to thank everyone who helped make BazelCon a success: presenters, organizers, Google Developer Studios, contributors, and attendees. If you have any questions about BazelCon, you can reach out to bazelcon-planning@google.com.

We hope that you enjoyed #BazelCon and "Building Better with Bazel".

By Joe Hicks, Product Manager, Core Developer

Knative applies to become a CNCF incubating project

Tuesday, November 30, 2021

In 2018, the Knative project was founded and released by Google, and was subsequently developed in close partnership with IBM, Red Hat, VMware, and SAP. The project provides a serverless experience layer on Kubernetes, providing the building blocks you need to build and deploy modern, container-based serverless applications. Over the last three years, Knative has become the most widely-installed serverless layer on Kubernetes. More recently, Knative 1.0 was released, reaching an important milestone that was made possible thanks to the contributions and collaboration of over 600 developers in the community.

Google has worked closely with key maintainers and partners on the evolution of Knative, including conformance definition and stability ahead of the 1.0 milestone. To enable the next phase of community-driven innovation in Knative, today we have submitted Knative to the Cloud Native Computing Foundation (CNCF) for consideration as an incubating project, which begins the process to donate the Knative trademark, IP, and code.

As a leader in serverless computing, we’re committed to the future of Knative, and offering Knative 1.0 conformant Cloud Run and Cloud Run For Anthos products. Finding a home in the CNCF secures Knative’s long-term future and encourages continuing and open innovation. This donation recognizes the adoption and investment in Knative from the community, and will encourage further multi-vendor innovation, broader education and training.

At Google, we believe that using open source comes with a responsibility to contribute, sustain, and improve the projects that help drive innovation and make better software. We are excited to see how developers will continue to build and innovate in serverless using Knative.

By Alexandra Bush and Edd Wilder-James, Google Open Source

Open source DDR controller framework for mitigating Rowhammer

Thursday, November 11, 2021

Rowhammer is a hardware vulnerability that affects DRAM memory chips and can be exploited to modify memory contents, potentially providing root access to the system. It occurs because Dynamic RAM consists of multiple memory cells packed tightly together and specific access patterns can cause unwanted effects that propagate to nearby memory cells and cause bit-flips in cells which have not been accessed by the attacker.

The problem has been known for several years, but as shown by the most recent research from Google, performed with the open source platform developed by Antmicro that we’ll describe in this note, it has yet to be completely solved. The tendency in DRAM manufacturing is to make the chips denser to pack more memory in the same size, which inevitably results in increased interdependency between memory cells, making Rowhammer an ongoing problem.

Solutions like TRR (Target Row Refresh) introduced in newer memory chips mitigate the issue, although only in part, and attack methods like Half-Double or TRRespass keep emerging. To go beyond the all-too-often used “security through obscurity” approach, Antmicro has been helping build open source platforms which give security researchers full control over the entire technology stack and enable them to find new solutions to emerging threats.

The Rowhammer Tester platform

The Rowhammer Tester platform was developed for and with Google, who just like Antmicro believe that open source, well documented technical infrastructure is critical in speeding up research and increasing collaboration with the industry. In this case, we wanted to enable the memory security researchers and manufacturers to have access to a flexible platform for experimenting with new types of attacks and finding better Rowhammer mitigation techniques.

Current Rowhammer test methods involve using the chip-specific MBIST (Memory Built-in Self-Test) or costly ATE (Automated Test Equipment), which means that the existing approaches are either costly, inflexible, or both. MBIST blocks are specialized IP cores that test memory chips for errors. Although effective, they lack flexibility because the testing algorithms are hardcoded into the IP core. ATE devices are usually used at foundries to run various tests on wafers. Access to these devices is limited and expensive; chip vendors have to rely on DFT (Design for Test) software to produce compressed test patterns, which require less access time to ATE while ensuring high test coverage.

The main goal of the project was to address those limitations, providing an FPGA-based Rowhammer testing platform that enables full control over the commands sent to the DRAM chip. This is important because DRAM memory requires specialized hardware controllers and any software-based testing approaches have to communicate with the DRAM indirectly via the controller, which pulls the researchers away from the main research subject when studying the DRAM chip behaviour itself.

Platform architecture

The Rowhammer Tester consists of two parts: the FPGA gateware that is loaded to the hardware platform and a set of Python scripts used to communicate with the FPGA system from the user’s PC. Internally, all the important modules of the FPGA system are connected to a shared WishBone bus. We use an EtherBone bridge to be able to interface with the FPGA WishBone bus from the host PC. EtherBone is a protocol that allows regular WishBone transactions to be performed over Ethernet. This way we can perform all of the communication between the user PC and the FPGA efficiently through an Ethernet cable.

The FPGA gateware has four main parts: a Bulk transfer module, a Payload Executor, the LiteDRAM controller, and a VexRiscv CPU. The Bulk transfer module provides an efficient way of filling and testing the whole memory contents. It supports user-configurable access and data patterns, using high-performance DMA to make use of full bandwidth offered by the LiteDRAM controller. When using the Bulk transfer module, LiteDRAM handles all the required DRAM logic, including row activation, refreshing, etc. and ensuring that all DRAM timings are met.
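The "fill, then test" idea behind the Bulk transfer module can be sketched on the host side as a comparison of read-back memory against the pattern that was written. This is only an illustration under stated assumptions: the actual module runs in FPGA gateware, and the names `BitFlip` and `find_bit_flips`, as well as the single repeated 32-bit pattern, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One detected bit-flip: which word, which bit, and in which direction.
struct BitFlip {
    std::size_t word_index;
    unsigned bit;
    bool one_to_zero;  // true if the expected bit was 1 and now reads 0
};

// Compare read-back memory with the fill pattern and list every differing bit.
inline std::vector<BitFlip> find_bit_flips(const std::vector<uint32_t>& memory,
                                           uint32_t pattern) {
    std::vector<BitFlip> flips;
    for (std::size_t i = 0; i < memory.size(); ++i) {
        uint32_t diff = memory[i] ^ pattern;  // set bits mark mismatches
        for (unsigned bit = 0; diff != 0; ++bit, diff >>= 1) {
            if (diff & 1u) {
                const bool expected_one = ((pattern >> bit) & 1u) != 0;
                flips.push_back(BitFlip{i, bit, expected_one});
            }
        }
    }
    return flips;
}
```

Recording the direction of each flip (1 to 0 or 0 to 1) is a design choice for this sketch; it makes it easy to see whether a given data pattern is more prone to one kind of disturbance.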

If more fine-grained control is required, our Rowhammer Tester provides the Payload Executor module. Payload Executor can be thought of as a simple processor that can execute our custom instruction set. Most of the instructions map directly to DRAM commands, with minimal control flow provided by the LOOP instruction. A user can compile a “program” and load it to Rowhammer Tester’s instruction SRAM, from which it will then be executed. To execute a program, Payload Executor will disconnect the LiteDRAM controller and send the requested command sequences directly to the DRAM chip via the PHY’s DFI interface. After execution, the LiteDRAM controller gets reconnected and the contents of the memory can be inspected to search for potential bit-flips.
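As a rough mental model of a command program with a LOOP instruction, the sketch below expands a small program into the flat sequence of DRAM commands it would issue. The instruction set, the encoding and the LOOP semantics used here (jump back `jump` instructions, `count` times) are assumptions for illustration only and are not taken from the Payload Executor's actual definition; `Op`, `Instr` and `expand` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical instruction set: a few DRAM commands plus LOOP.
enum class Op { ACT, PRE, REF, LOOP };

struct Instr {
    Op op;
    uint32_t arg0;  // ACT: row address; LOOP: repeat count
    uint32_t arg1;  // LOOP: how many instructions to jump back
};

// Expand a program into the flat list of DRAM commands it would issue.
// Assumed LOOP semantics: jump back `arg1` instructions, `arg0` times, so the
// loop body runs arg0 + 1 times in total.
inline std::vector<Instr> expand(const std::vector<Instr>& program) {
    std::vector<Instr> issued;
    std::vector<uint32_t> remaining(program.size(), 0);
    std::vector<bool> armed(program.size(), false);
    std::size_t pc = 0;
    while (pc < program.size()) {
        const Instr& in = program[pc];
        if (in.op != Op::LOOP) {
            issued.push_back(in);
            ++pc;
            continue;
        }
        if (!armed[pc]) {  // first time this LOOP is reached
            armed[pc] = true;
            remaining[pc] = in.arg0;
        }
        if (remaining[pc] > 0) {
            --remaining[pc];
            assert(in.arg1 <= pc);  // must not jump before the program start
            pc -= in.arg1;
        } else {
            armed[pc] = false;  // loop finished; fall through
            ++pc;
        }
    }
    return issued;
}
```

For example, a classic double-sided hammering pattern could be written as `ACT row 1, PRE, ACT row 3, PRE, LOOP count=2 jump=4`, which under the assumed semantics expands to twelve commands alternating between the two aggressor rows.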

In our platform, we use LiteDRAM which is an open-source controller that we have been using in multiple different projects. It is part of the wider LiteX ecosystem, which is also a very popular choice for many of our FPGA projects. The controller supports different memory types (SDR, DDR, DDR2, DDR3, DDR4, …), as well as many FPGA platforms (Lattice ECP5, Xilinx Series 6, 7, UltraScale, UltraScale+, …). Since it is an open source FPGA IP core, we have complete control over its internals. That means two things: firstly, we were able to easily integrate it with the rest of our system and contribute back to improve LiteDRAM itself. Secondly, and perhaps even more importantly, groups focused on researching new memory attacking methods can modify the controller in order to expose existing vulnerabilities. The results of such experiments should essentially motivate vendors to work on mitigating the uncovered flaws, rather than rely on the “security by obscurity” based approach.

Our Rowhammer Tester is fully open source. We provide an extensive set of Python scripts for controlling the board, performing Rowhammer attacks and harvesting the results. For more complex testing you can use the so-called Playbook, which is a framework that allows you to describe complex testing scenarios using JSON files, providing some predefined attack configurations.

Antmicro is actively collaborating with Google and memory makers to help study the Rowhammer vulnerability, contributing to standardization efforts under the JEDEC initiative. The platform has already been used with great success in state-of-the-art Rowhammer research (such as the discovery of a new type of Rowhammer attack called Half-Double, as mentioned previously).

New DRAM PHYs

Initially our Rowhammer Tester targeted two easily available and price-optimized boards: Digilent Arty (DDR3, Xilinx Series 7 FPGA) and Xilinx ZCU104 (DDR4, Xilinx UltraScale+ FPGA). They were a good starting point, as DDR3 and DDR4 PHYs for these boards were already supported by LiteDRAM. After the initial version of the Rowhammer Tester was ready and tested on these boards, proving the validity of the concept, the next step was to cover more memory types, some of which find their way into many devices that we interact with daily. A natural target was LPDDR4 DRAM, a relatively new type of memory designed for low-power operation with throughputs up to 3200 MT/s. To this end, we designed our dedicated LPDDR4 Test Board, which has already been covered in a previous blog note.

The design is quite interesting because we decided to put the LPDDR4 memory chips on a module, which is against the usual practice of putting LPDDR4 directly on the PCB, as close as possible to the CPU/FPGA to minimize trace impedance. The reason was trivial—we needed the platform to be able to test many memory types interchangeably without having to desolder and resolder parts, using complicated interposers or other niche techniques—the platform is supposed to be open and approachable to all.

Alongside the hardware platform we had to develop a new LPDDR4 PHY IP as LiteDRAM didn’t have support for LPDDR4 at that time, resolving problems related to the differences between LPDDR4 and previously supported DRAM types, such as new training modes. After a phase of verification and testing on our hardware, the newly implemented PHY has been contributed back to LiteDRAM.

What’s next?

The project does not stop there; we are already working on an LPDDR5 PHY for next-gen low power memory support. This latest low-power memory standard published by JEDEC poses some new and interesting challenges, including a new clocking architecture and operation at an even lower voltage. As a bleeding-edge technology, LPDDR5 chips are hardly available on the market today, but we are continuing our work to prepare LPDDR5 support for our future hardware platform in simulation, using custom and vendor-provided simulation models.

The fact that our platform has already been successfully used to demonstrate new types of Rowhammer attacks proves that open source test platforms can make a difference, and we are pleased to see a growing collaborative ecosystem around the project in a joint effort to ensure that we find robust and transparent mitigation techniques for all variants of Rowhammer for the foreseeable future.

Ultimately, our work with the Rowhammer Tester platform shows that by using open source, vendor-neutral IP, tools and hardware, we can create better platforms for more effective research and product development. In the future, building on the success of the FPGA version, our work as part of the CHIPS Alliance will most likely lead to demonstrating the LiteDRAM controller in ASIC form, unlocking even more performance based on the same solid platform.

If you are interested in state of the art, high-speed FPGA I/O and extreme customizability that open source FPGA blocks can offer, get in touch with Antmicro at contact@antmicro.com to hire development services to develop your next product.

Originally posted on the Antmicro blog.

By guest author Michael Gielda, Antmicro

Expanding Google Summer of Code in 2022

Wednesday, November 10, 2021

We are pleased to announce that in 2022 we’re broadening our scope of Google Summer of Code (GSoC) with exciting new updates to the program.

For 17 years, GSoC has focused on bringing new open source contributors into OSS communities big and small. GSoC has brought over 18,000 university students from 112 countries together with over 17,000 mentors from 746 open source organizations.

At its heart, GSoC is a mentorship program where people interested in learning more about open source are welcomed into our open source communities by excited mentors ready to help them learn and grow as developers. The goal is to have these new contributors stay involved in open source communities long after their Google Summer of Code program is over.

Over the course of GSoC’s 17 years, open source has grown and evolved, and we’ve realized that the program needs to evolve as well. With that in mind, we have several major updates to the program coming in 2022, aimed at better meeting the needs of our open source communities and providing more flexibility to both projects and contributors so that people from all walks of life can find, join and contribute to great open source communities.

Expanding eligibility

Beginning in 2022, we are opening the program up to all newcomers to open source who are 18 years and older. The program will no longer be solely focused on university students or recent graduates. We realize there are many folks who could benefit from the GSoC program and who are at various stages of their careers (recent career changers, self-taught developers, those returning to the workforce, etc.), so we wanted to give these folks the opportunity to participate in GSoC.

We expect many students to continue applying to the program (which we encourage!), yet we wanted to provide excited individuals who want to get into open source—but weren’t sure how to get started or whether open source communities would welcome their newbie contributions—with a place to start.

Many people can benefit from mentorship programs like GSoC and we want to welcome more folks into open source.

Multiple Sizes of Projects

This year we introduced the concept of a medium sized project in response to the many distractions folks were dealing with during the pandemic. This adjustment was beneficial for many participants and organizations but we also heard feedback that the larger, more complex projects were a better fit for others. In the spirit of flexibility, we are going to support both medium sized projects (~175 hours) and large projects (~350 hours) in 2022.

One of our goals is to find ways to get more people from different backgrounds into open source which means meeting people where they are at and understanding that not everyone can devote an entire summer to coding.

Increased Flexibility of Timing for Projects

For 2022, we are allowing for considerable flexibility in the timing for the program. You can spread the project out over a longer period of time and you can even switch to a longer timeframe mid-program if life happens. Rather than a mandatory 12-week program that runs from June – August with everyone required to finish their projects by the end of the 12th week, we are opening it up so mentors and their GSoC Contributors can decide together if they want to extend the deadline for the project up to 22 weeks.

Interested in Applying to GSoC?

We will announce the GSoC 2022 program timeline soon.

Open Source Organizations

Does your open source project want to learn more about how to apply to be a mentoring organization? This is a mentorship program focused on welcoming new contributors into your community and helping them learn best practices that will help them be long term OSS contributors. A key factor is having plenty of mentors excited about teaching newcomers about open source.

Read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community, how to create appropriate project ideas (175 hour and 350 hour projects), and tips for preparing your application.

Want to be a GSoC Contributor?

Are you a potential GSoC Contributor interested in learning how to prepare for the 2022 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. Read through the student/contributor guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in 2022. You can also get inspired by checking out the 199 organizations that participated in Google Summer of Code 2021, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

Please spread the word to your friends as we hope these updates to the program will help more excited folks apply to be GSoC Contributors and mentoring organizations in GSoC 2022!

By Stephanie Taylor, Program Manager, Google Open Source

qsim integrates with NVIDIA cuQuantum SDK to accelerate quantum circuit simulations on NVIDIA GPUs

Tuesday, November 9, 2021

To make quantum computers useful, we need algorithms which cleverly use the unique properties of quantum hardware to solve problems intractable on classical computers. High performance quantum circuit simulation helps researchers and developers around the world build and test novel quantum algorithms. We recently launched major new features in qsim, Google Quantum AI’s open source quantum circuit simulator, that make it more performant, intuitive, and realistic.

Today, we are excited to announce an integration between qsim and the NVIDIA cuQuantum SDK. This integration will enable qsim users to make the most of GPUs when developing quantum algorithms and applications. NVIDIA CEO Jensen Huang announced the integration between Google Quantum AI's open source software stack and NVIDIA's cuQuantum SDK in the NVIDIA GTC conference keynote this morning.

To use the cuQuantum SDK with qsim, users can follow the usual workflow for simulating quantum circuits on a virtual machine, enabling cuQuantum in their simulation command.

Moving image depicting the steps to create a quantum circuit, set up Google Compute Engine, and simulate the circuit with qsim (cuQuantum SDK enabled) on GCP

Get started with quantum algorithm development using qsim+cuQuantum and the rest of the Google Quantum AI open source stack here.

By Sergei Isakov, Catherine Vollgraff Heidweiller (Google Quantum AI)

Upgrading qsim, Google Quantum AI's Open Source Quantum Simulator

Friday, November 5, 2021

Quantum computing represents a fundamental shift in computation and gives us the opportunity to make important classically intractable problems solvable. To realize the full potential of quantum computing, we first need to build an error-corrected quantum computer. While we are actively working on our hardware roadmap, today’s quantum hardware is expected to remain a scarce resource. This makes software for simulating quantum circuits and emulating quantum hardware a critical enabler for quantum algorithm development.

To help researchers and developers around the world develop quantum algorithms right now, we are making qsim, our open source quantum circuit simulator, more performant and intuitive, and more “hardware-like”. Our recently published white paper provides a description of the theory and software optimizations that underpin qsim.

Launched in 2020, qsim allows quantum algorithm researchers to simulate quantum circuits developed with our algorithm libraries such as TensorFlow Quantum and OpenFermion, and our quantum programming framework Cirq.

With the upgraded version of qsim, users can back their simulations with high performance processors such as GPUs and ultramem CPUs via Google Cloud, and distribute simulations over multiple compute nodes.

Moving image of steps to create, setup, and simulate circuit with qsim on GCP

The enriched noisy simulation featureset provides researchers with a “hardware-like” simulation experience for developing applications for the Noisy Intermediate-Scale Quantum computers (also referred to as NISQ hardware) that exist today. Trajectory simulation speeds up noisy simulations with an efficient stochastic procedure which replaces noiseless gates with quantum channels. A new software routine [1] for approximating Google Quantum AI NISQ processor specific hardware noise in simulations can be used by any developer to test and iterate on algorithm prototypes containing up to 32 qubits. Simulation with processor-specific approximate NISQ noise is expected to advance NISQ applications research, because it allows researchers to account for the dominant hardware error mechanisms in real NISQ devices when prototyping algorithms, using error measurements performed on Google quantum processors.
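To make the trajectory idea concrete, here is a minimal sketch in plain Python. It does not use qsim or its API. It assumes a single qubit and a bit-flip channel, both chosen only for illustration. Instead of evolving a density matrix, each run samples one Kraus operator and keeps a pure state, and averaging over many runs approximates the noisy result.

```python
import random


def sample_bit_flip(state, p, rng):
    """Apply one randomly chosen Kraus operator of a bit-flip channel.

    The channel has K0 = sqrt(1 - p) * I and K1 = sqrt(p) * X. Both are
    proportional to unitaries, so for a normalized state K1 is drawn with
    probability p and the renormalized result is simply X applied to the state.
    """
    amp0, amp1 = state
    if rng.random() < p:
        return (amp1, amp0)  # X swaps the two amplitudes
    return (amp0, amp1)


def estimate_prob_one(p, trajectories, seed=0):
    """Average P(measure 1) over many pure-state trajectories starting in |0>."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trajectories):
        state = sample_bit_flip((1 + 0j, 0j), p, rng)
        total += abs(state[1]) ** 2
    return total / trajectories


# With a 10% bit-flip probability the estimate should land close to 0.1.
print(estimate_prob_one(p=0.1, trajectories=20000))
```

Each trajectory stores only a state vector, which is why this approach can be cheaper than tracking a full density matrix.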

Get started with quantum algorithm development using the Google Quantum AI open source stack here.

By Sergei Isakov, Dvir Kafri, Orion Martin and Catherine Vollgraff Heidweiller – Google Quantum AI

Notes

[1] Public code release for this feature is coming up.

Efficient emulation of quantum circuits for chemistry

Thursday, November 4, 2021

The Google Quantum AI team spends a lot of time developing simulators for quantum hardware and quantum circuits. These simulators are invaluable tools for our entire quantum computing stack. We use them for everything having to do with building useful quantum computers, from algorithm design to defining the boundaries of beyond classical computation. But there are limits to what we can simulate efficiently. In general, simulating quantum systems with a classical computer incurs exponential cost in CPU time or memory space. To fight the exponential growth, we develop more efficient algorithms that extend our ability to simulate large quantum systems.

In this post we describe joint work with our collaborators at QSimulate to develop the Fermionic Quantum Emulator, a fast emulator of quantum circuits that simulates the behavior of electrons. Efficiency gains in this particular area are exciting because simulating fermions is known to be challenging and is an area where we suspect there is a substantial quantum advantage. The Fermionic Quantum Emulator (FQE) adds to our portfolio of simulators for quantum hardware, quantum error correction, and quantum circuits and provides the capability to simulate even large quantum systems in an effort to evaluate beyond classical physics simulations. The emulator slots into our set of open source software tools for quantum information and is completely interoperable with OpenFermion. The code is Python-based, with C extensions for computational hotspots, letting us gain performance without sacrificing readability or ease of distribution. For complete details, see the open access publication or check out the code!

Prelude: The cost of quantum simulation

To store the vector of numbers at double precision representing the wavefunction of 26 qubits requires approximately 1 Gigabyte memory. By the time we add 10 more qubits to our calculations we need a Terabyte of memory. In terms of simulations of molecules or materials, 36 qubits is quite small–translating to a modest accuracy simulation of the electrons in diatomic or small molecules. Simulations of electrons in more complicated molecules can require hundreds of qubits to accurately represent nature and provide the scientific insight chemists need to reason about reaction mechanisms. What this means for quantum algorithm developers who focus on physics simulation is that even testing algorithms at a small scale prior to running them on a quantum device can be a serious computational undertaking.
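The figures above follow from simple arithmetic, sketched below. The only assumption is that each amplitude is a double precision complex number, which takes 16 bytes; the function name is ours.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: one complex number at double precision


def state_vector_bytes(num_qubits):
    """Memory needed to store all 2**n amplitudes of an n-qubit wavefunction."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE


for n in (26, 36):
    gib = state_vector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.0f} GiB")
```

Every additional qubit doubles the requirement, so ten more qubits multiply it by 1024.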

We can temper the exponential scaling of direct simulation by exploiting physical symmetries when we know our circuits are simulating a particular type of quantum particle. If we allow ourselves to move away from direct simulation of the quantum device and relax our simulation requirements to emulation we get a lot more wiggle room in terms of algorithm design. The Fermionic Quantum Emulator takes this approach to circuit simulation and achieves substantial computational advantage over generic quantum circuit simulators that do not make physical symmetry considerations.

The Fermionic Quantum Emulator:

The Fermionic Quantum Emulator (FQE) is a state vector simulator that takes advantage of common symmetries of fermionic spin-1/2 systems which can drastically reduce the memory required to represent a quantum state in memory. Using a symmetry-reduced wavefunction representation allows us to make further algorithmic improvements when simulating arbitrary fermionic generators (many-body quantum gates) and standard non-relativistic chemical Hamiltonians. These improvements are obtained by adapting algorithms developed in the theoretical chemistry community for exact fermionic simulations.

For a system where the number of electrons is equal to half the total qubits required to simulate the system, we see modest improvements and can store a 36 qubit wavefunction with 37 Gigabytes of memory, a substantial saving given that the qubit wavefunction without symmetries requires 1000 Gigabytes. For lower filling systems we see even more substantial advantages. To emphasize this point we plot the ratio of the sizes required to represent wavefunctions in the symmetry-reduced representation used by FQE and the more expensive qubit representation.

Graph showing the ratio of requirements for circuit simulation when considering physical symmetries versus direct simulation of qubits

The ratio of memory requirements for circuit simulation when considering physical symmetries versus direct simulation of qubits. Chemistry and materials Hamiltonians commute with the number and spin operators so we can focus on simulating the wavefunction in only the relevant symmetry sectors. As shown by the yellow line (quarter filling) this can lead to substantial savings. For a quarter filling wavefunction on 48 qubits (24 orbitals) we would need 290 GB and a qubit simulation would require 4.5 Petabytes.
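The quoted sizes can be reproduced with a short counting sketch. It assumes that the symmetry-reduced wavefunction holds one 16-byte amplitude per way of placing the spin-up and spin-down electrons among the spatial orbitals, and that "quarter filling" on 24 orbitals means 6 spin-up and 6 spin-down electrons. These are our reading of the text, not a description of FQE's internal storage format.

```python
from math import comb

BYTES_PER_AMPLITUDE = 16  # assumption: double precision complex amplitudes


def symmetry_reduced_bytes(num_orbitals, n_up, n_down):
    """Amplitudes in one fixed particle-number and spin sector."""
    return comb(num_orbitals, n_up) * comb(num_orbitals, n_down) * BYTES_PER_AMPLITUDE


def qubit_bytes(num_orbitals):
    """Full qubit state vector: two qubits (spin up and down) per orbital."""
    return 2 ** (2 * num_orbitals) * BYTES_PER_AMPLITUDE


# Half filling on 36 qubits (18 orbitals, 9 up and 9 down electrons).
print(symmetry_reduced_bytes(18, 9, 9) / 1e9, "GB")
# Quarter filling on 48 qubits (24 orbitals, 6 up and 6 down electrons).
print(symmetry_reduced_bytes(24, 6, 6) / 1e9, "GB")
print(qubit_bytes(24) / 1e15, "PB")
```

Under these assumptions the half filling case comes to just under 38 GB, and the quarter filling case to about 290 GB against roughly 4.5 PB for the qubit representation.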

Emulating quantum circuits pertaining to fermionic dynamics allows us to be more clever with our simulation algorithms instead of a direct simulation of the quantum gates. For example, instead of translating fermionic generators to qubit generators using standard techniques, the FQE provides routines for directly evolving common circuit primitives and arbitrary fermionic generators. The structure of common algorithmic primitives, such as basis rotations or diagonal Coulomb operators, allows us to write very fast routines for evolving symmetry-reduced wavefunctions. Below we plot the performance of one of these primitives (basis rotations) and compare the performance of FQE with a highly optimized qsim program written in C++. In both the single threaded and multithreaded regimes the FQE outperforms qsim by at least an order of magnitude, clearly demonstrating that physics and symmetry considerations coupled with algorithmic improvements can have substantial computational advantages.

CPU wall clock time for the basis rotation C-kernel primitive versus an equivalent circuit evolution time with qsim.

The code, getting involved, and future directions:

The FQE is a Python package with easily extensible modules for maximum flexibility. We have accelerated some computational hotspots with C-kernels and provide users the ability to easily toggle between the pure Python and C-accelerated modes. The C-kernels are further accelerated with OpenMP. Installation uses the standard Python C-extension workflow and does not require specific linking to BLAS routines. Pointers to these routines are obtained from NumPy and SciPy, lowering the burden on programmers and computational physicists using the software.

The FQE is completely interoperable with OpenFermion but can also be used as a standalone fermionic simulator. We have built in functionality that generates many standard sets of observables for fermionic simulation (arbitrary expectation values and reduced density matrices). The FQE also has custom routines for standard quantum chemistry Hamiltonians that further exploit the symmetry in a spin-restricted setting for more computational efficiency.

The FQE is, and will remain, Apache 2 open source. We welcome pull requests, issues, or general commentary on how we can improve the emulator. Our hope is that you can use the FQE to accelerate explorations of quantum algorithms, as a general purpose tool for simulating time-dynamics of fermions, and as the start of a family of methods for emulating fermionic circuits. To get started check out this tutorial and this example on simulating spin-charge separation in the Fermi Hubbard model.

We depend on simulations to understand which problems quantum computers may be able to solve, and when. The FQE enables us to stretch further into the realm of difficult simulations of fermionic systems and increase our research velocity. Looking forward, the FQE is the start of a general framework for extending various qubit simulation techniques to the fermionic setting which will help us define the frontier of quantum computational relevancy and quantum simulation advances.

By Nicholas Rubin - Senior Research Scientist

Announcing Knative 1.0!

Tuesday, November 2, 2021

Today, the Knative project released version 1.0, reaching an important milestone that was possible thanks to the contributions and collaboration of over 600 developers. Over the last three years, Knative became the most widely-installed serverless layer on Kubernetes.

The Knative project was released by Google in July 2018, with the vision to systemize best practices in cloud native application development, with a focus on three areas: building containers, serving and scaling workloads, and eventing. It delivers an essential set of components to build and run serverless applications on Kubernetes, allowing webhooks and services to scale automatically, even down to zero. Open-sourcing this technology provided the industry with essential base primitives that are shared by all. Knative was developed in close collaboration with IBM, Red Hat, SAP, VMware, and over 50 different companies. Google offers Cloud Run for Anthos for managed Knative serving that will be Knative 1.0 conformant.

The road to 1.0

Autoscaling (including scaling to zero), revision tracking, and abstractions for developers were some of the early goals of Knative. In addition to delivering on those goals, the project also incorporated support for multiple HTTP routing layers, support for multiple storage layers for Eventing concepts with common Subscription methods, and designed a “Duck types” abstraction to allow processing arbitrary Kubernetes resources that have common fields, to name a few changes.

Knative is now available at 1.0, and while the API is closed for changes, its definition is publicly available so anyone can demonstrate Knative conformance. This stable API allows customers and vendors to support portability of applications, and establishes a new cloud native developer architecture.

Get started with Knative 1.0

Install Knative 1.0 using the documentation on the website. Learn more about the 1.0 release on the Knative blog, and at the Knative community meetup on November 17, 2021, where you'll hear about the latest changes coming with Knative 1.0 from maintainer Ville Aikas. Join the Knative Slack space to ask questions and troubleshoot issues as you get acquainted with the project.

By María Cruz, Program Manager – Google Open Source

Server-side Apply in Kubernetes

Friday, October 29, 2021

What is Server-side Apply?

One of the highest velocity OSS projects of all time, Kubernetes is a cornerstone of Google’s cloud strategy. By providing an abstraction layer between users’ workloads and the underlying infrastructure, Kubernetes enables managing containerized workloads and services across--and migration from--both public cloud competitors and on-premise data centers.

In the Kubernetes Kernel team, part of Config & Policy Automation (CPA) [1], we aim to improve API expressiveness in Kubernetes so that more powerful controllers, tools, and UIs can be built using these APIs. Expressive APIs and better controllers, tools, and UIs are important to Google because they enable the ecosystem and make it stickier. They also make it possible to build more reliable, simpler systems with better user experiences.

Bringing Server-side Apply to Kubernetes is one of the efforts led by Google to reduce fragmentation in clients, improve automation, and set Kubernetes up for ongoing success. Server-side Apply helps users and controllers manage their resources through declarative configurations. Clients can create and modify their objects declaratively by sending their fully specified intent. Server-side Apply replaces the client side apply feature implemented by “kubectl apply” with a Server-side implementation, permitting use by tools/clients other than kubectl (e.g. kpt). Server-side Apply is a new merging algorithm, as well as tracking of field ownership, running on the Kubernetes api-server. It enables new features like conflict detection, so the system knows when two actors are trying to edit the same field.
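The idea of field ownership and conflict detection can be illustrated with a toy model. The sketch below is not the api-server's merge algorithm and does not call any Kubernetes API. The flat `path -> value` representation, the `apply` function, and the manager names are all hypothetical and exist only for illustration.

```python
class ApplyConflict(Exception):
    """Raised when an apply would change a field owned by another manager."""


def apply(fields, owners, manager, intent, force=False):
    """Merge `intent` (path -> value) into `fields`, tracking owners per path.

    `fields` maps path -> value and `owners` maps path -> set of manager names.
    Both dictionaries are updated in place.
    """
    conflicts = sorted(
        path
        for path, value in intent.items()
        if path in fields
        and fields[path] != value
        and owners.get(path, set()) - {manager}
    )
    if conflicts and not force:
        raise ApplyConflict(f"{manager} conflicts on: {', '.join(conflicts)}")
    for path, value in intent.items():
        if path in conflicts:
            owners[path] = {manager}  # forced: take over ownership
        else:
            owners.setdefault(path, set()).add(manager)
        fields[path] = value


fields, owners = {}, {}
apply(fields, owners, "deployer", {"spec.replicas": 3, "spec.image": "app:v1"})
try:
    apply(fields, owners, "autoscaler", {"spec.replicas": 5})
except ApplyConflict as err:
    print(err)  # a second actor tried to change a field it does not own
apply(fields, owners, "autoscaler", {"spec.replicas": 5}, force=True)
print(fields["spec.replicas"], owners["spec.replicas"])
```

In this model, two managers that send the same value for a field simply share ownership of it. A conflict arises only when a manager tries to change a value that someone else owns.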

Server-side Apply Functionality

Since the Beta 2 release, subresources support has been added. Both client-go and Kubebuilder have added comprehensive support for Server-side Apply. This completes the Server-side Apply functionality required to make controller development practical.

Support for subresources

Server-side Apply now fully supports subresources like status and scale. This is particularly important for controllers, which are often responsible for writing to subresources.

Support in client-go

Previously, Server-side Apply could only be called from the client-go typed client using the Patch function, with PatchType set to ApplyPatchType. Now, Apply functions are included in the client to allow for a more direct and typesafe way of calling Server-side Apply. Each Apply function takes an "apply configuration" type as an argument, which is a structured representation of an Apply request.

Using Server-side Apply in a controller

You can use the new support for Server-side Apply no matter how you implemented your controller. However, the new client-go support makes it easier to use Server-side Apply in controllers.

When authoring new controllers to use Server-side Apply, a good approach is to have the controller recreate the apply configuration for an object each time it reconciles that object. This ensures that the controller fully reconciles all the fields that it is responsible for. Controllers typically should unconditionally set all the fields they own by setting Force: true in the ApplyOptions. Controllers must also provide a FieldManager name that is unique to the reconciliation loop that apply is called from.

When upgrading existing controllers to use Server-side Apply the same approach often works well--migrate the controllers to recreate the apply configuration each time they reconcile any object. Unfortunately, the controller might have multiple code paths that update different parts of an object depending on various conditions. Migrating a controller like this to Server-side Apply can be risky because if the controller forgets to include any fields in an apply configuration that were included in a previous apply request, a field can be accidentally deleted. To ease this type of migration, client-go apply support provides a way to replace any controller reconciliation code that performs a "read/modify-in-place/update" (or patch) workflow with an "extract/modify-in-place/apply" workflow.
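The risk described above, and the way an extract step avoids it, can be sketched with a toy model. This is not client-go (which is a Go library) and not the real merge logic. It assumes a single owner per field and ignores conflicts, and the `apply` and `extract` helpers and the field names are hypothetical.

```python
def apply(live, owned_by, manager, intent):
    """Toy apply: fields this manager owned but no longer sends are removed."""
    dropped = [p for p, m in owned_by.items() if m == manager and p not in intent]
    for path in dropped:
        del live[path]
        del owned_by[path]
    for path, value in intent.items():
        live[path] = value
        owned_by[path] = manager


def extract(live, owned_by, manager):
    """Return only the fields that `manager` currently owns."""
    return {p: v for p, v in live.items() if owned_by.get(p) == manager}


live, owned_by = {}, {}
apply(live, owned_by, "my-controller", {"spec.replicas": 2, "spec.paused": False})

# Risky migration: a code path that rebuilds only part of the intent.
apply(live, owned_by, "my-controller", {"spec.replicas": 3})
print(live)  # spec.paused is gone because it was omitted

# Safer: extract what we own, modify it in place, then apply the result.
apply(live, owned_by, "my-controller", {"spec.replicas": 3, "spec.paused": False})
intent = extract(live, owned_by, "my-controller")
intent["spec.replicas"] = 4
apply(live, owned_by, "my-controller", intent)
print(live)
```

Because the extracted intent starts from everything the manager already owns, a code path that changes one field cannot silently drop the others.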

Using Server-side Apply in CI/CD

Server-side Apply makes it easier to ensure that clusters can be safely transitioned to the state desired by new code changes as done by CI/CD systems. While CI/CD systems are highly specific to each team, a few general guidelines can help make the most out of this new functionality.

Once a code change results in new Kubernetes configurations (via whatever method the project uses to generate its Kubernetes configurations), the CI system can use server-side diff to present the developer and reviewer with details of what changes are being made as well as detecting any field ownership conflicts.

Developers can then iterate on field ownership conflicts until there are none left (or until the remaining conflicts are known and desired). Final approval can instruct the CD system to perform a Server-side Apply and either force conflicts to apply or instruct the system to block deployment on conflicts in case the cluster being deployed to has been modified in a way that creates new conflicts that the approver was previously unaware of.

Server-side Apply and CustomResourceDefinitions

It is strongly recommended that all Custom Resource Definitions (CRDs) have a schema. CRDs without a schema are treated as unstructured data by Server-side Apply. Keys are treated as fields in a struct and lists are assumed to be atomic. CRDs that specify a schema are able to specify additional annotations in the schema.

Server-side Apply Example

A simple example of an object created by Server-side Apply (SSA) could look like Fig. 1. The object contains a single manager in metadata.managedFields. The manager consists of basic information about the managing entity itself, like operation type, API version, and the fields managed by it. SSA uses a more declarative approach, which tracks a user's field management, rather than a user's last applied state. This means that as a side effect of using SSA, information about which field manager manages each field in an object also becomes available.

Fig 1. Server-side Apply Example
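As an illustration of the shape such an object can take, the sketch below writes a small ConfigMap as a Python dictionary. It follows the general `managedFields` layout (manager, operation, apiVersion, fieldsType, fieldsV1). It is not a reproduction of the post's Fig. 1, and the object name, label, and data values are invented. The `owned_paths` helper is ours and handles only simple `f:` field keys.

```python
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "test-cm",  # illustrative name
        "labels": {"test-label": "test"},
        "managedFields": [
            {
                "manager": "kubectl",
                "operation": "Apply",
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {
                    "f:metadata": {"f:labels": {"f:test-label": {}}},
                    "f:data": {"f:key": {}},
                },
            }
        ],
    },
    "data": {"key": "some value"},
}


def owned_paths(fields_v1, prefix=""):
    """Flatten a fieldsV1 tree of "f:" keys into dotted field paths."""
    paths = []
    for key, child in fields_v1.items():
        name = key[2:] if key.startswith("f:") else key
        path = f"{prefix}.{name}" if prefix else name
        if child:
            paths.extend(owned_paths(child, path))
        else:
            paths.append(path)
    return paths


for entry in config_map["metadata"]["managedFields"]:
    print(entry["manager"], entry["operation"], owned_paths(entry["fieldsV1"]))
```

Reading the entry this way shows which manager is responsible for which fields, which is the side benefit the paragraph above describes.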

Server-side Apply use-cases in Google

Config Connector

Config Connector [3] leverages Server-side Apply to enable users to manage Google Cloud resources by both Config Connector and other configuration tools; e.g., gcloud, Cloud Console, or custom operators. Config Connector controllers use `managedFields` metadata to understand which fields are owned by Config Connector and which fields are managed outside the Kubernetes object [5]. Customers can have the flexibility of managing Google Cloud resources by both Config Connector and external tools; e.g., using a custom autoscaler for Bigtable clusters.

Config Sync

Config Sync [2] lets cluster operators and platform administrators deploy consistent configurations and policies, by continuously reconciling the state of clusters with Kubernetes configs stored in Git repositories. Config Sync leverages SSA to apply the configs to the clusters, and then monitors and remediates configuration drift using SSA.

KPT

KPT [4] is a Git-native, schema-aware, extensible client-side tool for packaging, customizing, validating, and applying Kubernetes resources. KPT live apply leverages SSA to apply Kubernetes Resource Model (KRM) resources. It also uses SSA to preview the changes in KRM resources before applying them to the Kubernetes cluster.

What's Next?

After Server-side Apply, the next focus for the API Expression working-group is around improving the expressiveness and size of the published Kubernetes API schema. To see the full list of items we are working on, please join our working group and refer to the work items document.

How to get involved?

The working-group for apply is wg-api-expression. It is available on Slack at #wg-api-expression and through the mailing list.

References

[1] CPA: Config & Policy Automation: https://cloud.google.com/anthos/config-management
[2] Config Sync: https://cloud.google.com/anthos-config-management/docs/config-sync-overview
[3] Config Connector: https://cloud.google.com/config-connector/docs/overview
[4] KPT: https://opensource.google/projects/kpt
[5] Config Connector externally managed fields: https://cloud.google.com/config-connector/docs/concepts/managing-fields-externally

By Software Engineers Antoine Pelisse, Joe Betz, Zeya Zhang, Janet Kuo, Kevin Delgado, and Sunil Arora, and Engineering Manager Leila Jalali

Four areas of open source contributions from Cloud Databases

Tuesday, October 26, 2021

Open Cloud enables you to develop software faster, innovate more easily, and scale more efficiently—while also reducing technology risk. Google has a long history of leadership in open source, and today, I want to look back at our activities around open source projects, for databases, over the past year.

Give developers the best tools to be efficient

Developers choose to build applications with managed database services on Google Cloud to benefit from velocity, scalability, security, and performance. To enable you to be most efficient and deliver your best possible work, we deliver tools and frameworks that work with your preferred development environments, no matter if you develop in the cloud or on premises. To make local testing, building and continuous integration easier for our cloud-native databases, we released emulators for Cloud Spanner, Firestore, and Cloud Bigtable so that you can test your code wherever you develop it - without the need to create or re-create cloud infrastructure with every test run.

Another area where we are helping developers is with instrumentation of Cloud SQL for easier debugging and performance tuning. With Cloud SQL Insights it is easier than ever to pinpoint underperforming SQL statements. That said, without additional instrumentation, it can be cumbersome to identify the source code or microservice that issued that SQL - let alone tying a SQL statement to a client session and its context. So we released Sqlcommenter as an open source library that will automatically add this instrumentation as SQL comments in queries that are generated by popular ORMs like Hibernate, Django, SQLAlchemy, and others (repo, blog). We didn’t stop there, but merged Sqlcommenter with OpenTelemetry (blog) to add SQL insights from instrumented queries back to OpenTelemetry traces.
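To show what "instrumentation as SQL comments" can look like, here is a minimal hand-rolled sketch. It does not use the Sqlcommenter library or its API. The `add_sql_comment` function and the attribute names are hypothetical, and the format (sorted, URL-encoded `key='value'` pairs in a trailing comment) is our simplified reading of the approach.

```python
from urllib.parse import quote


def add_sql_comment(sql, **attrs):
    """Append application context to a SQL statement as a trailing comment."""
    if not attrs:
        return sql
    pairs = ",".join(
        f"{quote(key, safe='')}='{quote(str(value), safe='')}'"
        for key, value in sorted(attrs.items())
    )
    return f"{sql.rstrip()} /*{pairs}*/"


print(add_sql_comment(
    "SELECT * FROM users",
    framework="django:3.2",
    controller="index",
))
```

Because the context travels inside the statement itself, a slow query seen on the database side can be traced back to the code that issued it. This sketch does not handle statements that end with a semicolon.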

Lastly, we want to broaden access to our differentiated offerings, like Spanner. The recently announced Spanner PostgreSQL interface allows organizations to access Spanner’s industry-leading consistency and availability at scale using tools and skills from the popular PostgreSQL ecosystem. This new way of working with Spanner provides familiarity for developers and portability for administrators. (blog) Learn more in the documentation or sign up for the preview today.

Provide connectivity that is simple and secure

Connecting to APIs and databases from an application running in the cloud should be simple and secure. That’s why we recommend using IAM and Application Default Credentials when authenticating to other services. The Cloud SQL Proxy (repo) has been doing this and also setting up firewalls for you for a while. It works by running a local client either inside your VM or a GKE cluster. This year, we added libraries for Java (repo) and Python (repo) that can provide similar functionality without the overhead of running an extra client such as the proxy.

Cloud Spanner also offers an open source adapter for its new PostgreSQL interface (repo). This local proxy allows tools, starting with psql, to connect to a Spanner database using the PostgreSQL wire protocol.

Manage cloud infrastructure with the tools of your choice

When it comes to provisioning, monitoring, and managing your cloud database services, flexibility and choice are important. We provide you with our cloud console, gcloud cli, and APIs as well as our own Deployment Manager. That said, you may prefer different ways to manage cloud infrastructure - whether through interactive tools or scripts or embedded into CI/CD pipelines that support GitOps or other controls, checks, and balances. Terraform is one of those open tools that is very popular - and we ensure that our cloud databases can be managed from it as documented in this blog about creating Spanner instances with Terraform.

If you manage the majority of your resources with Kubernetes either directly or through package managers like Helm, then our Kubernetes Config Connector (KCC) might be for you. In a nutshell, KCC exposes Google Cloud services such as Cloud SQL, Spanner, and others as Custom Resources in Kubernetes. This allows you to create and reconcile cloud resources outside of Kubernetes just like K8s native objects.

Once you are managing cloud infrastructure with CI/CD, the next step is to extend that same mechanism to manage objects within your databases such as tables, indexes, and views. To that end we have released a Liquibase extension for Cloud Spanner.

Help you to move data with confidence

Cloud journeys often involve moving data either in a lift and shift process or sometimes replatforming to a different database. Whatever your journey, we want to simplify the process and give you the confidence that your migration is successful.

For enterprise users with Oracle databases, we have several open source projects. First, we have the Optimus Prime database assessment tool (repo) that queries your database and collects information about schemas and historic performance to be analyzed for migration complexity and consolidation potential. Our own professional services teams have been using this toolset to plan migrations to Bare Metal Solution for Oracle.

Some Oracle users are looking for opportunities to transform their workloads to fit with their bigger strategy of modernizing applications with Kubernetes. For this group we developed and open sourced the El Carro Kubernetes operator for Oracle. This not only automates database lifecycle tasks for systems running on Kubernetes, but also exposes declarative APIs for these operations.

If your application supports replatforming from Oracle to PostgreSQL, then we have a toolset for schema conversion along with dataflow pipelines that will read the output of a change data capture job and load it into a PostgreSQL database. What a great use-case for Datastream - our new serverless change data capture service.

Another case of heterogeneous database migration is to move MySQL or PostgreSQL databases to Cloud Spanner. HarbourBridge helps with the evaluation and data migration, and our latest contribution was adding support for DynamoDB as a source database. Part of every heterogeneous migration should be to validate that the source and target data are matching - we have released the Data Validation Toolkit for that use-case. DVT can connect to a number of source and target databases and compare the data on each side - giving you the confidence that your migration did not miss or change any records.
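The validation step can be illustrated with a small sketch that compares a row count and a content hash between a source and a target table. It does not use the Data Validation Toolkit or its API. It uses two in-memory SQLite databases as stand-ins, and the `customers` table, the helper names, and the sample rows are invented for the example.

```python
import hashlib
import sqlite3


def make_db(rows):
    """Create an in-memory database holding a small example table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


def table_fingerprint(conn, table, key_column):
    """Return (row count, SHA-256 over all rows in key order).

    The table and column names are interpolated directly, so they must come
    from trusted configuration and never from user input.
    """
    rows = conn.execute(f"SELECT * FROM {table} ORDER BY {key_column}").fetchall()
    digest = hashlib.sha256()
    for row in rows:
        digest.update(repr(row).encode("utf-8"))
    return len(rows), digest.hexdigest()


source = make_db([(1, "Ada"), (2, "Grace")])
target = make_db([(2, "Grace"), (1, "Ada")])  # same data, different insert order
print(table_fingerprint(source, "customers", "id") == table_fingerprint(target, "customers", "id"))
```

A real migration between different database engines would also need to normalize types and encodings before hashing, which this sketch leaves out.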

Conclusion

Whether you are migrating existing databases or you are building your next application in the cloud - we want to make your journey as comfortable and seamless as possible. Open source projects play a big role in meeting you where you are and providing you with the connectivity options, language support, and tools you want for management and migrations.

By Bjoern Rost, Product Manager, Google Cloud Databases

Protect your open source project from supply chain attacks

Tuesday, October 19, 2021

From executive orders to key signing parties, 2021 has been the year of supply chain security. If you’re an open source maintainer, learning about the attack surface of your project and the threat vectors throughout your project’s supply chain can feel overwhelming, maybe even insurmountable. The good news is that 2021 has also been the year of supply chain security solutions. While there’s still plenty of work to be done, and plenty of room for improvement in existing solutions, there are preventative controls you can apply to your project now to harden your supply chain and prevent compromise.

At All Things Open 2021, the audience learned about best practices for supply chain security through a quiz game. This blog post walks through the quiz questions, answers, and options for prevention, and can serve as a beginner's guide for anyone who wants to protect their open source project from supply chain attacks. These recommendations follow the SLSA framework and OpenSSF Scorecards rubric, and many can be implemented automatically by using the Allstar project.

An example of a typical software supply chain and examples of attacks that can occur at every link in the chain.

Q1: What should you do to protect your developer accounts from takeover?

ANSWER: Use multi-factor auth (with a security key if possible)
Use a shared account for core maintainers
Make sure to write all your passwords in rot13
Use an IP allowlist

Why and how: A malicious actor with access to a developer account can pretend to be a known contributor and submit bad code. Encourage contributors to use multi-factor authentication (MFA) not only for platforms where they send commits, but also for accounts associated with contributions, such as email. Where possible, security keys are the recommended form of MFA.

Q2: What should you do to avoid merging malicious commits?

ANSWER: Require all commits to be reviewed by someone who is not the commit author
Auto-run tests on all commits
Scan for the word ‘bitcoin’ in all commits
Only accept commits from contributors who have accounts older than 1 year

Why and how: Self-merging (also known as a unilateral change) introduces two risks: 1) An attacker who has compromised a contributor’s account can inject malicious code directly into the project, or 2) A well-intentioned person can merge a commit that accidentally introduces a security risk. A second set of authenticated eyes can help avoid malicious submissions and accidental weaknesses. Set this up as an automated requirement if possible (such as using GitHub’s Branch Protection settings); tools like Allstar can help enforce this requirement. This corresponds to SLSA level 4.

Q3: How can you protect secrets used by your CI/CD pipeline?

ANSWER: Use a secret manager tool
Appoint a maintainer to control secrets access
Store secrets as environment variables
Store secrets in a separate repo

Why and how: The “defense in depth” security concept is about applying multiple, different layers of defense to protect systems and sensitive data, such as secrets*. A secret manager tool (like Secret Manager for GCP users, HashiCorp Vault, CyberArk Conjur, or Keywhiz) removes the need for hard-coding secrets in source code, provides centralization and audit capabilities, and introduces an authorization layer to prevent leaking secrets.

*When storing sensitive data in a CI system, ensure it is truly for CI/CD purposes, and not data that is better suited for a password or identity manager.
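To make the three properties above concrete (no hard-coded secrets in the calling code, an authorization layer, and an audit trail), here is a toy in-memory sketch. InMemorySecretStore and its methods are hypothetical and do not mirror the API of Secret Manager, Vault, Conjur, or Keywhiz; a real tool would also encrypt values and authenticate callers.

```python
from __future__ import annotations


class InMemorySecretStore:
    """Toy stand-in for a secret manager. Not a real product's API."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}
        self._grants: dict[str, set[str]] = {}
        # Each entry is (caller, secret name, whether access was allowed).
        self.audit_log: list[tuple[str, str, bool]] = []

    def put(self, name: str, value: str, allowed: set[str]) -> None:
        """Store a secret together with the set of callers allowed to read it."""
        self._secrets[name] = value
        self._grants[name] = set(allowed)

    def get(self, name: str, caller: str) -> str:
        """Return the secret if the caller is authorized, logging every attempt."""
        permitted = caller in self._grants.get(name, set())
        self.audit_log.append((caller, name, permitted))
        if not permitted:
            raise PermissionError(f"{caller} may not read {name}")
        return self._secrets[name]


store = InMemorySecretStore()
# Assumption: in a real setup the value is provisioned out of band, not in code.
store.put("deploy-token", "s3cr3t", allowed={"release-job"})

token = store.get("deploy-token", caller="release-job")
try:
    store.get("deploy-token", caller="lint-job")
except PermissionError as err:
    print(err)  # lint-job may not read deploy-token
print(store.audit_log)
```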

Q4: What should you do to protect your CI/CD system from abuse?

ANSWER: Use access controls following the principle of least privilege
Run integration tests on all pull requests/commits
Mark all contributors as “Collaborators” through GitHub roles
Run CI/CD systems locally

Why and how: Defaulting to “the least amount of access necessary” for your project repository protects your CI/CD system from both unintended access and abuse. While running tests is important, running them by default on all commits and pull requests, before they’ve been reviewed, can lead to unintentional or malicious abuse of your CI/CD system’s compute resources.
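One possible way to express such a gate is sketched below. It assumes a policy where CI runs automatically only for known contributors or after a maintainer has approved the run; the PullRequest record and should_run_ci are hypothetical names, not a CI system's API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PullRequest:
    """Hypothetical pull request record used only for this sketch."""
    author: str
    approved_by_maintainer: bool


def should_run_ci(pr: PullRequest, trusted_authors: set[str]) -> bool:
    """Run CI only for trusted authors or once a maintainer has approved the run."""
    return pr.author in trusted_authors or pr.approved_by_maintainer


trusted = {"alice"}
print(should_run_ci(PullRequest("newcomer", False), trusted))  # False
print(should_run_ci(PullRequest("newcomer", True), trusted))   # True
print(should_run_ci(PullRequest("alice", False), trusted))     # True
```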

Q5: What should you do to avoid compromise during build time?

ANSWER: Define build definitions and configurations as code, e.g., build.yaml
Make your builds run as quickly as possible so attackers have no time to compromise your code
Only use LEGO brand components in your build system, accept no substitutes
Delete build logs to avoid leaving clues for attackers

Why and how: Using a build script (a file that defines the build and its steps, like build.yaml) removes the need to run build steps manually, which could introduce an accidental misconfiguration. It also reduces the opportunity for a malicious actor to tamper with the build or insert unreviewed changes. This corresponds to SLSA levels 1-4.
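As a minimal sketch of the idea, the build below is declared as data and executed by a small runner, so nobody types the steps by hand. The step names and commands are placeholders invented for this example; a real project would more likely keep the definition in a file such as build.yaml that its build service reads.

```python
import subprocess
import sys

# The build definition lives in version control and is reviewed like any other code.
# These steps are placeholders that only print a message.
BUILD_STEPS = [
    {"name": "compile", "cmd": [sys.executable, "-c", "print('compiling')"]},
    {"name": "test", "cmd": [sys.executable, "-c", "print('testing')"]},
]


def run_build(steps):
    """Run each declared step in order, stopping at the first failure."""
    completed = []
    for step in steps:
        # check=True raises CalledProcessError if the step exits non-zero.
        subprocess.run(step["cmd"], check=True)
        completed.append(step["name"])
    return completed


finished = run_build(BUILD_STEPS)
print(finished)  # ['compile', 'test']
```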

Q6: How should you evaluate dependencies before use?

ANSWER: Assess risk and transitive changes with tools like Scorecards and deps.dev
Check for a little ‘lock’ icon next to the package url
Only use dependencies that have a minimum of 1,000 GitHub stars
Only use dependencies that have never changed maintainers

Why and how: There isn’t one definitive measure that can tell you a package is “good” or “bad”; each project has its own security profile and risk tolerance. Gathering information about a dependency, and what changes it might introduce transitively, will help you decide if a dependency is “safe” for your project. Tools like Open Source Insights (deps.dev) map direct and transitive dependencies, while Scorecards gives packages a score for multiple risk assessment metrics, including use of security policies, MFA, and branch protection.
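To illustrate what “transitive” means here, the sketch below walks a small, made-up dependency graph and lists the packages a project pulls in indirectly. The package names and the transitive_dependencies helper are hypothetical; this is not how deps.dev or Scorecards work internally.

```python
from __future__ import annotations


def transitive_dependencies(graph: dict[str, list[str]], root: str) -> set[str]:
    """Return every package reachable from root, excluding root itself."""
    seen: set[str] = set()
    stack = list(graph.get(root, []))
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        stack.extend(graph.get(pkg, []))
    seen.discard(root)  # guard against cycles that lead back to the root
    return seen


# Made-up graph: each key depends directly on the packages in its list.
GRAPH = {
    "my-app": ["web-framework", "yaml-parser"],
    "web-framework": ["http-lib", "template-engine"],
    "http-lib": ["tls-lib"],
    "yaml-parser": [],
}

direct = set(GRAPH["my-app"])
everything = transitive_dependencies(GRAPH, "my-app")
print(sorted(everything - direct))  # ['http-lib', 'template-engine', 'tls-lib']
```

Adding two direct dependencies in this example brings in three more packages that also need to be assessed.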

Once you determine what dependencies you’re using, running a vulnerability scanning tool such as Open Source Vulnerabilities regularly will help you stay up to date on the latest releases and patches. Many vulnerability scanning tools can also apply automatic upgrades.

Q7: What should you do to ensure your build is the build you think it is (aka verification)?

ANSWER: Use a build service that can generate authenticated provenance
Check the last commit to be sure it’s from a trusted committer
Use steganography to embed your project logo into the build
Run conformance tests for each release

Why and how: Showing the origin and artifacts of a build (the build’s provenance) demonstrates to the user that the build has not been tampered with, and is the correct build. There are many components to provenance; one method to deliver these components is to use a build service that generates and authenticates the data needed to show provenance. This corresponds to SLSA levels 2-4.
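One piece of a provenance check can be sketched as follows: confirming that an artifact’s SHA-256 digest matches a digest recorded in its provenance. The record layout (a "subject" list with "digest" entries) is loosely modeled on in-toto style statements and the values are invented; verifying that the provenance itself is authentic, for example by checking its signature, is deliberately out of scope.

```python
import hashlib


def matches_provenance(artifact: bytes, provenance: dict) -> bool:
    """Return True if the artifact's SHA-256 digest appears among the provenance subjects."""
    digest = hashlib.sha256(artifact).hexdigest()
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in provenance.get("subject", [])
    )


artifact = b"example build output"
# Hypothetical provenance record. A real one comes from the build service.
provenance = {
    "subject": [
        {"name": "app.tar.gz", "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}
    ],
    "builder": {"id": "hypothetical-builder"},
}

print(matches_provenance(artifact, provenance))             # True
print(matches_provenance(b"tampered output", provenance))   # False
```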

Q8: What should you look for when selecting artifacts from a registry?

ANSWER: That artifacts have been cryptographically and verifiably signed
That artifacts are not cursed (through being stolen from tombs)
Timestamps: only use the most recent artifact created
Official endorsement: look for the logo of a trusted brand or standards body

Why and how: Just as you should generate provenance and sign builds for your projects (SLSA levels 2-4), you should also look for the same verification when using artifacts from others. Logos and other brand-based forms of endorsement can be falsified and are used by typosquatters to fake legitimacy; look for tamper-proof verification like signatures. For example, Sigstore helps OSS projects sign their builds and validate the builds of others.

Improving your project’s security is a continuous journey. Some of these recommendations may not be feasible for your project today, but every step you can take to increase your project’s security is a step in the right direction.

Resources for open source project security:

SLSA: A framework for levels of supply chain security
Scorecards: A measure of how well a project follows security best practices
Allstar: A GitHub app for enforcing security best practices
Open Source Insights: A searchable visualization of open source project dependencies
OSV: A vulnerability database and automation infrastructure for open source

By Anne Bertucio, Google Open Source Programs Office