Google Open Source Blog: June 2022

Posts from June 2022

Establishing new baselines: Identifying open source work in an unstable world

Wednesday, June 22, 2022

For more than 17 years, Google's Open Source Programs Office (OSPO) has brought the best of open source to Google and the best of Google to open source. We sponsor, create, and invest in projects and programs that enable everyone to join and contribute to the global open source ecosystem. Because that's who open source is for—everyone.

In 2021, Google supported open source innovation, security, collaboration, and sustainability through our programs and services with $15 million of funding. This includes $7 million in direct funding to open source communities through Google’s OSPO. We also stepped up our investments in the governing organizations of open source ecosystems. We became the first Visionary Sponsor of the Python Software Foundation, supporting a number of key initiatives including funding for the first CPython Developer in Residence. We sustained our Platinum Membership of the OpenJS Foundation to continue supporting the essential work of the foundation that fosters many critical projects in the Javascript ecosystem.

In late 2021, we announced the release of Knative 1.0 and submitted Knative to become a Cloud Native Computing Foundation incubation project to enable the next phase of community-driven innovation. (Knative’s application was accepted in March 2022 and followed by our submission of Istio to the CNCF). We also collaborated with Antmicro to develop and release the Rowhammer Tester platform, which provides security professionals a flexible platform for experimenting with new types of attacks to find better mitigation techniques.

We sustain and expand the open source ecosystem through programs at scale

In addition to our new investments, we continued to maintain our long-term programs elevating open source contributors, both new and existing.

The Google Open Source Live virtual events program, initiated in 2020, hosted 11 events, expanding to cover more projects and technology areas.

We launched Learn Kubernetes with Google series to provide specific technical content and live panels with experts from across the industry, focusing on how to make Kubernetes work best for you and your environment.

Google Summer of Code, in its 17th year, matched 1,205 students from 67 countries with more than 2,100 mentors from 75 countries in 199 open source organizations to successfully complete this year’s program. At the end of this round, we announced our plans to broaden the scope of the program in 2022, expanding the eligibility of who could participate, offering more options around project contribution, and increased flexibility in the timing of the program.

Season of Docs, in its third year, had 30 open source organizations finish their projects with 93% of organizations and 96% of the technical writers reporting a positive experience with the program.

Through our Open Source Peer Bonus Program, we were able to award peer bonuses to 175 contributors working in 35 countries on 86 unique projects, including contributors to the CHAOSS, CocoaPods, git and Open Civic Data projects.

In 2021, Google's open source programs directly supported more than 120 events to safely bring together more than 88,000 attendees from open source communities. We proudly sponsored All Things Open 2021, which had nearly 5,000 registered participants from 86 countries. We were also a Cornerstone Sponsor for FOSDEM 2021's two-day virtual gathering.

Our contributing population continues to scale with the growth of Alphabet

In 2021, roughly 10% of Alphabet's full-time workforce (FTEs) actively contributed code or code-adjacent work to open source projects. This percentage has remained roughly consistent over the last five years, indicating that our open source contribution has continued to scale with the growth of Alphabet. Note that in 2021, FTEs represented over 95% of our open source contributors, while the remainder includes vendors, independent contractors, temporary staff, and interns that have contributed to open source projects during their tenure at Alphabet.

After 2020’s atypical growth—which was largely due to novel programs such as our open source internships and COVID-19 response work—in 2021, our activity numbers returned to pre-pandemic baselines. Over the last five years, the trajectory of monthly active usage has increased on both GitHub and Git-on-Borg by more than 15% on average each year.

Note that the dip in GitHub data reported in October of 2021 corresponds to an outage on GHarchive resulting in 4+ days of lost data (see figure below). Despite this outage, contribution levels remained fairly consistent month over month: more than 45% of our active contributing population for the year logged an activity on GitHub or Git-on-Borg in an average month.

For more details about the source data and analysis cited in this report, see “About this data.

” below.

The number of Alphabet open source projects remains steady

In 2021, we estimated that more than 2,000 open source projects that originated from Alphabet teams and employees were still active (not archived). To make this estimate, we chose a broad and variable definition of an open source project, including developer tools, utilities, languages, frameworks, libraries, demos, sample code, models, raw data, designs, and more. This estimate also includes personal projects that went through Alphabet's releasing process but not projects that have been moved to or originated under external organizations or foundations.

Project counts should not be confused with repositories; projects can include many repositories. Within Alphabet, we maintain more than 9,500 public repositories on Github and 1,700 public repositories on Git-on-Borg. While these efforts originated at Google and Alphabet, these repositories are open for anyone to use, contribute to, fork or build on through open source licenses. In 2021, more than 500,000 unique GitHub accounts not affiliated with Alphabet employees contributed to Alphabet projects.

The majority of repositories we work on are outside of Alphabet organizations

Open source contributors at Alphabet work on a variety of projects and repositories—not just our own code. In 2021, contributors at Alphabet engaged with more than 70,000 repositories on GitHub (WatchEvents or “stars” were removed from this count to represent active engagement), pushing commits and/or opening pull requests on over 49,000 repositories. Consistent with our 2019 and 2020 reports, more than 75% of repositories with pull requests opened by Alphabet contributors on GitHub were outside of Google-managed organizations.

We continue to invest in open source quality, security, and long term viability

Alphabet continues to rely on the health and availability of open source projects. Through internal efforts and collaboration with industry-led efforts such as OpenSSF, Alphabet is committed to sharing relevant practices and tooling with the goal of improving overall code quality and bolstering the security posture of users and developers of open source software. We hope that by sharing our internal frameworks and best practices, we can spark industry-wide discussion and progress on the security and sustainability of the open source ecosystem.

In 2021, we launched Open Source Insights, a tool designed to list and visualize a project’s dependencies and their properties, helping developers review the packages that make up their software supply chains. The potential of this tool was put to the test after the disclosure of Log4j vulnerabilities in December 2021. The Open Source insights team reported that more than 35,000 Java packages, amounting to over 8% of the Maven Central repository, were impacted by this vulnerability. As part of their investigation, they compiled a list of 500 affected packages with some of the highest transitive usage to guide users and maintainers who were supporting patching and remediation activities.

In 2021, we also announced our sponsorship of the Secure Open Source (SOS) Rewards pilot program run by the Linux Foundation. This program financially rewards developers for enhancing the security of critical open source projects that we all depend on.

These efforts build on our existing open source security work, such as OSS Fuzz, which was used by over 500 critical open source projects in 2021 and has helped find more than 7,000 vulnerabilities to date.

Our open source work will continue to grow and evolve to support the changing needs of our communities. Thank you to our colleagues and community members that continue to dedicate their personal and professional time supporting the open source ecosystem. Follow our work at opensource.google.

By Sophia Vargas – Research Analyst and Amanda Casari, Open Source Researcher – Google Open Source Programs Office

About this data:

This report features metrics provided by many teams and programs across Alphabet. In regards to the data centered on code and code adjacent activities, we wanted to share more details about the derivation of those metrics:

Data source: These data represent activities on repositories hosted on GitHub and our internal production Git service git-on-borg. These sources represent a subset of open source activity currently tracked by our OSPO.

GitHub: We continue to use GitHub Archive as the primary source for GitHub data, which is available as a public dataset on BigQuery. Alphabet activity within GitHub is identified by self-registered accounts, which we estimate underreports actual activity.

git-on-borg: This is our primary platform for internal projects and some of our larger, long running public projects like Android and Chromium. While we continue to develop on this platform, most of our open source activity has moved to GitHub to increase exposure and encourage community growth.

Distinct event types: Note that git-on-borg and GitHub APIs produce distinct sets of events—as such we will report activity metrics per platform. Where GitHub Event logs capture a wide range of activity from code creation and review to issue creation and comments, the Gerrit Event stream (used by git-on-borg) only captures code changes and reviews.

Driven by humans: We have created many automated bots and systems that can propose changes on various hosting platforms. We have intentionally filtered these data to focus on human-initiated activities. For our estimation of bots, we married our own records with a public list maintained by the devstats project

Business and personal: Activity on GitHub reflects a mixture of Alphabet projects, third party projects, experimental efforts, and personal projects. Our metrics report on all of the above unless otherwise specified.

Alphabet contributors: Please note that unless additional detail is specified, activity counts attributed to Alphabet open source contributors will include our full-time employees as well as our extended Alphabet community (temps, vendors, contractors, and interns).

GitHub Accounts: For counts of Github accounts not affiliated with Alphabet, we cannot assume that 1 account translates to 1 person - as multiple accounts could be tied to one individual or bot accounts.

Active counts: Where possible, we will show ‘active users’ defined by logged activity within a specified timeframe (i.e. in month, year, etc) and ‘active repositories’ and ‘active projects’ as those that have not been archived.

Activity types: This year we explore GitHub activity types in more detail. Note that in some cases we have removed “Watch Events” or articulated this as passive engagement. Additionally, GitHub added an event type “PullRequestReviewEvent” that started logging activity in August 2020, but we chose to remove this from our charts and aggregate counts as it invalidates year over year comparisons.

Dart and Flutter enable Allstar and Security Scorecards

Tuesday, June 21, 2022

We are thrilled to announce that Dart and Flutter have enabled Allstar and Security Scorecards on their open source repositories. This achievement marks the first milestone in our journey towards SLSA compliance to secure builds and releases from supply chain attacks.

Allstar is a GitHub app that provides automated continuous enforcement of security checks such as the OpenSSF Security Scorecards. With Allstar, owners can check for security policy adherence, set desired enforcement actions, and continuously implement those enforcement actions when triggered by a setting or file change in the org or repo.

Security Scorecards is an automated tool that assesses several key heuristics ("checks") associated with software security and assigns each check a score of 0-10. These scores can be used to evaluate the security posture of the project and help assess the risks introduced by dependencies.

Scorecards have been enabled on the following open source repositories, prioritized by their criticality score.

Org

Repos

Flutter

github.com/flutter/flutter

github.com/flutter/engine

github.com/flutter/plugins

github.com/flutter/packages

github.com/flutter/samples

github.com/flutter/website

github.com/flutter/flutter-intellij

github.com/flutter/gallery

github.com/flutter/codelabs

Dart

github.com/dart-lang/linter

github.com/dart-lang/sdk

github.com/dart-lang/dartdoc

github.com/dart-lang/site-www

github.com/dart-lang/test

With these security scanning tools, the Dart and Flutter team have found and resolved more than 200 high and medium security findings. The issues can be classified in the following categories:

Pinned Dependencies: The project should pin its dependencies. A "pinned dependency" is a dependency that is explicitly set to a specific hash instead of allowing a mutable version or range of versions. This reduces several security risks related to dependencies.

Token Permissions: The project's automated workflow tokens should be set to read-only by default. This follows the principle of least privilege.

Branch Protection: Github project's default and release branches should be protected with GitHub's branch protection settings. Branch protection allows maintainers to define rules that enforce certain workflows for branches, such as requiring review or passing certain status.

Code Review: The project should enforce a code review before pull requests (merge requests) are merged.

Dependency update tool: A dependency update tool should be used by the project to identify and update outdated and insecure dependencies.

Binary-Artifacts: The project should not have generated executable (binary) artifacts in the source repository. Embedded binary artifacts in the project cannot be reviewed, allowing possible obsolete or maliciously subverted executables in the source code.

Additionally, the Dart and Flutter teams have an aligned vulnerability management process. Details of these processes can be found on our respective developer sites at https://dart.dev/security and https://docs.flutter.dev/security. Internal process used by the team to handle vulnerabilities can be found on Flutter github wiki.

Image source: https://openssf.org/blog/2021/08/11/introducing-the-allstar-github-app/

Learnings and Best Practices

AllStar and Scorecards allowed Dart and Flutter to quickly identify areas of opportunity to improve security across hundreds of repositories triggering the removal of binaries, standardizing branch protection and enforcing code reviews.
Standardizing third-party dependency management and running vulnerability scanning were identified as the next milestones in the Dart and Flutter journey to improve their overall security posture.
It is safer to not embed binary artifacts in your code. However, there are cases when this is unavoidable.
It is important to track your dependencies and to keep them up to date using tools like Dependabot.

By Khyati Mehta, Technical Program Manager – Dart-Flutter

DC-SCM compatible FPGA based open source BMC hardware platform

Friday, June 3, 2022

Open source software is omnipresent in the server and cloud world, and is giving rise to impressive successes in the SaaS space where useful products can be rapidly created from open source components: operating systems, container runtimes, frameworks for device management, monitoring and data pipelining, workload execution, etc.

Mirroring this trend in software, cloud providers and users are increasingly looking at building their servers using open source hardware, collaborating in initiatives such as the Open Compute Project or OpenPOWER.

With Google, Antmicro has developed two open source hardware FPGA-based Baseboard Management Controller (BMC) platforms compliant with OCP’s DC-SCM ver 1.0 specification to help increase the security, configurability and flexibility of server management and monitoring infrastructure. These have since been adopted by OpenPOWER’s LibreBMC workgroup as the base hardware platform.

The DC-SCM spec

The Datacenter-ready Secure Control Module (DC-SCM) specification aims to move common server management, security and control features from a typical motherboard into a module designed in a normalized form factor, which can be used across various datacenter platforms.

Currently rolling out in first DC-SCM compliant servers, the spec will help Cloud providers share costs, risks and increase reuse for the critical BMC component. Coupled with a fully open source implementation based on popular, inexpensive FPGA platforms will not only allow for more configurability and a tighter integration between hardware and software, but also tap into the momentum behind the broader open source hardware community via groups like CHIPS Alliance, OpenPOWER and RISC-V.

The hardware

Antmicro has developed two implementations of the DC-SCM-compatible BMC. Both designs meet the Open Compute Project specification for a Horizontal Form Factor 90x120 mm DC-SCM ver 1.0.

The BMCs role is central to the server’s faultless operation, responsible for monitoring the system while preventing and mitigating failures. Essentially, acting as an external watchdog.

To be able to provide this functionality, according to the requirements, the module offers a feature-packed Secure Control Interface for communication with the host platform, including:

PCI Express
USB
QSPI
SGPIO
NCSI
multiple I2C, I3C and UART channels

The unique property of Antmicro’s implementation of the standard is merging DC-SCM’s central Baseband Management Controller and the usual programmable SCM CPLD block into one powerful FPGA. This solution was chosen to enable a greater flexibility of the design, allowing remote updates of DC-SCM peripherals, and most notably, placing OpenPOWER or RISC-V IP cores as central processing units of the module.

One of our designs is based on Xilinx Artix-7, whereas the other one features a Lattice ECP5 - both low-cost and open source friendly FPGAs supported by the open source F4PGA toolchain project.

The FPGA is complemented by 512MB of DDR3 memory, 16GB of eMMC flash, as well as a dedicated Gigabit-Ethernet interface. To ensure the security of the Datacenter Control Module, external cryptographic modules: Root of Trust (RoT) and Trusted Platform Module (TPM) can be connected. This will allow future integration with e.g. open hardware Root of Trust projects such as OpenTitan and implementing various boot flow and authentication approaches.

Use in LibreBMC / OpenPOWER

Open source, configurable hardware platforms based on FPGA, open tooling and standards can make BMC more flexible for tomorrow's challenges. Antmicro’s DC-SCM boards have been adopted by the LibreBMC Workgroup, operating under the umbrella of the OpenPOWER Foundation, in a push to build a complete, fully transparent BMC solution. The workgroup, with participation from IBM, Google and Antmicro, among others, will be involved with creating FPGA gateware and software needed to make the hardware fully operational in real-world server solutions.

Variants involving both Linux (in its default open source BMC- distribution, OpenBMC), and Zephyr RTOS, as well as with both POWER and RISC-V cores are planned and thanks to the flexibility of FPGA all of those options will be just one gateware update away. Of course, both the gateware and software will be open source as well.

If you’re looking to develop a secure and transparent DC-SCM spec-compatible BMC solution, reach out to contact@antmicro.com. See how you can collaborate with partners such as Antmicro, Google, and IBM around the open source FPGA-based hardware platform.

By Peter Katarzynski – Antmicro

Vectorized and performance-portable Quicksort

Thursday, June 2, 2022

Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures. Below we discuss how we achieved this.

First, some background. There is a recent trend towards columnar databases that consecutively store all values from a particular column, as opposed to storing all fields of a record or "row" before those of the next record. This can be faster to filter or sort, which are key building blocks for SQL queries; thus we focus on this data layout.

Given that sorting has been heavily studied, how can we possibly find a 10x speedup? The answer lies in SIMD/vector instructions. These carry out operations on multiple independent elements in a single instruction—for example, operating on 16 float32 at once when using the AVX-512 instruction set, or four on Arm NEON:

If you are already familiar with SIMD, you may have heard of it being used in supercomputers, linear algebra for machine learning applications, video processing, or image codecs such as JPEG XL. But if SIMD operations only involve independent elements, how can we sort them, which involves re-arranging adjacent array elements?

Imagine we have some special way to sort, for instance 256 element arrays. Then, the Quicksort algorithm for sorting a larger array consists of partitioning it into two sub-arrays: those less than a "pivot" value (ideally the median), and all others; then recursing until a sub-array is at most 256 elements large, and using our special method for sorting those. Partitioning accounts for most of the CPU time, so if we can speed it up using SIMD, we have a fast sort.

Happily, modern instruction sets (Arm SVE, RISC-V V, x86 AVX-512) include a special instruction suitable for partitioning. Given a separate input of yes/no values (whether an element is less than the pivot), this "compress-store" instruction stores to consecutive memory only the elements whose corresponding input is "yes". We can then logically negate the yes/no values and apply the instruction again to write the elements to the other partition. This strategy has been used in an AVX-512-specific Quicksort. But what about other instruction sets such as AVX2 that don't have compress-store? Previous work has shown how to emulate this instruction using permute instructions.

We build on these techniques to achieve the first vectorized Quicksort that is portable to six instruction sets across three architectures, and in fact outperforms prior architecture-specific sorts. Our implementation uses Highway's portable SIMD functions, so we do not have to re-implement about 3,000 lines of C++ for each platform. Highway uses compress-store when available and otherwise the equivalent permute instructions. In contrast to the previous state of the art—which was also specific to 32-bit integers—we support a full range of 16-128 bit inputs.

Despite our single portable implementation, we reach record-setting speeds on both AVX2, AVX-512 (Intel Skylake) and Arm NEON (Apple M1). For one million 32/64/128-bit numbers, our code running on Apple M1 can produce sorted output at rates of 499/471/466 MB/s. On a 3 GHz Skylake with AVX-512, the speeds are 1123/1119/1120 MB/s. Interestingly, AVX-512 is 1.4-1.6 times as fast as AVX2 - a worthwhile speedup for zero additional effort (Highway checks what instructions are available on the CPU and uses the best available ones). When running on AVX2, we measure 798 MB/s, whereas the prior state of the art optimized for AVX2 only manages 699 MB/s. By comparison, the standard library reaches 58/128/117 MB/s on the same CPU, so we have managed a 9-19x speedup depending on the type of numbers.

Previously, sorting has been considered expensive. We are interested to see what new applications and capabilities will be unlocked by being able to sort at 1 GB/s on a single CPU core. The Apache2-licensed source code is available on Github (feel free to open an issue if you have any questions or comments) and our paper offers a detailed explanation and evaluation of the implementation (including the special case for 256 elements).

By Jan Wassenberg – Brain Computer Architecture Research

Build Open Silicon with Google

Wednesday, June 1, 2022

TLDR; the Google Hardware Toolchains team is launching a new developer portal, developers.google.com/silicon, to help the developer community get started with its Open MPW shuttle program. This will allow anyone to submit open source integrated circuit designs to get manufactured at no-cost.

Since November 2020, when Skywater Technologies announced their partnership with Google to open source their Process Design Kit for the SKY130 process node, the Hardware Toolchains team here at Google has been on a journey to make building open silicon accessible to all developers. Having access to an open source and manufacturable PDK changes the status-quo in the custom silicon design industry and academia:

Designers are now free to start their projects liberated from NDAs and usage restrictions
Researchers are able to make their research reproducible by their fellow peers
Open source EDA tools can integrate deeply with the manufacturing process

Together we've built a community of more than 3,000 members, where hardware designers and software developers alike, can all contribute in their own way to advance the state of the art of open silicon design.

Between the opposite trends of the Moore law coming to an end and the exponential growth of connected devices (IoT), there is a real need to find more sustainable ways to scale computing. We need to go beyond cramming more transistors into smaller areas and toward more efficient dedicated hardware accelerators. Given the recent global chip supply chain struggles, and the lead time for popular ICs sometimes going over a year, we need to do this by leveraging more of the existing global foundry capacity that provides access to older and proven process node technologies.

Mature process nodes like SKY130 (a 130nm technology) offer a great way to prototype IoT applications that often need to balance cost and power with performance and leverage a mix of analog blocks and digital logic in their designs. They offer a faster turnaround rate than bleeding-edge process nodes for a fraction of the price; reducing the temporal and financial cost of making the right mistakes necessary to converge toward the optimal design.

By combining open access to PDKs, and recent advancements in the development of open source ASIC toolchains like OpenROAD, OpenLane, and higher level synthesis toolchain like XLS, we are getting us one step closer to bringing software-like development methodology and fast iteration cycles to the silicon design world.

Free and open source licensing, community collaboration, and fast iteration transformed the way we all develop software. We believe we are at the edge of a similar revolution for custom accelerator development, where hardware designers compete by building on each other's works rather than reinventing the wheel.

Towards this goal, we've been sponsoring a series of Open MPW shuttles on the Efabless platform, allowing around 250 open source projects to manufacture their own silicon.

With the last MPW-5 shuttle that closed up in March this year, we've seen a record level of engagement with 75 open silicon projects submitted for inclusion from 19 different countries.

Each project gets a fixed 2.92mm x 3.52mm user area and 38 I/O pins in a predefined harness to harden their design. It’s also provided with the necessary test infrastructure to validate chip specifications and behavior before being submitted for tape out.

We've seen a variety of designs submission to previous editions of the shuttle including:

Small digital, analog and mixed signal designs
Analog, SRAM and ReRAM generators
Dedicated crypto and ML operations accelerators
Fun designs like Sudoku accelerators, physically modeled guitars, hardware versions of Tetris or Wordle
Many System-on-a-chip designs ranging from award-winning RISC-V cores to larger Linux-capable 64-bit SoC
Group submissions like the Zero to ASIC course, often packing as many as 14 subprojects in the shared user area

Our partner, Efabless announced that the next MPW-6 shuttle will accept open source project submissions until Monday, June 8, 2022. We can't wait to see the variety of projects the open silicon community creates, building on top of the corpus of open source designs steadily growing one Open MPW shuttle after the next.

To help you on-board on future shuttles, we created a new developer portal that provides pointers to get started with the various tools of the open silicon ecosystem: so make sure to check out the portal and start your open silicon journey!

By Johan Euphrosine – Google Hardware Toolchains