Google Open Source Blog: December 2020

Posts from December 2020

Using MicroK8s with Anthos Config Management in the world of IoT

Friday, December 11, 2020

When dealing with large scale Kubernetes deployments, managing configuration and policy is often very complicated. We discussed why Kubernetes’ declarative approach to configuration as data has become the most popular choice for most users a few weeks ago. Today, we will discuss bringing this approach to your MicroK8 deployments using Anthos Config Management.

Image of Anthos Config Management + Cloud Source Repositories + MicroK8s

Anthos Config Management helps you easily create declarative security and operational policies and implement them at scale for your Kubernetes deployments across hybrid and multi-cloud environments. At a high level, you represent the desired state of your deployment as code committed to a central Git repository. Anthos Config Management will ensure the desired state is achieved and also maintained across all your registered clusters.

You can use Anthos Config Management for both your Kubernetes Engine (GKE) clusters as well as on Anthos attached clusters. Anthos attached clusters is a deployment option that extends Anthos’ reach into Kubernetes clusters running in other clouds as well as edge devices and the world of IoT, the Internet of Things. In this blog you will learn by experimenting with attached clusters with MicroK8s, a conformant Kubernetes platform popular in IoT and edge environments.

Consider an organization with a large number of distributed manufacturing facilities or laboratories that use MicroK8s to provide services to IoT devices. In such a deployment, Anthos can help you manage remote clusters directly from the Anthos Console rather than investing engineering resources to build out a multitude of custom tools.

Consider the diagram below.

Diagram of Anthos Config Management with MicroK8s on the Factory Floor with IoT

This diagram shows a set of “N” factory locations each with a MicroK8s cluster supporting IoT devices such as lights, sensors, or even machines. You register each of the MicroK8s clusters in an Anthos environ: a logical collection of Kubernetes clusters. When you want to deploy the application code to the MicroK8s clusters, you commit the code to the repository and Anthos Config Management takes care of the deployment across all locations. In this blog we will show you how you can quickly try this out using a MicroK8s test deployment.

We will use the following Google Cloud services:

Compute Engine provides an Ubuntu instance for a single-node MicroK8s cluster. Ubuntu will use cloud-init to install MicroK8s and generate shell scripts and other files to save time.
Cloud Source Repositories will provide the Git-based repository to which we will commit our workload.
Anthos Config Management will perform the deployment from the repository to the MicroK8s cluster.

Let’s start with a picture

Here’s a diagram of how these components fit together.

Diagram of how Anthos Config Management works together with MicroK8s

A workstation instance is created from which Terraform is used to deploy four components: (1) an IAM service account, (2) a Google Compute Engine Instance with MicroK8s using permissions provided by the service account, (3) a Kubernetes configuration repo provided by Cloud Source Repositories, and (4) a public/private key pair.
The GCE instance will use the service account key to register the MicroK8s cluster with an Anthos environ.
The public key from the public/ private key pair will be registered to the repository while the private key will be registered with the MicroK8s cluster.
Anthos Config Management will be configured to point to the repository and branch to poll for updates.
When a Kubernetes YAML document is pushed to the appropriate branch of the repository, Anthos Config Management will use the private key to connect to the repository, detect that a commit has been made against the branch, fetch the files and apply the document to the MicroK8s cluster.

Anthos Config Management enables you to deploy code from a Git repository to Kubernetes clusters that have been registered with Anthos. Google Cloud officially supports GKE, AKS, and EKS clusters, but you can use other conformant clusters such as MicroK8s in accordance with your needs. The repository below shows you how to register a single MicroK8s cluster to receive deployments. You can also scale this to larger numbers of clusters all of which can receive updates from commitments to the repository. If your organization has large numbers of IoT devices supported by Kubernetes clusters you can update all of them from the Anthos console to provide for consistent deployments across the organization regardless of the locations of the clusters, including the IoT edge. If you would like to learn more, you can build this project yourself. Please check out this Git repository and learn firsthand about how Anthos can help you manage Kubernetes deployments in the world of IoT.

By Jeff Levine, Customer Engineer – Google Cloud

Finding Critical Open Source Projects

Thursday, December 10, 2020

Comic graphic of modern digital infrastructure

Open source software (OSS) has long suffered from a "tragedy of the commons" problem. Most organizations, large and small, make use of open source software every day to build modern products, but many OSS projects are struggling for the time, resources and attention they need. This is a resource allocation problem and Google, as part of Open Source Security Foundation (OpenSSF), can help solve it together. We need ways to connect critical open source projects we all rely on, with organizations that can provide them with adequate support.

Criticality of an open source project is difficult to define; what might be a critical dependency for one consumer of open source software may be entirely absent for another. However, arriving at a shared understanding and framework allows us to have productive conversations about our dependencies. Simply put, we define criticality to be the influence and importance of a project.

In order for OpenSSF to fund these critical open source projects, they need to be identified first. For this purpose, we are releasing a new project - “Criticality Score” under the OpenSSF. Criticality score indicates a project’s criticality (a number between 0 and 1) and is derived from various project usage metrics in a fully automated way. Our initial evaluation metrics include a project’s age, number of individual contributors and organizations involved, user involvement (in terms of new issue requests and updates), and a rough estimate of its dependencies using commit mentions. We also provide a way to add your own metric(s). For example, you can add internal project usage data to re-adjust a project's criticality score for individualized prioritization needs.

Identifying these critical projects is only the first step in making security improvements. OpenSSF is also exploring ways to provide maintainers of these projects with the resources they need. If you're a maintainer of a critical software package and are interested in getting help, funding, or infrastructure to run your project, reach out to the OpenSSF’s Securing Critical Projects working group here.

Check out the Criticality Score project on GitHub, including a list of critical open source projects. Though we have made some progress on this problem, we have not solved it and are eager for the community’s help in refining these metrics to identify critical open source projects.

By Abhishek Arya, Kim Lewandowski, Dan Lorenc and Julia Ferraioli – Google Open Source

Expanding Fuchsia's open source model

Tuesday, December 8, 2020

Fuchsia is a long-term project to create a general-purpose, open source operating system, and today we are expanding Fuchsia’s open source model to welcome contributions from the public.

Fuchsia is designed to prioritize security, updatability, and performance, and is currently under active development by the Fuchsia team. We have been developing Fuchsia in the open, in our git repository for the last four years. You can browse the repository history at https://fuchsia.googlesource.com to see how Fuchsia has evolved over time. We are laying this foundation from the kernel up to make it easier to create long-lasting, secure products and experiences.

Starting today, we are expanding Fuchsia's open source model to make it easier for the public to engage with the project. We have created new public mailing lists for project discussions, added a governance model to clarify how strategic decisions are made, and opened up the issue tracker for public contributors to see what’s being worked on. As an open source effort, we welcome high-quality, well-tested contributions from all. There is now a process to become a member to submit patches, or a committer with full write access.

In addition, we are also publishing a technical roadmap for Fuchsia to provide better insights for project direction and priorities. Some of the highlights of the roadmap are working on a driver framework for updating the kernel independently of the drivers, improving file systems for performance, and expanding the input pipeline for accessibility.

Fuchsia is an open source project that is inclusive by design, from the architecture of the platform itself, to the open source community that we’re building. The project is still evolving rapidly, but the underlying principles and values of the system have remained relatively constant throughout the project. More information about the core architectural principles are available in the documentation: secure, updatable, inclusive, and pragmatic.

Fuchsia is not ready for general product development or as a development target, but you can clone, compile, and contribute to it. It has support for a limited set of x64-based hardware, and you can also test it with Fuchsia’s emulator. You can download and build the source code by following the getting started guide.

Fuchsia emulator startup with fx emu

If you would like to learn more about Fuchsia, join our mailing lists and browse the documentation at fuchsia.dev. You can now be part of the project and help build the future of this operating system. We are looking forward to receiving contributions from the community as we grow Fuchsia together.

By Wayne Piekarski, Developer Advocate for Fuchsia

OpenTitan at one year: the open source journey to secure silicon

Monday, December 7, 2020

During the past year, OpenTitan has grown tremendously as an open source project and is on track to provide transparent, trustworthy, and cost-free security to the broader silicon ecosystem. OpenTitan, the industry’s first open source silicon root of trust, has rapidly increased engineering contributions, added critical new partners, selected our first tapeout target, and published a comprehensive logical security model for the OpenTitan silicon, among other accomplishments.

OpenTitan by the Numbers

OpenTitan has doubled many metrics in the year since our public launch: in design size, verification testing, software test suites, documentation, and unique collaborators at least. Crucially, this growth has been both in the design verification collateral required for high volume production-quality silicon, as well as the digital design itself, a first for any open source silicon project.

More than doubled the number of commits at launch: from 2,500 to over 6,100 (across OpenTitan and the Ibex RISC-V core sub-project).
Grew to over 141K lines of code (LOC) of System Verilog digital design and verification.
Added 13 new IP blocks to grow to a total to 29 distinct hardware units.
Implemented 14 Device Interface Functions (DIFs) for a total 15 KLOC of C11 source code and 8 KLOC of test software.
Increased our design verification suite to over 66,000 lines of test code for all IP blocks.
Expanded documentation to over 35,000 lines of Markdown.
Accepted contributions from 52 new unique contributors, bringing our total to 100.
Increased community presence as shown by an aggregate of over 1,200 Github stars between OpenTitan and Ibex.

One year of OpenTitan and Ibex growth on GitHub: the total number of commits grew from 2,500 to over 6,100.

High quality development is one of OpenTitan’s core principles. Besides our many style guides, we require thorough documentation and design verification for each IP block. Each piece of hardware starts with auto-generated documentation to ensure consistency between documentation and design, along with extensive, progressively improving, design verification as it advances through the OpenTitan hardware stages to reach tapeout readiness.

Sources chart for open titan growth in design verification

One year of growth in Design Verification: from 30,000 to over 65,000 lines of testing source code. Each color represents design verification for an individual IP block.

Innovating for Open Silicon Development

Besides writing code, we have made significant advances in developing processes and security framework for high quality, secure open source silicon development. Design success is not just measured by the hardware, highly functional software and a firm contract between the two, with well-defined interfaces and well-understood behavior, play an important role.

OpenTitan’s hardware-software contract is realized by our DIF methodology, yet another way in which we ensure hardware IP quality. DIFs are a form of hardware-software co-design and the basis of our chip-level design verification testing infrastructure. Each OpenTitan IP block requires a style guide-compliant DIF, and this year we implemented 14 DIFs for a total 15 KLOC of C11 source code and 8 KLOC of tests.

We also reached a major milestone by publishing an open Security Model for a silicon root of trust, an industry first. This comprehensive guidance demonstrates how OpenTitan provides the core security properties required of a secure root of trust. It covers provisioning, secure boot, device identity, and attestation, and our ownership transfer mechanism, among other topics.

Expanding the OpenTitan Ecosystem

Besides engineering effort and methodology development, the OpenTitan coalition added two new Steering Committee members in support of lowRISC as an open source not-for-profit organization. Seagate, a leader in storage technology, and Giesecke and Devrient Mobile Security, a major producer of certified secure systems. We also chartered our Technical Committee to steer technical development of the project. Technical Committee members are drawn from across our organizational and individual contributors, approving 9 technical RFCs and adding 11 new project committers this past year.

On the strength of the OpenTitan open source project’s engineering progress, we are excited to announce today that Nuvoton and Google are collaborating on the first discrete OpenTitan silicon product. Much like the Linux kernel is itself not a complete operating system, OpenTitan’s open source design must be instantiated in a larger, complete piece of silicon. We look forward to sharing more on the industry’s first open source root of trust silicon tapeout in the coming months.

Onward to 2021

OpenTitan’s future is bright, and as a project it fully demonstrates the potential for open source design to enable collaboration across disparate, geographically far flung teams and organizations, to enhance security through transparency, and enable innovation in the open. We could not do this without our committed project partners and supporters, to whom we owe all this progress: Giesecke and Devrient Mobile Security, Western Digital, Seagate, the lowRISC CIC, Nuvoton, ETH Zürich, and many independent contributors.

Interested in contributing to the industry's first open source silicon root of trust? Contact us here.

By Dominic Rizzo, OpenTitan Lead – Google Cloud

Announcing the Atheris Python Fuzzer

Friday, December 4, 2020

Fuzz testing is a well-known technique for uncovering programming errors. Many of these detectable errors have serious security implications. Google has found thousands of security vulnerabilities and other bugs using this technique. Fuzzing is traditionally used on native languages such as C or C++, but last year, we built a new Python fuzzing engine. Today, we’re releasing the Atheris fuzzing engine as open source.

What can Atheris do?

Atheris can be used to automatically find bugs in Python code and native extensions. Atheris is a “coverage-guided” fuzzer, which means that Atheris will repeatedly try various inputs to your program while watching how it executes, and try to find interesting paths.

One of the best uses for Atheris is for differential fuzzers. These are fuzzers that look for differences in behavior of two libraries that are intended to do the same thing. One of the example fuzzers packaged with Atheris does exactly this to compare the Python “idna” package to the C “libidn2” package. Both of these packages are intended to decode and resolve internationalized domain names. However, the example fuzzer idna_uts46_fuzzer.py shows that they don’t always produce the same results. If you ever decided to purchase a domain containing (Unicode codepoints [U+0130, U+1df9]), you’d discover that the idna and libidn2 libraries resolve that domain to two completely different websites.

In general, Atheris is useful on pure Python code whenever you have a way of expressing what the “correct” behavior is - or at least expressing what behaviors are definitely not correct. This could be as complex as custom code in the fuzzer that evaluates the correctness of a library’s output, or as simple as a check that no unexpected exceptions are raised. This last case is surprisingly useful. While the worst outcome from an unexpected exception is typically denial-of-service (by causing a program to crash), unexpected exceptions tend to reveal more serious bugs in libraries. As an example, the one YAML parsing library we tested Atheris on says that it will only raise YAMLErrors; however, yaml_fuzzer.py detects numerous other exceptions, such as ValueError from trying to interpret “-_” as an integer, or TypeError from trying to use a list as a key in a dict. (Bug report.) This indicates flaws in the parser.

Finally, Atheris supports fuzzing native Python extensions, using libFuzzer. libFuzzer is a fuzzing engine integrated into Clang, typically used for fuzzing C or C++. When using libFuzzer with Atheris, Atheris can still find all the bugs previously described, but can also find memory corruption bugs that only exist in native code. Atheris supports the Clang sanitizers Address Sanitizer and Undefined Behavior Sanitizer. These make it easy to detect corruption when it happens, rather than far later. In one case, the author of this document found an LLVM bug using an Atheris fuzzer (now fixed).

What does Atheris support?

Atheris supports fuzzing Python code and native extensions in Python 2.7 and Python 3.3+. When fuzzing Python code, using Python 3.8+ is strongly recommended, as it allows for much better coverage information. When fuzzing native extensions, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer.

OSS-Fuzz is a fuzzing service hosted by Google, where we execute fuzzers on open source code free of charge. OSS-Fuzz will soon support Atheris!

How can I get started?

Take a look at the repo, in particular the examples. For fuzzing pure Python, it’s as simple as:

pip3 install atheris

And then, just define a TestOneInput function that runs the code you want to fuzz:

import atheris
import sys

def TestOneInput(data):
if data == b"bad":
raise RuntimeError("Badness!")

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

That’s it! Atheris will repeatedly invoke TestOneInput and monitor the execution flow, until a crash or exception occurs.

For more details, including how to fuzz native code, see the README.

By the Google Information Security team

From MLPerf to MLCommons: moving machine learning forward

Thursday, December 3, 2020

Today, the community of machine learning researchers and engineers behind the MLPerf benchmark is launching an open engineering consortium called MLCommons. For us, this is the next step in a journey that started almost three years ago.

Early in 2018, we gathered a group of industry researchers and academics who had published work on benchmarking machine learning (ML), in a conference room to propose the creation of an industry standard benchmark to measure ML performance. Everyone had doubts: creating an industry standard is challenging under the best conditions and ML was (and is) a poorly understood stochastic process running on extremely diverse software and hardware. Yet, we all agreed to try.

Together, along with a growing community of researchers and academics, we created a new benchmark called MLPerf. The effort took off. MLPerf is now an industry standard with over 2,000 submitted results and multiple benchmarks suites that span systems from smartphones to supercomputers. Over that time, the fastest result submitted to MLPerf for training the classic ML network ResNet improved by over 13x.

We created MLPerf because we believed in three principles:

Machine learning has tremendous potential: Already, machine learning helps billions of people find and understand information through tools like Google’s search engine and translation service. Active research in machine learning could one day save millions of lives through improvements in healthcare and automotive safety.

Transforming machine learning from promising research into wide-spread industrial practice requires investment in common infrastructure -- especially metrics: Much like computing in the ‘80s, real innovation is mixed with hype and adopting new ideas is slow and cumbersome. We need good metrics to identify the best ideas, and good infrastructure to make adoption of new techniques fast and easy.

Developing common infrastructure is best done by an open, fast-moving collaboration: We need the vision of academics and the resources of industry. We need the agility of startups and the scale of leading tech companies. Working together, a diverse community can develop new ideas, launch experiments, and rapidly iterate to arrive at shared solutions.

Our belief in the principles behind MLPerf has only gotten stronger, and we are excited to be part of the next step for the MLPerf community with the launch of MLCommons.

MLCommons aims to accelerate machine learning to benefit everyone. MLCommons will build a a common set of tools for ML practitioners including:

Benchmarks to measure progress: MLCommons will leverage MLPerf to measure speed, but also expand benchmarking other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).

Public datasets to fuel research: MLCommons new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude, better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision.

Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmark), develop, and ultimately deploy.

Google believes in the potential of machine learning, the importance of common infrastructure, and the power of open, collaborative development. Our leadership in co-founding, and deep support in sustaining, MLPerf and MLCommons has echoed our involvement in other efforts like TensorFlow and NNAPI. Together with the MLCommons community, we can improve machine learning to benefit everyone.

Want to get involved? Learn more at mlcommons.org.

By Peter Mattson – ML Metrics, Naveen Kumar – ML Performance, and Cliff Young – Google Brain

Accessibility best practices for virtual events

Tuesday, December 1, 2020

An image with 4 hands together in the colors: red, blue, green, yellow.

As everyone knows, most of our open source events have transformed from in-person to digital this year. However, due to the sudden change, not everything is accessible. We took this issue seriously and decided to work with one of our accessibility experts, Neighborhood Access, to share best practices for our community. We hope this will help you organize your digital events!