Protect your open source project from supply chain attacks

Tuesday, October 19, 2021

From executive orders to key signing parties, 2021 has been the year of supply chain security. If you’re an open source maintainer, learning about the attack surface of your project and the threat vectors throughout your project’s supply chain can feel overwhelming, maybe even insurmountable. The good news is that 2021 has also been the year of supply chain security solutions. While there’s still plenty of work to be done, and plenty of room for improvement in existing solutions, there are preventative controls you can apply to your project now to harden your supply chain and prevent compromise.

At All Things Open 2021, the audience learned about best practices for supply chain security through a quiz game. This blog post walks through the quiz questions, answers, and options for prevention, and can serve as a beginner's guide for anyone who wants to protect their open source project from supply chain attacks. These recommendations follow the SLSA framework and OpenSSF Scorecards rubric, and many can be implemented automatically by using the Allstar project.

An example of a typical software supply chain and examples of attacks that can occur at every link in the chain.

Q1: What should you do to protect your developer accounts from takeover?
  1. ANSWER: Use multi-factor auth (with a security key if possible)
  2. Use a shared account for core maintainers
  3. Make sure to write all your passwords in rot13
  4. Use an IP allowlist
Why and how: A malicious actor with access to a developer account can pretend to be a known contributor and submit bad code. Encourage contributors to use multi-factor authentication (MFA) not only for platforms where they send commits, but also for accounts associated with contributions, such as email. Where possible, security keys are the recommended form of MFA.

Q2: What should you do to avoid merging malicious commits?
  1. ANSWER: Require all commits to be reviewed by someone who is not the commit author
  2. Auto-run tests on all commits
  3. Scan for the word ‘bitcoin’ in all commits
  4. Only accept commits from contributors who have accounts older than 1 year
Why and how: Self-merging (also known as a unilateral change) introduces two risks: 1) An attacker who has compromised a contributor’s account can inject malicious code directly into the project, or 2) A well-intentioned person can merge a commit that accidentally introduces a security risk. A second set of authenticated eyes can help avoid malicious submissions and accidental weaknesses. Set this up as an automated requirement if possible (such as using GitHub’s Branch Protection settings); tools like Allstar can help enforce this requirement. This corresponds to SLSA level 4.
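
For GitHub-hosted projects, the simplest route is the Branch Protection settings page or an Allstar policy, but the same requirement can also be set through the GitHub REST API. Below is a minimal sketch in Python (using the requests library) that requires one approving review on a hypothetical repository; the owner, repository, branch, and token are placeholders.

import requests  # third-party HTTP client: pip install requests

OWNER, REPO, BRANCH = "example-org", "example-project", "main"  # hypothetical names
TOKEN = "<access token with repository admin permission>"       # placeholder

# Require at least one approving review before merging into the protected
# branch; GitHub does not count an approval from the pull request author.
response = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        "enforce_admins": True,          # apply the rule to administrators too
        "required_status_checks": None,  # not configured in this sketch
        "restrictions": None,            # no push restrictions in this sketch
    },
)
response.raise_for_status()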

Q3: How can you protect secrets used by your CI/CD pipeline?
  1. ANSWER: Use a secret manager tool
  2. Appoint a maintainer to control secrets access
  3. Store secrets as environment variables
  4. Store secrets in a separate repo
Why and how: The “defense in depth” security concept is about applying multiple, different layers of defense to protect systems and sensitive data, such as secrets*. A secret manager tool (like Secret Manager for GCP users, HashiCorp Vault, CyberArk Conjur, or Keywhiz) removes the need for hard-coding secrets in source code, provides centralization and audit capabilities, and introduces an authorization layer to prevent leaking secrets.

*When storing sensitive data in a CI system, ensure it is truly for CI/CD purposes, and not data that is better suited for a password or identity manager.
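
As an illustration, the sketch below reads a secret at run time with the Google Cloud Secret Manager Python client instead of hard-coding it; the project and secret names are placeholders, and other secret managers expose similar APIs.

from google.cloud import secretmanager  # pip install google-cloud-secret-manager

PROJECT_ID = "example-project"   # placeholder project
SECRET_ID = "ci-deploy-token"    # placeholder secret name

client = secretmanager.SecretManagerServiceClient()
name = f"projects/{PROJECT_ID}/secrets/{SECRET_ID}/versions/latest"

# Fetch the secret only when it is needed, rather than committing it to the
# repo or pasting it into CI configuration; access is centralized and auditable.
response = client.access_secret_version(request={"name": name})
secret_value = response.payload.data.decode("UTF-8")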

Q4: What should you do to protect your CI/CD system from abuse?
  1. ANSWER: Use access controls following the principle of least privilege
  2. Run integration tests on all pull requests/commits
  3. Mark all contributors as “Collaborators” through GitHub roles
  4. Run CI/CD systems locally
Why and how: Defaulting to “the least amount of access necessary” for your project repository protects your CI/CD system from both unintended access and abuse. While running tests is important, running tests on all commits/pull requests by default—before they’ve been reviewed—can lead to unintentional and malicious abuse of your CI/CD system’s compute resources.
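
As one concrete, hypothetical example for a GitHub-hosted project, new collaborators can be granted the read-only "pull" role by default, with write access added only when it is actually needed. A minimal Python sketch using the GitHub REST API (names and token are placeholders):

import requests  # third-party HTTP client: pip install requests

OWNER, REPO, USERNAME = "example-org", "example-project", "new-contributor"  # placeholders
TOKEN = "<access token with repository admin permission>"                    # placeholder

# Invite the collaborator with the least-privileged "pull" (read-only) role;
# broader roles can be granted later if the project's needs change.
response = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/collaborators/{USERNAME}",
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "application/vnd.github+json",
    },
    json={"permission": "pull"},
)
response.raise_for_status()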

Q5: What should you do to avoid compromise during build time?
  1. ANSWER: Define build definitions and configurations as code, e.g., build.yaml
  2. Make your builds run as quickly as possible so attackers have no time to compromise your code
  3. Only use LEGO brand components in your build system, accept no substitutes
  4. Delete build logs to avoid leaving clues for attackers
Why and how: Using a build script—a file that defines the build and its steps, like build.yaml—removes the need to manually run build steps, which could possibly introduce an accidental misconfiguration. It also reduces the opportunity for a malicious actor to tamper with the build or insert unreviewed changes. This corresponds to SLSA levels 1-4.

Q6: How should you evaluate dependencies before use?
  1. ANSWER: Assess risk and transitive changes with tools like Scorecards and deps.dev
  2. Check for a little ‘lock’ icon next to the package url
  3. Only use dependencies that have a minimum of 1,000 GitHub stars
  4. Only use dependencies that have never changed maintainers
Why and how: There isn’t one definitive measure that can tell you a package is “good” or “bad;” every project has different security profiles and risk tolerances. Gathering information about a dependency, and what changes it might introduce transitively, will help you decide if a dependency is “safe” for your project. Tools like Open Source Insights (deps.dev) map first layer and transitive dependencies, while Scorecards gives packages a score for multiple risk assessment metrics, including use of security policies, MFA, and branch protection.

Once you determine what dependencies you’re using, running a vulnerability scanning tool such as Open Source Vulnerabilities regularly will help you stay up to date on the latest releases and patches. Many vulnerability scanning tools can also apply automatic upgrades.
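
As a sketch of what such a check can look like, the snippet below asks the OSV API whether any known vulnerabilities affect one hypothetical dependency version; a real scanner would iterate over your full lockfile.

import requests  # third-party HTTP client: pip install requests

# Hypothetical dependency to check; substitute your own package and version.
query = {
    "package": {"name": "jinja2", "ecosystem": "PyPI"},
    "version": "2.4.1",
}

response = requests.post("https://api.osv.dev/v1/query", json=query)
response.raise_for_status()
vulnerabilities = response.json().get("vulns", [])

for vuln in vulnerabilities:
    print(vuln["id"], vuln.get("summary", "(no summary)"))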

Q7: What should you do to ensure your build is the build you think it is (aka verification)?
  1. ANSWER: Use a build service that can generate authenticated provenance
  2. Check the last commit to be sure it’s from a trusted committer
  3. Use steganography to embed your project logo into the build
  4. Run conformance tests for each release
Why and how: Showing the origin and artifacts of a build (the build’s provenance) demonstrates to the user that the build has not been tampered with, and is the correct build. There are many components to provenance; one method to deliver these components is to use a build service that generates and authenticates the data needed to show provenance. This corresponds to SLSA levels 2-4.

Q8: What should you look for when selecting artifacts from a registry?
  1. ANSWER: That artifacts have been cryptographically and verifiably signed
  2. That artifacts are not cursed (through being stolen from tombs)
  3. Timestamps: only use the most recent artifact created
  4. Official endorsement: look for the logo of a trusted brand or standards body
Why and how: Just as you should generate provenance and sign builds for your projects (SLSA levels 2-4), you should also look for the same verification when using artifacts from others. Logos and other brand-based forms of endorsement can be falsified and are used by typosquatters to fake legitimacy; look for tamper-proof verification like signatures. For example, Sigstore helps OSS projects sign their builds, and validate the builds of others.

Improving your project’s security is a continuous journey. Some of these recommendations may not be feasible for your project today, but every step you can take to increase your project’s security is a step in the right direction.

Resources for open source project security:
  • SLSA: A framework for levels of supply chain security
  • Scorecards: A measurement of security best practices use
  • Allstar: A GitHub app for enforcing security best practices
  • Open Source Insights: A searchable visualization of open source project dependencies
  • OSV: A vulnerability database and automation infrastructure for open source
By Anne Bertucio, Google Open Source Programs Office

JuMP: A modeling language for mathematical optimization

Monday, October 18, 2021

The JuMP logo.

As an author of the paper JuMP: A Modeling Language for Mathematical Optimization, I am honored to have recently received the Mathematical Optimization Society’s Beale—Orchard-Hays Prize, an academic award given once every three years for work in the area of computational mathematical optimization. The award, in fact, is about the open source software project JuMP, which I started with Iain Dunning and Joey Huchette while we were PhD students at MIT’s Operations Research Center almost nine years ago. The humbling milestone of the Beale—Orchard-Hays Prize seems like a good occasion to reflect on JuMP, how it has matured and grown as an independent community-driven project, and Google’s role in enabling me to serve as JuMP’s BDFL.

JuMP was created—in the classical open source fashion—to scratch an itch. As graduate students, we wanted a software package that would enable us to write down and solve optimization problems, especially constrained optimization problems like linear programming and integer programming problems. We wanted it to be not only easy, but also fast and powerful. At the time, one was faced with trade-offs between ease-of-use, speed, and flexibility. For example, optimization libraries in Python were user-friendly but introduced noticeable performance bottlenecks. Commercial software such as AMPL was efficient but hard to extend. Low-level interfaces in C or C++ introduced complexities that were distracting for teaching and academic research. We weren’t satisfied with these trade-offs, and began experimenting with a new programming language called Julia that promised to provide the best of both worlds.

Our early experiments showed that Julia was indeed capable of impressive performance. While similar libraries based on Python could be slower to construct the data structure describing the optimization problem than to solve it, our prototype of JuMP was competitive with state-of-the-art commercial libraries. This gave us confidence that JuMP could be useful for the community, and we made the initial public release in October 2013.

Since then, it’s been a real ride! The first JuMP developers workshop in 2017 attracted thirteen speakers from four continents; this year’s workshop featured 32 virtual talks. Of the 800+ citations to the award-winning paper, we were surprised to discover that about 75% of them were from outside the fields of operations research or optimization itself; about 20% are in energy and power systems, another 20% are in control and engineering, and the remaining citations are spread across scientific applications, computer science, machine learning, and other fields. These figures speak to the role of optimization as a fundamental technology that can be applied almost anywhere. One example application using JuMP of which I’m perhaps most proud is a study by Sepulveda et al. on cost-effective ways to decarbonize the power grid. This study is cited both by Bill Gates in his new book, “How to Avoid a Climate Disaster,” and by Google’s methodologies and metrics framework for its goal of operating data centers and campuses entirely on carbon-free energy by 2030.

As JuMP’s core development team grew beyond MIT and its original creators graduated, it was important for JuMP to find a new home for its long-term sustainability. We were lucky to find NumFOCUS, a nonprofit organization supporting open source scientific software (of which Google is a corporate sponsor). As a Google employee, I have continued contributing code for JuMP, traveling to workshops, and serving in leadership roles thanks in no small part to Google’s generous open source policies and support from my team and management chain. Last year, I was granted the honorific of Benevolent Dictator for Life (BDFL). I plan to use this power judiciously and rarely, relying instead on JuMP’s strong culture of consensus-driven development.

As for the future, JuMP’s 1.0 release is near on the horizon, and I look forward to whatever comes next!

By Miles Lubin, Algorithms & Optimization Team, Google Research

Open Source in the 2021 Accelerate State of DevOps Report

Wednesday, September 22, 2021

To truly thrive, organizations need to adopt practices and capabilities that will lead them to performance improvements. Therefore, having access to data-driven insights and recommendations about the most effective and efficient ways to develop and deliver technology is critical. Over the past seven years, the DevOps Research and Assessment (DORA) team has collected data from more than 32,000 industry professionals and used rigorous statistical analysis to deepen our understanding of the practices that lead to excellence in technology delivery and to powerful business outcomes.
 
One of the most valuable insights that has come from this research is the categorization of organizations into four different performance profiles (Elite, High, Medium, and Low) based on their performance on four software delivery metrics centered on throughput and stability: Deployment Frequency, Lead Time for Changes, Time to Restore Service, and Change Failure Rate. We found that organizations that excel at these four metrics can be classified as elite performers, while those that do not can be classified as low performers. See DevOps Research and Assessment (DORA) for a detailed description of these metrics and the different levels of organizational performance.


We have found that a number of technical capabilities are associated with improved continuous delivery performance. Our findings indicate that organizations that have incorporated loosely coupled architecture, continuous testing and integration, trunk-based development, deployment automation, database change management, and monitoring and observability, and that have leveraged open source technologies, perform better than organizations that have not adopted these capabilities.

Now that you know a little bit about what DORA is and some of its key findings, let’s dive into whether the use of open source technologies within organizations impacts performance.

A quick Google search will yield hundreds (if not thousands) of articles describing the myriad ways organizations benefit from using open source software—faster innovation, higher quality products, stronger security, flexibility, ease of customization, etc. We know using open source software is the way to go, but until now we had little empirical evidence demonstrating that its use is associated with improved organizational performance.

This year, we surveyed 1,200 working professionals from a variety of industries around the globe about the factors that drive higher performance, including the use of open source software. Research from this year’s DORA report illustrates that low-performing organizations have the highest use of proprietary software. In contrast, elite performers are 1.75 times more likely to make extensive use of open source components, libraries, and platforms. We also find that elite performers are 1.5 times more likely to have plans to expand their use of open source software compared to their low-performing counterparts. But the question remains—does leveraging open source software impact an organization’s performance? It turns out the answer is yes!

Our research also found that elite performers who meet their reliability targets are 2.4 times more likely to leverage open source technologies. We suspect that the original tenets of the open source movement, transparency and collaboration, play a big role. Developers are less likely to waste time reinventing the wheel, which allows them to spend more time innovating, and they are able to leverage global talent instead of relying on the few people on their team or in their organization.

Technology transformations take time, effort, and resources. They also require organizations to make significant mental shifts. These shifts are easier when there is empirical evidence backing recommendations—organizations don’t have to take someone’s word for it; they can look at the data and at the consistency of the findings to know that success and improvement are in fact possible.

In addition to open source software, the 2021 Accelerate State of DevOps Report discusses a variety of capabilities and practices that drive performance. In the 2021 report, we also examined the effects of SRE best practices, the pandemic and burnout, the importance of quality documentation, and we revisited our exploration of leveraging the cloud. If you’d like to read the full report or any previous report, you can visit cloud.google.com/devops.