opensource.google.com

Posts from April 2024

The Power of Open Source

Thursday, April 11, 2024


At the day 1 keynote of Open Source Summit North America, Timothy Jordan, Director of Developer Relations and Open Source at Google, will talk about the landscape of open source and AI, the importance of a responsible approach, and the transformative impact of community collaboration. In anticipation of this talk, let’s break down the AI open source ecosystem, and how Google approaches it.

Google believes in the power of open technology to drive innovation and benefit everyone. It fosters creativity and collaboration, while ensuring technology access for developers and allowing customization to fit unique use cases. Open source licenses give developers full creative autonomy without restriction. It is this ecosystem of open source and open technology, shaped by ML frameworks like TensorFlow, Keras, and JAX, that has enabled so many incredible advances in AI in recent years.

The open source community has been discussing how to apply the Open Source Definition to AI: how to carry forward the OSD’s open principles while addressing concepts like derived work and author attribution. During Timothy’s keynote, he’ll speak to his own philosophy on Open Source and AI, and share how his assumptions about applying open source to AI have evolved. The immediate availability of AI models, powered by the open source ecosystem of ML frameworks, means it’s more important than ever that we establish a shared definition for open source and AI.

While that definition is in development, at Google we’re using precise language to describe our openly available models like Gemma. The definition and license are only one part of this open ML/AI future; advancements in safety tooling, policies, and developer knowledge are all part of creating a responsible and open future for AI. Those advancements are all fueled by a dedication to collaboration. Whether sharing innovations and improvements with the community, or having conversations with policymakers and open source leaders, collaboration is key to a responsible approach to AI in the open ecosystem. AI can only be safe and responsible if everyone’s experiences and perspectives are brought to the forefront as it’s built.

To demonstrate how open source has made AI readily available, Timothy will also take the audience through a “low code” demo of how to run large language models in-browser for web applications. Using MediaPipe, the LLM Inference API, and Gemma, users can quickly add genAI capabilities like document summarization and text generation.
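
The exact keynote demo isn't reproduced here, but a minimal sketch along those lines, assuming the @mediapipe/tasks-genai web package and a self-hosted Gemma model file (the CDN URL, model path, and option values below are illustrative placeholders), might look like this:

    // Minimal sketch: run Gemma in the browser with the MediaPipe LLM Inference API.
    // Assumes the @mediapipe/tasks-genai package and a Gemma model file hosted by the app;
    // the paths and option values are placeholders, not the keynote's exact setup.
    import {FilesetResolver, LlmInference} from '@mediapipe/tasks-genai';

    async function summarize(text: string): Promise<string> {
      // Load the WebAssembly runtime for the GenAI tasks.
      const genAiFileset = await FilesetResolver.forGenAiTasks(
          'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm');

      // Create the LLM inference task from a locally hosted Gemma model file.
      const llm = await LlmInference.createFromOptions(genAiFileset, {
        baseOptions: {modelAssetPath: '/models/gemma-2b-it-gpu-int4.bin'},  // placeholder path
        maxTokens: 512,
        temperature: 0.8,
        topK: 40,
      });

      // Generate the summary; the prompt is processed entirely in the browser.
      return llm.generateResponse(`Summarize the following document:\n${text}`);
    }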

Join us at Open Source Summit North America for this keynote, and visit opensource.google to learn more.

By the Google Open Source team

Google Season of Docs announces participating organizations for 2024

Wednesday, April 10, 2024

Google Season of Docs provides support for open source projects to improve their documentation and gives professional technical writers an opportunity to gain experience in open source. Together we improve developer experience through better documentation, increase our understanding of best practices in open source documentation, and raise the profile of technical writers in open source.

For 2024, Google Season of Docs is pleased to announce that eleven organizations will be participating in the program! The list of participating organizations can be viewed on the website.

The project development phase now begins. Organizations and the technical writers they hire will work on their documentation projects from now until November 22nd. For organizations still looking to hire a technical writer, the hiring deadline is May 22nd.


How do I take part in Google Season of Docs as a technical writer?

Start by reading the technical writer guide and FAQs which give information about eligibility and choosing a project. Next, technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository; or they may submit a statement of interest directly to the organizations. We recommend technical writers reach out to organizations before submitting a statement of interest to discuss the project they’ll be working on and gain a better understanding of the organization. Technical writers do not need to submit a formal application through Google Season of Docs, so reach out to the organizations as soon as possible!


Will technical writers be paid while working with organizations accepted into Google Season of Docs?

Yes. Participating organizations will transfer funds directly to the technical writer via OpenCollective. Technical writers should review the organization's proposed project budgets and discuss their compensation and payment schedule with the organization before being hired. Check out our technical writer payment process guide for more details.


General Timeline

  • May 22, 2024: Technical writer hiring deadline
  • June 5, 2024: Organization administrators start reporting on their project status via monthly evaluations
  • December 10, 2024: Final date for organization administrators to submit their case study and final project evaluation
  • December 13, 2024: Google publishes the 2024 Season of Docs case studies and aggregate project data
  • May 1, 2025: Organizations begin to participate in post-program follow-up surveys

See the full timeline for details.


Care to join us?

Explore the Google Season of Docs website at g.co/seasonofdocs to learn more about the program. Use our logo and other promotional resources to spread the word. Review the timeline, check out the FAQ, and reach out to organizations now!

If you have any questions about the program, please email us at season-of-docs@google.com.

By Erin McKean, Google Open Source Programs Office

Introducing Jpegli: A New JPEG Coding Library

Wednesday, April 3, 2024

The internet has changed the way we live, work, and communicate. However, it can turn into a source of frustration when pages load slowly. At the heart of this issue lies the encoding of images. To improve on this, we are introducing Jpegli, an advanced JPEG coding library that maintains high backward compatibility while offering enhanced capabilities and a 35% compression ratio improvement at high quality compression settings.

Jpegli is a new JPEG coding library that is designed to be faster, more efficient, and more visually pleasing than traditional JPEG. It uses a number of new techniques to achieve these goals, including:

  • Fully interoperable: Jpegli provides both an encoder and a decoder that comply with the original JPEG standard and its most conventional 8-bit formalism, as well as API/ABI compatibility with libjpeg-turbo and MozJPEG.
  • High quality results: When images are compressed or decompressed with Jpegli, more precise and psychovisually effective computations are performed, so images look clearer and have fewer observable artifacts.
  • Fast: While improving on the image quality/compression density ratio, Jpegli's coding speed is comparable to traditional approaches such as libjpeg-turbo and MozJPEG. This means web developers can effortlessly integrate Jpegli into their existing workflows without sacrificing coding speed or memory use.
  • 10+ bits: Jpegli can encode images with 10+ bits per component. Traditional JPEG coding solutions offer only 8 bits per component, causing visible banding artifacts in slow gradients. Jpegli's 10+ bit coding happens within the original 8-bit formalism, and the resulting images are fully interoperable with 8-bit viewers. 10+ bit dynamics are available as an API extension, and application code changes are needed to benefit from them.
  • More dense: Jpegli compresses images more efficiently than traditional JPEG codecs, which saves bandwidth and storage space and speeds up web pages.

How Jpegli works

Jpegli works by using a number of new techniques to reduce noise and improve image quality: mainly adaptive quantization heuristics from the JPEG XL reference implementation, improved quantization matrix selection, precise calculation of intermediate results, and the option to use a more advanced colorspace. All the new methods have been carefully crafted to use the traditional 8-bit JPEG formalism, so newly compressed images are compatible with existing JPEG viewers such as browsers, image processing software, and others.


Adaptive quantization heuristics

Jpegli uses adaptive quantization to reduce noise and improve image quality. This is done by spatially modulating the dead zone in quantization based on psychovisual modeling. These heuristics, which we originally developed for JPEG XL, improve image quality and reduce file size, and they are much faster than the similar approach originally used in guetzli.
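
To make the idea of a spatially modulated dead zone concrete, here is a purely illustrative sketch (not Jpegli's actual C++ implementation): dead-zone quantization of a single 8x8 block, where a hypothetical per-block maskingStrength value from a psychovisual model widens the dead zone in visually busy areas. The specific bias formula is an assumption for demonstration only.

    // Illustrative only: dead-zone quantization of one 8x8 block of DCT coefficients.
    // `maskingStrength` is a hypothetical per-block value in [0, 1] from a psychovisual
    // model: 0 for flat regions where small coefficients are visible, 1 for busy regions
    // that mask quantization noise and can tolerate a wider dead zone.
    function quantizeBlock(
        coeffs: Float64Array,      // 64 DCT coefficients of the block
        quantTable: Float64Array,  // 64 quantization step sizes
        maskingStrength: number,
    ): Int16Array {
      const out = new Int16Array(64);
      for (let i = 0; i < 64; i++) {
        const scaled = Math.abs(coeffs[i]) / quantTable[i];
        // A rounding bias of 0.5 is plain rounding; lowering it toward 0 widens the
        // dead zone around zero, discarding coefficients the eye is unlikely to miss.
        const bias = 0.5 - 0.25 * maskingStrength;
        out[i] = Math.sign(coeffs[i]) * Math.floor(scaled + bias);
      }
      return out;
    }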


Improved quantization matrix selection

Jpegli also uses a set of quantization matrices that were selected by optimizing for a mix of psychovisual quality metrics. Because intermediate results are calculated precisely, both encoding and decoding produce higher quality output. Jpegli can additionally use JPEG XL's XYB colorspace for further quality and density improvements.


Testing Jpegli

In order to quantify Jpegli's image quality improvement, we enlisted the help of crowdsourced raters to compare pairs of images from the Cloudinary Image Dataset '22, encoded at several bitrates using three codecs: Jpegli, libjpeg-turbo, and MozJPEG.

In this comparison we limited ourselves to comparing the encoding only; decoding was always performed using libjpeg-turbo. We conducted the study with the XYB ICC color profile disabled, since that is how we anticipate most users would initially use Jpegli. To simplify comparing the results across codecs and settings, we aggregated all the rater decisions using chess-ranking-inspired Elo scoring.
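
For readers unfamiliar with Elo scoring, the standard formulation (the study's exact variant may differ) treats each pairwise rater decision as a game between two codec/bitrate settings, with ratings updated after each decision:

    E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A \leftarrow R_A + K\,(S_A - E_A)

where R_A and R_B are the current ratings, E_A is A's expected score, S_A is 1 if the rater preferred A, 0.5 for a tie, and 0 otherwise, and K controls how quickly ratings adapt.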

[Figure: bar graph (left) and plot (right) of Elo scores for the compared codecs and bitrates]
A higher Elo score indicates better aggregate performance in the rater study. We can observe that Jpegli at 2.8 BPP received a higher Elo rating than libjpeg-turbo at 3.7 BPP, a bitrate 32% higher than Jpegli's.

Results

Our results show that Jpegli can compress high quality images 35% more densely than traditional JPEG codecs.


Jpegli is a promising new technology that has the potential to make the internet faster and more beautiful.

By Zoltan Szabadka, Martin Bruse and Jyrki Alakuijala – Paradigms of Intelligence, Google Research

OSV and helping developers fix known vulnerabilities

Tuesday, April 2, 2024

In 2021, we launched the OSV project with a goal of enabling easy management of known vulnerabilities in open source software dependencies. To achieve this, we started by building an open source, comprehensive database (https://osv.dev) that accurately describes all known OSS vulnerabilities in the easy-to-use OpenSSF OSV Schema.

Over time, we worked with numerous open source communities to adopt the OSV Schema (totalling over 24 ecosystems), and introduced open source tools like our API and OSV-Scanner to directly make this database useful to developers.
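
As an illustration of the developer-facing API, a minimal sketch that queries osv.dev for known vulnerabilities affecting a single package version might look like the following; the package, version, and the subset of response fields shown are examples only.

    // Query OSV.dev for known vulnerabilities affecting a specific package version.
    // Uses the public /v1/query endpoint; the response type below models only a
    // small subset of the full OSV Schema.
    interface OsvQueryResult {
      vulns?: Array<{id: string; summary?: string}>;
    }

    async function queryOsv(ecosystem: string, name: string, version: string): Promise<OsvQueryResult> {
      const response = await fetch('https://api.osv.dev/v1/query', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({version, package: {name, ecosystem}}),
      });
      return (await response.json()) as OsvQueryResult;
    }

    async function main() {
      // Example: list advisory IDs for an arbitrary npm package version.
      const result = await queryOsv('npm', 'lodash', '4.17.20');
      for (const vuln of result.vulns ?? []) {
        console.log(vuln.id, vuln.summary ?? '');
      }
    }
    main();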

The OSV project takes a very developer-focused approach to vulnerability management, as we realize that day-to-day developers are often the ones who bear the burden of managing dependency updates and triaging vulnerabilities in their dependencies.

Today the OSV team is excited to announce some updates to the work we’ve been doing, and to share how the OSV project as a whole helps developers with vulnerability management.


Announcing guided remediation

Developers are often faced with an overwhelming number of vulnerabilities reported against their dependencies. To tackle this, we’re announcing a guided remediation tool as part of OSV-Scanner that makes it easy for developers to prioritize and fix the vulnerabilities that matter, either interactively or automatically.

The basic usage of the tool provides a simple command for developers to run which will automatically fix as many vulnerabilities as possible by upgrading their project’s dependencies.

For developers who need or want finer control over vulnerability remediation, the tool also provides the more advanced interactive mode. In the interactive mode, developers can preview and make informed decisions on which packages to upgrade or which vulnerabilities they want to prioritize based on metrics such as vulnerability severity, dependency depth, or dependency type.

Filtering by all these advanced metrics is also available via CLI flags for running the tool non-interactively, which enables integration of guided remediation into automated workflows. For example, developers can connect the tool with their CI/test pipelines to determine the set of non-breaking dependency upgrades.

Currently, the guided remediation tool supports npm package.json and package-lock.json dependencies, but we’ll be adding support for more ecosystems in the future.

Check out our detailed documentation for more information or if you would like to try it out for yourself!


OSV-Scanner GitHub action

We’ve also recently launched the OSV-Scanner GitHub action, which provides an easy way for developers to integrate vulnerability scanning using OSV.dev into their CI/CD pipelines. This is currently being used by TensorFlow and Flutter to provide continuous scanning of their dependencies.

Our GitHub Action can be configured to do the following:

  • Run a regularly scheduled vulnerability scan. A common use case is to set a schedule to regularly scan the repository, with the workflow failing if a new vulnerability is found. Another use case is to block release workflows if a vulnerability is found.
  • Trigger a differential vulnerability scan when a pull request is opened. This workflow can determine whether your changes introduce new vulnerabilities, and can be configured to block pull requests when the action fails. Enabling just this feature lets you stop new vulnerabilities from being introduced without breaking your existing workflows.

Head over to our documentation to see a quick and easy guide on how to get started integrating the OSV-Scanner action into your GitHub repository.


Other OSV features

Guided remediation and GitHub Actions support form one piece of our goal of making vulnerability management easier.

Beyond these, OSV also provides a broad suite of features.


What’s next?

We still have a lot more exciting work planned! Remaining challenges in managing known vulnerabilities in dependencies are remediation and handling false positives. Much of our work is focused on improving data quality and providing accurate, actionable results that lead to easy remediation.

These include:

  • Iterating on guided remediation by addressing user feedback and adding support for additional ecosystems.
  • Improving container scanning. OSV-Scanner has so far focused on source repository scanning. One important gap we aim to fill is better support for container scanning, in a way that provides actionable and useful remediation guidance while minimizing false positives.
  • Continuing to improve matching and data quality. An ongoing focus for OSV-Scanner is making sure that our scanning is comprehensive and accurate. Accuracy is especially important for us, as one of our core goals is to minimize false positives and vulnerability noise for developers at the receiving end of the scanners, through things such as reachability analysis.

Interested in using OSV in your project? Check out our OSV-Scanner and OSV.dev documentation for how to get started. Please share any feedback or bugs you encounter via our GitHub issue tracker.

By Michael Kedar, Rex Pan, and Oliver Chang – Google Open Source Security Team
