Posts from 2024

Kubernetes 1.30 is now available in GKE in record time

Friday, May 10, 2024

Kubernetes 1.30 is now available in the Google Kubernetes Engine (GKE) Rapid Channel less than 20 days after the OSS release! For more information about the content of Kubernetes 1.30, read the Kubernetes 1.30 Release Notes and the specific GKE 1.30 Release Notes.


Control Plane Improvements

We're excited to announce that ValidatingAdmissionPolicy graduates to GA in 1.30. This feature enables many admission webhooks to be replaced with policies defined using the Common Expression Language (CEL) and evaluated directly in the kube-apiserver. It benefits both extension authors and cluster administrators by dramatically simplifying the development and operation of admission extensions. Many existing webhooks can be migrated to validating admission policies. For webhooks not ready or able to migrate, Match Conditions may be added to webhook configurations using CEL rules to pre-filter requests and reduce webhook invocations.
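
As a rough illustration of what such a policy looks like, here is a minimal sketch that builds a ValidatingAdmissionPolicy manifest as a Python dictionary and prints it as JSON (which kubectl apply also accepts). The policy name, the replica limit, and the CEL expression are illustrative, and a separate ValidatingAdmissionPolicyBinding is still needed to put the policy into effect:

import json

# Illustrative policy: reject Deployments that request more than 5 replicas.
# The CEL expression is evaluated directly in the kube-apiserver.
policy = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "ValidatingAdmissionPolicy",
    "metadata": {"name": "demo-replica-limit"},
    "spec": {
        "failurePolicy": "Fail",
        "matchConstraints": {
            "resourceRules": [{
                "apiGroups": ["apps"],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["deployments"],
            }]
        },
        "validations": [{
            "expression": "object.spec.replicas <= 5",
            "message": "Deployments may not request more than 5 replicas.",
        }],
    },
}

# Save and apply with: kubectl apply -f policy.json
print(json.dumps(policy, indent=2))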

Validation Ratcheting makes CustomResourceDefinitions even safer and easier to manage. Prior to Kubernetes 1.30, when updating a custom resource, validation was required to pass for all fields, even fields not changed by the update. Now, with this feature, only fields changed in the custom resource by an update request must pass validation. This limits validation failures on update to the changed portion of the object, and reduces the risk of controllers getting stuck when a CustomResourceDefinition schema is changed, either accidentally or as part of an effort to increase the strictness of validation.

Aggregated Discovery graduates to GA in 1.30, dramatically improving the performance of clients, particularly kubectl, when fetching the API information needed for many common operations. Aggregated discovery reduces the fetch to a single request and allows caches to be kept up-to-date by offering ETags that clients can use to efficiently poll the server for changes.


Data Plane Improvements

Dynamic Resource Allocation (DRA) is an alpha Kubernetes feature added in 1.26 that enables flexibility in configuring, selecting, and allocating specialized devices for pods. Feedback from SIG Scheduling and SIG Autoscaling revealed that the design needed revisions to reduce scheduling latency and fragility, and to support cluster autoscaling. In 1.30, the community introduced a new alpha design, DRA Structured Parameters, which takes the first step towards these goals. This is still an alpha feature with a lot of changes expected in upcoming releases. The newly formed WG Device Management has a charter to improve device support in Kubernetes - with a focus on GPUs and similar hardware - and DRA is a key component of that support. Expect further enhancements to the design in another alpha in 1.31. The working group has a goal of releasing some aspects to beta in 1.32.


Kubernetes continues the effort of eliminating perma-beta features: functionality that has long been used in production but was never marked as generally available. With this release, AppArmor support received attention and moved closer to finally being marked GA.

There are also quality-of-life improvements in the Kubernetes data plane. Many of them are only noticeable to system administrators and are less relevant for GKE users. In this release, however, the notable Sleep Action KEP entered beta and is available on GKE. It makes it easier to use slim images while still allowing graceful connection draining, specifically for some flavors of nginx images.
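
As a rough sketch of how the sleep action is expressed, the fragment below builds a container spec with a preStop sleep hook as a Python dictionary; the image, duration, and names are illustrative, and in practice this fragment would live inside a Pod or Deployment manifest:

import json

# Illustrative container spec: pause briefly before termination so in-flight
# connections can drain, without needing a shell or sleep binary in a slim image.
container = {
    "name": "web",
    "image": "nginx:stable",         # illustrative image
    "lifecycle": {
        "preStop": {
            "sleep": {"seconds": 5}  # Sleep Action, beta in Kubernetes 1.30
        }
    },
}

print(json.dumps(container, indent=2))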

Acknowledgements

We want to thank all the Googlers who provide their time, passion, talent, and leadership to keep making Kubernetes the best container orchestration platform. For the features mentioned in this post, we would especially like to thank Googlers Cici Huang, Joe Betz, Jiahui Feng, Alex Zielenski, Jeffrey Ying, John Belamaric, Tim Hockin, Aldo Culquicondor, Jordan Liggitt, Kuba Tużnik, Sergey Kanzhelev, and Tim Allclair.

Posted by Federico Bongiovanni – Google Kubernetes Engine

OpenXLA Dev Lab 2024: Building Groundbreaking ML Systems Together

Thursday, May 9, 2024


AMD, Arm, AWS, Google, NVIDIA, Intel, Tesla, SambaNova, and more come together to crack the code for colossal AI workloads

As AI models grow increasingly complex and compute-intensive, the need for efficient, scalable, and hardware-agnostic infrastructure has never been greater. OpenXLA is a deep learning compiler framework that makes it easy to speed up and massively scale AI models on a wide range of hardware types—from GPUs and CPUs to specialized chips like Google TPUs and AWS Trainium. It is compatible with popular modeling frameworks—JAX, PyTorch, and TensorFlow—and delivers leading performance. OpenXLA is the acceleration infrastructure of choice for global-scale AI-powered products like Amazon.com Search, Google Gemini, Waymo self-driving vehicles, and x.AI's Grok.


The OpenXLA Dev Lab

On April 25th, the OpenXLA Dev Lab played host to over 100 expert ML practitioners from 10 countries, representing industry leaders like AMD, Arm, AWS, ByteDance, Cerebras, Cruise, Google, NVIDIA, Intel, Tesla, SambaNova, and more. The full-day event, tailored to AI hardware vendors and infrastructure engineers, broke the mold of previous OpenXLA Summits by focusing purely on “Lab Sessions”, akin to office hours for developers, and hands-on Tutorials. The energy of the event was palpable as developers worked side-by-side, learning and collaborating on both practical challenges and exciting possibilities for AI infrastructure.

World map showing the countries developers traveled from to attend the OpenXLA Dev Lab
Figure 1: Developers from around the world congregated at the OpenXLA Dev Lab.

The Dev Lab was all about three key things:

  • Educate and Empower: Teach developers how to implement OpenXLA's essential workflows and advanced features through hands-on tutorials.
  • Offer Expert Guidance: Provide personalized office hours led by OpenXLA experts to help developers refine their ideas and contributions.
  • Foster Community: Encourage collaboration, knowledge-sharing, and lasting connections among the brilliant minds in the OpenXLA community.

Tutorials

The Tutorials included:

Integrating an AI Compiler & Runtime into PJRT

  • Learn how PJRT connects ML frameworks to AI accelerators, standardizing their interaction for easy model deployment on diverse hardware.
  • Explore the PJRT C API for framework-hardware communication.
  • Implement a PJRT Plugin, a Python package that implements the C API.
  • Discover plugin examples for Apple Metal, CUDA, Intel GPU, and TPU.

Led by Jieying Luo and Skye Wanderman-Milne


Extracting StableHLO Graphs + Intro to StableHLO Quantizer

  • Learn to export StableHLO from JAX, PyTorch, and TensorFlow using static/dynamic shapes and SavedModel format (see the JAX sketch after this tutorial).
  • Hack along with the tutorial using the JAX, PyTorch, and TensorFlow Colab notebooks provided on OpenXLA.org.
  • Simplify quantization with StableHLO Quantizer, a framework- and device-agnostic tool.
  • Explore streamlined parameter selection and model rewriting for lower precision.

Led by Kevin Gleason, Jen Ha, and Xing Liu
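
To make the export workflow above concrete, here is a minimal sketch of dumping the StableHLO produced for a JAX function; the function and shapes are illustrative, and the PyTorch and TensorFlow paths covered in the Colab notebooks differ in their details:

import jax
import jax.numpy as jnp

# An illustrative function to export.
def predict(x, w):
    return jnp.tanh(x @ w)

x = jnp.ones((4, 8), dtype=jnp.float32)
w = jnp.ones((8, 2), dtype=jnp.float32)

# Lower the jitted function and print its StableHLO (MLIR) text.
lowered = jax.jit(predict).lower(x, w)
print(lowered.as_text())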


Optimizing PyTorch/XLA Auto-sharding for Your Hardware

  • Discover this experimental feature that automates distributing large-scale PyTorch models across XLA devices.
  • Learn how it partitions and distributes models for out-of-the-box performance without manual intervention.
  • Explore future directions such as customizable cost models for different hardware.

Led by Yeounoh Chung and Pratik Fegade


Optimizing Compute and Communication Scheduling with XLA

  • Scale ML models on multiple GPUs with SPMD partitioning, collective communication, and HLO optimizations.
  • Explore tensor parallelism, the latency-hiding scheduler, and pipeline parallelism.
  • Learn collective optimizations and pipeline parallelism for efficient large-scale training.

Led by Frederik Gossen, TJ Xu, and Abhinav Goel


Lab Sessions

Lab Sessions featured use case-specific office hours for AMD, Arm, AWS, ByteDance, Intel, NVIDIA, SambaNova, Tesla, and more. OpenXLA engineers were on hand to provide development teams with dedicated support and walk through specific pain points and designs. In addition, Informational Roundtables that covered broader topics like GPU ML Performance Optimization, JAX, and PyTorch-XLA GPU were available for those without specific use cases. This approach led to productive exchanges and fine-grained exploration of critical contribution areas for ML hardware vendors.

four photos of participants and vendors at OpenXLA Dev Lab

Don’t just take our word for it – here’s some of the feedback we received from developers:

"PJRT is awesome, we're looking forward to building with it. We are very grateful for the support we are getting." 
      — Mark Gottscho, Senior Manager and Technical Lead at SambaNova
"Today I learned a lot about Shardonnay and about some of the bugs I found in the GSPMD partitioner, and I got to learn a lot of cool stuff." 
      — Patrick Toulme, Machine Learning Engineer at AWS
“I learned a lot, a lot about how XLA is making tremendous progress in building their community.” 
      — Tejash Shah, Product Manager at NVIDIA
“Loved the format this year - please continue … lots of learning, lots of interactive sessions. It was great!” 
      — Om Thakkar, AI Software Engineer at Intel

Technical Innovations and The Bold Road Ahead

The event kicked off with a keynote by Robert Hundt, Distinguished Engineer at Google, who outlined OpenXLA's ambitious plans for 2024, particularly three major areas of focus:

  • Large-scale training
  • GPU and PyTorch compute performance
  • Modularity and extensibility

Empowering Large-Scale Training

OpenXLA is introducing powerful features to enable model training at record-breaking scales. One of the most notable additions is Shardonnay, a tool coming soon to OpenXLA that automates and optimizes how large AI workloads are divided across multiple processing units, ensuring efficient use of resources and faster time to solution. Building on the success of its predecessor, SPMD, Shardonnay empowers developers with even more fine-grained control over partitioning decisions, all while maintaining the productivity benefits that SPMD is known for.

Diagram of sharding representation with a simple rank 2 tensor and 4 devices.
Figure 2: Sharding representation example with a simple rank 2 tensor and 4 devices.
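
For readers who want to experiment today, here is a minimal sketch of the partitioning decision shown in Figure 2, expressed with JAX's current SPMD sharding APIs rather than Shardonnay, which is not yet released; it assumes a runtime with at least four devices:

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Arrange 4 devices into a 2x2 mesh (requires a backend with >= 4 devices).
devices = np.array(jax.devices()[:4]).reshape(2, 2)
mesh = Mesh(devices, axis_names=("x", "y"))

# Shard a rank-2 tensor so each device holds one quarter of it.
tensor = jnp.arange(16.0).reshape(4, 4)
sharded = jax.device_put(tensor, NamedSharding(mesh, PartitionSpec("x", "y")))

# Show which slice of the array lives on which device.
jax.debug.visualize_array_sharding(sharded)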

In addition to Shardonnay, developers can expect a suite of features designed to optimize computation and communication overlap, including:

  • Automatic profile-guided latency estimation
  • Collective pipelining
  • Heuristics-based collective combiners

These innovations will enable developers to push the boundaries of large-scale training and achieve unprecedented performance and efficiency.


OpenXLA Delivers on TorchBench Performance

OpenXLA has also made significant strides in enhancing performance, particularly on GPUs with key PyTorch-based generative AI models. PyTorch-XLA GPU is now neck and neck with TorchInductor for TorchBench Full Graph Models and has a TorchBench pass rate within 5% of TorchInductor.

A bar graph showing a performance comparison of TorchInductor vs. PyTorch-XLA GPU on Google Cloud NVIDIA H100 GPUs
Figure 3: Performance comparison of TorchInductor vs. PyTorch-XLA GPU on Google Cloud NVIDIA H100 GPUs. “Full graph models” represent all TorchBench models that can be fully represented by StableHLO

Behind these impressive gains lies XLA GPU's global cost model, a game-changer for developers. In essence, this cost model acts as a sophisticated decision-making system, intelligently determining how to best optimize computations for specific hardware. The cost model delivers state-of-the-art performance through a priority-based queue for fusion decisions and is highly extensible, allowing third-party developers to seamlessly integrate their backend infrastructure for both general-purpose and specialized accelerators. The cost model's adaptability ensures that computation optimizations are tailored to specific accelerator architectures, while less suitable computations can be offloaded to the host or other accelerators.

OpenXLA is also breaking new ground with novel kernel programming languages, Pallas and Mosaic, which empower developers to write highly optimized code for specialized hardware. Mosaic demonstrates remarkable efficiency in programming key AI accelerators, surpassing widely used libraries in GPU code generation efficiency for models with 64, 128, and 256 Q head sizes, as evidenced by its enhanced utilization of TensorCores.
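
As a taste of the programming model, here is a minimal Pallas sketch of an element-wise kernel; the shapes and names are illustrative, and the real value of Pallas and Mosaic lies in hand-tuned kernels such as attention, which are beyond the scope of this example:

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

# A tiny element-wise kernel: each ref argument is a block of the
# corresponding input or output array.
def add_kernel(x_ref, y_ref, o_ref):
    o_ref[...] = x_ref[...] + y_ref[...]

def add(x, y):
    return pl.pallas_call(
        add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.arange(8.0)
y = jnp.ones(8)
print(add(x, y))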

A bar graph showing a performance comparison of Flash Attention vs. Mosaic GPU on NVIDIA H100 GPUs
Figure 4: Performance comparison of Flash Attention vs. Mosaic GPU on NVIDIA H100 GPUs.

Modular and Extensible AI Development

In addition to performance enhancements, OpenXLA is committed to making the entire stack more modular and extensible. Several initiatives planned for 2024 include:

  • Strengthening module interface contracts
  • Enhancing code sharing between platforms
  • Enabling a shared high-level compiler flow through runtime configuration and component registries

A flow diagram showing modules and subcomponents of the OpenXLA stack.
Figure 5: Modules and subcomponents of the OpenXLA stack.

These improvements will make it easier for developers to build upon and extend OpenXLA.

Alibaba's success with PyTorch XLA FSDP within their TorchAcc framework is a prime example of the benefits of OpenXLA's modularity and extensibility. By leveraging these features, Alibaba achieved state-of-the-art performance for the LLaMa 2 13B model, surpassing the previous benchmark set by Megatron. This demonstrates the power of the developer community in extending OpenXLA to push the boundaries of AI development.

A bar graph showing a performance comparison of TorchAcc and Megatron for LLaMa 2 13B at different numbers of GPUs.
Figure 6: Performance comparison of TorchAcc and Megatron for LLaMa 2 13B at different numbers of GPUs.

Join the OpenXLA Community

If you missed the Dev Lab, don't worry! You can still access StableHLO walkthroughs on openxla.org, as well as the GitHub Gist for the PJRT session. Additionally, the recorded keynote and tutorials are available on our YouTube channel. Explore these resources and join our global community – whether you're an AI systems expert, model developer, student, or just starting out, there's a place for you in our innovative ecosystem.

four photos of participants and vendors at OpenXLA Dev Lab

Acknowledgements

Adam Paszke, Allen Hutchison, Amin Vahdat, Andrew Leaver, Andy Davis, Artem Belevich, Abhinav Goel, Benjamin Kramer, Berkin Ilbeyi, Bill Jia, Eugene Zhulenev, Florian Reichl, Frederik Gossen, George Karpenkov, Gunhyun Park, Han Qi, Jack Cao, Jaesung Chung, Jen Ha, Jianting Cao, Jieying Luo, Jiewen Tan, Jini Khetan, Kevin Gleason, Kyle Lucke, Kuy Mainwaring, Lauren Clemens, Manfei Bai, Marisa Miranda, Michael Levesque-Dion, Milad Mohammadi, Nisha Miriam Johnson, Penporn Koanantakool, Robert Hundt, Sandeep Dasgupta, Sayce Falk, Shauheen Zahirazami, Skye Wanderman-Milne, Yeounoh Chung, Pratik Fegade, Peter Hawkins, Vaibhav Singh, Tamás Danyluk, Thomas Joerg, TJ Xu, Jacques Pienaar, David Dunleavy, Bart Chrzaszcz, Tom Natan, and Cyril Bortolato.

By James Rubin, Aditi Joshi, and Elliot English on behalf of the OpenXLA Project

Google Summer of Code 2024 accepted contributors announced!

Wednesday, May 1, 2024


We are celebrating the 20th anniversary of Google Summer of Code (GSoC) and we are thrilled to announce the 1,220 new GSoC Contributors who will be writing code for 195 open source mentoring organizations starting May 27. Over the last few weeks, our mentoring organizations have read through applications, had discussions with applicants, and made the difficult decision of selecting the GSoC Contributors they will be mentoring this summer.

Highlighting significant results from this year’s application period:

    • 43,984 applicants from 172 countries
    • 9,107 proposals submitted by 6,518 applicants
    • 1,220 GSoC contributors accepted from 73 countries
    • Over 2,800 mentors and organization administrators
    • 34 mentoring organizations are participating in their 16th-20th GSoC!

Starting today, our GSoC 2024 Contributors will actively engage with their new open source community and become familiar with their organizations during the Community Bonding period. Mentors will guide the GSoC Contributors through documentation and introduce them to community norms and processes while helping plan their milestones and projects for the summer. Coding begins May 27th and while most folks will wrap up September 2nd, GSoC Contributors have the opportunity to request a longer coding period and wrap up their projects between early September and early November based on their schedules and availability.

We’d like to express our gratitude to the thousands of applicants who took the time and effort to reach out to our mentoring organizations and submit proposals this year. The experience of researching, asking questions and becoming more familiar with open source communities has hopefully helped you feel excited about open source and maybe you found a great community that you want to contribute to outside of Google Summer of Code! Communication is key to GSoC and open source, and staying connected with the community or reaching out to other organizations is an exceptional way to set the stage for future opportunities. Open source communities are always looking for new and eager collaborators to join their projects.

A huge thank you to all of our mentors and organization administrators who make this program so special and impactful for thousands of developers each year. Google Summer of Code continues because of the dedication of mentors to keep the open source ecosystem thriving by supporting new developers and their exciting perspectives and ideas. Google is honored to support the open source ecosystem (and 1,000+ open source projects and 20,000+ developers) over these past 20 years.

GSoC Contributors — have fun this summer, keep learning and enjoy becoming part of the open source community! Your mentors and community members have dozens, and in some cases hundreds, of years of combined experience so let them share their knowledge with you to help you become phenomenal open source contributors.

By Stephanie Taylor – Program Lead and Lucila Ortiz – Program Administrator

Get ready for Google I/O: Program lineup revealed


Developers, get ready! Google I/O is just around the corner, kicking off live from Mountain View with the Google keynote on Tuesday, May 14 at 10 am PT, followed by the Developer keynote at 1:30 pm PT.

But the learning doesn’t stop there. Mark your calendars for May 16 at 8 am PT when we’ll be releasing over 150 technical deep dives, demos, codelabs, and more on-demand. If you register online, you can start building your 'My I/O' agenda today.

Here's a sneak peek at some of the exciting highlights from the I/O program preview:

Unlocking the power of AI: The Gemini era unlocks a new frontier for developers. We'll showcase the newest features in the Gemini API, Google AI Studio, and Gemma. Discover cutting-edge pre-trained models from Kaggle, and delve into Google's open-source libraries like Keras and JAX.

Android: A developer's playground: Get the latest updates on everything Android! We'll cover groundbreaking advancements in generative AI, the highly anticipated Android 15, innovative form factors, and the latest tools and libraries in the Jetpack and Compose ecosystem. Plus, discover how to optimize performance and streamline your development workflow.

Building beautiful and functional web experiences: We’ll cover Baseline updates, a revolutionary tool that empowers developers with a clear understanding of web features and API interoperability. With Baseline, you'll have access to real-time information on popular developer resource sites like MDN, Can I Use, and web.dev.

The future of ChromeOS: Get a glimpse into the exciting future of ChromeOS. We'll discuss the developer-centric investments we're making in distribution, app capabilities, and operating system integrations. Discover how our partners are shaping the future of Chromebooks and delivering world-class user experiences.

This is just a taste of what's in store at Google I/O. Stay tuned for more updates, and get ready to be a part of the future.

Don't forget to mark your calendars and register for Google I/O today!

Posted by Timothy Jordan – Director, Developer Relations and Open Source

The Power of Open Source

Thursday, April 11, 2024


At the day 1 keynote of Open Source Summit North America, Timothy Jordan, Director of Developer Relations and Open Source at Google, will talk about the landscape of open source and AI, the importance of a responsible approach, and the transformative impact of community collaboration. In anticipation of this talk, let’s break down the AI open source ecosystem, and how Google approaches it.

Google believes in the power of open technology to drive innovation and benefit everyone. It fosters creativity and collaboration, while ensuring technology access for developers and allowing customization to fit unique use cases. Open source licenses give developers full creative autonomy without restriction. It is this ecosystem of open source and open technology, shaped by ML frameworks like TensorFlow, Keras, and JAX, that has enabled so many incredible advances in AI in recent years.

The open source community has been in discussion on how to apply the Open Source Definition to carry forward the open principles of the OSD while addressing concepts like derived work and author attribution in AI. During Timothy’s keynote, he’ll speak to his own philosophy on Open Source and AI, and share how his assumptions about how we apply open source to AI have evolved. The immediate availability of AI models, powered by the open source ecosystem of ML frameworks, means it’s more important than ever that we establish a shared definition for open source and AI.

While that definition is in development, at Google we’re using precise language to describe our openly available models like Gemma. The definition and license is only one part of this open ML/AI future; advancements in safety tooling, policies, and developer knowledge are all part of creating a responsible and open future for AI. Those advancements are all fueled by a dedication to collaboration. Whether sharing innovations and improvements with the community, or having conversations with policymakers and open source leaders, collaboration is key to a responsible approach to AI in the open ecosystem. AI can only be safe and responsible if everyone’s experiences and perspectives are brought to the forefront as it’s built.

To demonstrate how open source has made AI readily available, Timothy will also take the audience through a “low code” demo of how to run large language models in-browser for web applications. Using MediaPipe, the LLM Inference API, and Gemma, users can quickly add genAI capabilities like document summarization and text generation.

Join us at Open Source Summit North America for this keynote, and visit opensource.google to learn more.

By the Google Open Source team

Google Season of Docs announces participating organizations for 2024

Wednesday, April 10, 2024

Google Season of Docs provides support for open source projects to improve their documentation and gives professional technical writers an opportunity to gain experience in open source. Together we improve developer experience through better documentation, increase our understanding of best practices in open source documentation, and raise the profile of technical writers in open source.

For 2024, Google Season of Docs is pleased to announce that eleven organizations will be participating in the program! The list of participating organizations can be viewed on the website.

The project development phase now begins. Organizations and the technical writers they hire will work on their documentation projects from now until November 22nd. For organizations still looking to hire a technical writer, the hiring deadline is May 22nd.


How do I take part in Google Season of Docs as a technical writer?

Start by reading the technical writer guide and FAQs which give information about eligibility and choosing a project. Next, technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository; or they may submit a statement of interest directly to the organizations. We recommend technical writers reach out to organizations before submitting a statement of interest to discuss the project they’ll be working on and gain a better understanding of the organization. Technical writers do not need to submit a formal application through Google Season of Docs, so reach out to the organizations as soon as possible!


Will technical writers be paid while working with organizations accepted into Google Season of Docs?

Yes. Participating organizations will transfer funds directly to the technical writer via OpenCollective. Technical writers should review the organization's proposed project budgets and discuss their compensation and payment schedule with the organization before hiring. Check out our technical writer payment process guide for more details.


General Timeline

  • May 22, 2024: Technical writer hiring deadline
  • June 5, 2024: Organization administrators start reporting on their project status via monthly evaluations
  • December 10, 2024: Final date for organization administrators to submit their case study and final project evaluation
  • December 13, 2024: Google publishes the 2024 Season of Docs case studies and aggregate project data
  • May 1, 2025: Organizations begin to participate in post-program followup surveys

See the full timeline for details.


Care to join us?

Explore the Google Season of Docs website at g.co/seasonofdocs to learn more about the program. Use our logo and other promotional resources to spread the word. Review the timeline, check out the FAQ, and reach out to organizations now!

If you have any questions about the program, please email us at season-of-docs@google.com.

By Erin McKean, Google Open Source Programs Office

Introducing Jpegli: A New JPEG Coding Library

Wednesday, April 3, 2024

The internet has changed the way we live, work, and communicate. However, it can turn into a source of frustration when pages load slowly. At the heart of this issue lies the encoding of images. To improve on this, we are introducing Jpegli, an advanced JPEG coding library that maintains high backward compatibility while offering enhanced capabilities and a 35% compression ratio improvement at high quality compression settings.

Jpegli is a new JPEG coding library that is designed to be faster, more efficient, and more visually pleasing than traditional JPEG. It uses a number of new techniques to achieve these goals, including:

  • Fully interoperable: Jpegli provides both an encoder and a decoder that comply with the original JPEG standard and its most conventional 8-bit formalism, as well as API/ABI compatibility with libjpeg-turbo and MozJPEG.
  • High quality results: When images are compressed or decompressed through Jpegli, more precise and psychovisually effective computations are performed, so images look clearer and have fewer observable artifacts.
  • Fast: While improving the image quality/compression density ratio, Jpegli's coding speed is comparable to traditional approaches such as libjpeg-turbo and MozJPEG. Web developers can integrate Jpegli into their existing workflows without sacrificing coding speed or memory use.
  • 10+ bits: Jpegli can encode images with 10+ bits per component. Traditional JPEG coding offers only 8-bit-per-component dynamics, causing visible banding artifacts in slow gradients. Jpegli's 10+ bit coding happens within the original 8-bit formalism, and the resulting images remain fully interoperable with 8-bit viewers. The 10+ bit dynamics are available as an API extension, and application code changes are needed to benefit from them.
  • More dense: Jpegli compresses images more efficiently than traditional JPEG codecs, which saves bandwidth and storage space and speeds up web pages.

How Jpegli works

Jpegli works by using a number of new techniques to reduce noise and improve image quality: mainly adaptive quantization heuristics from the JPEG XL reference implementation, improved quantization matrix selection, precise intermediate results, and the option to use a more advanced colorspace. All the new methods have been carefully crafted to use the traditional 8-bit JPEG formalism, so newly compressed images are compatible with existing JPEG viewers such as browsers, image processing software, and others.


Adaptive quantization heuristics

Jpegli uses adaptive quantization to reduce noise and improve image quality. This is done by spatially modulating the dead zone in quantization based on psychovisual modeling. Using adaptive quantization heuristics that we originally developed for JPEG XL, the result is improved image quality and reduced file size. These heuristics are much faster than a similar approach originally used in guetzli.


Improved quantization matrix selection

Jpegli also uses a set of quantization matrices that were selected by optimizing for a mix of psychovisual quality metrics. Precise intermediate results in Jpegli improve image quality, and both encoding and decoding produce higher quality results. Jpegli can use JPEG XL's XYB colorspace for further quality and density improvements.


Testing Jpegli

In order to quantify Jpegli's image quality improvement, we enlisted crowdsourced raters to compare pairs of images from the Cloudinary Image Dataset '22, encoded using three codecs: Jpegli, libjpeg-turbo, and MozJPEG, at several bitrates.

In this comparison we limited ourselves to comparing the encoding only; decoding was always performed using libjpeg-turbo. We conducted the study with the XYB ICC color profile disabled, since that is how we anticipate most users would initially use Jpegli. To simplify comparing the results across codecs and settings, we aggregated all the rater decisions using Elo scoring inspired by chess rankings.

A bar graph of ELO scores on the left and plot graph of ELO scores on the right
A higher Elo score indicates better aggregate performance in the rater study. We can observe that Jpegli at 2.8 BPP received a higher Elo rating than libjpeg-turbo at 3.7 BPP, a bitrate 32% higher than Jpegli's.

Results

Our results show that Jpegli can compress high quality images 35% more than traditional JPEG codecs.


Jpegli is a promising new technology that has the potential to make the internet faster and more beautiful.

By Zoltan Szabadka, Martin Bruse and Jyrki Alakuijala – Paradigms of Intelligence, Google Research

OSV and helping developers fix known vulnerabilities

Tuesday, April 2, 2024

In 2021, we launched the OSV project with a goal of enabling easy management of known vulnerabilities in open source software dependencies. To achieve this, we started by building an open source, comprehensive database (https://osv.dev) that accurately describes all known OSS vulnerabilities in the easy-to-use OpenSSF OSV Schema.

Over time, we worked with numerous open source communities to adopt the OSV Schema (totalling over 24 ecosystems), and introduced open source tools like our API and OSV-Scanner to directly make this database useful to developers.
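
As a small illustration of the API, the sketch below queries OSV.dev for known vulnerabilities affecting a specific package version using only the Python standard library; the package name and version are illustrative:

import json
import urllib.request

# Ask OSV.dev which known vulnerabilities affect this package version.
query = {
    "package": {"name": "jinja2", "ecosystem": "PyPI"},
    "version": "2.4.1",
}
request = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    result = json.load(response)

# Print the ID and summary of each matching advisory.
for vuln in result.get("vulns", []):
    print(vuln["id"], vuln.get("summary", ""))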

The OSV project takes a very developer-focused approach to vulnerability management, as we realize that day-to-day developers are often the ones who bear the burden of managing dependency updates and triaging vulnerabilities in their dependencies.

Today the OSV team is excited to announce some significant updates to the work we’ve been doing, and to share how the OSV project as a whole helps developers with vulnerability management today.


Announcing guided remediation

Developers are often faced with an overwhelming number of vulnerabilities reported against their dependencies. To tackle this, we’re announcing guided remediation, a new tool within OSV-Scanner that enables developers to interactively or automatically prioritize and fix the vulnerabilities that matter.

The basic usage of the tool provides a simple command for developers to run which will automatically fix as many vulnerabilities as possible by upgrading their project’s dependencies.

For developers who need or want finer control over vulnerability remediation, the tool also provides the more advanced interactive mode. In the interactive mode, developers can preview and make informed decisions on which packages to upgrade or which vulnerabilities they want to prioritize based on metrics such as vulnerability severity, dependency depth, or dependency type.

Filtering by all of these advanced metrics is also available via CLI flags for running the tool non-interactively, which enables integration of guided remediation into automated workflows. For example, developers can connect the tool with their CI/test pipelines to determine the set of non-breaking dependency upgrades.

Currently, the guided remediation tool supports npm package.json and package-lock.json dependencies, but we’ll be adding support for more ecosystems in the future.

Check out our detailed documentation for more information or if you would like to try it out for yourself!


OSV-Scanner GitHub action

We’ve also recently launched the OSV-Scanner GitHub action, which provides an easy way for developers to integrate vulnerability scanning using OSV.dev into their CI/CD pipelines. This is currently being used by TensorFlow and Flutter to provide continuous scanning of their dependencies.

Our GitHub Action can be configured to do the following:

  • Regular vulnerability scan workflow. A common use case is to set a schedule to regularly scan the repository, with the workflow failing if a new vulnerability is found. Another use case can be to block release workflows if a vulnerability is found.
  • Trigger a differential vulnerability scan to run when a pull request is opened. This workflow can determine if your changes introduce new vulnerabilities and can be configured to block pull requests when the action fails. Enabling just this feature can allow you to stop new vulnerabilities from being introduced, while not breaking your existing workflows.

Head over to our documentation to see a quick and easy guide on how to get started integrating the OSV-Scanner action into your GitHub repository.


Other OSV features

Guided remediation and GitHub Actions support are just one piece of our goal of making vulnerability management easier.

OSV also provides a broad suite of other features.


What’s next?

We still have a lot more exciting work planned! A remaining challenge for dealing with known vulnerabilities in dependencies is remediation and dealing with false positives. Much of our work is focused on improving data quality and providing accurate and actionable results that lead to easy remediation.

These include:

  • Iterating on guided remediation by addressing user feedback and adding support for additional ecosystems.
  • Improving container scanning. OSV-Scanner has so far focused on source repository scanning. One important gap we aim to fill is to provide better support for container scanning, in a way that provides actionable and useful remediation guidance, while minimizing false positives.
  • Continue to improve matching and data quality. A continuing focus for OSV-Scanner is making sure that our scanning is comprehensive and accurate. Accuracy is especially important for us, as one of our core goals is to minimize false positives and vulnerability noise for developers at the receiving end of the scanners through things such as reachability analysis.

Interested in using OSV in your project? Check out our OSV-Scanner and OSV.dev documentation for how to get started. Please share any feedback or bugs you encounter via our GitHub issue tracker.

By Michael Kedar, Rex Pan, and Oliver Chang – Google Open Source Security Team

Google Summer of Code 2024 contributor applications are open!

Monday, March 18, 2024

We are thrilled to announce that the Contributor applications for Google Summer of Code (GSoC) 2024 are now open! If you are a student or a beginner in open source software development and 18+ years old, we hope you will apply. The application period opened March 18th at 18:00 UTC and closes April 2nd at 18:00 UTC.

This year we are celebrating the 20th year of Google Summer of Code! During GSoC, contributors will spend 12+ weeks writing code and learning more about open source software development under the guidance of experienced mentors.

Since 2005, GSoC has welcomed thousands of new developers into the open source community every year. The GSoC program has brought together over 20,000 contributors from 116 countries and 19,000 mentors from 850+ open source organizations.

This year we have added more projects focused on Artificial Intelligence, Machine Learning and Security than ever before; keep in mind the following points before applying:

  • Consider the three project sizes: ~90 hours, ~175 hours, and ~350 hours and choose which time commitment is best for you.
  • Contributors can submit a maximum of 3 project proposals (to different orgs or even multiple proposals to the same org).

With GSoC contributor applications now open, please review these helpful tips to guide your application:

  • Read the program rules, FAQ, contributor guide, and advice for applying and join us in our Discord chat Channel to connect with the community.
  • Review the list of 195 mentoring organizations and use filters to sort by your interests, including programming language (Python, Rust, etc.) and category (data, development tools, artificial intelligence, infrastructure and cloud, security, etc.).
  • Narrow down your list to 2-4 organizations and review their ideas list.
  • Reach out to the organizations via their contact methods listed on the GSoC site immediately.
  • Engage with the organization early and often, good communication is key! You must talk to the organization about your proposal before the application period ends if you want to be accepted into the program.
  • Watch our Intro to GSoC video, as well as the GSoC Org Highlight videos and Community Talks series, to get inspired about projects that contributors have worked on in the past.

Interested contributors may register and submit project proposals on the GSoC site from now until Tuesday, April 2nd at 18:00 UTC.

Best of luck to all our applicants!

By Stephanie Taylor – Program Manager, and Lucila Ortiz – Associate Program Manager for the Google Open Source Programs Office

PJRT Plugin to Accelerate Machine Learning

Wednesday, March 13, 2024

PJRT is an open, stable interface for device runtime and compiler, which simplifies ML hardware and framework integration. With PJRT, ML frameworks become hardware-agnostic and ML hardware becomes pluggable. For the ML developer, it simplifies the adoption of new ML hardware, and models become more portable. This addresses ML infrastructure fragmentation across frameworks, compilers, and runtimes, enhancing the industry’s ability to productionize ML-driven advancements with velocity and at scale.

This article provides an overview of what building a PJRT plugin entails, how frameworks (and models) can use this plugin, and some updates on the PJRT API. PJRT is now used by a growing spectrum of hardware: Apple silicon, Google Cloud TPU, NVIDIA GPU, and Intel Max GPU. We also share a spotlight on Apple’s adoption of PJRT with some details on the workflow and performance.

If you’re developing an ML hardware accelerator or developing your own compiler and runtime, check out the PJRT source code on GitHub and sign up for the PJRT mailing list to quickly bootstrap your work.

What’s in a PJRT Plugin

PJRT was introduced to simplify the growing complexity of ML workload execution across hardware and frameworks. PJRT (used in conjunction with StableHLO) is a stable interface for device runtime and compiler, which abstracts away device specific implementations from frameworks.

An implementation of the PJRT API is called a PJRT plugin, which is usually a Python package for seamless ML model developer experience. To build a PJRT plugin for a hardware target, the following methods need to be implemented:

  • Compile: compile (program) -> executable
  • Runtime: execute (executable, arguments) -> results
  • Memory management: transfer buffer from host to device, device to host, device to device, as well as buffer management such as buffer donation
  • Topology information, such as the platform, how many accelerators are present, and how they are attached.

ML frameworks will discover and load one or multiple PJRT plugins, and call the PJRT API to compile and execute the model. The PJRT plugins may need to register with the ML frameworks, depending on the specific discovery mechanism the framework uses.

API Updates

Versioning and ABI Compatibility

The PJRT API has a major version and a minor version. If the framework is newer than the plugin, the framework provides an N-week (N=6 today) forward compatibility window for minor version updates. Major version updates are coordinated updates. Frameworks will not support plugins with a lower major version. If the plugin is newer than the framework, plugins will define their own backward compatibility policy.

Multi-Node

A PJRT client is per-node, so the plugin may need a way to communicate among nodes in a distributed workload. The framework can pass key-value store callbacks to the plugin, which the plugin can use to bootstrap multi-node initialization and other coordination needs. An example with the NVIDIA GPU CUDA plugin is as follows (a minimal JAX sketch appears after the list):

  • JAX starts a distribution service and provides key-value store callbacks.
  • NVIDIA GPU CUDA plugin uses these callbacks to (1) generate global PJRT device topology that includes PJRT device information from all nodes, and (2) generate NCCL ids.
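
A minimal sketch of the framework side of this flow in JAX is shown below; the coordinator address, process count, and process id are illustrative and would normally come from your cluster scheduler:

import jax

# Each process in the distributed workload calls this once at startup.
# JAX starts (or connects to) its coordination service and passes key-value
# store callbacks down to the PJRT plugin, which uses them to exchange
# topology information and NCCL ids across nodes.
jax.distributed.initialize(
    coordinator_address="10.0.0.1:1234",  # illustrative address of process 0
    num_processes=2,                      # illustrative total process count
    process_id=0,                         # this process's rank (0 or 1 here)
)

# After initialization, devices from all nodes are visible.
print(jax.devices())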

DLPack

A few C APIs were added to PJRT to support DLPack (a framework-level sketch follows the list below):

  • PJRT_Client_CreateViewOfDeviceBuffer supports receiving buffers from DLPack.
  • Exporting buffers to DLPack uses PJRT_Buffer_IncreaseExternalReferenceCount and PJRT_Buffer_DecreaseExternalReferenceCount, together with PJRT_Buffer_OpaqueDeviceMemoryDataPointer to obtain the underlying device memory pointer.
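
At the framework level, this plumbing surfaces as array exchange through the DLPack protocol. Here is a minimal sketch using NumPy and JAX, assuming recent versions of both and CPU-resident arrays; whether the exchange is zero-copy depends on the plugin and device:

import numpy as np
import jax.dlpack

# Import an external array into JAX through DLPack (NumPy arrays expose
# __dlpack__, so no explicit capsule handling is needed).
external = np.arange(8, dtype=np.float32)
jax_array = jax.dlpack.from_dlpack(external)

# Export the JAX array back out; JAX arrays also implement __dlpack__, so any
# DLPack-aware consumer (here NumPy) can receive them.
roundtrip = np.from_dlpack(jax_array)
print(jax_array, roundtrip)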

Extension

The PJRT API provides an extension mechanism through which a plugin can offer optional or experimental features. These extensions can have their own compatibility guarantees and do not need to follow the ABI compatibility of the PJRT API.

Industry Adoption

PJRT is the only interface for JAX, the primary interface for TensorFlow and fully supported for PyTorch through PyTorch/XLA. PJRT is not tied to a specific compiler and runtime. The toolchain-independent architecture and open-source availability as part of the OpenXLA Project allows it to be leveraged by any hardware, framework or compiler, with extensibility for unique features. This has allowed PJRT to be adopted by various industry partners through close collaboration. A brief account of Apple’s adoption of PJRT follows.

JAX on Apple Silicon

Apple’s PJRT plugin for the Metal training backend accelerates JAX models on Apple silicon and AMD GPUs. This empowers any ML developer to leverage the full potential of Apple silicon and AMD GPUs on their Apple hardware to accelerate JAX models for faster experimentation. The integration and user experience for accelerating JAX on Apple silicon GPUs is similar to the existing PyTorch and TensorFlow implementations.

The Metal plug-in uses the OpenXLA compiler and PJRT runtime to optimize and accelerate JAX workloads on GPU. When a JAX program is executed, the JAX graph is lowered into StableHLO, which is then passed to PJRT for compilation and execution. The StableHLO is converted to MPSGraph executables and the Metal runtime APIs are invoked to dispatch to the GPU.

Performance

The Metal backend with the PJRT plugin provides impressive performance speedups for JAX. On an Apple MacBook Pro with M2 Max, training common networks in JAX sees performance speedups of up to 28x, with an average of 10x, over a CPU baseline. This empowers any ML developer to leverage the full potential of Apple silicon on their Apple hardware to accelerate JAX models for faster experimentation.

graph of performance speedups of up to 28x on Apple MacBook Pro with M2 Max over CPU for JAX training.
Figure 1: Performance speedups of up to 28x on Apple MacBook Pro with M2 Max over CPU for JAX training.

Getting Started

Adding Metal support to JAX is as simple as a single pip install:

python -m pip install jax-metal
python -c 'import jax; print(jax.numpy.arange(10))'

For more details on environment setup and installation of JAX on Apple hardware, please refer to the Metal Developer Resources page.

Google Cloud TPU

PJRT is the default runtime for PyTorch 2.0 on Google Cloud TPU. The GitHub README has more details.

NVIDIA GPU

The NVIDIA GPU CUDA implementation in JAX is extracted and packaged as a PJRT plugin. ML model developers can install the NVIDIA GPU CUDA plugin from PyPI. This plugin uses newly added features such as multi-node support, DLPack, and extensions.

Intel GPU

Intel is leveraging PJRT in Intel® Extension for TensorFlow to provide the Intel GPU backend for TensorFlow, JAX and PyTorch. The example of executing a JAX program on Intel GPU demonstrates how this greatly simplifies the framework and hardware integration.

PJRT Resources

PJRT is available on GitHub: source code for the API, integration guides and issues. If you develop ML frameworks, compilers, runtimes or are interested in improving portability of workloads across hardware, we want your feedback. We encourage you to contribute code, design ideas and feature suggestions. We also invite you to join the PJRT mailing list to stay updated with the latest product and community announcements and to help shape the future of an interoperable ML infrastructure.

Acknowledgements

Chalana Bezawada, Daniel Doctor, Kulin Seth, Shuhan Ding from Apple 
Penporn Koanantakool, Peter Hawkins, Skye Wanderman-Milne, Xiao Yu from Google.

By Aman Verma – Product Manager, Machine Learning Infrastructure, Google and Jieying Luo – Software Engineer, Machine Learning Infrastructure, Google

Mentor organizations announced for Google Summer of Code 2024

Wednesday, February 21, 2024

We are thrilled to share that we have 195 open source projects that have been selected for Google Summer of Code (GSoC) 2024! This year we are excited to welcome 30 new organizations for their first year as part of the program.

Check out our program site to view the complete list of GSoC 2024 accepted mentoring organizations. Get to know more about each organization on their GSoC program page, which includes reading through the project ideas that they are looking for GSoC contributors to work on this year.

Are you interested in being a GSoC Contributor?

The 2024 GSoC program is open to students and to beginners in open source software development. Contributor applications will open on Monday, March 18, 2024 at 18:00 UTC with a deadline of Tuesday, April 2, 2024 18:00 UTC to submit your application (including your project proposal).

If you are eager to enhance your chances of becoming a successful contributor this year, we highly recommend beginning your preparations and initiating communication with the organizations that interest you right away. Below are some tips for prospective GSoC contributors to accomplish before the application period begins March 18th:

  • Watch our ‘Introduction to GSoC’ video to see a quick overview of the program, and view our Community Talks or Org Highlight Videos to get inspired and learn more about some projects that contributors have worked on in the past.
  • Check out the Contributor Guide (so much great info in here!) and Advice for Applying to GSoC doc.
  • Review the list of accepted organizations here. We recommend finding two to four that interest you and reading through their project ideas lists. Use the filters on the site to help you narrow down based on the programming languages you are familiar with and the categories that interest you (cloud, AI, security, science, etc.).
  • As soon as you see an idea that sparks your interest, reach out to the organization via their preferred communication methods (listed on their org page on the GSoC program site). The earlier you start the conversation, the better your chances of being accepted as a GSoC contributor.
  • Talk with the mentors and community to determine if this project idea is something you would enjoy working on during the program. Find a project that excites you, otherwise it may be a challenging summer for you and your mentor.
  • Use the information you received during your communications with the mentors and other org community members to write up your proposal.

You can find more information about the program on our website which includes a full timeline of important dates. We also urge anyone interested in applying to read the FAQ and Program Rules and watch some of our other videos with more details about GSoC for contributors and mentors.

A hearty welcome—and thank you—to all of our mentor organizations. We look forward to working with all of you during this 20th year of Google Summer of Code!

By Stephanie Taylor – Google Open Source

Building Open Models Responsibly in the Gemini Era

Google has long believed that open technology is not only good for our company, but good for the industry, consumers, and the world. We’ve released open-source projects like Android and Chromium that transformed access to mobile and web technologies, and have done the same in AI with Transformers, TensorFlow, and AlphaFold. The release of our Gemma family of open models is a next step in how we’re deepening our commitment to open technology alongside an industry-leading safe, responsible approach. At the same time, the rapidly evolving nature of AI raises important considerations for how to enable safety-aligned open models: an approach that supports broad innovation while promoting safe uses.

A benefit of open source is that once it is released, its license gives users full creative autonomy. This is a powerful guarantee of technology access for developers and end users. Another benefit is that open-source technology can be modified to fit the unique use case of the end user, without restriction.

In the hands of a malicious actor, however, the lack of restrictions can raise risks. Computing has been through similar cycles before, addressing issues such as protecting users of the open internet, handling cryptography, and addressing open-source software security. We now face this challenge with AI. Below we share the approach we took to openly releasing Gemma models, and the advancements in open model safety we hope to accelerate.


Providing access to Gemma open models

Today, Gemma models are being released as what the industry collectively has begun to refer to as “open models.” Open models feature free access to the model weights, but terms of use, redistribution, and variant ownership vary according to a model’s specific terms of use, which may not be based on an open-source license. The Gemma models’ terms of use make them freely available for individual developers, researchers, and commercial users for access and redistribution. Users are also free to create and publish model variants. In using Gemma models, developers agree to avoid harmful uses, reflecting our commitment to developing AI responsibly while increasing access to this technology.

We’re precise about the language we’re using to describe Gemma models because we’re proud to enable responsible AI access and innovation, and we’re equally proud supporters of open source. The definition of "Open Source" has been invaluable to computing and innovation because of requirements for redistribution and derived works, and against discrimination. These requirements enable cross-industry collaboration, individual innovation and entrepreneurship, and shared research to happen with exponential effects.

However, existing open-source concepts can’t always be directly applied to AI systems, which raises questions on how to use open-source licenses with AI. It’s important that we carry forward open principles that have made the sea-change we’re experiencing with AI possible while clarifying the concept of open-source AI and addressing concepts like derived work and author attribution.


Taking a comprehensive approach to releasing Gemma safely and responsibly

Licensing and terms of use are only one part of the evaluations, technical tools, and considered decision-making that went into aligning this release with our responsible AI Principles. Our approach involved:

  • Systematic internal review in accordance with our AI Principles: Consistent with our AI Principles, we release models only when we have determined the benefits are significant, and the risks of misuse are low or can be mitigated. We take that same approach to open models, incorporating a balance of the benefits of wider access to a particular model as well as the risks of misuse and how we can mitigate them. With Gemma, we considered the increased AI research and innovation by us and many others in the community, the access to AI technology the models could bring, and what access was needed to support these use cases.
  • A high evaluation bar: Gemma models underwent thorough evaluations, and were held to a higher bar for evaluating risk of abuse or harm than our proprietary models, given the more limited mitigations currently available for open models. These evaluations cover a broad range of responsible AI areas, including safety, fairness, privacy, societal risk, as well as capabilities such as chemical, biological, radiological, nuclear (CBRN) risks, cybersecurity, and autonomous replication. As described in our technical report, the Gemma models exhibit state-of-the-art safety performance in human side-by-side evaluations.
  • Responsibility tools for developers: As we release the Gemma models, we are also releasing a Responsible Generative AI Toolkit for developers, providing guidance and tools to help them create safer AI applications.

We continue to evolve our approach. As we build these frameworks further, we will proceed thoughtfully and incorporate what we learn into future model assessments. We will continue to explore the full range of access mechanisms, with benefits and risk mitigation in mind, including API-based access and staged releases.


Advancing open model safety together

Many of today’s AI safety tools are designed for systems where the design approach assumes restricted access and redistribution, as well as auxiliary controls like query filters. Similarly, much of the AI safety research for improving mitigations takes on the design assumptions of those systems. Just as we have created unique threat models and solutions for other open technology, we are developing safety and security tools appropriate for the differences of openly available AI.

As models become more and more capable, we are conducting research and investing in rigorous safety evaluation, testing, and mitigations for open models. We are also actively participating in conversations with policymakers and open-source community leaders on how the industry should approach this technology. This challenge is multifaceted, just like AI systems themselves. Model-sharing platforms like Hugging Face and Kaggle, where developers inspire each other with novel model iterations, play a critical role in efforts to develop open models safely; there is also a role for the cybersecurity community to contribute learnings and best practices.

Building those solutions requires access to open models, sharing innovations and improvements. We believe sharing the Gemma models will not just help increase access to AI technology, but also help the industry develop new approaches to safety and responsibility.

As developers adopt Gemma models and other safety-aligned open models, we look forward to working with the open-source community to develop more solutions for responsible approaches to AI in the open ecosystem. A global diversity of experiences, perspectives, and opportunities will help build safe and responsible AI that works for everyone.

By Anne Bertucio – Sr Program Manager, Open Source Programs Office; Helen King – Sr Director of Responsibility, Google DeepMind
