opensource.google.com

Menu

Posts from 2020

From MLPerf to MLCommons: moving machine learning forward

Thursday, December 3, 2020

Today, the community of machine learning researchers and engineers behind the MLPerf benchmark is launching an open engineering consortium called MLCommons. For us, this is the next step in a journey that started almost three years ago.


Early in 2018, we gathered a group of industry researchers and academics who had published work on benchmarking machine learning (ML), in a conference room to propose the creation of an industry standard benchmark to measure ML performance. Everyone had doubts: creating an industry standard is challenging under the best conditions and ML was (and is) a poorly understood stochastic process running on extremely diverse software and hardware. Yet, we all agreed to try.

Together, along with a growing community of researchers and academics, we created a new benchmark called MLPerf. The effort took off. MLPerf is now an industry standard with over 2,000 submitted results and multiple benchmarks suites that span systems from smartphones to supercomputers. Over that time, the fastest result submitted to MLPerf for training the classic ML network ResNet improved by over 13x.

We created MLPerf because we believed in three principles:
  • Machine learning has tremendous potential: Already, machine learning helps billions of people find and understand information through tools like Google’s search engine and translation service. Active research in machine learning could one day save millions of lives through improvements in healthcare and automotive safety.
  • Transforming machine learning from promising research into wide-spread industrial practice requires investment in common infrastructure -- especially metrics: Much like computing in the ‘80s, real innovation is mixed with hype and adopting new ideas is slow and cumbersome. We need good metrics to identify the best ideas, and good infrastructure to make adoption of new techniques fast and easy.
  • Developing common infrastructure is best done by an open, fast-moving collaboration: We need the vision of academics and the resources of industry. We need the agility of startups and the scale of leading tech companies. Working together, a diverse community can develop new ideas, launch experiments, and rapidly iterate to arrive at shared solutions.
Our belief in the principles behind MLPerf has only gotten stronger, and we are excited to be part of the next step for the MLPerf community with the launch of MLCommons.

MLCommons aims to accelerate machine learning to benefit everyone. MLCommons will build a a common set of tools for ML practitioners including:
  • Benchmarks to measure progress: MLCommons will leverage MLPerf to measure speed, but also expand benchmarking other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
  • Public datasets to fuel research: MLCommons new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude, better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision. 
  • Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmark), develop, and ultimately deploy.
Google believes in the potential of machine learning, the importance of common infrastructure, and the power of open, collaborative development. Our leadership in co-founding, and deep support in sustaining, MLPerf and MLCommons has echoed our involvement in other efforts like TensorFlow and NNAPI. Together with the MLCommons community, we can improve machine learning to benefit everyone.

Want to get involved? Learn more at mlcommons.org.


By Peter Mattson – ML Metrics, Naveen Kumar – ML Performance, and Cliff Young – Google Brain

Best practices for accessibility for virtual events

Tuesday, December 1, 2020





As everyone knows, most of our open source events have transformed from in-person to digital this year. However, due to the sudden change, not everything is accessible. We took this issue seriously and decided to work with one of our accessibility experts, Neighborhood Access, to share best practices for our community. We hope this will help you organize your digital events!

How Google’s 2020 summer interns became the newest contributors in open source

Thursday, November 19, 2020

Our internship program changed in structure this year to accommodate a virtual environment, and we enjoyed seeing the intern involvement in our open source teams. Now, as the Summer 2020 Interns have departed Google, we’ve seen widespread impact across these OSS projects. Some accomplishments from the intern community included:
  • Mohamed Ibrahim, a Software Engineering major at the University of Ontario Institute of Technology, interned on the Earth Engine team in Geo. He built a web app from scratch that allows Earth Engine developers, who are primarily climate and remote-sensing researchers, to build rich UIs for their Earth Engine Apps without needing to write any code. Mohamed also learned two coding languages unfamiliar to him, enabling him to write over 10,000 lines of TypeScript, 480 lines of Go, and merge over 30 PRs during one internship.
App creator demo
Web app demo
  • Vismita Uppalli, a Cloud intern and Computer Science major at the University of Virginia, wrote a tutorial showing how to use AI Platform Operators on Apache Airflow, which is now published in the official Airflow docs.
  • Colin Marsch interned with the Android team and published a blog post for Android developers, "Re-writing the AOSP DeskClock App in Kotlin," which has reached over 1,600 viewers! He is scheduled to graduate from the University of Waterloo with a major in Computer Science in Spring 2021.
  • Satyam Ralhan worked in the MyHeart team in Research to build a first-of-its-kind Android app that engages users in conversations to encourage healthy habits. He created a demo, which explores the different phases of the app and how it learns to personalize lifestyle suggestions for various kinds of users. He is in his fourth year at the Indian Institute of Technology, Kanpur, studying Computer Science and Engineering.
    MyHeart app demo
  • An Apigee intern, Nicole Gizzo, presented her work analyzing API vocabularies at the API Specifications Conference. She is majoring in Computer Science and Cognitive Science at Rensselaer Polytechnic Institute, and will graduate in May 2021.
  • The OSS Fuzzing Interns have found and reported over 600 bugs to critical open source projects like the Linux kernel and Nginx, over 100 of which were security vulnerabilities.
  • Madelyn Dubuk, a SWE Intern on the Cloud DPE team and a Computer Science major at USC, worked with three other interns to create a full stack web app to help better understand test flakiness, and enjoyed working directly with other interns.
Initial feedback from our interns indicates that their OSS contributions won’t stop when their internships end. Of the interns who worked on OSS projects, 69% plan to continue contributing to OSS, enjoying the ability to talk about their work and have a broader impact. Beyond the impact on OSS, we’ve seen tremendous professional growth for our interns. Lucia Cantu-Miller, an intern on the Chrome team and Computer Science major at ITESM Monterrey, reflected she was, “proud of seeing how I’ve grown during the internship. As the days passed I became more confident in my work and in asking questions, I have grown a lot as a person and as a professional student.” Lucia wasn’t the only intern to experience this as 98% of interns who worked on OSS feel that Google is a good place to start a career. The success of this summer’s Internship is due in large part to the many contributions of Google’s OSS community—from the intern hosts to the project champions and mentors—we can’t thank them enough for their support. 

By Emma Stamp, Google Engineering Education

Kubernetes: Efficient Multi-Zone Networking with Topology Aware Routing

Wednesday, November 18, 2020

Topology Aware Routing of Services, a feature that was first introduced as alpha in the Kubernetes 1.17 release, aims to solve an often overlooked issue with Kubernetes Services; that they are not region aware.

Kubernetes services provide a uniform, durable, and easy to use method of accessing a variety of different backend applications. These backend applications are most commonly an exposed app running within your pods. Kubernetes does this by reserving a static virtual IP and DNS name, unique to it throughout the cluster and turning them into simple load balancers.

While this model is great for small clusters or applications, if you have thousands of nodes, your cluster spans multiple regions, or your application is latency sensitive then the service model can start to break down a bit. By default, each endpoint in a service has an equal opportunity to be selected as the destination. If you’re accessing a service with a backend hosted in the same zone, there’s a high probability that you’d be directed to a pod in a completely separate zone—likely in a completely separate region—and is what Topology Awareness intends to solve.

The Topology Aware Routing of Services feature added the concept of topologyKeys as an additional field in service objects. It allows you to define a set of node labels that could be used to route traffic closer to where it originated from.

Example Service with topologyKeys


apiVersion: v1
kind: Service
metadata:
  Name: my-app-web
spec:
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  topologyKeys:

    - "topology.kubernetes.io/zone"

    - "topology.kubernetes.io/region"

In this example, the service makes use of some commonly used labels for its topology preferences. It signals that when kube-proxy is routing traffic for that service, it should only route to pods within the same zone or region the traffic is originating from.

This is great! Traffic should remain “close” to where it originated and remove unnecessary latency.

While topologyKeys is available as alpha in 1.17, it hasn’t yet graduated to the next stage because the first pass at building topology-aware routing surfaced many challenges and scalability issues.

Each node in the cluster now has to manage a potentially complex ruleset for each service that would require more frequent updating. In clusters with thousands of pods or thousands of nodes, this solution quickly becomes untenable.

Another pain point with this implementation depends on how your application was distributed across a zone or region, as it's quite possible that a singular pod would be receiving ALL traffic for that zone or region. The preference list doesn’t take into account the performance of the pod on the receiving end and could potentially cause an outage.

These problems have led the Kubernetes Network Special Interest Group (SIG) to do a full re-evaluation of how to approach the Topology Awareness implementation.

What’s Planned for Topology Aware Routing?
The new design is intended to automatically handle the routing of services so that they will be load-balanced across a minimum number of the closest possible endpoints. It does this by applying an algorithm using two of the topology keys to signal affinity for service routing: topology.kubernetes.io/region and topology.kubernetes.io/zone without having to specify them via topologyKeys at the service level.

This algorithm works by establishing a dynamic threshold for a service where it calculates an expected number of endpoints per zone. It then builds a list of available endpoints for that service, prioritizing the ones that are in the same zone. If there are not enough endpoints to meet that expected number, it adds them from other zones until it reaches its expected number of endpoints. This list of expected endpoints, or a subset of endpoints are then passed to the nodes within that zone.

These nodes no longer have to maintain the complex set of rules like they had in the first iteration, and now just manage the small subset of endpoints for each service. This is less flexible than its predecessor, but it drastically reduces the performance overhead when compared to the previous method, while also covering the majority of use-cases. A big win for everyone.

These features are slated to graduate to alpha in the 1.21 release in the first part of 2021. If Topology Aware Routing would be of value to you, please consider taking the time to test it when it becomes available. Early feedback is highly appreciated and helps shape the direction of the feature.

Until then, if you’d like to learn more about Service Topology, Endpoint Slice, and the various algorithms that have been evaluated for service routing, check out Rob Scott’s presentation: Improving Network Efficiency with Topology Aware Routing, on November 19th, at KubeCon + CloudNativeCon North America.



By Bob Killen, Program Manager – Google Open Source Programs Office

Welcome Android Open Source Project (AOSP) to the Bazel ecosystem

Monday, November 16, 2020

After significant investment in understanding how best to build the Android Platform correctly and quickly, we are pleased to announce that the Android Platform is migrating from its current build systems (Soong and Make) to Bazel. While components of Bazel have been already checked into the Android Open Source Project (AOSP) source tree, this will be a phased migration over the next few Android releases which includes many concrete and digestible milestones to make the transformation as seamless and easy as possible. There will be no immediate impact to the Android Platform build workflow or the existing supported Android Platform Build tools in 2020 or 2021. Some of the changesto support Android Platform builds are already in Bazel, such as Bazel’s ability to parse and execute Ninja files to support a gradual migration.

Migrating to Bazel will enable AOSP to:
  • Provide more flexibility for configuring the AOSP build (better support for conditionals)
  • Allow for greater introspection into the AOSP build progress and dependencies
  • Enable correct and reproducible (hermetic) AOSP builds
  • Introduce a configuration mechanism that will reduce complexity of AOSP builds
  • Allow for greater integration of build and test activities
  • Combine all of these to drive significant improvements in build time and experience
The benefits of this migration to the Bazel community are:
  • Significant ongoing investment in Bazel to support Android Platform builds
  • Expansion of the Bazel ecosystem and community to include, initially, tens of thousands of Android Platform developers and Android handset OEMs and chipset vendors.
  • Google’s Bazel rules for building Android apps will be open sourced, used in AOSP, and maintained by Google in partnership with the Android / Bazel community
  • Better Bazel support for building Android Apps
  • Better rules support for other languages used to build Android Platform (Rust, Java, Python, Go, etc)
  • Strong support for Bazel Long Term Support (LTS) releases, which benefits the expanded Bazel community
  • Improved documentation (tutorials and reference)
The recent check-in of Bazel to AOSP begins an initial pilot phase, enabling Bazel to be used in place of Ninja as the execution engine to build AOSP. Bazel can also explore the AOSP build graph. We're pleased to be developing this functionality directly in the Bazel and AOSP codebases. As with most initial development efforts, this work is experimental in nature. Remember to use the currently supported Android Platform Build System for all production work.

We believe that these updates to the Android Platform Build System enable greater developer velocity, productivity, and happiness across the entire Android Platform ecosystem.

By Joe Hicks on behalf of the Bazel and AOSP infrastructure teams

Get ready for BazelCon 2020

Wednesday, November 11, 2020

With only 24 hours to go, BazelCon 2020 is shaping up to be a much anticipated gathering for the Bazel community and broader Build ecosystem. With over 1000 attendees, presentations by Googlers, as well as talks from industry Build leaders from Twitter, Dropbox, Uber, Pinterest, GrabTaxi, and more, we hope BazelCon 2020 will provide an opportunity for knowledge sharing, networking, and community building.

I am very excited by the keynote announcements, the migration stories at Twitter, Pinterest, and CarGurus, as well as technical deep dives on Bazel persistent workers, incompatible target skipping, migrating from Gradle to Bazel, and more. The “sold out” Birds of a Feather sessions and the Live Q&A with the Bazel team will bring the community together to discuss design docs, look at landings, and provide feedback on the direction of Bazel and the entire ecosystem.

We are also pleased to announce that, starting with the next major release (4.0), Bazel will support Long Term Support (LTS) releases as well as regular Rolling releases.

Some benefits of this new release cadence are:
  • Bazel will release stable, supported LTS releases on a predictable schedule with a long window without breaking changes
  • Bazel contributors / rules owners can prepare to support future LTS releases via rolling releases.
  • Bazel users can choose the release cadence that works best for them, since we will offer both LTS releases and rolling releases.
Long Term Support (LTS) releases:
  • We will create an LTS release every ~9 months => new LTS release branch, increment major version number.
  • Each LTS release will include all new features, bug fixes and (breaking) changes since the last major version.
  • Bazel will actively support each LTS branch for 9 months with critical bug fixes, but no new features.
  • Thereafter, Bazel will provide maintenance for two additional years with only security and OS compatibility fixes.
  • Bazel Federation reboot: Bazel will provide strong guidance about the ruleset versions that should be used with each Bazel release so that each user will not have to manage interoperability themselves.
Make sure that you register at http://goo.gle/bazelcon to be a part of the excitement of the premier build conference!

See you all at BazelCon 2020!

By Joe Hicks and the entire Bazel Team at Google

Google’s initiative for more inclusive language in open source projects

Tuesday, November 10, 2020

Certain terms in open source projects reinforce negative associations and unconscious biases. At Google, we want our language to be inclusive. The Google Open Source Programs Office (OSPO) created and posted a policy for new Google-run projects to remove the terms “slave,” “whitelist,” and “blacklist,” and replace them with more inclusive alternatives, such as “replica,” “allowlist,” and “blocklist.” OSPO required that new projects follow this policy beginning October 2020, and has plans to enforce these changes on more complex, established projects beginning in 2021. 


To ensure this policy was implemented in a timely manner, a small team within OSPO and Developer Relations orchestrated tool and policy updates and an open-source specific fix-it, a virtual event where Google engineers dedicate time to fixing a project. The fix-it focused on existing projects and non-breaking changes, but also served as a reminder that inclusivity is an important part of our daily work. Now that the original fix-it is over, the policy remains and the projects continue.

For more information on why inclusive language matters to us, you can check out Google Developer Documentation Style Guide which contains a section on word-choice with useful, clearer alternatives. Regardless of the phrases used, it is necessary to understand that certain terms reinforce biases and that replacing them is a positive step, both in creating a more welcoming atmosphere for everyone and in being more technically accurate. In short, words matter.


By Erin Balabanian, Open Source Compliance.

Security scorecards for open source projects

Monday, November 9, 2020

When developers or organizations introduce a new open source dependency into their production software, there’s no easy indication of how secure that package is.

Some organizations—including Google—have systems and processes in place that engineers must follow when introducing a new open source dependency, but that process can be tedious, manual, and error-prone. Furthermore, many of these projects and developers are resource constrained and security often ends up a low priority on the task list. This leads to critical projects not following good security best practices and becoming vulnerable to exploits. These issues are what inspired us to work on a new project called “Scorecards” announced last week by the Open Source Security Foundation (OpenSSF). 

Scorecards is one of the first projects being released under the OpenSSF since its inception in August, 2020. The goal of the Scorecards project is to auto-generate a “security score” for open source projects to help users as they decide the trust, risk, and security posture for their use case. Scorecards defines an initial evaluation criteria that will be used to generate a scorecard for an open source project in a fully automated way. Every scorecard check is actionable. Some of the evaluation metrics used include a well-defined security policy, code review process, and continuous test coverage with fuzzing and static code analysis tools. A boolean is returned as well as a confidence score for each security check. Over time, Google will be improving upon these metrics with community contributions through the OpenSSF.

Check out the Security Scorecards project on GitHub and provide feedback. This is just the first step of many, and we look forward to continuing to improve open source security with the community.

By Kim Lewandowski, Dan Lorenc, and Abhishek Arya, Google Security team


Releasing the Healthcare Text Annotation Guidelines

Friday, October 30, 2020

The Healthcare Text Annotation Guidelines are blueprints for capturing a structured representation of the medical knowledge stored in digital text. In order to automatically map the textual insights to structured knowledge, the annotations generated using these guidelines are fed into a machine learning algorithm that learns to systematically extract the medical knowledge in the text. We’re pleased to release to the public the Healthcare Text Annotation Guidelines as a standard.

Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.

The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:
  • The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
  • Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
  • Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).
Google consulted with industry experts and academic institutions in the process of assembling the Healthcare Text Annotation Guidelines. We took inspiration from other open source and research projects like i2b2 and added context to the guidelines to support information extraction needs for industry-applications like Healthcare Effectiveness Data and Information Set (HEDIS) quality reporting. The data types contained in the Healthcare Text Annotation Guidelines are a common denominator across information extraction applications. Each industry application can have additional information extraction needs that are not captured in the current version of the guidelines. We chose to open source this asset so the community can tailor this project to their needs.

We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.

By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI

Peer Bonus Experiences: The many ways in which you can contribute to open source

Tuesday, October 27, 2020

Recently, I was awarded a Google Open Source Peer Bonus, which I’m grateful for, as it proved to me that one can contribute value to open source projects, and build a career in it, without extensive experience coding. So how can someone with limited coding skills like me contribute to open source in a meaningful way?

Documentation

Documentation is important across open source and especially helpful to those who are new to a project! Developers and maintainers of projects are often focused on fixing bugs and improving the software. Therefore, documentation is harder to prioritize, so contributions to documentation are highly appreciated. Being experienced with applications won’t always help you in writing the documentation, since familiarity can cause you to miss a step when creating the doc. This is why, as a beginner, you are in an excellent position to ensure that instructions and step-by-step guides are easy to follow, don’t skip vital steps, and don’t use off-putting language.

If you have the opportunity to get involved in programs like Season of Docs as a mentor or a participant, as I did in 2019, the experience is hugely rewarding!

Events and Conferences

If you can help with mailing lists or organizing events, you can get involved in the community! In 2006, I became involved with the nascent Open Source Geospatial Foundation (OSGeo), where I was persuaded to set up a local chapter in the United Kingdom (going strong 14 years later!). It was one of the best things I could’ve done. This year we hosted a global conference (FOSS4G) and several UK events, including an online-only event. We’ve also managed to financially support a number of open source projects by providing an annual sponsorship, or by contributing to the funding of a specific improvement. I’ve met so many great people through my involvement in OSGeo, some of which have become colleagues and good friends.
The group meeting at FOSS4G 2013 in NottinghamAdd caption

If you’re interested in writing case studies, you can always speak about your experiences at conferences. Evidence that particular packages can be used successfully in real-world situations are incredibly valuable, and can help others put together business cases for considering an open source solution.

Assisting others

Sometimes the problems you face with technology can be experienced by money, and by open-sourcing your solution you could be impacting a lot of people. When I first started using open source software, the packages I needed were often hard to install and configure on Windows, having to be started using the command prompt, which can be intimidating for beginners. To scratch a problem-solving itch, I packaged them up onto a USB stick, added some batch files to make them load properly from an external drive, added a little menu for starting them, and Portable GIS was born. After 12 years, a few iterations, a website and a GitLab repository, it has been downloaded thousands of times, and is used in situations such as disaster relief, where installing lots of software rapidly on often old PCs is not really an option.

Mentoring Others

Once you are proficient in something, use your knowledge to help others. Some existing platforms for software use and development (online repositories like GitHub or GitLab) are extremely intimidating to new users, and create barriers to participation. If you can help people get over the fear-inducing first pull request, you will empower them to keep on contributing. My first pull request was a contribution to the Vaguely Rude Place-names map back in 2013 and since then I’ve run few training events along a similar line at conferences.

Open source is now fundamental to my career—16 years after learning about it—and something I am truly passionate about. It has shaped my life in many ways. I hope that my experiences might help someone who isn’t versed in code to get involved, realizing that their contributions are equally as valuable as bug fixes and patches.

By Jo Cook, Astun Technology—Guest Author

Google Summer of Code 2021 will bring some changes

Monday, October 26, 2020

Google Open Source is pleased to announce the 2021 cycle of the Google Summer of Code (GSoC) program, which will be our 17th consecutive year bringing students into open source communities. Over the past 16 years Google Summer of Code has brought over 16,000 student developers from 111 countries into 715 open source organizations big and small.

Some exciting changes are coming to the 2021 GSoC as we make adjustments to add more flexibility into the program for students and mentors alike.
  • With the pandemic straining folks’ time we are changing the size of the projects and time commitment students are expected to spend on their projects. Starting in 2021, students will be focused on a 175-hour project over a 10-week coding period.
  • As students are learning in many different educational formats in 2020, we are opening up the 2021 program to students 18 years and older who are:
    1. Enrolled in post-secondary academic programs (including college, university, masters program, PhD program and/or undergraduate program, or licensed coding school, etc.) as of May 17, 2021; or,
    2. Have graduated from a post-secondary academic program between December 1, 2020 and May 17, 2021.

We’re excited that GSoC will be able to continue to thrive as we welcome more students from around the world into open source in 2021! Applications for interested open source project organizations open on January 29th, and student applications open March 29, 2021.

Does your open source project want to learn more about how to apply to be a mentoring organization? This is a mentorship program so having mentors excited about teaching students how to be a part of your community and ready to guide students is key.

Visit the program site and read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community (hint: have plenty of enthusiastic mentors!), create appropriate project ideas, and tips for preparing your application. We welcome all types of organizations—large and small—and are very eager to involve first time projects. For 2021, we hope to welcome more organizations than ever before and are looking to accept at least 40 into their first GSoC.

Are you a student interested in learning how to prepare for the 2021 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. Read through the student guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in late-March. You can also get inspired by checking out the 198 organizations that participated in Google Summer of Code 2020, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

Please spread the word to your friends as we hope these changes will help more excited folks apply to be students and mentoring organizations in GSoC 2021!

By Stephanie Taylor, Program Manager—Google Open Source

Peer Bonus Experiences: Building tiny models for the ML community with TensorFlow

Friday, October 23, 2020

Almost all the current state-of-the-art machine learning (ML) models take quite a lot of disk space. This makes them particularly inefficient in production situations. A bulky machine learning model can be exposed as a REST API and hosted on cloud services, but that same bulk may lead to hefty infrastructure costs. And some applications may need to operate in low-bandwidth environments, making cloud-hosted models less practical.

In a perfect world, your models would live alongside your application, saving data transfer costs and complying with any regulatory requirements restricting what data can be sent to the cloud. But storing multi-gigabyte models on today’s devices just isn’t practical. The field of on-device ML is dedicated to the development of tools and techniques to produce tiny—yet high performing!—ML models. Progress has been slow, but steady!

There has never been a better time to learn about on-device ML and successfully apply it in your own projects. With frameworks like TensorFlow Lite, you have an exceptional toolset to optimize your bulky models while retaining as much performance as possible. TensorFlow Lite also makes it very easy for mobile application developers to integrate ML models with tools like metadata and ML Model Binding, Android codegen, and others.

What is TensorFlow Lite?

“TensorFlow Lite is a production ready, cross-platform framework for deploying ML on mobile devices and embedded systems.” - TensorFlow Youtube

TensorFlow Lite provides first-class support for Native Android and iOS-based integrations (with many additional features, such as delegates). TensorFlow Lite also supports other tiny computing platforms, such as microcontrollers. TensorFlow Lite’s optimization APIs produce world-class, fast, and well-performing machine learning models.

Venturing into TensorFlow Lite

Last year, I started playing around with TensorFlow Lite while developing projects for Raspberry Pi for Computer Vision, using the official documentation and this course to fuel my initial learning. Following this interest, I decided to join a voluntary working group focused on creating sample applications, writing out tutorials, and creating tiny models. This working group consists of individuals from different backgrounds passionate about teaching on-device machine learning to others. The group is coordinated by Khanh LeViet (TensorFlow Lite team) and Hoi Lam (Android ML team). This is by far one of the most active working groups I have ever seen. And, back in our starting days, Khanh proposed a few different state-of-art machine learning models that were great fits for on-device machine learning:

These ideas were enough for us to start spinning up Jupyter notebooks and VSCode. After months of work, we now have strong collaborations between machine learning GDEs and a bunch of different TensorFlow Lite models, sample applications, and tutorials for the community to learn from. Our collaborations have been fueled by the power of open source and all the tiny models that we have built together are available on TensorFlow Hub. There are numerous open source applications that we have built that demonstrate how to use these models.
The Cartoonizer model cartoonizes uploaded images

Margaret and I co-authored an end-to-end tutorial that was published from the official TensorFlow blog and published the TensorFlow Lite models on TensorFlow Hub. So far, the response we have received for this work has been truly mesmerizing. I’ve also shared my experiences with TensorFlow Lite in these blog posts and conference talks:

A Tale of Model Quantization in TF Lite
Plunging into Model Pruning in Deep Learning
A few good stuff in TF Lite
Doing more with TF Lite
Model Optimization 101

The power of collaboration

The working group is a tremendous opportunity for machine learning GDEs, Googlers, and passionate community individuals to collaborate and learn. We get to learn together, create together, and celebrate the joy of teaching others. I am immensely thankful, grateful, and humbled to be a part of this group. Lastly, I would like to wholeheartedly thank Khanh for being a pillar of support to us and for nominating me for the Google Open Source Peer Bonus Award.

By Sayak Paul, PyImageSearch—Guest Author

OpenTelemetry's First Release Candidates

Wednesday, October 21, 2020

OpenTelemetry has hit another milestone with the tracing specification reaching release candidate status.

With the specification now ready to go, expect to see tracing release candidates of the official APIs and SDKs over the next few weeks, along with updated exporters for Cloud Trace. In the coming months the same will follow for the metrics specification, followed by metrics release candidates of the APIs and SDKs and Cloud Monitoring exporters, followed by the project’s general availability. At this point we’ll switch our default application metrics and distributed tracing instrumentation from OpenCensus to OpenTelemetry.

This is exciting news for Google Cloud customers, as OpenTelemetry will enable even better observability experiences, both with Cloud Monitoring and Cloud Trace, or the third party monitoring and operations tools of your choice.

Originally posted on the on the OpenTelemetry blog.


Fuzzing internships for open source software

Thursday, October 8, 2020

Open source software is the foundation of many modern software products. Over the years, developers increasingly have relied on reusable open source components for their applications. It is paramount that these open source components are secure and reliable, as weaknesses impact those that build upon it.

Google cares deeply about the security of the open source ecosystem and recently launched the Open Source Security Foundation with other industry partners. Fuzzing is an automated testing technique to find bugs by feeding unexpected inputs to a target program. At Google, we leverage fuzzing at scale to find tens of thousands of security vulnerabilities and stability bugs. This summer, as part of Google’s OSS internship initiative, we hosted 50 interns to improve the state of fuzz testing in the open source ecosystem.

The fuzzing interns worked towards integrating new projects and improving existing ones in OSS-Fuzz, our continuous fuzzing service for the open source community (which has 350+ projects, 22,700 bugs, 89% fixed). Several widely used open source libraries including but not limited to nginx, postgresql, usrsctp, and openexr, now have continuous fuzzing coverage as a result of these efforts.

Another group of interns focused on improving the security of the Linux kernel. syzkaller, a kernel fuzzing tool from Google, has been instrumental in finding kernel vulnerabilities in various operating systems. The interns were tasked with improving the fuzzing coverage by adding new descriptions to syzkaller like ip tunnels, io_uring, and bpf_lsm for example, refining the interface description language, and advancing kernel fault injection capabilities.

Some interns chose to write fuzzers for Android and Chrome, which are open source projects that billions of internet users rely on. For Android, the interns contributed several new fuzzers for uncovered areas - network protocols such as pppd and dns, audio codecs like monoblend, g722, and android framework. On the Chrome side, interns improved existing blackbox fuzzers, particularly in the areas: DOM, IPC, media, extensions, and added new libprotobuf-based fuzzers for Mojo.

Our last set of interns researched quite a few under-explored areas of fuzzing, some of which were fuzzer benchmarking, ML based fuzzing, differential fuzzing, bazel rules for build simplification and made useful contributions.

Over the course of the internship, our interns have reported over 150 security vulnerabilities and 750 functional bugs. Given the overall success of these efforts, we plan to continue hosting fuzzing internships every year to help secure the open source ecosystem and teach incoming open source contributors about the importance of fuzzing. For more information on the Google internship program and other student opportunities, check out careers.google.com/students. We encourage you to apply.

By: Abhishek Arya, Google Chrome Security

Announcing the latest Google Open Source Peer Bonus winners!

Monday, October 5, 2020

We are very pleased to announce the latest Google Open Source Peer Bonus winners!

The Google Open Source Peer Bonus program rewards external open source contributors nominated by Googlers for their exceptional contributions to open source. Historically, the program was primarily focused on rewarding developers. Over the years the program has evolved—rewarding not just software engineers contributors from every part of open source—including technical writers, user experience and graphic designers, community managers and marketers, mentors and educators, ops and security experts. 


This time around we have 90 winners from an impressive number of countries—24—spread across five continents: Australia, Austria, Canada, China, Costa Rica, Finland, France, Germany, Ghana, India, Italy, Japan, Mozambique, New Zealand, Nigeria, Poland, Portugal, Singapore, Spain, Sweden, Switzerland, Uganda, United Kingdom, and the United States.

Although the majority of recipients in this round were recognized for their code contributions, more than 40% of the successful nominations included tooling work, community work, and documentation. (Some contributors were recognized for their work in more than one area.)

Below is the list of current winners who gave us permission to thank them publicly:
WinnerProject
Xihan LiA Concise Handbook of TensorFlow 2
Alain SchlesserAMP Plugin for WordPress
Pierre GordonAMP Plugin for WordPress
Catherine HouleAMP Project
Quyen Le HoangANGLE
Kamil BregulaApache Airflow
László Kiss Kollárauditwheel/manylinux
Jack NeusChrome OS Release Branching tool
Fabian Hennekechromium
Matt GodboltCompiler Explorer
Sumeet Pawnikarcoreboot
Hal Sekicovid19
Derek ParkerDelve
Alessandro ArzilliDelve
Matthias SohnEclipse Foundation
Luca MilanesioEclipse Foundation
João Távoraeglot
Brad Cowiefaucetsdn
Harri HohteriFirebase
Rosário Pereira FernandesFirebase
Peter SteinbergerFirebase iOS, CocoaPods
Eduardo SilvaFluent Bit
Matthias SohnGerrit Code Review
Marco MillerGerrit Code Review
Akim DemailleGNU Bison
Alex BrainmanGo
Richard MusiolGo
Roger PeppeGo, CUE, gohack
Daniel MartíGo, CUE, many individual repo.
Juan LinietskyGodot Engine
Maddy MyersGoogle Research Open-COVID-19-Data
Pontus Leitzlergovim, gopls
Paul Jollygovim, gopls
Parul RahejaGround
Pau FreixesgRPC
Marius BrehlerIREE
George Nachmaniterm2
Kenji Urushimajsrsasign
Jacques ChesterKNative
Markus ThömmesKnative Serving
Savitha RaghunathanKubernetes
David Andersonlibdwarf
Florian WestphalLinux kernel
Hugo van KemenadeMany open-source Python projects
Jeff LockhartMaps SDK for Android Utility Library
Claude VervoortMoodle
Jared McNeillNetBSD
Nao Yonashironginx-sxg-module
Geoffrey BoothNode.js
Gus CaplanNode.js
Guy BedfordNode.js
Samson GoddyOpen Source Community Africa
Daniel DylaOpenTelemetry
Leighton ChenOpenTelemetry
Shivkanya AndhareOpenTelemetry
Bartlomiej ObecnyOpenTelemetry
Philipp WagnerOpenTitan, Ibex, CocoTB
Srijan ReddyOppia
Bastien GuerryOrg mode
Gary KramlichPidgin Lead Developer
Hassan Kibirigeplotnine
Abigail DogbePyLadies Ghana
David HewittPyO3
Yuji KanagawaPyO3
Mannie YoungPython Ghana
Alex BradburyRISC-V LLVM, Ibex, OpenTitan
Lukas Taegert-AtkinsonRollup.js
Sanil RautShaka Packager
Luke EdwardsSvelte and Node Libraries
Zoe CarverSwift Programming Language
Nick LockwoodSwiftFormat
Priti DesaiTekton
Sayak PaulTensorFlow
Lukas GeigerTensorFlow
Margaret Maynard-ReidTensorFlow
Gabriel de MarmiesseTensorFlow Addons
Jared MorganThe Good Docs Project
Jo CookThe Good Docs Project, GeoNetwork, Portable GIS, Various Open Source Geospatial Foundation communities
Ricky Mulyawan SuryadiTink JNI Examples
Michael Tüxenusrsctp
Seth BrenithV8
Ramya RaoVS Code Go
Philipp HanckeWebRTC
Jason DonenfeldWireGuard
Congratulations to our winners! We look forward to your continued support and contributions to open source!

By Maria Tabak and Erin McKean, Program Managers – Google Open Source Programs Office

Kubernetes Ingress goes GA

Wednesday, September 23, 2020

The Kubernetes Ingress API, first introduced in late 2015 as an experimental beta feature, has finally graduated as a stable API and is included in the recent 1.19 release of Kubernetes.

The goal of the Ingress API is to provide a simple uniform means of describing the routing of HTTP or HTTPS traffic from outside a cluster to backend services within a cluster; independent of the Ingress Controller being used. An Ingress controller is a 3rd party application, such as Nginx or an external service like the Google Cloud Load Balancer (GCLB), that performs the actual routing of the HTTP(S) traffic. This uniform API, supported by the Ingress Controllers made it easy to create simple HTTP(S) load balancers, however most use-cases required something more complex.

By early 2019, the Ingress API had remained in beta for close to four years. Beta APIs are not intended to be relied upon for business-critical production use, yet many users were using the Ingress API in some level of production capacity. After much discussion, the Kubernetes Networking Special Interest Group (SIG) proposed a path forward to bring the Ingress API to GA primarily by introducing two changes in Kubernetes 1.18. These were: a new field, pathType, to the Ingress API; and a new Ingress resource type, IngressClass. Combined, they provide a means of guaranteeing a base level of compatibility between different path prefix matching implementations, along with opening the door to further extension by the Ingress Controller developers in a uniform and consistent pattern.

What does this mean for you? You can be assured that the path prefixes you use will be evaluated the same way across Ingress Controllers implementations, and the Ingress configuration sprawl across Annotations, ConfigMaps and CustomResourceDefinitions (CRDs) will be consolidated into a single IngressClass resource type.

pathType

The pathType field specifies one of three ways that an Ingress Object’s path should be interpreted:
  • ImplementationSpecific: Path prefix matching is delegated to the Ingress Controller (IngressClass).
  • Exact: Matches the URL path exactly (case sensitive)
  • Prefix: Matches based on a URL path prefix split by /. Matching is case sensitive and done on a path element by element basis.

NOTE: ImplementationSpecific was configured as the default pathType in the 1.18 release. In 1.19 the defaulting behavior was removed and it MUST be specified. Paths that do not include an explicit pathType will fail validation.

Pre Kubernetes 1.18

Kubernetes 1.19+

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: minimal-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - http:
    paths:
    - path: /testpath
      backend:
        serviceName: test
        servicePort: 80

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: minimal-ingress
spec:
  ingressClassName: external-lb
  rules:
  - http:
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80



These changes not only make room for backwards-compatible configurations with the ImplementationSpecific pathType, but also enables more portable workloads between Ingress Controllers with Exact or Prefix pathType.

IngressClass

The new IngressClass resource takes the place of various different Annotations, ConfigMaps, Environment Variables or Command Line Parameters that you would regularly pass to an Ingress Controller directly. Instead, it has a generic parameters field that can be used to reference controller specific configuration.


Example IngressClass Resource

apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: external-lb
spec:
  controller: example.com/ingress-controller
  parameters:
    apiGroup: k8s.example.com
    kind: IngressParameters
    name: external-lb

Source: https://kubernetes.io/docs/concepts/services-networking/ingress/#ingress-class

In this example, the parameters resource would include configuration options implemented by the example.com/ingress-controller ingress controller. These items would not need to be passed as Annotations or a ConfigMap as they would in versions prior to Kubernetes 1.18.

How do you use IngressClass with an Ingress Object? You may have caught it in the earlier example, but the Ingress resource’s spec has been updated to include an ingressClassName field. This field is similar to the previous kubernetes.io/ingress.class annotation but refers to the name of the corresponding IngressClass resource.

Other Changes

Several other small changes went into effect with the graduation of Ingress to GA in 1.19. A few fields have been remapped/renamed and support for resource backends has been added.

Remapped Ingress Fields

Resource Backend
A Resource Backend is essentially a pointer or ObjectRef (apiGroup, kind, name) to another resource in the same namespace. Why would you want to do this? Well, it opens the door to all sorts of future possibilities such as routing to static object storage hosted in GCS or S3, or another internal form of storage.

NOTE: Resource Backend and Service Backends are mutually exclusive. Only one field can be specified at a time.

Deprecation Notice

With the graduation of Ingress in the 1.19 release, it officially puts the older iterations of the API (extensions/v1beta1 and networking.k8s.io/v1beta1) on a clock. Following the Kubernetes Deprecation Policy, the older APIs are slated to be removed in Kubernetes 1.22.

Should you migrate right now (September 2020)? Not yet. The majority of Ingress Controllers have not added support for the new GA Ingress API. Ingress-GCE, the Ingress Controller for Google Kubernetes Engine (GKE) should be updated to support the Ingress GA API in Q4 2020. Keep your eyes on the GKE rapid release channel to stay up to date on it, and Kubernetes 1.19’s availability.

What’s Next for Ingress?

The Ingress API has had a rough road getting to GA. It is an essential resource for many, and the changes that have been introduced help manage that complexity while keeping it relatively light-weight. However, even with the added flexibility that has been introduced it doesn’t cover a variety of complex use-cases.

SIG Network has been working on a new API referred to as “Service APIs” that takes into account the lessons learned from the previous efforts of working on Ingress. These Service APIs are not intended to replace Ingress, but instead compliment it by providing several new resources that could enable more complex workflows.


By Bob Killen, Program Manager, Google Open Source Programs Office
.