Google Open Source Blog: October 2020

Posts from October 2020

Releasing the Healthcare Text Annotation Guidelines

Friday, October 30, 2020

The Healthcare Text Annotation Guidelines are blueprints for capturing a structured representation of the medical knowledge stored in digital text. In order to automatically map the textual insights to structured knowledge, the annotations generated using these guidelines are fed into a machine learning algorithm that learns to systematically extract the medical knowledge in the text. We’re pleased to release to the public the Healthcare Text Annotation Guidelines as a standard.

Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.

The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:

The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).
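
To make the three layers above concrete, here is a minimal sketch of how one annotated sentence could be represented in code. It is an illustration only, not the schema used by the guidelines or by AutoML Entity Extraction for Healthcare: the class names, field names, default values, the `performed_for` relation label, and the sample sentence are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    entity_id: str
    entity_type: str   # e.g. "medication", "procedure", "condition"
    start: int         # character offsets into the source text
    end: int
    # Entity attributes, e.g. dosage, frequency, route of administration.
    attributes: Dict[str, str] = field(default_factory=dict)
    # Contextual information; the default values here are assumptions.
    temporal: str = "current"    # or "family history", "clinical history"
    certainty: str = "likely"    # or "unlikely", "somewhat likely"
    subject: str = "patient"     # or "family member", "other"


@dataclass
class Relation:
    relation_type: str  # hypothetical label, e.g. "performed_for"
    source_id: str
    target_id: str


@dataclass
class AnnotatedText:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def mark(self, entity_id: str, entity_type: str, surface: str, **extra) -> Entity:
        """Annotate the first occurrence of `surface` in the text."""
        start = self.text.index(surface)  # raises ValueError if absent
        entity = Entity(entity_id, entity_type, start, start + len(surface), **extra)
        self.entities.append(entity)
        return entity

    def surface(self, entity: Entity) -> str:
        """Return the span of text that an entity covers."""
        return self.text[entity.start:entity.end]


note = AnnotatedText(
    "Appendectomy performed for acute appendicitis. "
    "Ibuprofen 400 mg by mouth every 6 hours."
)
procedure = note.mark("e1", "procedure", "Appendectomy")
condition = note.mark("e2", "condition", "acute appendicitis")
medication = note.mark(
    "e3", "medication", "Ibuprofen",
    attributes={"dosage": "400 mg", "route": "by mouth", "frequency": "every 6 hours"},
)
# Relate the procedure to the condition that prompted it.
note.relations.append(Relation("performed_for", procedure.entity_id, condition.entity_id))
```

Storing character offsets rather than copies of the text keeps each annotation tied to an exact span, which is the kind of input an extraction model is typically trained on.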

Google consulted with industry experts and academic institutions in the process of assembling the Healthcare Text Annotation Guidelines. We took inspiration from other open source and research projects like i2b2 and added context to the guidelines to support information extraction needs for industry applications like Healthcare Effectiveness Data and Information Set (HEDIS) quality reporting. The data types contained in the Healthcare Text Annotation Guidelines are a common denominator across information extraction applications. Each industry application can have additional information extraction needs that are not captured in the current version of the guidelines. We chose to open source this asset so the community can tailor this project to their needs.

We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.

By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI

Peer Bonus Experiences: The many ways in which you can contribute to open source

Tuesday, October 27, 2020

Recently, I was awarded a Google Open Source Peer Bonus, which I’m grateful for, as it proved to me that one can contribute value to open source projects, and build a career in it, without extensive experience coding. So how can someone with limited coding skills like me contribute to open source in a meaningful way?

Documentation

Documentation is important across open source and especially helpful to those who are new to a project! Developers and maintainers of projects are often focused on fixing bugs and improving the software. Therefore, documentation is harder to prioritize, so contributions to documentation are highly appreciated. Being experienced with applications won’t always help you in writing the documentation, since familiarity can cause you to miss a step when creating the doc. This is why, as a beginner, you are in an excellent position to ensure that instructions and step-by-step guides are easy to follow, don’t skip vital steps, and don’t use off-putting language.

If you have the opportunity to get involved in programs like Season of Docs as a mentor or a participant, as I did in 2019, the experience is hugely rewarding!

Events and Conferences

If you can help with mailing lists or organizing events, you can get involved in the community! In 2006, I became involved with the nascent Open Source Geospatial Foundation (OSGeo), where I was persuaded to set up a local chapter in the United Kingdom (going strong 14 years later!). It was one of the best things I could’ve done. This year we hosted a global conference (FOSS4G) and several UK events, including an online-only event. We’ve also managed to financially support a number of open source projects by providing an annual sponsorship, or by contributing to the funding of a specific improvement. I’ve met so many great people through my involvement in OSGeo, some of whom have become colleagues and good friends.

The group meeting at FOSS4G 2013 in Nottingham

If you’re interested in writing case studies, you can always speak about your experiences at conferences. Evidence that particular packages can be used successfully in real-world situations is incredibly valuable, and can help others put together business cases for considering an open source solution.

Assisting others

Sometimes the problems you face with technology are shared by many, and by open-sourcing your solution you could be helping a lot of people. When I first started using open source software, the packages I needed were often hard to install and configure on Windows, and had to be started from the command prompt, which can be intimidating for beginners. To scratch a problem-solving itch, I packaged them up onto a USB stick, added some batch files to make them load properly from an external drive, added a little menu for starting them, and Portable GIS was born. After 12 years, a few iterations, a website and a GitLab repository, it has been downloaded thousands of times, and is used in situations such as disaster relief, where installing lots of software rapidly on often old PCs is not really an option.

Mentoring Others

Once you are proficient in something, use your knowledge to help others. Some existing platforms for software use and development (online repositories like GitHub or GitLab) are extremely intimidating to new users, and create barriers to participation. If you can help people get over the fear-inducing first pull request, you will empower them to keep on contributing. My first pull request was a contribution to the Vaguely Rude Place-names map back in 2013, and since then I’ve run a few training events along similar lines at conferences.

Open source is now fundamental to my career, 16 years after I first learned about it, and something I am truly passionate about. It has shaped my life in many ways. I hope that my experiences might help someone who isn’t versed in code to get involved, realizing that their contributions are just as valuable as bug fixes and patches.

By Jo Cook, Astun Technology—Guest Author

Google Summer of Code 2021 will bring some changes

Monday, October 26, 2020

Google Open Source is pleased to announce the 2021 cycle of the Google Summer of Code (GSoC) program, which will be our 17th consecutive year bringing students into open source communities. Over the past 16 years Google Summer of Code has brought over 16,000 student developers from 111 countries into 715 open source organizations big and small.

Some exciting changes are coming to the 2021 GSoC as we make adjustments to add more flexibility into the program for students and mentors alike.

With the pandemic straining folks’ time, we are changing the size of the projects and the amount of time students are expected to commit to them. Starting in 2021, students will focus on a 175-hour project over a 10-week coding period.
As students are learning in many different educational formats in 2020, we are opening up the 2021 program to students 18 years and older who:

Are enrolled in post-secondary academic programs (including college, university, masters program, PhD program and/or undergraduate program, or licensed coding school, etc.) as of May 17, 2021; or
Have graduated from a post-secondary academic program between December 1, 2020 and May 17, 2021.

We’re excited that GSoC will be able to continue to thrive as we welcome more students from around the world into open source in 2021! Applications for interested open source project organizations open on January 29th, and student applications open March 29, 2021.

Does your open source project want to learn more about how to apply to be a mentoring organization? This is a mentorship program so having mentors excited about teaching students how to be a part of your community and ready to guide students is key.

Visit the program site and read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community (hint: have plenty of enthusiastic mentors!), create appropriate project ideas, and tips for preparing your application. We welcome all types of organizations—large and small—and are very eager to involve first time projects. For 2021, we hope to welcome more organizations than ever before and are looking to accept at least 40 into their first GSoC.

Are you a student interested in learning how to prepare for the 2021 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. Read through the student guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in late March. You can also get inspired by checking out the 198 organizations that participated in Google Summer of Code 2020, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

Please spread the word to your friends as we hope these changes will help more excited folks apply to be students and mentoring organizations in GSoC 2021!

By Stephanie Taylor, Program Manager—Google Open Source

Peer Bonus Experiences: Building tiny models for the ML community with TensorFlow

Friday, October 23, 2020

Almost all the current state-of-the-art machine learning (ML) models take quite a lot of disk space. This makes them particularly inefficient in production situations. A bulky machine learning model can be exposed as a REST API and hosted on cloud services, but that same bulk may lead to hefty infrastructure costs. And some applications may need to operate in low-bandwidth environments, making cloud-hosted models less practical.

In a perfect world, your models would live alongside your application, saving data transfer costs and complying with any regulatory requirements restricting what data can be sent to the cloud. But storing multi-gigabyte models on today’s devices just isn’t practical. The field of on-device ML is dedicated to the development of tools and techniques to produce tiny—yet high performing!—ML models. Progress has been slow, but steady!

There has never been a better time to learn about on-device ML and successfully apply it in your own projects. With frameworks like TensorFlow Lite, you have an exceptional toolset to optimize your bulky models while retaining as much performance as possible. TensorFlow Lite also makes it very easy for mobile application developers to integrate ML models with tools like metadata and ML Model Binding, Android codegen, and others.

What is TensorFlow Lite?

“TensorFlow Lite is a production ready, cross-platform framework for deploying ML on mobile devices and embedded systems.” - TensorFlow YouTube

TensorFlow Lite provides first-class support for native Android and iOS-based integrations (with many additional features, such as delegates). TensorFlow Lite also supports other tiny computing platforms, such as microcontrollers. TensorFlow Lite’s optimization APIs produce world-class, fast, and well-performing machine learning models.
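
One widely used way to shrink a model is quantization: storing each weight in fewer bits than a 32-bit float. The sketch below is a plain-Python illustration of the general idea using an 8-bit affine mapping. It is not TensorFlow Lite's API or its exact algorithm; the function names `quantize` and `dequantize` and the synthetic weights are assumptions made for this example.

```python
import math
import struct
from typing import List, Tuple


def quantize(weights: List[float]) -> Tuple[bytes, float, int]:
    """Map floats to unsigned 8-bit codes with an affine (scale, zero_point) scheme."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 when all weights are equal
    zero_point = round(-lo / scale)
    # Round each weight to the nearest code and clip to the 0..255 range.
    codes = bytes(min(255, max(0, round(w / scale) + zero_point)) for w in weights)
    return codes, scale, zero_point


def dequantize(codes: bytes, scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit codes."""
    return [(c - zero_point) * scale for c in codes]


# Synthetic stand-in for one layer's weights (an assumption for illustration).
weights = [math.sin(i) for i in range(1000)]

float32_size = len(struct.pack("<1000f", *weights))  # 4 bytes per weight
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)
worst_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {float32_size} bytes, 8-bit: {len(codes)} bytes")
print(f"largest round-trip error: {worst_error:.5f} (scale = {scale:.5f})")
```

The trade-off is visible in the two numbers printed: storage drops to a quarter, while every weight moves by at most about half of one quantization step. Whether that loss is acceptable depends on the model, which is why the goal is to optimize while retaining as much performance as possible.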

Venturing into TensorFlow Lite

Last year, I started playing around with TensorFlow Lite while developing projects for Raspberry Pi for Computer Vision, using the official documentation and this course to fuel my initial learning. Following this interest, I decided to join a voluntary working group focused on creating sample applications, writing tutorials, and creating tiny models. This working group consists of individuals from different backgrounds who are passionate about teaching on-device machine learning to others. The group is coordinated by Khanh LeViet (TensorFlow Lite team) and Hoi Lam (Android ML team). This is by far one of the most active working groups I have ever seen. And, back in our starting days, Khanh proposed a few different state-of-the-art machine learning models that were great fits for on-device machine learning:

DeepLabV3 segmentation models that classify each pixel of an image into a particular category

Selfie2Anime model that can turn a selfie into an anime-style image

These ideas were enough for us to start spinning up Jupyter notebooks and VSCode. After months of work, we now have strong collaborations between machine learning GDEs and a bunch of different TensorFlow Lite models, sample applications, and tutorials for the community to learn from. Our collaborations have been fueled by the power of open source and all the tiny models that we have built together are available on TensorFlow Hub. There are numerous open source applications that we have built that demonstrate how to use these models.

The Cartoonizer model cartoonizes uploaded images

Margaret and I co-authored an end-to-end tutorial that was published on the official TensorFlow blog and published the TensorFlow Lite models on TensorFlow Hub. So far, the response we have received for this work has been truly mesmerizing. I’ve also shared my experiences with TensorFlow Lite in these blog posts and conference talks:

A Tale of Model Quantization in TF Lite
Plunging into Model Pruning in Deep Learning
A few good stuff in TF Lite
Doing more with TF Lite
Model Optimization 101

The power of collaboration

The working group is a tremendous opportunity for machine learning GDEs, Googlers, and passionate community individuals to collaborate and learn. We get to learn together, create together, and celebrate the joy of teaching others. I am immensely thankful, grateful, and humbled to be a part of this group. Lastly, I would like to wholeheartedly thank Khanh for being a pillar of support to us and for nominating me for the Google Open Source Peer Bonus Award.

By Sayak Paul, PyImageSearch—Guest Author

OpenTelemetry's First Release Candidates

Wednesday, October 21, 2020

OpenTelemetry has hit another milestone with the tracing specification reaching release candidate status.

With the specification now ready to go, expect to see tracing release candidates of the official APIs and SDKs over the next few weeks, along with updated exporters for Cloud Trace. In the coming months the same will follow for the metrics specification, followed by metrics release candidates of the APIs and SDKs and Cloud Monitoring exporters, followed by the project’s general availability. At this point we’ll switch our default application metrics and distributed tracing instrumentation from OpenCensus to OpenTelemetry.

This is exciting news for Google Cloud customers, as OpenTelemetry will enable even better observability experiences, both with Cloud Monitoring and Cloud Trace, or the third party monitoring and operations tools of your choice.

Originally posted on the OpenTelemetry blog.

Thursday, October 8, 2020

Open source software is the foundation of many modern software products. Over the years, developers have increasingly relied on reusable open source components for their applications. It is paramount that these open source components are secure and reliable, as weaknesses impact those who build upon them.

Google cares deeply about the security of the open source ecosystem and recently launched the Open Source Security Foundation with other industry partners. Fuzzing is an automated testing technique to find bugs by feeding unexpected inputs to a target program. At Google, we leverage fuzzing at scale to find tens of thousands of security vulnerabilities and stability bugs. This summer, as part of Google’s OSS internship initiative, we hosted 50 interns to improve the state of fuzz testing in the open source ecosystem.
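
To show the idea in miniature, here is a sketch of a naive random fuzzer. It is far simpler than the coverage-guided tools used at scale, and everything in it is an assumption made for illustration: `parse_record` is a hypothetical toy parser with a deliberately planted bug, and `fuzz` is a hypothetical helper that throws random bytes at it.

```python
import random
from typing import Callable, List, Tuple


def parse_record(data: bytes) -> bytes:
    """Toy target with the layout [length][payload...][checksum]."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # BUG: trusts the declared length, no bounds check
    if sum(payload) % 256 != checksum:
        raise ValueError("bad checksum")
    return payload


def fuzz(target: Callable[[bytes], object], iterations: int,
         seed: int) -> List[Tuple[bytes, Exception]]:
    """Feed random byte strings to `target` and collect unexpected exceptions."""
    rng = random.Random(seed)  # seeded so that any crash can be reproduced
    crashes = []
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(17)))
        try:
            target(data)
        except ValueError:
            pass  # the parser rejected malformed input cleanly: not a bug
        except Exception as exc:
            crashes.append((data, exc))  # anything else is a finding
    return crashes


crashes = fuzz(parse_record, iterations=500, seed=0)
if crashes:
    sample_input, sample_error = crashes[0]
    print(f"{len(crashes)} crashing inputs, e.g. {sample_input!r} -> {sample_error!r}")
```

The key design choice is the split between expected rejections (`ValueError` here) and everything else. A fuzzer is only useful if it can tell a clean refusal of bad input apart from a genuine defect such as the out-of-range read above.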

The fuzzing interns worked towards integrating new projects and improving existing ones in OSS-Fuzz, our continuous fuzzing service for the open source community (which has 350+ projects, 22,700 bugs, 89% fixed). Several widely used open source libraries including but not limited to nginx, postgresql, usrsctp, and openexr, now have continuous fuzzing coverage as a result of these efforts.

Another group of interns focused on improving the security of the Linux kernel. syzkaller, a kernel fuzzing tool from Google, has been instrumental in finding kernel vulnerabilities in various operating systems. The interns were tasked with improving fuzzing coverage by adding new descriptions to syzkaller (for example, for IP tunnels, io_uring, and bpf_lsm), refining the interface description language, and advancing kernel fault injection capabilities.

Some interns chose to write fuzzers for Android and Chrome, which are open source projects that billions of internet users rely on. For Android, the interns contributed several new fuzzers for areas that previously had no coverage: network protocols such as pppd and dns, audio codecs like monoblend and g722, and the Android framework. On the Chrome side, interns improved existing blackbox fuzzers, particularly in the areas of DOM, IPC, media, and extensions, and added new libprotobuf-based fuzzers for Mojo.

Our last set of interns researched several under-explored areas of fuzzing, including fuzzer benchmarking, ML-based fuzzing, differential fuzzing, and Bazel rules for build simplification, and made useful contributions.

Over the course of the internship, our interns have reported over 150 security vulnerabilities and 750 functional bugs. Given the overall success of these efforts, we plan to continue hosting fuzzing internships every year to help secure the open source ecosystem and teach incoming open source contributors about the importance of fuzzing. For more information on the Google internship program and other student opportunities, check out careers.google.com/students. We encourage you to apply.

By: Abhishek Arya, Google Chrome Security

Announcing the latest Google Open Source Peer Bonus winners!

Monday, October 5, 2020

We are very pleased to announce the latest Google Open Source Peer Bonus winners!

The Google Open Source Peer Bonus program rewards external open source contributors nominated by Googlers for their exceptional contributions to open source. Historically, the program was primarily focused on rewarding developers. Over the years the program has evolved, rewarding not just software engineers but contributors from every part of open source, including technical writers, user experience and graphic designers, community managers and marketers, mentors and educators, and ops and security experts.

This time around we have 90 winners from an impressive number of countries—24—spread across five continents: Australia, Austria, Canada, China, Costa Rica, Finland, France, Germany, Ghana, India, Italy, Japan, Mozambique, New Zealand, Nigeria, Poland, Portugal, Singapore, Spain, Sweden, Switzerland, Uganda, United Kingdom, and the United States.

Although the majority of recipients in this round were recognized for their code contributions, more than 40% of the successful nominations included tooling work, community work, and documentation. (Some contributors were recognized for their work in more than one area.)

Below is the list of current winners who gave us permission to thank them publicly:

Winner	Project
Xihan Li	A Concise Handbook of TensorFlow 2
Alain Schlesser	AMP Plugin for WordPress
Pierre Gordon	AMP Plugin for WordPress
Catherine Houle	AMP Project
Quyen Le Hoang	ANGLE
Kamil Bregula	Apache Airflow
László Kiss Kollár	auditwheel/manylinux
Jack Neus	Chrome OS Release Branching tool
Fabian Henneke	chromium
Matt Godbolt	Compiler Explorer
Sumeet Pawnikar	coreboot
Hal Seki	covid19
Derek Parker	Delve
Alessandro Arzilli	Delve
Matthias Sohn	Eclipse Foundation
Luca Milanesio	Eclipse Foundation
João Távora	eglot
Brad Cowie	faucetsdn
Harri Hohteri	Firebase
Rosário Pereira Fernandes	Firebase
Peter Steinberger	Firebase iOS, CocoaPods
Eduardo Silva	Fluent Bit
Matthias Sohn	Gerrit Code Review
Marco Miller	Gerrit Code Review
Akim Demaille	GNU Bison
Alex Brainman	Go
Richard Musiol	Go
Roger Peppe	Go, CUE, gohack
Daniel Martí	Go, CUE, many individual repo.
Juan Linietsky	Godot Engine
Maddy Myers	Google Research Open-COVID-19-Data
Pontus Leitzler	govim, gopls
Paul Jolly	govim, gopls
Parul Raheja	Ground
Pau Freixes	gRPC
Marius Brehler	IREE
George Nachman	iterm2
Kenji Urushima	jsrsasign
Jacques Chester	Knative
Markus Thömmes	Knative Serving
Savitha Raghunathan	Kubernetes
David Anderson	libdwarf
Florian Westphal	Linux kernel
Hugo van Kemenade	Many open-source Python projects
Jeff Lockhart	Maps SDK for Android Utility Library
Claude Vervoort	Moodle
Jared McNeill	NetBSD
Nao Yonashiro	nginx-sxg-module
Geoffrey Booth	Node.js
Gus Caplan	Node.js
Guy Bedford	Node.js
Samson Goddy	Open Source Community Africa
Daniel Dyla	OpenTelemetry
Leighton Chen	OpenTelemetry
Shivkanya Andhare	OpenTelemetry
Bartlomiej Obecny	OpenTelemetry
Philipp Wagner	OpenTitan, Ibex, CocoTB
Srijan Reddy	Oppia
Bastien Guerry	Org mode
Gary Kramlich	Pidgin Lead Developer
Hassan Kibirige	plotnine
Abigail Dogbe	PyLadies Ghana
David Hewitt	PyO3
Yuji Kanagawa	PyO3
Mannie Young	Python Ghana
Alex Bradbury	RISC-V LLVM, Ibex, OpenTitan
Lukas Taegert-Atkinson	Rollup.js
Sanil Raut	Shaka Packager
Luke Edwards	Svelte and Node Libraries
Zoe Carver	Swift Programming Language
Nick Lockwood	SwiftFormat
Priti Desai	Tekton
Sayak Paul	TensorFlow
Lukas Geiger	TensorFlow
Margaret Maynard-Reid	TensorFlow
Gabriel de Marmiesse	TensorFlow Addons
Jared Morgan	The Good Docs Project
Jo Cook	The Good Docs Project, GeoNetwork, Portable GIS, Various Open Source Geospatial Foundation communities
Ricky Mulyawan Suryadi	Tink JNI Examples
Michael Tüxen	usrsctp
Seth Brenith	V8
Ramya Rao	VS Code Go
Philipp Hancke	WebRTC
Jason Donenfeld	WireGuard

Congratulations to our winners! We look forward to your continued support and contributions to open source!

By Maria Tabak and Erin McKean, Program Managers – Google Open Source Programs Office
