Google Open Source Blog: December 2019

Posts from December 2019

Season of Docs Announces Results of 2019 Program

Thursday, December 12, 2019

Season of Docs has announced the 2019 program results for standard-length projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.

During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

The technical writers and their mentors did a fantastic job with the inaugural year of Season of Docs! Participants represented countries across all continents except for Antarctica! 36 technical writers out of 41 successfully completed their standard-length technical writing projects, and there are eight long-running projects in progress that are expected to finish in February.

91.7% of the mentors had a positive experience and want to mentor again in future Season of Docs cycles
88% of the technical writers had a positive experience
96% plan to continue contributing to open source projects
100% of the technical writers said that Season of Docs helped improved their knowledge of code and/or open source

Technical writing projects ranged from beginners' guides and tutorials to API and reference documentation; all of which benefited a diverse set of open source projects that included programming languages, software, compiler infrastructure, operating systems, software libraries, hardware, science, healthcare, and more. Take a look at the list of successful projects to see the wide range of subjects covered!

What is next?

The long-running projects are still in progress and finish in February 2020. Technical writers participating in these long-running projects submit their project reports by Feb. 25, and the writer and mentor evaluations are due by Feb. 28. Successfully completed long-running technical writing projects are then published on the results page on March 6, 2020.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your ideas on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

Stay tuned for information about Season of Docs 2020—watch for posts in this blog and sign up for the announcements email list.

By Andrew Chen, Google Open Source and Sarah Maddox, Cloud Docs

W3C Trace Context Specification: What it Means for You

Wednesday, December 11, 2019

Since the first days of Google Cloud Platform (GCP), Google has been at the forefront of making your applications more observable. Beyond Stackdriver, our most visible impact in this space is OpenTelemetry, which we initiated in 2017 (as OpenCensus) and has grown into a huge community that includes the majority of APM / monitoring vendors and cloud platforms.

While OpenTelemetry allows developers to easily capture distributed traces and metrics from their own services, there’s also a need to trace requests as they propagate through components that developers don’t directly control, like managed services, load balancers, network hardware, etc. To solve this we co-defined a prototype HTTP header that these components can rely on, gathered partners, and moved the work into the W3C.

This work is now complete, and the W3C Trace Context format is now an official standard. Once implemented in GCP, this will make our services even easier to manage, both with Stackdriver and other third party distributed tracing tools. We explain more in the official post on the W3C blog, which I’ve copied below:

The W3C Distributed Tracing working group has moved the Trace Context specification to the next maturity level. The specification is already being adopted and implemented by many platforms and SDKs. This article describes the Trace Context specification and how it improves troubleshooting and monitoring of modern distributed apps.

W3C Trace Context specification defines the format for propagating distributed tracing context between services. Distributed tracing makes it easy for developers to find the causes of issues in highly-distributed microservices applications by tracking how a single interaction was processed across multiple services. Each step of a trace is correlated through an ID that is passed between services, and W3C Trace Context now defines a standard for these context propagation headers.

Until now, different tracing systems have defined their own headers. Examples include Zipkin’s B3 format and X-Google-Cloud-Trace. Adopting a common context propagation format has been long desired by developers, APM vendors, and cloud platform hosts, as compatibility provides numerous benefits:

Web and RPC frameworks that use this standard to provide context propagation out of the box will also offer cross-service log correlation, even for developers who haven’t set up distributed tracing.
API producers can record the trace IDs of requests from API consumers and provide additional spans or metadata to their customers for a given traced request. Producers can also correlate customer trace IDs to internal traces when debugging technical issues raised by consumers.
Networking infrastructure (proxies, load balancers, routers, etc.) can both ensure that context propagation headers are not removed from requests passing through them, and can record spans or logs for a given trace, without having to support multiple vendor-specific formats. Potential examples of these include router appliances, cloud load balancers, and sidecar proxies like Envoy.
Instrumentation can be further decoupled from a developer’s choice of APM vendor. For example, using both OpenTelemetry and a given vendor’s agents, a developer can instrument different services in an application, and traces will flow through the system and be processed correctly by the vendor’s backend.
Web browsers and other clients can use these identifiers to correlate their telemetry with traces collected from backend services. This functionality is currently being defined.

To address this effort, a group of cloud providers, open source contributors, and APM vendors started defining a standard HTTP context propagation header that would replace their homegrown formats. This specification has been discussed and iterated on over the past two years, and the group working on it has grown significantly over that time. Sponsors include Google, Microsoft, Dynatrace, and New Relic (W3C members), and the group was officially moved into the W3C in 2018 for the work to proceed under the guidance of an official standards body and to spur even greater adoption.

TraceContext has since been adopted by OpenTelemetry (which enables it by default and also serves as the reference implementation), Azure services, Dynatrace, Elastic, Google Cloud Platform, Lightstep, and New Relic. We are tracking adoption in this list.

This first phase of work has focused on HTTP, as it is commonly used and has no built-in affordances for trace context propagation (gRPC and some newer RPC systems do). The same group of committee members are also working to define trace context propagation in other formats, starting with AMQP and MQTT for IoT; other upcoming topics include context propagation from clients and web browsers.

By Morgan McLean, OpenTelemetry + Stackdriver

Announcing Google Summer of Code 2020!

Monday, December 9, 2019

Google Open Source is proud to announce Google Summer of Code (GSoC) 2020—the 16th year of the program! We look forward to introducing the 16th batch of student developers to the world of open source and matching them with open source projects, while earning a stipend so they can focus their summer on their project.

Over the last 15 years GSoC has provided over 15,000 university students, from 109 countries, with an opportunity to hone their skills by contributing to open source projects during their summer break.

And the ‘special sauce’ that has kept this program thriving for 16 years: the mentorship aspect of the program. Participants gain invaluable experience working directly with mentors who are dedicated members of these open source communities; mentors help bring students into their communities while teaching them, guiding them and helping them find their place in the world of open source.

We’re excited to keep the tradition going! Applications for interested open source project organizations open on January 14, 2020, and student applications open March 25.

Are you an open source project interested in learning more? Visit the program site and read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community and create appropriate project ideas, and tips for preparing your application. We welcome all types of organizations—large and small—and are very eager to involve first time projects. For 2020, we hope to welcome more organizations into GSoC than ever before and are looking to accept 40-50 new organizations into their first GSoC.

Are you a university student interested in learning how to prepare for the 2020 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. You should read the student guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in mid-March. You can also get inspired by checking out the 200+ organizations that participated in Google Summer of Code 2019, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

By Stephanie Taylor, Google Open Source

Blockly Summit 2019: Rendering, Accessibility, and More!

Thursday, December 5, 2019

It has been over eight years since we started work on Blockly, an open source library for building drag-and-drop block coding apps. In that time, the team has grown from a single developer to a small team and a large community. Blockly is now a standard in the CS education space, used by Scratch, MakeCode, AppInventor, and hundreds of other developers to enable tens of millions of kids around the world to create and express themselves with code.

But Blockly isn't only used for education. The library provides everything an app developer needs to create rich block coding languages and is highly customizable and extensible. This means Blockly is also used by hobbyists and commercial companies alike for business logic, computer games, virtual reality, robotics, and just about anything else you can do with code.

The work we do on Blockly wouldn't be possible without the many folks who contribute back with code, suggestions, and support on the forums. As such, we were very excited to welcome around 30 members of the Blockly open source community to our second annual Blockly User Summit and to be able to make all of the talks available online!

The summit spanned two days in October and included 16 talks, over half of which were given by external contributors, and a Q&A with the Blockly team. The talks covered everything from Blockly's brand new rendering framework and building custom fields to explorations in performance and debugging block code. Check out the full playlist.

We also held a hackathon on the second day of the summit, with quick start guides for using our new rendering and accessibility APIs. If you're new to Blockly and you'd like a good starting point, take a look at our CodeLab and if you build your own cool demo let us know on our forums.

By Erik Pasternak, Kids Coding Team

RecSim: A Configurable Simulation Platform for Recommender Systems

Wednesday, December 4, 2019

Originally posted on the Google AI Blog

Significant advances in machine learning, speech recognition, and language technologies are rapidly transforming the way in which recommender systems engage with users. As a result, collaborative interactive recommenders (CIRs)—recommender systems that engage in a deliberate sequence of interactions with a user to best meet that user's needs—have emerged as a tangible goal for online services.

Despite this, the deployment of CIRs has been limited by challenges in developing algorithms and models that reflect the qualitative characteristics of sequential user interaction. Reinforcement learning (RL) is the de facto standard ML approach for addressing sequential decision problems, and as such is a natural paradigm for modeling and optimizing sequential interaction in recommender systems. However, it remains under-investigated and under-utilized for use in CIRs in both research and practice. One major impediment is the lack of general-purpose simulation platforms for sequential recommender settings, whereas simulation has been one of the primary means for developing and evaluating RL algorithms in real-world applications like robotics.

To address this, we have developed RᴇᴄSɪᴍ (available here), a configurable platform for authoring simulation environments to facilitate the study of RL algorithms in recommender systems (and CIRs in particular). RᴇᴄSɪᴍ allows both researchers and practitioners to test the limits of existing RL methods in synthetic recommender settings. RecSim’s aim is to support simulations that mirror specific aspects of user behavior found in real recommender systems and serve as a controlled environment for developing, evaluating and comparing recommender models and algorithms, especially RL systems designed for sequential user-system interaction.

As an open-source platform, RᴇᴄSɪᴍ: (i) facilitates research at the intersection of RL and recommender systems; (ii) encourages reproducibility and model-sharing; (iii) aids the recommender-systems practitioner, interested in applying RL to rapidly test and refine models and algorithms in simulation, before incurring the potential cost (e.g., time, user impact) of live experiments; and (iv) serves as a resource for academic-industry collaboration through the release of “realistic” stylized models of user behavior without revealing user data or sensitive industry strategies.

Reinforcement Learning and Recommendation Systems

One challenge in applying RL to recommenders is that most recommender research is developed and evaluated using static datasets that do not reflect the sequential, repeated interaction a recommender has with its users. Even those with temporal extent, such as MovieLens 1M, do not (easily) support predictions about the long-term performance of novel recommender policies that differ significantly from those used to collect the data, as many of the factors that impact user choice are not recorded within the data. This makes the evaluation of even basic RL algorithms very difficult, especially when it comes to reasoning about the long-term consequences of some new recommendation policy—research shows changes in policy can have long-term, cumulative impact on user behavior. The ability to model such user behaviors in a simulated environment, and devise and test new recommendation algorithms, including those using RL, can greatly accelerate the research and development cycle for such problems.

Overview of RᴇᴄSɪᴍ

RᴇᴄSɪᴍ simulates a recommender agent’s interaction with an environment consisting of a user model, a document model and a user choice model. The agent interacts with the environment by recommending sets or lists of documents (known as slates) to users, and has access to observable features of simulated individual users and documents to make recommendations. The user model samples users from a distribution over (configurable) user features (e.g., latent features, like interests or satisfaction; observable features, like user demographic; and behavioral features, such as visit frequency or time budget). The document model samples items from a prior distribution over document features, both latent (e.g., quality) and observable (e.g., length, popularity). This prior, as all other components of RᴇᴄSɪᴍ, can be specified by the simulation developer, possibly informed (or learned) from application data.

The level of observability for both user and document features is customizable. When the agent recommends documents to a user, the response is determined by a user-choice model, which can access observable document features and all user features. Other aspects of a user’s response (e.g., time spent engaging with the recommendation) can depend on latent document features, such as document topic or quality. Once a document is consumed, the user state undergoes a transition through a configurable user transition model, since user satisfaction or interests might change.

We note that RᴇᴄSɪᴍ provides the ability to easily author specific aspects of user behavior of interest to the researcher or practitioner, while ignoring others. This can provide the critical ability to focus on modeling and algorithmic techniques designed for novel phenomena of interest (as we illustrate in two applications below). This type of abstraction is often critical to scientific modeling. Consequently, high-fidelity simulation of all elements of user behavior is not an explicit goal of RᴇᴄSɪᴍ. That said, we expect that it may also serve as a platform that supports “sim-to-real” transfer in certain cases (see below).

Data Flow through components of RᴇᴄSɪᴍ. Colors represent different model components — user and user-choice models (green), document model (blue), and the recommender agent (red)

Applications

We have used RᴇᴄSɪᴍ to investigate several key research problems that arise in the use of RL in recommender systems. For example, slate recommendations can result in RL problems, since the parameter space for action grows exponentially with slate size, posing challenges for exploration, generalization and action optimization. We used RᴇᴄSɪᴍ to develop a novel decomposition technique that exploits simple, widely applicable assumptions about user choice behavior to tractably compute Q-values of entire recommendation slates. In particular, RᴇᴄSɪᴍ was used to test a number of experimental hypotheses, such as algorithm performance and robustness to different assumptions about user behavior.

Future Work

While RᴇᴄSɪᴍ provides ample opportunity for researchers and practitioners to probe and question assumptions made by RL/recommender algorithms in stylized environments, we are developing several important extensions. These include: (i) methodologies to fit stylized user models to usage logs to partially address the “sim-to-real” gap; (ii) the development of natural APIs using TensorFlow’s probabilistic APIs to facilitate model specification and learning, as well as scaling up simulation and inference algorithms using accelerators and distributed execution; and (iii) the extension to full-factor, mixed-mode interaction models that will be the hallmark of modern CIRs—e.g., language-based dialogue, preference elicitation, explanations, etc.

Our hope is that RᴇᴄSɪᴍ will serve as a valuable resource that bridges the gap between recommender systems and RL research — the use cases above are examples of how it can be used in this fashion. We also plan to pursue it as a platform to support academic-industry collaborations, through the sharing of stylized models of user behavior that, at suitable levels of abstraction, reflect a degree of realism that can drive useful model and algorithm development.

Further details of the RᴇᴄSɪᴍ framework can be found in the white paper, while code and colabs/tutorials are available here.

Acknowledgements
We thank our collaborators and early adopters of RᴇᴄSɪᴍ, including the other members of the RᴇᴄSɪᴍ team: Eugene Ie, Vihan Jain, Sanmit Narvekar, Jing Wang, Rui Wu and Craig Boutilier.

By Martin Mladenov, Research Scientist and Chih-wei Hsu, Software Engineer, Google Research

Google Code-in 2019 Contest for Teenagers

Monday, December 2, 2019

Today is the start of the 10th consecutive year of the Google Code-in (GCI) contest for teens. We anticipate this being the biggest contest yet!

The Basics

What is Google Code-in?
Our global, online contest introducing students to open source development. The contest runs for seven weeks until January 23, 2020.

Who can register?
Pre-university students ages 13-17 that have their parent or guardian’s permission to register for the contest.

How do students register and participate?
Students can register for the contest beginning today at g.co/gci. Once students have registered, and the parental consent form has been submitted and approved by Program Administrators, students can choose which “task” they want to work on first. Students choose the task they find interesting from a list of thousands of available tasks created by 29 participating open source organizations. Tasks take an average of 3-5 hours to complete. There are even beginner tasks that are a wonderful way for students to get started in the contest.

The task categories are:

Coding
Design
Documentation/Training
Outreach/Research
Quality Assurance

Why should students participate?
Students not only have the opportunity to work on a real open source software project, thus gaining invaluable skills and experience, but they also have the opportunity to be a part of the open source community. Mentors are readily available to help answer their questions while they work through the tasks.

Google Code-in is a contest so there are prizes*! Complete one task and receive a digital certificate, three completed tasks and you’ll also get a fun Google t-shirt. Finalists earn a jacket, runners-up earn backpacks, and grand prize winners (two from each organization) will receive a trip to Google headquarters in California in 2020!

Details
Over the past nine years, more than 11,000 students from 108 countries have successfully completed over 55,000 tasks in GCI. Curious? Learn more about GCI by checking out the Contest Rules, short videos, and FAQs. Please visit our contest site and read the Getting Started Guide.

Teachers, if you are interested in getting your students involved in Google Code-in we have resources available to help you get started.

By Stephanie Taylor, Google Open Source

* There are a handful of countries we are unable to ship physical goods to, as listed in the FAQs.