Posts from December 2018

Wrapping up Google Code-in 2018

Wednesday, December 12, 2018

We are excited to announce the conclusion of the 9th annual Google Code-in (GCI), our global online contest introducing teenagers to the world of open source development. Over the years the contest has not only grown bigger, but also helped find and support talented young people around the world.

Here are some initial statistics about this year’s program:
  • Total number of students completing tasks: 3,123*
  • Total number of countries represented by students: 77
  • Percentage of girls among students: 17.9% 
*These numbers will increase as mentors finish reviewing the final work submitted by students this morning.

Below you can see the total number of tasks completed by students year over year:
Mentors from each of the 27 open source organizations are now busy reviewing the final work submitted by participants. We look forward to sharing more statistics about the program, including the countries and schools with the most student participants, in an upcoming blog post.

The mentors for each organization will spend the next couple of weeks selecting four Finalists (who will receive a hoodie too!) and their two Grand Prize Winners. Grand Prize Winners will be flown to Northern California to visit Google’s headquarters, enjoy a day of adventure in San Francisco, meet their mentors and hear talks from Google engineers.

Hearty congratulations to all the student participants for challenging themselves and making contributions to open source in the process!

Further, we’d like to thank the mentors and the organization administrators for GCI 2018. They are the heart of this program, volunteering countless hours creating tasks, reviewing student work, and helping bring students into the world of open source. Mentors teach young students about the many facets of open source development, from community standards and communicating across time zones to version control and testing. We couldn’t run this program without you! Thank you!

Stay tuned, we’ll be announcing the Grand Prize Winners and Finalists on January 7, 2019!

By Saranya Sampat, Google Open Source

Knative momentum continues, hits another adoption milestone

Monday, December 10, 2018

Released just four months ago by Google Cloud in collaboration with several vendors, Knative, an open source platform based on Kubernetes that provides the building blocks for serverless workloads, has already gained broad support.

The number of contributors has doubled, more than a dozen companies have contributed each month, and community contributions have increased over 45% since the 0.1 release. It’s an encouraging signal that validates the need for such a project, and suggests that ongoing development will be driven by healthy discussions among users and contributors.

Knative 0.2 Release 

In the recent 0.2 release, the first major release since the project’s launch in July, we incorporated 323 pull requests from eight different companies. Knative 0.2 added a new Eventing resource model to complement the Serving and Build components. There were also many improvements under the hood, such as the implementation of pluggable routing and better support for autoscaling.

KubeCon North America

Continuing the theme, there are 10 sessions about Knative by speakers from seven different companies this week at KubeCon in Seattle. They cover a variety of topics, ranging from introductory overviews to advanced autoscaler customization. The number of companies represented by the speakers illustrates the breadth of the growing Knative community.

Growing Ecosystem 

Another sign of Knative’s momentum is the growing ecosystem. A number of enterprise platform developers have begun using Knative to create serverless solutions on Kubernetes for their own hybrid cloud use cases. Their use of the Knative API makes for a consistent developer experience and enables workload portability. For example, Pivotal, a top contributor to the Knative project, has adopted Knative alongside Kubernetes, which helps them dedicate more resources higher in the stack:
"Since the release of Knative, we've been collaborating on an open functions platform to help companies run their new workloads on every cloud. That’s why we’re excited to launch the alpha of Pivotal Function Service." – Onsi Fakhouri, SVP of Engineering at Pivotal
Similarly, TriggerMesh has launched a hosted serverless management platform that runs on top of Knative, enabling developers to deploy and manage their functions from a central console.
"Knative provides us with the critical building blocks we need to create our serverless management platform." – Sebastien Goasguen, Co-founder, TriggerMesh
We’re excited by the speed with which Knative is being adopted and the broad cross section of the industry that is already contributing to the project. If you haven’t already jumped in, we invite you to get involved! Come visit github.com/knative and join the growing Knative community.

By Mark Chmarny, Knative Team

Google joins the OpenChain Project for license compliance

Thursday, December 6, 2018

Google is thrilled to announce that we are joining the OpenChain Project as Platinum Members. OpenChain is an effort to make open source license compliance simpler and more consistent. We will also join the OpenChain board and are excited that Facebook and Uber will be fellow board members.

Over the last 14 years, the Open Source Programs Office (OSPO) at Google has developed rigorous policies and processes so that we can do open source license compliance correctly, and at scale. This helps us use free and open source software extensively across the company and makes it easier to upstream our work. For us, it’s a matter of legal compliance as well as showing respect for the amazing communities that create and maintain the software.

Until now, there’s been no commonly accepted standard for open source compliance within an organization. Most organizations, like Google, have had to invent and cobble together policies and processes, occasionally comparing notes and hoping we haven’t forgotten anything.

The OpenChain Project is changing that by defining the core requirements of a quality compliance program and developing curriculum to help with training and management. It’s hard to overstate the importance of this work now that open source is a critical input at every step in the supply chain, both in hardware and software.

Google believes in this mission and is excited for the opportunity to use what we’ve learned to pave the way for the rest of the industry. We can help guide the development of standards that are rigorous, clear, and easy to follow for companies both large and small.

By Max Sills and Josh Simmons, Google Open Source

TF-Ranking: a scalable TensorFlow library for learning-to-rank

Wednesday, December 5, 2018

Cross-posted from the Google AI Blog.

Ranking, the process of ordering a list of items in a way that maximizes the utility of the entire list, is applicable in a wide range of domains, from search engines and recommender systems to machine translation, dialogue systems and even computational biology. In applications like these (and many others), researchers often utilize a set of supervised machine learning techniques called learning-to-rank. In many cases, these learning-to-rank techniques are applied to datasets that are prohibitively large — scenarios where the scalability of TensorFlow could be an advantage. However, there is currently no out-of-the-box support for applying learning-to-rank techniques in TensorFlow. To the best of our knowledge, there are also no other open source libraries that specialize in applying learning-to-rank techniques at scale.

Today, we are excited to share TF-Ranking, a scalable TensorFlow-based library for learning-to-rank. As described in our recent paper, TF-Ranking provides a unified framework that includes a suite of state-of-the-art learning-to-rank algorithms, and supports pairwise or listwise loss functions, multi-item scoring, ranking metric optimization, and unbiased learning-to-rank.

TF-Ranking is fast and easy to use, and creates high-quality ranking models. The unified framework gives ML researchers, practitioners and enthusiasts the ability to evaluate and choose among an array of different ranking models within a single library. Moreover, we strongly believe that a key to a useful open source library is not only providing sensible defaults, but also empowering our users to develop their own custom models. Therefore, we provide flexible APIs with which users can define and plug in their own customized loss functions, scoring functions and metrics.
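
To make the customization point concrete, here is a minimal sketch, in plain TensorFlow rather than TF-Ranking's own API, of the kind of scoring function a user might plug in: a small feed-forward network that maps an item's feature vector to a single ranking score. The function name and layer sizes are illustrative assumptions, not part of the library.

```python
import tensorflow as tf

def example_scoring_fn(item_features):
    """Hypothetical custom scorer: maps a batch of per-item feature vectors
    (shape [num_items, num_features]) to one ranking score per item."""
    hidden = tf.keras.layers.Dense(64, activation="relu")(item_features)
    hidden = tf.keras.layers.Dense(32, activation="relu")(hidden)
    scores = tf.keras.layers.Dense(1)(hidden)   # shape [num_items, 1]
    return tf.squeeze(scores, axis=-1)          # shape [num_items]
```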

Existing Algorithms and Metrics Support

The objective of learning-to-rank algorithms is minimizing a loss function defined over a list of items to optimize the utility of the list ordering for any given application. TF-Ranking supports a wide range of standard pointwise, pairwise and listwise loss functions as described in prior work. This ensures that researchers using the TF-Ranking library are able to reproduce and extend previously published baselines, and practitioners can make the most informed choices for their applications. Furthermore, TF-Ranking can handle sparse features (like raw text) through embeddings and scales to hundreds of millions of training instances. Thus, anyone interested in building real-world, data-intensive ranking systems, such as web search or news recommendation, can use TF-Ranking as a robust, scalable solution.
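
As a rough illustration of how the three loss families differ, the sketch below implements one simple representative of each in plain TensorFlow, assuming `labels` and `scores` are float tensors of shape [batch_size, list_size]. These are simplified stand-ins for the idea, not the library's implementations, which handle padding, per-example weights, and many more variants.

```python
import tensorflow as tf

def pointwise_sigmoid_loss(labels, scores):
    # Treats every item independently: per-item binary cross-entropy.
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=scores))

def pairwise_logistic_loss(labels, scores):
    # Penalizes mis-ordered pairs: for each pair with label_i > label_j,
    # adds log(1 + exp(-(score_i - score_j))).
    score_diff = tf.expand_dims(scores, -1) - tf.expand_dims(scores, -2)
    label_diff = tf.expand_dims(labels, -1) - tf.expand_dims(labels, -2)
    valid_pair = tf.cast(label_diff > 0, tf.float32)
    pair_loss = tf.nn.softplus(-score_diff) * valid_pair
    return tf.reduce_sum(pair_loss) / tf.maximum(tf.reduce_sum(valid_pair), 1.0)

def listwise_softmax_loss(labels, scores):
    # Treats the whole list at once: cross-entropy between a label-derived
    # distribution over the list and the softmax of the scores.
    label_sum = tf.maximum(tf.reduce_sum(labels, axis=-1, keepdims=True), 1e-9)
    label_dist = labels / label_sum
    return -tf.reduce_mean(
        tf.reduce_sum(label_dist * tf.nn.log_softmax(scores), axis=-1))
```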

Empirical evaluation is an important part of any machine learning or information retrieval research. To ensure compatibility with prior work, we support many of the commonly used ranking metrics, including Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG). We also make it easy to visualize these metrics at training time on TensorBoard, an open source TensorFlow visualization dashboard.
An example of the NDCG metric (Y-axis) over training steps (X-axis), as displayed in TensorBoard. It shows the overall progress of the metric during training; different methods can be compared directly on the dashboard, and the best models can be selected based on the metric.
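
For reference, NDCG is the DCG of the model's ordering divided by the DCG of the ideal ordering. The NumPy sketch below uses one common gain and discount formulation (gain 2^rel − 1, log2 position discount) and is only meant to illustrate the metric, not to mirror the library's implementation.

```python
import numpy as np

def dcg(relevances):
    # Discounted cumulative gain of a list already in ranked order.
    gains = 2.0 ** np.asarray(relevances, dtype=float) - 1.0
    discounts = np.log2(np.arange(2, len(gains) + 2))
    return np.sum(gains / discounts)

def ndcg(relevances, k=None):
    # DCG of the given ordering divided by the DCG of the ideal ordering.
    relevances = np.asarray(relevances, dtype=float)
    k = k or len(relevances)
    ideal_dcg = dcg(np.sort(relevances)[::-1][:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Graded relevance labels in the order the model ranked the items:
print(ndcg([3, 2, 3, 0, 1, 2]))   # ~0.95; 1.0 would mean the ideal ordering
```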

Multi-Item Scoring

TF-Ranking supports a novel scoring mechanism wherein multiple items (e.g., web pages) can be scored jointly, an extension of the traditional scoring paradigm in which single items are scored independently. One challenge in multi-item scoring is inference: items have to be grouped and scored in subgroups, and the per-item scores are then accumulated and used for sorting. To make these complexities transparent to the user, TF-Ranking provides a List-In-List-Out (LILO) API that wraps all of this logic in the exported TF models.
The TF-Ranking library supports a multi-item scoring architecture, an extension of traditional single-item scoring.
As we demonstrate in recent work, multi-item scoring is competitive with state-of-the-art learning-to-rank models such as RankNet, MART, and LambdaMART on a public LETOR benchmark.
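
The sketch below illustrates the group-then-accumulate idea behind multi-item scoring at inference time. The helper name `group_score_fn`, the exhaustive enumeration of groups, and the simple averaging of per-item scores are all simplifying assumptions for illustration; this is not the LILO API itself.

```python
import itertools
import numpy as np

def groupwise_inference(item_features, group_score_fn, group_size=2):
    # `group_score_fn` is a hypothetical model that scores `group_size` items
    # jointly and returns one score per item in the group. Per-item scores are
    # accumulated over all groups an item appears in, then used for sorting.
    # Exhaustive enumeration is used here only for clarity; a real system
    # would limit or sample the groups.
    n = len(item_features)
    totals, counts = np.zeros(n), np.zeros(n)
    for group in itertools.combinations(range(n), group_size):
        group_scores = group_score_fn([item_features[i] for i in group])
        for idx, score in zip(group, group_scores):
            totals[idx] += score
            counts[idx] += 1
    final_scores = totals / np.maximum(counts, 1)
    return np.argsort(-final_scores)   # item indices, best first
```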

Ranking Metric Optimization

An important research challenge in learning-to-rank is direct optimization of ranking metrics (such as the previously mentioned NDCG and MRR). These metrics, while able to measure the performance of ranking systems better than standard classification metrics like Area Under the Curve (AUC), have the unfortunate property of being either discontinuous or flat: small changes to the scores usually leave the induced ranking, and hence the metric, unchanged. Therefore, standard stochastic gradient descent optimization of these metrics is problematic.

In recent work, we proposed a novel method, LambdaLoss, which provides a principled probabilistic framework for ranking metric optimization. In this framework, metric-driven loss functions can be designed and optimized by an expectation-maximization procedure. The TF-Ranking library integrates the recent advances in direct metric optimization and provides an implementation of LambdaLoss. We are hopeful that this will encourage and facilitate further research advances in the important area of ranking metric optimization.
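
To give a flavor of metric-driven loss design, here is a simplified NumPy sketch in the spirit of LambdaRank-style weighting, where each pairwise loss term is weighted by the |ΔNDCG| obtained from swapping the two items. This illustrates the general idea of folding a ranking metric into the loss; it is not the LambdaLoss derivation or the library's implementation.

```python
import numpy as np

def lambda_weighted_pairwise_loss(labels, scores):
    # Rank positions induced by the current scores (1 = top of the list).
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)

    gains = 2.0 ** labels - 1.0
    discounts = 1.0 / np.log2(ranks + 1.0)
    ideal_dcg = np.sum(np.sort(gains)[::-1] /
                       np.log2(np.arange(2, len(labels) + 2)))

    # |Delta NDCG| from swapping items i and j: |(g_i - g_j)(d_i - d_j)| / ideal DCG.
    delta_ndcg = np.abs(np.subtract.outer(gains, gains) *
                        np.subtract.outer(discounts, discounts)) / max(ideal_dcg, 1e-9)

    # Pairwise logistic loss, weighted so metric-critical pairs count more.
    score_diff = np.subtract.outer(scores, scores)
    valid = (np.subtract.outer(labels, labels) > 0).astype(float)
    pair_losses = np.log1p(np.exp(-score_diff))
    return np.sum(valid * delta_ndcg * pair_losses) / max(valid.sum(), 1.0)

print(lambda_weighted_pairwise_loss(
    labels=np.array([3.0, 0.0, 1.0]), scores=np.array([0.2, 1.5, 0.3])))
```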

Unbiased Learning-to-Rank

Prior research has shown that given a ranked list of items, users are much more likely to interact with the first few results, regardless of their relevance. This observation has inspired research interest in unbiased learning-to-rank, and led to the development of unbiased evaluation metrics and several unbiased learning algorithms based on re-weighting of training instances. The TF-Ranking library implements metrics that support unbiased evaluation and losses that support unbiased learning, with native support for re-weighting to overcome the inherent biases in user interaction datasets.
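
One common re-weighting scheme is inverse propensity weighting: a clicked example is up-weighted by the inverse of the estimated probability that its position was examined at all. The sketch below assumes the propensity estimates are already given (the numbers are made up for illustration) and is not the library's implementation.

```python
import numpy as np

def inverse_propensity_weights(click_positions, propensities):
    # Weight each clicked example by 1 / P(position examined), so clicks at
    # positions users rarely look at count for more during training.
    positions = np.asarray(click_positions) - 1          # 1-based -> 0-based
    return 1.0 / np.asarray(propensities)[positions]

# Hypothetical examination propensities for positions 1..5 (position bias):
propensities = [1.0, 0.6, 0.4, 0.3, 0.2]
# A click at position 1 keeps weight 1.0; a click at position 4 is up-weighted
# to ~3.3, compensating for how rarely that position is examined at all.
print(inverse_propensity_weights([1, 4], propensities))   # [1.0, 3.333...]
```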

Getting Started with TF-Ranking

TF-Ranking implements the TensorFlow Estimator interface, which greatly simplifies machine learning programming by encapsulating training, evaluation, prediction and export for serving. TF-Ranking is well integrated with the rich TensorFlow ecosystem. As described above, you can use TensorBoard to visualize ranking metrics like NDCG and MRR, as well as to pick the best model checkpoints using these metrics. Once your model is ready, it is easy to deploy it in production using TensorFlow Serving.
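
For readers unfamiliar with the Estimator interface, the skeleton below shows the general shape of an Estimator-based workflow in TensorFlow 1.x, as of this writing: an `input_fn`, a `model_fn`, and `tf.estimator.train_and_evaluate`. It uses a toy pointwise model on random data purely to show the moving parts; the ranking-specific pieces (losses, scoring functions, metrics) are built on top of this same interface by the library.

```python
import numpy as np
import tensorflow as tf

def input_fn():
    # Toy data: 100 examples with 10 features each and a binary relevance label.
    x = np.random.rand(100, 10).astype(np.float32)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float32)
    dataset = tf.data.Dataset.from_tensor_slices(({"x": x}, y))
    return dataset.shuffle(100).batch(16).repeat()

def model_fn(features, labels, mode):
    # Score each example with a tiny feed-forward network.
    hidden = tf.layers.dense(features["x"], 16, activation=tf.nn.relu)
    scores = tf.layers.dense(hidden, 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions={"score": scores})
    loss = tf.losses.sigmoid_cross_entropy(labels, scores)  # pointwise stand-in
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss=loss)
    train_op = tf.train.AdagradOptimizer(0.1).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/toy_ranker")
tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=input_fn, max_steps=1000),
    tf.estimator.EvalSpec(input_fn=input_fn, steps=50))
```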

If you’re interested in trying TF-Ranking for yourself, please check out our GitHub repo, and walk through the tutorial examples. TF-Ranking is an active research project, and we welcome your feedback and contributions. We are excited to see how TF-Ranking can help the information retrieval and machine learning research communities.

By Xuanhui Wang and Michael Bendersky, Software Engineers, Google AI

Acknowledgements

This project was only possible thanks to the members of the core TF-Ranking team: Rama Pasumarthi, Cheng Li, Sebastian Bruch, Nadav Golbandi, Stephan Wolf, Jan Pfeifer, Rohan Anil, Marc Najork, Patrick McGregor and Clemens Mewald‎. We thank the members of the TensorFlow team for their advice and support: Alexandre Passos, Mustafa Ispir, Karmel Allison, Martin Wicke, and others. Finally, we extend our special thanks to our collaborators, interns and early adopters: Suming Chen, Zhen Qin, Chirag Sethi, Maryam Karimzadehgan, Makoto Uchida, Yan Zhu, Qingyao Ai, Brandon Tran, Donald Metzler, Mike Colagrosso, and many others at Google who helped in evaluating and testing the early versions of TF-Ranking.