
TensorFlow Lattice: Flexibility Empowered by Prior Knowledge

Wednesday, October 11, 2017

Crossposted on the Google Research Blog

Machine learning has made huge advances in many applications including natural language processing, computer vision and recommendation systems by capturing complex input/output relationships using highly flexible models. However, a remaining challenge is problems with semantically meaningful inputs that obey known global relationships, like “the estimated time to drive a road goes up if traffic is heavier, and all else is the same.” Flexible models like DNNs and random forests may not learn these relationships, and then may fail to generalize well to examples drawn from a different sampling distribution than the examples the model was trained on.

Today we present TensorFlow Lattice, a set of prebuilt TensorFlow Estimators that are easy to use, along with TensorFlow operators for building your own lattice models. Lattices are multi-dimensional interpolated look-up tables (for more details, see [1-5]), similar to the look-up tables in the back of a geometry textbook that approximate a sine function. We take advantage of the look-up table’s structure, which can be keyed by multiple inputs to approximate an arbitrarily flexible relationship, to satisfy monotonic relationships that you specify in order to generalize better. That is, the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions. Importantly, because they interpolate between the look-up table values, lattice models are smooth and their predictions are bounded, which helps to avoid spurious large or small predictions at test time.
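To make the look-up-table idea concrete, here is a small NumPy illustration (not TensorFlow Lattice code) of a 2-D lattice: a 2x2 table of learned output values that is bilinearly interpolated between its corners. Because interpolation only blends the stored values, predictions stay bounded by them, and making the table values non-decreasing along an input direction makes the interpolated function monotonic in that direction.

import numpy as np

# A 2x2 lattice: one learned output value at each corner of the unit square.
# Values increase left-to-right in x0, so the function is monotonic in x0.
lattice = np.array([[0.1, 0.4],   # values at (x0=0, x1=0) and (x0=1, x1=0)
                    [0.2, 0.9]])  # values at (x0=0, x1=1) and (x0=1, x1=1)

def interpolate(x0, x1):
    """Bilinear interpolation of the lattice at (x0, x1) in [0, 1]^2."""
    return ((1 - x0) * (1 - x1) * lattice[0, 0] +
            x0 * (1 - x1) * lattice[0, 1] +
            (1 - x0) * x1 * lattice[1, 0] +
            x0 * x1 * lattice[1, 1])

print(interpolate(0.25, 0.5))  # about 0.275, always within [0.1, 0.9]
print(interpolate(0.75, 0.5))  # about 0.525, larger because the table is monotonic in x0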

How Lattice Models Help You

Suppose you are designing a system to recommend nearby coffee shops to a user. You would like the model to learn, “if two cafes are the same, prefer the closer one.”  Below we show a flexible model (pink) that accurately fits some training data for users in Tokyo (purple), where there are many coffee shops nearby.  The pink flexible model overfits the noisy training examples, and misses the overall trend that a closer cafe is better. If you used this pink model to rank test examples from Texas (blue), where businesses are spread farther out, you would find it acted strangely, sometimes preferring farther cafes!
Slice through a model’s feature space where all the other inputs stay the same and only distance changes. A flexible function (pink) that is accurate on training examples from Tokyo (purple) predicts that a cafe 10 km away is better than the same cafe 5 km away. This problem becomes more evident at test time if the data distribution has shifted, as shown here with blue examples from Texas, where cafes are spread out more.
A monotonic flexible function (green) is both accurate on the training examples and generalizes well to the Texas examples, unlike the non-monotonic flexible function (pink) from the previous figure.

In contrast, a lattice model, trained on the same examples from Tokyo, can be constrained to satisfy such a monotonic relationship, resulting in a monotonic flexible function (green). The green line accurately fits the Tokyo training examples, but also generalizes well to Texas, never preferring farther cafes.

In general, you might have many inputs about each cafe, e.g., coffee quality, price, etc. Flexible models have a hard time capturing global relationships of the form, “if all other inputs are equal, nearer is better,” especially in parts of the feature space where your training data is sparse and noisy. Machine learning models that capture prior knowledge (e.g., how inputs should impact the prediction) work better in practice, and are easier to debug and more interpretable.

Pre-built Estimators

We provide a range of lattice model architectures as TensorFlow Estimators. The simplest estimator we provide is the calibrated linear model, which learns the best 1-d transformation of each feature (using 1-d lattices) and then combines the calibrated features linearly. This works well if the training dataset is very small or there are no complex nonlinear input interactions. Another estimator is the calibrated lattice model, which combines the calibrated features nonlinearly using a two-layer single lattice model that can represent complex nonlinear interactions in your dataset. The calibrated lattice model is usually a good choice if you have 2-10 features; for 10 or more features, we expect you will get the best results with an ensemble of calibrated lattices, which you can train using the pre-built ensemble architectures. Monotonic lattice ensembles can achieve a 0.3%-0.5% accuracy gain compared to random forests [4], and these new TensorFlow Lattice estimators can achieve a 0.1%-0.4% accuracy gain compared to the prior state of the art in learning models with monotonicity [5].
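To give a feel for how the pre-built Estimators are intended to be used, here is a hedged sketch in the spirit of the project’s tutorials, using the cafe example above. The tensorflow_lattice package, the calibrated_lattice_classifier constructor, CalibratedLatticeHParams, and set_feature_param reflect our recollection of the released API and may differ from the current code; treat this as a sketch and consult the GitHub tutorials for the exact interface (the tutorials also show how to initialize the calibrator keypoints from your data, which is omitted here).

import tensorflow as tf
import tensorflow_lattice as tfl  # assumed package name from the project repo

# Two inputs for the cafe example: distance to the cafe and its rating.
feature_columns = [
    tf.feature_column.numeric_column('distance_km'),
    tf.feature_column.numeric_column('rating'),
]

# Hyperparameters for a calibrated lattice model (names are assumptions based
# on the project tutorials): each input gets a 1-d calibrator with
# `num_keypoints` keypoints, and `monotonicity` encodes prior knowledge,
# e.g. the score should not increase as distance grows.
hparams = tfl.CalibratedLatticeHParams(
    feature_names=['distance_km', 'rating'],
    num_keypoints=20)
hparams.set_feature_param('distance_km', 'monotonicity', -1)
hparams.set_feature_param('rating', 'monotonicity', +1)

estimator = tfl.calibrated_lattice_classifier(
    feature_columns=feature_columns,
    hparams=hparams)

# Standard Estimator training loop; train_input_fn is whatever input_fn
# feeds your (features, label) batches.
# estimator.train(input_fn=train_input_fn)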

Build Your Own

You may want to experiment with deeper lattice networks or research using partial monotonic functions as part of a deep neural network or other TensorFlow architecture. We provide the building blocks: TensorFlow operators for calibrators, lattice interpolation, and monotonicity projections. For example, the figure below shows a 9-layer deep lattice network [5].


Example of a 9-layer deep lattice network architecture [5], alternating layers of linear embeddings and ensembles of lattices with calibrator layers (which act like a sum of ReLUs in a neural network). The blue lines correspond to monotonic inputs, whose monotonicity is preserved layer by layer, and hence for the entire model. This and other arbitrary architectures can be constructed with TensorFlow Lattice because each layer is differentiable.

In addition to the choice of model flexibility and standard L1 and L2 regularization, we offer new regularizers with TensorFlow Lattice:
  • Monotonicity constraints [3] on your choice of inputs as described above.
  • Laplacian regularization [3] on the lattices to make the learned function flatter.
  • Torsion regularization [3] to suppress unnecessary nonlinear feature interactions.
We hope TensorFlow Lattice will be useful to the larger community working with meaningful semantic inputs. This is part of a larger research effort on interpretability and on controlling machine learning models to satisfy policy goals, enabling practitioners to take advantage of their prior knowledge. We’re excited to share this with all of you. To get started, please check out our GitHub repository and our tutorials, and let us know what you think!

By Maya Gupta, Research Scientist, Jan Pfeifer, Software Engineer and Seungil You, Software Engineer

Acknowledgements

Developing and open sourcing TensorFlow Lattice was a huge team effort. We’d like to thank all the people involved: Andrew Cotter, Kevin Canini, David Ding, Mahdi Milani Fard, Yifei Feng, Josh Gordon, Kiril Gorovoy, Clemens Mewald, Taman Narayan, Alexandre Passos, Christine Robson, Serena Wang, Martin Wicke, Jarek Wilkiewicz, Sen Zhao, Tao Zhu

References

[1] Lattice Regression, Eric Garcia, Maya Gupta, Advances in Neural Information Processing Systems (NIPS), 2009
[2] Optimized Regression for Efficient Function Evaluation, Eric Garcia, Raman Arora, Maya R. Gupta, IEEE Transactions on Image Processing, 2012
[3] Monotonic Calibrated Interpolated Look-Up Tables, Maya Gupta, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojciech Moczydlowski, Alexander van Esbroeck, Journal of Machine Learning Research (JMLR), 2016
[4] Fast and Flexible Monotonic Functions with Ensembles of Lattices, Mahdi Milani Fard, Kevin Canini, Andrew Cotter, Jan Pfeifer, Maya Gupta, Advances in Neural Information Processing Systems (NIPS), 2016
[5] Deep Lattice Networks and Partial Monotonic Functions, Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya R. Gupta, Advances in Neural Information Processing Systems (NIPS), 2017

Google Code-in 2017 is seeking organization applications

Monday, October 9, 2017


We are now accepting applications for open source organizations who want to participate in Google Code-in 2017. Google Code-in, a global online contest for pre-university students ages 13-17, invites students to learn by contributing to open source software.

Working with young students is a special responsibility and each year we hear inspiring stories from mentors who participate. To ensure these new, young contributors have a great support system, we select organizations that have gained experience in mentoring students by previously taking part in Google Summer of Code.

Organizations must apply before Tuesday, October 24 at 16:00 UTC.

17 organizations were accepted last year, and over the last 7 years, 4,553 students from 99 different countries have completed more than 23,651 tasks for participating open source projects. Tasks fall into 5 categories:

  • Code: writing or refactoring 
  • Documentation/Training: creating/editing documents and helping others learn more
  • Outreach/Research: community management, outreach/marketing, or studying problems and recommending solutions
  • Quality Assurance: testing and ensuring code is of high quality
  • User Interface: user experience research or user interface design and interaction

Once an organization is selected for Google Code-in 2017, it will define these tasks and recruit mentors who are interested in providing online support for students.

You can find a timeline, FAQ and other information about Google Code-in on our website. If you’re an educator interested in sharing Google Code-in with your students, you can find resources here.

By Josh Simmons, Google Open Source

Announcing more Open Source Peer Bonus winners

Tuesday, October 3, 2017

We’re excited to announce 2017’s second round of Open Source Peer Bonus winners. Google Open Source established this program six years ago to encourage Googlers to recognize and celebrate external contributors to the open source ecosystem Google depends on.

The Open Source Peer Bonus program works like this: Googlers nominate open source contributors outside of the company who deserve recognition for their contributions to open source projects, including those used by Google. Nominees are reviewed by a volunteer team of engineers and the winners receive our heartfelt thanks and a small token of our appreciation.

To date, we’ve recognized nearly 600 open source contributors from dozens of countries who have contributed their time and talent to more than 400 open source projects. You can find past winners in recent blog posts: Fall 2016, Spring 2017.

Without further ado, we’d like to recognize the latest round of winners and the projects they worked on. Here are the individuals who gave us permission to thank them publicly:

Mo Jangda – AMP Project
Osvaldo Lopez – AMP Project
Jason Jean – Angular CLI
Henry Zhu – Babel
Oscar Boykin – Bazel Scala rules
Francesc Alted – Blosc
Matt Holt – Caddy
Martijn Croonen – Chromium
Raphael Costa – Chromium
Mariatta Wijaya – CPython
Victor Stinner – CPython
Derek Parker – Delve
Thibaut Courouble – devdocs
David Lechner – ev3dev
Michael Niedermayer – FFmpeg
Mathew Huusko – Firebase
Armin Ronacher – Flask
Nenad Stojanovski – Forseti Security
Solly Ross – Heapster
Bjørn Erik Pedersen – Hugo
Brion Vibber – JS-Interpreter
Xiaoyu Zhang – Kubernetes
Anton Kachurin – Material Components for the Web
Eric Tang – Material Motion
Nicholas Tollervey – micro:bit, Mu
Damien George – MicroPython
Tom Spilman – MonoGame
Arthur Edge – NARKOZ/gitlab
Sebastian Berg – NumPy
Bogdan-Andrei Iancu – OpenSIPS
Amit Ambasta – OR-tools
Michael Powell – OR-tools
Westbrook Johnson – Polymer
Marten Seemann – quic-go
Fabian Henneke – Secure Shell
Chris Fillmore – Shaka Player
Takeshi Komiya – Sphinx
Dan Kennedy – SQLite
Joe Mistachkin – SQLite
Richard Hipp – SQLite
Yuxin Wu – Tensorpack
Michael Herzog – three.js
Takahiro Aoyagi – three.js
Jelle Zijlstra – Typeshed
Verónica López – Women Who Go

Thank you all so much for your contributions to the open source community and congratulations on being selected!

By Maria Webb, Google Open Source

Talk to Google at Node.js Interactive

Monday, October 2, 2017

We’re headed to Vancouver this week, with about 25 Googlers who are incredibly excited to attend Node.js Interactive. With a mix of folks working on Cloud, Chrome, and V8, we’re going to be giving demos and answering questions at the Google booth. 

A few of us are also going to be giving talks. Here’s a list of the talks Googlers will be giving at the conference, ranging from serverless Slack bots to JavaScript performance tuning.

Wednesday, October 4th

Keynote: Franzi Hinkelmann
9:40am - 10:00am, West Ballroom A
Franzi Hinkelmann, Software Engineer @ Google

Franzi is located in Munich, Germany where she works at Google on Chrome V8. Franzi, like James and Anna, is a member of the Node.js Core Technical Committee. She speaks across the globe on the topic of JavaScript virtual machines. She has a PhD in mathematics, but left academia to follow her true passion: writing code.

Franzi will discuss her perspective on Chrome V8 in Node.js, and what the Chrome V8 team is doing to continue to support Node.js. Want to know what the future of browser development looks like? This is a must-attend keynote.

Functionality Abuse: The Forgotten Class of Attacks
12:20pm - 12:50pm, West Ballroom A
Nwokedi Idika, Software Engineer @ Google

If you were given a magic wand that would remove all implementation flaws from your web application, would it be free of security problems? If it took you more than five seconds to say “No!” (or if, worse, you said “Yes!”), then you’re the target audience for this talk. If you’re in the target audience, don’t fret, much of the security community is there with you. After this talk, attendees will understand why the answer to the aforementioned question is an emphatic “No!” and they will learn an approach to decrease their chance of failing to consider an important vector of attack for their current and future web applications.

High Performance JS in V8
5:20pm - 5:50pm, West Ballroom A
Peter Marshall, Software Engineer @ Google

This year, V8 launched Ignition and Turbofan, the new compiler pipeline that handles all JavaScript code generation. Previously, achieving high performance in Node.js meant catering to the oddities of our now-deprecated Crankshaft compiler. This talk covers our new code generation architecture - what makes it special, a bit about how it works, and how to write high performance code for the new V8 pipeline.

Thursday, October 5th

New DevTools Features for JavaScript
11:40am - 12:10pm, West Meeting Room 122
Yang Guo, Software Engineer @ Google

Ever since v8-inspector was moved to V8's repository, we have been working on a number of new features for DevTools, usable for both Chrome and Node.js. The talk will demonstrate code coverage, type profiling, and give a deep dive into how evaluating a code snippet in DevTools console works in V8.

Understanding and Debugging Memory Leaks in Your Node.js Applications
12:20pm - 12:50pm, West Meeting Room 122
Ali Sheikh, Software Engineer @ Google

Memory leaks are hard. This talk will introduce developers to what memory leaks are, how they can exist in a garbage-collected language, and the available tooling that can help them understand and isolate memory leaks in their code. Specifically, it will cover heap snapshots, the new sampling heap profiler in V8, and various other tools available in the ecosystem.

Workshop: Serverless Bots with Node.js
2:20pm - 4:10pm, West Meeting Room 117
Bret McGowen, Developer Advocate @ Google
Amir Shevat, DevRel @ Slack

This talk will show you how to build both voice and chat bots using serverless technologies. Amir Shevat, Head of Developer Relations at Slack, has overseen 17K+ bots deployed on the platform. He will present a maturity model, along with best practices, for enterprise bots covering use cases ranging from devops to HR and marketing. Alan Ho from Google Cloud will then show you how to use various serverless technologies to build these bots. He’ll give you a demo of Slack and Google Assistant bots incorporating Google’s latest serverless technology, including Edge (API management), Cloud Functions (serverless compute), Cloud Datastore, and API.ai.

Modules Modules Modules
3:00pm - 3:30pm, West Meeting Room 120
Myles Borins, Developer Advocate @ Google

ES Modules and CommonJS go together like Old Bay seasoning and vanilla ice cream. This talk will dig into the inconsistencies of the two patterns, and how the Node.js project is dealing with reconciling the problem. The talk will look at the history of modules in the JavaScript ecosystem and the subtle differences between them. It will also skim over how ECMA-262 is standardized by TC39, and how ES Modules were developed.

Keynote: The case for Node.js
4:50pm - 5:05pm, West Ballroom A
Justin Beckwith, Product Manager @ Google

Node.js has had a transformational effect on the way we build software. However, convincing your organization to take a bet on Node.js can be difficult. My personal journey with Node.js has included convincing a few teams to take a bet on this technology, and this community. Let’s take a look at the case for Node.js we made at Google, and how you can make the case to bring it to your organization.


We can’t wait to see everyone and have some great conversations. Feel free to reach out to us on Twitter @googlecloud, or request an invite to the Google Cloud Slack community and join the #nodejs channel.

By Justin Beckwith, Languages and Runtimes Team

Introducing Abseil, a new common libraries project

Tuesday, September 26, 2017

Today we are open sourcing Abseil, a collection of libraries drawn from the most fundamental pieces of Google’s internal codebase. These libraries are the nuts-and-bolts that underpin almost everything that Google runs. Bits and pieces of these APIs are embedded in most of our open source projects, and now we have brought them together into one comprehensive project. Abseil encompasses the most basic building blocks of Google’s codebase: code that is production tested and will be fully maintained for years to come.

Our C++ code repository is available at: https://github.com/abseil/abseil-cpp

By adopting these new Apache-licensed libraries, you can reap the benefit of years (over a decade in many cases) of our design and optimization work in this space. Our past experience is baked in.

Just as interesting, we’ve also prepared for the future: several types in Abseil’s C++ libraries are “pre-adoption” versions of C++17 types like string_view and optional - implemented in C++11 to the greatest extent possible. We look forward to moving more and more of our code to match the current standard, and using these new vocabulary types helps us make that transition. Importantly, in C++17 mode these types are merely aliases to the standard, ensuring that you only ever have one type for optional or string_view in a project at a time. Put another way: Abseil is focused on the engineering task of providing APIs that remain stable over time.

Consisting of the foundational C++ and Python code at Google, Abseil includes libraries that will grow to underpin other Google-backed open source projects like gRPC, Protobuf and TensorFlow. We love those projects, and we love the users of those projects - we want to ensure smooth usage for these things over time. In the next few months we’ll introduce new distribution methods to incorporate these projects as a collection into your project.

Continuing with the “over time” theme, Abseil aims for compatibility with major compilers, platforms and standard libraries for approximately 5 years. Our 5-year target also applies to language version: we assume everyone builds with C++11 at this point. (In 2019 we’ll start talking about requiring C++14 as our base language version.) This 5-year horizon is part of our balance between “support everything” and “provide modern implementations and APIs.”

Highlights of the initial release include:
  • Zero configuration: most platforms (OS, compiler, architecture) should just work.
  • Pre-adoption for C++17 types: string_view, optional, any. We’ll follow up with variant soon.
  • Our primary synchronization type, absl::Mutex, has an elegant interface and has been extensively optimized.
  • Efficient support for handling time: absl::Time and absl::Duration are conceptually similar to std::chrono types, but are concrete (not class templates) and have defined behavior in all cases. Additionally, our clock-sampling API absl::Now() is more heavily optimized than most standard library calls for std::chrono::system_clock::now().
  • String handling routines: among internal users, we’ve been told that releasing absl::StrCat(), absl::StrJoin(), and absl::StrSplit() would itself be a big improvement for the open source C++ world.
The project has support for C++ and some Python. Over time we’ll tie those two projects together more closely with shared logging and command-line flag infrastructure. To start contributing, please see our contribution guidelines and fork us on GitHub. Check out our documentation and community page for information on how to contact us, ask questions or contribute to Abseil.
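To give a flavor of the Python side, here is a minimal sketch assuming the open source absl-py package (installable with pip install absl-py); it only uses the documented app, flags, and logging modules.

from absl import app
from absl import flags
from absl import logging

FLAGS = flags.FLAGS

# Define command-line flags with defaults and help text.
flags.DEFINE_string('name', 'world', 'Name to greet.')
flags.DEFINE_integer('repeat', 1, 'How many times to greet.', lower_bound=1)


def main(argv):
    # argv holds any positional arguments left after flag parsing.
    del argv  # Unused.
    for _ in range(FLAGS.repeat):
        logging.info('Hello, %s!', FLAGS.name)


if __name__ == '__main__':
    # app.run parses flags, initializes logging, then calls main.
    app.run(main)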

By Titus Winters, Abseil Lead

Google Summer of Code turns 14

Monday, September 25, 2017

Google Open Source is proud to announce the 14th year of Google Summer of Code (GSoC)! Yes, GSoC is officially well into its teenage years - hopefully without that painful awkward stage - and we are excited to introduce more new student developers to the world of open source software development.

Over the last 13 years GSoC has provided over 13,000 university students from around the world with an opportunity to hone their skills by contributing to open source projects during their summer break. Participants gain invaluable experience working directly with mentors on open source projects, and earn a stipend upon successful completion of their project.

We’re excited to keep the tradition going! Applications for interested open source organizations open on January 4, 2018 and student applications open in March*.

Are you an open source project interested in learning more? Visit the program site to learn about what it means to be a mentor organization and how to submit a good application. We welcome all types of organizations - both large and small - and each year about 20% of the organizations we accept are completely new to GSoC.

Students, it’s never too early to start thinking about your proposal. You can check out the organizations that participated in Google Summer of Code 2017 as well as the projects students worked on. We also encourage you to explore other resources like the student and mentor guides and frequently asked questions.

You can always learn more on the program website. Please stay tuned for more details!

By Mary Radomile, Google Open Source

* Exact dates will be announced later this year.

Authenticating to Hashicorp Vault using GCE Signed Metadata

Tuesday, September 19, 2017

Applications often require access to small pieces of sensitive data at build or run time, referred to as secrets. Secrets are generally more sensitive than other environment variables or parts of your repository as they may grant access to additional data, such as user information.

HashiCorp Vault is a popular open source tool for secret management, which allows a developer to store, manage and control access to tokens, passwords, certificates, API keys and other secrets. Vault has authentication backends, which allow developers to use many kinds of identities to access Vault, including tokens, or usernames and passwords.

Today, we’re announcing a new Google Compute Engine signed metadata authentication backend for Vault. This allows a developer to use an existing running instance to authenticate directly to Vault, using a JWT of this instance’s signed metadata as its identity. Learn more about the latest release of Vault.


The following example shows how a user can enable and configure the GCP auth backend and then authenticate an instance with Vault. See the docs to learn more.

In the following example, an admin mounts the backend at auth/gcp and adds:
  • Credentials that the backend can use to verify the instance (requiring some read-only permissions)
  • A Vault-specific role (roles determine what Vault secrets the authenticating GCE VM instance can access, and can restrict login to a set of instances by zone or region, instance group, or GCP labels)
# Mount the backend at 'auth/gcp'.
$ vault auth-enable 'gcp'
...

# Add configuration. This can include credentials Vault will
# use to make calls to GCP APIs to verify authenticating instances.
$ vault write auth/gcp/config credentials=@path/to/creds.json
...

# Add a role. Roles determine which policies the authenticating
# GCE instance receives and can restrict login by zone or region,
# instance group, or GCP labels.
$ vault write auth/gcp/role/my-gce-role \
    type='gce' \
    policies='dev,prod' \
    project_id='project-123456' \
    zone='us-central1-c' \
    instance_group='my-instance-group' \
    labels='foo:bar,prod:true,...'
...

Then, a Compute Engine instance can authenticate to the Vault server using the following script:

#! /bin/bash

# [START PARAMS]
# Substitute your real Vault address or read it from an env variable.
VAULT_ADDR="https://vault.rocks"

# Substitute another service account for the VM instance or use the
# built-in default.
SERVICE_ACCOUNT="default"

# The instance will attempt login under this role.
ROLE="my-gce-role"
# [END PARAMS]


# Generate a metadata identity token JWT with the expected audience.
# The GCP backend expects a JWT 'aud' claim ending in "vault/$ROLE".
AUD="$VAULT_ADDR/vault/$ROLE"
TOKEN="$(curl -H "Metadata-Flavor: Google" \
  "http://metadata/computeMetadata/v1/instance/service-accounts/default/identity?audience=$AUD&format=full")"


# Attempt login against VAULT. 
# Using API calls:

cat > payload.json <<-END 
{ 
  "role": "$ROLE", 
  "jwt": "$TOKEN" 
}
END

RESP="$(curl \
    --request POST \
    --data @payload.json \
    "$VAULT_ADDR/v1/auth/gcp/login")"

# OR

# Use the vault CLI tool if downloaded
vault write auth/gcp/login role="$ROLE" jwt="$TOKEN" > response

# Parse the Vault auth token out of the JSON response
# ($RESP or the 'response' file) and use it to get
# your secrets from Vault!
# ...

A few weeks ago, we also announced a Google Cloud Platform IAM authentication backend for HashiCorp Vault, allowing authentication using an existing IAM service account identity. For further guidance on using HashiCorp Vault with Google Cloud, take a look at today’s Google Cloud blog post.

By Emily Ye, Software Engineer

Announcing Google Code-in 2017: The Latest and Greatest for Year Eight

Monday, September 18, 2017

We are excited to announce the 8th consecutive year of the Google Code-in (GCI) contest! Students ages 13 through 17 from around the world can learn about open source development working on real open source projects, with mentorship from active developers. GCI begins on Tuesday, November 28, 2017 and runs for seven weeks through to Wednesday, January 17, 2018.


Google Code-in is unique because not only do the students choose what they want to work on from the 2,000+ tasks created by open source organizations, but they have mentors available to help answer their questions as they work on each of their tasks.

Starting to work on open source software can be a daunting task in and of itself. How do I get started? Does the organization want my help? Am I too inexperienced? These are all questions that developers (of all ages) might consider before contributing to an open source organization.

The beauty of GCI is that participating open source organizations realize teens are often first time contributors, and the volunteer mentors are equipped with the patience and the experience to help these young minds become part of the open source community.

Open source communities thrive when there is a steady flow of new contributors who bring new perspectives, ideas, and enthusiasm. Over the last 7 years, GCI open source organizations have helped over 4,500 students from 99 countries become contributors. Many of these students are still contributing to open source years later. Dozens have gone on to become Google Summer of Code (GSoC) students and even mentors for other students.

The tasks open source organizations create vary in skill set and level, including beginner tasks any student can take on, such as “setup your development environment.” With tasks in five different categories, there’s something to fit almost any student’s skills:

  • Code: writing or refactoring 
  • Documentation/Training: creating/editing documents and helping others learn more
  • Outreach/Research: community management, marketing, or studying problems and recommending solutions
  • Quality Assurance: testing and ensuring code is of high quality
  • User Interface: user experience research, user interface design, or graphic design

Open source organizations can apply to participate in Google Code-in starting on Monday, October 9, 2017. Google Code-in starts for students November 28th!

Visit the contest site g.co/gci to learn more about the contest and find flyers, slide decks, timelines and more.

By Stephanie Taylor, Google Open Source

Making the Google Developers Documentation Style Guide Public

Monday, September 11, 2017

Cross-posted on the Google Developers Blog

You can now use our developer-documentation style guide for open source documentation projects.

For some years now, our technical writers at Google have used an internal-only editorial style guide for most of our developer documentation. In order to better support external contributors to our open source projects, such as Kubernetes, AMP, or Dart, and to allow for more consistency across developer documentation, we're now making that style guide public.

If you contribute documentation to projects like those, you now have direct access to useful guidance about voice, tone, word choice, and other style considerations. It can be useful for general issues, like reminders to use second person, present tense, active voice, and the serial comma; it can also be great for checking very specific issues, like whether to write "app" or "application" when you want to be consistent with the Google Developers style.

The style guide is a reference document, so instead of reading through it in linear order, you can use it to look things up as needed. For matters of punctuation, grammar, and formatting, you can do a search-in-page to find items like "Commas," "Lists," and "Link text" in the left nav. For specific terms and phrases, you can look at the word list.

Keep an eye on the guide's release notes page for updates and developments, and send us your comments and suggestions via the Send Feedback link on each page of the guide—we want to hear from you as we continue to evolve the style guide.

Posted by Jed Hartman, Technical Writer

Google Summer of Code 2017 Student Curtain Call

Wednesday, September 6, 2017

Back in early May we announced the students accepted into the 13th edition of the Google Summer of Code (GSoC) program, our largest program ever! Today we are pleased to announce the 1,128 (86.2%*) students from 68 countries that successfully completed the 2017 GSoC. Great job, students!





Students worked diligently with 201 open source organizations and over 1,600 mentors to learn to work with internationally distributed online teams, write great code and help their mentoring org enhance, extend and refine their codebases. Students have also become an important part of these communities. We feel strongly that to keep open source organizations thriving and evolving, they need new ideas - GSoC students help to bring fresh perspectives to these important projects.

We look forward to seeing even more from the 2017 students. Many will go on to become GSoC mentors in future programs and many more will become committers to these and other open source organizations. Some may even create their own open source projects! These students have a bright future ahead of them in technology and open source.

Interested in what the students worked on this summer? Check out their work as well as statistics on past programs.

A big thank you to our mentors and organization administrators who make this program possible. Their dedication to welcoming new student contributors into their communities and teaching them the fundamentals of open source is awesome and inspiring. Thank you all!

Congratulations to all of the GSoC 2017 students and the mentors who made this our biggest and best Google Summer of Code yet.

By Stephanie Taylor, Google Open Source

* 1,309 students started the coding period on May 30th, stats are based upon that number.

The Mentors of Google Summer of Code 2017

Tuesday, September 5, 2017

Mentors are the bread and butter of our program - without their hard work and dedication, there would be no Google Summer of Code (GSoC). These volunteers spend 12 weeks (plus a month of community bonding) tirelessly guiding their students to create the best quality project possible and welcoming them into their communities - answering questions and providing help at all hours.

Each year we pore over oodles of data to extract the most interesting and relevant statistics about the GSoC mentors. Here’s a quick snapshot of our 2017 group:
  • Total mentors: 3,439
  • Mentors assigned to an active project: 1,647
  • Mentors who have participated in GSoC over 10 years: 22
  • Percentage of new mentors: 49%
GSoC 2017 mentors are a worldly group, hailing from 69 countries on 6 continents - we’re still waiting on a mentor from Antarctica… Anyone?

Interested in the data? Check out the full list of countries.
Some interesting factoids about our mentors:
  • Average age: 39
  • Youngest: 15*
  • Oldest: 68
  • Most common first name: Michael (there are 40!)
GSoC mentors help to introduce the next generation to the world of open source software development — for that we are very grateful. To show our appreciation, we invite two mentors from each of the 201 participating organizations to attend the annual mentor summit at the Google campus in Sunnyvale, California. It’s three days of food, community building, lively debate and lots of fun.

Thank you to everyone involved in Google Summer of Code. Cheers to yet another great year!

By Mary Radomile, Google Open Source

* Say what? 15 years old!? Yep! We had 12 GSoC mentors under the age of 18. This group of enthusiastic teens started their journey in our sister program, Google Code-in, an open source coding competition for 13-17 year olds. You can read more about it at g.co/gci.

Bringing Real-time Spatial Audio to the Web with Songbird

Monday, August 28, 2017

For a virtual scene to be truly immersive, stunning visuals need to be accompanied by true spatial audio to create a realistic and believable experience. Spatial audio tools allow developers to include sounds that can come from any direction, and that are associated in 3D space with audio sources, thus completely enveloping the user in 360-degree sound.

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real-life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We’ve partnered with the Omnitone project, which we blogged about last year, to add higher-order ambisonic support to Omnitone’s binaural renderer to produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone, and with it, developers can now add interactive, full-sphere audio to any web-based application. Songbird can scale to any order of ambisonics, thereby creating more realistic sound and higher performance than what is achievable through the standard Web Audio API.
Songbird Audio Processing Diagram
The implementation of Songbird is based on the Google spatial media specification. It expects mono input and outputs ambisonic (multichannel) ACN channel layout with SN3D normalization. Detailed documentation may be found here.

As the web emerges as an important VR platform for delivering content, spatial audio will play a vital role in users’ embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpse into the future on the web.
Demo combining spatial sound in 3D environment
This project was made possible through close collaboration with Google’s Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demos on creating full spherical audio with Songbird.

By Jamieson Brettle and Drew Allen, Chrome Media Team

Authenticating to HashiCorp Vault using Google Cloud IAM

Wednesday, August 16, 2017

Applications often require access to small pieces of sensitive data at build or run time, referred to as secrets. Secrets are generally more sensitive than other environment variables or parts of your repository as they may grant access to additional data, such as user data.

HashiCorp Vault is a popular open source tool for secret management, which allows a developer to store, manage and control access to tokens, passwords, certificates, API keys and other secrets. Vault has many options for authentication, called authentication backends. These allow developers to use many kinds of identities to access Vault, including tokens, or usernames and passwords. As the number of developers on a team grows, these kinds of authentication options become impractical; and in enterprise scenarios, managing and auditing these identities becomes burdensome.

Today, we are pleased to announce a Google Cloud Platform IAM authentication backend for Vault. This allows a developer to use an existing IAM identity to authenticate to Vault. Using a service account, you can sign a JWT to show it came from a particular account, and use that to authenticate to Vault. Learn more in the documentation.


The following example in Go shows how a user can authenticate with Vault using this backend. This example assumes the Vault server has already been mounted at auth/gcp and configured.
package main

import (
 "encoding/json"
 "fmt"
 "io/ioutil"
 "log"
 "os"
 "time"

 vaultapi "github.com/hashicorp/vault/api"
 "golang.org/x/oauth2"
 "golang.org/x/oauth2/google"
 "google.golang.org/api/iam/v1"
)

func main() {
 // Start [PARAMS]
 project := "project-123456"
 serviceAccount := "myserviceaccount@project-123456.iam.gserviceaccount.com"
 credsPath := "path/to/creds.json"

 os.Setenv("VAULT_ADDR", "https://vault.mycompany.com")
 defer os.Setenv("VAULT_ADDR", "")
 // End [PARAMS]

 // Start [GCP IAM Setup]
 jsonBytes, err := ioutil.ReadFile(credsPath)
 if err != nil {
  log.Fatal(err)
 }
 config, err := google.JWTConfigFromJSON(jsonBytes, iam.CloudPlatformScope)
 if err != nil {
  log.Fatal(err)
 }

 httpClient := config.Client(oauth2.NoContext)
 iamClient, err := iam.New(httpClient)
 if err != nil {
  log.Fatal(err)
 }
 // End [GCP IAM Setup]
 
 // 1. Generate signed JWT using IAM.
 resourceName := fmt.Sprintf("projects/%s/serviceAccounts/%s", project, serviceAccount)
 jwtPayload := map[string]interface{}{
  "aud": "auth/gcp/login",
  "sub": serviceAccount,
  "exp": time.Now().Add(time.Minute * 10).Unix(),
 }

 payloadBytes, err := json.Marshal(jwtPayload)
 if err != nil {
  log.Fatal(err)
 }
 signJwtReq := &iam.SignJwtRequest{
  Payload: string(payloadBytes),
 }

 resp, err := iamClient.Projects.ServiceAccounts.SignJwt(
  resourceName, signJwtReq).Do()
 if err != nil {
  log.Fatal(err)
 }

 // 2. Send signed JWT in login request to Vault.
 vaultClient, err := vaultapi.NewClient(vaultapi.DefaultConfig())
 if err != nil {
  log.Fatal(err)
 }

 vaultResp, err := vaultClient.Logical().Write(
  "auth/gcp/login",
  map[string]interface{}{
   "role": "test",
   "jwt":  resp.SignedJwt,
  })

 if err != nil {
  log.Fatal(err)
 }

 // 3. Use auth token from response.
 log.Printf("Access token: %s", vaultResp.Auth.ClientToken)
 vaultClient.SetToken(vaultResp.Auth.ClientToken)
 // ...
}

Vault is just one way of managing secrets in development. For further reading on choosing a solution that’s right for you, see Google Cloud Platform’s documentation on Secret Management.

By Emily Ye, Software Engineer

Making Great Mobile Games with Firebase

Tuesday, August 15, 2017

So much goes into building and maintaining a mobile game. Let’s say you want to ship it with a level builder for sharing content with other players and, looking forward, you want to roll out new content and unlockables linked with player behavior. Of course, you also need players to be able to easily sign into your soon-to-be hit game.

With a DIY approach, you’d be faced with having to build user management, data storage, server-side logic, and more. This would take a lot of your time and, more importantly, take critical resources away from what you really want to do: build that amazing new mobile game!

Our Firebase SDKs for Unity and C++ provide you with the tools you need to add these features and more to your game with ease. Plus, to help you better understand how Firebase can help you build your next chart-topper, we’ve built a sample game in Unity and open sourced it: MechaHamster. Check it out on Google Play or download the project from GitHub to see how easy it is to integrate Firebase into your game.
Before you dive into the code for MechaHamster, here’s a rundown of the Firebase products that can help your game be successful.

Analytics

One of the best tools you have to maintain a high-performing game is your analytics. With Google Analytics for Firebase, you can see where your players might be struggling and make adjustments as needed. Analytics also integrates with AdWords and other major ad networks to maximize your campaign performance. If you monetize your game using AdMob, you can link your two accounts and see the lifetime value (LTV) of your players, from in-game purchases and AdMob, right from your Analytics console. And with StreamView, you can see how players are interacting with your game in real time.

Test Lab for Android - Game Loop Test

Before releasing updates to your game, you’ll want to make sure it works correctly. However, manual testing can be time consuming when faced with a large variety of target devices. To help solve this, we recently launched Firebase Test Lab for Android Game Loop Test at Google I/O. If you add a demo mode to your game, Test Lab will automatically verify your game is working on a wide range of devices. You can read more in our deep dive blog post here.

Authentication

Another thing you’ll want to be sure to take care of before launch is easy sign-in, so your users can start playing as quickly as possible. Firebase Authentication can help by handling all sign-in and authentication, from simple email + password logins to support for common identity providers like Google, Facebook, Twitter, and Github. Just announced recently at I/O, Firebase also now supports phone number authentication. And Firebase Authentication shares state cross-device, so your users can pick up where they left off, no matter what platforms they’re using.

Remote Config

As more players start using your game, you realize that there are few spots that are frustrating for your audience. You may even see churn rates start to rise, so you decide that you need to push some adjustments. With Firebase Remote Config, you can change values in the console and push them out to players. Some players having trouble navigating levels? You can adjust the difficulty and update remotely. Remote Config can even benefit your development cycle; team members can tweak and test parameters without having to make new builds.

Realtime Database

Now that you have a robust player community, you’re probably starting to see a bunch of great player-built levels. With Firebase Realtime Database, you can store player data and sync it in real-time, meaning that the level builder you’ve built can store and share data easily with other players. You don't need your own server and it’s optimized for offline use. Plus, Realtime Database integrates with Firebase Auth for secure access to user specific data.

Cloud Messaging & Dynamic Links

A few months go by and your game is thriving, with high engagement and an active community. You’re ready to release your next wave of new content, but how can you efficiently get the word out to your users? Firebase Cloud Messaging lets you target messages to player segments, without any coding required. And Firebase Dynamic Links allow your users to share this new content — or an invitation to your game — with other players. Dynamic Links survive the app install process, so a new player can install your app and then dive right into the piece of content that was shared with him or her.

At Firebase, our mission is to help mobile developers build better apps and grow successful businesses. When it comes to games, that means taking care of the boring stuff, so you can focus on what matters — making a great game. Our mobile SDKs for C++ and Unity are available now at firebase.google.com/games.

By Darin Hilton, Art Director

Professors from Around the World Get Their Students into HFOSS

Friday, July 21, 2017

Over the last four years instructors from around the world have gathered for the Professors’ Open Source Software Experience (POSSE) workshop to integrate open source concepts into their curriculum. At each event, professors make more progress toward providing students with hands on experience via contributions to humanitarian free and open source software (HFOSS).

This year Google was proud to not only host a workshop at our San Francisco office in April, but also to collaborate with the organizers to bring a POSSE workshop to Europe for the first time.
POSSE workshop leaders, from left to right: Clif Kussmaul (Muhlenberg College), Lori Postner (Nassau Community College), Stoney Jackson (Western New England University), Heidi Ellis (Western New England University), Greg Hislop (Drexel University), and Darci Burdge (Nassau Community College).
The workshop in Italy was led by Dr. Gregory Hislop from Drexel University and Drs. Heidi Ellis and Stoney Jackson from Western New England University, and brought together 20 instructors from Germany, Hungary, India, Italy, Macedonia, Qatar, Spain, Swaziland, the United Kingdom, and the United States. This was the most geographically diverse workshop to date!
Group photos in San Francisco, USA on April 22, 2017 (left) and Bologna, Italy on July 1, 2017 (right).
What’s next for POSSE? University instructors from institutions in the US can apply now to participate in the next workshop, November 16-18 in Raleigh, NC and join their peers in the community of instructors weaving HFOSS into their curriculum.

By Helen Hu, Google Open Source

Facets: An Open Source Visualization Tool for Machine Learning Training Data

Monday, July 17, 2017

Cross-posted on the Google Research Blog

Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.

Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data, which, in machine learning, is as important as debugging your model. They can easily be used inside of Jupyter notebooks or embedded into webpages. In addition to the open source code, we've also created a Facets demo website. This website allows anyone to visualize their own datasets directly in the browser without any software installation or setup, and without the data ever leaving their computer.

Facets Overview

Facets Overview automatically gives users a quick understanding of the distribution of values across the features of their datasets. Multiple datasets, such as a training set and a test set, can be compared on the same visualization. Common data issues that can hamper machine learning are pushed to the forefront, such as: unexpected feature values, features with high percentages of missing values, features with unbalanced distributions, and feature distribution skew between datasets.
Facets Overview visualization of the six numeric features of the UCI Census datasets[1]. The features are sorted by non-uniformity, with the feature with the most non-uniform distribution at the top. Numbers in red indicate possible trouble spots, in this case numeric features with a high percentage of values set to 0. The histograms at right allow you to compare the distributions between the training data (blue) and test data (orange).

Facets Overview visualization showing two of the nine categorical features of the UCI Census datasets[1]. The features are sorted by distribution distance, with the feature with the biggest skew between the training (blue) and test (orange) datasets at the top. Notice in the “Target” feature that the label values differ between the training and test datasets, due to a trailing period in the test set (“<=50K” vs “<=50K.”). This can be seen in the chart for the feature and also in the entries in the “top” column of the table. This label mismatch would cause a model trained and tested on this data to not be evaluated correctly.
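To illustrate the Jupyter workflow mentioned above, here is a hedged sketch that computes feature statistics for two pandas DataFrames and renders Facets Overview in a notebook cell. The facets_overview module path, the GenericFeatureStatisticsGenerator class, and the facets-jupyter.html nbextension path follow the project README as we recall it; the CSV file names are placeholders for your own copies of the data.

import base64
from IPython.core.display import display, HTML
import pandas as pd
from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

train_df = pd.read_csv('adult_train.csv')  # placeholder: local copy of the training split
test_df = pd.read_csv('adult_test.csv')    # placeholder: local copy of the test split

# Compute the feature-statistics protocol buffer for both datasets.
gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([
    {'name': 'train', 'table': train_df},
    {'name': 'test', 'table': test_df},
])
protostr = base64.b64encode(proto.SerializeToString()).decode('utf-8')

# Render the facets-overview web component with the encoded stats.
HTML_TEMPLATE = """
<link rel="import" href="/nbextensions/facets-dist/facets-jupyter.html">
<facets-overview id="elem"></facets-overview>
<script>
  document.querySelector("#elem").protoInput = "{protostr}";
</script>"""
display(HTML(HTML_TEMPLATE.format(protostr=protostr)))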

Facets Dive

Facets Dive provides an easy-to-customize, intuitive interface for exploring the relationship between the data points across the different features of a dataset. With Facets Dive, you control the position, color and visual representation of each data point based on its feature values. If the data points have images associated with them, the images can be used as the visual representations.
Facets Dive visualization showing all 16281 data points in the UCI Census test dataset[1]. The animation shows a user coloring the data points by one feature (“Relationship”), faceting in one dimension by a continuous feature (“Age”) and then faceting in another dimension by a discrete feature (“Marital Status”).
Facets Dive visualization of a large number of face drawings from the “Quick, Draw!” Dataset, showing the relationship between the number of strokes and points in the drawings and the ability for the “Quick, Draw!” classifier to correctly categorize them as faces.

Fun Fact: In large datasets, such as the CIFAR-10 dataset[2], a small human labelling error can easily go unnoticed. We inspected the CIFAR-10 dataset with Dive and were able to catch a frog-cat – an image of a frog that had been incorrectly labelled as a cat!
Exploration of the CIFAR-10 dataset using Facets Dive. Here we facet the ground truth labels by row and the predicted labels by column. This produces a confusion matrix view, allowing us to drill into particular kinds of misclassifications. In this particular case, the ML model incorrectly labels some small percentage of true cats as frogs. The interesting thing we find by putting the real images in the confusion matrix is that one of these "true cats" that the model predicted was a frog is actually a frog from visual inspection. With Facets Dive, we can determine that this one misclassification wasn't a true misclassification of the model, but instead incorrectly labeled data in the dataset.
Can you spot the frog-cat?
We’ve gotten great value out of Facets inside of Google and are excited to share the visualizations with the world. We hope they can help you discover new and interesting things about your data that lead you to create more powerful and accurate machine learning models. And since they are open source, you can customize the visualizations for your specific needs or contribute to the project to help us all better understand our data. If you have feedback about your experience with Facets, please let us know what you think.

By James Wexler, Senior Software Engineer, Google Big Picture Team

Acknowledgments

This work is a collaboration between Mahima Pushkarna, James Wexler and Jimbo Wilson, with input from the entire Big Picture team. We would also like to thank Justine Tunney for providing us with the build tooling.

References

[1] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Census+Income]. Irvine, CA: University of California, School of Information and Computer Science

[2] Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky (2009).

After a "close call," a coding champion

Thursday, July 13, 2017

Cross-posted on The Keyword

Eighteen-year-old Cameroon resident Nji Collins had just put the finishing touches on his final submission for the Google Code-In competition when his entire town lost internet access. It stayed dark for two months.

“That was a really, really close call,” Nji, who prefers to be called Collins, tells the Keyword, adding that he traveled to a neighboring town every day to check his email and the status of the contest. “It was stressful.”

Google’s annual Code-In contest, an effort to introduce teenagers to the world of open source, invites high school students from around the world to compete. It’s part of our mission to encourage and inspire the next generation of computer scientists, and in turn, the contest allows these young people to play a role in building real technologies.

Over the course of the competition, participants complete open-source coding and design “tasks” administered by an array of tech companies like Wikimedia and OpenMRS. Tasks range from editing webpages to updating databases to making videos; one of Collins’ favorites, for example, was making the OpenMRS home page sensitive to keystrokes. This year, more than 1,300 entrants from 62 countries completed nearly 6,400 assignments.

While Google sponsors and runs the contest, the participating tech organizations, who work most closely with the students, choose the winners. Those who finish the most tasks are named finalists, and the companies each select two winners from that group. Those winners are then flown to San Francisco, CA for an action-packed week involving talks at the Googleplex in Mountain View, office tours, segway journeys through the city, and a sunset cruise on the SF Bay.
Collins with some of the other winners from Google Code-in 2016
“It’s really fun to watch these kids come together and thrive,” says Stephanie Taylor, Code-In’s program manager. “Bringing together students from, say, Thailand and Poland because they have something in common: a shared love of computer science. Lifelong friendships are formed on these trips.”

Indeed, many Code-In winners say the community is their main motivator for joining the competition. “The people are what brought me here and keep me here,” says Sushain Cherivirala, a Carnegie Mellon computer science major and former Code-In winner who now serves as a program mentor. Mentors work with Code-In participants throughout the course of the competition to help them complete tasks and interface with the tech companies.
Google Code-in winners on the Google campus
Code-In also acts as an accessible introduction to computer science and the open source world. Mira Yang, a 17-year-old from New Jersey, learned how to code for the first time this year. She says she never would have even considered studying computer science further before she dabbled in a few Code-In tasks. Now, she plans to major in it.
Google Code-in winners Nji Collins and Mira Yang

“Code-In changed my view on computer sciences,” she says. “I was able to learn that I can do this. There’s definitely a stigma for girls in CS. But I found out that people will support you, and there’s a huge network out there.”

That network extended to Cameroon, where Collins’ patience and persistence paid off as he waited out his town’s internet blackout. One afternoon, while checking his email a few towns away, he discovered he’d been named a Code-In winner. He had been a finalist the year prior, when he was the only student from his school to compete. This year, he’d convinced a handful of classmates to join in.

“It wasn’t fun doing it alone; I like competition,” Collins, who learned how to code by doing his older sister’s computer science homework assignments alongside her, says. “It pushes me to work harder.”

Learn more about the annual Code-In competition.

By Carly Schwartz, Editor-in-Chief, Google Internal News

Supercharge your Computer Vision models with the TensorFlow Object Detection API

Thursday, June 15, 2017

Crossposted on the Google Research Blog

At Google, we develop flexible state-of-the-art machine learning (ML) systems for computer vision that not only can be used to improve our products and services, but also spur progress in the research community. Creating accurate ML models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field, and we invest a significant amount of time training and experimenting with these systems.
Detected objects in a sample image (from the COCO dataset) made by one of our models.
Image credit: Michael Miley, original image
Last October, our in-house object detection system achieved new state-of-the-art results, and placed first in the COCO detection challenge. Since then, this system has generated results for a number of research publications1,2,3,4,5,6,7 and has been put to work in Google products such as NestCam, the similar items and style ideas feature in Image Search and street number and name detection in Street View.

Today we are happy to make this system available to the broader research community via the TensorFlow Object Detection API. This codebase is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. Our goal in designing this system was to support state-of-the-art models while allowing for rapid exploration and research. Our first release contains a selection of trainable detection models, along with pre-trained weights and a Jupyter notebook for running inference.
The SSD models that use MobileNet are lightweight, so they can run comfortably in real time on mobile devices. Our winning COCO submission in 2016 used an ensemble of Faster RCNN models, which are more computationally intensive but significantly more accurate. For more details on the performance of these models, see our CVPR 2017 paper.

Are you ready to get started?
We’ve certainly found this code to be useful for our computer vision needs, and we hope that you will as well. Contributions to the codebase are welcome, and please stay tuned for our own further updates to the framework. To get started, download the code here and try detecting objects in some of your own images using the Jupyter notebook, or train your own pet detector on Cloud ML Engine!
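To give a sense of what the notebook walks through, here is a condensed sketch of running inference with one of the released frozen models using the TensorFlow 1.x Python API. The model path and image file are placeholders; the tensor names (image_tensor, detection_boxes, detection_scores, detection_classes, num_detections) follow the conventions used by the exported detection graphs.

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API
from PIL import Image

# Placeholder path: point this at the frozen graph of whichever model
# you downloaded from the detection model zoo.
PATH_TO_FROZEN_GRAPH = 'ssd_mobilenet_v1_coco/frozen_inference_graph.pb'

# Load the frozen detection graph into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        graph_def.ParseFromString(fid.read())
    tf.import_graph_def(graph_def, name='')

# A single RGB image as a uint8 batch of shape [1, height, width, 3].
image = np.expand_dims(np.array(Image.open('my_image.jpg')), axis=0)

with detection_graph.as_default(), tf.Session(graph=detection_graph) as sess:
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0',
         'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image})

# Print the five highest-scoring detections (boxes are in normalized coordinates).
for i in range(5):
    print(classes[0][i], scores[0][i], boxes[0][i])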

By Jonathan Huang, Research Scientist and Vivek Rathod, Software Engineer

Acknowledgements
The release of the TensorFlow Object Detection API and the pre-trained model zoo has been the result of widespread collaboration among Google researchers, with feedback and testing from product groups. In particular we want to highlight the contributions of the following individuals:

Core Contributors: Derek Chow, Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav Kovalevskyi, Kevin Murphy

Also special thanks to: Andrew Howard, Rahul Sukthankar, Vittorio Ferrari, Tom Duerig, Chuck Rosenberg, Hartwig Adam, Jing Jing Long, Victor Gomes, George Papandreou, Tyler Zhu

References
  1. Speed/accuracy trade-offs for modern convolutional object detectors, Huang et al., CVPR 2017 (paper describing this framework)
  2. Towards Accurate Multi-person Pose Estimation in the Wild, Papandreou et al., CVPR 2017
  3. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video, Real et al., CVPR 2017 (see also our blog post)
  4. Beyond Skip Connections: Top-Down Modulation for Object Detection, Shrivastava et al., arXiv preprint arXiv:1612.06851, 2016
  5. Spatially Adaptive Computation Time for Residual Networks, Figurnov et al., CVPR 2017
  6. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions, Gu et al., arXiv preprint arXiv:1705.08421, 2017
  7. MobileNets: Efficient convolutional neural networks for mobile vision applications, Howard et al., arXiv preprint arXiv:1704.04861, 2017