opensource.google.com

Menu

Posts from June 2019

Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations

Friday, June 7, 2019

Originally posted on the Google AI Blog.

Many of the world's toughest scientific challenges, like developing high-temperature superconductors and understanding the true nature of space and time, involve dealing with the complexity of quantum systems. What makes these challenges difficult is that the number of quantum states in these systems is exponentially large, making brute-force computation infeasible. To deal with this, data structures called tensor networks are used. Tensor networks let one focus on the quantum states that are most relevant for real-world problems—the states of low energy, say—while ignoring other states that aren't relevant. Tensor networks are also increasingly finding applications in machine learning (ML). However, there remain difficulties that prohibit them from widespread use in the ML community: 1) a production-level tensor network library for accelerated hardware has not been available to run tensor network algorithms at scale, and 2) most of the tensor network literature is geared toward physics applications and creates the false impression that expertise in quantum mechanics is required to understand the algorithms.

In order to address these issues, we are releasing TensorNetwork, a brand new open source library to improve the efficiency of tensor calculations, developed in collaboration with the Perimeter Institute for Theoretical Physics and X. TensorNetwork uses TensorFlow as a backend and is optimized for GPU processing, which can enable speedups of up to 100x when compared to work on a CPU. We introduce TensorNetwork in a series of papers, the first of which presents the new library and its API, and provides an overview of tensor networks for a non-physics audience. In our second paper we focus on a particular use case in physics, demonstrating the speedup that one gets using GPUs.

How are Tensor Networks Useful?

Tensors are multidimensional arrays, categorized in a hierarchy according to their order: e.g., an ordinary number is a tensor of order zero (also known as a scalar), a vector is an order-one tensor, a matrix is an order-two tensorDiagrammatic notation for tensors. and so on. While low-order tensors can easily be represented by an explicit array of numbers or with a mathematical symbol such as Tijnklm (where the number of indices represents the order of the tensor), that notation becomes very cumbersome once we start talking about high-order tensors. At that point it's useful to start using diagrammatic notation, where one simply draws a circle (or some other shape) with a number of lines, or legs, coming out of it—the number of legs being the same as the order of the tensor. In this notation, a scalar is just a circle, a vector has a single leg, a matrix has two legs, etc. Each leg of the tensor also has a dimension, which is the size of that leg. For example, a vector representing an object's velocity through space would be a three-dimensional, order-one tensor.
Diagrammatic notation for tensors.
The benefit of representing tensors in this way is to succinctly encode mathematical operations, e.g., multiplying a matrix by a vector to produce another vector, or multiplying two vectors to make a scalar. These are all examples of a more general concept called tensor contraction.
Diagrammatic notation for tensor contraction. Vector and matrix multiplication, as well as the matrix trace (i.e., the sum of the diagonal elements of a matrix), are all examples.
These are also simple examples of tensor networks, which are graphical ways of encoding the pattern of tensor contractions of several constituent tensors to form a new one. Each constituent tensor has an order determined by its own number of legs. Legs that are connected, forming an edge in the diagram, represent contraction, while the number of remaining dangling legs determines the order of the resultant tensor.
Left: The trace of the product of four matrices, tr(ABCD), which is a scalar. You can see that it has no dangling legs. Right: Three order-three tensors being contracted with three legs dangling, resulting in a new order-three tensor.
While these examples are very simple, the tensor networks of interest often represent hundreds of tensors contracted in a variety of ways. Describing such a thing would be very obscure using traditional notation, which is why the diagrammatic notation was invented by Roger Penrose in 1971.

Tensor Networks in Practice

Consider a collection of black-and-white images, each of which can be thought of as a list of N pixel values. A single pixel of a single image can be one-hot-encoded into a two-dimensional vector, and by combining these pixel encodings together we can make a 2^N-dimensional one-hot encoding of the entire image. We can reshape that high-dimensional vector into an order-N tensor, and then add up all of the tensors in our collection of images to get a total tensor Ti1,i2,...,iN encapsulating the collection.
This sounds like a very wasteful thing to do: encoding images with about 50 pixels in this way would already take petabytes of memory. That's where tensor networks come in. Rather than storing or manipulating the tensor T directly, we instead represent T as the contraction of many smaller constituent tensors in the shape of a tensor network. That turns out to be much more efficient. For instance, the popular matrix product state (MPS) network would write T in terms of N much smaller tensors, so that the total number of parameters is only linear in N, rather than exponential.
The high-order tensor T is represented in terms of many low-order tensors in a matrix product state tensor network.
It's not obvious that large tensor networks can be efficiently created or manipulated while consistently avoiding the need for a huge amount of memory. But it turns out that this is possible in many cases, which is why tensor networks have been used extensively in quantum physics and, now, in machine learning. Stoudenmire and Schwab used the encoding just described to make an image classification model, demonstrating a new use for tensor networks. The TensorNetwork library is designed to facilitate exactly that kind of work, and our first paper describes how the library functions for general tensor network manipulations.

Performance in Physics Use-Cases

TensorNetwork is a general-purpose library for tensor network algorithms, and so it should prove useful for physicists as well. Approximating quantum states is a typical use-case for tensor networks in physics, and is well-suited to illustrate the capabilities of the TensorNetwork library. In our second paper, we describe a tree tensor network (TTN) algorithm for approximating the ground state of either a periodic quantum spin chain (1D) or a lattice model on a thin torus (2D), and implement the algorithm using TensorNetwork. We compare the use of CPUs with GPUs and observe significant computational speed-ups, up to a factor of 100, when using a GPU and the TensorNetwork library.
Computational time as a function of the bond dimension, χ. The bond dimension determines the size of the constituent tensors of the tensor network. A larger bond dimension means the tensor network is more powerful, but requires more computational resources to manipulate.

Conclusion and Future Work

These are the first in a series of planned papers to illustrate the power of TensorNetwork in real-world applications. In our next paper we will use TensorNetwork to classify images in the MNIST and Fashion-MNIST datasets. Future plans include time series analysis on the ML side, and quantum circuit simulation on the physics side. With the open source community, we are also always adding new features to TensorNetwork itself. We hope that TensorNetwork will become a valuable tool for physicists and machine learning practitioners.

Acknowledgements

The TensorNetwork library was developed by Chase Roberts, Adam Zalcman, and Bruce Fontaine of Google AI; Ashley Milsted, Martin Ganahl, and Guifre Vidal of the Perimeter Institute; and Jack Hidary and Stefan Leichenauer of X. We'd also like to thank Stavros Efthymiou at X for valuable contributions.

by Chase Roberts, Research Engineer, Google AI and Stefan Leichenauer, Research Scientist, X 

Software Community Growth Through “first-timers-only” Issues

Wednesday, June 5, 2019

“first-timers-only issues are those which are written in a very engaging, welcoming way, far different than the usual ‘just report the bug’ type of GitHub issue. To read more about these, check out firsttimersonly.com, which really captures how and why this works and is beginning to be a movement in open source coding outreach! Beyond the extra welcome, this also includes getting such well-formatted issues out in front of lots of people who may be contributing to open source software for the very first time. 
It takes a LOT of work to make a good issue of this type, and we often walk through each step required to actually make the requested changes – the point is to help newcomers understand that a) they're welcome, and b) what the collaboration workflow looks like. Read more at https://publiclab.org/software-outreach!”
Since early 2016, we at Public Lab have been working to make our open source software projects more welcoming and inclusive and to grow our software contributor community in diversity and size. Our adoption of the strategy of posting first-timers-only (FTO) issues was also started at Public Lab near the end of 2016:https://publiclab.org/notes/warren/10-31-2016/create-a-welcoming-first-timers-only-issue-to-invite-new-software-contributors

During March and April, as GSoC, Outreachy, and other outreach programs were seeking proposals for the upcoming summer, we put a lot of extra time and work into welcoming newcomers into our community and making sure they are well-supported. We've seen a huge increase in newcomers and wanted to report in about how this process has scaled!

Through the end of March, nearly 409 FTO issues had been created across our projects, which shows how many people have been welcomed into Open Source 🌐 and in our community, by the collaborative efforts.

From March 9, 2019, we started maintaining a list of people who want to work on various projects of Public Lab - https://github.com/publiclab/plots2/issues/4963 - through first-timers-only issues. And, we are proud to announce that over 20 days at the end of March, the Public Lab community created 55 FTO issues i.e., 13% of total Public Lab FTO issues (for all time) were created during this 20 day period.

The idea of maintaining a list of FTO issue-seekers has been a big success and has helped coordinate and streamline the process. We were able to assign issues to nearly 50 contributors in just 20 days. And, each day the list is growing and we are opening more and more FTO issues for helping new contributors in taking their first-step in Open Source with Public Lab.

The credit for this tremendous growth goes to whole Public Lab reviewers team who ensured that every newcomer gets an FTO issue and also supported each newcomer in making their first contribution.

What makes this especially cool is that many of the FTO creators were people who had just recently completed their own FTO — then turned around and welcomed someone else into the community. This really highlights how newcomers have special insight into how important it is to welcome people properly, to support them in their first steps, and how making this process a core responsibility of our community has worked well.

Thanks to everyone, for the great work and cheers to this awesome community growth 🎉 🥂 💯

By Gaurav Sachdeva with input from Jeffrey Warren, Public Lab


Easier and More Powerful Observability with the OpenCensus Agent and Collector

Tuesday, June 4, 2019

The OpenCensus project has grown considerably over the past year, and we’ve had several significant announcements in the first half of 2019, including the project’s merger into OpenTelemetry. The features discussed in this post will move into OpenTelemetry over the coming months.

For those who aren’t already familiar with the project, OpenCensus provides a set of libraries that collect distributed traces and application metrics from your applications and send them to your backend of choice. Google announced OpenCensus one year ago, and the community has since grown to include large companies, major cloud providers, and APM vendors. OpenCensus includes integrations with popular web, RPC, and storage clients out of the box, along with exporters that allow it to send captured telemetry to your backend of choice.

We’ve recently enhanced OpenCensus with two new components: an optional agent that can manage exporters and collect system telemetry from VMs and containers, and an optional collector service that offers improved trace sampling and metrics aggregation. While we’ve already demonstrated these components on stage at Kubecon and Next, we’re now ready to show them more broadly and encourage their use.


The OpenCensus Agent

The OpenCensus agent is an optional component that can be deployed to each of your virtual machines or kubernetes pods. The agent receives traces and metrics from OpenCensus libraries, collects system telemetry, and can even capture data from a variety of different sources including Zipkin, Jaeger, and Prometheus. The agent has several benefits over exporting telemetry directly from the OpenCensus libraries:
  • The OpenCensus libraries export data to the OpenCensus agent by default, meaning that backend exporters can be reconfigured on the agent without having to rebuild or redeploy your application. This provides value for applications with high deployment costs and for PaaS-like platforms that have OpenCensus already integrated.
  • While the OpenCensus libraries collect application-level telemetry, the agent also captures system metrics like CPU and memory consumption and exports these to your backend of choice.
  • You can configure stats aggregations without redeploying your application.
  • The OpenCensus agent will host z-pages. While we originally made these a part of the libraries, we’ll be moving this functionality to the agent. This should result in a higher quality diagnostic page experience, as we’ll no longer have to reimplement the pages in each language.
  • The OpenCensus agent uses the same exporters already written for the Go OpenCensus library.
While directly exporting to a backend will remain a common use case for developers, we expect most larger deployments to start using the OpenCensus agent. The agent is currently in beta and we expect it to reach production ready quality and feature completeness later this year.

The OpenCensus Collector

The OpenCensus collector and agent share the same source code and are both optional – the key difference is how they’re deployed. While the agent runs on each host, the collector is deployed as a service and receives data from OpenCensus libraries and agents running across multiple hosts. The collector has several benefits over exporting telemetry directly from the OpenCensus libraries or agent:
  • Intelligent (tail based) trace sampling is one of the biggest benefits of the collector. By configuring your OpenCensus libraries to sample all traces and send all spans to the collector, you can have the collector perform sampling decisions after the fact! For example, you can configure the collector to sample the slowest one percent of traces at 30%, traces with errors at 100%, and all other traces at 1%!
  • The collector performs well and can be sharded across multiple instances. Current performance scales linearly across cores, allowing 10,000 spans to be collected per 1.2 cores.
  • The collector can be used as a proxy for other telemetry sources. In addition to receiving data from OpenCensus libraries and agents, Zipkin clients, Jaeger clients, and Prometheus clients, the service can be used to receive telemetry from client applications running on the web or on mobile devices.
  • The collector will soon host z-pages for your entire cluster. This is simply an expansion of the z-page functionality that we’ve added to the OpenCensus agent.
  • The collector can be used to apply telemetry policies across your entire application including adding span attributes such as region to all spans received, stripping personally identifiable information by removing or overwriting span attributes, or mapping span attributes to different names.

When to Use Each

As mentioned above, both the agent and collector are optional, and we expect that some developers will continue to export traces and metrics directly from the OpenCensus libraries. However, we expect both to become quite popular in the following scenarios:
  • Many organizations don’t want to have to rebuild and redeploy their apps when they change exporters. The agent provides the flexibility to change exporters without having to modify and redeploy your code.
  • With the OpenCensus agent you can capture system metrics via the same pipeline used to extract application metrics and distributed traces from your application.
  • If you want to make trace sampling decisions more intelligently, you’ll need to start using the collector.
  • With the OpenCensus collector you can minimize egress points and support features including batching, queuing and retry. These features are important when sending tracing and metric data to SaaS-based backends.
  • Platform providers can include the OpenCensus agent and collector into their services, making them available out of the box to customers.

Status

Both the agent and collector are currently in beta, though we know that several companies are already using them in their production services. We’re working towards the 1.0 release of each of these, and we expect this to occur by the end of Q2.

In the meantime, please join us on GitHub, Gitter, and in our monthly community meetings!
.