Free Universal Sound Separation

Thursday, April 9, 2020

We are happy to announce the release of FUSS: the Free Universal Sound Separation dataset.

Audio recordings often contain a mixture of different sound sources; Universal sound separation is the ability to separate such a mixture into its component sounds, regardless of the types of sound present. Previously, sound separation work has focused on separating mixtures of a small number of sound types, such as "speech" versus "nonspeech", or different instances of the same type of sound, such as speaker #1 versus speaker #2. Often in such work, the number of sounds in a mixture is also assumed to be known a priori. The FUSS dataset shifts focus to the more general problem of separating a variable number of arbitrary sounds from one another.

One major hurdle to training models in this domain is that even if you have high-quality recordings of sound mixtures, you can't easily annotate these recordings with ground truth. High-quality simulation is one approach to overcome this limitation. To achieve good results, you need a diverse set of sounds, a realistic room simulator, and code to mix these elements together for realistic, multi-source, multi-class audio with ground truth. With FUSS, we are releasing all three of these.

FUSS relies on Creative Commons licensed audio clips from We filtered these by license type, then using a pre-release of FSD50k [1], further filtered out sounds that aren't separable by humans when mixed together. We were left with about 23 hours of audio, consisting of 12,377 sounds useful for mixing (7,237 train, 2,883 validation, 2,257 eval). Using these clips, we created 20,000 training mixtures, 1,000 validation mixtures, and 1,000 eval mixtures.

We developed our own room simulator implemented in tensorflow, which generates the impulse response of a box shaped room with frequency-dependent reflective properties given a sound source location and a mic location. As part of the dataset release, we provide pre-calculated room impulse responses used for each audio sample along with mixing code, so the research community can simulate novel audio without running the computationally expensive room simulator. Future work may include releasing the code for our room simulator and extending the simulator capabilities to address more extensive acoustic properties of rooms, materials with different reflective properties, novel room shapes, etc.

Finally, we have released a masking-based separation model, based on an improved time-domain convolutional network (TDCN++), described in our recent publications [2, 3]. On the eval set, this model achieves 12.5 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source mixtures with 37.6 dB absolute SI-SNR.

Source audio, reverb impulse responses, reverberated mixtures and sources created by the mixing code, and a baseline model checkpoint are available for download. Code for reverberating and mixing the audio data and for training the released model is available on our github page.

The dataset will also be used in the DCASE challenge, as a component of the Sound Event Detection and Separation task. The released model will serve as a baseline for this competition, and a benchmark to demonstrate progress against in future experiments.

Our hope is this dataset will lower the barrier to new research, and particularly will allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.

By John Hershey, Scott Wisdom, and Hakan Erdogan, Google Research

[1] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font Corbera, Dmitry Bogdanov, Andrés Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." International Society for Music Information Retrieval Conference (ISMIR), pp. 486–493. Suzhou, China, 2017.
[2] Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, and John R. Hershey. "Universal Sound Separation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 175-179. New Paltz, NY, USA, 2019.
[3] Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis. "Improving Universal Sound Separation Using Sound Classification." IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2020.

Code Search for Google open source projects

Wednesday, April 1, 2020

We are pleased to launch Code Search for Google open source projects. Code Search is one of Google’s most popular internal tools, and now we have a version (same binary, different flags) targeted to open source communities.

Googlers use Code Search every day to help understand the codebase: they search for half-remembered functions and usages; jump through the codebase to figure out what calls the function they are viewing; and try to identify when and why a particular line of code changed.

The Code Search tool gives a rich code browsing experience. For example, the blame button shows which user last changed each line and you can display history on the same page as the file contents. In addition, it supports a powerful search language and, for some repositories, cross-references.

Suggest-as-you-type in any search box annotates suggestions with the type of code object, the repository and the path, helping users find what they want faster.

The search language supports regular expressions and a number of helpful search atoms. For a user looking for a function foo in a Go file, instead of sifting through thousands of results containing foo, the user can search for lang:go function:foo to limit search results to Go files where foo is a function and not a struct or a word in a comment.

One example is finding a file using only part of the name. The query goes directly to the file, since there is only one such file.

See the quick reference for more information.

In addition to text search, some of the open source repositories have cross-references powered by Kythe. Kythe is a Google open source project that includes tools to help understand code. Project owners instrument a build of their repository to output compilation information for Kythe. Kythe tools convert this data to a graph. This graph connects definitions to declarations and code references to the abstract objects they represent (described by a graph schema). Google then runs an internal pipeline that combines these graphs for the different languages, prunes unnecessary pieces, and optimizes it for serving cross-references. The whole process runs several times per day to keep the data fresh.

Open source communities use a broader set of build systems than Google. In order to support cross-references, Kythe added drop-in support for Bazel, CMake, Maven, and Go. Projects using other build systems can use Kythe-provided wrappers for clang and javac to instrument their builds; these are used by Chromium and Android AOSP to provide compilation information for Kythe.

Because Kythe is based on the build, Kythe cross-references include links to files generated as part of the build process, such as Java files generated for AutoValues (example here) or protos. For repositories where cross-references are enabled, clicking on a symbol will take you to a definition of that symbol.

Clicking on the definition of a symbol will open a cross-reference panel, showing all the places where that symbol is referenced. For example, clicking on toVName below, we can see the places that reference this method. One of the callers is parseVName, and clicking on that shows the callers of that method.

At this time, we only provide search on the repositories listed below, but we plan to add more over time:
We are also investing in making the application keyboard-navigable and usable with a screen reader.

We hope you find this tool useful!

By Kris Hildrum, Code Search Team

Kpt: Packaging up your Kubernetes configuration with git and YAML since 2014

Tuesday, March 31, 2020

Kubernetes configuration manifests have become an industry standard for deploying both custom and off-the-shelf applications (as well as for infrastructure). Manifests are combined into bundles to create higher-level deployable systems as well as reusable blueprints (such as a product offering, off the shelf software, or customizable starting point for a new application).

However, most teams lack the expertise or desire to create bespoke bundles of configuration from scratch and instead: 1) either fork them from another bundle, or 2) use some packaging solution which generates manifests from code.

Teams quickly discover they need to customize, validate, audit and re-publish their forked/ generated bundles for their environment. Most packaging solutions to date are tightly coupled to some format written as code (e.g. templates, DSLs, etc). This introduces a number of challenges when trying to extend, build on top of, or integrate them with other systems. For example, how does one update a forked template from upstream, or how does one apply custom validation?

Packaging is the foundation of building reusable components, but it also incurs a productivity tax on the users of those components.

Today we’d like to introduce kpt, an OSS tool for Kubernetes packaging, which uses a standard format to bundle, publish, customize, update, and apply configuration manifests.

Kpt is built around an “as data” architecture bundling Kubernetes resource configuration, a format for both humans and machines. The ability for tools to read and write the package contents using standardized data structures enables powerful new capabilities:
  • Any existing directory in a Git repo with configuration files can be used as a kpt package.
  • Packages can be arbitrarily customized and later pull in updates from upstream by merging them.
  • Tools and automation can perform high-level operations by transforming and validating package data on behalf of users or systems.
  • Organizations can develop their own tools and automation which operate against the package data.
  • Existing tools and automation that work with resource configuration “just work” with kpt.
  • Existing solutions that generate configuration (e.g. from templates or DSLs) can emit kpt packages which enable the above capabilities for them.

Example workflow with kpt

Now that we’ve established the benefits of using kpt for managing your packages of Kubernetes config, lets walk through how an enterprise might leverage kpt to package, share and use their best practices for Kubernetes across the organization.

First, a team within the organization may build and contribute to a repository of best practices (pictured in blue) for managing a certain type of application, for example a microservice (called “app”). As the best practices are developed within an organization, downstream teams will want to consume and modify configuration blueprints based on them. These blueprints provide a blessed starting point which adheres to organization policies and conventions.

The downstream team will get their own copy of a package by downloading it to their local filesystem (pictured in red) using kpt pkg get. This clones the git subdirectory, recording upstream metadata so that it can be updated later.

They may decide to update the number of replicas to fit their scaling requirements or may need to alter part of the image field to be the image name for their app. They can directly modify the configuration using a text editor (as would be done before). Alternatively, the package may define setters, allowing fields to be set programmatically using kpt cfg set. Setters streamline workflows by providing user and automation friendly commands to perform common operations.

Once the modifications have been made to the local filesystem, the team will commit and push their package to an app repository owned by them. From there, a CI/CD pipeline will kick off and the deployment process will begin. As a final customization before the package is deployed to the cluster, the CI/CD pipeline will inject the digest of the image it just built into the image field (using kpt cfg set). When the image digest has been set, the CI/CD pipeline can send the manifests to the cluster using kpt live apply. Kpt live operates like kubectl apply, providing additional functionality to prune resources deleted from the configuration and block on rollout completion (reporting status of the rollout back to the user).

Now that we’ve walked through how you might use kpt in your organization, we’d love it if you’d try it out, read the docs, or contribute.

One more thing

There’s still a lot to the story we didn’t cover here. Expect to hear more from us about:
  • Using kpt with GitOps
  • Building custom logic with functions
  • Writing effective blueprints with kpt and kustomize
By Phillip Wittrock, Software Engineer and Vic Iglesias, Cloud Solutions Architect