Supercharge your Computer Vision models with the TensorFlow Object Detection API

Thursday, June 15, 2017

Crossposted on the Google Research Blog

At Google, we develop flexible state-of-the-art machine learning (ML) systems for computer vision that can be used not only to improve our products and services, but also to spur progress in the research community. Creating accurate ML models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field, and we invest a significant amount of time training and experimenting with these systems.
Detected objects in a sample image (from the COCO dataset) made by one of our models.
Image credit: Michael Miley, original image
Last October, our in-house object detection system achieved new state-of-the-art results, and placed first in the COCO detection challenge. Since then, this system has generated results for a number of research publications [1-7] and has been put to work in Google products such as NestCam, the similar items and style ideas feature in Image Search, and street number and name detection in Street View.

Today we are happy to make this system available to the broader research community via the TensorFlow Object Detection API. This codebase is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. Our goal in designing this system was to support state-of-the-art models while allowing for rapid exploration and research. Our first release contains the following:
The SSD models that use MobileNet are lightweight, so they can comfortably run in real time on mobile devices. Our winning COCO submission in 2016 used an ensemble of the Faster RCNN models, which are more computationally intensive but significantly more accurate. For more details on the performance of these models, see our CVPR 2017 paper.

Are you ready to get started?
We’ve certainly found this code to be useful for our computer vision needs, and we hope that you will as well. Contributions to the codebase are welcome, and please stay tuned for our own further updates to the framework. To get started, download the code here and try detecting objects in some of your own images using the Jupyter notebook, or training your own pet detector on Cloud ML Engine!

By Jonathan Huang, Research Scientist and Vivek Rathod, Software Engineer

The release of the TensorFlow Object Detection API and the pre-trained model zoo has been the result of widespread collaboration among Google researchers with feedback and testing from product groups. In particular we want to highlight the contributions of the following individuals:

Core Contributors: Derek Chow, Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav Kovalevskyi, Kevin Murphy

Also special thanks to: Andrew Howard, Rahul Sukthankar, Vittorio Ferrari, Tom Duerig, Chuck Rosenberg, Hartwig Adam, Jing Jing Long, Victor Gomes, George Papandreou, Tyler Zhu

  1. Speed/accuracy trade-offs for modern convolutional object detectors, Huang et al., CVPR 2017 (paper describing this framework)
  2. Towards Accurate Multi-person Pose Estimation in the Wild, Papandreou et al., CVPR 2017
  3. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video, Real et al., CVPR 2017 (see also our blog post)
  4. Beyond Skip Connections: Top-Down Modulation for Object Detection, Shrivastava et al., arXiv preprint arXiv:1612.06851, 2016
  5. Spatially Adaptive Computation Time for Residual Networks, Figurnov et al., CVPR 2017
  6. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions, Gu et al., arXiv preprint arXiv:1705.08421, 2017
  7. MobileNets: Efficient convolutional neural networks for mobile vision applications, Howard et al., arXiv preprint arXiv:1704.04861, 2017

MobileNets: Open Source Models for Efficient On-Device Vision

Wednesday, June 14, 2017

Crossposted on the Google Research Blog

Deep learning has fueled tremendous progress in the field of computer vision in recent years, with neural networks repeatedly pushing the frontier of visual recognition technology. While many of those technologies, such as object, landmark, logo and text recognition, are provided for internet-connected devices through the Cloud Vision API, we believe that the ever-increasing computational power of mobile devices can enable the delivery of these technologies into the hands of our users, anytime, anywhere, regardless of internet connection. However, visual recognition for on-device and embedded applications poses many challenges: models must run quickly with high accuracy in a resource-constrained environment, making use of limited computation, power and space.

Today we are pleased to announce the release of MobileNets, a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources for an on-device or embedded application. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation, similar to how other popular large-scale models, such as Inception, are used.
Example use cases include detection, fine-grain classification, attributes and geo-localization.
This release contains the model definition for MobileNets in TensorFlow using TF-Slim, as well as 16 pre-trained ImageNet classification checkpoints for use in mobile projects of all sizes. The models can be run efficiently on mobile devices with TensorFlow Mobile.
Table columns: Model Checkpoint, Million MACs, Million Parameters, Top-1 Accuracy, Top-5 Accuracy
Choose the right MobileNet model to fit your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scale with the number of Multiply-Accumulates (MACs), a count of the fused multiplication and addition operations. Top-1 and Top-5 accuracies are measured on the ILSVRC dataset.
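To make the two budget metrics concrete, here is a minimal Python sketch that counts MACs and parameters for a single standard convolution layer and estimates storage from a parameter count. The function names, the example layer shape, the 4-million-parameter figure, and the assumptions of 'same' padding, no bias term, and 32-bit float weights are all illustrative; none of them are taken from the MobileNets release.

```python
def conv2d_cost(in_h, in_w, in_channels, out_channels, kernel_size, stride=1):
    """Return (macs, params) for one standard 2-D convolution.

    Assumes 'same' padding and no bias term (illustrative assumptions).
    """
    # With 'same' padding the output spatial size is ceil(input / stride).
    out_h = -(-in_h // stride)
    out_w = -(-in_w // stride)
    # One weight per (kernel row, kernel column, input channel, output channel).
    params = kernel_size * kernel_size * in_channels * out_channels
    # Every weight is used in one multiply-accumulate at each output position.
    macs = params * out_h * out_w
    return macs, params


def float32_size_mb(num_params):
    """Approximate storage in megabytes, assuming 4 bytes per parameter."""
    return num_params * 4 / 1e6


# Hypothetical layer: 224x224 RGB input, 32 filters of size 3x3, stride 2.
macs, params = conv2d_cost(224, 224, 3, 32, kernel_size=3, stride=2)
print(f"{macs / 1e6:.2f} million MACs, {params} parameters")

# Hypothetical network with 4 million parameters.
print(f"{float32_size_mb(4_000_000):.1f} MB")
```

Summing these per-layer counts over a whole network gives totals of the kind reported in the Million MACs and Million Parameters columns, which can then be compared against a latency or size budget.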
We are excited to share MobileNets with the open source community. Information for getting started can be found at the TensorFlow-Slim Image Classification Library. To learn how to run models on-device please go to TensorFlow Mobile. You can read more about the technical details of MobileNets in our paper, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

By Andrew G. Howard, Senior Software Engineer and Menglong Zhu, Software Engineer

MobileNets were made possible with the hard work of many engineers and researchers throughout Google. Specifically we would like to thank:

Core Contributors: Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

Special thanks to: Benoit Jacob, Skirmantas Kligys, George Papandreou, Liang-Chieh Chen, Derek Chow, Sergio Guadarrama, Jonathan Huang, Andre Hentz, Pete Warden

Google Summer of Code 2017 statistics part 2

Tuesday, June 6, 2017

Now that Google Summer of Code (GSoC) 2017 is under way, with students in their first full week of the coding period, we wanted to bring you some more statistics on the 2017 program. Lots and lots of numbers follow:


Students are working with 201 organizations (the most we’ve ever had!) of which 39 are participating in GSoC for the first time.

Student Registrations

20,651 students from 144 countries registered for the program, which is an 8.8% increase over the previous high for the program.

Project Proposals

4,764 students from 108 countries submitted a total of 7,089 project proposals.

Gender breakdown

11.4% of accepted students are women. We are always interested in making our programs and open source more inclusive. Please contact us if you know of organizations we should work with to spread the word about GSoC to underrepresented groups.


The 1,318 students accepted into the GSoC 2017 program hailed from 575 universities, of which 142 have students participating for the first time in GSoC.
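A few ratios follow directly from the figures quoted in this post. The short Python sketch below works them out; the variable names are illustrative, and the previous registration high is only an estimate, since it is back-calculated from the rounded 8.8% figure.

```python
# Figures quoted in this post.
registered = 20_651   # students who registered
applicants = 4_764    # students who submitted at least one proposal
proposals = 7_089     # project proposals submitted
accepted = 1_318      # students accepted into the program

# Average number of proposals per applying student.
proposals_per_applicant = proposals / applicants

# Share of applying students who were accepted.
acceptance_rate = accepted / applicants

# Rough estimate of the previous registration high, assuming the
# stated 8.8% increase is measured against it (rounded input).
previous_high_estimate = registered / 1.088

print(f"Proposals per applicant: {proposals_per_applicant:.2f}")
print(f"Acceptance rate: {acceptance_rate:.1%}")
print(f"Estimated previous registration high: {previous_high_estimate:,.0f}")
```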

Top 10 schools by students accepted for GSoC 2017 

University Name | Country | Accepted Students
International Institute of Information Technology, Hyderabad | India | 39
Birla Institute of Technology and Science, Pilani (BITS Pilani) | India | 37
Indian Institute of Technology, Kharagpur | India | 31
University of Moratuwa | Sri Lanka | 24
Delhi Technological University | India | 23
Birla Institute of Technology and Science Pilani, Goa Campus | India | 18
Indian Institute of Technology, Roorkee | India | 18
Indian Institute of Technology, Bombay | India | 15
LNM Institute of Information Technology | India | 15
TU Munich/Technische Universität München | Germany | 14

Another post with stats on our GSoC mentors will be coming soon!

Stephanie Taylor, Google Open Source