Google Open Source Blog: October 2016

Posts from October 2016

Using TensorFlow and JupyterHub in Classrooms

Monday, October 31, 2016

We’ve published a new solution and a companion GitHub repository that guide you through setting up a Google Container Engine cluster running JupyterHub, which automatically provisions secure Jupyter containers for each user in a classroom or team. Don’t let the title of this article mislead you: not only does it use TensorFlow and JupyterHub, it’s actually an open source and cloud smorgasbord based on the Jupyter and Kubernetes platforms.

Jupyter is a powerful open source technology that gives you a platform to write and execute code to analyze, visualize and share the discoveries you find in your big data set. You can download a number of different Docker images preconfigured with many different notebook extensions and software packages to help you on any kind of data-science quest.

If you’re exploring on your own, and really want to get started quickly, you can get this all running on your local computer, but what if you want to take your expertise and lead a classroom of people along the same path? You have to either configure everything for them or walk them through configuring their own machines with all the required software.

This is where JupyterHub comes in. It acts as a management layer in front of Jupyter instances, letting you configure users with custom authentication and giving you a Python interface to spawn a new Jupyter instance for each user. Even with JupyterHub, you still need a way to provision physical or virtual hardware for the students.

Enter Kubernetes, an open source system for automating the deployment, scaling and management of containerized applications. Google Container Engine is a fully managed service based on Kubernetes that lets you easily create clusters on Google Cloud Platform.

This solution comes with a JupyterHub Spawner class that creates a Kubernetes Pod for each user, with each Pod running a Docker container with Jupyter. It also includes all the automation scripts needed to create a Container Engine cluster and easily customize your setup.

When your students log into JupyterHub using Google OAuth2, they can choose from a list of several pre-built Jupyter images. These include a newly updated “datalab-jupyter” image, which comes with the open source Google Datalab notebook extension for integration with BigQuery, Google Cloud ML and Stackdriver, and also has TensorFlow and the Apache Beam Python SDK for Google Cloud Dataflow installed. Users can also choose any of the pre-configured Jupyter docker-stacks images, or you can build your own Docker images with any special libraries or Jupyter configurations you want.
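To make the spawner idea more concrete, here is a minimal sketch of the kind of Pod manifest a per-user Kubernetes spawner might build once a student picks an image. It is not the solution’s actual Spawner class: the function name, the "jupyter-" name prefix, the labels, port 8888 and the datalab image path are all hypothetical; only `jupyter/scipy-notebook` is a real docker-stacks image.

```python
# Hypothetical mapping from the image a user picks to a container image.
ALLOWED_IMAGES = {
    "datalab-jupyter": "gcr.io/example-project/datalab-jupyter:latest",  # hypothetical path
    "scipy-notebook": "jupyter/scipy-notebook",  # a Jupyter docker-stacks image
}


def build_user_pod(username: str, image_choice: str) -> dict:
    """Return a Kubernetes Pod manifest (as a dict) for one user's notebook server."""
    if image_choice not in ALLOWED_IMAGES:
        raise ValueError(f"unknown image: {image_choice}")
    # Pod names must be DNS-style labels, so keep ASCII letters/digits and use "-" elsewhere.
    safe_name = "".join(
        c if c.isascii() and c.isalnum() else "-" for c in username.lower()
    ).strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"jupyter-{safe_name}",
            "labels": {"app": "jupyterhub-user", "user": safe_name},
        },
        "spec": {
            "containers": [
                {
                    "name": "notebook",
                    "image": ALLOWED_IMAGES[image_choice],
                    "ports": [{"containerPort": 8888}],  # assumed notebook port
                }
            ],
            "restartPolicy": "Never",
        },
    }
```

A real spawner would submit a manifest like this to the Kubernetes API and then tell JupyterHub where the new server is listening; the sketch stops at building the manifest.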

We hope that this solution allows you to get your classroom or team environment running quickly so you can focus on learning rather than configuring machines.

By Brad Svee, Cloud Solutions Architect

Dart in 2017 and beyond

Wednesday, October 26, 2016

We’re here at the Dart Developer Summit in Munich, Germany. Over 250 developers from more than 50 companies from all over the world just finished watching the keynote.

This is a summary of the topics we covered:

Dart is the fastest growing programming language at Google, with a 3.5x increase in lines of code since last year. We like to think that this is because of our focus on developer productivity: teams report a 25% to 100% increase in development speed. Google has bet its biggest business on Dart: the web apps built on Dart bring in over $70B per year.

Google AdSense recently launched a ground-up redesign of its web app, built with Dart. Earlier this year, we announced that the next generation of AdWords is built with Dart. There are more exciting Dart products at Google that we’re looking forward to revealing. Outside Google, companies such as Wrike, Workiva, Soundtrap, Blossom, DG Logic and Sonar Design have all been using and enjoying Dart for years.

Our five-year investment in this language is bearing fruit. But we’re not finished.

We learned that people who use Dart love its terse and readable syntax. So we’re keeping that.

We have also learned that Dart developers really enjoy the language’s powerful static analysis. So we’re making it better. With strong mode, Dart’s type system becomes sound (meaning that it rejects all incorrect programs). We’re also introducing support for generic methods.

We have validated that the programming language itself is just a part of the puzzle. Dart comes with ‘batteries included.’ Developers really like Dart’s core libraries — we will keep them tight, efficient and comprehensive. We will also continue to invest in tooling such as pub (our integrated packaging system), dartfmt (our automatic formatter) and, of course, the analyzer.

On the web, we have arrived at a framework that is an excellent fit for Dart: AngularDart. All the Google web apps mentioned above use it. It has been in production at Google since February. AngularDart is designed for Dart, and it’s getting better every week. In the past 4 months, AngularDart’s output has gotten 40% smaller, and our AngularDart web apps got 15% faster.

Today, we’re launching AngularDart 2.0 final. Tune in to the next session.

With that, we’re also releasing — as a developer preview — the AngularDart components that Google uses for its major web apps. These Material Design widgets are being developed by hundreds of Google engineers and are thoroughly tested. They are written purely in Dart.

We’re also making Dart easier to use with existing JavaScript libraries. For example, you will be able to use our tool to convert TypeScript .d.ts declarations into Dart libraries.

We’re making the development cycle much faster. Thanks to Dart Dev Compiler, compilation to JavaScript will take less than a second across all modern browsers.

We believe all this makes Dart an even better choice for web development than before. Dart has been here for a long time and it’s not going anywhere. It’s cohesive and dependable, which is what a lot of web developers want.

We’re also very excited about Flutter, a project to help developers build high-performance, high-fidelity mobile apps for iOS and Android from a single codebase in Dart. More on that tomorrow.

We hope you’ll enjoy the next two days. Tune in on the live stream or follow #dartsummit on Twitter.

By Filip Hracek, Developer Relations Program Manager

Google Summer of Code 2016 wrap-up: GNU Radio

Tuesday, October 25, 2016

This post is the third installment in our series of wrap-up posts reflecting on Google Summer of Code 2016. Check out the first and second posts in the series.

Originally posted on the GNU Radio Blog

The summer has come to an end -- along with the Summer of Code for GNU Radio. It was a great season in terms of student participation, and as the students are preparing their last commits, this seems a good time to summarize their efforts.

All students presented their work (either in person, or via poster) at this year’s GNU Radio Conference in Boulder, Colorado.

gr-inspector

With gr-inspector, GNU Radio now has its own out-of-tree module, which serves both as a repository for signal analysis algorithms and as a collection of fantastic examples. This module was created and worked on by Sebastian Müller, who was funded by Google Summer of Code (GSoC), and Christopher Richardson, who participated as a Summer of Code in Space (SOCIS) student funded by the European Space Agency. Sebastian also created a video demonstrating some of the features.

Both Sebastian and Chris have written up their efforts on their own blogs.

PyBOMBS GUI

Ravi Sharan was our other GSoC student, primarily working on a GUI for PyBOMBS, our installation helper tool. Ravi also worked on a bunch of other things, and has summarized his efforts as well.

The PyBOMBS GUI is written in Qt and is a nice extension to our out-of-tree module ecosystem.

While some developers prefer the comfort of their command line environments, we hope that the PyBOMBS GUI will ease the entry for more new developers. The GUI ties in nicely with CGRAN, and with the correct setup, users can directly launch installation of out-of-tree modules from their browser.

Want to participate? Have ideas?

We will definitely apply for GSoC and SOCIS again next year! If you want to participate as a student, it helps a lot to get involved with the community early on. We also recommend you sign up for the mailing list, and get involved with GNU Radio by using it, reporting and fixing issues, or even publishing your own out-of-tree module. For more ideas, take a look at our summer of code wiki pages.

If you simply have ideas for future projects, those are welcome too! Suggest those on the mailing list, or simply edit the wiki page.

By Martin Braun, Organization Administrator for GNU Radio

Google Code-in 2016 now accepting organization applications

Monday, October 24, 2016

Google Code-in is our global online contest that invites pre-university students ages 13-17 to learn by contributing to open source software. The contest begins its 7th year on November 28th, 2016. With the start date of the contest rapidly approaching, we are now accepting applications for open source projects interested in being a part of Google Code-in.

Working with young students is a special responsibility and each year we hear inspiring stories from mentors who participate. To ensure these new, young contributors have a great support system, we select organizations that have gained experience in mentoring students by previously taking part in Google Summer of Code.

There were 14 organizations in 2015 that collectively created thousands of bite-sized tasks for students to choose from. Tasks are created in 5 categories:

Code: writing or refactoring
Documentation/Training: creating/editing documents and helping others learn more
Outreach/Research: community management, outreach/marketing, or studying problems and recommending solutions
Quality Assurance: testing and ensuring code is of high quality
User Interface: user experience research or user interface design and interaction

Once an organization is selected for Google Code-in 2016 they will define these tasks and recruit mentors who are interested in providing online support for students.

You can find a timeline, FAQ and other information about Google Code-in on our website. If you’re an educator interested in sharing Google Code-in with your students, you can find resources here.

By Josh Simmons, Open Source Programs Office

Budou: Automatic Japanese line breaking tool

Friday, October 21, 2016

Today we are pleased to introduce Budou, an automatic line breaking tool for Japanese. What is a line breaking tool and why is it necessary? English uses spaces and hyphenation as cues that allow for beautiful, that is, more legible, line breaks. Japanese, which has neither, is notoriously more difficult: breaks can occur at arbitrary points, often in the middle of a word.

This is a long-standing issue in Japanese typography on the web, and it degrades readability. We can specify where line breaks may occur with CSS, but this is a non-trivial manual process that requires knowledge of Japanese vocabulary and grammar.

Budou automatically converts Japanese sentences into organized HTML code, wrapping meaningful chunks in non-breaking markup to semantically control line breaks. Budou uses the Cloud Natural Language API to analyze the input sentence, and it concatenates the appropriate words into meaningful chunks using part-of-speech (PoS) tagging and syntactic information. Budou outputs HTML code by wrapping each chunk in a SPAN tag. By setting the display property of these SPANs to inline-block in CSS, semantic units will no longer be split at the end of a line.
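The chunk-and-wrap step can be illustrated with a deliberately simplified sketch. It assumes the sentence has already been tokenized and tagged (the tags here are hand-written stand-ins for an analyzer’s output), and it uses a single hypothetical rule, attaching particles, affixes and punctuation to the preceding chunk, whereas Budou itself also uses syntactic information. The class name "ww" is likewise an assumption; pairing it with the CSS rule `.ww { display: inline-block; }` keeps each chunk on one line.

```python
import html

# Hypothetical rule: these tags attach to the preceding chunk instead of starting a new one.
ATTACH_TO_PREVIOUS = {"PRT", "PUNCT", "AFFIX"}


def make_chunks(tokens):
    """Group (surface, tag) tokens into chunks that should not be split across lines."""
    chunks = []
    for surface, tag in tokens:
        if chunks and tag in ATTACH_TO_PREVIOUS:
            chunks[-1] += surface  # glue particles/punctuation onto the previous chunk
        else:
            chunks.append(surface)
    return chunks


def to_html(tokens, css_class="ww"):
    """Wrap each chunk in a SPAN so CSS can keep it on one line."""
    return "".join(
        f'<span class="{css_class}">{html.escape(chunk)}</span>'
        for chunk in make_chunks(tokens)
    )


# Hand-tagged tokens for 東京に行きます。 ("I will go to Tokyo.")
tokens = [("東京", "NOUN"), ("に", "PRT"), ("行き", "VERB"), ("ます", "AFFIX"), ("。", "PUNCT")]
print(to_html(tokens))
```

With these tokens the sketch produces two chunks, 東京に and 行きます。, so a line can break between them but never inside either one.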

Budou is a simple Python script that runs each sentence through the Cloud Natural Language API. It can easily be extended as a custom filter for template engines, or as a task for runners such as Grunt and Gulp. The latest version also caches the response so no duplicate requests are sent. If you are using Budou for a static website, you can process your HTML code before deployment.
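The idea of caching responses so that duplicate sentences don’t trigger duplicate API requests can be sketched as a small wrapper. This is an illustration under assumptions, not Budou’s code: `analyze` stands in for whatever function calls the remote API, and the in-memory dict stands in for whatever storage a real tool would use. Hashing the text gives a fixed-length key that would also suit a persistent store.

```python
import hashlib


class CachedAnalyzer:
    """Wrap an analysis function so each distinct input text is analyzed only once."""

    def __init__(self, analyze):
        self._analyze = analyze  # hypothetical: a function that calls a remote NLP API
        self._cache = {}

    def __call__(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._analyze(text)  # only reached on a cache miss
        return self._cache[key]
```

Processing a static site’s headings through one shared `CachedAnalyzer` means a heading that appears on many pages costs a single request.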

Budou is intended for relatively short sentences such as titles and headings. Screen readers may read a sentence chunk by chunk when it is split by SPAN or WBR tags, so using Budou for body paragraphs is discouraged.

As of October 2016, the Cloud Natural Language API supports English, Spanish, and Japanese, and Budou currently only supports Japanese. Support for other Asian languages with line break issues, such as Chinese and Thai, will be added as the API adds support.

Any comments and suggestions are welcome. You can find us on GitHub.

By Shuhei Iitsuka, UX Engineer

Introducing Nomulus: an open source top-level domain name registry

Tuesday, October 18, 2016

Today, Google is proud to announce the release of Nomulus, a new open source cloud-based registry platform that powers Google’s top level domains (TLDs). We’re excited to make this piece of Internet infrastructure available to everyone.

TLDs are the top level of the Internet Domain Name System (DNS), and they collectively host every domain name on the Internet. To manage a TLD, you need a domain name registry -- a behind-the-scenes system that stores registration details and DNS information for all domain names under that TLD. It handles WHOIS queries and requests to buy, check, transfer, and renew domain names. When you purchase a domain name on a TLD using a domain name registrar, such as Google Domains, the registrar is actually conducting business with that TLD’s registry on your behalf. That’s why you can transfer a domain from one registrar to another and have it remain active and 100% yours the entire time.
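The registrar/registry split described above can be modeled with a toy, in-memory registry. This is a hypothetical illustration only; it does not follow EPP, WHOIS or Nomulus’s data model, and every class and method name is made up. It shows why a transfer keeps a domain “100% yours”: only the sponsoring registrar changes, while the registrant and expiry stay put.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    registrant: str   # who owns the domain
    registrar: str    # the company acting on the registrant's behalf
    expiry_year: int


class ToyRegistry:
    """Hypothetical model of one TLD registry's core records."""

    def __init__(self, tld):
        self.tld = tld
        self.domains = {}

    def check(self, name):
        """Return True if the name belongs to this TLD and is still available."""
        return name.endswith("." + self.tld) and name not in self.domains

    def create(self, name, registrant, registrar, expiry_year):
        if not self.check(name):
            raise ValueError(f"{name} is not available")
        self.domains[name] = Registration(registrant, registrar, expiry_year)

    def renew(self, name, years):
        self.domains[name].expiry_year += years

    def transfer(self, name, new_registrar):
        # Only the sponsoring registrar changes; ownership and expiry are untouched.
        self.domains[name].registrar = new_registrar

    def whois(self, name):
        reg = self.domains[name]
        return {"domain": name, "registrar": reg.registrar, "expires": reg.expiry_year}
```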

The project that became Nomulus began in 2011 when the Internet Corporation for Assigned Names and Numbers (ICANN) announced the biggest ever expansion of Internet namespace, aimed at improving choice and spurring innovation for Internet users. Google applied to operate a number of new generic TLDs, and built Nomulus to help run them.

We designed Nomulus to be a brand-new registry platform that takes advantage of the scalability and easy operation of Google Cloud Platform. Nomulus runs on Google App Engine and is backed by Google Cloud Datastore, a highly scalable NoSQL database. Nomulus can manage any number of TLDs in a single shared instance and supports the full range of TLD functionality required by ICANN, including the Extensible Provisioning Protocol (EPP), WHOIS, reporting, and trademark protection. It is written in Java and is released under the Apache 2.0 license.

We hope that by providing access to our implementation of core registry functions and up-and-coming services like Registration Data Access Protocol (RDAP), we can demonstrate advanced features of Google Cloud Platform and encourage interoperability and open standards in the domain name industry for registry operators like Donuts. With approximately 200 TLDs, Donuts has made early contributions to the Nomulus code base and has spun up an instance which they'll be sharing soon.

For more information, view Nomulus on GitHub.

By Ben McIlwain, Software Engineer

Google Summer of Code 2016 wrap-up: NRNB

Monday, October 17, 2016

This post is part of our series of Google Summer of Code wrap-ups, guest posts from students, mentors and organization admins reflecting on Google Summer of Code 2016. Don't miss our first post and follow along for more wrap-up posts and announcements.

We were so excited to be a part of Google Summer of Code (GSoC) again after a year off that we pulled together over 50 project ideas and dozens of eager mentors to develop open source code for network biology research. Organized as the National Resource for Network Biology (NRNB), we selected 15 proposals that brought together well-matched students, mentors and project ideas.

All 15 students passed their midterm and final evaluations, resulting in a wide range of (mostly) production-ready code, covering algorithm, UI, importer and converter development for both web and desktop for Cytoscape, cytoscape.js, SBML, SBGN, cBioPortal, CellDesigner, GraphSpace and more.

We are proud of the technical accomplishments and productivity of our students, and we are also proud of the many important aspects of diversity our students represent in the GSoC program, including geographical, gender and academic. Here are some numbers and facts about our 15 students compared to overall GSoC 2016 student stats in parentheses:

9 different countries, including 1 (of 2) from Croatia, 1 (of 3) from Armenia and 2 (of 12) from Turkey
20% female (compared to 12% overall)
67% Computer Science (compared to 78% overall), including PhD students in Biological Oceanography and Medical Biochemistry & Biotechnology, an MS student in Bioinformatics, and a pre-med undergraduate.

Here are some quotes and blogs from our students this year. If you are considering applying as a student (or mentor) next year, here is some inspiration:

“I had the opportunity to learn and practice JavaScript with a very interesting project and having a mentor available was great for getting help when needed. The program seemed extremely well run and I would strongly recommend it to anyone interested.”

“Working in an NRNB [GSoC] training program helped to strengthen my resume and introduced me to the idea of combining a career in medicine with computer-based research.”

“I love the friendly atmosphere and the way the team works together. From the very beginning I [felt] well integrated in the group. It was pure fun to work together on the same project and to see how it [has] grown over the time. I [would] recommend everybody try the NRNB training program.”

Some of our student blogs:

Hovakim Grabski – "Java support for Deviser, a code generation system for SBML libraries"
Kaito Ii – "Interconvertible Layout software for CellDesigner"
Roman Schulte – "Offline SBML validation in the Java-based JSBML library"
Mridul Seth – "Import graphs in multiple formats and Cytoscape files into GraphSpace"

By Alex Pico and Kristina Hanspers, Organization Administrators for NRNB

Google Open Source Report Card

Friday, October 14, 2016

Open source software enables Google to build things quickly and efficiently without reinventing the wheel, allowing us to focus on solving new problems. We stand on the shoulders of giants and we know it. This is why we support open source and make it easy for Googlers to release the projects they’re working on internally as open source.

Today we’re sharing our first Open Source Report Card, highlighting our most popular projects, sharing a few statistics and detailing some of the projects we’ve released in 2016.

We’ve open sourced over 20 million lines of code to date and you can find a listing of some of our best known project releases on our website. Here are some of our most popular projects:

Android - a software stack for mobile devices that includes an operating system, middleware and key applications.
Chromium - a project encompassing Chromium, the software behind Google Chrome, and Chromium OS, the software behind Google Chrome OS devices.
Angular - a web application framework for JavaScript and Dart focused on developer productivity, speed and testability.
TensorFlow - a library for numerical computation using data flow graphs, with support for scalable machine learning across platforms from data centers to embedded devices.
Go - a statically typed and compiled programming language that is expressive, concise, clean and efficient.
Kubernetes - a system for automating deployment, operations and scaling of containerized applications.
Polymer - a lightweight library built on top of Web Components APIs for building encapsulated re-usable elements in web applications.
Protobuf - an extensible, language-neutral and platform-neutral mechanism for serializing structured data.
Guava - a set of Java core libraries that includes new collection types (such as multimap and multiset), immutable collections, a graph library, functional types, an in-memory cache, and APIs/utilities for concurrency, I/O, hashing, primitives, reflection, string processing and much more.
Yeoman - a robust and opinionated set of scaffolding tools including libraries and a workflow that can help developers quickly build beautiful and compelling web applications.

While it’s difficult to measure the full scope of open source at Google, we can use the subset of projects that are on GitHub to gather some interesting data. Today our GitHub footprint includes over 84 organizations and 3,499 repositories, 773 of which were created this year.

Googlers use countless languages from Assembly to XSLT, but what are their favorites? GitHub flags the most heavily used language in a repository and we can use that to find out. A survey of GitHub repositories shows us these are some of the languages Googlers use most often:

JavaScript
Java
C/C++
Go
Python
TypeScript
Dart
PHP
Objective-C
C#

Many things can be gleaned using the open source GitHub dataset on BigQuery, like usage of tabs versus spaces and the most popular Go packages. What about how many times Googlers have committed to open source projects on GitHub? We can search for google.com email addresses to get a baseline number of Googler commits. Here’s our query:

SELECT count(*) as n
FROM [bigquery-public-data:github_repos.commits]
WHERE committer.date > '2016-01-01 00:00'
AND REGEXP_EXTRACT(author.email, r'.*@(.*)') = 'google.com'

With this, we learn that Googlers have made 142,527 commits to open source projects on GitHub since the start of the year. This dataset goes back to 2011 and we can tweak this query to find out that Googlers have made 719,012 commits since then. Again, this is just a baseline number as it doesn’t count commits made with other email addresses.
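The filtering logic of the query’s WHERE clause can be checked locally. The sketch below mirrors the `REGEXP_EXTRACT(author.email, r'.*@(.*)')` pattern in Python: the greedy `.*` runs up to the last "@", and the group captures the domain after it. Like the query, the comparison is case-sensitive, so an address ending in "@GOOGLE.COM" would not be counted; the helper names are illustrative only.

```python
import re

# Same pattern as the query: greedy ".*" up to the last "@", then capture the rest.
DOMAIN_PATTERN = re.compile(r".*@(.*)")


def email_domain(email):
    """Return the part after the last '@', or None if there is no '@'."""
    match = DOMAIN_PATTERN.match(email)
    return match.group(1) if match else None


def count_google_commits(author_emails):
    """Count commits whose author address is exactly in the google.com domain."""
    return sum(1 for email in author_emails if email_domain(email) == "google.com")
```

Note that subdomains such as "corp.google.com" or addresses like "user@google.com.au" are not counted, which is one more reason the result is only a baseline.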

Looking back at the projects we’ve open-sourced in 2016 there’s a lot to be excited about. We have released open source software, hardware and datasets. Let’s take a look at some of this year’s releases.

Seesaw
Seesaw is a Linux Virtual Server (LVS) based load balancing platform developed in Go by our Site Reliability Engineers. Seesaw, like many projects, was built to scratch our own itch.

From our blog post announcing its release: “We needed the ability to handle traffic for unicast and anycast VIPs, perform load balancing with NAT and DSR (also known as DR), and perform adequate health checks against the backends. Above all we wanted a platform that allowed for ease of management, including automated deployment of configuration changes.”
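One part of that requirement, sending traffic only to backends that pass their health checks, can be sketched in a few lines. This is a hypothetical Python illustration, not Seesaw’s code: Seesaw is written in Go and relies on LVS for packet forwarding, and the sketch does not model NAT, DSR, anycast or VIPs at all. It simply shows round-robin selection that skips backends whose latest check failed.

```python
import itertools


class HealthCheckedPool:
    """Round-robin backend selection that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = {b: True for b in self.backends}
        self._cycle = itertools.cycle(self.backends)

    def record_check(self, backend, ok):
        """Store the result of the most recent health check for a backend."""
        self.healthy[backend] = ok

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.healthy[backend]:
                return backend
        raise RuntimeError("no healthy backends")
```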

Vendor Security Assessment Questionnaire (VSAQ)
We assess the security of hundreds of vendors every year and have developed a process to automate much of the initial information gathering with VSAQ. Many vendors found our questionnaires intuitive and flexible, so we decided to share them. The VSAQ Framework includes four extensible questionnaire templates covering web applications, privacy programs, infrastructure as well as physical and data center security. You can learn more about it in our announcement blog post.

OpenThread
OpenThread, released by Nest, is a complete implementation of the Thread protocol for connected devices in the home. This is especially important because of the fragmentation we’re seeing in this space. Development of OpenThread is supported by ARM, Microsoft, Qualcomm, Texas Instruments and other major vendors.

Magenta
Can we use machine learning to create compelling art and music? That’s the question that animates Magenta, a project from the Google Brain team based on TensorFlow. The aim is to advance the state of the art in machine intelligence for music and art generation and build a collaborative community of artists, coders and machine learning researchers. Read the release announcement for more information.

Omnitone
Virtual reality (VR) isn’t nearly as immersive without spatial audio and much of VR development is taking place on proprietary platforms. Omnitone is an open library built by members of the Chrome Team that brings spatial audio to the browser. Omnitone builds on standard Web Audio APIs to deliver an immersive experience and can be used alongside projects like WebVR. Find out more in our blog post announcing the project’s release.

Science Journal
Today’s smartphones are packed with sensors that can tell us interesting things about the world around us. We launched Science Journal to help educators, students and citizen scientists tap into those sensors. You can learn more about the project in our announcement blog post.

Cartographer
Cartographer is a library for real-time simultaneous localization and mapping (SLAM) in 2D and 3D with Robot Operating System (ROS) support. Combining data from a variety of sensors, this library computes positioning and maps surroundings. This is a key element of self-driving cars, UAVs and robotics as well as efforts to map the insides of famous buildings. More information on Cartographer can be found in our blog post announcing its release.

This is just a small sampling of what we’ve released this year. Follow the Google Open Source Blog to stay apprised of Google’s open source software, hardware and data releases.

By Josh Simmons, Open Source Programs Office

Google Summer of Code 2016 wrap-up: HUES Platform

Wednesday, October 12, 2016

Every year Google Summer of Code pairs university students with mentors to hone their skills while working on open source projects, and every year we like to post wrap-ups from the open source projects about their experience and what students accomplished. Stay tuned for more!

The Holistic Urban Energy Simulation (HUES) platform is an open source platform for facilitating the design and control of renewables-based distributed energy systems. The platform is an initiative of the Urban Energy Systems Laboratory at Empa in Switzerland, in collaboration with our research partners at ETH-Zurich, EPFL, the University of Geneva and the Lucerne University of Applied Sciences. As we push towards the second version of the HUES platform, we had help from three bright and enthusiastic students as part of the Google Summer of Code (GSoC).

Project 1: Real-time wind flow in cities

Air flow pattern around a building configuration (left); link to Rhinoceros/Grasshopper (middle & right)

People in cities are suffering more and more from scorching heat, caused by global warming and bad urban planning. This traps heat inside cities and has led to soaring air conditioning demand, making cities even hotter - a vicious circle! Clever bioclimatic urban design can mitigate urban heat by facilitating the use of natural ventilation and guiding air streams. However, the simulation of wind flow is a computationally and technically demanding task. There is a need to provide urban planners and architects with a tool able to predict wind flow patterns in real-time to facilitate development of energy efficient and passive designs.

Lukas Bystricky, a student at Florida State University, developed a Fast Fluid Dynamics (FFD) library in C# for exactly this purpose. Lukas’s implementation is based on the paper by Jos Stam (1999). In contrast to the original implementation, which uses a cell-centered finite difference to discretize the equations, Lukas applies a staggered grid finite difference, which is the standard finite difference in Computational Fluid Dynamics (CFD). This prevents the spurious pressure oscillations near the boundary that can occur with cell-centered finite differences for the Navier-Stokes equations. It does not change much in the algorithm or solvers, but it makes enforcing the boundary conditions significantly more complicated. So far, Lukas uses a simple Jacobi solver as the linear solver, as in Stam’s original implementation, but he plans to replace it with more efficient solvers in the future. He is also validating his library against typical benchmarks.
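To show what a Jacobi linear solver does in this setting, here is a minimal sketch for a 2D Poisson equation, the kind of pressure equation FFD methods solve each time step. It is not Lukas’s C# code: it assumes a uniform square grid with spacing h, zero (Dirichlet) values on the boundary, and a fixed number of sweeps, and it ignores the staggered-grid layout entirely. Discretizing the Laplacian with the five-point stencil and solving for the center value gives the update p[i][j] = (p[i-1][j] + p[i+1][j] + p[i][j-1] + p[i][j+1] - h²·b[i][j]) / 4.

```python
def jacobi_poisson(b, h, iterations=500):
    """Approximately solve lap(p) = b on an n x n grid with p = 0 on the boundary.

    b is a list of lists (boundary entries are ignored) and h is the grid spacing.
    Each sweep computes every interior value from the previous iterate only,
    which is what distinguishes Jacobi from Gauss-Seidel.
    """
    n = len(b)
    p = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        new = [row[:] for row in p]  # boundary values are copied and stay at 0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                neighbors = p[i - 1][j] + p[i + 1][j] + p[i][j - 1] + p[i][j + 1]
                new[i][j] = (neighbors - h * h * b[i][j]) / 4.0
        p = new
    return p
```

Jacobi is easy to implement and parallelize but converges slowly on fine grids, which is why replacing it with more efficient solvers, as planned above, is a natural next step.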

We are now coupling Lukas’s library into our HUES platform, more specifically into the 3D CAD software Rhinoceros and its visual programming platform Grasshopper. The final goal is an intuitive, real-time visual design tool for wind flow aimed at urban planners and architects. We will also couple it to whole-year dynamic building energy simulation programs, to better capture the microclimatic effects of the urban context when simulating the energy consumption of building designs.

Project 2: Modular energy hub modeling framework

A connection between two bus objects in a CopyHub container

Distributed energy system components are modular in nature and interact across multiple scales. As such, there is a need for a modeling framework that can easily construct and configure systems of modular entities (energy demands, sources, converters, storages and network links) across scales. Frederik Banis, a student at the University of Applied Sciences Stuttgart, developed a modular approach to modeling distributed multi-energy systems (energy hubs) in Python, based on the Open Energy System Modelling Framework (Oemof) and Pyomo.

In the developed framework, energy system components are specified in a common format, allowing for easy duplication and reconfiguration at larger scales. The platform makes it easy to manipulate an energy hub grouping multiple components (demand; sources: electricity, natural gas; systems: photovoltaic panels, wind turbines, gas boilers, combined heat and power engines, etc.), as well as to copy it (from hub1 to hub2) to create a larger interlinked system (district) in which multiple energy hubs are connected. This hierarchical nested structure can be repeated as needed, and detailed results about the energy supply of each technology or energy stream can be analyzed in the form of different plots for each system or sub-system.
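The copy-and-connect workflow can be sketched with a few plain data classes. These classes are hypothetical and are not the Oemof or Pyomo APIs; they only illustrate the structure described above: components in a common format, a hub that can be duplicated independently (hub1 to hub2), and a district that links hubs by energy carrier and can be queried across them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    kind: str            # e.g. "demand", "source", "converter", "storage"
    capacity_kw: float


@dataclass
class EnergyHub:
    name: str
    components: list = field(default_factory=list)

    def copy_as(self, new_name):
        """Duplicate the hub; a deep copy keeps the new hub's components independent."""
        clone = copy.deepcopy(self)
        clone.name = new_name
        return clone


@dataclass
class District:
    hubs: dict = field(default_factory=dict)
    links: list = field(default_factory=list)  # (hub_a, hub_b, carrier) tuples

    def add_hub(self, hub):
        self.hubs[hub.name] = hub

    def connect(self, a, b, carrier):
        if a not in self.hubs or b not in self.hubs:
            raise KeyError("both hubs must be in the district")
        self.links.append((a, b, carrier))

    def total_capacity(self, kind):
        """Sum the capacity of all components of one kind across every hub."""
        return sum(
            c.capacity_kw for hub in self.hubs.values() for c in hub.components if c.kind == kind
        )


hub1 = EnergyHub("hub1", [Component("pv", "source", 10.0), Component("boiler", "converter", 25.0)])
hub2 = hub1.copy_as("hub2")
hub2.components[0].capacity_kw = 15.0  # reconfigure the copy without touching hub1
district = District()
district.add_hub(hub1)
district.add_hub(hub2)
district.connect("hub1", "hub2", "electricity")
```

Because a district only stores hubs and links, the same pattern could be nested again (districts of districts) to reach the larger scales the framework targets.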

Project 3: Open source energy simulation database

The HUES platform includes a growing array of datasets describing the technical and economic characteristics of distributed energy technologies. Currently, this data is stored in separate modules using different data structures and file formats, making it difficult to explore holistically and query systematically. To address this, GSoC student Khushboo Mandlecha has developed an open source database to enable the linked exploration, querying and visualization of data in the platform.

The first part of the project involved the development of server based scripts to automatically extract relevant data from the modules of the existing HUES platform, and write this data to a common database. A standard format for technology component data was developed, enabling users to upload technology data files to be stored in the new database. The new database has been developed in MongoDB, enabling fast data retrieval and allowing everything to be retrieved in the form of JSON objects. The second part of the project involved the development of a web-based portal for querying, visualizing and downloading data. Once this portal is complete, it will be possible to visualize the contents of the database in different ways, enabling users to get a sense of the distribution of property values and facilitating the identification of outliers. Ultimately, the database will help researchers and practitioners using the HUES platform to develop models and perform comprehensive analyses of distributed energy systems.

By L. Andrew Bollinger, Julien Marquant and Christoph Waibel; Urban Energy Systems Laboratory, Empa, Switzerland

Announcing Google Code-in 2016 and Google Summer of Code 2017

Monday, October 10, 2016

One of the goals of the Open Source Programs Office is to encourage more people to contribute to open source software. One way we achieve that goal is through our student programs, Google Summer of Code (for university students) and Google Code-in (for pre-university students).

Over 15,000 students from more than 100 countries have worked with 23,000 mentors and contributed to 560+ open source projects.

This is why we’re excited to announce the next round of both of our student programs!

Google Code-in begins for students November 28, 2016

For the seventh consecutive year, Google Code-in will give students (ages 13-17) a chance to explore open source. Students will find opportunities to learn and get hands-on experience with tasks from a range of categories. This structure allows students to stretch themselves as they take on increasingly challenging tasks.

Getting started is easy: once the contest begins, simply choose an interesting task from our participating organizations’ lists and complete it. Mentors from the organizations are available to help online.

Google Code-in is for students asking questions like:

What is open source?
What kinds of stuff do open source projects do?
How can I write real code when all I’ve done is a little classroom work?
Can I contribute even if I’m not really a programmer?

With tasks in five different categories, there’s something to fit almost any student’s skills:

Code: writing or refactoring
Documentation/Training: creating/editing documents and helping others learn more
Outreach/Research: community management, outreach/marketing, or studying problems and recommending solutions
Quality Assurance: testing and ensuring code is of high quality
User Interface: user experience research or user interface design and interaction

Google Summer of Code student applications open on March 20, 2017

Google Summer of Code (GSoC) provides university students from around the world with an opportunity to take their skills and hone them by contributing to open source projects during their summer break from university.

Students gain invaluable experience working with mentors on these open source software projects, earning a stipend upon successful completion of their project.

We’re proud to keep this tradition going: we’ll be opening student applications for Google Summer of Code 2017 on March 20, 2017. Applications for interested open source organizations open on January 19, 2017.

Students, it’s never too early to start preparing or thinking about your proposal. You can learn about the organizations that participated in Google Summer of Code 2016 and the projects students worked on. We also encourage you to explore other resources like the student and mentor manuals and frequently asked questions.

You can learn more on the program website.

Share the news with your friends and stay tuned; more details are coming soon!

By Josh Simmons, Open Source Programs Office

An open source font system for everyone

Thursday, October 6, 2016

Originally posted on the Google Developers Blog

A big challenge in sharing digital information around the world is “tofu”—the blank boxes that appear when a computer or website isn’t able to display text: ⯐. Tofu can create confusion, a breakdown in communication, and a poor user experience.

Five years ago we set out to address this problem via the Noto—aka “No more tofu”—font project. Today, Google’s open source Noto font family provides a beautiful and consistent digital type for every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters.

A few samples of the 110,000+ characters covered by Noto fonts.
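Tofu appears when no font available to the renderer has a glyph for a character. The sketch below illustrates the idea with a simple font fallback chain: each character goes to the first font that covers it, and anything left uncovered would render as a blank box. The font names and coverage ranges are hypothetical. Real fonts report their coverage through their `cmap` table, and real text stacks use more elaborate fallback logic.

```python
from typing import Dict, List, Optional, Set

# Hypothetical coverage tables: font name -> set of code points it has glyphs for.
FONT_COVERAGE: Dict[str, Set[int]] = {
    "latin-font": set(range(0x20, 0x7F)),       # printable ASCII only
    "arabic-font": set(range(0x0600, 0x0700)),  # the Arabic block only
}

def pick_font(ch: str, fallback_chain: List[str]) -> Optional[str]:
    """Return the first font in the chain that covers ch, or None (tofu)."""
    cp = ord(ch)
    for font in fallback_chain:
        if cp in FONT_COVERAGE[font]:
            return font
    return None

def find_tofu(text: str, fallback_chain: List[str]) -> List[str]:
    """List the characters that no font in the chain can display."""
    return [ch for ch in text if pick_font(ch, fallback_chain) is None]

chain = ["latin-font", "arabic-font"]
print(find_tofu("Hi \u0628 \u2713", chain))  # ['✓']
```

A font family that covers every Unicode character, as Noto aims to, is in effect a fallback chain for which `find_tofu` always returns an empty list.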

The Noto project started as a necessity for Google’s Android and Chrome OS operating systems. When we began, we did not realize the enormity of the challenge. It required design and technical testing in hundreds of languages, and expertise from specialists in specific scripts. In Arabic, for example, a character can have up to four glyphs (i.e., shapes a character can take), and which one appears depends on whether it connects to the characters before and after it. In Indic languages, glyphs may be reordered or even split into two depending on the surrounding text.
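The Arabic case can be sketched as a small rule. In a simplified model loosely based on Unicode joining types, some letters join on both sides ("dual-joining") and some join only to the preceding letter ("right-joining"). The form a letter takes (isolated, initial, medial or final) follows from which neighbours it actually connects to. The letter tables below are a hand-picked subset, and the model ignores marks, ligatures and other rules that a real shaping engine handles.

```python
from typing import List

# Simplified, partial joining tables (a subset of letters, for illustration only).
RIGHT_JOINING = set("\u0627\u062F\u0630\u0631\u0632\u0648")  # alef, dal, thal, reh, zain, waw
DUAL_JOINING = set(
    "\u0628\u062A\u062B\u062C\u062D\u062E\u0633\u0634\u0639"
    "\u063A\u0641\u0642\u0643\u0644\u0645\u0646\u0647\u064A"
)  # beh, teh, theh, jeem, hah, khah, seen, sheen, ain, ghain, feh, qaf, kaf, lam, meem, noon, heh, yeh

def can_join_next(ch: str) -> bool:
    """Can this letter connect to the letter that follows it?"""
    return ch in DUAL_JOINING

def can_join_prev(ch: str) -> bool:
    """Can this letter connect to the letter that precedes it?"""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING

def contextual_forms(word: str) -> List[str]:
    """Pick isolated/initial/medial/final for each character of word."""
    forms = []
    for i, ch in enumerate(word):
        prev_ch = word[i - 1] if i > 0 else None
        next_ch = word[i + 1] if i + 1 < len(word) else None
        joins_prev = prev_ch is not None and can_join_prev(ch) and can_join_next(prev_ch)
        joins_next = next_ch is not None and can_join_next(ch) and can_join_prev(next_ch)
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_next:
            forms.append("initial")
        elif joins_prev:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms

print(contextual_forms("\u0628\u064A\u062A"))  # ['initial', 'medial', 'final']
```

The same letter beh (U+0628) ends up in different forms within one word. In "\u0628\u0627\u0628" the first beh is initial, but the second is isolated, because alef does not connect to what follows it. A font therefore needs a glyph for each form it can take.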

The key to achieving this milestone has been partnering with experts in the field of type and font design, including Monotype, Adobe, and an amazing network of volunteer reviewers. Beyond “no more tofu” in the common languages used every day, Noto will be used to preserve the history and culture of rare languages through digitization. As new characters are introduced into the Unicode standard, Google will add these into the Noto font family.

Google has a deep commitment to openness and the accessibility and innovation that come with it. The full Noto font family, design source files, and the font building pipeline are available for free at the links below. In the spirit of sharing and communication across borders and cultures, please use and enjoy!

Noto fonts download: https://www.google.com/get/noto/
Design source files: https://github.com/googlei18n/noto-source
Font building pipeline: https://github.com/googlei18n/fontmake

By Xiangye Xiao and Bob Jung, Internationalization

Introducing Cartographer

Wednesday, October 5, 2016

We are happy to announce the open source release of Cartographer, a real-time simultaneous localization and mapping (SLAM) library in 2D and 3D with ROS support.

SLAM algorithms combine data from various sensors (e.g. LIDAR, IMU and cameras) to simultaneously compute the position of the sensor and a map of the sensor’s surroundings. For example, consider this approach to drawing a floor plan of your living room:

Grab a laser rangefinder, stand in the middle of the room, and draw an X on a piece of paper.
Measure the distance from where you’re standing to any wall.
Draw a line on the paper where the wall is and write down the distance between the X (your position) and the wall.
Measure the distance from where you’re standing to another wall and add it to the drawing as well.
Now, move to another part of the room.
Since the walls (hopefully) haven’t moved, you can measure your distance to the same two walls to determine your new position.
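The last step of the floor-plan analogy, recovering your position from distances to walls already on the map, can be sketched as solving a small linear system. Each wall is modelled as a line n·p + c = 0 whose unit normal n points into the room, so n·p + c is the perpendicular distance from a point p to that wall. Two non-parallel walls then pin down p. This covers only the localization half of the analogy under idealised, noise-free assumptions. It is not how Cartographer works; Cartographer matches full laser scans against the map it is building.

```python
from typing import Tuple

# A wall is a line n·p + c = 0 with unit normal n = (nx, ny) pointing into the room,
# so nx*x + ny*y + c is the perpendicular distance from (x, y) to the wall.
Wall = Tuple[float, float, float]  # (nx, ny, c)

def locate(wall1: Wall, d1: float, wall2: Wall, d2: float) -> Tuple[float, float]:
    """Solve n1·p = d1 - c1 and n2·p = d2 - c2 for p = (x, y) via Cramer's rule."""
    a1, b1, c1 = wall1
    a2, b2, c2 = wall2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        raise ValueError("walls are parallel; two distances cannot fix the position")
    r1, r2 = d1 - c1, d2 - c2
    x = (r1 * b2 - r2 * b1) / det
    y = (a1 * r2 - a2 * r1) / det
    return x, y

# West wall x = 0 and south wall y = 0: 2 m from the first, 3 m from the second.
print(locate((1.0, 0.0, 0.0), 2.0, (0.0, 1.0, 0.0), 3.0))  # (2.0, 3.0)
```

The example room is hypothetical. With real, noisy measurements you would use more than two walls and solve in a least-squares sense rather than exactly.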

SLAM is an essential component of autonomous platforms such as self-driving cars, automated forklifts in warehouses, robotic vacuum cleaners, and UAVs.

Cartographer builds globally consistent maps in real time across a broad range of sensor configurations common in academia and industry. The following video demonstrates Cartographer’s real-time loop closure:

A detailed description of Cartographer’s 2D algorithms can be found in our ICRA 2016 paper.
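Loop closure addresses drift. Each small motion estimate carries a little error, so by the time the sensor returns to a place it has already seen, the estimated trajectory no longer meets itself. The sketch below is a deliberately naive stand-in that shows the idea: once we know the last pose revisits the first, it spreads the accumulated translation error linearly back along the path. This is not Cartographer's method. The ICRA paper describes a much more capable approach that optimizes a graph of scan-matching constraints. The function name and the translation-only poses are assumptions for illustration.

```python
from typing import List, Tuple

Pose = Tuple[float, float]  # (x, y) only; real SLAM poses also include orientation

def close_loop(poses: List[Pose]) -> List[Pose]:
    """Naive loop closure: assume the last pose should coincide with the first,
    and remove the drift proportionally along the trajectory."""
    if len(poses) < 2:
        return list(poses)
    ex = poses[-1][0] - poses[0][0]  # accumulated drift in x
    ey = poses[-1][1] - poses[0][1]  # accumulated drift in y
    n = len(poses) - 1
    return [(x - ex * i / n, y - ey * i / n) for i, (x, y) in enumerate(poses)]

# A square walk that should end where it started but drifted by (0.4, 0.2).
corrected = close_loop([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.4, 0.2)])
print(corrected)
```

Distributing the error assumes drift grows evenly with each step. Graph-based optimization drops that assumption and weighs each constraint by how much it can be trusted.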

Thanks to ROS integration and support from external contributors, Cartographer is ready to use on several robot platforms with ROS support:

At Google, Cartographer has enabled a range of applications from mapping museums and transit hubs to enabling new visualizations of famous buildings.

We recognize the value of high quality datasets to the research community. That’s why, thanks to cooperation with the Deutsches Museum (the largest tech museum in the world), we are also releasing three years of LIDAR and IMU data collected using our 2D and 3D mapping backpack platforms during the development and testing of Cartographer.

Our focus is on advancing and democratizing SLAM as a technology. Currently, Cartographer is heavily focused on LIDAR SLAM. Through continued development and community contributions, we hope to add support for more sensors and platforms, as well as new features such as lifelong mapping and localizing in a pre-existing map.

By Damon Kohler, Wolfgang Hess, and Holger Rapp, Google Engineering

Introducing the Open Images Dataset

Monday, October 3, 2016

Originally posted on the Google Research Blog

In the last few years, advances in machine learning have enabled computer vision to progress rapidly, giving rise to everything from systems that automatically caption images to apps that create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch, and the images are listed as having a Creative Commons Attribution license*.

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:

Annotated images from the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license
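Because the image-level labels were generated automatically, a typical first step is to group the annotations by image and keep only the labels above some confidence threshold. The sketch below does this and computes the average number of labels per image, the statistic quoted above (about 8 for the real dataset). The CSV layout, column names, sample rows and the 0.5 threshold are all hypothetical; check the dataset's own files for its actual schema.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Set

# Hypothetical annotation rows (column names and values are illustrative only).
SAMPLE = """image_id,label,confidence
img1,arch,0.9
img1,sky,0.8
img1,night,0.4
img2,silverware,1.0
img2,fork,0.7
"""

def labels_per_image(csv_text: str, min_confidence: float = 0.5) -> Dict[str, Set[str]]:
    """Group labels by image, dropping those below min_confidence."""
    labels: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["confidence"]) >= min_confidence:
            labels[row["image_id"]].add(row["label"])
    return dict(labels)

per_image = labels_per_image(SAMPLE)
avg_labels = sum(len(v) for v in per_image.values()) / len(per_image)
print(per_image, avg_labels)
```

Raising the threshold trades recall for precision. That trade-off matters here because only the validation set's labels were checked by human raters.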

We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like DeepDream or artistic style transfer, which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images in the coming months, and therefore the quality of models which can be trained.

The dataset is a product of a collaboration between Google, CMU and Cornell universities, and there are a number of research papers built on top of the Open Images dataset in the works. It is our hope that datasets like Open Images and the recently released YouTube-8M will be useful tools for the machine learning community.

By Ivan Krasin and Tom Duerig, Software Engineers

* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.