Google Open Source Blog: April 2017

Saddle up and meet us in Texas for OSCON 2017

Wednesday, April 26, 2017

The Google Open Source team is getting ready to hit the road and join the open source panoply that is Open Source Convention (OSCON). This year the event runs May 8-11 in Austin, Texas and is preceded on May 6-7 by the free-to-attend Community Leadership Summit (CLS).

Program chairs at OSCON 2016, left to right:
Kelsey Hightower, Scott Hanselman, Rachel Roumeliotis.
Photo used with permission from O'Reilly Media.

You’ll find our team and many other Googlers throughout the week on the program schedule and in the expo hall at booth #401. We’ve got a full rundown of our schedule below, but you can swing by the expo hall anytime to discuss Google Cloud Platform, our open source outreach programs, the projects we’ve open-sourced (including Kubernetes, TensorFlow, and gRPC), and even our recently released open source documentation.

Of course, you’ll also find our very own Kelsey Hightower everywhere since he is serving as one of three OSCON program chairs for the second year in a row.

Are you a student, educator, project maintainer, community leader, or past or present participant in Google Summer of Code or Google Code-in? Join us for lunch at the Google Summer of Code table in the conference lunch area on Wednesday afternoon. We’ll discuss our outreach programs, which help open source communities grow while providing students with real-world software development experience. We’ll be updating this blog post and tweeting with details closer to the date.

Without further ado, here’s our schedule of events:

Monday, May 8th (Tutorials)

9:00am Site reliability engineering by Jean Joswig

1:30pm Kubernetes hands-on by Kelsey Hightower

Tuesday, May 9th (Tutorials)

1:30pm Building amazing cross-platform command-line apps in Go by Steve Francia co-presented with Ashley McNamara

1:30pm "Measure all the things" and other memes you haven’t implemented yet by Kelsey Hightower

Wednesday, May 10th (Sessions)

11:00am TensorFlow Community Keynote by Yufeng Guo and Amy Unruh

11:50am The life of a large-scale open source project by Jessica Frazelle

11:50am What is machine learning by Amy Unruh

12:30pm Google Summer of Code and Google Code-in lunch

4:15pm From WebSockets to WISH (web in strict HTTP) by Wenbo Zhu

5:05pm gRPC 101 for Java developers: Building small and efficient microservices by Ray Tsang

5:05pm Go deep, go wide, go everywhere: Hands-on machine learning with TensorFlow by Yufeng Guo

Thursday, May 11th (Sessions)

9:30am Half my life spent in open source by Brad Fitzpatrick

4:15pm Multilayered testing by Alex Martelli

We look forward to seeing you deep in the heart of Texas at OSCON 2017!

By Josh Simmons, Google Open Source

Introducing tf-seq2seq: An Open Source Sequence-to-Sequence Framework in TensorFlow

Tuesday, April 11, 2017

Crossposted on the Google Research Blog

Last year, we announced Google Neural Machine Translation (GNMT), a sequence-to-sequence (“seq2seq”) model which is now used in Google Translate production systems. While GNMT achieved huge improvements in translation quality, its impact was limited by the fact that the framework for training these models was unavailable to external researchers.

Today, we are excited to introduce tf-seq2seq, an open source seq2seq framework in TensorFlow that makes it easy to experiment with seq2seq models and achieve state-of-the-art results. To that end, we made the tf-seq2seq codebase clean and modular, maintaining full test coverage and documenting all of its functionality.

Our framework supports various configurations of the standard seq2seq model, such as depth of the encoder/decoder, attention mechanism, RNN cell type, or beam size. This versatility allowed us to discover optimal hyperparameters and outperform other frameworks, as described in our paper, “Massive Exploration of Neural Machine Translation Architectures.”

A seq2seq model translating from Mandarin to English. At each time step, the encoder takes in one Chinese character and its own previous state (black arrow), and produces an output vector (blue arrow). The decoder then generates an English translation word-by-word, at each time step taking in the last word, the previous state, and a weighted combination of all the outputs of the encoder (aka attention [3], depicted in blue) and then producing the next English word. Please note that in our implementation we use wordpieces [4] to handle rare words.
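
To make the "weighted combination of all the outputs of the encoder" concrete, here is a minimal sketch of one attention step in plain Python. It is an illustration only, not the tf-seq2seq API: the function names are hypothetical, and it assumes simple dot-product scoring between the decoder state and each encoder output, whereas the attention described in [3] uses a learned scoring function.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_outputs):
    """Return (weights, context) for one decoder time step.

    Assumption: scores are plain dot products between the decoder state
    and each encoder output vector. Real models typically learn this
    scoring function.
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, output))
        for output in encoder_outputs
    ]
    weights = softmax(scores)
    dim = len(encoder_outputs[0])
    # The context vector is the weighted sum of all encoder outputs.
    context = [
        sum(w * output[i] for w, output in zip(weights, encoder_outputs))
        for i in range(dim)
    ]
    return weights, context


# Toy example: two encoder outputs, one per input token.
weights, context = attention_context([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy example the decoder state lines up with the first encoder output, so the first input position receives the larger weight and dominates the context vector that is passed to the decoder.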

In addition to machine translation, tf-seq2seq can also be applied to any other sequence-to-sequence task (i.e. learning to produce an output sequence given an input sequence), including machine summarization, image captioning, speech recognition, and conversational modeling. We carefully designed our framework to maintain this level of generality and provide tutorials, preprocessed data, and other utilities for machine translation.

We hope that you will use tf-seq2seq to accelerate (or kick off) your own deep learning research. We also welcome your contributions to our GitHub repository, where we have a variety of open issues that we would love to have your help with!

Acknowledgments:
We’d like to thank Eugene Brevdo, Melody Guan, Lukasz Kaiser, Quoc V. Le, Thang Luong, and Chris Olah for all their help. For a deeper dive into how seq2seq models work, please see the resources below.

References:
[1] Massive Exploration of Neural Machine Translation Architectures, Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le
[2] Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le. NIPS, 2014
[3] Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. ICLR, 2015
[4] Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. Technical Report, 2016
[5] Attention and Augmented Recurrent Neural Networks, Chris Olah, Shan Carter. Distill, 2016
[6] Neural Machine Translation and Sequence-to-sequence Models: A Tutorial, Graham Neubig
[7] Sequence-to-Sequence Models, TensorFlow.org

By Anna Goldie and Denny Britz, Research Software Engineer and Google Brain Resident, Google Brain Team

Join the first POSSE Workshop in Europe

Monday, April 10, 2017

We are excited to announce that the Professors’ Open Source Software Experience (POSSE) is expanding to Europe! POSSE is an event that brings together educators interested in providing students with experience in real-world projects through participation in humanitarian free and open source software (HFOSS) projects.

Over 100 faculty members have attended past workshops and there is a growing community of instructors teaching students through contributions to HFOSS. This three-stage faculty workshop will prepare you to support student participation in open source projects. During the workshop, you will:

Learn how to support student learning within real-world project environments
Motivate students and cultivate their appreciation of computing for social good
Collaborate with instructors who have similar interests and goals
Join a community of educators passionate about HFOSS

Workshop Format

Stage 1: Starts May 8, 2017 with online activities. Activities will take 2-3 hours per week and include interaction with workshop instructors and participants.
Stage 2: The face-to-face workshop will be held in Bologna, Italy, July 1-2, 2017 and is a pre-event for the ACM ITiCSE conference. Workshop participants include the workshop organizers, POSSE alumni, and members of the open source community.
Stage 3: Online activities and interactions in small groups immediately following the face-to-face workshop. Participants will have support while involving students in an HFOSS project in the classroom.

How to Apply

If you’re a full-time instructor at an academic institution outside of the United States, you can join the workshop being held in Bologna, Italy, July 1-2, 2017. Please complete and submit the application by May 1, 2017. Prior work with FOSS projects is not required. English is the official language of the workshop. The POSSE workshop committee will send an email notifying you of the status of your application by May 5, 2017.

Participant Support

The POSSE workshop in Europe is supported by Google. Attendees will be provided with funding for two nights’ lodging ($225 USD per night) and meals during the workshop. Travel costs will also be covered up to $450 USD. Participants are responsible for any charges above these limits. At this time, we can only support instructors at institutions of higher education outside of the U.S. For faculty at U.S. institutions, the next POSSE will be in fall 2017 on the east coast of the U.S.

We look forward to seeing you at the POSSE workshop in Italy!

By Helen Hu, Open Source Programs Office

Noto Serif CJK is here!

Thursday, April 6, 2017

Crossposted from the Google Developers Blog

Today, in collaboration with Adobe, we are responding to the call for Serif! We are pleased to announce Noto Serif CJK, the long-awaited companion to Noto Sans CJK released in 2014. Like Noto Sans CJK, Noto Serif CJK supports Simplified Chinese, Traditional Chinese, Japanese, and Korean, all in one font.

A serif-style CJK font goes by many names: Song (宋体) in Mainland China, Ming (明體) in Hong Kong, Macao and Taiwan, Minchō (明朝) in Japan, and Myeongjo (명조) or Batang (바탕) in Korea. The names and writing styles originated during the Song and Ming dynasties in China, when China's wood-block printing technique became popular. Characters were carved along the grain of the wood block. Horizontal strokes were easy to carve and vertical strokes were difficult; this resulted in thinner horizontal strokes and wider vertical ones. In addition, subtle triangular ornaments were added to the end of horizontal strokes to simulate Chinese Kai (楷体) calligraphy. This style continues today and has become a popular typeface style.

Serif fonts, which are considered more traditional with calligraphic aesthetics, are often used for long paragraphs of text such as body text of web pages or ebooks. Sans-serif fonts are often used for user interfaces of websites/apps and headings because of their simplicity and modern feeling.

Design of '永' ('eternity') in Noto Serif and Sans CJK. This ideograph is famous for having the most important elements of calligraphic strokes. It is often used to evaluate calligraphy or typeface design.

The Noto Serif CJK package offers the same features as Noto Sans CJK:

It has comprehensive character coverage for the four languages. This includes full coverage of CJK Ideographs with variation support for four regions, Kangxi radicals, Japanese Kana, Korean Hangul, and other CJK symbols and letters in the Basic Multilingual Plane of Unicode. It also provides limited coverage of CJK Ideographs in Plane 2 of Unicode, as necessary to support standards from China and Japan.

Simplified Chinese	Supports GB 18030 and China’s latest standard Table of General Chinese Characters (通用规范汉字表) published in 2013.
Traditional Chinese	Supports BIG5, and Traditional Chinese glyphs comply with the glyph standard of Taiwan’s Ministry of Education (教育部國字標準字體).
Japanese	Supports all of the kanji in JIS X 0208, JIS X 0213, and JIS X 0212 to include all kanji in Adobe-Japan1-6.
Korean	The best font for typesetting classic Korean documents in Hangul and Hanja, such as the Hunminjeongeum manuscript, which is listed in UNESCO’s Memory of the World Register. Supports over 1.5 million archaic Hangul syllables and 11,172 modern syllables as well as all CJK ideographs in KS X 1001 and KS X 1002.

Noto Serif CJK’s support of character and glyph set standards for the four languages
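
As a quick aside on the Unicode terms used above, the following Python sketch shows how a character's plane can be derived from its code point, and checks the count of modern Hangul syllables. The helper name `unicode_plane` is hypothetical, and the sample Plane 2 character (U+20000) is our own pick for illustration rather than something taken from the font's documentation.

```python
def unicode_plane(char):
    """Return the Unicode plane of a single character.

    Each plane holds 0x10000 code points, so the plane number is the
    code point shifted right by 16 bits. Plane 0 is the Basic
    Multilingual Plane (BMP).
    """
    return ord(char) >> 16


bmp_plane = unicode_plane("永")             # U+6C38, in the BMP
ext_plane = unicode_plane("\U00020000")     # U+20000, a Plane 2 ideograph

# Modern Hangul syllables occupy U+AC00 through U+D7A3 in the BMP.
modern_hangul_count = 0xD7A3 - 0xAC00 + 1

# The same number falls out of the syllable structure:
# 19 initial consonants x 21 vowels x 28 finals (including "no final").
composed_count = 19 * 21 * 28
```

Both counts come to 11,172, matching the number of modern syllables listed in the Korean row of the table.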

It respects the diversity of regional writing conventions for the same character. The example below shows the glyphs of '述' (describe) in the four languages, which differ in subtle ways.

From left to right are glyphs of '述' in S. Chinese, T. Chinese, Japanese and Korean. This character means "describe".

It is offered in seven weights: ExtraLight, Light, Regular, Medium, SemiBold, Bold, and Black. Noto Serif CJK supports 43,027 encoded characters and includes 65,535 glyphs (the maximum number of glyphs that can be included in a single font). The seven weights, when put together, have almost a half-million glyphs. The weights are compatible with Google's Material Design standard fonts, Roboto, Noto Sans and Noto Serif (Latin-Greek-Cyrillic fonts in the Noto family).

Seven weights of Noto Serif CJK
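
The "almost a half-million glyphs" figure follows directly from the numbers above. A small Python sketch of the arithmetic, with variable names chosen for illustration; the note that 65,535 equals 2^16 - 1 reflects 16-bit glyph counts in common font formats and is our gloss, not a statement from the release notes.

```python
# Per-font glyph limit quoted in the post; equal to 2**16 - 1.
MAX_GLYPHS_PER_FONT = 65_535

WEIGHTS = [
    "ExtraLight", "Light", "Regular", "Medium",
    "SemiBold", "Bold", "Black",
]

# Every weight is a full font that uses the maximum glyph count.
total_glyphs = MAX_GLYPHS_PER_FONT * len(WEIGHTS)

print(f"{len(WEIGHTS)} weights x {MAX_GLYPHS_PER_FONT:,} glyphs = {total_glyphs:,}")
# prints "7 weights x 65,535 glyphs = 458,745"
```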

It supports vertical text layout and is compliant with the Unicode vertical text layout standard. The shape, orientation, and position of particular characters (e.g., brackets and kana letters) are changed when the writing direction of the text is vertical.

The sheer size of this project also required regional expertise! Glyph design would not have been possible without leading East Asian type foundries Changzhou SinoType Technology, Iwata Corporation, and Sandoll Communications.

Noto Serif CJK is open source under the SIL Open Font License, Version 1.1. We invite individual users to install and use these fonts in their favorite authoring apps, developers to bundle these fonts with their apps, and OEMs to embed them into their devices. The fonts are free for everyone to use!

Noto Serif CJK font download: https://www.google.com/get/noto
Noto Serif CJK on GitHub: https://github.com/googlei18n/noto-cjk
Adobe's landing page for this release: http://adobe.ly/SourceHanSerif
Source Han Serif on GitHub: https://github.com/adobe-fonts/source-han-serif/tree/release/

By Xiangye Xiao and Jungshik Shin, Internationalization Engineering team