Google Open Source Blog: October 2015

Google Summer of Code wrap-up: STE||AR Group

Friday, October 30, 2015

Today we are featuring the STE||AR Group, another Google Summer of Code veteran organization. Adrian Serio gives an overview of their five students' summer projects below.

The STE||AR Group is an international team of researchers who aim to improve application scalability by more efficiently utilizing hardware resources available to developers. This summer has been an exciting time for the STE||AR Group’s Google Summer of Code (GSoC) mentors and students alike! We were very pleased with the dedication and effort of all five of our participants.

Our students made contributions to three of our software products:

HPX: a distributed C++ runtime system which comes with a standards-compliant API and allows users to scale their applications across thousands of machines
LibGeoDecomp: an auto-parallelizing library for petascale computer simulations which is able to take advantage of HPX to better adapt fluctuating workloads to the system
LibFlatArray: a highly efficient multidimensional array library which provides an object-oriented interface but stores data in a vectorization-friendly Struct-of-Arrays format.
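
To make the Struct-of-Arrays idea concrete, here is a minimal Python sketch. It is not LibFlatArray code (which is C++); the class `CellsSoA` and the fields `temperature` and `pressure` are hypothetical names chosen for illustration. It contrasts a list of per-cell objects with one contiguous array per field, while still offering an object-style view of a single cell.

```python
from array import array

# Array-of-Structs: one object per cell, so the fields of different
# cells are interleaved in memory.
cells_aos = [{"temperature": 20.0 + i, "pressure": 1.0} for i in range(4)]


class CellsSoA:
    """Struct-of-Arrays: one contiguous array per field (illustrative only)."""

    def __init__(self, size):
        self.temperature = array("d", [0.0] * size)
        self.pressure = array("d", [0.0] * size)

    def __len__(self):
        return len(self.temperature)

    def cell(self, index):
        # Object-style view of one cell, assembled on demand.
        return {
            "temperature": self.temperature[index],
            "pressure": self.pressure[index],
        }


# Copy the same data into the Struct-of-Arrays layout.
cells_soa = CellsSoA(len(cells_aos))
for i, c in enumerate(cells_aos):
    cells_soa.temperature[i] = c["temperature"]
    cells_soa.pressure[i] = c["pressure"]

# A kernel that only needs one field walks a single contiguous array.
mean_temperature = sum(cells_soa.temperature) / len(cells_soa)
```

In a compiled language, the per-field arrays are what allow a loop over `temperature` to be vectorized without loading unrelated fields.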

Just as these three products can work together as a tightly integrated stack, our goal with the GSoC projects was to create synergy between them and steer our development towards increasing the adaptivity and efficiency of our software. Below are the summaries of our students’ projects.

Implementation of a New Resource Manager in HPX: Nidhi Makhijani

This project set out to properly assign hardware resources to executors: C++ objects that dictate the way a thread should be executed. Nidhi was able to allocate resources to an executor when it is created and return the resources when it stops. Additionally, Nidhi laid the groundwork for dynamic allocation where the resource manager can monitor and share resources amongst all of the running executors.
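
The allocate-on-creation, return-on-stop lifecycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not HPX's C++ API; `ResourceManager`, `Executor`, `acquire`, `release` and `stop` are hypothetical names.

```python
class ResourceManager:
    """Tracks which cores are free (hypothetical, for illustration)."""

    def __init__(self, total_cores):
        self._free = set(range(total_cores))

    def acquire(self, count):
        if count > len(self._free):
            raise RuntimeError("not enough free cores")
        # Hand out `count` cores and remove them from the free pool.
        return {self._free.pop() for _ in range(count)}

    def release(self, cores):
        # Return cores to the free pool.
        self._free |= cores

    @property
    def free_cores(self):
        return len(self._free)


class Executor:
    """Receives its cores when created and gives them back when stopped."""

    def __init__(self, manager, cores):
        self._manager = manager
        self.cores = manager.acquire(cores)

    def stop(self):
        self._manager.release(self.cores)
        self.cores = set()


manager = ResourceManager(total_cores=8)
executor = Executor(manager, cores=3)
free_while_running = manager.free_cores
executor.stop()
free_after_stop = manager.free_cores
```

Dynamic allocation would extend this by letting the manager move cores between running executors instead of only at creation and shutdown.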

SIMD Wrapper for ARM NEON, Intel AVX512 & KNC in LibFlatArray: Larry Xiao

Vectorization is imperative for writing highly efficient numerical kernels. The goal of this project was to extend the already existing SIMD wrappers in LibFlatArray to more architectures (e.g. ARM NEON, Intel AVX512, etc.) and to extend the capabilities of these wrappers. Larry set out to study the different ISAs (Instruction Set Architectures), and make the library run efficiently on these architectures.

CSV Formatted Performance Counters for HPX: Devang Bacharwar

HPX provides users with a uniform interface to access arbitrary system information from anywhere in the system. Devang’s project allows users to request these counters in a CSV format. Additionally, he has enabled the ability to get timestamps with each value as well. These features will make it easier for HPX users to perform analysis on the performance data gathered from an application.
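
As a rough picture of what CSV-formatted counter output with optional timestamps could look like, here is a small Python sketch. The column layout, the function `counters_to_csv` and the counter names are assumptions made for illustration; they are not HPX's actual output format or counter names.

```python
import csv
import io


def counters_to_csv(samples, with_timestamps=False):
    """Render (name, timestamp, value) samples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if with_timestamps:
        writer.writerow(["counter", "timestamp", "value"])
        for name, timestamp, value in samples:
            writer.writerow([name, timestamp, value])
    else:
        writer.writerow(["counter", "value"])
        for name, _timestamp, value in samples:
            writer.writerow([name, value])
    return buffer.getvalue()


# Hypothetical counter names and readings.
samples = [
    ("example/idle-rate", 0.5, 12),
    ("example/queue-length", 1.0, 3),
]

plain_csv = counters_to_csv(samples)
timestamped_csv = counters_to_csv(samples, with_timestamps=True)
```

Output in this shape can be loaded directly into a spreadsheet or a data-analysis tool, which is the benefit the project is after.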

Integrate a C++AMP Kernel with HPX: Marcin Copik

The HPX runtime system can coordinate the execution and synchronization of OpenCL kernels on arbitrary OpenCL devices, such as GPUs, in a system. In his GSoC project, Marcin used a C++ AMP compiler to produce an OpenCL kernel from a parallel algorithm implemented by HPX. Marcin integrated the Kalmar AMP compiler into the HPX build system, transformed a parallel for_each algorithm into an OpenCL kernel, dispatched the kernel to a GPU and synchronized the result with a concurrently running HPX application.

A Flexible IO Infrastructure for LibGeoDecomp: Konstantin Kronfeldner

In LibGeoDecomp, users are able to read from and write to arbitrary regions of the simulation space. These operations are carried out by objects which we call Steerers and Writers. Over the summer, Konstantin added the ability for these Steerers and Writers to be dynamically created and destroyed. LibGeoDecomp is typically used on supercomputers, where jobs are executed non-interactively via a batch system. Konstantin's extensions enable users to interact with the application at runtime. They can view and modify the simulation model dynamically. The benefit of this is a significantly lower turnaround time for domain scientists who need to carry out many computational experiments.

By Adrian Serio, Scientific Program Coordinator, STE||AR Group

Google Summer of Code wrap-up: Cesium

Wednesday, October 28, 2015

Today we are featuring Cesium, a three-time Google Summer of Code participant. Read more below about the fascinating work their students did with imagery this past summer.

Cesium is a JavaScript library for creating 3D globes and 2D maps in a web browser without a plugin. It uses WebGL for hardware-accelerated graphics, and is cross-platform, cross-browser and tuned for dynamic-data visualization. Cesium first participated in Google Summer of Code (GSoC) in 2013. The bright students who have joined us through GSoC have made significant contributions to the Cesium community. This summer, our students worked on the following projects:

NASA Worldview - Abhishek Potnis

Bringing Cesium to NASA imagery, Abhishek improved visualization of LIDAR profile data from NASA’s Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO). CALIPSO data is used to study atmospheric developments such as cloud formation and aerosol interactions, and can be extended to develop models for climate predictions among other possibilities. Using Cesium, Abhishek developed an interface that takes a user’s location input and displays the profile curtains nearest to that location. The user can select the time range for the data curtains, and the profile curtains near the location of interest will be refreshed accordingly. Perhaps most importantly, the interface shows the CALIPSO profiles in their natural “curtain” orientation in the context of traditional “flat” maps served by NASA’s Global Imagery Browse Services. This allows users to effectively combine the strengths of both types of maps for tasks such as determining the three-dimensional structures of clouds and dust storms. See the live demo and source code.
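
One plausible way to pick the profile curtains nearest to a user's location is to rank them by great-circle distance. The Python sketch below uses the haversine formula; it is an assumption about how such a lookup might work, not Abhishek's implementation, and the curtain records, `haversine_km` and `nearest_curtains` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_curtains(user_lat, user_lon, curtains, limit=1):
    """Return the `limit` curtains whose ground track passes closest."""

    def distance(curtain):
        return min(haversine_km(user_lat, user_lon, lat, lon)
                   for lat, lon in curtain["track"])

    return sorted(curtains, key=distance)[:limit]


# Hypothetical curtains, each with a short ground track of (lat, lon) points.
curtains = [
    {"id": "curtain-a", "track": [(10.0, -5.0), (12.0, -5.5)]},
    {"id": "curtain-b", "track": [(40.0, 20.0), (42.0, 19.5)]},
]

nearest = nearest_curtains(11.0, -5.0, curtains)
```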

By combining “flat maps” with “data curtains,” a user can visualize the height of clouds and dust storms above the Earth’s surface. In this case, cloud heights above western Africa are shown as red and yellow blobs in the generally blue LIDAR curtain. The CALIPSO satellite recorded the LIDAR curtain at approximately the same time as the Aqua satellite recorded the “flat map” below it.

Cesium Support for GPX and Shapefiles - André Nunes

André joined us in 2013 as part of GSoC and worked on client-side support for KML, allowing users to easily visualize the many geographic data sets widely available in KML files. This year, he returned to tackle native Cesium support for GPS Exchange Format (GPX). GPX support will let anyone with a cell phone or other GPS device easily transfer their own outdoor activities (such as bike rides, running, boating, and even drone flights) into Cesium. Check out his GitHub pull request for the full technical details. Since he will be graduating from Técnico Lisboa in January, this is André’s final GSoC, but we hope he continues to contribute to Cesium as he embarks on his professional career!
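
For readers unfamiliar with GPX, the sketch below reads track points from a small GPX 1.1 document using Python's standard library. It only illustrates the file format; it is not the Cesium implementation (which is JavaScript), and the sample track and the function `read_track_points` are made up for this example.

```python
import xml.etree.ElementTree as ET

# GPX 1.1 elements live in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

SAMPLE_GPX = """\
<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Morning ride</name>
    <trkseg>
      <trkpt lat="38.7369" lon="-9.1427"><ele>95.0</ele></trkpt>
      <trkpt lat="38.7372" lon="-9.1431"><ele>96.5</ele></trkpt>
      <trkpt lat="38.7380" lon="-9.1440"/>
    </trkseg>
  </trk>
</gpx>
"""


def read_track_points(gpx_text):
    """Return latitude, longitude and optional elevation for each track point."""
    root = ET.fromstring(gpx_text)
    points = []
    for trkpt in root.findall(".//gpx:trkpt", GPX_NS):
        ele = trkpt.find("gpx:ele", GPX_NS)
        points.append({
            "lat": float(trkpt.get("lat")),
            "lon": float(trkpt.get("lon")),
            "ele": float(ele.text) if ele is not None else None,
        })
    return points


track_points = read_track_points(SAMPLE_GPX)
```

A viewer would then turn the ordered points into a polyline or an animated path on the globe.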

Cesium Support for GML SFP - Ayush Khandelwal

Geography Markup Language (GML) Simple Features Profile (SFP) is a common way of representing geospatial vector features such as points, lines, and polygons, plus accompanying metadata. In addition to being useful in its own right for spatial data visualization, GML SFP is commonly used to encode features retrieved from an OGC standard Web Feature Service (WFS) and to represent the result of a GetFeatureInfo call to a Web Map Service (WMS). This summer Ayush implemented support in Cesium for GML SFP.
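
To show what a GML feature with geometry and metadata looks like, here is a minimal Python sketch that extracts point features from a small document. The sample uses the `http://www.opengis.net/gml` namespace; the application namespace `http://example.com/ns`, the `City` feature type and the function `read_point_features` are hypothetical, and this is not the Cesium implementation.

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

SAMPLE_GML = """\
<ex:FeatureCollection xmlns:ex="http://example.com/ns"
                      xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ex:City gml:id="city.1">
      <ex:name>Lisbon</ex:name>
      <ex:location>
        <gml:Point srsName="urn:ogc:def:crs:EPSG::4326">
          <gml:pos>38.72 -9.14</gml:pos>
        </gml:Point>
      </ex:location>
    </ex:City>
  </gml:featureMember>
</ex:FeatureCollection>
"""


def read_point_features(gml_text):
    """Collect the id, point position and text properties of each feature."""
    root = ET.fromstring(gml_text)
    features = []
    for member in root.iter("{%s}featureMember" % GML):
        for feature in member:
            properties = {}
            position = None
            for prop in feature:
                # Strip the namespace from the property name.
                name = prop.tag.split("}")[-1]
                pos = prop.find("{%s}Point/{%s}pos" % (GML, GML))
                if pos is not None:
                    # Coordinates are kept in document order.
                    position = tuple(float(v) for v in pos.text.split())
                else:
                    properties[name] = (prop.text or "").strip()
            features.append({
                "id": feature.get("{%s}id" % GML),
                "position": position,
                "properties": properties,
            })
    return features


features = read_point_features(SAMPLE_GML)
```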

By Sarah Chow, Cesium Organization Administrator

Google Summer of Code wrap-up: Portland State University

Friday, October 16, 2015

We’ve enjoyed featuring new organizations to Google Summer of Code over the past six weeks. But it’s now time to turn our attention to our veterans. Portland State University is one of our longest running participants – 2015 marked their 11th year in the program!

Portland State University, in our eleventh year with Google Summer of Code (GSoC), is a somewhat unusual organization. Portland State is obviously not an open source project; neither are we focused on a particular kind of software or service. Instead, we concentrate on two attributes of projects: individual focus and academic relevance. Portland State serves as a home for projects that might otherwise not find an organization within GSoC, either because they require especially theory-based or academic mentoring or because there is no GSoC organization that fits. Even more than most organizations, we insist that students do a self-contained piece of technical work. Many of our projects are "from scratch".

As with every year of GSoC, the results for Portland State this year are fantastic. We supervised projects covering a wide range of activities. Below are snippets about each of our student projects. You can view the full project descriptions and outcomes on our blog.

Alexia Ingerson: Efficient Parallelized Bitmap Compression

Alexia implemented a parallelized compression algorithm in C in order to run some CPU and cache analyses. She and her mentor originally hypothesized that the decrease in speedup was caused by an increase in cache misses, but analysis showed that it was in fact the I/O that could be significantly improved.

Hisham Benotman: Multiple Diagram Navigation Drupal Module

Multiple Diagram Navigation (MDN) is a Drupal module that allows website authors to incorporate diagrams, maps, infographics and other visual structures in their sites. The diagrams are supplied in SVG format. Using MDN, website authors can both connect shapes in a diagram to related shapes in other diagrams and to related Drupal nodes. Based on these connections, users can browse the website content using multiple diagrams which provide multiple points of view for the content.

Jon Barnes: Web Application for Geologic Thin Section Mapping and Mineral Analysis

Jon worked on Python-based code to identify minerals in geologic rock thin sections. The focus of this project was a website that lets geologists analyze and share data about thin sections. He spent the first half of the summer designing and building the website, and the second half getting the Django code for the site to work.

Josh Leverette: High-Precision Open Source Indoor Tracking System

Josh took on the task of getting a COTS 9-DoF Inertial Measurement Unit, the STM LSM9DS0, working with Linux and implementing sensor fusion with the goal of building a portable embedded system that could track a person through a building. His intended target was emergency First Responders, but the system has a variety of uses.

Karthik Senthil: A Tool To Build Definitional Trees

Karthik built an open source tool useful in the compilation and execution of Functional Logic Programming languages; this tool is related to research conducted with his mentor. The tool was completed successfully and fielded in demonstration projects.

Maxim Grishin: Commercial-Quality Sound In MuseScore

Maxim took on the task of improving MuseScore, a high-quality open source music composition tool that was not part of GSoC this year. In particular, the MIDI and audio generation needed some help, especially after the release of MuseScore 2.0 which made fundamental architectural changes.

Melissa Fabros: WebLogo: Making Sequence Logos Easy and Painless

A "Sequence Logo" is a graphical representation of RNA, DNA, or protein multiple sequence alignments. Melissa worked to rewrite portions of a web tool, WebLogo, for managing Sequence Logos. She modified WebLogo's front-end to enable the web application’s use on mobile computing devices and to incorporate dynamic web features, modernized the HTML and CSS to meet Responsive Design standards, and added substantial capabilities around Sequence Logo upload, download and sharing.

Michael Kennedy: A Mobile Application Privacy Testing Tool

The aim of Michael's project was to develop a network privacy testing tool for Android applications. This tool detects the disclosure of personal information and specifically two disclosure issues: the "leakage" of personal information through unencrypted network traffic, and inappropriate disclosure of personal information to third-party providers such as advertisers (in encrypted and unencrypted traffic).
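
The two disclosure issues can be expressed as simple checks over captured requests. The Python sketch below is a simplified illustration under stated assumptions, not Michael's tool: it assumes request bodies are already available as plain text, and the record fields, host names and `find_disclosures` are hypothetical.

```python
def find_disclosures(requests, known_values, first_party_hosts):
    """Flag requests whose body contains a known personal value."""
    findings = []
    for request in requests:
        leaked = [label for label, value in known_values.items()
                  if value in request["body"]]
        if not leaked:
            continue
        # Issue 1: personal information sent over unencrypted traffic.
        if not request["encrypted"]:
            findings.append((request["host"], "unencrypted", leaked))
        # Issue 2: personal information sent to a third party,
        # whether or not the traffic is encrypted.
        if request["host"] not in first_party_hosts:
            findings.append((request["host"], "third-party", leaked))
    return findings


# Hypothetical test identity and captured traffic.
known_values = {"email": "alice@example.com", "phone": "5550100"}
first_party_hosts = {"api.myapp.example"}
captured = [
    {"host": "api.myapp.example", "encrypted": False,
     "body": "user=alice@example.com"},
    {"host": "ads.tracker.example", "encrypted": True,
     "body": "id=5550100&mail=alice@example.com"},
    {"host": "api.myapp.example", "encrypted": True, "body": "ping=1"},
]

findings = find_disclosures(captured, known_values, first_party_hosts)
```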

Nalin: XBoard Enhancement and Accessibility

XBoard is an open source cross-platform C program that is one of the oldest and most-used interfaces to Chess engines. Nalin took on the task of fixing some user interface issues in XBoard with the specific intention of improving accessibility for users with disabilities, under the mentorship of two of the XBoard authors.

Tim Cooper: Adding gRPC support To The Mumble VoIP server

Tim worked on the server-side code of the Mumble project, an open-source VoIP system that is primarily used in the online gaming community. He added support for Google's new gRPC library, one that allows developers to remotely invoke functions on a server. These changes allow Mumble server owners to write code in several different languages to interact with and change how their Mumble servers operate.

Vaibhav Sharma: Face Detection and Recognition In Videos

Vaibhav took on the task of recognizing actors' faces in videos using machine learning. Building the infrastructure alone was a major effort, and there were countless challenges in algorithms and techniques. One major hurdle was building a good corpus for training and evaluation.

Overall this was one of the best years ever for Portland State. I was really impressed with the students and with the work that they produced. I learned some valuable lessons that will be applied to the program if we are accepted next year, and as always really enjoyed the process.

Huge thanks to all the mentors and students who made this year so successful.

Bart Massey and Team, Portland State University

Dozen of one, half dozen of the other: the 6th Google Code-in and 12th Google Summer of Code are on!

Tuesday, October 13, 2015

Since 2005, our Open Source Programs Office has enabled 11,000+ students, ranging in age from 13 to 56, to explore open source software development. They’ve worked hands-on with over 515 projects across a variety of disciplines.

If you’re a student looking to learn new coding skills that can help make a difference, check out our upcoming programs: Google Code-in for students 13-17 and Google Summer of Code for university students.

Google Code-in - Program starts for students December 7, 2015

For the sixth year in a row, Google Code-in will give 13-17 year old pre-university students an opportunity to dive in and explore the world of open source. Students with many different skills -- coders and non-coders alike -- will find opportunities to learn by doing and earn prizes. It’s easy to get started: just choose an interesting task from our participating organizations’ lists and complete it under the guidance of a mentor.

Google Code-in is for students asking questions like:

What is open source?
What kinds of stuff do open source projects do?
How can I write real code when all I’ve done is a little classroom work?
Can I contribute even if I’m not really a coder?

With tasks in five different categories, there’s something to fit almost any student’s skills:

Code: writing or refactoring
Documentation/Training: creating/editing documents and helping others learn more
Outreach/research: community management, outreach/marketing, or studying problems and recommending solutions
Quality Assurance: testing and ensuring code is of high quality
User Interface: user experience research or user interface design and interaction

GCI 2014 Grand Prize Winners on the Google Campus

Over 2,200 students from 87 countries have taken part in Google Code-in, and we’re excited to welcome many more into this year’s edition. We’ll be announcing this year’s participating organizations on November 13th, so stay tuned.

Google Summer of Code - Student applications open on March 14, 2016

GSoC logos from the last 10 years

Google Summer of Code (GSoC) is an innovative program dedicated to introducing students from universities around the world to open source software development. The program offers student developers stipends to write code for a wide variety of carefully selected open source projects while under the guidance of mentors. Our goal is to help these students pursue academic challenges over the summer break while they create and release open source code for the benefit of all. Over the past 11 years, over 8,300 mentors and 8,500 student developers in 101 countries have produced a stunning 55 million lines of code.

500+ GSoC Students and Mentors

We’re proud to continue this tradition for another year: we’ll be welcoming another batch of students into Google Summer of Code 2016. We’ll be accepting applications from open source organizations in February and student applications from March 14 - 25, 2016 so it’s not too early to start thinking about proposals.

Spread the word to your friends and stay tuned for more details coming soon!

By Stephanie Taylor and Carol Smith, Open Source Programs Office

Google Summer of Code wrap-up: HPCC Systems

Friday, October 9, 2015

Our wrap-up post this Friday features HPCC Systems, another organization new to Google Summer of Code 2015. HPCC aims to solve big problems around big data. Read below to learn more.

HPCC Systems was designed to solve “big data” problems. It can process, analyze and find links and associations in high volumes of complex data at high speed and with incredible accuracy. While it was originally created by LexisNexis and is still used in-house, the HPCC Systems Project went open source four years ago. Free downloads of the software, documentation and training materials are available from our website.

This is the first time we have participated in Google Summer of Code (GSoC) and it has been a great success. As a first-time organization, we were allocated two student slots. It was quite hard to choose which proposals to accept because there were many high quality contenders. We selected two projects that highlight areas of specific interest not just for us but for our community and the world of big data.

Add Statistics to the Linear and Logistic Regression Modules - Sarthak Jain

Machine learning statistics are important to the big data world, providing a way to drill down into data using complex queries and produce meaningful results to help businesses maintain their competitive edge in the market place. The HPCC Systems Machine Learning Library has been around for a while now and we are always looking for ways to improve it. The new statistics added as part of this project give vastly improved results about the models created.

Slide taken from Sarthak's presentation describing some of the tasks completed

The statistics Sarthak added provide metrics which indicate the “goodness” of the model created. He completed the tasks associated with these statistics in very good time and also added three stepwise functions to the same modules which find the best model by adding or taking away independent variables. A goodness metric was also added to these features to select which independent variables are added to or taken away from the model. The three functions he added were forward, backward and bidirectional.
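
The forward variant of stepwise selection can be sketched generically: start with no variables and repeatedly add the one that most improves a goodness metric, stopping when nothing helps. The Python below is a language-neutral illustration, not the ECL code in the HPCC Systems library; `forward_stepwise`, the toy `goodness` function and the variable names are hypothetical.

```python
def forward_stepwise(candidates, goodness, min_gain=1e-9):
    """Greedily add the variable that most improves `goodness`."""
    selected = []
    best = goodness(selected)
    remaining = list(candidates)
    while remaining:
        scored = [(goodness(selected + [var]), var) for var in remaining]
        top_score, top_var = max(scored)
        if top_score - best <= min_gain:
            break  # no remaining variable improves the model
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected, best


# Toy goodness metric: each variable contributes a fixed amount of
# explanatory power, minus a small penalty per variable in the model.
contribution = {"age": 0.5, "income": 0.3, "noise": 0.0}


def goodness(variables):
    return sum(contribution[v] for v in variables) - 0.01 * len(variables)


selected, score = forward_stepwise(contribution, goodness)
```

Backward selection runs the same loop in reverse, starting from all variables and removing one at a time, and the bidirectional variant considers both moves at each step.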

Expand the HPCC Systems Visualization Framework (Web-Based) - Anmol Jagetia

Currently the HPCC Systems Platform has very little support for visual analytics. While there are plenty of “off the shelf” visual analytic tools and dashboard creators, none are really suitable for big data because they typically work with local datasets (think charting with a spreadsheet). The HPCC Systems Visualization Framework aims to solve the issue by bringing together existing “best of breed” visualizations as well as bespoke HPCC Systems visualizations into a consistent framework.

Anmol’s project involved adding unit tests and linting as well as adding new visualization widgets and enhancing existing ones. He used his knowledge and experience to enhance our build quality infrastructure and has also added a range of new features to the existing framework including the addition of a time lapse capability and a number of features which enable bar charts to be used as Gantt charts. The work he has done, which is already being used, significantly improves the user experience.

Below is an illustration of the work Anmol did to add range support in a column chart where there is both an upper and lower bound.

We’ve really enjoyed participating in GSoC this year and we will definitely apply to be accepted again next year. Our thanks go to the students for contributing to our project. We hope they enjoyed working with us.

By Lorraine Chapman, HPCC Systems Release Manager and GSoC Org Admin

Google Summer of Code wrap-up: Red Hen Lab

Friday, October 2, 2015

For our Google Summer of Code wrap-up this week we have The Distributed Little Red Hen Lab. A new organization for 2015, Red Hen Lab had three student projects. Read on to learn about the Lab and their effort to scan a huge repository of international television news programming.

The Distributed Little Red Hen Lab is an international consortium for research on multimodal communication. We develop open source tools for joint parsing of text, audio/speech and video, using datasets of various sorts, most centrally a very large dataset of international television news called the UCLA Library Broadcast NewsScape. Red Hen uses 100% open source software. In fact, not just the software but everything else—including recording nodes—is shared in the consortium.

The Red Hen archive is a huge repository of recordings of TV programming, processed in a range of ways to produce derived products useful for research, expanded daily, and supplemented by various sets of other recordings. Our challenge is to create tools that allow us to access audio, visual, and textual (closed-captioning) information in the corpus in various ways by creating abilities to search, parse and analyze the video files. However, the archive is very large, so creating processes that can scan the entire dataset is time consuming, and the results often carry a margin of error.

Our projects for Google Summer of Code 2015 (GSoC) challenged students to assist in a number of projects, including some that have successfully improved our ability to search, parse and extract information from the archive.

Ekaterina Ageeva - Multiword Expression Search and Tagging

Ekaterina built a multiword expressions toolkit (MWEtoolkit), which is a tool for detecting multi-word units (e.g. phrasal verbs or idiomatic expressions) in large corpora. The toolkit operates via command-line interface. To ease access and expand the toolkit's audience, Ekaterina developed a web-based interface, which builds on and extends the toolkit functionality.

The interface allows us to do the following:

Upload, manage, and share corpora
Create XML patterns which define constraints on multiword expressions
Search the corpora using the patterns
Filter search results by occurrence and frequency measures
Tag the corpora with obtained search results

The interface is built with Python/Django. It currently supports operations on corpora tagged with the Stanford CoreNLP parser, with the possibility of extending to other formats supported by MWEtoolkit. The system uses part of speech and syntactic dependency information to find the expressions. Users may rely on various frequency metrics to obtain the most relevant search results.
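
The search-then-filter workflow can be illustrated with a small Python sketch. It is not MWEtoolkit code and does not use its XML pattern format: here a pattern is simply a list of per-token constraints, the Penn Treebank style tags on the sample sentence are hand-assigned, and `find_expressions` is a hypothetical helper.

```python
from collections import Counter


def matches(token, constraint):
    """A token satisfies a constraint if every listed attribute is equal."""
    return all(token.get(key) == value for key, value in constraint.items())


def find_expressions(tokens, pattern):
    """Return lemma tuples for each contiguous window matching the pattern."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(matches(t, c) for t, c in zip(window, pattern)):
            hits.append(tuple(t["lemma"] for t in window))
    return hits


# A tiny hand-tagged corpus: (word, lemma, part of speech).
tagged = [
    ("She", "she", "PRP"), ("gave", "give", "VBD"), ("up", "up", "RP"),
    (".", ".", "."), ("He", "he", "PRP"), ("looked", "look", "VBD"),
    ("up", "up", "RP"), ("the", "the", "DT"), ("word", "word", "NN"),
    (",", ",", ","), ("then", "then", "RB"), ("gave", "give", "VBD"),
    ("up", "up", "RP"), (".", ".", "."),
]
tokens = [{"word": w, "lemma": l, "pos": p} for w, l, p in tagged]

# Pattern for a phrasal verb: a past-tense verb followed by a particle.
pattern = [{"pos": "VBD"}, {"pos": "RP"}]

hits = find_expressions(tokens, pattern)
counts = Counter(hits)
# Frequency filter: keep expressions seen at least twice.
frequent = [expr for expr, n in counts.items() if n >= 2]
```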

Owen He - Automatic Speaker Recognition System

Owen used a reservoir computing method called conceptors together with traditional Gaussian Mixture Models (GMM) to distinguish between the voices of different speakers. He also used a method proposed by Microsoft Research last year at the Interspeech Conference, which used a Deep Neural Network (DNN) and an Extreme Learning Machine (ELM) to recognize speech emotions. The DNN was trained to extract segment-level (256 ms) features and the ELM was trained to make decisions based on the statistics of these features at the utterance level.

Owen’s project focused on applying this to detect male and female speakers, specific speakers, and emotions by collecting training samples from different speakers and audio signals with different emotional features. He then preprocessed the audio signals and created the statistical models from the training dataset. Finally, he computed the combined evidence in real time and tuned the apertures for the conceptors so that the optimal classification performance could be reached. You can check out the summary of results on GitHub.

Vasanth Kalingeri - Commercial detection system

Vasanth built a system for detecting commercials in television programs from any country and in any language. The system detects the location and the content of ads in any stream of video, regardless of the content being broadcast and other transmission noise in the video. In tests, the system achieved 100% detection of commercials. An online interface was built along with the system to allow regular inspection and maintenance.

Initially, the user supplies a set of hand-tagged commercials. The system detects this set of commercials in the TV segment. On detecting these commercials, it divides the entire broadcast into blocks. Each of these blocks can be viewed and tagged as a commercial by the user. There is a set of 60 hand-labelled commercials to work with. This process takes about 10-30 minutes for a one-hour TV segment, depending on the number of commercials that have to be tagged.
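
The step of dividing a broadcast into blocks around the detected commercials can be pictured with the short Python sketch below. It is an assumption about the general shape of that step, not Vasanth's code; times are in seconds and `split_into_blocks` is a hypothetical helper.

```python
def split_into_blocks(duration, commercials):
    """Split [0, duration) into commercial and unlabelled blocks.

    `commercials` is a list of (start, end) times of detected commercials.
    """
    blocks = []
    cursor = 0
    for start, end in sorted(commercials):
        if start > cursor:
            # Gap before this commercial: left for the user to review.
            blocks.append((cursor, start, "unlabelled"))
        if end > cursor:
            blocks.append((max(start, cursor), end, "commercial"))
            cursor = end
    if cursor < duration:
        blocks.append((cursor, duration, "unlabelled"))
    return blocks


# A one-hour segment with two detected commercials.
blocks = split_into_blocks(3600, [(1200, 1260), (600, 630)])
```

Each unlabelled block is then a candidate the user can inspect and, if appropriate, tag as a new commercial.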

When the database has an appreciable number of commercials (usually around 30 per channel), we can use it to recognize commercials in any unknown TV segment. When changes are made in the web interface, the system updates its database with the new or edited commercials. This web interface can be used for viewing the detected commercials as well. For more information see Vasanth’s summary of results.

By Patricia Wayne, UCLA Communication Studies