Posts from 2009

Merry Music: MusicBrainz's Latest Summit and 10th Anniversary

Tuesday, December 22, 2009

The yearly MusicBrainz summit serves an important function in building our community: we talk about issues facing MusicBrainz and we plan the road map for MusicBrainz projects. The summits are usually scheduled to allow as many people to attend as possible and this year we chose Nürnberg, Germany as our location. MusicBrainz contributor Nikolai "Pronik" Prokoschenko lives in Nürnberg; he was our local contact and ended up planning most of the summit.

Pronik found us a conference room that we rented for the entire day, complete with open WiFi, which is important if you plan to have a room full of geeks. He also found us a cheap Gasthof that provided lodgings slightly better than a hostel for a mere 20€ per person per night, a really good deal for Europe. The evening before the summit we all sat in the Gasthof and were treated to some confusing German/Greek cuisine with some of the rudest service any of us had ever encountered. But our group is used to dealing with the crude Internet public, so we managed to laugh off the horrible service and still have a great time.

To our luck there was a grocery store right next door to our Gasthof and we commenced another successful crowd sourced breakfast. Four people were each given 20€ with the instructions to buy food/drinks that they would like to eat/drink for breakfast/lunch. No collusion was allowed between people! Once the shopping was complete we walked to the conference room, settled in and dove into the masses of food we'd collected. Many tasty bread rolls with jam, nutella, cold cuts and cheese were consumed. Of course we had fun things like a case of Bionade, juices, tea, gummy bears and chocolate. Crowd sourcing breakfast takes a potentially frustrating chore and makes it fun for everyone.

Plus, Pronik and his mate Kira brought a MusicBrainz decorated cake to celebrate 10 years of MusicBrainz!

As people were eating, we started to collect an unconference-like agenda of what people wanted to talk about. We decided to have a detailed state of the project talk including recent developments from meeting our customers in Europe. We also talked about current development processes and some of the problems associated with these processes. Oliver Charles, a 2008 Google Summer of Code™ student, gave an introduction on how to hack on the MusicBrainz server, based on his work from the last year.

Most of the time was spent discussing new features to build once we release our much-anticipated Next Generation Schema. At times we managed to get into deep philosophical discussions about what MusicBrainz is and what it should be. At other times we discussed lighthearted topics with lots of joking. These summits do wonders for building our community and getting people on the same page. We manage to explore many topics and reach consensus on many points in one day instead of spending weeks on the same discussions online.

Finally, in the evening we cleaned up our space and retired to a local beer hall where we continued the discussion in a less formal manner. If you're interested, we posted all the session notes from the summit on our wiki. All in all, this event was fun and not much effort to put on — thanks to Pronik! On another happy note, 1/3 of the people in attendance were women, which is much better than most tech summits I've attended.

In total we spent about $1500, including all the food, drinks, lodgings and one person's travel costs. For a summit with 12 people, I think we did rather well! I call that Google's support well spent — thanks again for supporting MusicBrainz, Google!

London Open Source Jam 15

Friday, December 18, 2009

On the 3rd of December we held the latest (and greatest) Google London Open Source Jam at our offices near Victoria. The Jam is a way to get like-minded Open Source contributors and users together and give them a chance to give a 5 minute talk on something dear to their hearts, all the while availing themselves of free beer and pizza!

This time's topic was the somewhat catchall: "the Web." Like always, the topic is more of a guide than a rule, so we had some pretty diverse talks.

Our very own Jon Skeet set the evening off to a good start by telling us all about Noda Time — a new Open Source library for handling dates and times in .NET, based on the Joda Time library for Java.

Simon Phillips on Google Wave

Simon Phillips is a consultant to the film business and gave a great presentation on how he uses Google Wave to help him work closely with directors, script writers, set designers and the like. He showed some great ideas for using Wave in this way and was canvassing for help in developing Open Source Wave robots to help this process.

Simon Stewart gave a rallying cry for making the web more accessible to the blind and deaf, especially in this modern era of HTML canvas and video tags. By ensuring your sites are accessible, you open them up to more users, and as a useful side effect you also make them more testable.

HTTP has started to show its age, and maybe it's time for a leaner, meaner protocol to come along. I took a brief break from my hosting duties to present a summary of SPDY, a project to develop a replacement protocol which will deliver data to our browsers faster.

Glyn Wintle Gets Comfortable

If you run a web site, you may have come to fear the "Slashdot effect" where you are linked from a popular website and get a spike of traffic. Glyn Wintle from the Open Rights Group (ORG) informed us that this is nothing compared to having a bunch of knitting forums link to you! His was a tale of Open Sourcing of knitting patterns and DMCA take-down notices. He also brought us up to speed on the latest from the ORG.

Sam Mbale gave us an update on his work bringing open source to Africa and told us all about BarCamp Lusaka which he'll be attending. We look forward to hearing how it went at another Jam.

Robert Rees gave us an experience report on using Velocity templates to divide responsibilities between engineers and web designers. It seems to work pretty well; contracts are enforced by unit tests, and designers know exactly what primitives they can use when laying out web pages.

Matt Savage on RESTful Acceptance Tests

Finally, Matt Savage talked about his ideas for RESTful acceptance tests, and Steven Goodwin gave us an update on his project to build a "Wallace and Gromit" house.

You can find more pictures of the event on Picasa Web Albums. To find out more about the Google London Open Source Jam, visit If you'd like to receive regular updates about future jams, sign up for our mailing list. We hope to see you at future jams!

Rocking the Grid: The Globus Alliance's Second Google Summer of Code

Tuesday, December 15, 2009

The Globus Alliance is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. We first participated in Google Summer of Code™ in 2008 and we found the experience extremely productive both for the Globus Alliance and the individual mentors, so we wanted to confirm the value of the program for the students who took part. We contacted our eight students from last year to find out what impact Google Summer of Code had on their lives and careers. While many of our students still remembered the experience fondly, and said it was valued highly by prospective employers, there were two students who had particularly remarkable stories.

AliEn Grid Site Dynamic Deployment and Working at CERN

Last year, Artem Harutyunyan, mentored by Tim Freeman, developed a set of scripts on top of Globus Nimbus to dynamically deploy an entire AliEn Grid site (AliEn is the Grid infrastructure which is used by scientists participating in the ALICE experiment at CERN). His collaboration with the CERN and Globus Nimbus folks went beyond his Google Summer of Code work, and resulted in a new framework, called CernVM Co-Pilot, for execution of 'pilot' Grid jobs on cloud resources. His work is currently used in production to run Grid jobs from CERN's ALICE experiment, and there are plans to extend it for the execution of ATLAS and LHCb jobs. Artem also co-authored two papers on his work: "Dynamic AliEn Grid Sites on Nimbus with CernVM" was presented at the 17th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2009) in Prague, and "Building a Volunteer Cloud", which includes a description of CernVM Co-Pilot, was presented during the Latin American Conference on High Performance Computing in Mérida, Venezuela.

Holder-of-Key Single Sign-On

Joana M. F. Trindade, mentored by Tom Scavo, spent last summer implementing a Holder-of-Key Single Sign-On profile handler for the Shibboleth Identity Provider in Globus GridShib. And, since then, things have just been getting better for her. Thanks to her outstanding summer work, she was offered an appointment as a Visiting Scholar at UIUC, where she worked on researching fault injection in virtual machines with Professor Ravi Iyer. After six months in that position, Joana was offered admission into the master's program at UIUC, where she is currently working with Professor Marianne Winslett. More importantly, Joana tells us that participating in Google Summer of Code gave her a renewed sense of confidence in her research abilities, having previously thought that her academic background was insufficient to gain admission into a top-tier university in the US. Joana tells us that "After Google Summer of Code, I regained that hope, and I must say I'm really happy to have found a topic in Globus to which I could contribute, and that in turn opened so many doors for me."

Congratulations to Artem and Joana on all you have achieved!

Lessons Learned

Our first Google Summer of Code last year also had its fair share of challenges, including two students who didn't make it through the program, but it gave us the opportunity to learn a lot about how to mentor and manage summer students. We were fortunate to be selected again this year as a Google Summer of Code mentoring organization, which allowed us to apply everything we learned. First of all, we required students to provide more information about their background and the project they were proposing. Last year our student application form was essentially a blank form saying "Tell us about your project here," so this year we presented prospective students with more specific questions. We also decided to check in with our students more often which, at least in one case, allowed us to identify a problem between a student and a mentor early on, giving us time to deal with it constructively before the midterm.

In the end, applying what we learned during last year's Google Summer of Code, as well as at the Mentor Summit, had a noticeable effect. We were fortunate to be given ten students to mentor, and all ten students passed. Furthermore, our mentors report that practically all the code written by the students has either already been released or will be released soon. In fact, overall, we felt that this year's students rocked. Here's a summary of their summer work.

Going Beyond a Single Cluster

The Globus Nimbus cloud toolkit allows you to turn your cluster into an Infrastructure-as-a-Service (IaaS) cloud. However, it was mainly geared towards managing a single cluster. Not any more! Adam Bishop, mentored by Ian Gable, worked hard over the summer to add new components enabling multiple cluster support for Nimbus. He developed a series of production-quality plugins, which have already been committed to the Nimbus source repository, that publish the state of a Nimbus cluster back to a Globus MDS Registry. This allows the availability of cloud resources across multiple Nimbus clusters to be gathered together into a single registry, which is the first step towards adding cross-cluster support to Nimbus.

Spilling Over Multiple Clusters

Another student, Jan-Philip Gehrcke, mentored by Kate Keahey, also spent the summer with his head in the clouds, but in a good way: he developed the Clobi project, a job scheduling system supporting virtual machines (VMs) in multiple IaaS clouds, with support for Globus Nimbus and Amazon EC2 clouds. In a nutshell, there are many scientific applications that are typically run as "jobs" on a compute cluster. Jan-Philip's project allows these jobs to be submitted to a cloud instead of to a traditional compute cluster. The most interesting use case is when a site operates a Globus Nimbus cloud and, during peaks in demand for computational capacity, extends its capacity momentarily by spilling the jobs over to a second (or third, or fourth, ...) cloud such as Amazon EC2. Although Clobi is not tied to any particular application (its design is generic and should be useful whenever it’s convenient to distribute jobs across different clouds), the motivating application for Clobi is ATLAS Computing (for the LHC's ATLAS experiment at CERN). In fact, by the end of the summer, Jan-Philip was able to run a common ATLAS Computing application (the so-called “full chain”) successfully with Clobi. If you want more details about Clobi, check out this blog post written by Jan-Philip.
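To make the spill-over idea concrete, here is a minimal Python sketch. It is not Clobi's code or API: the function name, the cloud names, and the notion of "free slots" are all assumptions chosen for illustration. Jobs fill the first (preferred) cloud and overflow into the next ones in order.

```python
def assign_jobs(jobs, clouds):
    """Assign jobs to clouds in priority order, spilling over when one is full.

    `clouds` is an ordered list of (name, free_slots) pairs; both the names
    and the slot counts are hypothetical values supplied by the caller.
    Returns (placements, unplaced_jobs).
    """
    placements = {name: [] for name, _ in clouds}
    remaining = list(jobs)
    for name, free_slots in clouds:
        # Take as many jobs as this cloud can hold; the rest spill over.
        take, remaining = remaining[:free_slots], remaining[free_slots:]
        placements[name].extend(take)
    return placements, remaining


placements, unplaced = assign_jobs(
    ["job1", "job2", "job3", "job4", "job5"],
    [("local-nimbus", 2), ("ec2", 2)],
)
```

In this toy run the local cloud takes the first two jobs, the second cloud takes the next two, and the fifth job stays queued until capacity frees up. A real scheduler would also have to start and stop virtual machines and track job state.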

Incremental GridFTP Transfers

Enough about clouds, let's move on to the exciting topic of data. Globus GridFTP is a high-performance, secure, reliable data transfer protocol that is pretty good at moving data. Fast. Of course, there's always someone who wants to go even faster, like Shruti Jain, mentored by Michael Link. Shruti took globus-url-copy, the GridFTP client, and added a 'sync' feature that allows a local and remote file to be synchronized, by sending only the changed sections of the file. This results in more effective bandwidth utilization by avoiding redundant data transfers.
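As a rough illustration of sending only the changed sections of a file, the Python sketch below compares per-block digests and transfers just the blocks that differ. This is a generic technique shown under simplifying assumptions (fixed-size blocks, whole files in memory), not the algorithm Shruti implemented in globus-url-copy; all function names here are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(local, remote_digests, block_size=BLOCK_SIZE):
    """Map block index -> local block bytes for blocks that differ remotely."""
    updates = {}
    for index, digest in enumerate(block_digests(local, block_size)):
        if index >= len(remote_digests) or digest != remote_digests[index]:
            start = index * block_size
            updates[index] = local[start:start + block_size]
    return updates


def apply_updates(remote, updates, new_length, block_size=BLOCK_SIZE):
    """Patch the remote copy with the changed blocks and fit it to new_length."""
    patched = bytearray(remote.ljust(new_length, b"\0")[:new_length])
    for index, block in updates.items():
        start = index * block_size
        patched[start:start + len(block)] = block
    return bytes(patched)


local = b"aaaabbbbXXXXdddd"
remote = b"aaaabbbbccccdddd"
updates = changed_blocks(local, block_digests(remote))
synced = apply_updates(remote, updates, len(local))
```

Only the third block travels over the wire in this example; the other three are recognised as identical from their digests alone.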

Checksummed GridFTP Transfers

Remember Mattias Lidman? We certainly do. In last year's Google Summer of Code, he developed a compression driver for the Globus XIO input/output library (which GridFTP depends on) to compress/uncompress data as it passes through it. However, although moving data faster is all good and well, it's not worth much if it somehow gets corrupted in-flight. So this year, Mattias, mentored by Joseph Bester, continued to work on Globus XIO and developed a Checksum Driver. Mattias's driver checksums GridFTP data streams allowing both ends of a GridFTP transfer to verify the integrity of the data.
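The general idea of checksumming a data stream so both ends can verify it can be sketched in a few lines of Python. This is not the Globus XIO driver interface, which is a C library; the message format, function names, and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


def send_stream(chunks):
    """Sender side: yield each chunk unchanged, then a digest of everything sent."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
        yield ("data", chunk)
    yield ("checksum", digest.hexdigest().encode("ascii"))


def receive_stream(messages):
    """Receiver side: rebuild the payload and verify it against the trailing digest."""
    digest = hashlib.sha256()
    payload = bytearray()
    for kind, body in messages:
        if kind == "data":
            digest.update(body)
            payload.extend(body)
        elif kind == "checksum":
            if digest.hexdigest().encode("ascii") != body:
                raise ValueError("checksum mismatch: data was corrupted in flight")
            return bytes(payload)
    raise ValueError("stream ended without a checksum")


received = receive_stream(send_stream([b"grid", b"ftp"]))
```

If any chunk is altered between the two functions, the receiver's digest no longer matches the one the sender computed and the transfer is rejected.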

CQL Queries Builder

You know one really cool thing grids are used for? Cancer research. The Cancer Biomedical Informatics Grid, or caBIG®, is an information network enabling all constituencies in the cancer community – researchers, physicians, and patients – to share data and knowledge. caGrid is the underlying service-oriented infrastructure that supports caBIG, and it relies heavily on the Globus Toolkit. Some of the data services in this architecture use a query language called CQL that is, well... complicated. To make life easier for scientists, Monika Machunik, mentored by Wei Tan, wrote a plug-in for Taverna (an open source tool used by scientists to design and execute workflows) for constructing CQL queries, allowing scientists to focus on their work rather than on the intricacies of the CQL language.

GridWay-Google Maps Mashup

Grids require coordinating resources across multiple organizations, and the Globus GridWay meta-scheduler is a great tool to do just that. However, coordinating hundreds or even thousands of machines across dozens of sites can get a bit messy using the console-based tools included with GridWay. Carlos Martín, mentored by Alejandro Lorca, tackled this problem by creating an interactive GridWay-Google Maps mashup, allowing the administrators and users of a GridWay installation to get a quick snapshot of the status of multiple sites and the jobs running in them, as shown in this screenshot:

Carlos used the Google Web Toolkit to develop this application, which is totally decoupled from GridWay, making it easy to install it alongside existing installations of GridWay. In fact, you can download the GridWay+Google Maps application and check out its documentation, including more screenshots, at the application's page on the GridWay site.

GridWay GUI

Srinivasan Natarajan, mentored by Jose Luis Vazquez-Poletti, worked on a more administration-oriented GUI for GridWay, allowing users to compose, manage and control their jobs instead of using the command line interface. This GUI includes a host of other features, such as host and user monitoring, filtering account statistics and execution history information, and support for processing DAGMan workflows, including visualizing dependencies between jobs in the workflow.

Both of the GridWay projects were presented in several sessions, including one on nuclear fusion, at the EGEE'09 conference in Barcelona, Spain back in September.

GridFTP Benchmarking

How about we get back to the subject of data management? The recent addition of UDT (UDP Data Transfer) support to GridFTP has made even faster transfer speeds possible. You guessed it: here's another student who couldn't resist the need for speed this summer. Jamie Schwettmann, mentored by Raj Kettimuthu, sought to characterize the performance of GridFTP over 10Gb/s networks, specifically to measure the speed increase given by UDT as compared to TCP transfers, as well as a number of other considerations such as CPU and memory overhead at both ends of the transfer. In doing so, they decided to develop an automated GridFTP benchmarking and throughput optimization utility called globus-transfer-test. It takes URL pairs from a list or on the command line, and allows for varying input parameters such as parallelism level, transfer type (memory-to-memory, disk-to-disk, etc.), TCP buffer sizes, MTU sizes, and all other standard globus-url-copy options (except multicasting). When possible, it compares the results with other performance and throughput utilities such as iperf or scp. Designed for general use by users or administrators as well as to carry out our performance characterization, globus-transfer-test aims to provide enough information to optimize GridFTP options for maximizing throughput between grid sites. This common need has allowed collaboration with many other projects and organizations in the course of development and testing, including the US ATLAS Project, TeraGrid, and OSCER. Jamie even presented a poster on her project at the 2009 Oklahoma Supercomputing Symposium.
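A parameter sweep of this kind can be sketched in Python as follows. This is only an illustration of the approach, not globus-transfer-test itself: the `transfer` callable, its keyword arguments, and the injectable `clock` are hypothetical stand-ins so the sketch runs without a network.

```python
import itertools
import time


def sweep(transfer, parallelism_levels, buffer_sizes, payload_bytes,
          clock=time.perf_counter):
    """Time transfer() for every parameter combination and record throughput."""
    results = []
    for parallelism, buffer_size in itertools.product(parallelism_levels,
                                                      buffer_sizes):
        start = clock()
        transfer(parallelism=parallelism, buffer_size=buffer_size)
        # Guard against a zero reading on very coarse clocks.
        elapsed = max(clock() - start, 1e-9)
        results.append({
            "parallelism": parallelism,
            "buffer_size": buffer_size,
            "bytes_per_second": payload_bytes / elapsed,
        })
    return results


def best(results):
    """Return the parameter combination with the highest measured throughput."""
    return max(results, key=lambda r: r["bytes_per_second"])


def fake_transfer(parallelism, buffer_size):
    """Stand-in for a real transfer; a real one would move payload_bytes of data."""
    return None


results = sweep(fake_transfer, [1, 2, 4], [64 * 1024, 256 * 1024],
                payload_bytes=10_000_000)
winner = best(results)
```

With a real transfer function plugged in, `winner` would hold the parallelism level and buffer size that gave the best throughput between the two endpoints tested. With the stand-in above, the timings are meaningless.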

AJAX Framework for Globus Web Services

Many of the components in Globus are web services, which are not exactly human-readable creatures. Fugang Wang, mentored by Tom Howe, developed a JavaScript API that enables accessing Globus services from a web client using AJAX. Fugang's framework, which includes a backend service that mediates service requests to the Globus Toolkit and an AJAX web client to access these services, makes life easier for Globus developers and users by allowing them to interact with Globus services from the comfort of their web browsers.

Secure Cloud Communications

And we'll end with the ever-popular subject of data management. Melissa Weaver, mentored by John Bresnahan, developed a PSK driver for Globus XIO. She first developed a program that, using OpenSSL libraries to encrypt and decrypt data using a stream or block cipher of the user's choice, allowed her to experiment with different lengths of keys and initialization vectors and different file sizes to make performance measurements. Then, she developed the XIO PSK driver itself, which used the results of the first program to implement an RC2 block cipher to ensure any communication between computers, once a connection has been set up, is secure.

High energy physics experiments at CERN! Cancer research! Nuclear fusion! Cloud computing! Fast data transfers! Oh my! Oodles of congratulations to our mentors and students for all their hard work and for making this such an awesome Google Summer of Code for the Globus Alliance!

Introducing namebench

Friday, December 11, 2009

Slow DNS servers can make for a terrible web browsing experience, but knowing which one to use isn't easy. namebench is a new open source tool that helps to take the guess-work out of the DNS server selection process. namebench benchmarks available DNS services and provides a personalized comparison to show you which name servers perform the best. As a System Administrator at Google, I was curious about measuring how BGP route selection affected the performance of Google Public DNS. This curiosity resulted in writing a small benchmarking script, which was further developed during my 20% time to become a full-featured application for Windows, Linux, and Mac OS X.

namebench is covered by the Apache 2.0 license, and was made possible by using several other great open-source tools including Python, Tkinter, PyObjC, dnspython, jinja2 and graphy. It also makes use of the Google Chart API to visualize the results:

In order to provide the most relevant results, namebench employs a number of interesting techniques. First, it personalizes the benchmark by using your browser history to decide which hosts to benchmark with. It also determines cache-sharing relationships between different IPs and removes the slowest of these servers to avoid improperly benchmarking them solely on cached results. namebench will also report on DNS misbehavior such as DNS hijacking and censorship.
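The first technique, choosing benchmark hosts from browsing history, might look roughly like this in Python. This is a sketch of the idea only and is not taken from the namebench source: the function name is hypothetical, and it assumes the history has already been exported as a plain list of URLs.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_hosts(history_urls, limit=3):
    """Return the most frequently visited hostnames, most common first."""
    counts = Counter()
    for url in history_urls:
        host = urlsplit(url).hostname
        if host:  # skip entries that are not parseable URLs
            counts[host] += 1
    return [host for host, _ in counts.most_common(limit)]


history = [
    "http://www.example.com/news",
    "http://www.example.com/sports",
    "https://mail.example.org/inbox",
    "not a url",
]
hosts = top_hosts(history, limit=2)
```

The resulting list of hostnames would then be looked up against each candidate name server, so the benchmark reflects the sites you actually visit.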

namebench 1.0 is available for download now. If you would like to discuss namebench or have any questions about it, please join the namebench mailing list. Happy hacking!

Joomla! Google Summer of Code™ 2009: Lots to Shout About

Thursday, December 3, 2009

The Joomla! project was thrilled to sponsor 18 Google Summer of Code students for 2009, and we are pleased to report that 16 (89%) successfully completed their projects. Most of the projects were based on ideas generated by the Joomla! community, and our community seems to be very excited about the results.

Our two primary goals for Google Summer of Code 2009 were to (1) develop relationships with student developers that will encourage them to continue working in the project; and (2) add features and functionality to the Joomla! CMS. Our participation in Google Summer of Code 2009 was very successful on both fronts.

Relationship to the Project

Several of our students this year were already contributing to Joomla! prior to participating in the program, and the Google Summer of Code experience has only strengthened that relationship. For example, one of our students, in addition to completing his project, is now a leader in the release of the next Joomla! version. At least two students (so far) have officially joined project working groups, and several others have contributed to the project over and above their Google Summer of Code projects. Many other students have also expressed interest in continuing the development of their code beyond the program timeframe.

This year, at the end of the term, we gave each student the opportunity to present a webinar where they could demonstrate their project to the community. Even though it was a lot of extra work, more than half the students did this. The results were excellent, and the students gave concise, focused presentations. We recorded and linked to the webinars on our site so that anyone in the community who is interested in the Google Summer of Code work can simply watch a short webinar to see an actual demonstration of the projects.

Using the Code

There are three ways the code from Google Summer of Code projects can be used within the Joomla! CMS. In some cases, some or all of the code will be incorporated directly into the core codebase for the upcoming Joomla! version 1.6. In other cases, the code has been published as an extension that can be downloaded and used by any Joomla! user on their website. The third method is that the code will be used as a basis for further work.

Some students have combined two of the methods above, for example, producing an extension for the current version 1.5 and making the code available for the core in our version 1.6.

More Information

We invite you to visit our Joomla! Community site for more information about the different projects and what was accomplished, and to download the code.

Etherboot Project GSoC 2009 Report

Wednesday, December 2, 2009

The Etherboot Project is very pleased to have participated in Google Summer of Code™ 2009. This summer marks our fourth consecutive annual participation in this excellent mentoring program.

Google generously sponsored five students to work with us, and four of our five students (80%) successfully completed their projects. We would like to thank Google, our mentors, and our students for making this a pleasant, productive, and memorable summer.

We particularly wish to thank one of our mentors, Stefan Hajnoczi, who was a GSoC student with us last year. His insights and diligence are extremely helpful and enlightening.

Although not all of our GSoC projects were successfully completed, our students' work was generally of excellent quality, and we sincerely thank them all for their diligence. Our participation in GSoC has strengthened our project by encouraging us to create additional technical and social infrastructure. These improved facilities make it easier for new people to become involved with our project and also help us better support and communicate with our existing community.

We look forward to giving future GSoC students and other interested and motivated people a positive introduction to FOSS development. What follows is a brief summary of our 2009 students' work with links to their full project pages. We conclude with a brief outline of our mentoring system that we hope may be helpful to other projects.

Student Project Summaries

Daniel Verkamp
Daniel implemented an automated regression testing framework to help us consistently deliver high-quality releases.

Joshua Oreman
Joshua extended gPXE, our network bootloader, with an 802.11 wireless stack, and added drivers for two wireless cards.

Lynus Vaz
Lynus extended gPXE scripting with a more powerful language that is capable of expressing advanced boot policies.

Pravin Shinde
Pravin created a central resource to network boot operating systems, diagnostic tools, and utilities at

Chris Kluka
Chris worked on adding a network driver for DLink DGE-530T ethernet cards. Though unable to complete his project, he compiled and created useful information which will facilitate future work on this driver.

Our Mentoring System

Over our years of GSoC participation we have developed and refined a system for mentoring that works quite well for us. One of the most important attributes of our system is that we break the twelve-week GSoC coding period into twelve one-week evaluation periods. By doing this we ensure that we always have recent information on how each of our students is doing, which allows us to intervene in a timely fashion when needed.

Here are some of the other ways we structure our GSoC participation:

* We mentor as a team. We have a mailing list and private IRC channel specifically for mentors.

* Our mentoring team interviews all qualified applicants in a private IRC channel. Multiple perspectives have proven very helpful in identifying excellent candidates. Mentors communicate among themselves during interviews in a second private IRC channel.

* We request code samples from all of our applicants to get a sense of their proficiency and coding style.

* We present real-time coding exercises during our IRC interviews with applicants, and ask them questions about their proposed solutions, and also about their code samples.

* We inform our students of our team mentoring approach and encourage them to send general questions to the mentors mailing list.

* We require our selected students to have IRC access and to define "work hours" where they will be online and available on our main project IRC channel (#etherboot on We have found that this requirement encourages them to interact with our project community as well as their primary mentor.

* We use any and all available means to communicate directly with our students, including IRC, email, phone, VOIP, and IM. It is important to discover what works best to promote effective, open communication between students and mentors.

* We require our students to maintain a set of project pages, which include their:
-Project Plan
-Journal (broken into twelve weeks)
-git repository link

* Our mentors meet weekly with each student in a private IRC channel to review their project pages and generally discuss their progress. We have found these meetings to be very beneficial to both students and mentors.

* We make the steady progress and ultimate success of each of our student projects central to our mentoring goals. We meet as mentors to discuss how we can help each student succeed, and we discuss our formal GSoC evaluations as a team.

* We base our project's success on the quality of our code and the health of our community, and we work continuously to improve as programmers and as people.

The Apertium Project's First Google Summer of Code

Friday, November 27, 2009

The Apertium Project works on open-source machine translation and language technology. We try to focus our efforts on lesser-resourced and marginalized languages, but also work with larger languages. To date, we have released translators for 21 language pairs, covering languages spoken by 1.1 billion people, ranging from English (est. 500m speakers) to Aranese (est. 4,000 speakers). A similar number of additional language pairs are in development. The Apertium software is licensed under the GPL, but in addition (a rarer situation in the machine translation field) so is the data for all these language pairs. This means that the data can be re-used by other language projects (e.g. in developing spelling or grammar checkers, thesauri, etc).

This was our first year in Google Summer of Code and we were very fortunate to receive nine student slots. We filled them with some great students and are pleased to report that out of the nine projects, eight were successful.

The completed projects were:

A translator for Norwegian Bokmål (nb) and Norwegian Nynorsk (nn)

This project was accepted as part of our "adopt a language pair" idea from our ideas page. Some work had already been done on the translator but it was a long way from finished. Kevin Unhammer from the University of Bergen was mentored by Trond Trosterud from the University of Tromsø. The final result, after an epic effort, is a working translator (and the first free software translator for nb-nn) that makes a mistake in only 11 words out of every 100 translated, which makes it feasible to use the system for post-edition.

One of the key aspects of Kevin's work was the re-use and adaptation of existing open source resources. Much of the bilingual dictionary was statistically inferred from the existing translations in KDE, using ReTraTos and GIZA++ (created by Franz Och). In addition to this, Kevin used the Oslo-Bergen Constraint Grammar, contributing fixes not only to that, but to the VISL CG3 software itself. Since the GSoC deadline, Kevin has continued his work, including incorporating some changes based on feedback from the Nynorsk Wikipedia.

A translator for Swedish (sv) to Danish (da)

In another language pair adoption, Michael Kristensen, who had previously done some work on this translator, was mentored by Jacob Nordfalk, the author of our English to Esperanto translator. As there are very few free linguistic resources for Swedish and Danish, the work was pretty much started from scratch, although we took great advantage of the Swedish Wiktionary. The translator is only unidirectional, from Swedish to Danish, and it has an error rate of around 20%.

The completion of this translator is something of a triumph for Apertium. Begun back in 2005, the project had been neglected for many years. This was the first translator for the Apertium platform that focused on non-Romance languages.

Multi-engine machine translation (MEMT)

Gabriel Synnaeve was mentored by Francis Tyers to work on a module to improve the quality of machine translation by taking translations from different systems and merging their strengths and discarding their weaknesses. The two systems focused on in the initial prototype are Apertium (rule-based MT) and Moses (statistical MT) but it can easily be extended to more. The idea behind the system is that for some languages there is often not one MT system which is better than all others, but some are better at some phrases and some are better at others. Thus, if we can combine the output of two or more systems with different strengths/weaknesses, we can make better translations.

Perhaps the most exciting aspect of the MEMT project is its potential for use as a research platform for future work on hybrid machine translation, by allowing the researcher to focus only on the algorithms they wish to implement. During the project, Gabriel was joined by Francis in person for a 'mini-hackathon', which, despite something of a farcical start involving requests made on IRC for phone calls across Europe on behalf of two people who were in the same city, led to a greater degree of functionality and modularization in the code.

Highly scalable web service architecture for Apertium

Víctor Manuel Sánchez Cartagena worked with mentor Juan Antonio Pérez-Ortiz on a highly-scalable web service architecture, or, Apertium for Cloud computing. Initially targeting Amazon's EC2, as well as standalone servers, the scalable web service allows the use of multiple translation services on multiple physical or virtual servers, scaling to meet the translation demands of users, from a single user-facing service, which implements the Google Language API.

The core of the system is the translation router, which controls the flow between user and translation server, based on a variety of factors, including the availability of the language pair, the current load on the server, as well as providing a framework to allow these factors to have different priorities on a per-user basis. It also takes into account the cost of each translation request. The project is a complete package; as well as the router, it includes a translation daemon, and convenience scripts to ease the rollout of server instances.
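The routing decision described above can be pictured with a small sketch. This is not the project's actual router: the class names, the load and cost fields, and the weighted score are all assumptions made for illustration. It only shows the idea of filtering servers by language pair and then ranking them by factors whose weights can differ per user.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: names and scoring are assumptions, not Apertium's router. */
final class TranslationRouterSketch {

  /** Hypothetical snapshot of one translation server. */
  static final class Server {
    final String name;
    final Set<String> pairs; // language pairs installed, e.g. "es-ca"
    final double load;       // current load, 0.0 (idle) to 1.0 (saturated)
    final double cost;       // assumed cost of sending one request here

    Server(String name, double load, double cost, String... pairs) {
      this.name = name;
      this.load = load;
      this.cost = cost;
      this.pairs = new HashSet<String>(Arrays.asList(pairs));
    }
  }

  /** Hypothetical per-user priorities for the routing factors. */
  static final class Priorities {
    final double loadWeight;
    final double costWeight;

    Priorities(double loadWeight, double costWeight) {
      this.loadWeight = loadWeight;
      this.costWeight = costWeight;
    }
  }

  /** Returns the name of the lowest-scoring server that offers the pair, or null. */
  static String route(List<Server> servers, String pair, Priorities p) {
    Server best = null;
    double bestScore = Double.POSITIVE_INFINITY;
    for (Server s : servers) {
      if (!s.pairs.contains(pair)) {
        continue; // language pair not available on this server
      }
      double score = p.loadWeight * s.load + p.costWeight * s.cost;
      if (score < bestScore) {
        best = s;
        bestScore = score;
      }
    }
    return best == null ? null : best.name;
  }

  /** Three made-up servers used in the usage note. */
  static List<Server> exampleServers() {
    List<Server> servers = new ArrayList<Server>();
    servers.add(new Server("a", 0.9, 1.0, "es-ca", "en-es"));
    servers.add(new Server("b", 0.2, 5.0, "es-ca"));
    servers.add(new Server("c", 0.0, 0.0, "en-es"));
    return servers;
  }
}
```

With the example servers, a user who only cares about load (weights 1.0 and 0.0) is routed to "b" for "es-ca", while a user who only cares about cost (weights 0.0 and 1.0) is routed to "a". A request for a pair no server offers returns null.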

In addition to his work on his project, Víctor is also serving as an organiser for the FreeRBMT workshop.

Conversion of Anubadok

Abu Zaher was mentored by Kevin Donnelly and Francis Tyers to convert Anubadok, an open-source MT system for English to Bengali, to work with the Apertium engine. This was an ambitious project and not all of the goals were realised, but we were able to make the first wide-coverage morphological analyser / generator for Bengali and a substantial amount of lexical transfer, so the project was a great success.

Zaher is also looking at improving the Ankur spell checker with information from his analyser / generator, so the work done is already being reused; there is also interest in using the data to create a Bengali stemmer, for more efficient searching/indexing of Bengali texts, and a number of tools which were created to model the various aspects of Bengali inflection will certainly prove useful in other areas of NLP for Bengali.

Apertium going SOA

Pasquale Minervini's work was motivated by the needs of Informatici senza Frontiere to have a translation engine that would fit into a Service-Oriented architecture. To this end, Pasquale, mentored by Jimmy O'Regan, designed an XML-RPC-based server that efficiently contains the Apertium pipeline, and layered it with JSON (still under development), SOAP, and CORBA services, which, as well as making Apertium more buzzword compliant, gives a greater range of options to programmers wishing to integrate Apertium's translation services into a wider range of architectures. This is undoubtedly a popular project idea: Alexa's keywords for Apertium show 'apertium going soa' and 'deadbeef apertium' (deadbeef is Pasquale's IRC nick) in 2nd and 4th place for search keywords leading to Apertium.

Because of the potential overlap between their projects, in the first weeks of their GSoC work, Pasquale and Víctor agreed on the Google Language API as a standard for their projects to communicate; Pasquale took this agreement one step further by implementing the 'language detection' feature of the API - something previously unavailable in Apertium. In addition to that, Pasquale also contributed memory leak checks against the Apertium platform, as well as other fixes, and has helped another (non-GSoC) student in the goal of porting Apertium to Windows.

Trigram part-of-speech tagging

Zaid Md. Abdul Wahab Sheikh was mentored by Felipe Sánchez Martínez to improve our part-of-speech tagging module to use trigrams instead of bigrams, as well as implementing changes to the training tools to create data for it.

Apertium was originally designed for closely related languages, but is growing to meet the challenges of translating between more distant languages. One of the unique aspects of Dr. Sanchez's work on Part-of-Speech tagging is the use of target language information which allows an accurate tagger to be trained using much less data than usual. Zaid's work builds on Dr. Sanchez's work with first-order Hidden Markov Models, extending it to second-order HMMs, similarly to TnT. This enables more accurate translation between more distant languages, using the same methods, so that the rest of the Apertium system can continue to grow.
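To make the bigram/trigram distinction concrete: a first-order model conditions each tag on the one tag before it, while a second-order model conditions it on the two tags before it. The sketch below estimates such trigram transition probabilities by simple relative frequency from already-tagged sequences. That is a deliberate simplification and an assumption on our part. It does not reproduce the training method of the Apertium tagger, it applies no smoothing, and the class and tag names are made up.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: relative-frequency trigram transitions, no smoothing. */
final class TrigramTransitionSketch {
  // How often each two-tag context was followed by some tag.
  private final Map<String, Integer> contextCounts = new HashMap<String, Integer>();
  // How often each full three-tag sequence was seen.
  private final Map<String, Integer> trigramCounts = new HashMap<String, Integer>();

  /** Counts every (t1, t2, t3) window in one tagged sentence. */
  void observe(List<String> tags) {
    for (int i = 2; i < tags.size(); i++) {
      String context = tags.get(i - 2) + " " + tags.get(i - 1);
      String trigram = context + " " + tags.get(i);
      contextCounts.merge(context, 1, Integer::sum);
      trigramCounts.merge(trigram, 1, Integer::sum);
    }
  }

  /** Estimates P(t3 | t1, t2); returns 0.0 for a context never seen. */
  double probability(String t1, String t2, String t3) {
    String context = t1 + " " + t2;
    int seen = contextCounts.getOrDefault(context, 0);
    if (seen == 0) {
      return 0.0;
    }
    return trigramCounts.getOrDefault(context + " " + t3, 0) / (double) seen;
  }

  /** Three tiny made-up tag sequences used in the usage note. */
  static TrigramTransitionSketch example() {
    TrigramTransitionSketch model = new TrigramTransitionSketch();
    model.observe(Arrays.asList("det", "n", "vblex"));
    model.observe(Arrays.asList("det", "n", "n"));
    model.observe(Arrays.asList("det", "n", "vblex"));
    return model;
  }
}
```

In the example data the context "det n" occurs three times and is followed by "vblex" twice, so the estimate for a verb after a determiner and a noun is two thirds. A bigram model would have to make the same decision from "n" alone.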

Java port of lttoolbox

Raphaël Laurent worked with Sergio Ortiz Rojas to port lttoolbox to Java. lttoolbox is the core component of the Apertium system; as well as providing morphological analysis and generation, it also provides pattern matching and dictionary lookup to the rest of Apertium, so a Java port is the first step towards a version of Apertium for Java-based devices. Raphaël finished an earlier line-for-line port contributed by Nic Cotrell, first making it work, then making it binary compatible.

As it stands currently, lttoolbox-java can be integrated into other Java-based tools, facilitating the re-use of our software and our extensive repository of morphological analysers. Tools such as LanguageTool, the open source proofreading tool, also make extensive use of morphological analysis, but OmegaT, the open source CAT tool, could use it for dictionary look-up of inflected words; it could even be used with our own apertium-morph tool: a plugin for Lucene that allows linguistically-rich document indexing.


On the 2nd and 3rd of November, we held the first FreeRBMT workshop, which was heavily inspired by the Google Summer of Code program, both as a way for students and mentors to meet in person, and to provide the students with an opportunity to present peer-reviewed papers about the work they completed during the program. The entire proceedings are available from the University of Alicante; in particular, we would like to highlight the papers which were successfully presented by the students who took part in GSoC:

Apertium goes SOA: an efficient and scalable service based on the Apertium rule-based machine translation platform; Minervini, Pasquale

Development of a morphological analyser for Bengali; Faridee, Abu Zaher Md.; Tyers, Francis M.

An open-source highly scalable web service architecture for the Apertium machine translation engine; Sánchez-Cartagena, Víctor M.; Pérez-Ortiz, Juan Antonio

Reuse of free resources in machine translation between Nynorsk and Bokmål; Unhammer, Kevin; Trosterud, Trond

A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform; Sheikh, Zaid Md Abdul Wahab; Sánchez-Martínez, Felipe

In addition, the following paper was presented by the mentors of a successful project (Michael, the student, was unfortunately too busy to participate in its writing):

Shallow-transfer rule-based machine translation for Swedish to Danish; Tyers, Francis M.; Nordfalk, Jacob

We would like to thank Google for providing us with the opportunity to participate in the Summer of Code program; in particular, Leslie, Cat, and Ellen, for making it run so smoothly. We would also like to make special mention of two students: Ankitha Rao and Daniel Beck, who, despite being unsuccessful in their applications, continued to work on their proposed projects (an English to Hindi translator, and a module for multi-word units, respectively). Finally, we would like to thank all of the students, mentors, and administrators who contributed their time and skill to Apertium.

SWIG's Second Summer of Code

Monday, November 23, 2009

SWIG is a programmer's tool designed to make it easier to use C and C++ code from other popular programming languages such as Python, Perl, Ruby, PHP, Java, and C#. 2009 was SWIG's second Summer of Code, and this year we mentored five projects related to SWIG. All five students were very active over the summer period and produced some great new features. In no particular order:

Matevž Jekovec has been busy working at the coal face of SWIG to add support for C++0x, the forthcoming C++ standard. Matevž has managed to achieve close to full support for C++0x. The C++0x Wikipedia article details the numerous planned new features and Matevž has put together a SWIG C++0x page documenting the new SWIG support for each of these. In summary, the enhanced C++ language can now be parsed by SWIG, which in itself is a great step. There is much more than just this though, as most of the information parsed is used to create useful wrappers of C++0x code. The work can be tried out on the C++0x branch, which should be merged fairly soon into a forthcoming release.

Miklos Vajna has been working on SWIG's PHP support to implement an advanced SWIG feature already supported for most other target languages, but not PHP. The feature is called "directors" and allows cross-language polymorphism - wrapped C++ classes can be subclassed in PHP and virtual method calls work in the natural way, whether they're made from PHP or C++ code. You can read more in the new PHP Director documentation. Miklos made such great progress that we were able to merge this support into SWIG 1.3.40, which was released even before the Summer of Code finished. Miklos also spent some time working on improving SWIG's test suite for PHP, and fixing bugs in the PHP support.

Ashish Sharma spent the summer adding support for Objective-C as a new target language. Objective-C is a major language on the Mac OS X platform. This means that now SWIG can be used to generate Objective-C wrappers over C++ code. In particular the wrappers include proxy classes, which preserve the class hierarchy from the C++ code. Ultimately this means that from the user's perspective, proxy objects look no different to objects originally written in Objective-C. Adding a new target language is quite a considerable task and Ashish is keen to add plenty more improvements over the coming months. Ashish's work is in Subversion and can be accessed in the ashishs99 branch.

Baozeng Ding has also added a new target language, in this case for the Scilab language, a free numerical computing package. He has coded up support for all the C features: variables, functions, constants, enums, structs, unions, pointers and arrays and also intends to develop it further in the near future. Documentation for SWIG and Scilab can be viewed online direct from Baozeng's Subversion branch.

Kosei Moriyama has been working on Perl bindings for the Xapian library using SWIG, to replace some existing bindings implemented by hand. He's achieved almost complete compatibility with the API of the existing bindings (the only real omission is callbacks which are waiting for completion of director support for Perl in SWIG). He has also wrapped features which weren't previously accessible from Perl. You can view Kosei's work online in his Subversion branch.

Finally, many thanks to Google for sponsoring the Summer of Code and a special thanks for all the hard work done by the students, mentors and Olly Betts, the co-administrator.

Chromium OS Now Open Sourced

Wednesday, November 18, 2009

In July we announced that we were working on a project called Google Chrome OS, an open source operating system based on the Google Chrome browser and built for today's web. For the past few months we have been working hard on developing a solid foundation and today we are excited to announce the Chromium OS open source project.

You can read more about our open source announcement at the Chromium Blog, or get involved directly. We look forward to working with the open source community to help shape the future of personal computing.

Hey! Ho! Let's Go!

Tuesday, November 10, 2009

Here at Google, we believe programming should be fast, productive, and most importantly, fun. That's why we're excited to open source an experimental new language called Go. Go combines the development speed of working in a dynamic language like Python with the performance and safety of a compiled language like C or C++. Typical builds feel instantaneous; even large binaries compile in just a few seconds. And the compiled code runs close to the speed of C. Go lets you move fast.

Go is a great language for systems programming with support for multi-processing, a fresh and lightweight take on object-oriented design, plus some cool features like true closures and reflection.

Want to write a server with thousands of communicating threads? Want to spend less time reading blogs while waiting for builds? Feel like whipping up a prototype of your latest idea? Go is the way to go! Check out the video for more information.

London Open Source Jam 14

Wednesday, November 4, 2009

We held the 14th Google London Open Source Jam at our Victoria HQ on September 24th. The topic this time was "Video and Sound", and our Jammers had some real treats to share.

Steven Goodwin told us how his open source SGX 3D graphics engine deals with three key problems of other computer game engines. On a similar theme, Themis Bourdenas discussed the vine engine, a modular game engine for 2d and 3d games.

Borys Musielak presented Filmaster, an open source film recommendation engine. Neil Harris told us about an attempt by the Kendra Initiative to foster a common meta data format for content discovery on the semantic web.

In an Open Source Jam first, Jagannathan gave a performance of his Din software musical instrument. Din is designed for playing live Indian music, is based on Bezier curves and really has to be heard to be fully appreciated.

Sam Mbale gave us an update on his projects to help Africans build online communities using open source. Mike Mahemoff discussed some web tools frameworks for intranets, bookmarklets and trails in Scrumptious.

The UK government has plans to introduce a law to allow content-owners to force ISPs to disconnect the internet connection of users suspected of file sharing, without any proof. Glyn Wintle gave us an overview of how the proposed law will affect us, how the Open Rights Group is campaigning against it, and how we can help.

Douglas Squirrel talked about the difficulty blind people have in finding information on websites, and presented a new project to reformat the web in a screen-reader friendly way. He also demoed a prototype telephone interface to the service.

Much pizza was eaten and free beer drunk, and we all ended up in the pub next door to continue our discussions. A big thank you to all our speakers and attendees, and we hope to see you at the next Jam!

Google Summer of Code Mentor Summit 2009

Friday, October 30, 2009

This past weekend, approximately 250 Open Source developers from around the world gathered at Google's headquarters in Mountain View, CA for the fourth Google Summer of Code™ Mentor Summit. These developers, who mentored students in this year's Google Summer of Code program, met "unconference" style to discuss ways to improve the program, share their experiences, and learn about each other's projects.

One of the recurring comments about what makes the Mentor Summit special was that it gathers developers from a diverse range of projects (all 150 organizations participating in this year's Google Summer of Code were invited to send two delegates). This allowed for a cross pollination of ideas that isn't usually found at conferences dedicated to one specific platform or language. In addition, the summit was an opportunity for developers who usually collaborate online to meet face to face. In fact, some of our attendees met colleagues they had been working with for several years in person for the first time at the summit!

Most of all, the summit was a great place to meet like-minded Open Source developers who are passionate about bringing new contributors into their communities. Check out photos from the event or read through the session notes to find out more about what happened at this year's summit.

Fall at the OSPO

Friday, October 23, 2009

The leaves are turning here in Mountain View, but they are not the only ones blazing away. It's a busy time of year for open source for Google, with lots of talks and events going on.

- Ben Collins-Sussman and Brian (Fitz) Fitzpatrick gave their "Myth of the Genius Programmer" talk as part of the Opening sessions at "Reflections / Projections", the 15th ACM@UIUC Student Computing Conference at the University of Illinois Urbana-Champaign.

- They were joined by Googler and Python maintainer Alex Martelli, who spoke on "Python and the Programmer".

- Chris DiBona, head of the Open Source Programs Office at Google gave a keynote at AstriCon in Glendale, Arizona.

- Earlier this week Leslie Hawthorn, manager of the Google Summer of Code program, was part of the amazing team that completed a new "Manual on GSoC Mentoring" in 2, count them, 2 DAYS, finishing up late last night. You will hear more about this feat in a later post after the...

- Google Summer of Code Mentor Summit 2009, being held in Mountain View this weekend, October 24th and 25th. This invitation-only gathering of mentors from each of the participating mentoring organizations in this year's GSoC gives the projects a chance to come together to compare notes on the mentoring process and cross-pollinate their projects. A good time promises to be had by all, and a full report will be forthcoming.

Coming up:
- Jonathan Blocksom will be speaking on Google App Engine and the All For Good project at the DC edition of Stack Overflow Dev Days, October 26th.

- On November 4th the LISA Conference in Baltimore, Maryland will feature a talk by Daniel Berlin and Joe Gregorio on the Google Wave Federation Protocol, the underlying open network protocol for sharing waves between wave providers. Interested attendees of LISA will be able to sign up for a developer's Wave Sandbox Account. They will also have a chance to win Googley prizes at the Google Birds of a Feather session the next evening, hosted by Cat Allman and Tom Limoncelli.

TalkBack: An Open Source Screenreader For Android

Tuesday, October 20, 2009

Earlier this year, we blogged about project Eyes-Free — a collection of Android applications that enable efficient eyes-free interaction with your mobile phone. Since then, one of the questions we have received most often is about a complete access API to enable general purpose adaptive technologies such as screenreaders.

We are happy to announce the first version of such an API as part of the latest Android release (Donut). This new API is now available within the Android 1.6 SDK, and we welcome developer feedback. The Android Access framework generates android.view.accessibility.AccessibilityEvent in response to user interaction events; the event payload contains additional details about the event, e.g., the user interface control that received focus. This access framework enables the creation of general purpose screenreading applications that make all of Android's user interface, as well as native Android applications built with standard Android widgets, usable without looking at the screen.

You can see this API in use within our Open Source Android screenreader TalkBack. With TalkBack installed, standard Android user interface elements such as ListView produce spoken feedback during user interaction. Applications SoundBack (for producing non-spoken auditory feedback) and KickBack (for producing haptic feedback) generate additional augmentative output and demonstrate how multiple access applications can be active simultaneously.

What This Means For Developers

If you are interested in developing innovative access solutions on Android and have been eagerly waiting for our access APIs, the Donut SDK contains what you have been waiting for — including a set of free voices for English (US and UK), French, Italian, German and Spanish. You can use TalkBack, SoundBack and KickBack as a starting point for designing your own access innovations.

If you are an Android developer interested in making your applications more widely usable, you can use TalkBack and friends to quickly verify whether your applications remain usable when not looking at the screen. In this context, here are a few coding tips to ensure that your applications work out of the box with these tools:
  1. Ensure that all visually drawn UI controls have meaningful textual labels.
  2. Ensure that users can navigate to all controls in your application using the trackball.
  3. Ensure that navigating controls in your application with the trackball results in a meaningful traversal order.

What This Means For End Users

End-users of Android 1.6 (Donut) can enable TalkBack, SoundBack and KickBack via the Accessibility section of the Settings menu. You need to do this only once; once enabled, these access applications remain active across restarts. Note that depending on your Android device, you may need to install these applications from the Android Market; we will post videos that demonstrate step-by-step instructions for specific Android devices in the Eyes-Free channel on YouTube.

Providing Feedback

We (T. V. Raman, Charles L Chen, and Svetoslav Ganov) will be continuously improving the underlying APIs and access tools, and we look forward to your questions and feedback on the Android Developers Group.

Boldly Talking Python in Boulder

Friday, October 16, 2009

On Saturday, October 10, the Front Range Pythoneers had a Python "unconference" at the Google facilities in Boulder, Colorado, USA. An "unconference" is a conference organized around the principles of open space technologies, which tries to provide many of the benefits of traditional conferences without the associated ceremony. We still got to enjoy some delicious pizza, though.

Introducing the Pycon Boulder Attendees to Principles of Open Space

Photo Credit: Matt Boersma

It was unseasonably snowy and cold Saturday morning, but in spite of the weather, almost everybody who signed up in advance was there, along with a few last-minute registrants. We had nearly 40 attendees join us for 15+ sessions, plus the always-loved "hallway track." Many thanks to the three Googlers who came out to shepherd our group and facilitate the meeting.

You can find more information about the event and sessions on our wiki, in Tweets about the event, and in this great post-conference write-up. You can also check out some more photos of the participants and our scheduling process. We discussed a wide variety of topics across the sessions.

The best surprise of the event? Bruce Eckel, the author of Thinking in Java, was among the participants. Thanks again to Google for hosting the unconference; it worked really well for our purposes. The Google Boulder facility is gorgeous.

Fighting Bad Memories: The Stressful Application Test

Thursday, October 15, 2009

We've just released Stressful Application Test (or stressapptest), a hardware test used here at Google to test a large number of components in a machine. The test tries to maximize random traffic to memory from processor and disks with the intent of creating a realistic high load situation. The source code is available under the Apache license.

stressapptest may be used for various purposes:
  • Stress test for machines.

  • Hardware qualification and debugging.

  • Memory interface test.

  • Disk testing.

The stressapptest team (from left to right): Matthew Blecker, John Huang, Raphael Menderico, Nick Sanders, John Hawley and James Vera

Photo credit: Taral Joglekar

stressapptest is a user space test, primarily composed of threads doing memory copies and direct I/O disk read/write. Since many hardware issues reproduce infrequently, or only under corner cases, the idea behind the test is that by maximizing bus and memory traffic, the number of transactions is increased, and therefore the probability of failing a transaction is increased. It loads the memory with specially-designed patterns that cause the signal lines to rapidly switch between 1 and 0, drawing the maximum amount of power and causing maximal noise on the nearby voltage rails. Noise on voltage rails and coupling with other nearby lines is likely to cause signaling problems on marginal lines. Also, given a probability of any signal level transition failing, these patterns have the most memory transitions per period of time, and are thus more likely to exhibit a failure.
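As a rough picture of the pattern idea, the sketch below fills a buffer with two complementary bit patterns, counts how many bits flip between neighbouring words, then copies the buffer and checks the copy. Everything here is an assumption made for illustration: the two patterns, the class and method names, and the use of Java are ours, and none of it is taken from the stressapptest source. A JVM program like this would not stress hardware the way the real tool does.

```java
/** Illustrative only: the patterns and names are assumptions, not stressapptest code. */
final class BitPatternSketch {
  static final long PATTERN_A = 0xAAAAAAAAAAAAAAAAL; // binary 1010...1010
  static final long PATTERN_B = 0x5555555555555555L; // binary 0101...0101

  /** Fills a buffer so that every bit differs between consecutive words. */
  static long[] fill(int words) {
    long[] buffer = new long[words];
    for (int i = 0; i < words; i++) {
      buffer[i] = (i % 2 == 0) ? PATTERN_A : PATTERN_B;
    }
    return buffer;
  }

  /** Counts the bits that change value from each word to the next. */
  static long transitions(long[] buffer) {
    long flips = 0;
    for (int i = 1; i < buffer.length; i++) {
      flips += Long.bitCount(buffer[i - 1] ^ buffer[i]);
    }
    return flips;
  }

  /** Copies the buffer and returns how many words differ from the source. */
  static int copyAndCountMismatches(long[] source) {
    long[] copy = new long[source.length];
    System.arraycopy(source, 0, copy, 0, source.length);
    int mismatches = 0;
    for (int i = 0; i < source.length; i++) {
      if (copy[i] != source[i]) {
        mismatches++;
      }
    }
    return mismatches;
  }
}
```

For an 8-word buffer there are 7 word boundaries with all 64 bits flipping at each, so transitions(fill(8)) is 448, the maximum possible for that buffer size. On healthy hardware copyAndCountMismatches is expected to return 0.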

This test was designed to test all memory available on a machine, which is not guaranteed with the execution of a CPU-intensive application (for instance, compiling the kernel on multiple threads). Moreover, it is focused on testing the memory interface and connections rather than the memory internals, which is what tools like memtest86 examine. As a consequence, Stressful Application Test will detect errors not detected by regular memory tests or extended executions. A comparison with some other memory reliability tests showed that about 20% of the DIMM-related failures detected on the machines tested were only detected by Stressful Application Test, and it was capable of reporting 70% of all DIMM errors detected by all tests.

We hope this software will be useful to system administrators who need to diagnose and repair DIMM or other components. We look forward to your questions and feedback in our discussion group. Happy hacking and may your testing be less stressful!

Testing Race Conditions in Java

Monday, October 12, 2009

Can you spot the bug in the following piece of Java code?

import java.util.ArrayList;
import java.util.List;

/** Maintains a list of names. */
public class NameManager {
  private List<String> names = new ArrayList<String>();
  /** Stores a new list of names. This method is threadsafe. */
  public void setNames(List<String> newNames) {
    synchronized (names) {
      names = new ArrayList<String>();
      for (String name : newNames) {
        names.add(name);
      }
    }
  }
}

(Hint: the method setNames() is synchronized on the names field, but that field is then modified to point to a new object.)

OK, so spotting the bug was easy. But how would you write a Unit Test to demonstrate the problem? You would need to have two or more threads calling setNames() simultaneously, but you still don't have any control over how the threads will be scheduled.

Enter Thread Weaver, a test framework that lets you control thread execution. By setting breakpoints in your code, you can stop one thread at exactly the point that you want, and then allow a second thread to run. This allows you to write repeatable multi-threaded unit tests, without relying on the thread scheduler.
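To see why stopping a thread at a chosen point matters, here is a hand-rolled sketch that forces the bad interleaving using only the JDK. It does not use Thread Weaver's API, which is not shown in this post. Instead it adds a hypothetical afterSwap hook and two counters to a copy of the class, and uses latches to pause the first thread right after the lock object is replaced. All names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: plain JDK latches, not the Thread Weaver API. */
final class LockSwapRaceSketch {

  /** Same locking mistake as NameManager, with a hypothetical pause hook added. */
  static final class HookedNameManager {
    private List<String> names = new ArrayList<String>();
    private final AtomicInteger inside = new AtomicInteger();
    final AtomicInteger maxInside = new AtomicInteger();
    volatile Runnable afterSwap = () -> { };

    void setNames(List<String> newNames) {
      synchronized (names) {
        int now = inside.incrementAndGet();
        maxInside.accumulateAndGet(now, Math::max);
        names = new ArrayList<String>(); // the lock object is swapped here
        afterSwap.run();                 // hand-made "breakpoint"
        names.addAll(newNames);
        inside.decrementAndGet();
      }
    }
  }

  /** Forces the interleaving and reports how many threads were inside at once. */
  static int maxThreadsInsideCriticalSection() {
    final HookedNameManager manager = new HookedNameManager();
    final CountDownLatch firstPaused = new CountDownLatch(1);
    final CountDownLatch secondFinished = new CountDownLatch(1);
    final AtomicBoolean firstCaller = new AtomicBoolean(true);

    // Only the first caller pauses; it waits until the second call has completed.
    manager.afterSwap = () -> {
      if (firstCaller.compareAndSet(true, false)) {
        firstPaused.countDown();
        await(secondFinished);
      }
    };

    Thread first = new Thread(() -> manager.setNames(Arrays.asList("a", "b")));
    Thread second = new Thread(() -> {
      await(firstPaused); // start only once the first thread holds the old lock
      manager.setNames(Arrays.asList("c", "d"));
      secondFinished.countDown();
    });
    first.start();
    second.start();
    join(first);
    join(second);
    return manager.maxInside.get();
  }

  private static void await(CountDownLatch latch) {
    try {
      latch.await(5, TimeUnit.SECONDS); // timeout avoids hanging if the bug is fixed
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  private static void join(Thread thread) {
    try {
      thread.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With the bug present, the second thread synchronizes on the new list while the first still holds the old one, so the method reports two threads inside the "synchronized" block at the same time. The cost of this approach is that the production class had to be modified with a hook, which is the kind of intrusion a dedicated framework aims to avoid.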

Thread Weaver is released as an open source project under the Apache license, and is available on Google Code. Many examples can be found in the initial documentation. If you have comments or questions, please see our discussion group. Happy testing!

Ed. Note: Post updated with corrected formatting.

By Alasdair Mackintosh, Software Engineering Team

MoinMoin's Google Summer of Code Wrap Up

Friday, October 9, 2009

We at the MoinMoin Wiki software development team had a wonderful time with our participation in Google Summer of Code™ 2009. We greatly enjoyed collaborating with our students, hacking Python and JavaScript code for the wiki engine. Thanks to Google's support, we had four student projects in total, and three of them were successfully completed:

Christopher Denter, whom I mentored, worked on making MoinMoin's modular storage code production-ready by adding an access control middleware. Christopher's work in this area made MoinMoin safer and more flexible. He also worked on a router middleware - think of it as a kind of a wiki "mount/fstab" - and a SQLAlchemy backend. Our users can now enjoy MoinMoin with MySQL, PostgreSQL, SQLite, etc. Christopher's work was done directly in the repo that will become the 2.0 release of MoinMoin.

Alexandre Martani, mentored by Bastian Blank, worked on a realtime collaborative wiki editor based on Google's mobwrite. Multiple people can now choose to edit the same wiki page at the same time and they all see each other's changes shortly after typing. We hope that we can merge his code into the MoinMoin 2.0 repository soon.

Dmitrijs Milajevs, mentored by Reimar Bauer, worked on groups and dictionary code with modular backends. You can now fetch group definitions from wiki pages or a wiki, and preparations have been made to make an LDAP group backend possible as part of future development. Dmitrijs also refactored the search code to get rid of the unmaintained xapwrap library and use the new xappy library. All his work has already been merged into the MoinMoin 1.9 main repo.

Thanks also to Alexander Schremmer for his contributions as a mentor. Unfortunately, his student's project did not work out, but in true community fashion he provided valuable help and feedback for the other students.

In case you're curious about when all this nice code will be released:

MoinMoin 1.9 will be released later in 2009 (likely in November). Please help us with beta testing, translating and generally making the release ready.

MoinMoin 2.0 will not be just 1.9 + 0.1, but a major rewrite of big parts of the code base. Right now, it's like a big construction site, so it'll naturally take some time until the release is ready, likely 2010 or 2011. We'd be happy to have your help with it; if you enjoy coding in Python, playing with new features, cleanly refactoring code and working with a fun team, then do join us to make MoinMoin an even better wiki. Check out the MoinMoin 2.0 page for more details.

Many thanks to all the students and mentors as well as everyone in the community who helped or supported the process. It was a very productive summer and we are greatly looking forward to continued work with our new contributors!