The Globus Alliance's First Google Summer of Code

Wednesday, January 14, 2009

The Globus Alliance is a community of organizations and individuals developing fundamental technologies behind the "Grid," which lets people share computing power, databases, instruments, and other on-line tools securely across corporate, institutional, and geographic boundaries without sacrificing local autonomy. Globus currently hosts more than 20 projects, actively developed by a community of more than 100 committers, and spanning a variety of technology concerns on grid systems.

This was the first year that we participated in Google Summer of Code™ and, despite being total newbies, we were fortunate to be given ten students to mentor. Overall, we couldn't be happier with how the summer turned out. Eight of our students made it through the program and, three months after the end of Summer of Code, most of the code they produced has either made it into the official Globus code repository or is in the process of being added. Most of the mentors feel that their students have become a part of the Globus community thanks to their participation in the program. One of our students has already been voted in as a Globus committer (Globus uses an open and meritocratic governance model, similar to Apache Jakarta's, where new committers are voted into projects based on their work).

Through their projects, our students addressed a variety of specific issues and needs across multiple fields, ranging from grid security to virtual machines. More specifically:

The Portal-based User Registration Service (PURSe) is a set of tools and Java APIs for constructing portal-based systems that automate user registration, the creation of PKI credentials, and subsequent credential management. A typical PURSe-based portal allows users to register via a Web page and then use a username and password to obtain X.509 proxy certificates. Mehran Ahsant, mentored by Rachana Ananthakrishnan, developed a standalone Credential Translation Service (CTS), integrated with PURSe, to provide grid users with other formats of security credentials such as SAML assertions and X.509 certificates. The CTS is a WS-Trust security token web service: it issues security tokens as defined by the WS-Trust specification and translates a token into another format when the recipient cannot understand the token's original format or syntax.

The Virtual Workspace Service, one of several services that make up the Globus Nimbus cloud toolkit, was only capable of using Xen as a virtualization backend. Michael Fenn, mentored by Kate Keahey, set things straight by refactoring the existing code to allow multiple backends, and implementing a KVM backend.

Globus GridFTP is a high-performance, secure, reliable data transfer protocol which generally assumes the existence of a high-performance parallel file system, a relatively expensive resource. On the other hand, FreeLoader is a storage system that aggregates idle storage space from workstations connected within a local area network to build a low-cost, yet high-performance data store. Is this a match made in heaven? Hesam Ghasemi, mentored by Raj Kettimuthu, thought so, and modified GridFTP so it could use FreeLoader as a backend, potentially reducing the cost and increasing the performance of GridFTP deployments.

AliEn is the Grid infrastructure used by scientists participating in the ALICE experiment at CERN. Artem Harutyunyan, mentored by Tim Freeman, developed a set of scripts on top of Globus Nimbus to dynamically deploy an entire AliEn Grid site, enabling 'one-click' deployment of all the services necessary to retrieve and execute ALICE jobs. Artem is still actively working on this project and has even submitted a paper on his work to the CHEP 2009 conference. Screenshots of ALICE jobs running on the University of Chicago's Nimbus science cloud can be seen here.

Globus GridFTP can help you move data fast. However, Mattias Lidman, mentored by John Bresnahan, thought this wasn't fast enough, so he developed a compression driver for the Globus XIO input/output library (which GridFTP depends on) that compresses and decompresses data as it passes through the driver. He even wrote a performance study (PDF) showing that his driver is, in fact, totally awesome.

The Globus GridShib project integrates a federated authorization infrastructure (Shibboleth) with Grid technology to provide attribute-based authorization for distributed scientific communities. Joana Matos Fonseca da Trindade, mentored by Tom Scavo, contributed to GridShib by implementing a Holder-of-Key Single Sign-On profile handler for the Shibboleth Identity Provider. Her contribution was completely integrated into the GridShib development and distribution framework, and Joana did such a great job that she was asked to join the GridShib project as a committer. More details on Joana's work can be found on the GridShib website and on the Globus wiki.

Swift is a system for the rapid and reliable specification, execution, and management of large-scale science and engineering workflows. One of its main components is SwiftScript, a simple scripting language that can be used to specify complex parallel computations. Milena Nikolic, mentored by Ben Clifford, improved the SwiftScript compiler by adding stronger type checking and type inference.

The OpenNebula virtual infrastructure engine, developed by collaborators of the Globus Alliance, can be used to dynamically deploy and re-allocate virtual machines on a pool of physical resources, but it lacks a "cloud-like" interface such as the one provided by Globus Nimbus. Nimbus, in turn, lacks the advanced resource management features provided by OpenNebula. William Voorsluys, mentored by yours truly, tackled this gap by working on integrating OpenNebula and Nimbus.

Many congratulations to all of our mentors and students for their tremendous success in our first Summer of Code!