Google Open Source Blog: March 2010

Posts from March 2010

Summer Student Stats: Degrees and Majors

Tuesday, March 30, 2010

The student application period for Google Summer of Code™ opened yesterday and many students are wondering if they should apply. We encourage students of all backgrounds to apply, as long as they meet our eligibility requirements: if you are enrolled in a college or university program as of April 26, 2010, you are eligible to participate.

From our FAQs:

“Google defines a student as an individual enrolled in or accepted into an accredited institution including (but not necessarily limited to) colleges, universities, masters programs, PhD programs and undergraduate programs.”

Many people don’t realize that we accept graduate students in addition to undergrads, when in fact graduate students make up ~36% of our student population.

Regardless of where you are in your academic career, you should consider applying!

Many people also assume that all of our students are studying computer science. While computer science is indeed the most popular area of study, we also have a significant number of students who focus on math, physics and engineering. What is surprising is that we also have some students who study biology, music, visual art, and a wide range of other fields. One interesting note about our students is that an impressive number of them are double and triple majors.

A variety of statistics about past Google Summer of Code students is available for those who are interested in learning more about what kind of students have participated in the program in the past. If you apply to this year’s program, you’ll have the chance to shape the statistics for the future! Good luck to all our student applicants!

By Ellen Ko, Open Source Team

Students, Apply Now for Google Summer of Code 2010!

Monday, March 29, 2010

Students, want to gain real world software engineering experience and get paid? We are now accepting applications for Google Summer of Code™ 2010, our global program to introduce students, ages 18 and over, to the wonderful world of Open Source development. For our sixth Google Summer of Code, students can choose from 150 Free and Open Source software projects, in technical areas as diverse as gaming, humanitarian efforts, and operating system design. All accepted students will be paired with a mentor from academia or industry and will receive coaching in all aspects of software development over the course of their three-month coding project. Successful students will receive a stipend of 5000 USD for their participation in the program.

Check out the program Frequently Asked Questions and the extensive set of resources for student applicants on the program wiki, then talk to your prospective mentors about your ideas. Each mentoring organization has provided an Ideas List to help you learn more about what the project needs and to get your creative juices flowing. You’ll also note that each organization has provided tags to help you better understand their technical focus areas, so if you’re looking for opportunities to, say, geek out on gaming or hack on networking, you can narrow the list of organizations based on various tags.

Our mentors are also very excited to hear from students who have their own plans for improving the projects’ code bases, so let their ideas inspire rather than constrain you. You can find knowledgeable folks on hand to answer questions in #gsoc on Freenode and on the program discussion list, or you can keep up with our announcements on various social networking sites.

We'll be accepting student applications through April 9, 2010 at 19:00 UTC. Best of luck to all of our student applicants, and get those applications going!

By Leslie Hawthorn, Open Source Team

GNOME Usability Hackfest

Wednesday, March 24, 2010

Google recently sponsored the GNOME Usability Hackfest, which took place in London. With over 30 GNOME design and usability experts attending on some days, it was an unusually large, exciting and dynamic event. As GNOME 3.0 is just around the corner, people took advantage of the opportunity to build bridges within the GNOME usability community, and re-think the desktop paradigm.

Highlights for me included:

GNOME 3.0 and Shell Design Planning by William Jon McCann and Seth Nickell
Talking accessibility with Willie Walker
Charline Poirier's Empathy Usability Report and Icon Usability Study
HIG Planning
Card sorting exercises to get a better understanding of how to organize applications and settings
Usability improvements for GTK+.
Discussions about how to improve Nautilus usability

In the last few days at the event, I spent a fair amount of time cleaning up the GNOME Usability Project Wiki so it is clearer and more straightforward. I also did a lot of coordination with the Dev8D conference organizers to arrange for GNOME speakers at their event, and made arrangements for Antonio Roberts from the Dev8D community to attend the GNOME Usability Hackfest and participate on Thursday the 25th.

The GNOME Usability team has posted a great many blog posts, articles, and photo highlights about the work done at this hackfest, and as more attendees post their notes, they will be added to the GNOME Planet blog aggregator. You can see my blog for more photos and a full report on the event. Many thanks to Google for sponsoring this event.

By Brian Cameron, GNOME Foundation Board Secretary

Training a Toy Elephant with Google Summer of Code

Monday, March 22, 2010

Google has redefined many things. It has redefined scalability. When the entire world was racing towards high performance computing, Google came up with MapReduce and the Google File System that allowed them to process the whole web in a matter of hours across thousands of cheap computers. With its education on MapReduce, with its contributions to Open Source in terms of code, infrastructure and innovative initiatives like the Google Summer of Code™, Google has taken openness to a whole new level. Through its dataliberation.org initiatives, Google also allows you to export your private data outside the Google server. Google also liberates public user data like the MapMaker annotations, which were exported within hours of the Chilean earthquake. When you are inspired by the technology, the data liberation and the Open Source, and then have two amazing years in Google Summer of Code, you end up with a great open tool like Apache Mahout.

I talked about the different algorithms in Mahout and was thrilled by the enthusiasm of the students there.

Mahout is an Apache Software Foundation project, which aims to create scalable machine-learning libraries using a variety of techniques including leveraging Apache Hadoop. Unlike other Open Source machine-learning libraries, Mahout was built with one thing in mind: the ability to scale over large data sets. We are not talking about the whole Internet here, just a small fraction of it, but large enough that processing it is nearly impossible on one machine. Mahout is becoming more and more relevant in a world where gigabytes and terabytes of data are coming into the hands of the public. The latest release of Mahout has really solid and scalable implementations of recommendation, clustering, classification, pattern mining, and genetic algorithms.

Two years ago, I had a chance to join the project along with Deneche Abdel Hakim and David Hall when we were selected in the Google Summer of Code program. With help from our mentors and other committers on Mahout, we were able to contribute a lot of algorithms to the project. After two amazing years in Google Summer of Code and on the verge of the third one, the project looks like it's about to break free. We have more contributors coming in, more algorithms, and improvements in quality and performance. Mahout is also being made a top-level project under Apache. The latest release of Mahout contains the Colt high performance collections. This has given a great boost to the performance of the core data structures. Mahout can create vectors from all of the articles in the English Wikipedia in under an hour on an 8-node Hadoop cluster. This is just the beginning, as more interesting things are being planned for future releases and I see a big role for Google Summer of Code students in them. Mahout is a great platform for students and professors in universities to use for their research work in machine learning to get results quickly for large data sets.

Recently, I went to the India Hadoop Summit at Bangalore, India to help spread awareness of Apache Mahout and Google Summer of Code. I had the good fortune of presenting Mahout in the un-conference to a big group of cloud computing lovers from India.

Many people, including cloud computing adopters and students, were hearing about the Google Summer of Code program for the very first time, and I am happy that I helped spread awareness of it.

Mahout has grown, and so have I, from a Google Summer of Code student to a committer at Mahout, to a Googler and hopefully to being a mentor this year. I am also co-authoring a book on Mahout with Manning Publications. Google Summer of Code has opened up many doors for me. It helped me hone my coding skills, helped me get in touch with cutting edge research work, and helped me find great peers in the Open Source community whose help I will always cherish. Many thanks to Google and the Google Summer of Code program for giving me this opportunity and for helping thousands of students and hundreds of Open Source projects worldwide and for ensuring that the world and its information stays open.

You can find out more about the Mahout project and the usages of various algorithms on the Mahout wiki. If you are a student interested in implementing a data-mining or a machine-learning algorithm, Mahout is the right place to be this summer. Take a look at our GSoC project ideas and please come and discuss your proposal with us on the Mahout mailing list.

Pictures courtesy of Dave Nielson, Co-Founder, Cloudcamp

By Robin Anil, Google Summer of Code Student (2008, 2009) & Apache Committer

Google Summer of Code Meetups in Sofia and Strasbourg

Friday, March 19, 2010

Back at the end of January, when Google first announced that Google Summer of Code™ was on for 2010, I happened to read the mail in the company of a group of Computer Science M.Sc. students. I quickly shared the news with them but rather than cheers of enthusiasm I was surprised that all I got in return were puzzled stares. It turned out that most of the students there hadn't heard of the program before and those that had, didn't really know what it was all about.

I have always thought that Computer Science students were particularly lucky to be able to participate in Open Source. Most of the time newly graduated students would have a hard time finding a decent job because of their lack of experience, which makes experience itself hard to accumulate. Open Source offers an easy way out of this: no project is going to refuse a patch simply because you don't have the necessary entries in your CV. Of course, many would say that getting into an Open Source project is not really that easy, since the learning curve in most of the popular projects is often quite steep and could prove discouraging.

This is exactly why Google Summer of Code is a unique program. A hundred and fifty of the world's greatest FOSS projects get organized by proposing ideas that students know are within their reach. They also allocate mentors to guide the work of the students, and their whole communities follow and comment on the projects ... And all this happens while students are actually paid for their work!

So getting the puzzled stares from CS students after mentioning the program was like looking at people who were preparing to spend a cold night in front of a warm house, because they didn't know there was a key under the doormat.

After sharing this thought with a few other people that had been mentoring for SIP Communicator, we decided we definitely needed to make sure everyone knew what Google Summer of Code is and, more importantly, how it works. We therefore decided to organize a couple of quick information sessions in universities that our mentors were somehow related to: the University of Strasbourg, France (where the session was eventually split in two), and Sofia University in Bulgaria. We were particularly lucky to also get the help of Shteryana Shopova from FreeBSD who agreed to join in for the Sofia session and tell us about her experience as both a student and a mentor.

Both universities were particularly helpful in making room reservations and advertising the meetings to the potentially interested students. I would also like to thank Vladimir Vassilev, Alexander Todorov, and Julien Montavont for their help with the organization!

Both sessions went quite well and attracted a decent number of students. Questions were mostly related to the student selection process, whether or not one could participate with a project of their own, where the work happens, and how one communicates with their mentor and community. I guess this is one of the advantages of attending live sessions: one gets to ask as many Frequently Asked Questions as they want ;)

The Strasbourg Sessions

We held two meetings there in order to make it easier for students from different campuses to attend. At both sessions we had Vincent Lucas, Romain Kuntz, Julien Montavont and myself (Emil Ivov), all mentors from SIP Communicator's Google Summer of Code participation in 2007, 2008, and 2009. (Unfortunately, we currently only have photos from the first meeting.)

The Sofia Sessions

We already mentioned Shteryana Shopova from FreeBSD (GSoC student in 2005 and 2006, and mentor in 2007). We also had Damian Minkov from SIP Communicator (2007, 2008, and 2009), as well as Vladimir Vassilev and Alexander Todorov.

By Emil Ivov, SIP Communicator Project

Meet Your Mentors: Announcing Accepted Projects for Our Sixth Google Summer of Code

Thursday, March 18, 2010

We've just announced the list of accepted mentoring organizations for Google Summer of Code™ 2010. Congratulations to all of our future mentors!

After reviewing just over 365 applications, we finally narrowed our selection to 150 Free and Open Source projects. The accepted projects are now busy adding details about their participation in Google Summer of Code to the program website, but you can already take a look at the list of accepted projects and their Ideas Lists.

As with every year, we had to make some very tough decisions in 2010. We simply weren’t able to accept every great project that applied. Once again, we are also bidding fond farewell to some past participants in favor of bringing new projects into the program. We greatly appreciate everything they have contributed to the program in past years and hope they will remain actively involved in our community. We want to thank everyone for their applications and would encourage those who were not accepted to apply for future instances of the program.

What Happens Now?

No doubt many would-be Google Summer of Code students are wondering what their next steps should be. You'll have about 1.5 weeks to learn about each participating organization before student applications open on March 29, 2010. Use this time to meet your potential mentors and to discuss how you'd like to contribute to their organization, especially your ideas for improving their code base. Keep an eye on the program mailing lists, as we'll post notes about additional resources for learning about our mentoring organizations there.

Most organizations have provided individual points of contact for each project suggestion, and you can always propose ideas and look for guidance on project mailing lists or forums, as well as on IRC. You can also look for your potential mentors in the program IRC channel, #gsoc on Freenode.

Remember, some of our most successful proposals come from ideas suggested by the students themselves, so take advantage of this time to explore what areas of development most excite you. You can then find people to help you brainstorm about your initial thoughts and further refine them. Don't be nervous about how your ideas will be received; take some time to think through what you'd like to accomplish, propose a plan of action, then work with your potential mentors to iterate, iterate, iterate.

Congratulations to all of our future mentors! We look forward to working with all of you this year, and to working with many of you once again.

By Leslie Hawthorn, Open Source Team

Leuven, Belgium GSoC Infosession

Monday, March 15, 2010

On the 9th of March, Google Summer of Code™ veterans Vincent Verhoeven (student for both KDE and Thousand Parsec), Ruben Vermeersch (K.U. Leuven researcher and GNOME Google Summer of Code admin) and Bram Luyten (@mire co-founder and mentor for DSpace) gave a presentation about the Google Summer of Code 2010 program to an audience of interested students.

The Google Summer of Code schedule is quite challenging for Belgian students because of the large overlaps between the program and their examinations. However, the presenters made it clear that with careful planning in the application, and transparent communication with mentors, successful participation is definitely possible. As an added bonus, if students can find a mentor in a company, participation in Google Summer of Code can be counted as an internship for some of the master's programs at K.U. Leuven, which adds even more value on top of the stipends.

For many of the attending students, it sounded too good to be true, as we saw true stares of disbelief when the stipend of 5000 USD in exchange for a few months of programming was announced. Vincent's testimonials of his experiences as a student for the KDE and Thousand Parsec projects, and Ruben's recruiting talk for GNOME "Become the Next GNOME Rockstar," convinced them in the end.

The slides, a recording of the event (in Dutch) and additional information are available.

By Bram Luyten, Head of Sales and Marketing, @mire

Living La Vida LibrePlanet

Thursday, March 11, 2010

The LibrePlanet Conference will be held next week, March 19th-21st, in Cambridge, Massachusetts. The Google Open Source Programs Office's Leslie Hawthorn will be participating in the lively discussions about software that the user can share, modify and distribute. On Sunday, Leslie will be talking about Free Software Mentoring at 11 AM as part of the Women's Caucus, which is dedicated to increasing the participation of women in free software. Leslie participated in the last Women's Caucus and we're excited to see the community continue the great work that was started there. Also at the Women's Caucus, you can check out Google Summer of Code™ alumna Selena Deckelmann's speaker training workshop, along with her lightning talk on skillful soldering.

Come by and learn about practical steps in free software advocacy!

By Ellen Ko, Open Source Team

RE2: a principled approach to regular expression matching

Regular expressions are one of computer science's shining examples of the benefits of good computer science theory. They were originally developed by theorists as a way to describe infinite sets, but Ken Thompson introduced them to programmers as a way to describe text patterns in his implementation of the text editor QED for CTSS. Dennis Ritchie followed suit in his own implementation of QED, for GE-TSS. Thompson and Ritchie would go on to create Unix, and they brought regular expressions with them. By the late 1970s, regular expressions were a key feature of the Unix landscape, in tools such as ed, sed, grep, egrep, awk, and lex. They remain a key feature of the open source landscape today, in those venerable Unix tools and at the core of new languages like Perl, Python, and JavaScript.

The feature-rich regular expression implementations of today are based on a backtracking search with a potential for exponential run time and unbounded stack usage. At Google, we use regular expressions as part of the interface to many external and internal systems, including Code Search, Sawzall, and Bigtable. Those systems process large amounts of data; exponential run time would be a serious problem. On a more practical note, these are multithreaded C++ programs with fixed-size stacks: the unbounded stack usage in typical regular expression implementations leads to stack overflows and server crashes. To solve both problems, we've built a new regular expression engine, called RE2, which is based on automata theory and guarantees that searches complete in linear time with respect to the size of the input and in a fixed amount of stack space.
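To make the linear-time idea concrete, here is a minimal sketch in Python of automaton-style matching. It is not RE2's code or API, and the toy pattern syntax and the names `compile_pattern` and `full_match` are assumptions made up for this illustration. Instead of trying one path at a time and backtracking, it tracks the set of all pattern positions that could be active after each input character, so the work per character is bounded by the pattern length and no recursion is needed.

```python
def compile_pattern(pattern):
    """Parse a toy pattern into (char, op) items, where op is '', '*' or '?'.

    Supported syntax (an assumption of this sketch, far smaller than RE2's):
    literal characters, '.' for any character, and the postfix operators
    '*', '+' and '?' applied to the single preceding character.
    'x+' is expanded to 'x' followed by 'x*'.
    """
    items = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        op = ""
        if i + 1 < len(pattern) and pattern[i + 1] in "*+?":
            op = pattern[i + 1]
            i += 1
        if op == "+":
            items.append((ch, ""))
            items.append((ch, "*"))
        else:
            items.append((ch, op))
        i += 1
    return items


def _closure(items, states):
    """Add every position reachable without consuming input.

    Position s means "about to match item s"; items marked '*' or '?'
    may be skipped. An explicit work list is used instead of recursion.
    """
    result = set()
    pending = list(states)
    while pending:
        s = pending.pop()
        if s in result:
            continue
        result.add(s)
        if s < len(items) and items[s][1] in ("*", "?"):
            pending.append(s + 1)
    return result


def full_match(pattern, text):
    """Return True if the whole text matches the toy pattern."""
    items = compile_pattern(pattern)
    accept = len(items)
    current = _closure(items, {0})
    for c in text:
        advanced = set()
        for s in current:
            if s == accept:
                continue  # nothing left in the pattern to consume c
            ch, op = items[s]
            if ch == "." or ch == c:
                # A starred item may match again; anything else moves on.
                advanced.add(s if op == "*" else s + 1)
        current = _closure(items, advanced)
        if not current:
            return False  # no position survived this character
    return accept in current
```

A pattern such as `"a?" * 25 + "a" * 25` matched against 25 `a` characters is a classic worst case for backtracking matchers. With the state-set approach the same input costs at most one pass over the pattern positions per character, which is the property the paragraph above describes.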

Today, we released RE2 as an open source project. It's a mostly drop-in replacement for PCRE's C++ bindings and is available under a BSD-style license. See the RE2 project page for details.

By Russ Cox, Software Engineering Team

Google Summer of Code: Applications Now Open for Mentoring Organizations

Monday, March 8, 2010

Looking for new contributors and fresh perspectives for your open source software project? Through the Google Summer of Code™ program, we fund students worldwide to work with mentors from the FLOSS community on a three-month coding project. Over the past five years, we've successfully paired nearly 3,400 students with more than 3,000 mentors from backgrounds spanning industry to academia, with some spectacular results: more than 8 million lines of source code produced and over $20M in funding in support of open source development. We're particularly excited by the social ties our students form through the course of the program. We've connected people in more than 100 countries, and hope to bring people from even more places into the Google Summer of Code community this year. We're looking forward to our sixth year and welcoming another group of 1,000 student developers to the program.

We're now accepting applications from open source projects that wish to act as mentoring organizations. We'll be taking mentoring organization applications until Friday, March 12th at 23:00 UTC. Our list of approved organizations will be published on the 2010 Google Summer of Code site on March 18th. Interested students will then have several days to discuss their ideas with the accepted organizations before student applications open on March 29th.

Check out our Frequently Asked Questions page for more details and a preview of the application. And remember, if you have any questions, you can always find us in the Google Summer of Code Discussion group or in #gsoc on Freenode. Best of luck to all of our applicants!

By Leslie Hawthorn, Open Source Team

Make Contact with Google at SIGCSE 2010

Friday, March 5, 2010

Next week several Googlers will be attending and presenting at the 41st ACM Technical Symposium on Computer Science Education (SIGCSE 2010). From March 10-13, Leslie Hawthorn and Cat Allman from the Open Source Programs Office will be in Milwaukee, WI, USA to talk about Google’s open source student programs, Google Summer of Code™ and the Google Highly Open Participation Contest. Check out Google’s vendor session on Friday to hear more from Leslie and Cat. Leslie will also be speaking at a roundtable and panel discussion with Hal Abelson from the Google App Inventor team at the Humanitarian FOSS Symposium on Wednesday.

If you are interested in learning more about Google’s activities in computer science education, make sure to attend some of the talks we have scheduled or drop by the Google booth!

By Ellen Ko, Open Source Team

Low-Impact Operating System Tracing

The Google Open Source Team has the privilege of funding some really great projects in the Open Source space. Mathieu Desnoyers, a student at Ecole Polytechnique, recently defended his Ph.D. thesis, which we helped to fund. The topic of his thesis was "Low-Impact Operating System Tracing."

He created two open source projects as part of his work: Linux Trace Toolkit Next Generation (LTTng), an LGPLv2.1/GPLv2 tracer for the Linux kernel; and Userspace RCU library (liburcu), a highly-scalable user-space synchronization library, distributed under the LGPLv2.1 license.

Mathieu was kind enough to send us this summary of his research:

Computer systems, both at the hardware and software levels, are becoming increasingly complex. Tracing is the key to solving some or all of this increasing complexity. In the case of Linux, used in a large range of applications, from small embedded devices to high-end servers, the size of the operating system kernel is increasing, libraries are being added, and major redesign of existing software is required to benefit from multi-core architectures. As a result, the software development industry and individual developers are facing problems whose resolution requires an understanding of the interaction between applications and all components of an operating system.

In my thesis, I propose the LTTng (Linux Trace Toolkit next generation) tracer as an answer to the industry and open source community tracing needs. The low-intrusiveness of the tracer is a key aspect of its usefulness because we need to be able to reproduce problems occurring in normal conditions. In some cases, users leave tracers active at all times in production, which makes the tracer overhead definitely critical. Our approach involves the design of synchronization primitives that meet the low-impact requirements. The linearly scalable and wait-free RCU (Read-Copy Update) synchronization mechanism used by the LTTng tracer fulfills these requirements with respect to data read. A custom-made buffer synchronization scheme is proposed to extract tracing data while preserving linear scalability and wait-free characteristics.

By measuring the LTTng impact, I demonstrate that it is possible to create a tracer that satisfies all the following characteristics: low latency, deterministic real-time impact (wait-free), small impact on operating system throughput and linear scalability with the number of cores. Experiments on various architectures show that this tracer is portable.

I propose a general model for superscalar multi-core systems with weakly-ordered memory accesses to perform formal verification of the RCU correctness and wait-free guarantees by model-checking. The LTTng buffering scheme is also formally verified for safety and progress. Formal verification demonstrates that these algorithms allow reentrancy from multiple execution contexts, ranging from standard thread to non-maskable interrupt handlers, allowing a wide instrumentation coverage of the operating system.
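For readers unfamiliar with the RCU (Read-Copy Update) mechanism mentioned in the summary above, the following Python sketch shows only the basic pattern: readers take a reference to an immutable snapshot without locking, while a writer copies the data, changes the copy and publishes it with a single reference assignment. It is not liburcu or LTTng code. The class name `RcuCell` is hypothetical, the sketch assumes CPython, where rebinding an attribute is effectively atomic, and it leans on the garbage collector where a real RCU implementation would wait for a grace period before reclaiming old data.

```python
import threading
from types import MappingProxyType


class RcuCell:
    """Hypothetical holder for a read-mostly mapping, updated RCU-style."""

    def __init__(self, initial):
        # Readers only ever see read-only views of a private dict.
        self._snapshot = MappingProxyType(dict(initial))
        # Writers serialize among themselves; readers never take this lock.
        self._writer_lock = threading.Lock()

    def read(self):
        # Read side: a single reference load, with no lock and no waiting.
        return self._snapshot

    def update(self, key, value):
        with self._writer_lock:
            new = dict(self._snapshot)  # copy the current version
            new[key] = value            # update the private copy
            # Publish: readers arriving after this line see the new version,
            # readers still holding the old snapshot keep a consistent view.
            self._snapshot = MappingProxyType(new)


cell = RcuCell({"events": 0})
before = cell.read()          # a reader that started before the update
cell.update("events", 1)
after = cell.read()           # a reader that started after the update
```

In this sketch `before["events"]` stays 0 while `after["events"]` is 1, which illustrates why readers need no lock: an old version is never modified in place, only replaced.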

Many thanks to Mathieu for sending us this report. You can download the full dissertation for more details.

By Carol Smith, Open Source Team

Google and the Tor Project

Thursday, March 4, 2010

When it comes to code, Google's support has made a big difference to the Tor Project. Providing privacy and helping to circumvent censorship online is a challenge that keeps our software developers and volunteers very busy. The Google Summer of Code™ brings students and mentors in the open source community together to write code for three months every year. A lot of coding got done in a few months in 2009, and Tor was lucky to get a group of students who kept on working past the summer months to improve existing projects and support users. Tor also works on Libevent with Google.

All of these changes in software are very exciting, but who is it all for? Why is anonymity online so important? Companies like Google have privacy and opt-out policies, but not everyone has this stance. Corporations, nations, criminal organizations and individuals want your information. Companies collect information on your web browsing habits and sell it or are sloppy when it comes to protecting it from identity thieves. Others can threaten lives, from repressive nations tracking down outspoken journalists, to abusive spouses or stalkers who want to find out where their victims are hiding; from enemy military forces trying to find a communications link, to criminals who know when law enforcement is watching online.

Political upheaval sparks protests and renewed efforts to control the flow of information online. Interest in censorship circumvention also rises. In 2009, use of Tor increased, as users tried to get around national firewalls during the elections in Iran, and after the introduction of national Internet filters in other countries.

In times of relative political stability, governments routinely filter out international news outlets, information on reproductive health, religion, human rights and other topics deemed unfit. Women blogging about things considered mundane elsewhere, like being forbidden to drive or shop alone, are harassed by authorities. On the one hand, technology has made it easier to crack down on dissent, but the right technology can influence policy in good ways. In Mauritania, the use of censorship circumvention software after 2005 became widespread enough to prompt the government to stop filtering, since it was becoming a waste of time.

Even people living in countries where free speech is protected by law need anonymity for political activities. People blogging about political views that differ from the prevailing attitudes in a small community may lose a job or face boycotts if they run a business. In a company town, writing about the misdeeds of the company that employs your neighbors may be dangerous. Telling people about corruption could lead to harassment from guilty officials.

When someone finds the courage to leave an abusive relationship, the support of victims' advocates is vital. The Internet can help a survivor find counseling, shelter, and encouragement from people who have gone through the same process. Sadly, stalkers are also using technology to find their victims. Abusers monitor web browsers to see if a victim is planning to leave. Information about a shelter's location can be found in email headers, forcing abuse survivors to relocate. According to the U.S. Bureau of Justice Statistics, over one in four people who are stalked experience some sort of cyberstalking. Though some software in a stalker's toolkit is installed on a home computer, IP addresses can reveal which internet cafe or library someone uses to get online. Even if you don't have a stalker, hiding your IP address can be a good idea. Kids and adults alike are advised not to tell strangers where they live, but an IP address can reveal it for them.

Sting operations fail if criminals can tell that the police are connecting to message boards and chat from a government network. The information disappears. Insurgents may be looking for soldiers connecting to their defense department's computers back home. Anonymous tip lines are not so anonymous if someone telling authorities about crime is the only person in the neighborhood connecting to a government website. Without anonymity, going after organized crime can be dangerous to officers and their families.

Some companies do not reveal how much they know about their customers, or who sees the information. Some Internet Service Providers feel entitled to sell data collected from their subscribers to marketers. Though they claim that the information is not tied to any particular users, it is easy to find someone based on their search history. Information about visits to banking websites, searches for details on pre-existing health conditions, or other sensitive online activity could be damaging in the wrong hands, whether made available through carelessness or commercial interest.

Privacy online can protect people offline whether they are organizing protests, covering the news, blowing the whistle on threats to public health, or just blogging about daily life. In the "real world" assaults on privacy like peeking in windows, opening mail, or breaking and entering are obvious crimes. In the online world, however, assaults on privacy are subtle and unyielding. These threats to your health, your wealth and your well-being have no "opt-out" button. They have no "scrub my data" option. Your online activities, e-mails, bank transactions and everything else can be used to trace where you are and who you are. Using software like Tor gives ordinary citizens more choice about the information they reveal online.

For more information about online privacy and circumventing internet censorship, visit the Tor Project's website.

By The Tor Project