Highlights for me included:
In the last few days at the event, I spent a fair amount of time cleaning up the GNOME Usability Project Wiki so that it is clearer and more straightforward. I also did a lot of coordination with the Dev8D conference organizers to arrange for GNOME speakers at their event, and made arrangements for Antonio Roberts from the Dev8D community to attend the GNOME Usability Hackfest and participate on Thursday the 25th.
The GNOME Usability team has posted many blog posts, articles, and photo highlights about the work done at this hackfest, and as more attendees post their notes, those posts will appear on the GNOME Planet blog aggregator. You can see my blog for more photos and a full report on the event. Many thanks to Google for sponsoring this event.
Mahout is an Apache Software Foundation project that aims to create scalable machine-learning libraries using a variety of techniques, including building on Apache Hadoop. Unlike other Open Source machine-learning libraries, Mahout was built with one goal in mind: the ability to scale to large datasets. We are not talking about the whole Internet here, just a small fraction of it, but one large enough that processing it on a single machine is nearly impossible. Mahout is becoming more and more relevant in a world where gigabytes and terabytes of data are coming into the hands of the public. The latest release of Mahout has solid, scalable implementations of recommendation, clustering, classification, pattern mining, and genetic algorithms.
Two years ago, I had a chance to join the project along with Deneche Abdel Hakim and David Hall when we were selected for the Google Summer of Code program. With help from our mentors and other Mahout committers, we were able to contribute many algorithms to the project. After two amazing years in Google Summer of Code, and on the verge of a third, the project looks like it's about to break free. We have more contributors coming in, more algorithms, and improvements in quality and performance. Mahout is also becoming a top-level Apache project. The latest release of Mahout includes the high-performance Colt collections, which have greatly boosted the performance of the core data structures. Mahout can create vectors from all of the articles in the English Wikipedia in under an hour on an 8-node Hadoop cluster. This is just the beginning: more interesting things are planned for future releases, and I see a big role for Summer of Code students in them. Mahout is a great platform for university students and professors to use in their machine-learning research to get results quickly on large datasets.
Recently, I went to the India Hadoop Summit in Bangalore, India, to help spread awareness of Apache Mahout and Google Summer of Code. I had the good fortune of presenting Mahout at the un-conference to a big group of cloud computing enthusiasts from India.
Mahout has grown, and so have I: from a Google Summer of Code student to a Mahout committer, to a Googler, and hopefully to a mentor this year. I am also co-authoring a book on Mahout with Manning Publications. Google Summer of Code has opened up many doors for me. It helped me hone my coding skills, get in touch with cutting-edge research, and find great peers in the Open Source community whose help I will always cherish. Many thanks to Google and the Google Summer of Code program for giving me this opportunity, for helping thousands of students and hundreds of Open Source projects worldwide, and for ensuring that the world and its information stay open.
You can find out more about the Mahout project and the uses of its various algorithms on the Mahout wiki. If you are a student interested in implementing a data-mining or machine-learning algorithm, Mahout is the right place to be this summer. Take a look at our GSoC project ideas, and please come and discuss your proposal with us on the Mahout mailing list.
The LibrePlanet Conference will be held next week, March 19th-21st, in Cambridge, Massachusetts. Leslie Hawthorn of the Google Open Source Programs Office will be participating in the lively discussions about software that users can share, modify, and distribute. On Sunday at 11 AM, Leslie will be talking about Free Software Mentoring as part of the Women's Caucus, which is dedicated to increasing the participation of women in free software. Leslie participated in the last Women's Caucus, and we're excited to see the community continue the great work that was started there. Also at the Women's Caucus, you can check out Google Summer of Code™ alumna Selena Deckelmann's speaker training workshop, along with her lightning talk on skillful soldering.
Come by and learn about practical steps in free software advocacy!
Computer systems are becoming increasingly complex at both the hardware and software levels. Tracing is a key tool for coping with this growing complexity. Linux is used in a wide range of applications, from small embedded devices to high-end servers. Its kernel keeps growing, libraries keep being added, and existing software needs major redesign to benefit from multi-core architectures. As a result, the software development industry and individual developers face problems whose resolution requires an understanding of the interactions between applications and all components of the operating system. In my thesis, I propose the LTTng (Linux Trace Toolkit next generation) tracer as an answer to the tracing needs of industry and the open source community. The tracer's low intrusiveness is key to its usefulness, because we need to be able to reproduce problems that occur under normal conditions. In some cases, users leave tracers active at all times in production, which makes tracer overhead critical. Our approach involves designing synchronization primitives that meet these low-impact requirements. The linearly scalable and wait-free RCU (Read-Copy Update) synchronization mechanism used by the LTTng tracer fulfills these requirements for data reads. A custom buffer synchronization scheme is proposed to extract tracing data while preserving linear scalability and wait-free characteristics. By measuring LTTng's impact, I demonstrate that it is possible to create a tracer that satisfies all of the following characteristics: low latency, deterministic real-time impact (wait-free), small impact on operating system throughput, and linear scalability with the number of cores.
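As an illustration of why RCU reads suit a tracer, here is a minimal sketch of the RCU publish/read pattern. It is written with C11 atomics rather than the kernel's own `rcu_dereference()`/`rcu_assign_pointer()` primitives, and the `struct config` record and function names are hypothetical. A reader performs a single acquire load with no lock and no retry, so its path is wait-free. The updater copies the current version, modifies the copy, and publishes it with a release store. Safely reclaiming the old version requires waiting for a grace period, which this sketch deliberately leaves out.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical configuration record consulted on a hot tracing path. */
struct config {
    int sample_rate;
    int enabled;
};

/* Shared pointer; static zero-initialisation leaves it NULL. */
static _Atomic(struct config *) current_config;

/* Read side (analogous to rcu_dereference): one acquire load, no lock,
 * no retry loop. The acquire pairs with the updater's release store, so
 * the fields of the version we see are fully initialised. */
static int read_sample_rate(void)
{
    struct config *c = atomic_load_explicit(&current_config, memory_order_acquire);
    return c ? c->sample_rate : -1;
}

/* Update side (analogous to rcu_assign_pointer): copy, update, publish.
 * Assumes a single updater (or updaters serialised by a lock), as in RCU.
 * Returns the previous version, which must NOT be freed or reused until
 * a grace period has elapsed; that step is not implemented here. */
static struct config *update_sample_rate(struct config *fresh, int rate)
{
    struct config *old = atomic_load_explicit(&current_config, memory_order_relaxed);
    if (old)
        *fresh = *old;          /* copy the current version */
    else
        fresh->enabled = 1;     /* first version: choose a default */
    fresh->sample_rate = rate;  /* modify the private copy */
    atomic_store_explicit(&current_config, fresh, memory_order_release); /* publish */
    return old;
}
```

The updater returns the old version instead of freeing it. A real implementation would defer reclamation with a grace-period mechanism such as the kernel's `synchronize_rcu()` or `call_rcu()`.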
Experiments on various architectures show that this tracer is portable. I propose a general model of superscalar multi-core systems with weakly-ordered memory accesses, which is used to formally verify the correctness and wait-free guarantees of RCU by model checking. The LTTng buffering scheme is also formally verified for safety and progress. Formal verification demonstrates that these algorithms allow reentrancy from multiple execution contexts, ranging from standard threads to non-maskable interrupt (NMI) handlers, which allows wide instrumentation coverage of the operating system.
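The reentrancy property can be illustrated with a much-simplified reserve/commit buffer. This is not LTTng's actual buffering algorithm. The sub-buffer layout, the fixed-size slots, and the function names are assumptions made for this sketch. A writer claims a slot with a single atomic fetch-and-add, which is wait-free on hardware that implements it natively. It then fills the slot and increments a commit counter. The reader consumes the sub-buffer only once every successful reservation has been committed. Suppose a nested writer, such as an NMI handler, fires between another writer's reserve and commit. It simply receives the next slot, so nesting cannot corrupt a record that is still in flight.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8  /* hypothetical capacity: fixed-size records per sub-buffer */

/* One sub-buffer of a hypothetical per-CPU trace buffer. */
struct subbuf {
    atomic_size_t reserved;   /* slots handed out so far (may exceed SLOTS) */
    atomic_size_t committed;  /* slots whose payload is fully written */
    uint64_t data[SLOTS];     /* event payloads, e.g. timestamps */
};

/* Reserve one slot with a single fetch-and-add: no lock, no retry loop.
 * A writer that interrupts another writer gets a different slot.
 * Returns the slot index, or -1 if the sub-buffer is full (event lost). */
static long reserve_slot(struct subbuf *sb)
{
    size_t slot = atomic_fetch_add_explicit(&sb->reserved, 1, memory_order_relaxed);
    return slot < SLOTS ? (long)slot : -1;
}

/* Fill a slot obtained from a successful reserve_slot(), then commit it.
 * The release increment publishes the payload to an acquiring reader. */
static void write_commit(struct subbuf *sb, long slot, uint64_t payload)
{
    sb->data[slot] = payload;
    atomic_fetch_add_explicit(&sb->committed, 1, memory_order_release);
}

/* Reader: number of leading slots safe to consume, or 0 while any
 * successful reservation is still uncommitted. Loading `committed`
 * (acquire) before `reserved` guarantees that the reservations behind
 * every counted commit are visible, so a match means all are complete. */
static size_t readable_slots(struct subbuf *sb)
{
    size_t c = atomic_load_explicit(&sb->committed, memory_order_acquire);
    size_t r = atomic_load_explicit(&sb->reserved, memory_order_relaxed);
    size_t valid = r < SLOTS ? r : SLOTS;  /* successful reservations */
    return c == valid ? valid : 0;
}
```

For example, a thread might reserve slot 0 and then be interrupted by a handler that reserves, fills, and commits slot 1. When the thread resumes, `readable_slots()` still reports 0. Once the thread commits slot 0, it returns 2.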