For the third time, Twitter had the opportunity to participate in Google Summer of Code (GSoC), and we wanted to share news on the resulting open source activities. Unlike many GSoC participating organizations that focus on a single ecosystem, Twitter has a variety of projects that span multiple programming languages and communities. They include:
Use zero-copy read path in @ApacheParquet
Sunyu Duan worked with mentors Julien Le Dem (@J_) and Gera Shegalov (@gerashegalov) on improving performance in Parquet by using the new ByteBuffer based APIs in Hadoop. As a result of the work over the summer, performance has improved up to 40% based on initial testing and the work will make its way into the next Parquet release.
A pluggable algorithm to choose next EventLoop in Netty
Jakob Buchgraber worked with mentor Norman Maurer (@normanmaurer) to add pluggable algorithm support to Netty’s event loop (see pull request). At the start of the summer when a new EventLoop was needed to register a Channel, EventLoopGroup implementations used a round-robin like algorithm to choose the next EventLoop. This was challenging because different events may become more busy than others over time, hence the need for Jakob’s project to support pluggable algorithms to increase performance.
Various compression codecs for Netty
Idel Pivnitskiy (@pivnitskiy) worked with mentor Trustin Lee (@trustin) to add multiple compression codes (LZ4, FastLZ and BZip2) to the Netty project. Compression codecs will allow cutting traffic and creating applications, which are able to transfer large amounts of data even more effectively and quickly.
Android Support For Pants
Mateo Rodriguez (@mateornaut) added Android support to the Pants build system (see commits) so Pants can build Android applications (APKs) on top of the many other languages and tools it supports.
A pure ZooKeeper client for Finagle
Pierre-Antoine Ganaye (@pa-ganaye) was mentored by Evan Meagher (@evanm) to add a pure Apache ZooKeeper client to Finagle to improve performance (see project).
An SMTP client for Finagle
Lera Dymbitska (@suncelesta) worked with mentors Selvin George (@selvin) and Travis Brown (@travisbrown) to add SMTP protocol support to Finagle to improve performance (see pull request). Finagle strives to provide fully asynchronous protocol support so baking in SMTP support was required versus using third party libraries such as javamail and commons-email which are synchronous by design.
Analyze Wikipedia using Cassovary
Szymon Matejczyk (@szymonmatejczyk) worked with mentors Pankaj Gupta (@pankaj) and Ajeet Grewal (@ajeet) to enable Cassovary to analyze Wikipedia data. The result of this work improved the performance of Cassovary when dealing with large graphs. See the commits associated with the project to see how it was done.
We really enjoyed the opportunity to take part in GSoC again this year. Thanks again to our seven students, mentors and Google for the program. We hope to participate again next summer.
Chris Aniszczyk, Organization Administrator, Twitter