Improving Freenet's Performance

Friday, May 29, 2009

The Free Network project is the community that creates and maintains Freenet, free software that allows you publish and obtain information on the Internet without fear of censorship by means of a decentralized, anonymous network. Since version 0.7 , the software has had built-in support for downloading and uploading large files. These are long-term downloads, which persist between restarts of the node. This support has improved performance and usability, but it has also meant that when lots of downloads are going on at the same time, Freenet uses a lot of memory, takes a long time to complete the startup process, and crashes if you queue too many downloads. By storing the current progress of uploads and downloads in's open source object database (= a file on disk) rather than in memory, Freenet's memory usage can be greatly reduced, the end-user doesn't need to worry about running out of memory, we can have an unlimited number of uploads and tens of gigs of downloads, and so on.

To begin at the beginning, Freenet divides all files into 32KB blocks (called CHKs), which are each fetched and decrypted separately. Then we have a layer of redundancy, and various complexities surrounding putting files together and putting in-Freenet websites together, which makes up the client layer. Before the db4o branch, uploads were persistent, but downloads were restarted from scratch after every restart, pulling huge numbers of blocks from the datastore (on-disk cache). Worse, memory usage was rather large if you had any significant number of downloads on the queue. 

The db4o project puts the client layer (persistent downloads and uploads) into a database (db4o). I had initially hoped that this would be a relatively quick project, which shows how much I knew about databases then! We decided to use db4o in a fairly low-level way, specifically to minimize memory usage. We had heard from testimonials that some embedded applications had done this, but unfortunately this is not really the way that db4o is usually used, which caused some complications. Overall, the project took one developer most of a year, the final diff was over 46K lines of code covering 320 files, and went well beyond its original remit, solving many long-standing problems in the process. New architecture was required for optimal performance, including using Bloom filters to identify blocks we are interested in, a queue of database jobs, major refactoring in many areas of the client layer, a new system for handling temporary files, etc.

The effort was well worth it. Our client layer overall has vastly improved and Freenet now
  • starts up quickly

  • resumes work on downloads and uploads almost instantly on startup

  • can have an almost unlimited number of downloads and uploads

  • doesn't need the user to worry about or configure the maximum memory usage

  • doesn't go into limbo with constant 100% CPU usage desperately trying to scrounge a few more bytes

  • can insert DVD-sized files and huge websites (or git/hg repositories) on relatively low end systems

  • uses fewer file handles

This project would not have happened without support from Google's Open Source Programs Office. It will be one of the most important changes in version 0.8  of Freenet when it is released later this year, and current work includes Bloom filter sharing, a new feature that should greatly improve performance both for popular and rare content. Google is also funding that project, watch this space!