Debuting dcs-bwt, An Experimental Burrows-Wheeler Compressor

Wednesday, June 11, 2008

Today we've released dcs-bwt, a C++ implementation of a data compressor based on the Burrows-Wheeler compression technique. The compressor is intended for experimentation with different algorithms and parameter combinations.

To support experimentation, the compressor has a modular structure that separates the Burrows-Wheeler transform stage from the compression stage. There are multiple algorithms for each stage and new algorithms can be added with minimal changes to the existing code. Any combination of algorithms and their parameters can be chosen at run-time.
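As a rough illustration of what run-time selection between separated stages can look like, here is a minimal sketch. It is not taken from the dcs-bwt source: the names Stage, StageRegistry and RunPipeline are hypothetical, and each stage is assumed to be a plain function from bytes to bytes.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stage signature (an assumption, not the dcs-bwt API):
// a stage maps a block of bytes to a block of bytes.
using Stage = std::function<std::string(const std::string&)>;

// Maps a name given at run-time to a stage implementation.
class StageRegistry {
 public:
  // Adds a stage under the given name, replacing any earlier entry.
  void Register(const std::string& name, Stage stage) {
    stages_[name] = std::move(stage);
  }

  // Looks up a stage by name; throws if the name was never registered.
  const Stage& Get(const std::string& name) const {
    auto it = stages_.find(name);
    if (it == stages_.end()) {
      throw std::invalid_argument("unknown stage: " + name);
    }
    return it->second;
  }

 private:
  std::map<std::string, Stage> stages_;
};

// Applies the chosen transform first, then the chosen coder.
// The two registries are independent, so any pairing is allowed.
std::string RunPipeline(const StageRegistry& transforms,
                        const StageRegistry& coders,
                        const std::string& transform_name,
                        const std::string& coder_name,
                        const std::string& input) {
  const std::string transformed = transforms.Get(transform_name)(input);
  return coders.Get(coder_name)(transformed);
}
```

In a design like this, adding an algorithm is one Register call, and existing stages are not touched. Keeping the two registries separate is what lets any transform be paired with any coder.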

The compressor contains a high-performance implementation of an advanced algorithm for the Burrows-Wheeler transform stage. (The "dcs" in the name stands for Difference Cover Sampling and refers to this advanced algorithm.) It can efficiently handle blocks of hundreds of megabytes, even for highly redundant data. Very large blocks can improve compression dramatically in some cases, because repeated content is only exploited when the repeats fall within the same block.
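For readers new to the transform itself, the sketch below shows what the Burrows-Wheeler transform computes. It is deliberately naive: it sorts all rotations of the block by direct comparison, which is the textbook definition and not the difference cover sampling algorithm used in dcs-bwt. The function names NaiveBwt and InverseBwt are made up for this illustration. Direct comparison becomes very slow on long, highly repetitive blocks, which is the case the advanced algorithm is meant to handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive forward transform: sort all rotations of the block, then return the
// last column of the sorted rotations together with the row index at which
// the original (unrotated) block ends up.
std::pair<std::string, std::size_t> NaiveBwt(const std::string& block) {
  const std::size_t n = block.size();
  std::vector<std::size_t> rot(n);  // rot[i] = start offset of a rotation
  std::iota(rot.begin(), rot.end(), 0);
  std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
    for (std::size_t k = 0; k < n; ++k) {
      const unsigned char ca = block[(a + k) % n];
      const unsigned char cb = block[(b + k) % n];
      if (ca != cb) return ca < cb;
    }
    return false;  // the two rotations are identical
  });
  std::string last(n, '\0');
  std::size_t primary = 0;
  for (std::size_t i = 0; i < n; ++i) {
    last[i] = block[(rot[i] + n - 1) % n];  // character preceding the rotation
    if (rot[i] == 0) primary = i;
  }
  return {last, primary};
}

// Inverse transform: rebuilds the block back to front by following the
// last-to-first mapping, starting from the row of the original block.
std::string InverseBwt(const std::string& last, std::size_t primary) {
  const std::size_t n = last.size();
  // After the prefix sum, count[c] is the number of bytes smaller than c.
  std::vector<std::size_t> count(257, 0);
  for (unsigned char c : last) ++count[c + 1];
  for (int c = 0; c < 256; ++c) count[c + 1] += count[c];
  // lf[i] = row whose rotation starts with the character last[i].
  std::vector<std::size_t> lf(n);
  for (std::size_t i = 0; i < n; ++i) {
    lf[i] = count[static_cast<unsigned char>(last[i])]++;
  }
  std::string out(n, '\0');
  std::size_t row = primary;
  for (std::size_t k = n; k-- > 0;) {
    out[k] = last[row];
    row = lf[row];
  }
  return out;
}
```

For the block "banana", NaiveBwt returns the string "nnbaaa" with row index 3, and InverseBwt("nnbaaa", 3) recovers "banana". The transform does not shrink the data; it groups equal characters together so that the compression stage that follows can do better.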

We hope you'll find dcs-bwt useful. Check out the code and let us know what you think, either in the dcs-bwt Google Group or the comments section.