|
| 1 | + |
| 2 | +# lz4net |
| 3 | +**LZ4** - ultra fast compression algorithm - for all .NET platforms |
| 4 | + |
| 5 | +LZ4 is a lossless compression algorithm, sacrificing compression ratio for compression/decompression speed. Its compression speed is ~400 MB/s per core while decompression speed reaches ~2 GB/s, not far from RAM speed limits. |
| 6 | + |
| 7 | +LZ4net brings LZ4 to all (most?) .NET platforms: .NET 2.0+, .NET 4.0+, .NET Core, Mono, Windows Phone, Xamarin.iOS, Xamarin.Android and Silverlight. |
| 8 | + |
| 9 | +The original LZ4 was written by Yann Collet and the original C sources can be found [here](https://github.com/Cyan4973/lz4). |
| 10 | + |
| 11 | +## Migration from codeplex |
| 12 | +Sources have been moved to GitHub, while project documentation has not been properly migrated yet and is still hosted at [codeplex](https://lz4net.codeplex.com/). |
| 13 | + |
| 14 | +## Change log |
| 15 | +You can find it [here](CHANGES.md) |
| 16 | + |
| 17 | +## NuGet |
| 18 | +You can download lz4net from [NuGet](http://nuget.org/packages/lz4net/) |
| 19 | + |
| 20 | +## Releases |
| 21 | +Releases are also available on [github](https://github.com/MiloszKrajewski/lz4net/releases) |
| 22 | + |
| 23 | +## What is 'Fast compression algorithm'? |
| 24 | +While the compression algorithms you use day-to-day to archive your data work at around 10MB/s and give you quite decent compression ratios, 'fast algorithms' are designed to work 'faster than your hard drive', sacrificing compression ratio. |
| 25 | +One of the most famous fast compression algorithms is Google's own [Snappy](http://code.google.com/p/snappy/), which is advertised as 250MB/s compression, 500MB/s decompression on i7 in 64-bit mode. |
| 26 | +Fast compression algorithms help reduce network traffic / hard drive load by compressing data on the fly with no noticeable latency. |
| 27 | + |
| 28 | +I just tried to compress some sample data (Silesia Corpus) and got: |
| 29 | +* **zlib** (7zip) - 7.5MB/s compression, 110MB/s decompression, 44% compression ratio |
| 30 | +* **lzma** (7zip) - 1.5MB/s compression, 50MB/s decompression, 37% compression ratio |
| 31 | +* **lz4** - 280MB/s compression, 520MB/s decompression, 57% compression ratio |
| 32 | + |
| 33 | +**Note**: Values above are for illustration only. They are affected by HDD read/write speed (in fact LZ4 decompression is much faster). The 'real' tests take HDD speed out of the equation. For detailed performance tests see [Performance Testing] and [Comparison to other algorithms]. |
| 34 | + |
| 35 | +## Why use it? |
| 36 | +Here is the thing. At first I needed fast compression to pass a huge amount of data to SQL Server over a network of unknown speed (or maybe even via shared memory on the same machine). If the network was known to be fast I wouldn't use compression at all, because it would just slow the whole process down. If the network was known to be slow I would use Deflate or ZLib ([DotNetZip](http://dotnetzip.codeplex.com/)) - it would reduce the amount of data sent and compression would still be fast enough to feed the connection. |
| 37 | +Anyway, I decided to go for a 'near memcpy' compression algorithm. It reduces the amount of data pushed over the network and does not introduce much latency when using a local server. |
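The trade-off above can be put into rough numbers. This is a minimal sketch, assuming compression and transfer overlap perfectly (so the slower stage sets the pace) and ignoring decompression and latency. `ThroughputModel`, `Effective` and `PrintTable` are hypothetical names, and the speeds and ratios are the informal figures quoted earlier, not benchmarks:

```csharp
using System;

static class ThroughputModel
{
    // Uncompressed megabytes delivered per second when data is compressed
    // on the fly: limited either by the compressor itself or by how much
    // original data fits through the network once it has been shrunk.
    // 'ratio' is compressed size / original size (e.g. 0.57 for 57%).
    public static double Effective(double networkMBps, double compressMBps, double ratio)
    {
        return Math.Min(compressMBps, networkMBps / ratio);
    }

    // Prints the model for a slow, a medium and a very fast link.
    public static void PrintTable()
    {
        foreach (var network in new[] { 1.0, 10.0, 1000.0 })
        {
            Console.WriteLine(
                "network {0,6:F0} MB/s: none {1,7:F1}, zlib {2,7:F1}, lz4 {3,7:F1}",
                network,
                network,                                // no compression at all
                Effective(network, 7.5, 0.44),          // zlib figures from above
                Effective(network, 280.0, 0.57));       // lz4 figures from above
        }
    }
}
```

Under these assumptions a slow compressor only pays off on a slow link, a fast one pays off over a much wider range, and on a very fast link no compression wins.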
| 38 | + |
| 39 | +## Other 'Fast compression algorithms' |
| 40 | +There are multiple fast compression algorithms, to name a few: [LZO](http://lzohelper.codeplex.com/), [QuickLZ](http://www.quicklz.com/index.php), [LZF](http://csharplzfcompression.codeplex.com/), [Snappy](https://github.com/Kintaro/SnappySharp), FastLZ. |
| 41 | +You can find a comparison of them on the [LZ4 webpage](http://code.google.com/p/lz4/) or [here](http://www.technofumbles.com/weblog/2011/04/22/survey-of-fast-compression-algorithms-part-1-2/). |
| 42 | + |
| 43 | +Personally I found LZ4 the most interesting: quite good compression ratio, with decent compression speed and excellent decompression speed. I actually trusted the author that it is faster than the others and never thoroughly tested it. You are most welcome to do it (please note, make the comparison fair: do not compare a native C++ implementation to a .NET safe implementation). |
| 44 | + |
| 45 | +## Why not just link to pre-compiled (native) .DLL? |
| 46 | +If my life depended on it I would, but otherwise I just don't like P/Invoke. |
| 47 | + |
| 48 | +## There is already [LZ4Sharp](https://github.com/stangelandcl/LZ4Sharp), why not use it? |
| 49 | +The other thing was that I needed a 'safe' (in .NET terms - pure CLR, no pointers) implementation for the SQL Server side. LZ4Sharp uses 'unsafe' code. |
| 50 | +But then I also wanted it to be fast when the application is 'trusted', so I also did an 'unsafe' implementation. Hey, why not 'Mixed Mode' then? Come on, I have the sources so I can also do C++/CLI. Still pure CLR but built on the original sources, with no risk of making a mistake during translation. |
| 51 | + |
| 52 | +So I ended up with 4 implementations: |
| 53 | + |
| 54 | +* **Mixed Mode** - C# interface + native C in one assembly |
| 55 | + * Pros: |
| 56 | + * Fastest (almost as fast as original) |
| 57 | + * Cons: |
| 58 | + * Requires VC++ 2010 Redistributable to be installed on target machine ([x86](http://www.microsoft.com/en-us/download/details.aspx?id=5555) and/or [x64](http://www.microsoft.com/en-us/download/details.aspx?id=14632)) |
| 59 | + * Contains unmanaged code, may not be allowed in some environments |
| 60 | + * Does not have AnyCPU configuration (this can be solved though, see: [Automatic loading of x86 or x64]) |
| 61 | +* **C++/CLI** - original C sources recompiled for CLR |
| 62 | + * Pros: |
| 63 | + * Almost as fast as Mixed Mode |
| 64 | + * Only managed code |
| 65 | + * Cons: |
| 66 | + * Contains unsafe code, may not be allowed in some environments |
| 67 | + * Does not have AnyCPU configuration (this can be solved though, see: [Automatic loading of x86 or x64]) |
| 68 | +* **unsafe C#** - C# but still fast |
| 69 | + * Pros: |
| 70 | + * C# (more .NET-ish) |
| 71 | + * Still quite fast |
| 72 | + * Cons: |
| 73 | + * Contains unsafe code, may not be allowed in some environments |
| 74 | +* **safe C#** - just in case (mobile phone maybe?) |
| 75 | + * Pros: |
| 76 | + * Runs everywhere |
| 77 | + * Cons: |
| 78 | + * Slow (for LZ4 standards; it still beats Deflate by a mile) |
| 79 | + |
| 80 | +Plus a class which chooses the best available implementation for the job: [One class to access them all] and [Performance Testing] |
| 81 | + |
| 82 | +## Platform availability |
| 83 | + |
| 84 | +| Platform | Implementations | Notes | |
| 85 | +| --- | --- | --- | |
| 86 | +| .NET 2.0 | Safe | could be Unsafe as well, but I didn't bother | |
| 87 | +| .NET 4.0 | MixedMode, C++/CLI, Unsafe, Safe | does work on Mono as well | |
| 88 | +| Portable | Unsafe, Safe | Windows Phone, Xamarin, Windows Store (1) | |
| 89 | +| Silverlight | Safe | anyone? | |
| 90 | +| .NET Standard 1.0 | Unsafe, Safe | be first person to try it (2)(3) | |
| 91 | + |
| 92 | +* (1) It looks like .NET Standard is picked anyway on Xamarin, so the "portable" version may be obsolete. |
| 93 | +* (2) Still experimental but seems to be working |
| 94 | +* (3) I've tested it on Android 6.0 (Nexus 7) and Android 2.3.5 (ancient HTC Desire HD) |
| 95 | + |
| 96 | +## Use with streams |
| 97 | + |
| 98 | +This LZ4 library can be used in two distinct ways: to compress streams and to compress packets. Compressing streams follows the decorator pattern: `LZ4Stream` is-a `Stream` and takes-a `Stream`. Let's start with some imports and the text we are going to compress: |
| 99 | + |
| 100 | +```csharp |
| 101 | +using System; |
| 102 | +using System.IO; |
| 103 | +using LZ4; |
| 104 | + |
| 105 | +const string LoremIpsum = |
| 106 | + "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla sit amet mauris diam. " + |
| 107 | + "Mauris mollis tristique sollicitudin. Nunc nec lectus nec ipsum pharetra venenatis. " + |
| 108 | + "Fusce et consequat massa, eu vestibulum erat. Proin in lectus a lacus fermentum viverra. " + |
| 109 | + "Aliquam vel tellus aliquam, eleifend justo ultrices, elementum elit. " + |
| 110 | + "Donec sed ullamcorper ex, ac sagittis ligula. Pellentesque vel risus lacus. " + |
| 111 | + "Proin aliquet lectus et tellus tristique, eget tristique magna placerat. " + |
| 112 | + "Maecenas ut ipsum efficitur, lobortis mauris at, bibendum libero. " + |
| 113 | + "Curabitur ultricies rutrum velit, eget blandit lorem facilisis sit amet. " + |
| 114 | + "Nunc dignissim nunc iaculis diam congue tincidunt. Suspendisse et massa urna. " + |
| 115 | + "Aliquam sagittis ornare nisl, quis feugiat justo eleifend iaculis. " + |
| 116 | + "Ut pulvinar id purus non convallis."; |
| 117 | +``` |
| 118 | + |
| 119 | +Now, we can write this text to a compressed stream: |
| 120 | + |
| 121 | +```csharp |
| 122 | +static void WriteToStream() |
| 123 | +{ |
| 124 | + using (var fileStream = new FileStream("lorem.lz4", FileMode.Create)) |
| 125 | + using (var lz4Stream = new LZ4Stream(fileStream, LZ4StreamMode.Compress)) |
| 126 | + using (var writer = new StreamWriter(lz4Stream)) |
| 127 | + { |
| 128 | + for (var i = 0; i < 100; i++) |
| 129 | + writer.WriteLine(LoremIpsum); |
| 130 | + } |
| 131 | +} |
| 132 | +``` |
| 133 | + |
| 134 | +and read it back: |
| 135 | + |
| 136 | +```csharp |
| 137 | +static void ReadFromStream() |
| 138 | +{ |
| 139 | + using (var fileStream = new FileStream("lorem.lz4", FileMode.Open)) |
| 140 | + using (var lz4Stream = new LZ4Stream(fileStream, LZ4StreamMode.Decompress)) |
| 141 | + using (var reader = new StreamReader(lz4Stream)) |
| 142 | + { |
| 143 | + string line; |
| 144 | + while ((line = reader.ReadLine()) != null) |
| 145 | + Console.WriteLine(line); |
| 146 | + } |
| 147 | +} |
| 148 | +``` |
| 149 | + |
| 150 | +The `LZ4Stream` constructor requires an inner stream and a compression mode. It also takes some optional arguments, but their defaults are relatively sane: |
| 151 | + |
| 152 | +```csharp |
| 153 | +LZ4Stream( |
| 154 | + Stream innerStream, |
| 155 | + LZ4StreamMode compressionMode, |
| 156 | + LZ4StreamFlags compressionFlags = LZ4StreamFlags.Default, |
| 157 | + int blockSize = 1024*1024); |
| 158 | +``` |
| 159 | + |
| 160 | +where: |
| 161 | + |
| 162 | +```csharp |
| 163 | +enum LZ4StreamMode { |
| 164 | + Compress, |
| 165 | + Decompress |
| 166 | +}; |
| 167 | + |
| 168 | +[Flags] enum LZ4StreamFlags { |
| 169 | + None = 0x00, |
| 170 | + InteractiveRead = 0x01, |
| 171 | + HighCompression = 0x02, |
| 172 | + IsolateInnerStream = 0x04, |
| 173 | + Default = None |
| 174 | +} |
| 175 | +``` |
| 176 | + |
| 177 | +`compressionMode` configures `LZ4Stream` to either `Compress` or `Decompress`. `compressionFlags` is an optional argument which allows you to: |
| 178 | + |
| 179 | +* use `HighCompression` mode, which provides a better compression ratio at the price of performance. This is relevant for compression only. |
| 180 | +* use `IsolateInnerStream` mode to leave the inner stream open after disposing `LZ4Stream`. |
| 181 | +* use `InteractiveRead` mode to read bytes as soon as they are available. This option may be useful when dealing with a network stream, but is not particularly useful with regular file streams. |
| 182 | + |
| 183 | +`blockSize` is set to 1MB by default but can be changed. A bigger `blockSize` allows a better compression ratio, but uses more memory, stresses the garbage collector and increases latency. It might be worth experimenting with it. |
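As an illustration of combining these options, here is a minimal sketch that compresses into a `MemoryStream` and reads the data back from the same stream. It uses only the constructor arguments described above. The class and method names (`FlagsExample`, `RoundTrip`) and the 64KB block size are arbitrary choices for this example, not recommendations:

```csharp
using System.IO;
using LZ4;

static class FlagsExample
{
    public static string RoundTrip(string text)
    {
        using (var memory = new MemoryStream())
        {
            // Flags are combined with '|'; the smaller block size trades
            // some compression ratio for lower memory use.
            var flags = LZ4StreamFlags.HighCompression | LZ4StreamFlags.IsolateInnerStream;

            using (var lz4Stream = new LZ4Stream(memory, LZ4StreamMode.Compress, flags, 64 * 1024))
            using (var writer = new StreamWriter(lz4Stream))
            {
                writer.Write(text);
            }

            // 'memory' is still open here because of IsolateInnerStream,
            // so it can be rewound and decompressed.
            memory.Position = 0;

            using (var lz4Stream = new LZ4Stream(memory, LZ4StreamMode.Decompress))
            using (var reader = new StreamReader(lz4Stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Without `IsolateInnerStream`, disposing the compressing `LZ4Stream` would also close `memory`, and rewinding it would fail.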
| 184 | + |
| 185 | +## Use with byte arrays |
| 186 | + |
| 187 | +You can also compress byte arrays. It is useful when compressed chunks are relatively small and their size is known when compressing. `LZ4Codec.Wrap` compresses a byte array and returns a byte array (the examples below also need `using System.Linq;` and `using System.Text;`): |
| 188 | + |
| 189 | +```csharp |
| 190 | +static string CompressBuffer() |
| 191 | +{ |
| 192 | + var text = Enumerable.Repeat(LoremIpsum, 5).Aggregate((a, b) => a + "\n" + b); |
| 193 | + |
| 194 | + var compressed = Convert.ToBase64String( |
| 195 | + LZ4Codec.Wrap(Encoding.UTF8.GetBytes(text))); |
| 196 | + |
| 197 | + return compressed; |
| 198 | +} |
| 199 | +``` |
| 200 | + |
| 201 | +In the example above we do a little bit more, of course: first we concatenate multiple strings (`Enumerable.Repeat(...).Aggregate(...)`), then encode the text as UTF8 (`Encoding.UTF8.GetBytes(...)`), then compress it (`LZ4Codec.Wrap(...)`) and then encode it with Base64 (`Convert.ToBase64String(...)`). In the end we have a base64-encoded compressed string. |
| 202 | + |
| 203 | +To decompress it we can use something like this: |
| 204 | + |
| 205 | +```csharp |
| 206 | +static void DecompressBuffer(string compressed) |
| 207 | +{ |
| 208 | + var lorems = |
| 209 | + Encoding.UTF8.GetString( |
| 210 | + LZ4Codec.Unwrap(Convert.FromBase64String(compressed))) |
| 211 | + .Split('\n'); |
| 212 | + |
| 213 | + foreach (var lorem in lorems) |
| 214 | + Console.WriteLine(lorem); |
| 215 | +} |
| 216 | +``` |
| 217 | + |
| 218 | +This is the reverse operation: decoding the base64 string (`Convert.FromBase64String(...)`), decompression (`LZ4Codec.Unwrap(...)`), decoding UTF8 (`Encoding.UTF8.GetString(...)`) and splitting the string (`Split('\n')`). |
| 219 | + |
| 220 | +## Compatibility |
| 221 | + |
| 222 | +Neither `LZ4Stream` nor `LZ4Codec.Wrap` is compatible with the original LZ4. It is an outstanding task to implement a compatible streaming protocol and, to be honest, it does not seem likely in the near future, but... |
| 223 | + |
| 224 | +If you want to do it yourself, you can. It requires a little bit more understanding though, so let's look at "low level" compression. Let's create some compressible data: |
| 225 | + |
| 226 | +```csharp |
| 227 | +var inputBuffer = |
| 228 | + Encoding.UTF8.GetBytes( |
| 229 | + Enumerable.Repeat(LoremIpsum, 5).Aggregate((a, b) => a + "\n" + b)); |
| 230 | +``` |
| 231 | + |
| 232 | +We also need to allocate a buffer for the compressed data. |
| 233 | +Please note, it might actually be larger than the input data (as not all data can be compressed): |
| 234 | + |
| 235 | +```csharp |
| 236 | +var inputLength = inputBuffer.Length; |
| 237 | +var maximumLength = LZ4Codec.MaximumOutputLength(inputLength); |
| 238 | +var outputBuffer = new byte[maximumLength]; |
| 239 | +``` |
| 240 | + |
| 241 | +Now, we can run actual compression: |
| 242 | + |
| 243 | +```csharp |
| 244 | +var outputLength = LZ4Codec.Encode( |
| 245 | + inputBuffer, 0, inputLength, |
| 246 | + outputBuffer, 0, maximumLength); |
| 247 | +``` |
| 248 | + |
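| 249 | +The `Encode` method returns the number of bytes which were actually used. It might be less than or equal to `maximumLength`. It may also be `0` (or less) to indicate that compression failed. This happens when the provided buffer is too small. |
| 250 | + |
| 251 | +A buffer compressed this way can be decompressed with `Decode`. Note that the roles are swapped in the snippet below: `inputBuffer` and `inputLength` now describe the compressed data, while `outputBuffer` and `outputLength` describe the target buffer, whose size must equal the original (uncompressed) length: |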
| 252 | + |
| 253 | +```csharp |
| 254 | +LZ4Codec.Decode( |
| 255 | + inputBuffer, 0, inputLength, |
| 256 | + outputBuffer, 0, outputLength, |
| 257 | + true); |
| 258 | +``` |
| 259 | + |
| 260 | +The last argument (`true`) indicates that we actually know the output length. Alternatively, if we don't know it, we can omit that argument and use: |
| 261 | + |
| 262 | +```csharp |
| 263 | +var guessedOutputLength = inputLength * 10; // ??? |
| 264 | +var outputBuffer = new byte[guessedOutputLength]; |
| 265 | +var actualOutputLength = LZ4Codec.Decode( |
| 266 | + inputBuffer, 0, inputLength, |
| 267 | + outputBuffer, 0, guessedOutputLength); |
| 268 | +``` |
| 269 | + |
| 270 | +but this will require guessing outputBuffer size (`guessedOutputLength`) which might be quite inefficient. |
| 271 | + |
| 272 | +**Buffers compressed this way are fully compatible with the original implementation of LZ4.** |
| 273 | + |
| 274 | +Both `LZ4Stream` and `LZ4Codec.Wrap/Unwrap` use such buffers internally. |
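Putting the low-level pieces together, a complete round trip might look like the sketch below. It uses only the `MaximumOutputLength`, `Encode` and `Decode` calls shown above. The 4-byte length prefix is an assumption made for this example (it is neither the framing used by `Wrap` nor the official LZ4 frame format), and `RawBlockExample`, `Pack` and `Unpack` are hypothetical names:

```csharp
using System;
using LZ4;

static class RawBlockExample
{
    // Layout assumed here: [4 bytes: original length][raw LZ4 block].
    public static byte[] Pack(byte[] original)
    {
        var maximumLength = LZ4Codec.MaximumOutputLength(original.Length);
        var compressed = new byte[maximumLength];

        var compressedLength = LZ4Codec.Encode(
            original, 0, original.Length,
            compressed, 0, maximumLength);
        if (compressedLength <= 0)
            throw new InvalidOperationException("Compression failed");

        // Store the original length so Decode can be told the exact output size.
        // BitConverter uses the platform's byte order.
        var packed = new byte[4 + compressedLength];
        BitConverter.GetBytes(original.Length).CopyTo(packed, 0);
        Buffer.BlockCopy(compressed, 0, packed, 4, compressedLength);
        return packed;
    }

    public static byte[] Unpack(byte[] packed)
    {
        var originalLength = BitConverter.ToInt32(packed, 0);
        var original = new byte[originalLength];

        LZ4Codec.Decode(
            packed, 4, packed.Length - 4,   // compressed block, after the prefix
            original, 0, originalLength,    // target buffer of the known size
            true);                          // output length is known
        return original;
    }
}
```

Only the block after the prefix is a raw LZ4 block, so another LZ4 implementation would need to be given that part and the original length separately. Empty input is not handled specially in this sketch.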