This repository was archived by the owner on Jan 5, 2021. It is now read-only.

Commit ad3b0a3

Create readme.md
1 parent 6e031f0 commit ad3b0a3

1 file changed

+274
-0
lines changed

doc/readme.md

@@ -0,0 +1,274 @@

# lz4net
**LZ4** - ultra fast compression algorithm - for all .NET platforms

LZ4 is a lossless compression algorithm that sacrifices compression ratio for compression/decompression speed. Its compression speed is ~400 MB/s per core, while decompression speed reaches ~2 GB/s, not far from RAM speed limits.

LZ4net brings LZ4 to all (most?) .NET platforms: .NET 2.0+, .NET 4.0+, .NET Core, Mono, Windows Phone, Xamarin.iOS, Xamarin.Android and Silverlight.

The original LZ4 was written by Yann Collet, and the original C sources can be found [here](https://github.com/Cyan4973/lz4).

## Migration from codeplex
The sources have been moved to GitHub, but the project documentation has not been properly migrated yet and is still hosted at [codeplex](https://lz4net.codeplex.com/).

## Change log
You can find it [here](CHANGES.md).

## NuGet
You can download lz4net from [NuGet](http://nuget.org/packages/lz4net/).

## Releases
Releases are also available on [github](https://github.com/MiloszKrajewski/lz4net/releases).

## What is 'Fast compression algorithm'?
While the compression algorithms you use day-to-day to archive your data work at around 10MB/s and give you quite decent compression ratios, 'fast algorithms' are designed to work 'faster than your hard drive', sacrificing compression ratio.
One of the most famous fast compression algorithms is Google's own [Snappy](http://code.google.com/p/snappy/), which is advertised as 250MB/s compression, 500MB/s decompression on i7 in 64-bit mode.
Fast compression algorithms help reduce network traffic / hard drive load by compressing data on the fly with no noticeable latency.

I just tried to compress some sample data (Silesia Corpus) and got:
* **zlib** (7zip) - 7.5MB/s compression, 110MB/s decompression, 44% compression ratio
* **lzma** (7zip) - 1.5MB/s compression, 50MB/s decompression, 37% compression ratio
* **lz4** - 280MB/s compression, 520MB/s decompression, 57% compression ratio

**Note**: Values above are for illustration only. They are affected by HDD read/write speed (in fact, LZ4 decompression is much faster). The 'real' tests take HDD speed out of the equation. For detailed performance tests see [Performance Testing] and [Comparison to other algorithms].

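
To illustrate what "taking HDD speed out of the equation" could look like, here is a minimal sketch that measures compression throughput purely in memory. The class `ThroughputSketch` and its method are hypothetical names made up for this example, not part of lz4net; it only assumes the `LZ4Codec.MaximumOutputLength` and `LZ4Codec.Encode` calls described later in this document. It is not the benchmark used for the numbers above.

```csharp
using System;
using System.Diagnostics;
using LZ4;

// Hypothetical helper for illustration only; not part of lz4net.
static class ThroughputSketch
{
    // Returns approximate compression throughput in MB/s for data already in memory.
    public static double MeasureEncodeMBps(byte[] input, int iterations)
    {
        var output = new byte[LZ4Codec.MaximumOutputLength(input.Length)];

        // One untimed call so that JIT compilation does not distort the measurement.
        LZ4Codec.Encode(input, 0, input.Length, output, 0, output.Length);

        var timer = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            LZ4Codec.Encode(input, 0, input.Length, output, 0, output.Length);
        timer.Stop();

        var megabytes = (double)input.Length * iterations / (1024 * 1024);
        return megabytes / timer.Elapsed.TotalSeconds;
    }
}
```

Results will vary a lot with the data, the machine and which implementation gets picked, so treat any single number with caution.
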
## Why use it?
Here is the thing. At first I needed fast compression to pass a huge amount of data to SQL Server over a network of unknown speed (or maybe even via shared memory on the same machine). If the network was known to be fast, I wouldn't use compression at all because it would just slow the whole process down. If the network was known to be slow, I would use Deflate or ZLib ([DotNetZip](http://dotnetzip.codeplex.com/)): it would reduce the amount of data sent, and compression would still be fast enough to feed the connection.
Anyway, I decided to go for a 'near memcpy' compression algorithm. It reduces the amount of data pushed over the network and does not introduce much latency when using a local server.

## Other 'Fast compression algorithms'
There are multiple fast compression algorithms, to name a few: [LZO](http://lzohelper.codeplex.com/), [QuickLZ](http://www.quicklz.com/index.php), [LZF](http://csharplzfcompression.codeplex.com/), [Snappy](https://github.com/Kintaro/SnappySharp), FastLZ.
You can find a comparison of them on the [LZ4 webpage](http://code.google.com/p/lz4/) or [here](http://www.technofumbles.com/weblog/2011/04/22/survey-of-fast-compression-algorithms-part-1-2/).

Personally, I found LZ4 the most interesting: quite a good compression ratio, with decent compression speed and excellent decompression speed. I actually trusted the author that it is faster than the others and never thoroughly tested it. You are most welcome to do so (but please make the comparison fair: do not compare a native C++ implementation to a .NET safe implementation).

## Why not just link to pre-compiled (native) .DLL?
If my life depended on it I would, but otherwise I just don't like P/Invoke.

## There is already [LZ4Sharp](https://github.com/stangelandcl/LZ4Sharp), why not use it?
The other thing was that I needed a 'safe' (in .NET terms: pure CLR, no pointers) implementation for the SQL Server side. LZ4Sharp uses 'unsafe' code.
But then I also wanted it to be fast when the application is 'trusted', so I also did an 'unsafe' implementation. Hey, why not 'Mixed Mode' then? Come on, I have the sources, so I can also do C++/CLI. Still pure CLR, but built on the original sources with no risk of making a mistake during translation.

So I ended up with 4 implementations:

* **Mixed Mode** - C# interface + native C in one assembly
  * Pros:
    * Fastest (almost as fast as original)
  * Cons:
    * Requires VC++ 2010 Redistributable to be installed on target machine ([x86](http://www.microsoft.com/en-us/download/details.aspx?id=5555) and/or [x64](http://www.microsoft.com/en-us/download/details.aspx?id=14632))
    * Contains unmanaged code, may not be allowed in some environments
    * Does not have AnyCPU configuration (this can be solved though, see: [Automatic loading of x86 or x64])
* **C++/CLI** - original C sources recompiled for CLR
  * Pros:
    * Almost as fast as Mixed Mode
    * Only managed code
  * Cons:
    * Contains unsafe code, may not be allowed in some environments
    * Does not have AnyCPU configuration (this can be solved though, see: [Automatic loading of x86 or x64])
* **unsafe C#** - C# but still fast
  * Pros:
    * C# (more .NET-ish)
    * Still quite fast
  * Cons:
    * Contains unsafe code, may not be allowed in some environments
* **safe C#** - just in case (mobile phone maybe?)
  * Pros:
    * Runs everywhere
  * Cons:
    * Slow (for LZ4 standards; it still beats Deflate by a mile)

Plus a class which chooses the best available implementation for the job: [One class to access them all] and [Performance Testing]

## Platform availability

| Platform | Implementations | Notes |
| --- | --- | --- |
| .NET 2.0 | Safe | could be Unsafe as well, but I didn't bother |
| .NET 4.0 | MixedMode, C++/CLI, Unsafe, Safe | does work on Mono as well |
| Portable | Unsafe, Safe | Windows Phone, Xamarin, Windows Store (1) |
| Silverlight | Safe | anyone? |
| .NET Standard 1.0 | Unsafe, Safe | be the first person to try it (2)(3) |

* (1) It looks like .NET Standard is picked anyway on Xamarin, so the "portable" version may be obsolete.
* (2) Still experimental but seems to be working
* (3) I've tested it on Android 6.0 (Nexus 7) and Android 2.3.5 (ancient HTC Desire HD)

## Use with streams

This LZ4 library can be used in two distinct ways: to compress streams and to compress packets. Stream compression follows the decorator pattern: `LZ4Stream` is-a `Stream` and takes-a `Stream`. Let's start with some imports and the text we are going to compress:

```csharp
using System;
using System.IO;
using System.Linq; // Enumerable, used by the byte array examples
using System.Text; // Encoding, used by the byte array examples
using LZ4;

const string LoremIpsum =
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla sit amet mauris diam. " +
    "Mauris mollis tristique sollicitudin. Nunc nec lectus nec ipsum pharetra venenatis. " +
    "Fusce et consequat massa, eu vestibulum erat. Proin in lectus a lacus fermentum viverra. " +
    "Aliquam vel tellus aliquam, eleifend justo ultrices, elementum elit. " +
    "Donec sed ullamcorper ex, ac sagittis ligula. Pellentesque vel risus lacus. " +
    "Proin aliquet lectus et tellus tristique, eget tristique magna placerat. " +
    "Maecenas ut ipsum efficitur, lobortis mauris at, bibendum libero. " +
    "Curabitur ultricies rutrum velit, eget blandit lorem facilisis sit amet. " +
    "Nunc dignissim nunc iaculis diam congue tincidunt. Suspendisse et massa urna. " +
    "Aliquam sagittis ornare nisl, quis feugiat justo eleifend iaculis. " +
    "Ut pulvinar id purus non convallis.";
```

Now, we can write this text to a compressed stream:

```csharp
static void WriteToStream()
{
    using (var fileStream = new FileStream("lorem.lz4", FileMode.Create))
    using (var lz4Stream = new LZ4Stream(fileStream, LZ4StreamMode.Compress))
    using (var writer = new StreamWriter(lz4Stream))
    {
        for (var i = 0; i < 100; i++)
            writer.WriteLine(LoremIpsum);
    }
}
```

and read it back:

```csharp
static void ReadFromStream()
{
    using (var fileStream = new FileStream("lorem.lz4", FileMode.Open))
    using (var lz4Stream = new LZ4Stream(fileStream, LZ4StreamMode.Decompress))
    using (var reader = new StreamReader(lz4Stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            Console.WriteLine(line);
    }
}
```

The `LZ4Stream` constructor requires an inner stream and a compression mode. It also takes some optional arguments, whose defaults are relatively sane:

```csharp
LZ4Stream(
    Stream innerStream,
    LZ4StreamMode compressionMode,
    LZ4StreamFlags compressionFlags = LZ4StreamFlags.Default,
    int blockSize = 1024*1024);
```

where:

```csharp
enum LZ4StreamMode {
    Compress,
    Decompress
};

// Explicit bit values, so that flags can be combined with `|`
[Flags] enum LZ4StreamFlags {
    None = 0x00,
    InteractiveRead = 0x01,
    HighCompression = 0x02,
    IsolateInnerStream = 0x04,
    Default = None
}
```

`compressionMode` configures `LZ4Stream` to either `Compress` or `Decompress`. `compressionFlags` is an optional argument which allows you to:

* use `HighCompression` mode, which provides a better compression ratio at the price of performance. This is relevant for compression only.
* use `IsolateInnerStream` mode to leave the inner stream open after disposing `LZ4Stream`.
* use `InteractiveRead` mode to read bytes as soon as they are available. This option may be useful when dealing with a network stream, but is not particularly useful with regular file streams.

`blockSize` is set to 1MB by default but can be changed. A bigger `blockSize` allows a better compression ratio, but uses more memory, stresses the garbage collector and increases latency. It might be worth experimenting with it.

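
As a rough sketch of how the optional arguments could be put together, the example below compresses a byte array in memory with `HighCompression | IsolateInnerStream` and a 64KB block size, then decompresses it again. The class `StreamRoundTrip` is a hypothetical name made up for this example, and 64KB is an arbitrary choice rather than a recommendation.

```csharp
using System.IO;
using LZ4;

// Hypothetical helper for illustration only; not part of lz4net.
static class StreamRoundTrip
{
    public static byte[] Compress(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            // IsolateInnerStream keeps `buffer` open after `lz4` is disposed.
            var flags = LZ4StreamFlags.HighCompression | LZ4StreamFlags.IsolateInnerStream;
            using (var lz4 = new LZ4Stream(buffer, LZ4StreamMode.Compress, flags, 64 * 1024))
                lz4.Write(data, 0, data.Length);

            // The LZ4Stream has been disposed (and so flushed) at this point.
            return buffer.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var lz4 = new LZ4Stream(input, LZ4StreamMode.Decompress))
        using (var output = new MemoryStream())
        {
            lz4.CopyTo(output); // Stream.CopyTo requires .NET 4.0 or later
            return output.ToArray();
        }
    }
}
```

Note that the compressed stream has to be disposed before the inner stream is read, otherwise the last block may not have been written yet.
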
## Use with byte arrays

You can also compress byte arrays. This is useful when compressed chunks are relatively small and their size is known when compressing. `LZ4Codec.Wrap` compresses a byte array and returns a byte array:

```csharp
static string CompressBuffer()
{
    var text = Enumerable.Repeat(LoremIpsum, 5).Aggregate((a, b) => a + "\n" + b);

    var compressed = Convert.ToBase64String(
        LZ4Codec.Wrap(Encoding.UTF8.GetBytes(text)));

    return compressed;
}
```

In the example above we do a little bit more, of course: first we concatenate multiple strings (`Enumerable.Repeat(...).Aggregate(...)`), then encode the text as UTF8 (`Encoding.UTF8.GetBytes(...)`), then compress it (`LZ4Codec.Wrap(...)`) and then encode it with Base64 (`Convert.ToBase64String(...)`). At the end we have a base64-encoded compressed string.

To decompress it we can use something like this:

```csharp
// Prints the decompressed lines, so it returns nothing
static void DecompressBuffer(string compressed)
{
    var lorems =
        Encoding.UTF8.GetString(
            LZ4Codec.Unwrap(Convert.FromBase64String(compressed)))
        .Split('\n');

    foreach (var lorem in lorems)
        Console.WriteLine(lorem);
}
```

This is the reverse operation: decoding the base64 string (`Convert.FromBase64String(...)`), decompression (`LZ4Codec.Unwrap(...)`), decoding UTF8 (`Encoding.UTF8.GetString(...)`) and splitting the string (`Split('\n')`).

## Compatibility

Neither `LZ4Stream` nor `LZ4Codec.Wrap` is compatible with the original LZ4. Implementing a compatible streaming protocol is an outstanding task and, to be honest, it does not seem likely in the nearest future, but...

If you want to do it yourself, you can. It requires a little bit more understanding though, so let's look at "low level" compression. Let's create some compressible data:

```csharp
var inputBuffer =
    Encoding.UTF8.GetBytes(
        Enumerable.Repeat(LoremIpsum, 5).Aggregate((a, b) => a + "\n" + b));
```

We also need to allocate a buffer for the compressed data.
Please note, it might actually be larger than the input data (as not all data can be compressed):

```csharp
var inputLength = inputBuffer.Length;
var maximumLength = LZ4Codec.MaximumOutputLength(inputLength);
var outputBuffer = new byte[maximumLength];
```

Now, we can run the actual compression:

```csharp
var outputLength = LZ4Codec.Encode(
    inputBuffer, 0, inputLength,
    outputBuffer, 0, maximumLength);
```

The `Encode` method returns the number of bytes which were actually used. It might be less than or equal to `maximumLength`. It may also be `0` (or less) to indicate that compression failed. This happens when the provided buffer is too small.

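
Putting the three steps above together, a minimal sketch could check the value returned by `Encode` and copy only the bytes that were actually used. The class `EncodeSketch` and its method are hypothetical names made up for this example, and throwing an exception is just one possible way of handling failure.

```csharp
using System;
using LZ4;

// Hypothetical helper for illustration only; not part of lz4net.
static class EncodeSketch
{
    public static byte[] EncodeToExactSize(byte[] inputBuffer)
    {
        var inputLength = inputBuffer.Length;
        var maximumLength = LZ4Codec.MaximumOutputLength(inputLength);
        var outputBuffer = new byte[maximumLength];

        var outputLength = LZ4Codec.Encode(
            inputBuffer, 0, inputLength,
            outputBuffer, 0, maximumLength);

        // 0 (or less) signals failure, as described above.
        if (outputLength <= 0)
            throw new InvalidOperationException("LZ4 compression failed");

        // Keep only the part of the buffer that was actually written.
        var compressed = new byte[outputLength];
        Buffer.BlockCopy(outputBuffer, 0, compressed, 0, outputLength);
        return compressed;
    }
}
```
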
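A buffer compressed this way can be decompressed with `Decode`. Note that the names now play the opposite roles: `inputBuffer` and `inputLength` refer to the compressed data, while `outputBuffer` and `outputLength` refer to the buffer for the decompressed data and its (known) length:

```csharp
LZ4Codec.Decode(
    inputBuffer, 0, inputLength,
    outputBuffer, 0, outputLength,
    true);
```
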
The last argument (`true`) indicates that we actually know the output length. Alternatively, we don't have to provide it, and can use:

```csharp
var guessedOutputLength = inputLength * 10; // a guess: the real ratio is unknown
var outputBuffer = new byte[guessedOutputLength];
var actualOutputLength = LZ4Codec.Decode(
    inputBuffer, 0, inputLength,
    outputBuffer, 0, guessedOutputLength);
```

but this requires guessing the output buffer size (`guessedOutputLength`), which might be quite inefficient.

**Buffers compressed this way are fully compatible with the original implementation of LZ4.**

Both `LZ4Stream` and `LZ4Codec.Wrap/Unwrap` use them internally.
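
One way to avoid guessing the output size is to store the original length next to the compressed bytes. The sketch below does that with a 4-byte length prefix. The class `LengthPrefixedLz4` and this framing are made up for this example: it is not the official LZ4 frame format, and it is not claimed to be what `LZ4Codec.Wrap` does internally. Only the compressed payload after the prefix is a plain LZ4 block.

```csharp
using System;
using LZ4;

// Hypothetical framing for illustration only; not part of lz4net.
static class LengthPrefixedLz4
{
    const int HeaderSize = sizeof(int);

    public static byte[] Pack(byte[] original)
    {
        // The header uses the machine's byte order (BitConverter), so this
        // sketch is not portable between little- and big-endian platforms.
        var header = BitConverter.GetBytes(original.Length);
        if (original.Length == 0)
            return header; // nothing to compress

        var maximumLength = LZ4Codec.MaximumOutputLength(original.Length);
        var packed = new byte[HeaderSize + maximumLength];
        Buffer.BlockCopy(header, 0, packed, 0, HeaderSize);

        var encodedLength = LZ4Codec.Encode(
            original, 0, original.Length,
            packed, HeaderSize, maximumLength);
        if (encodedLength <= 0)
            throw new InvalidOperationException("LZ4 compression failed");

        Array.Resize(ref packed, HeaderSize + encodedLength);
        return packed;
    }

    public static byte[] Unpack(byte[] packed)
    {
        var originalLength = BitConverter.ToInt32(packed, 0);
        var original = new byte[originalLength];
        if (originalLength == 0)
            return original;

        // The output length is known, so there is no need to guess.
        LZ4Codec.Decode(
            packed, HeaderSize, packed.Length - HeaderSize,
            original, 0, originalLength,
            true);
        return original;
    }
}
```

A real protocol would also want to validate the header before trusting it, for example by rejecting negative or implausibly large lengths.
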