
Decoding general linear codes with Groebner bases #11413

Open
sagetrac-Niels mannequin opened this issue May 31, 2011 · 11 comments
sagetrac-Niels mannequin commented May 31, 2011

I have implemented a decoding method for general linear codes in Sage. The method decodes up to half the true minimum distance using Groebner bases. This method was introduced by Bulygin and Pellikaan.

I was expecting the method to be faster than syndrome decoding, but it appears to be equally fast. It may be worth having this method around in Sage since Groebner basis computation may become faster. I have attached a report with my findings.

Apply:

CC: @sagetrac-sbulygin @kwankyu

Component: coding theory

Keywords: general decoding groebner basis

Author: Niels Duif

Reviewer: Burcin Erocal

Issue created by migration from https://trac.sagemath.org/ticket/11413

@sagetrac-Niels sagetrac-Niels mannequin added this to the sage-5.11 milestone May 31, 2011

sagetrac-Niels mannequin commented May 31, 2011

Decoding with Groebner bases


sagetrac-Niels mannequin commented May 31, 2011

comment:1

Attachment: report.pdf.gz

Here is my implementation. Please review!


sagetrac-Niels mannequin commented May 31, 2011

Attachment: decoding_GB.patch.gz


burcin commented Jun 1, 2011

Reviewer: Burcin Erocal


burcin commented Jun 1, 2011

comment:2

Thanks for the patch Niels. A little more work is needed to get this merged. Here are a few observations based on reading your code without any familiarity with the algorithm:

  • You can't have a comment after the output of a doctest. As the patchbot also noticed (click on the yellow dot at the issue header), this test fails.
  • The function decode_BP() doesn't have any doctests. You can find more information about how to write tests in the developer manual:
    http://sagemath.org/doc/developer/conventions.html#automated-testing
  • It would be good to include a reference to the paper in the docstring for decode().
  • I don't see the advantage of assigning single variable names to C.length(), C.dimension(). This just makes the code unreadable since you have to look up what these variables mean.
  • There must be a more efficient way to create the Reed-Solomon matrix. You could at least store each g^i and compute (g^i)^j in the inner loop.
  • which brings me to the question, did you profile this function at all? It is entirely possible that GB computation is not the bottleneck and more time is spent elsewhere.
  • the var() function creates symbolic variables. You should create the polynomial ring up front and use the generators of that for all the computations. This would eliminate conversion between symbolic objects and Singular polynomials and make arithmetic faster.
  • the first if statement in linear_code.py should be changed to if algorithm in ["syndrome", ...]. It looks like the second if statement is really an else. The error message should be changed to include all implemented algorithms.
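The dispatch structure suggested in the last point might look like the following sketch. This is only an illustration of the pattern, not Sage's actual code; the decoder functions here are hypothetical stubs standing in for the real implementations in linear_code.py.

```python
# Sketch of the suggested dispatch: check the algorithm name against one
# collection, and let the error message enumerate every implemented
# algorithm. The lambdas are stubs standing in for Sage's real decoders.
IMPLEMENTED = {
    "syndrome": lambda v: ("syndrome", v),
    "nearest neighbor": lambda v: ("nearest neighbor", v),
    "groebner": lambda v: ("groebner", v),
}

def decode(received, algorithm="syndrome"):
    if algorithm in IMPLEMENTED:
        return IMPLEMENTED[algorithm](received)
    raise NotImplementedError(
        "unknown algorithm %r; expected one of %s"
        % (algorithm, sorted(IMPLEMENTED)))
```

With this shape there is a single membership test, the "second if" becomes the raise branch, and adding a new algorithm automatically extends the error message.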


sagetrac-Niels mannequin commented Jul 6, 2011

comment:3

Thanks for reviewing my patch, Burcin! Here is what I have done with your comments:

  • You can't have a comment after the output of a doctest. As the patchbot also noticed (click on the yellow dot at the issue header), this test fails.
    Thanks, I fixed this.
  • The function decode_BP() doesn't have any doctests.
    Thanks, I included some doctests and tested the files decoder.py and linear_code.py. All tests now pass.
  • It would be good to include a reference to the paper in the docstring for decode().
    Thanks, I included a reference to the paper. Can I expect the url of this ticket to persist? I can host the paper on my own website instead, so it will surely persist.
  • I don't see the advantage of assigning single variable names to C.length(), C.dimension().
    Ok, I changed this. For me the short names are more readable, but I guess it depends on what you're used to.
  • There must be a more efficient way to create the Reed-Solomon matrix. You could at least store each g^i and compute (g^i)^j in the inner loop.
    Thanks for pointing this out; the new version should be more efficient.
  • Which brings me to the question, did you profile this function at all? It is entirely possible that GB computation is not the bottleneck and more time is spent elsewhere.
    Well, sort of. I ran the Sage code in chunks, using different inputs. The GB computation usually takes multiple seconds for large dimension, while the rest of the code only takes a split second. I also tried %prun, but I don't understand the result. Apparently there are >2k calls to the method __iter__ in linear_code.py, but I don't understand why. This is actually what takes the most time.
  • The var() function creates symbolic variables. You should create the polynomial ring up front and use the generators of that for all the computations.
    I changed this. The point is that the ring R now has more variables than necessary. Some of the V_i will not appear in any equation, unless the code is an MDS code and the number of errors is maximal. I hope this does not slow down the GB computation.
  • The first if statement in linear_code.py should be changed to if algorithm in ["syndrome", ...]. It looks like the second if statement is really an else. The error message should be changed to include all implemented algorithms.
    You're right; this looks ugly. The original code was written in this way, and I didn't bother changing it at first. It is fixed now.

The changes are in patch number 2. So please apply patch 1 first, then apply patch 2.
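Regarding the profiling question above: a reproducible alternative to %prun is the standard-library cProfile module, with output sorted by cumulative time, which makes calls like the mysterious __iter__ easy to count. A minimal sketch, with the decoding call replaced by a stand-in function since the real one needs Sage:

```python
import cProfile
import io
import pstats

def costly_iteration():
    # Stand-in for C.decode(...). In Sage, iterating a LinearCode
    # enumerates all q^k codewords, which would explain thousands of
    # __iter__ calls showing up in the profile.
    return sum(x * x for x in range(10000))

profiler = cProfile.Profile()
profiler.enable()
costly_iteration()
profiler.disable()

# Render the profile sorted by cumulative time, top 10 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

Reading the "cumtime" column top-down usually identifies the bottleneck directly, without guessing which part of the decoder dominates.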


sagetrac-Niels mannequin commented Jul 6, 2011

Attachment: decoding_GB_2.patch.gz


@jdemeyer jdemeyer modified the milestones: sage-5.11, sage-5.12 Aug 13, 2013
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014
@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.2, sage-6.3 May 6, 2014

rwst commented May 12, 2014

comment:9

Patches don't merge. Please rebase.

@sagetrac-vbraun-spam sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014
@johanrosenkilde

comment:11

I had a look at this patch and the algorithm behind it to see if it would be interesting to polish it off and merge it. My conclusion is that I'm not overly excited about the algorithm, and I report my observations for others.

It's all about speed, since Sage already ships with two "maximum likelihood" decoders: "syndrome" and "nearest neighbor". They have complexity roughly O(n*q^(n-k)) and O(n*q^k), respectively. "syndrome" decoding computes a table which can be stored to make subsequent decodings fast, while "nearest neighbor" simply runs through the entire code for each decoding.
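For reference, "nearest neighbor" decoding really is just an exhaustive scan over all q^k codewords. A toy pure-Python version over GF(2) (helper names are mine, not Sage's API) makes the O(n*q^k) cost explicit:

```python
from itertools import product

def encode(msg, G):
    # Multiply the message row-vector by generator matrix G over GF(2).
    n = len(G[0])
    return tuple(sum(m * G[i][j] for i, m in enumerate(msg)) % 2
                 for j in range(n))

def nearest_neighbor_decode(received, G):
    # Scan all 2^k codewords; return the one closest in Hamming distance.
    k = len(G)
    best, best_dist = None, None
    for msg in product([0, 1], repeat=k):
        cw = encode(msg, G)
        d = sum(a != b for a, b in zip(cw, received))
        if best_dist is None or d < best_dist:
            best, best_dist = cw, d
    return best

# Toy [4, 2] binary code: the rows of G form a basis.
G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
print(nearest_neighbor_decode((1, 0, 0, 1), G))
```

The outer loop runs q^k times and each iteration costs O(n) work, which is exactly the O(n*q^k) figure above.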

The ticket's decoding algorithm needs to solve a Gröbner basis problem for each decoding, so its theoretical complexity is difficult to judge. We are therefore left with comparing how well it does in practice. Both the original article by Bulygin and Pellikaan and Niels's report compare only with syndrome decoding. Moreover, they do so only for codes where k << n-k, since they mention that their decoder performs better for small k. But so does "nearest neighbor" decoding!

Indeed, the selling point of the article is codes with [n,k] = [120, 20] and [n,k] = [120, 30], where Bulygin--Pellikaan's own implementation runs in several hundreds of seconds for correcting 12 and 7 errors, respectively. Beyond that they do not report. However, it is clear that the complexity increases sharply with the number of errors to correct.

Comparing with Nearest Neighbor, I get the following with Sage 8.0:

sage: C = codes.random_linear_code(GF(2), 120, 20)
sage: V = C.ambient_space()
sage: timeit('C.decode_to_code(V.random_element(), "NearestNeighbor")')
5 loops, best of 3: 2.04 s per loop
  
sage: C = codes.random_linear_code(GF(2), 120, 22)
sage: V = C.ambient_space()
sage: timeit('C.decode_to_code(V.random_element(), "NearestNeighbor")')
5 loops, best of 3: 8.15 s per loop

We can then guess that correcting in a [120, 30] code with "nearest neighbor" would take roughly 2000 seconds. Note that this is decoding completely random vectors, which is far beyond what Bulygin--Pellikaan can cope with (e.g. for [120, 20] this often corrects 30-35 errors). However, the Bulygin--Pellikaan method does appear to be faster when the number of errors is small.
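The 2000-second figure is a back-of-the-envelope extrapolation from the O(n*q^k) scaling: going from k = 22 to k = 30 multiplies the work by 2^8 = 256.

```python
# Extrapolating the nearest-neighbor timings above: cost scales like q^k,
# so a [120, 30] code should take about 2^(30 - 22) times the [120, 22]
# time measured above.
t_k22 = 8.15                 # seconds per decoding at k = 22 (measured)
factor = 2 ** (30 - 22)      # = 256
estimate = t_k22 * factor
print("estimated time for k = 30: %.0f s" % estimate)
```

This lands at roughly 2000 seconds, matching the guess above.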

Best,
Johan

@johanrosenkilde

comment:12

Note that after #20138 this decoder should also be compared against information-set decoding, which actually looks much more like the proposed Gröbner basis decoder than the aforementioned ML-decoders do. At the time of writing, the information-set decoder is much, much faster than the proposed decoder:

sage: C = codes.random_linear_code(GF(2), 120, 22) # usually has min-dist >= 32.
sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), 15)
sage: timeit('C.decode_to_code(Chan(C.random_element()), "InformationSet", 15)')
125 loops, best of 3: 3.39 ms per loop

sage: C = codes.random_linear_code(GF(2), 120, 30) # usually has min-dist >= 25.
sage: Chan = channels.StaticErrorRateChannel(C.ambient_space(), 12)
sage: timeit('C.decode_to_code(Chan(C.random_element()), "InformationSet", 12)')
125 loops, best of 3: 6.23 ms per loop
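For intuition, information-set decoding repeats a simple randomized loop: pick k random coordinates, invert the restriction of the generator matrix to them, re-encode the solved message, and keep the answer if it lies within t errors of the received word. The following is a toy pure-Python sketch over GF(2), not Sage's InformationSet decoder; all names are mine.

```python
import random

def gf2_solve(A, b):
    # Solve A x = b over GF(2) by Gauss-Jordan elimination (A is k x k).
    # Returns None if A is singular.
    k = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col and M[r][col]:
                M[r] = [a ^ c for a, c in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

def isd_decode(received, G, t, tries=200):
    # Information-set decoding: succeed when a random k-subset of the
    # coordinates happens to be error-free and invertible.
    k, n = len(G), len(G[0])
    for _ in range(tries):
        info = random.sample(range(n), k)
        # Restriction of G to the information set, as a k x k system.
        A = [[G[i][j] for i in range(k)] for j in info]
        b = [received[j] for j in info]
        msg = gf2_solve(A, b)
        if msg is None:
            continue
        cw = tuple(sum(m * G[i][j] for i, m in enumerate(msg)) % 2
                   for j in range(n))
        if sum(a != r for a, r in zip(cw, received)) <= t:
            return cw
    return None

# Toy [4, 2] binary code.
G = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
print(isd_decode((1, 0, 0, 1), G, t=1))
```

Each iteration costs only polynomial work (a k x k linear solve), which is why ISD wins so clearly over exhaustive methods when the number of errors is moderate; the price is that the number of iterations needed grows with the error weight.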

@mkoeppe mkoeppe removed this from the sage-6.4 milestone Dec 29, 2022