Commit c48dea6

rework BIP moreorless from scratch
1 parent b49c8fe commit c48dea6

7 files changed

+181
-692
lines changed

AUTHORS

+2-3
@@ -1,3 +1,2 @@
-Pavol Rusnak <[email protected]>
-Marek Palatinus <[email protected]>
-Aaron Voisine <[email protected]>
+Marek Palatinus <[email protected]>
+Pavol Rusnak <[email protected]>

README

+64-89
@@ -3,42 +3,62 @@
 <pre>
 BIP: BIP-0039
 Title: Mnemonic code for generating deterministic keys
-Author: Pavol Rusnak <[email protected]>
-Marek Palatinus <[email protected]>
+Authors: Marek Palatinus <[email protected]>
+Pavol Rusnak <[email protected]>
+
 Aaron Voisine <[email protected]>
 Status: Draft
 Type: Standards Track
 Created: 10-09-2013
 </pre>

-==Abstract==
+== Abstract ==

-This BIP proposes a scheme for translating binary data (usually master seeds
-for deterministic keys, but it can be applied to any binary data) into a group
-of easy to remember words also known as mnemonic code or mnemonic sentence.
+This BIP describes an usage of mnemonic code or mnemonic sentence - a group of
+easy to remember words - to generate deterministic wallets.

-==Motivation==
+It consists of two parts: generating the mnemonic and converting it into
+a binary seed. This seed can be later used to generate deterministic wallets
+using BIP-0032 or similar methods.
+
+== Motivation ==

 Such mnemonic code or mnemonic sentence is much easier to work with than working
 with the binary data directly (or its hexadecimal interpretation). The sentence
 could be writen down on paper (e.g. for storing in a secure location such as
 safe), told over telephone or other voice communication method, or memorized
 in ones memory (this method is called brainwallet).

-==Backwards Compatibility==
+== Generating the mnemonic ==
+
+First, we decide how much entropy we want mnemonic to encode. Recommended size
+is 128-256 bits, but basically any multiple of 32 bits will do. More bits
+mean more security, but also longer word sentence.
+
+We take initial entropy of ENT bits and compute its checksum by taking first
+ENT / 32 bits of its SHA256 hash. We append these bits to the end of the initial
+entropy. Next we take these concatenated bits and split them into groups of 11
+bits. Each group encodes number from 0-2047 which is a position in a wordlist.
+We convert numbers into words and use joined words as mnemonic sentence.
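The generation steps added to the README above can be sketched in Python 3 as follows. This is only an illustration under stated assumptions, not the repository's code: `entropy_to_indices` is a hypothetical helper name, and it returns wordlist positions instead of words so that it does not depend on any particular wordlist file.

```python
import hashlib


def entropy_to_indices(entropy):
    """Map initial entropy (bytes) to a list of 11-bit wordlist positions.

    Hypothetical helper written for illustration; it follows the README text:
    checksum = first ENT / 32 bits of SHA256(entropy), appended to the entropy,
    then the result is split into groups of 11 bits.
    """
    ent_bits = len(entropy) * 8
    if ent_bits == 0 or ent_bits % 32 != 0:
        raise ValueError("entropy length must be a positive multiple of 32 bits")

    cs_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()

    # Work with integers of known width so that leading zero bits are kept.
    ent_int = int.from_bytes(entropy, "big")
    cs_int = int.from_bytes(digest, "big") >> (256 - cs_bits)
    combined = (ent_int << cs_bits) | cs_int

    n_words = (ent_bits + cs_bits) // 11
    # Read the 11-bit groups from the most significant end.
    return [(combined >> (11 * (n_words - 1 - i))) & 0x7FF for i in range(n_words)]


if __name__ == "__main__":
    print(entropy_to_indices(bytes(16)))
```

Joining `wordlist[i]` for each returned index with spaces would give the mnemonic sentence, assuming a 2048-entry list is loaded separately.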

-As this BIP is written, only one Bitcoin client (Electrum) implements mnemonic
-codes, but it uses a different wordlist than the proposed one.
+The following table describes the relation between initial entropy length (ENT),
+checksum length (CS) and length of the generated mnemonic sentence (MS) in words.

-For compatibility reasons we propose adding a checkbox to Electrum, which will
-allow user to indicate if the legacy code is being entered during import or
-it is a new one that is BIP-0039 compatible. For exporting, only the new format
-will be used, so this is not an issue.
+CS = ENT / 32
+MS = (ENT + CS) / 11

-==Rationale==
+|  ENT  | CS | ENT+CS |  MS  |
++-------+----+--------+------+
+|  128  |  4 |   132  |  12  |
+|  160  |  5 |   165  |  15  |
+|  192  |  6 |   198  |  18  |
+|  224  |  7 |   231  |  21  |
+|  256  |  8 |   264  |  24  |
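The two formulas and the table can be checked with a few lines of Python 3. `mnemonic_lengths` is a hypothetical name used only for this sketch. Because CS = ENT / 32, the sum ENT + CS equals 33 * (ENT / 32), which is why it always divides evenly by 11 and MS works out to 3 * (ENT / 32).

```python
def mnemonic_lengths(ent):
    """Return (CS, ENT+CS, MS) for an entropy length of `ent` bits.

    Hypothetical helper for illustration, following the README formulas.
    """
    if ent <= 0 or ent % 32 != 0:
        raise ValueError("ENT must be a positive multiple of 32 bits")
    cs = ent // 32
    total = ent + cs      # equals 33 * (ent // 32), so it is divisible by 11
    ms = total // 11      # equals 3 * (ent // 32)
    return cs, total, ms


if __name__ == "__main__":
    for ent in range(128, 257, 32):
        print(ent, *mnemonic_lengths(ent))
```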

-Our proposal is inspired by implementation used in Electrum, but we enhanced
-the wordlist and algorithm so it meets the following criteria:
+== Wordlist ==
+
+In previous section we described how to pick words from a wordlist. Now we
+describe how does a good wordlist look like.

 a) smart selection of words
 - wordlist is created in such way that it's enough to type just first four
@@ -55,90 +75,45 @@ c) sorted wordlists
 (i.e. implementation can use binary search instead of linear search)
 - this also allows trie (prefix tree) to be used, e.g. for better compression
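As a rough illustration of why a sorted wordlist helps, the Python 3 sketch below uses the standard `bisect` module for lookup and for completing a word from its first letters. The four-word `WORDS` list and both function names are assumptions made for this example; a real list would hold 2048 entries sorted in the same code point order that `bisect` compares by.

```python
from bisect import bisect_left

# Toy stand-in for a sorted 2048-word list (assumption for illustration only).
WORDS = ["abandon", "ability", "able", "about"]


def word_index(wordlist, word):
    """Binary search for `word`; raise ValueError if it is not in the list."""
    i = bisect_left(wordlist, word)
    if i < len(wordlist) and wordlist[i] == word:
        return i
    raise ValueError("word not in wordlist: %r" % word)


def complete_prefix(wordlist, prefix):
    """Return the single word starting with `prefix`, or None if absent or ambiguous."""
    i = bisect_left(wordlist, prefix)
    # In a sorted list, all words sharing a prefix are adjacent, starting at i.
    matches = [w for w in wordlist[i:i + 2] if w.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(word_index(WORDS, "able"))
    print(complete_prefix(WORDS, "aban"))
```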

-d) localized wordlists
-- we would like to allow localized wordlists, so it is easier for users
-to remember the code in their native language
-- by using wordlists with no colliding words among languages, it's easy to
-determine which language was used just by checking the first word of
-the sentence
-
-e) mnemonic checksum
-- this leads to better user experience, because user can be notified
-if the mnemonic sequence is wrong, instead of showing the confusing
-data generated from the wrong sequence.
-
-f) seed stretching
-- before the encoding and after the decoding the binary sequence is
-stretched using a symmetric cipher (Rijndael) in order to prevent
-brute-force attacks in case some of the mnemonic words are leaked
-- this also provides a method for password protection of the seed
+Wordlist can contain native characters, but they have to be encoded using UTF-8.

-==Specification==
+== From mnemonic to seed ==

-<pre>
-Our proposal implements two methods - "encode" and "decode".
-
-The first method takes a binary data which have length of 128, 192 or 256 bits
-and returns a sentence that consists of 12, 18 or 24 words from the wordlist.
-
-The second method takes sentences generated by the first method (number of words
-is 12, 18 or 24 and reconstructs the original binary data of length 128, 192 or
-256 bits.
+User can decide to protect his mnemonic by passphrase. If passphrase is not present
+an empty string "" is used instead.

-Words can repeat in the sentence more than one time.
+To create binary seed from mnemonic, we use HMAC-SHA512 function with string
+"mnemonic" + passphrase (in UTF-8) as key and mnemonic sentence (again in UTF-8)
+as the message. We perform 10000 HMAC rounds and use the final result as the binary
+seed.

-Wordlist contains 2048 words (instead of 1626 words in Electrum), allowing
-the code to compute the checksum of the whole mnemonic sequence.
-Each 32 bits of input data add 1 bit of checksum.
+Pseudocode:

-See the following table for relation between input lengths, output lengths and
-checksum sizes:
+K = "mnemonic" + passphrase
+M = mnemonic_sentence
+for i in 1 ... 10000 do
+    M = hmac_sha512(K, M)
+done
+seed = M
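A Python 3 rendering of this pseudocode might look like the sketch below. It follows the scheme exactly as described in this revision of the README (repeated HMAC-SHA512 with a fixed key). The explicit UTF-8 encoding step and the `rounds` parameter are assumptions added for illustration.

```python
import hashlib
import hmac


def to_seed(mnemonic, passphrase="", rounds=10000):
    """Derive a 64-byte seed from a mnemonic sentence.

    Sketch of the README pseudocode: K = "mnemonic" + passphrase,
    M = mnemonic sentence, then M = HMAC-SHA512(K, M) repeated `rounds` times.
    """
    key = ("mnemonic" + passphrase).encode("utf-8")
    message = mnemonic.encode("utf-8")
    for _ in range(rounds):
        message = hmac.new(key, message, hashlib.sha512).digest()
    return message


if __name__ == "__main__":
    sentence = "example mnemonic sentence"  # placeholder text, not a real mnemonic
    print(to_seed(sentence).hex())
    print(to_seed(sentence, "secret").hex())
```

Running it with two different passphrases on the same sentence gives two unrelated 512-bit seeds, which is the property the README relies on when it says that every passphrase yields a valid seed.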

-+--------+---------+---------+----------+
-| input  | input   | output  | checksum |
-| (bits) | (bytes) | (words) | (bits)   |
-+--------+---------+---------+----------+
-| 128    | 16      | 12      | 4        |
-| 192    | 24      | 18      | 6        |
-| 256    | 32      | 24      | 8        |
-+--------+---------+---------+----------+
-</pre>
+This seed can be later used to generate deterministic wallets using BIP-0032 or
+similar methods.

-===Algorithm:===
+The conversion of the mnemonic sentence to binary seed is completely independent
+from generating the sentence. This results in rather simple code, there are no
+constraints on sentence structure and clients are free to implement their own
+wordlists or even whole sentence generators (they'll lose the proposed method
+for typo detection in that case, but they can come up with their own).

-<pre>
-Encoding:
-1. Read input data (I).
-2. Make sure its length (L) is 128, 192 or 256 bits.
-3. Encrypt input data 10000x with Rijndael (ECB mode).
-Set key to SHA256 hash of string ("mnemonic" + user_password).
-Set block size to input size (that's why Rijndael is used, not AES).
-4. Compute the length of the checkum (LC). LC = L/32
-5. Split I into chunks of LC bits (I1, I2, I3, ...).
-6. XOR them altogether and produce the checksum C. C = I1 xor I2 xor I3 ... xor In.
-7. Concatenate I and C into encoded data (E). Length of E is divisable by 33 bits.
-8. Keep taking 11 bits from E until there are none left.
-9. Treat them as integer W, add word with index W to the output.
-
-Decoding:
-1. Read input mnemonic (M).
-2. Make sure the number of words is 12, 18 or 24.
-3. Figure out word indexes in a dictionary and output them as binary stream E.
-4. Length of E (L) is divisable by 33 bits.
-5. Split E into two parts: B and C, where B are first L/33*32 bits, C are last L/33 bits.
-6. Make sure C is the checksum of B (using the step 5 from the above paragraph).
-7. If it's not we have invalid mnemonic code.
-8. Treat B as binary data.
-9. Decrypt this data 10000x with Rijndael (ECB mode),
-use the same parameters as used in step 3 of encryption.
-10. Return the result as output.
-</pre>
+Described method also provides plausable deniability, because every passphrase
+generates a valid seed (and thus deterministic wallet) but only the correct one
+will make the desired wallet available.

-==Test vectors==
+== Test vectors ==

 See https://github.com/trezor/python-mnemonic/blob/master/vectors.json

-==Reference Implementation==
+== Reference Implementation ==

 Reference implementation including wordlists is available from

generate_vectors.py

+12-15
@@ -4,6 +4,15 @@
 from random import choice
 from mnemonic import Mnemonic

+def process(data, lst):
+    code = mnemo.to_mnemonic(unhexlify(data))
+    seed = hexlify(Mnemonic.to_seed(code))
+    print 'input : %s (%d bits)' % (data, len(data) * 4)
+    print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
+    print 'seed : %s (%d bits)' % (seed, len(data) * 4)
+    print
+    lst.append((data, code, seed))
+
 if __name__ == '__main__':
     out = {}

@@ -15,23 +24,11 @@
         data = []
         for l in range(16, 32 + 1, 8):
             for b in ['00', '7f', '80', 'ff']:
-                data = (b * l)
-                code = mnemo.encode(unhexlify(data))
-
-                print 'input : %s (%d bits)' % (data, len(data) * 4)
-                print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
-
-                out[lang].append((data, code))
+                process(b * l, out[lang])

         # Generate random seeds
         for i in range(12):
-            data = ''.join(chr(choice(range(0, 256))) for _ in range(8 * (i % 3 + 2)))
-            print 'input : %s (%d bits)' % (hexlify(data), len(data) * 8)
-            code = mnemo.encode(data)
-            print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
-            print
-
-            out[lang].append((hexlify(data), code))
+            data = hexlify(''.join(chr(choice(range(0, 256))) for _ in range(8 * (i % 3 + 2))))
+            process(data, out[lang])

     json.dump(out, open('vectors.json', 'w'), sort_keys=True, indent=4,)
-

mnemonic/mnemonic.py

+38-61
@@ -18,21 +18,18 @@
 # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #
-# The code is inspired by Electrum mnemonic code by ThomasV
-#

-import struct
-import binascii
 import os
 import hashlib
-import rijndael
+import hmac
+import binascii

 class Mnemonic(object):
     def __init__(self, language):
         self.radix = 2048
         self.wordlist = [w.strip() for w in open('%s/%s.txt' % (self._get_directory(), language), 'r').readlines()]
         if len(self.wordlist) != self.radix:
-            raise Exception('Wordlist should contain %d words. Contains %d words.' % (self.radix, len(self.wordlist)))
+            raise Exception('Wordlist should contain %d words, but it contains %d words.' % (self.radix, len(self.wordlist)))

     @classmethod
     def _get_directory(cls):
@@ -54,63 +51,43 @@ def detect_language(cls, code):

         raise Exception("Language not detected")

-    def checksum(self, b):
-        l = len(b) / 32
-        c = 0
-        for i in range(32):
-            c ^= int(b[i * l:(i + 1) * l], 2)
-        c = bin(c)[2:].zfill(l)
-        return c
-
-    def stretch(self, data, passphrase):
-        key = hashlib.sha256("mnemonic" + passphrase).digest()
-        cipher = rijndael.Rijndael(key, block_size=len(data))
-        for _ in range(10000):
-            data = cipher.encrypt(data)
-        return data
-
-    def unstretch(self, data, passphrase):
-        key = hashlib.sha256("mnemonic" + passphrase).digest()
-        cipher = rijndael.Rijndael(key, block_size=len(data))
-        for _ in range(10000):
-            data = cipher.decrypt(data)
-        return data
+    def generate(self, strength = 128):
+        if strength % 32 > 0:
+            raise Exception('Strength should be divisible by 32, but it is not (%d).' % strength)
+        return self.to_mnemonic(os.urandom(strength / 8))

-    def encode(self, data, passphrase=''):
-        if len(self.wordlist) != self.radix:
-            raise Exception('Wordlist does not contain %d items!' % self.radix)
-        if len(data) * 8 not in (128, 192, 256):
-            raise Exception('Data is not 128, 192 or 256 bits long!')
-        data = self.stretch(data, passphrase)
-        b = bin(int(binascii.hexlify(data), 16))[2:].zfill(len(data) * 8)
-        assert len(b) % 32 == 0
-        c = self.checksum(b)
-        assert len(c) == len(b) / 32
-        e = b + c
-        assert len(e) % 33 == 0
+    def to_mnemonic(self, data):
+        if len(data) % 4 > 0:
+            raise Exception('Data length in bits should be divisible by 32, but it is not (%d bytes = %d bits).' % (len(data), len(data) * 8))
+        h = hashlib.sha256(data).digest()
+        b = bin(int(binascii.hexlify(data), 16))[2:].zfill(len(data) * 8) + \
+            bin(int(binascii.hexlify(h), 16))[2:][:len(data) * 8 / 32]
         result = []
-        for i in range(len(e) / 11):
-            idx = int(e[i * 11:(i + 1) * 11], 2)
+        for i in range(len(b) / 11):
+            idx = int(b[i * 11:(i + 1) * 11], 2)
             result.append(self.wordlist[idx])
         return ' '.join(result)

-    def decode(self, code, passphrase=''):
-        if len(self.wordlist) != self.radix:
-            raise Exception('Wordlist does not contain %d items!' % self.radix)
-        code = [w for w in code.split(' ') if w]
-        if len(code) not in (12, 18, 24):
-            raise Exception('Mnemonic code is not 12, 18 or 24 words long!')
-        e = [ bin(self.wordlist.index(w))[2:].zfill(11) for w in code ]
-        e = ''.join(e)
-        l = len(e)
-        assert l % 33 == 0
-        b = e[:l / 33 * 32]
-        c = e[l / 33 * 32:]
-        assert len(b) % 32 == 0
-        assert len(c) == len(b) / 32
-        if self.checksum(b) != c:
-            raise Exception('Mnemonic checksum error')
-        b = hex(int(b, 2))[2:].rstrip('L').zfill(len(b) / 4)
-        data = binascii.unhexlify(b)
-        data = self.unstretch(data, passphrase)
-        return data
+    def check(self, mnemonic):
+        mnemonic = mnemonic.split(' ')
+        if len(mnemonic) % 3 > 0:
+            return False
+        try:
+            idx = map(lambda x: bin(self.wordlist.index(x))[2:].zfill(11), mnemonic)
+        except:
+            return False
+        b = ''.join(idx)
+        l = len(b)
+        d = b[:l / 33 * 32]
+        h = b[-l / 33:]
+        nd = binascii.unhexlify(hex(int(d, 2))[2:].rstrip('L').zfill(l / 33 * 8))
+        nh = bin(int(hashlib.sha256(nd).hexdigest(), 16))[2:2 + l / 33]
+        return h == nh
+
+    @classmethod
+    def to_seed(cls, mnemonic, passphrase = ''):
+        k = 'mnemonic' + passphrase
+        m = mnemonic
+        for i in range(10000):
+            m = hmac.new(k, m, hashlib.sha512).digest()
+        return m
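For readers on Python 3, the checksum validation performed by `check` could be sketched as below. This is an independent illustration, not the committed code: `check_indices` is a hypothetical name, and it takes wordlist positions rather than words so the example needs no wordlist file. One deliberate difference: the hash is handled as a 256-bit integer and shifted, which keeps leading zero bits that slicing the output of `bin()` would drop.

```python
import hashlib


def check_indices(indices):
    """Validate the checksum of a mnemonic given as 11-bit wordlist positions."""
    if not indices or len(indices) % 3 != 0:
        return False

    total_bits = len(indices) * 11
    cs_bits = total_bits // 33          # one checksum bit per 32 entropy bits
    ent_bits = total_bits - cs_bits

    combined = 0
    for idx in indices:
        if not 0 <= idx < 2048:
            return False
        combined = (combined << 11) | idx

    # Split the bit string back into entropy and checksum.
    entropy = (combined >> cs_bits).to_bytes(ent_bits // 8, "big")
    checksum = combined & ((1 << cs_bits) - 1)

    digest = hashlib.sha256(entropy).digest()
    expected = int.from_bytes(digest, "big") >> (256 - cs_bits)
    return checksum == expected


if __name__ == "__main__":
    print(check_indices([0] * 11 + [3]))
    print(check_indices([0] * 12))
```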
