jase100k
diff --git a/AUTHORS b/AUTHORS (+2 -3)
diff --git a/README b/README (+64 -89)
diff --git a/generate_vectors.py b/generate_vectors.py (+12 -15)
diff --git a/mnemonic/mnemonic.py b/mnemonic/mnemonic.py (+38 -61)
@@ -1,3 +1,2 @@
-Pavol Rusnak <[email protected]>
-Marek Palatinus <[email protected]>
-Aaron Voisine <[email protected]>
+Marek Palatinus <[email protected]>
+Pavol Rusnak <[email protected]>
@@ -3,42 +3,62 @@
 <pre>
   BIP:     BIP-0039
   Title:   Mnemonic code for generating deterministic keys
-  Author:  Pavol Rusnak <[email protected]>
-           Marek Palatinus <[email protected]>
+  Authors: Marek Palatinus <[email protected]>
+           Pavol Rusnak <[email protected]>
+           ThomasV <[email protected]>
            Aaron Voisine <[email protected]>
   Status:  Draft
   Type:    Standards Track
   Created: 10-09-2013
 </pre>
 
-==Abstract==
+== Abstract ==
 
-This BIP proposes a scheme for translating binary data (usually master seeds
-for deterministic keys, but it can be applied to any binary data) into a group
-of easy to remember words also known as mnemonic code or mnemonic sentence.
+This BIP describes the use of a mnemonic code or mnemonic sentence (a group of
+easy-to-remember words) to generate deterministic wallets.
 
-==Motivation==
+It consists of two parts: generating the mnemonic and converting it into
+a binary seed. This seed can later be used to generate deterministic wallets
+using BIP-0032 or similar methods.
+
+== Motivation ==
 
 Such mnemonic code or mnemonic sentence is much easier to work with than working
 with the binary data directly (or its hexadecimal interpretation). The sentence
 could be writen down on paper (e.g. for storing in a secure location such as
 safe), told over telephone or other voice communication method, or memorized
 in ones memory (this method is called brainwallet).
 
-==Backwards Compatibility==
+== Generating the mnemonic ==
+
+First, we decide how much entropy the mnemonic should encode. The recommended
+size is 128-256 bits, but any multiple of 32 bits will do. More bits mean
+more security, but also a longer sentence.
+
+We take the initial entropy of ENT bits and compute its checksum by taking the
+first ENT / 32 bits of its SHA256 hash. We append these bits to the end of the
+initial entropy. Next, we split the concatenated bits into groups of 11 bits.
+Each group encodes a number from 0 to 2047, which is an index into the wordlist.
+We convert the numbers into words and join the words to form the mnemonic sentence.
 
-As this BIP is written, only one Bitcoin client (Electrum) implements mnemonic
-codes, but it uses a different wordlist than the proposed one.
+The following table describes the relation between initial entropy length (ENT),
+checksum length (CS) and length of the generated mnemonic sentence (MS) in words.
 
-For compatibility reasons we propose adding a checkbox to Electrum, which will
-allow user to indicate if the legacy code is being entered during import or
-it is a new one that is BIP-0039 compatible. For exporting, only the new format
-will be used, so this is not an issue.
+CS = ENT / 32
+MS = (ENT + CS) / 11
 
-==Rationale==
+|  ENT  | CS | ENT+CS |  MS  |
++-------+----+--------+------+
+|  128  |  4 |   132  |  12  |
+|  160  |  5 |   165  |  15  |
+|  192  |  6 |   198  |  18  |
+|  224  |  7 |   231  |  21  |
+|  256  |  8 |   264  |  24  |
 
-Our proposal is inspired by implementation used in Electrum, but we enhanced
-the wordlist and algorithm so it meets the following criteria:
+== Wordlist ==
+
+The previous section described how to pick words from a wordlist. Now we
+describe what a good wordlist looks like.
 
 a) smart selection of words
    - wordlist is created in such way that it's enough to type just first four
@@ -55,90 +75,45 @@ c) sorted wordlists
      (i.e. implementation can use binary search instead of linear search)
    - this also allows trie (prefix tree) to be used, e.g. for better compression
 
-d) localized wordlists
-   - we would like to allow localized wordlists, so it is easier for users
-     to remember the code in their native language
-   - by using wordlists with no colliding words among languages, it's easy to
-     determine which language was used just by checking the first word of
-     the sentence
-
-e) mnemonic checksum
-   - this leads to better user experience, because user can be notified
-     if the mnemonic sequence is wrong, instead of showing the confusing
-     data generated from the wrong sequence.
-
-f) seed stretching
-   - before the encoding and after the decoding the binary sequence is
-     stretched using a symmetric cipher (Rijndael) in order to prevent
-     brute-force attacks in case some of the mnemonic words are leaked
-   - this also provides a method for password protection of the seed
+A wordlist can contain native characters, but they have to be encoded using UTF-8.
 
-==Specification==
+== From mnemonic to seed ==
 
-<pre>
-Our proposal implements two methods - "encode" and "decode".
-
-The first method takes a binary data which have length of 128, 192 or 256 bits
-and returns a sentence that consists of 12, 18 or 24 words from the wordlist.
-
-The second method takes sentences generated by the first method (number of words
-is 12, 18 or 24 and reconstructs the original binary data of length 128, 192 or
-256 bits.
+The user can decide to protect the mnemonic with a passphrase. If no passphrase
+is present, an empty string "" is used instead.
 
-Words can repeat in the sentence more than one time.
+To create the binary seed from the mnemonic, we use the HMAC-SHA512 function with
+the string "mnemonic" + passphrase (in UTF-8) as the key and the mnemonic sentence
+(again in UTF-8) as the message. We perform 10000 HMAC rounds, feeding each result
+back in as the message, and use the final result as the binary seed.
 
-Wordlist contains 2048 words (instead of 1626 words in Electrum), allowing
-the code to compute the checksum of the whole mnemonic sequence.
-Each 32 bits of input data add 1 bit of checksum.
+Pseudocode:
 
-See the following table for relation between input lengths, output lengths and
-checksum sizes:
+    K = "mnemonic" + passphrase
+    M = mnemonic_sentence
+    for i in 1 ... 10000 do
+        M = hmac_sha512(K, M)
+    done
+    seed = M
 
-+--------+---------+---------+----------+
-| input  |  input  | output  | checksum |
-| (bits) | (bytes) | (words) |  (bits)  |
-+--------+---------+---------+----------+
-|   128  |    16   |    12   |     4    |
-|   192  |    24   |    18   |     6    |
-|   256  |    32   |    24   |     8    |
-+--------+---------+---------+----------+
-</pre>
+This seed can later be used to generate deterministic wallets using BIP-0032 or
+similar methods.
 
-===Algorithm:===
+The conversion of the mnemonic sentence to the binary seed is completely independent
+of generating the sentence. This results in rather simple code: there are no
+constraints on sentence structure, and clients are free to implement their own
+wordlists or even whole sentence generators (they lose the proposed method
+for typo detection in that case, but they can come up with their own).
 
-<pre>
-Encoding:
-1. Read input data (I).
-2. Make sure its length (L) is 128, 192 or 256 bits.
-3. Encrypt input data 10000x with Rijndael (ECB mode).
-   Set key to SHA256 hash of string ("mnemonic" + user_password).
-   Set block size to input size (that's why Rijndael is used, not AES).
-4. Compute the length of the checkum (LC). LC = L/32
-5. Split I into chunks of LC bits (I1, I2, I3, ...).
-6. XOR them altogether and produce the checksum C. C = I1 xor I2 xor I3 ... xor In.
-7. Concatenate I and C into encoded data (E). Length of E is divisable by 33 bits.
-8. Keep taking 11 bits from E until there are none left.
-9. Treat them as integer W, add word with index W to the output.
-
-Decoding:
-1. Read input mnemonic (M).
-2. Make sure the number of words is 12, 18 or 24.
-3. Figure out word indexes in a dictionary and output them as binary stream E.
-4. Length of E (L) is divisable by 33 bits.
-5. Split E into two parts: B and C, where B are first L/33*32 bits, C are last L/33 bits.
-6. Make sure C is the checksum of B (using the step 5 from the above paragraph).
-7. If it's not we have invalid mnemonic code.
-8. Treat B as binary data.
-9. Decrypt this data 10000x with Rijndael (ECB mode),
-   use the same parameters as used in step 3 of encryption.
-10. Return the result as output.
-</pre>
+The described method also provides plausible deniability, because every passphrase
+generates a valid seed (and thus a deterministic wallet), but only the correct one
+makes the desired wallet available.
 
-==Test vectors==
+== Test vectors ==
 
 See https://github.com/trezor/python-mnemonic/blob/master/vectors.json
 
-==Reference Implementation==
+== Reference Implementation ==
 
 Reference implementation including wordlists is available from
 
 
@@ -4,6 +4,15 @@
 from random import choice
 from mnemonic import Mnemonic
 
+def process(data, lst):
+    code = mnemo.to_mnemonic(unhexlify(data))
+    seed = hexlify(Mnemonic.to_seed(code))
+    print 'input    : %s (%d bits)' % (data, len(data) * 4)
+    print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
+    print 'seed     : %s (%d bits)' % (seed, len(seed) * 4)
+    print
+    lst.append((data, code, seed))
+
 if __name__ == '__main__':
     out = {}
 
@@ -15,23 +24,11 @@
         data = []
         for l in range(16, 32 + 1, 8):
             for b in ['00', '7f', '80', 'ff']:
-                data = (b * l)
-                code = mnemo.encode(unhexlify(data))
-
-                print 'input    : %s (%d bits)' % (data, len(data) * 4)
-                print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
-
-                out[lang].append((data, code))
+                process(b * l, out[lang])
 
         # Generate random seeds
         for i in range(12):
-            data = ''.join(chr(choice(range(0, 256))) for _ in range(8 * (i % 3 + 2)))
-            print 'input    : %s (%d bits)' % (hexlify(data), len(data) * 8)
-            code = mnemo.encode(data)
-            print 'mnemonic : %s (%d words)' % (code, len(code.split(' ')))
-            print
-
-            out[lang].append((hexlify(data), code))
+            data = hexlify(''.join(chr(choice(range(0, 256))) for _ in range(8 * (i % 3 + 2))))
+            process(data, out[lang])
 
     json.dump(out, open('vectors.json', 'w'), sort_keys=True, indent=4,)
-
@@ -18,21 +18,18 @@
 # WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 # CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 #
-# The code is inspired by Electrum mnemonic code by ThomasV
-#
 
-import struct
-import binascii
 import os
 import hashlib
-import rijndael
+import hmac
+import binascii
 
 class Mnemonic(object):
 	def __init__(self, language):
 		self.radix = 2048
 		self.wordlist = [w.strip() for w in open('%s/%s.txt' % (self._get_directory(), language), 'r').readlines()]
 		if len(self.wordlist) != self.radix:
-		    raise Exception('Wordlist should contain %d words. Contains %d words.' % (self.radix, len(self.wordlist)))
+			raise Exception('Wordlist should contain %d words, but it contains %d words.' % (self.radix, len(self.wordlist)))
 
 	@classmethod
 	def _get_directory(cls):
@@ -54,63 +51,43 @@ def detect_language(cls, code):
 
 		raise Exception("Language not detected")
 
-	def checksum(self, b):
-		l = len(b) / 32
-		c = 0
-		for i in range(32):
-			c ^= int(b[i * l:(i + 1) * l], 2)
-		c = bin(c)[2:].zfill(l)
-		return c
-
-	def stretch(self, data, passphrase):
-		key = hashlib.sha256("mnemonic" + passphrase).digest()
-		cipher = rijndael.Rijndael(key, block_size=len(data))
-		for _ in range(10000):
-			data = cipher.encrypt(data)
-		return data
-
-	def unstretch(self, data, passphrase):
-		key = hashlib.sha256("mnemonic" + passphrase).digest()
-		cipher = rijndael.Rijndael(key, block_size=len(data))
-		for _ in range(10000):
-			data = cipher.decrypt(data)
-		return data
+	def generate(self, strength = 128):
+		if strength % 32 > 0:
+			raise Exception('Strength should be divisible by 32, but it is not (%d).' % strength)
+		return self.to_mnemonic(os.urandom(strength / 8))
 
-	def encode(self, data, passphrase=''):
-		if len(self.wordlist) != self.radix:
-			raise Exception('Wordlist does not contain %d items!' % self.radix)
-		if len(data) * 8 not in (128, 192, 256):
-			raise Exception('Data is not 128, 192 or 256 bits long!')
-		data = self.stretch(data, passphrase)
-		b = bin(int(binascii.hexlify(data), 16))[2:].zfill(len(data) * 8)
-		assert len(b) % 32 == 0
-		c = self.checksum(b)
-		assert len(c) == len(b) / 32
-		e = b + c
-		assert len(e) % 33 == 0
+	def to_mnemonic(self, data):
+		if len(data) % 4 > 0:
+			raise Exception('Data length in bits should be divisible by 32, but it is not (%d bytes = %d bits).' % (len(data), len(data) * 8))
+		h = hashlib.sha256(data).digest()
+		b = bin(int(binascii.hexlify(data), 16))[2:].zfill(len(data) * 8) + \
+		    bin(int(binascii.hexlify(h), 16))[2:].zfill(len(h) * 8)[:len(data) * 8 / 32]
 		result = []
-		for i in range(len(e) / 11):
-			idx = int(e[i * 11:(i + 1) * 11], 2)
+		for i in range(len(b) / 11):
+			idx = int(b[i * 11:(i + 1) * 11], 2)
 			result.append(self.wordlist[idx])
 		return ' '.join(result)
 
-	def decode(self, code, passphrase=''):
-		if len(self.wordlist) != self.radix:
-			raise Exception('Wordlist does not contain %d items!' % self.radix)
-		code = [w for w in code.split(' ') if w]
-		if len(code) not in (12, 18, 24):
-			raise Exception('Mnemonic code is not 12, 18 or 24 words long!')
-		e = [ bin(self.wordlist.index(w))[2:].zfill(11) for w in code ]
-		e = ''.join(e)
-		l = len(e)
-		assert l % 33 == 0
-		b = e[:l / 33 * 32]
-		c = e[l / 33 * 32:]
-		assert len(b) % 32 == 0
-		assert len(c) == len(b) / 32
-		if self.checksum(b) != c:
-			raise Exception('Mnemonic checksum error')
-		b = hex(int(b, 2))[2:].rstrip('L').zfill(len(b) / 4)
-		data = binascii.unhexlify(b)
-		data = self.unstretch(data, passphrase)
-		return data
+	def check(self, mnemonic):
+		mnemonic = mnemonic.split(' ')
+		if len(mnemonic) % 3 > 0:
+			return False
+		try:
+			idx = map(lambda x: bin(self.wordlist.index(x))[2:].zfill(11), mnemonic)
+		except ValueError:
+			return False
+		b = ''.join(idx)
+		l = len(b)
+		d = b[:l / 33 * 32]
+		h = b[-l / 33:]
+		nd = binascii.unhexlify(hex(int(d, 2))[2:].rstrip('L').zfill(l / 33 * 8))
+		nh = bin(int(hashlib.sha256(nd).hexdigest(), 16))[2:].zfill(256)[:l / 33]
+		return h == nh
+
+	@classmethod
+	def to_seed(cls, mnemonic, passphrase = ''):
+		k = 'mnemonic' + passphrase
+		m = mnemonic
+		for i in range(10000):
+			m = hmac.new(k, m, hashlib.sha512).digest()
+		return m
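
Review note: the scheme described in the README hunks above can be sketched in Python 3 as follows. This is a minimal sketch under stated assumptions, not the reference implementation: the function names, the `rounds` parameter, and `PLACEHOLDER_WORDLIST` are hypothetical and introduced only for illustration (the real wordlists are not part of this diff). The checksum bits are taken from a fixed-width 256-bit rendering of the SHA256 digest, so leading zero bits of the hash are preserved. The seed derivation follows the iterated HMAC-SHA512 pseudocode as written in this revision of the README.

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for a real 2048-word list, which this diff does not include.
PLACEHOLDER_WORDLIST = ["w%04d" % i for i in range(2048)]


def entropy_to_indices(entropy):
    """Turn ENT bits of entropy into 11-bit wordlist indices (ENT + ENT/32 bits total)."""
    ent_bits = len(entropy) * 8
    if ent_bits == 0 or ent_bits % 32 != 0:
        raise ValueError("entropy length must be a non-zero multiple of 32 bits")
    cs_bits = ent_bits // 32
    digest = hashlib.sha256(entropy).digest()
    # Fixed-width formatting keeps leading zero bits in both the entropy and the hash.
    bits = format(int.from_bytes(entropy, "big"), "0%db" % ent_bits)
    bits += format(int.from_bytes(digest, "big"), "0256b")[:cs_bits]
    return [int(bits[i:i + 11], 2) for i in range(0, len(bits), 11)]


def indices_are_valid(indices):
    """Recompute the checksum from the data bits and compare it with the trailing bits."""
    if not indices or len(indices) % 3 != 0:
        return False
    bits = "".join(format(i, "011b") for i in indices)
    cs_bits = len(bits) // 33
    data_bits, checksum = bits[:-cs_bits], bits[-cs_bits:]
    entropy = int(data_bits, 2).to_bytes(len(data_bits) // 8, "big")
    digest = hashlib.sha256(entropy).digest()
    return format(int.from_bytes(digest, "big"), "0256b")[:cs_bits] == checksum


def to_seed(mnemonic, passphrase="", rounds=10000):
    """Iterate HMAC-SHA512 with key "mnemonic" + passphrase, as in the README pseudocode."""
    key = ("mnemonic" + passphrase).encode("utf-8")
    message = mnemonic.encode("utf-8")
    for _ in range(rounds):
        message = hmac.new(key, message, hashlib.sha512).digest()
    return message


if __name__ == "__main__":
    indices = entropy_to_indices(os.urandom(16))  # 128 bits -> 12 words
    sentence = " ".join(PLACEHOLDER_WORDLIST[i] for i in indices)
    print(sentence)
    print(to_seed(sentence).hex())
```

With 16 zero bytes as input, the SHA256 digest begins with two zero bits, so this input exercises the zero-padding of the checksum: the first eleven indices are 0 and the last one is 3.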