adds base64 decoding (fixes #5656) #9157

amartgon · 2014-11-26T08:42:09Z

nalimilan · 2014-11-26T08:54:01Z

base/base64.jl

 #############################################################################

 # Based on code by Stefan Karpinski from https://github.com/hackerschool/WebSockets.jl (distributed under the same MIT license as Julia)

 const b64chars = ['A':'Z','a':'z','0':'9','+','/']

+const revb64chars = Dict('A'=> 0, 'B'=> 1, 'C'=> 2, 'D'=> 3, 'E'=> 4, 'F'=> 5, 'G'=> 6, 'H'=> 7, 'I'=> 8, 'J'=> 9, 


I've not really looked at the code, but wouldn't using an array be much more efficient here? You could convert the character to an Int and directly index into the array, with a few if to transform the codepoint of the character into a position in the array (i.e. one condition for upper-case letters, one for lower-case letters, one for digits, and one for + and /).

timholy · 2014-11-26T09:05:25Z

There's an implementation in the Codecs.jl package. Out of curiosity, why is it necessary to have this in base given that Codecs exists?

ivarne · 2014-11-26T09:05:29Z

base/base64.jl

+                         'e'=> 30, 'f'=> 31, 'g'=> 32, 'h'=> 33, 'i'=> 34, 'j'=> 35, 'k'=> 36, 'l'=> 37, 'm'=> 38, 'n'=> 39, 
+                         'o'=> 40, 'p'=> 41, 'q'=> 42, 'r'=> 43, 's'=> 44, 't'=> 45, 'u'=> 46, 'v'=> 47, 'w'=> 48, 'x'=> 49, 
+                         'y'=> 50, 'z'=> 51, '0'=> 52, '1'=> 53, '2'=> 54, '3'=> 55, '4'=> 56, '5'=> 57, '6'=> 58, '7'=> 59,
+                         '8'=> 60, '9'=> 61, '+'=> 62, '/'=> 63)


I think you'll get better performance if you use an array and the integer value of the character code to lookup the values. (Be sure to use an unchecked conversion from Char to Int, so that you only check for valid range once.)

EDIT: As the values are sequential, you'll might get better performance using @nalimilan's trick with if 'A' <= x <= 'Z' y = x - 'A'

amartgon · 2014-11-26T09:19:05Z

El 26/11/2014 10:05, Tim Holy escribió:

There's an implementation in the Codecs.jl package. Out of curiosity,
why is it necessary to have this in base given that Codecs exists?

I think the issue was created because Multimedia module (in base) uses
the base64 encoder. An then it was confusing for users not having the
corresponding decoder.

ivarne · 2014-11-26T09:20:54Z

base/base64.jl

-# someday reads?) into base64 encoded (decoded) data send to a stream.
-# (You must close the pipe to complete the encode, separate from
-# closing the target stream).  We also have a function base64(f,
+# Base64EncodePipe is a pipe-like IO object, which converts into base64 data sent 


You have a trailing whitespace on this line (and a few other lines), and it makes our CI fail (on purpose, because many developers sees trailing whitespace as a poor sign of code quality).

If you type mv .git/hooks/pre-commit.sample .git/hooks/pre-commit at the root of your repository, you'll get a warning when trying to commit trailing whitespace in that local repository.

timholy · 2014-11-26T10:21:43Z

Sounds reasonable, but I bet the implementation in Codecs solves the performance issues noticed here.

…hitespace

Conflicts: base/deprecated.jl

amartgon · 2014-11-29T09:45:43Z

Hi again.
Thanks for the comments. I've updated the code in the following ways:

Improve efficiency by replacing the dictionary by an array, in he same way as is done in the Codecs module
removed trailing whitespaces
Moved the old aliases as const to deprecated.jl
Replaced references to the old aliases in other files.

timholy · 2014-12-14T01:49:20Z

base/base64.jl

+    end
+    @inbounds u = revb64chars[encvec[1]]
+    @inbounds v = revb64chars[encvec[2]]
+    decvec = [(u << 2) | (v >> 4)]


Does this mean that a whole array needs to be allocated for every 4 bytes? That seems unlikely to have good performance.

timholy · 2014-12-14T01:51:44Z

Sorry this sat so long. I restarted the failed Travis job, we'll see if it has more luck this time.

amartgon · 2014-12-15T22:51:25Z

After this modification, the arrays are created only once and reused. According to my (very basic and incomplete) performance tests the code is now about 40% faster than last version.

Conflicts: NEWS.md base/deprecated.jl

amartgon · 2014-12-26T09:39:40Z

Hi! Any idea why the AppVeyor build failed? Is it related to the changes in the base64 module?

ivarne · 2014-12-26T09:52:50Z

Hard to say. Usually Julia only changes that causes a crash is considered a bug, but as you use @inbounds that might not be the case. We have some random issues on AppVeyor so I restarted the build, to see if it fails a second time.

adds base64 decoding (fixes #5656)

timholy · 2014-12-31T04:12:40Z

Sorry it took so long to merge this. Great stuff, many thanks!

stevengj · 2014-12-31T19:55:45Z

base/deprecated.jl

@@ -246,6 +246,10 @@ const Uint128 = UInt128
 @deprecate iround(x)              round(Integer,x)
 @deprecate iround{T}(::Type{T},x) round(T,x)

+export Base64Pipe, base64
+const Base64Pipe = Base64EncodePipe
+const base64 = base64encode


Shouldn't this be a @deprecate base64 ..., not a const? That way people will get a warning when they call base64.

ScottPJones · 2015-05-02T13:27:10Z

Would there be any interest in adding RFC4648 compliant "URL and Filename safe" base64 encoding to this (i.e. base64url)? This is a fairly common use case...
It is only using a different array for the encoding... (i.e. -_ instead of +/).
I had to write my own, but it would be nice if it were part of Julia instead.

tkelman · 2015-05-03T09:28:04Z

Not sure base64 encoding is really fundamental enough to need in base at all. If Codecs.jl has some existing base64 encoding functionality, that would be a good place to extend it with additional variants.

ScottPJones · 2015-05-03T18:39:47Z

@tkelman Frankly, with all the other stuff in Base, (multimedia.jl?), and with base64 already in Base, that's where I thought it would have to be... I'll look at Codecs.jl [but to be honest... how many people looking for base64 encoding are going to think about codecs? ;-) ] Thanks!

tkelman · 2015-05-03T18:54:58Z

Package discoverability is a problem. There are multiple issues open on that.

base64 is in base because no one has tried to remove stringmime which appears to be the only place in base that it gets used. The MIME types are used by IJulia and a few other places for rich output display, but I imagine they could be adapted to get their base64 encoding from an external package if they needed to. #5155 is a very slow and arduous issue to make progress on, since removing code from base is harder to do in a painless way than adding code to base. But going further in the direction of "anything anyone ever found useful enough to implement in a PR to JuliaLang/julia" is probably not ideal for long-term maintainability.

ScottPJones · 2015-05-03T19:03:07Z

@tkelman I agree totally... I think there's way too much stuff there as it is! ;-) I'm totally fine with adding RFC4648 base64url to Codecs.jl, if people think it would be useful (besides me, who's already using it in my own ocde), and that is the right place.

timholy · 2015-05-03T19:20:44Z

Agreed with the package idea.

stevengj · 2015-05-19T14:44:12Z

base/base64.jl

+# Decodes a base64-encoded string
+function base64decode(s)
+    b = IOBuffer(s)
+    decoded = readall(Base64DecodePipe(b))


I think this is wrong. It should be readbytes, not readall. The Base64-encoded data need not produce a valid UTF-8 string.

Yes, that's definitely wrong... esp. since most uses of base64 encoding are to store binary data, not Unicode text...

…uliaLang#9157)

adds base64 decoding (fixes JuliaLang#5656)

9d4d0c3

nalimilan reviewed Nov 26, 2014
View reviewed changes

ivarne reviewed Nov 26, 2014
View reviewed changes

amartinez added 3 commits November 29, 2014 00:18

improves efficiency, moves alias to deprecated.jl, removes trailing w…

f529f93

…hitespace

Merge remote-tracking branch 'upstream/master' into base64_decode_#5656

dbe0b02

Conflicts: base/deprecated.jl

update old names in other files

487c349

timholy reviewed Dec 14, 2014
View reviewed changes

avoid creating new array for each 4 byte block

14e2ced

Merge remote-tracking branch 'upstream/master' into base64_decode_#5656

58956c0

Conflicts: NEWS.md base/deprecated.jl

timholy added a commit that referenced this pull request Dec 31, 2014

Merge pull request #9157 from amartgon/base64_decode_#5656

3109867

adds base64 decoding (fixes #5656)

timholy merged commit 3109867 into JuliaLang:master Dec 31, 2014

stevengj reviewed Dec 31, 2014
View reviewed changes

amartgon deleted the base64_decode_#5656 branch January 20, 2015 17:08

stevengj reviewed May 19, 2015
View reviewed changes

stevengj added a commit to stevengj/julia that referenced this pull request May 19, 2015

fix base64decode to return array of decoded bytes, not a string (see J…

a4ba697

…uliaLang#9157)

stevengj mentioned this pull request May 19, 2015

fix base64decode to return array of decoded bytes #11349

Merged

mbauman pushed a commit to mbauman/julia that referenced this pull request Jun 6, 2015

fix base64decode to return array of decoded bytes, not a string (see J…

244d4bf

…uliaLang#9157)

tkelman pushed a commit to tkelman/julia that referenced this pull request Jun 6, 2015

fix base64decode to return array of decoded bytes, not a string (see J…

cf21209

…uliaLang#9157)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adds base64 decoding (fixes #5656) #9157

adds base64 decoding (fixes #5656) #9157

amartgon commented Nov 26, 2014

nalimilan Nov 26, 2014

timholy commented Nov 26, 2014

ivarne Nov 26, 2014

amartgon commented Nov 26, 2014

ivarne Nov 26, 2014

timholy commented Nov 26, 2014

amartgon commented Nov 29, 2014

timholy Dec 14, 2014

timholy commented Dec 14, 2014

amartgon commented Dec 15, 2014

amartgon commented Dec 26, 2014

ivarne commented Dec 26, 2014

timholy commented Dec 31, 2014

stevengj Dec 31, 2014

ScottPJones commented May 2, 2015

tkelman commented May 3, 2015

ScottPJones commented May 3, 2015

tkelman commented May 3, 2015

ScottPJones commented May 3, 2015

timholy commented May 3, 2015

stevengj May 19, 2015

ScottPJones May 19, 2015

adds base64 decoding (fixes #5656) #9157

adds base64 decoding (fixes #5656) #9157

Conversation

amartgon commented Nov 26, 2014

nalimilan Nov 26, 2014

Choose a reason for hiding this comment

timholy commented Nov 26, 2014

ivarne Nov 26, 2014

Choose a reason for hiding this comment

amartgon commented Nov 26, 2014

ivarne Nov 26, 2014

Choose a reason for hiding this comment

timholy commented Nov 26, 2014

amartgon commented Nov 29, 2014

timholy Dec 14, 2014

Choose a reason for hiding this comment

timholy commented Dec 14, 2014

amartgon commented Dec 15, 2014

amartgon commented Dec 26, 2014

ivarne commented Dec 26, 2014

timholy commented Dec 31, 2014

stevengj Dec 31, 2014

Choose a reason for hiding this comment

ScottPJones commented May 2, 2015

tkelman commented May 3, 2015

ScottPJones commented May 3, 2015

tkelman commented May 3, 2015

ScottPJones commented May 3, 2015

timholy commented May 3, 2015

stevengj May 19, 2015

Choose a reason for hiding this comment

ScottPJones May 19, 2015

Choose a reason for hiding this comment