adds base64 decoding (fixes #5656) #9157
#############################################################################

# Based on code by Stefan Karpinski from https://github.com/hackerschool/WebSockets.jl (distributed under the same MIT license as Julia)

const b64chars = ['A':'Z','a':'z','0':'9','+','/']

const revb64chars = Dict('A'=> 0, 'B'=> 1, 'C'=> 2, 'D'=> 3, 'E'=> 4, 'F'=> 5, 'G'=> 6, 'H'=> 7, 'I'=> 8, 'J'=> 9,
I've not really looked at the code, but wouldn't using an array be much more efficient here? You could convert the character to an `Int` and directly index into the array, with a few `if` conditions to transform the codepoint of the character into a position in the array (i.e. one condition for upper-case letters, one for lower-case letters, one for digits, and one for + and /).
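The range-check idea from this comment could look roughly like the sketch below. It is written for current Julia rather than the version this PR targeted, and `b64value` is a hypothetical name that does not appear in the PR. The returned values agree with the entries of `revb64chars`.

```julia
# Hypothetical helper (not from the PR): map one base64 character to its
# 6-bit value with range checks instead of a Dict lookup.
function b64value(c::Char)
    if 'A' <= c <= 'Z'
        return UInt8(c - 'A')          # 0..25
    elseif 'a' <= c <= 'z'
        return UInt8(c - 'a' + 26)     # 26..51
    elseif '0' <= c <= '9'
        return UInt8(c - '0' + 52)     # 52..61
    elseif c == '+'
        return 0x3e                    # 62
    elseif c == '/'
        return 0x3f                    # 63
    else
        throw(ArgumentError("invalid base64 character: $(repr(c))"))
    end
end
```

Whether this beats a table lookup depends on the input and the compiler, so it would need benchmarking before replacing the `Dict`.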
There's an implementation in the Codecs.jl package. Out of curiosity, why is it necessary to have this in base given that Codecs exists?
'e'=> 30, 'f'=> 31, 'g'=> 32, 'h'=> 33, 'i'=> 34, 'j'=> 35, 'k'=> 36, 'l'=> 37, 'm'=> 38, 'n'=> 39,
'o'=> 40, 'p'=> 41, 'q'=> 42, 'r'=> 43, 's'=> 44, 't'=> 45, 'u'=> 46, 'v'=> 47, 'w'=> 48, 'x'=> 49,
'y'=> 50, 'z'=> 51, '0'=> 52, '1'=> 53, '2'=> 54, '3'=> 55, '4'=> 56, '5'=> 57, '6'=> 58, '7'=> 59,
'8'=> 60, '9'=> 61, '+'=> 62, '/'=> 63)
I think you'll get better performance if you use an array and the integer value of the character code to look up the values. (Be sure to use an unchecked conversion from `Char` to `Int`, so that you only check for valid range once.)
EDIT: As the values are sequential, you might get better performance using @nalimilan's trick with `if 'A' <= x <= 'Z' y = x - 'A'`.
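A minimal sketch of the array-lookup variant, assuming the input is handled as raw bytes in current Julia. The names `B64CHARS`, `REVTABLE`, `INVALID` and `b64lookup` are illustrative and not taken from the PR. Because the table has 256 entries, every `UInt8` is a valid index, so validity is checked only once, against the sentinel.

```julia
# The 64-character base64 alphabet, in value order.
const B64CHARS = ['A':'Z'; 'a':'z'; '0':'9'; '+'; '/']

# Sentinel marking bytes that are not part of the alphabet.
const INVALID = 0xff

# Reverse table: byte value + 1 -> 6-bit value, or INVALID.
const REVTABLE = fill(INVALID, 256)
for (i, c) in enumerate(B64CHARS)
    REVTABLE[UInt8(c) + 1] = UInt8(i - 1)
end

# Hypothetical helper: look up one encoded byte.
function b64lookup(byte::UInt8)
    # Any UInt8 is within 1..256 after the +1 shift, so the index is safe.
    @inbounds v = REVTABLE[Int(byte) + 1]
    v == INVALID && throw(ArgumentError("invalid base64 byte: $byte"))
    return v
end
```

The table costs 256 bytes of memory and replaces both the hashing done by `Dict` and the chain of comparisons in the range-check approach.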
On 26/11/2014 10:05, Tim Holy wrote:
I think the issue was created because Multimedia module (in base) uses
# someday reads?) into base64 encoded (decoded) data send to a stream.
# (You must close the pipe to complete the encode, separate from
# closing the target stream). We also have a function base64(f,
# Base64EncodePipe is a pipe-like IO object, which converts into base64 data sent
You have trailing whitespace on this line (and a few other lines), and it makes our CI fail (on purpose, because many developers see trailing whitespace as a sign of poor code quality).
If you type `mv .git/hooks/pre-commit.sample .git/hooks/pre-commit` at the root of your repository, you'll get a warning when trying to commit trailing whitespace in that local repository.
Sounds reasonable, but I bet the implementation in Codecs solves the performance issues noticed here.
Hi again.
end
@inbounds u = revb64chars[encvec[1]]
@inbounds v = revb64chars[encvec[2]]
decvec = [(u << 2) | (v >> 4)]
Does this mean that a whole array needs to be allocated for every 4 bytes? That seems unlikely to have good performance.
Sorry this sat so long. I restarted the failed Travis job, we'll see if it has more luck this time.
After this modification, the arrays are created only once and reused. According to my (very basic and incomplete) performance tests, the code is now about 40% faster than the previous version.
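To illustrate the buffer-reuse idea, here is a rough sketch in current Julia, not the code from the PR. The names `sextet` and `decode_into!` are hypothetical, and the input is assumed to be padded base64 whose length is a multiple of 4. Each 4-byte group is written into one caller-supplied output vector instead of allocating a fresh array such as `decvec` per group.

```julia
# Hypothetical helper: 6-bit value of one encoded byte.
function sextet(b::UInt8)
    UInt8('A') <= b <= UInt8('Z') && return b - UInt8('A')
    UInt8('a') <= b <= UInt8('z') && return b - UInt8('a') + 0x1a
    UInt8('0') <= b <= UInt8('9') && return b - UInt8('0') + 0x34
    b == UInt8('+') && return 0x3e
    b == UInt8('/') && return 0x3f
    throw(ArgumentError("invalid base64 byte: $b"))
end

# Decode `enc` into `out`, reusing the storage of `out` across calls.
function decode_into!(out::Vector{UInt8}, enc::AbstractVector{UInt8})
    length(enc) % 4 == 0 ||
        throw(ArgumentError("input length must be a multiple of 4"))
    empty!(out)                      # drop old contents, keep the buffer
    for i in 1:4:length(enc)
        u = sextet(enc[i])
        v = sextet(enc[i + 1])
        push!(out, (u << 2) | (v >> 4))
        enc[i + 2] == UInt8('=') && break   # "xx==": one byte in this group
        w = sextet(enc[i + 2])
        push!(out, (v << 4) | (w >> 2))
        enc[i + 3] == UInt8('=') && break   # "xxx=": two bytes in this group
        z = sextet(enc[i + 3])
        push!(out, (w << 6) | z)
    end
    return out
end

buf = UInt8[]
decode_into!(buf, codeunits("aGVsbG8="))
println(String(copy(buf)))           # decoded text of the first input
decode_into!(buf, codeunits("aGk=")) # same buffer, second input
println(String(copy(buf)))
```

The 40% figure quoted above comes from the author's own tests. This sketch makes no performance claim of its own.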
Conflicts: NEWS.md base/deprecated.jl
Hi! Any idea why the AppVeyor build failed? Is it related to the changes in the base64 module?
Hard to say. Usually Julia only changes that causes a crash is considered a bug, but as you use
adds base64 decoding (fixes #5656)
Sorry it took so long to merge this. Great stuff, many thanks!
@@ -246,6 +246,10 @@ const Uint128 = UInt128
@deprecate iround(x) round(Integer,x)
@deprecate iround{T}(::Type{T},x) round(T,x)

export Base64Pipe, base64
const Base64Pipe = Base64EncodePipe
const base64 = base64encode
Shouldn't this be a `@deprecate base64 ...`, not a `const`? That way people will get a warning when they call `base64`.
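A small sketch of the difference, assuming current Julia, where `base64encode` lives in the `Base64` standard library rather than in Base. A `const` alias forwards silently, whereas `@deprecate` defines a forwarding method that can also emit a warning. The trailing `false` only tells the macro not to export the old name in this example.

```julia
using Base64  # stdlib that provides base64encode in current Julia

# Forward the old name to the new one. When Julia is started with
# --depwarn=yes, calling `base64` also prints a deprecation warning.
@deprecate base64 base64encode false

# The old name still works and returns the same result as base64encode.
println(base64("hi"))
```

Recent Julia versions keep deprecation warnings off by default, so the warning appears only when they are enabled explicitly.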
Would there be any interest in adding RFC4648 compliant "URL and Filename safe" base64 encoding to this (i.e. base64url)? This is a fairly common use case...
Not sure base64 encoding is really fundamental enough to need in base at all. If Codecs.jl has some existing base64 encoding functionality, that would be a good place to extend it with additional variants.
@tkelman Frankly, with all the other stuff in Base, (multimedia.jl?), and with base64 already in Base, that's where I thought it would have to be... I'll look at Codecs.jl [but to be honest... how many people looking for base64 encoding are going to think about codecs? ;-) ] Thanks!
Package discoverability is a problem. There are multiple issues open on that. base64 is in base because no one has tried to remove
@tkelman I agree totally... I think there's way too much stuff there as it is! ;-) I'm totally fine with adding RFC4648 base64url to Codecs.jl, if people think it would be useful (besides me, who's already using it in my own code), and that is the right place.
Agreed with the package idea.
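For reference, the base64url alphabet in RFC 4648 differs from standard base64 only in its last two characters: `-` replaces `+` and `_` replaces `/`. A minimal sketch of the encoder discussed in this thread, built on the `Base64` standard library of current Julia, is shown below. `base64url_encode` and its `pad` keyword are hypothetical and are not part of Base or Codecs.jl.

```julia
using Base64

# Hypothetical helper: encode with the URL- and filename-safe alphabet.
function base64url_encode(data; pad::Bool=true)
    s = base64encode(data)
    s = replace(s, '+' => '-')   # value 62
    s = replace(s, '/' => '_')   # value 63
    # Drop the '=' padding when the caller asks for the unpadded form.
    return pad ? s : String(rstrip(s, '='))
end

println(base64encode(UInt8[0xfb, 0xff]))
println(base64url_encode(UInt8[0xfb, 0xff]))
println(base64url_encode(UInt8[0xfb, 0xff]; pad=false))
```

Post-processing the standard encoding keeps the sketch short. A dedicated implementation would more likely switch the alphabet table instead of rewriting the output.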
# Decodes a base64-encoded string
function base64decode(s)
b = IOBuffer(s)
decoded = readall(Base64DecodePipe(b))
I think this is wrong. It should be `readbytes`, not `readall`. The Base64-encoded data need not produce a valid UTF-8 string.
Yes, that's definitely wrong... esp. since most uses of base64 encoding are to store binary data, not Unicode text...
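A short sketch of why the byte-returning form matters, written against current Julia, where `read(io)` returns a `Vector{UInt8}` and takes the place of the `readbytes` call suggested here. `decode_bytes` is a hypothetical name. The input `"/w=="` encodes the single byte `0xff`, which is not valid UTF-8 on its own.

```julia
using Base64

# Hypothetical byte-returning decoder in the shape of the PR's function.
function decode_bytes(s)
    b = IOBuffer(s)
    return read(Base64DecodePipe(b))   # raw bytes, no String conversion
end

bytes = decode_bytes("/w==")
println(bytes)
println(isvalid(String, bytes))        # checks whether the bytes are UTF-8
```

Returning a string would force binary payloads through a text type, while callers who do want text can still convert the bytes themselves.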
fixes #5656