-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add isvalid(Type, value) methods, to replace is_valid_* #11241
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It strikes me that we could check for valid ASCII much faster than calling
byte_string_classify
. Something like:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm quite aware that there are many more performance enhancements that can be done, all over the string and character handling code... however, I was told numerous times to keep PRs to single issues... I'd already planned on improving that, as soon as this and #11004 are merged in...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this doesn't have to be in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I don't think you want to mix two or more major changes in the same PR. However, if in the course of a PR cleaning up some issue, you notice a minor (few line) improvement somewhere in the same functions, it's not usually a problem to combine that into the same PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, as you well know by now, I'm still learning my way around the preferred way of contributing here! I was planning on making another round of performance optimizations, once this and #11004 are merged (hopefully!) in... using some of the stuff I've learned in the last 3 weeks...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please calm down!
I wasn't threatening ceasing contributing - I said that I wouldn't have as much time, which is simply the facts.
I also did NOT say that people haven't been responsive in general to me... and in this very thread I thanked Tony and you for the thorough review. I have been nothing but appreciative, in public and private, for the time people have spent giving constructive criticism, and have tried to respond as quickly as possible (and in fact, lost a lot of time due to Julia bugs, after having followed someone's suggestion to extend
convert
instead of having separateutfx_to_utfy
functions)Except for that issue, the code for #11004 hasn't changed in over a week.
I have been donating a lot of my time as well, trying to fix bugs in Julia, and solve some rather severe performance issues (whether or not you believe that string handling performance is important or not, is another issue).
I was just describing my situation... simply the facts... the client already gave up a week and a half ago on using Julia for a part of the project, because of issues with ODBC.jl being broken due to tupocalypse, string handling performance, and lack of decimal floating point support (even with DecFP.jl, that's very new, and is not integrated into JSON.jl nor ODBC.jl).
I've got a conference call first thing tomorrow morning, and I'm just worried that they will decide not to go ahead with using Julia for other parts of the project.
If they do (which is not up to me at all), then my involvement for now trying to improve Julia will necessarily drop down to what I can do in my spare time. That's NOT an ultimatum, just the sad situation I'm in... and I've been doing everything in my power to prevent that happening.
I'd much prefer to do nothing but help make Julia the No. 1 language for string/database processing, than be programming in C++ and Python...
About the technical note: by adding a new method to
convert
, would that be able to override all the code that uses the code in utf8.jl, utf16.jl, and utf32.jl for conversions?I didn't realize that was possible - or even if possible, recommended (I've heard a lot about "monkey-patching" around here... a term I'd never come across before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the technical note: yes and no.
utf16.jl
currently definesconvert(::Type{UTF16String}, s::AbstractString)
. If you define a more specialized/optimized conversion routineBase.convert(::Type{UTF16String}, s::UTF8String)
, for example, and a function in Base subsequently calls e.g.utf16(....some UTF8String....)
, then it will dispatch to your method. The exception would be if that function in Base has already been compiled and inlined the old conversion call prior to your new definition. This might have happened, for example, on Windows for a Win32 filesystem call that was already made while booting Julia, but of course the argument string conversion is negligible anyway for a filesystem call.Adding new methods for more specific argument signatures is not the same thing as "monkey patching" in the Python sense. It's sort of analogous to adding a new subtype of an existing type and passing it to an existing function that now dispatches to the new subtype's method. Except that in Julia you don't have to add a new type to add new methods.
I wish you would stop with the comments to the effect that you are the only one who cares about string-handling performance. As I said in #11004, everyone cares, but we also realize that there are tradeoffs to code bloat for the sake of performance. Sometimes there is no alternative, but some data and effort is required to make that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never said that I was the only one who cares about string-handling performance, a number of people have already told me that they were very happy to see me trying to do something about it...
My point has been all along that a lot of people do care about the string-handling performance, which is why some attention should be paid to it... (I've also said, a number of times, that current poor performance wasn't because the Julia team did a bad job, but rather, it just hasn't been the focus of their time... the places where they have spent their time, is pretty darned remarkable, IMO).
I'm well aware of said trade-offs, I've spent quite a long time developing and maintaining a quite large code base... if you (or anybody) can point me to a better way of writing those conversion functions, so that Julia takes care of spitting out the specialized cases, that doesn't kill the performance gains, then I'm all for it. #11004 was my very first Julia code that wasn't just wrappers, and I've already got some ideas about how I might be able to condense it (although the bug in Julia with inference might have to get fixed first, and I have no idea [yet] how to accomplish that). The code is very simple, and has more inline comments than what was there before... so I don't understand the worries about maintainability.
About adding more specialized methods, is there any way of telling just where the old code might still be getting used? There is so much in Base, that it seems like quite a lot might already be going directly to the old code, no matter what I do, not just some Win32 calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just deprecating them? That should print a backtrace for every place the function is called from. Not sure about the interaction with precompilation, though, so you may want (re)move the sysimage before starting Julia.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just profile your code. You don't care if the old (slower) methods are being used unless it is performance critical, so you only care about the routines where it is spending a lot of time. I'm skeptical that you will notice an issue, because Base only uses UTF-16 for calling Windows routines.
@nalimilan, deprecating a posteriori them won't help won't help for methods that have already been compiled.