-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ToString
implementation for integers
#136264
Open
GuillaumeGomez
wants to merge
2
commits into
rust-lang:master
Choose a base branch
from
GuillaumeGomez:optimize-integers-to-string
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+73
−15
Open
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it looks like there are two separate places we now have size and non-size optimized code for printing integers (cfgs in core/src/fmt/num.rs, and here). Could we perhaps unify on just one place where the full set of code lives?
Part of why I'm asking is that it seems like there is some amount of strange choices (IMO):
0u64.to_string()
will give me a String with capacity for at least 20 bytes)So for the byte types (u8/i8) there's actually 4 separate pieces of code that we are maintaining:
{}
formatting){}
formatting).to_string()
).to_string()
) -- defers now to core::fmt::numPlus, IIUC the signed core::fmt::num impl is now only reachable via Display, never via
.to_string()
, which also seems like an odd decision.I also don't see much in the way of rationale for why we are making certain tradeoffs (e.g., why single-byte types are special cased here, but not for Display). Maybe we can file a tracking issue of some kind and layout a plan for what we're envisioning the end state to be? The individual changes here are I guess fine, but it doesn't seem like we're moving towards a specific vision, rather tweaking to optimize a particular metric.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want it to be part of this PR or as follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally rather see a plan and cleanup work followed by "random" changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the plan is: making integer to string conversion faster. There were a few problems when I started working on this:
i8
/u8
), but since we were casting all integers intou64
before converting to string, this optimization was missed.ToString
implementation uses the same code, which relies onFormatter
, meaning that all theFormatter
code (checking the internal flags in short) was still run even though it was never actually used.The points 1. and 2. were fixed in #128204. This PR is fixing the last one.
Now about the
optimize_for_size
feature usage: considering these optimizations require specialized code, it also means that it needs a lot more code. And because the code to convert integers to string isn't the same depending on whether or not theoptimize_for_size
feature is enabled, I can't have the same code because the internal API changes.Does it answer your question? Don't hesitate if something isn't clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What benchmarks are we using to evaluate "make it faster"? I don't see results in this PR (and e.g. the description explicitly calls out not being sure how to check).
Does the formatter flag checking not get optimized out after inlining? If not, maybe we can improve on that, for example by dispatching early on defaults or similar?
I'm not personally convinced that having N different implementations (can you confirm all of the cases are at least equally covered by tests?) is worth what I'd expect to be marginal improvements (this is where concrete numbers would be useful to justify this), especially when I'd expect that in most cases if you want the fastest possible serialization, ToString is not what you want -- it's pretty unlikely that an owned String containing just the integer is all you need, and the moment you want more than that you're going to be reaching for e.g.
itoa
to avoid heap allocations etc.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted a comparison with and without these changes in this comment. A lot less of assembly code is generated, however, just this difference is not enough to know the impact on performance. I wrote benchmarks but this is not my specialty and all I could get was a 1-2% performance difference (which is already significant, but did I write the benches correctly? That's another question).
No and unless you add a new field to
Formatter
allowing you to know that no field was updated and that the checks should be skipped, I don't see how you could get this optimization.I'm not convinced either but since the
optimize_for_size
feature flag exists, I need to deal with it. As for test coverage, no idea foroptimize_for_size
but the conversions are tested in the "normal" case. But in any case, this code doesn't change the behaviour ofoptimize_for_size
so on this side we're good.Also, I'm not looking for the fastest implementation, I'm looking at improving the current situation which is really suboptimal. We could add a new
write_into<W: Write>(self, &mut W)
method to have something as fast asitoa
, but that's a whole other discussion and I don't plan to start it. My plan ends with this PR. Also to be noted: with this PR, the only remaining difference withitoa
is that we don't allow to write an integer into a String, everything else is the exact same.Anyway, the PR is ready, it has a visible impact on at least the generated assembly which is notably smaller by allowing to skip all
Formatter
code. It doesn't change the behaviour ofoptimize_for_size
and adds a very small amount of code. Having this kind of small optimization in places like integers to string optimization is always very welcome.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide these benchmarks and the raw numbers you produced? Perhaps add them as benches to the code, so they can be run by others, or extend rustc-perf's runtime benchmark suite.
Smaller assembly is (as you say) no real indicator of performance (though is nice) so I'm not sure it really means much by itself.
This PR is still adding implementations that could get called (regardless of optimize_for_size) that didn't exist before it (taking us from 2 to 4 impls IIUC). Can you point concretely at some test coverage for each of the 4 impls (source links)? If not, then we really ought to add it, especially when there's a bunch of specialization involved in dispatch.
I disagree with this assertion. We have to balance the cost of maintenance, and while this code is important, it sounds like we don't actually hit the itoa perf anyway with these changes. It's nice to be a bit faster, but I'm not convinced that a few percent is worth an extra 2 code paths (presuming I counted correctly) in this code, especially with rust-lang/libs-team#546 / #138215 expected to come soon and possibly add 2 more paths (or at least become the "high performance" path).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering how specific this is, not sure it's worth adding to rustc-perf. As for adding them into the codebase, I'll need to ensure that they're correctly written first.
Here is the code I used:
The results are:
I have to admit I'm a bit surprised as I didn't remember the difference to be this big... But in any case, seeing how big the difference is, I wonder if the benches are correctly written (hence why I asked help for it).
Yep, hence why I wrote benches. :)
There is no complete test for this as far as I can see, but some small checks like
tests/ui/traits/to-str.rs
andtest_simple_types
inlibrary/alloc/tests/string.rs
.Might be worth adding one?
I can help with maintaining this code. The plan is not to be as good as
itoa
(which isn't possible anyway with the current API) but to be as good as possible in the current limitations. The improvements seem noticeable and I think are worth it. I also think that doing integers to string conversion is very common, and I even if we provide new APIs to handle them (which would be nice), this code is common enough to make a nice impact in existing codebases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding
#[inline]
to fmt::Display for $unsigned seems to allow removing at least some of the indirection through the formatting infrastructure (down to a direct invocation of_fmt
in the user-written code calling.to_string()
, as I'd expect). It's unavoidable that we still retain the fmt::Formatter in the current version of things, but that would go away if we applied some of the changes in this PR which makes_fmt
take a&mut [...]
rather than the fmt::Formatter.IMO, the complexity this adds, especially without thorough testing being added for the new code (which it sounds like you agree may be missing!), is not something I'm prepared to r+ -- I agree that there is opportunity here, but I don't think this PR is the right shape for it. For the users that do really care about formatting integers at speed, this PR probably does very little, since it retains the inability to inline
fmt
calls on integers. Only when calling.to_string()
is this maybe a win, and I continue to maintain that's just not something you need to be particularly fast at -- the allocator-per-integer will quite possibly dominate the cost there.If you want to find a libs reviewer willing to merge these changes, I'm not going to stop them; feel free to re-roll. Otherwise revising this PR to add the test cases (or adding them separately in a different PR that's just tests) would help build confidence that this change is at least correct (and will continue to be so). But I continue to feel uncomfortable with the duplication this introduces, so I don't think it would be enough for me to approve this (but I consider it minimum necessary for std to accept a change like this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the test in another PR as a first step would be a great change in any case so let's pause this one until tests have been added.