Make std::fs::initial_buffer_size a method, rename it and make it public #85084
Comments
@rustbot label C-feature-request |
Instead the |
If I'm guessing correctly here, is https://github.com/rust-lang/rust/blob/master/library/std/src/io/mod.rs#L733 the correct method to be changed? At that point we already have a buffer being passed to us so we would have to modify the existing one. |
Well, it creates a bit more assembly, sure. But since you start with an empty string it would still boil down to a single allocation and no byte-movement. The code for reallocation would be there but it wouldn't actually run. If the goal is to avoid the incremental growth of the string then that can be achieved with |
Unfortunately it does seem to boil down to a bit more than

    use criterion::{criterion_group, criterion_main, Criterion};

    pub fn new_string1() -> String {
        String::with_capacity(10)
    }

    pub fn new_string2() -> String {
        let mut string = String::new();
        string.reserve_exact(10);
        string
    }

    fn criterion_benchmark(c: &mut Criterion) {
        c.bench_function("new_string1", |b| b.iter(|| new_string1()));
        c.bench_function("new_string2", |b| b.iter(|| new_string2()));
    }

    criterion_group!(benches, criterion_benchmark);
    criterion_main!(benches);

new_string1 time: [9.5486 ns 9.5614 ns 9.5740 ns]
change: [-1.3351% -1.1752% -1.0227%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low mild
2 (2.00%) high mild
4 (4.00%) high severe
new_string2 time: [14.134 ns 14.157 ns 14.185 ns]
change: [+0.1010% +0.3440% +0.5897%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low severe
4 (4.00%) low mild
2 (2.00%) high severe
I ran it three times and |
That's for allocating and deallocating 10 bytes in a loop which is likely served from a thread-local allocation pool. I don't think that's realistic for the scenario given in OP since it omits the file IO and most likely involves much larger strings which will be served either from global pools or via yet another syscall to |
@the8472 can you make a diff demonstrating exactly how you imagine the new method would look? |
Looking through impl Read for File it implements neither Also, |
I'm not sure I have the required expertise to implement what you proposed. |
That is quite a different use-case from Many uses of In those cases existing metadata (if present) plus the seek position of the file would be more appropriate to get a size hint.
|
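A minimal sketch of such a size hint, assuming the length comes from the file's metadata and the current position from `Seek::stream_position`. The name `remaining_size_hint` is hypothetical and not part of std:

```rust
use std::fs::File;
use std::io::{self, Seek};

// Hypothetical helper: estimates how many bytes are left to read from the
// current seek position. This is only a hint; special files may report a
// length of 0, and the file can change between this call and the read.
fn remaining_size_hint(file: &mut File) -> io::Result<usize> {
    let len = file.metadata()?.len();
    let pos = file.stream_position()?;
    // saturating_sub guards against a position past the reported end.
    Ok(len.saturating_sub(pos) as usize)
}
```

A caller could pass the result to `String::with_capacity` or `Vec::with_capacity` before reading the remainder of the file.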
I agree. It's better to just let users allocate by themselves. Thanks for your input. |
Just FYI, it seems this is now happening! #89582 |
There are cases where using std::fs::read_to_string or std::fs::read isn't the best thing to do, or is really cumbersome, because your code is already shaped to create its own String and read a file into that. For instance, imagine you have a file that you opened a while ago, did some processing on it like reading the metadata, and now you decide to read it all into a String. std::fs::read_to_string would obviously be a very bad choice because it would open the file again. Instead it would be better to use std::io::Read::read_to_string, which lets me read the content of an already existing File into a String. But then I wouldn't have the benefit that std::fs::read_to_string provides: an optimally allocated String.

In that case I would really like to use the private std::fs::initial_buffer_size function that std already provides and uses internally in std::fs::read_to_string and std::fs::read, so that I can efficiently preallocate my very own String in order to be as efficient as std::fs::read_to_string would be.

The biggest reason that ultimately drove me to open this issue is the fact that std::fs::read_to_string doesn't let me keep the file that it opened. There might even be people who would open the file twice to solve this. That is obviously very bad and leads to inefficiencies.

So, in summary, there are definitely cases where you want to do this:
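A minimal sketch of that pattern, assuming a file that is already open. The helper name `read_open_file` is hypothetical and only used for illustration:

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical helper: reads the rest of an already-open file into a new String.
fn read_open_file(file: &mut File) -> io::Result<String> {
    // String::new() starts with zero capacity, so unless the Read
    // implementation reserves space itself, the buffer grows while reading.
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
```

The caller keeps ownership of the `File`, so it can still be used (for example, to query metadata) after the read.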
The problem with this is that it's not efficient, because the String is not preallocated with the file's size. But as I mentioned, std already provides a function that computes just that, the optimal size; we just can't use it because it's private.

All in all, what I'm saying is that initial_buffer_size being private leads to inefficiency, inconvenience, restrictiveness and possibly boilerplate code, because you might just copy-paste initial_buffer_size into your code to be able to preallocate efficiently. And with that, it can be error-prone as well.

I propose to:
- Make initial_buffer_size a method on fs::File taking &self, because currently it takes &fs::File as its argument and I think it's just nicer like that.
- Rename it to optimal_buffer_size, because I feel like that's going to be a bit clearer. Or get_optimal_buffer_size, get_optimal_buffer_capacity, or optimal_buffer_capacity? I'm not 100% sure which would be the most correct or appropriate here, so I appreciate suggestions.
- Make it public.

In those names, buffer refers to whatever structure you want to optimally allocate, not just String of course.

If this is accepted, I would be willing to work on it.
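A rough user-side sketch of what such a method would enable, assuming the size is taken from the file's metadata plus one byte of headroom. Both function names are hypothetical and not std API; the free function stands in for the proposed method:

```rust
use std::fs::File;
use std::io::{self, Read};

// Hypothetical stand-in for the proposed method (assumption: size comes
// from metadata). Falls back to 0 if the metadata is unavailable.
fn optimal_buffer_size(file: &File) -> usize {
    // One extra byte so the buffer need not grow before the final read
    // that detects end of file.
    file.metadata()
        .map(|m| (m.len() as usize).saturating_add(1))
        .unwrap_or(0)
}

// Reads an already-open file into a String that is preallocated up front.
fn read_preallocated(file: &mut File) -> io::Result<String> {
    let mut contents = String::with_capacity(optimal_buffer_size(file));
    file.read_to_string(&mut contents)?;
    Ok(contents)
}
```

This keeps the file open for later use while avoiding incremental growth of the String for regular files whose size is reported correctly.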