Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs::read_dir() and ReadDir need clear warnings about not relying on iteration order #63183

Closed
abonander opened this issue Aug 1, 2019 · 16 comments
Assignees
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.

Comments

@abonander
Copy link
Contributor

abonander commented Aug 1, 2019

The docs for fs::read_dir() and fs::ReadDir need to clearly state that iteration order is implementation defined and may vary not just between OS's but between identical versions of a directory on different filesystems.

Currently the warning on fs::read_dir() is uselessly vague:

Platform-specific behavior

This function currently corresponds to the opendir function on Unix and the FindFirstFile function on Windows. Note that, this may change in the future.

Meanwhile even this warning is completely missing from ReadDir.

Finding the semantics on ordering requires going through the docs for the target platform:

  • opendir() does not specify order but links to readdir() which states:

The order in which filenames are read by successive calls to
readdir() depends on the filesystem implementation; it is unlikely
that the names will be sorted in any fashion.

  • FindFirstFileA does not specify order but links to FindNextFile which states:

The order in which the search returns the files, such as alphabetical order, is not guaranteed, and is dependent on the file system. If the data must be sorted, the application must do the ordering after obtaining all the results.

The order in which this function returns the file names is dependent on the file system type. With the NTFS file system and CDFS file systems, the names are usually returned in alphabetical order. With FAT file systems, the names are usually returned in the order the files were written to the disk, which may or may not be in alphabetical order. However, as stated previously, these behaviors are not guaranteed.

In both cases the relevant information is two links deep which isn't really acceptable for stdlib documentation.

I want to note that I just spent 20 minutes trying to figure this out from a unit test that was dependent on this ordering and was passing locally but failing on our CI server even though both machines are running Linux x64. The problem was I'm running btrfs (which apparently returns files in lexicographical order, or just happened to return them that way) while I'm not entirely sure what filesystem the CI server is using (build jobs are running inside Docker anyway). Either way it turns out they iterated identical copies of the same directory in different orders. Frustratingly, a very similar test also depending on the ordering of read_dir() but for a different directory was passing on both machines, which lead me to initially discount that it might have been an issue of iteration order.

This issue has been assigned to @ali-raheem via this comment.

@abonander abonander changed the title Note on fs::ReadDir about iteration order fs::read_dir() and ReadDir need clear warnings about not relying on iteration order Aug 1, 2019
@abonander
Copy link
Contributor Author

abonander commented Aug 1, 2019

The good news is that this would be a good issue for a beginner since it should just be touching docs. An example for getting entries sorted by path would be nice too. I'll gladly mentor on this if no one else is available.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 1, 2019
@GuillaumeGomez GuillaumeGomez added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Aug 1, 2019
@abonander
Copy link
Contributor Author

@GuillaumeGomez I can mentor on this if you'd like to mark it E-Mentor

@GuillaumeGomez
Copy link
Member

Oh nice! Adding right away.

@GuillaumeGomez GuillaumeGomez added the E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. label Aug 2, 2019
@ali-raheem
Copy link
Contributor

@rustbot claim

I'd like to try dealing with this, it will be my first issue :)

@rustbot rustbot self-assigned this Aug 2, 2019
@ali-raheem
Copy link
Contributor

ali-raheem commented Aug 2, 2019

@abonander

master...ali-raheem:rust-lang#63183

Is this the kind of change you were hoping for?

The changes are to ReadDir and read_dir in src/libstd/fs.rs

@abonander
Copy link
Contributor Author

@ali-raheem I'd like the message to have its own header if possible, like:

#### Note: Iteration Order is Implementation-Defined

I'd also like to maybe see an example of collecting the DirEntry::path() values to a Vec which is then sorted; it should be as simple as calling .sort() on the vec as PathBuf implements Ord. You will have to handle the io::Results returned from the iterator but that's not too hard as you can collect to an io::Result<Vec<PathBuf>> and then handle the result outside the iterator chain.

@ali-raheem
Copy link
Contributor

ali-raheem commented Aug 5, 2019

use std::fs;
use std::io;
use std::path::{Path, PathBuf};

fn main() {
    let path = Path::new(".");
    sorted_read_dir(&path);
}

fn sorted_read_dir(path: &Path) -> io::Result<()> {
    let dir = fs::read_dir(path)?;
    let mut entries: Vec<PathBuf> = dir
        .filter(Result::is_ok)
        .map(|e| e.unwrap().path())
        .collect();
    entries.sort();
    for entry in entries {
        println!("{:?}", entry);
    }
    Ok(())
}

Hows this for the example?

@abonander
Copy link
Contributor Author

abonander commented Aug 5, 2019

@ali-raheem I don't think they want examples showing throwing away io::Errors. You can do this instead:

let mut entries = dir
    .map(|res| res.map(|e| e.path()))
    // this will return the iterator collected to a `Vec` or the first error
    .collect::<Result<Vec<_>>>()?;

println!("Entries before sorting (may or may not be sorted already): {:?}", entries);

entries.sort();

println!("Entries after sorting: {:?}", entries);

@ali-raheem
Copy link
Contributor

@abonander

Okay, I've changed the example, there are two examples for read_dir now including this one.

I've added the warning to ReadDir, and read_dir (and added a slightly more verbose note).

https://github.com/rust-lang/rust/compare/master...ali-raheem:rust-lang%2363183?expand=1

Hows this looking?

@abonander
Copy link
Contributor Author

abonander commented Aug 5, 2019

@ali-raheem Looks good, though you don't need a separate function from main as you can just return an io::Result<()> from that. If you start your code block with ```rust,no_run then it will be compile-checked but not run so the directory you use doesn't have to exist. fs::read_dir() also can take a &str so you don't need to convert to Path first:

use std::{fs, io};

fn main() -> io::Result<()> { 
    // The order read_dir returns entries is not guaranteed. If reproducible
    // ordering is required the entries should be explicitly sorted.
    let mut entries = fs::read_dir(".")?
        .map(|res| res.map(|e| e.path()))
        .collect::<Result<Vec<_>, io::Error>>()?;

    println!(
        "Entries before sorting (may or may not be sorted already): {:?}",
        entries
    );
 
    entries.sort();
 
    println!("Entries after sorting: {:?}", entries);
    Ok(())
}

I would place this example as the second one, though.

I would also put the same message under the header on ReadDir at line 118 as you have on fs::read_dir() lines 1966-1967.

Finally, it's "reproducible" and not "reproducable" but I can never remember which one it is either, the Firefox spell-checker had to tell me.

@ali-raheem
Copy link
Contributor

ali-raheem commented Aug 7, 2019

@abonander Great! Here it is all together (and spell checked -- sorry!)

https://github.com/rust-lang/rust/compare/master...ali-raheem:issue%2363183?expand=1

Let me know if you're happy for me to put in a pull request.

@abonander
Copy link
Contributor Author

@ali-raheem looks good to me now, good work!

@TheOregonTrail
Copy link
Contributor

I think this can get closed since it seems to have been resolved. Thanks for help @ali-raheem and @abonander for opening the issue

@abonander
Copy link
Contributor Author

@luigisHat this shouldn't be closed until #63356 is merged.

@TheOregonTrail
Copy link
Contributor

Ah I see still getting used to how all this works. Thanks for letting me know @abonander

Centril added a commit to Centril/rust that referenced this issue Sep 23, 2019
Issue#63183: Add fs::read_dir() and ReadDir warning about iterator order + example

As per rust-lang#63183

Add warning about iterator order to read_dir and ReadDir, add example of explicitly ordering direntrys.
Centril added a commit to Centril/rust that referenced this issue Sep 24, 2019
Issue#63183: Add fs::read_dir() and ReadDir warning about iterator order + example

As per rust-lang#63183

Add warning about iterator order to read_dir and ReadDir, add example of explicitly ordering direntrys.
Centril added a commit to Centril/rust that referenced this issue Sep 24, 2019
Issue#63183: Add fs::read_dir() and ReadDir warning about iterator order + example

As per rust-lang#63183

Add warning about iterator order to read_dir and ReadDir, add example of explicitly ordering direntrys.
@dunnock
Copy link

dunnock commented Oct 20, 2019

@abonander this seems was merged to master, should it be closed now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.
Projects
None yet
Development

No branches or pull requests

7 participants