dsync: disable --delete on src walk error #599
Looks good.
Note that I used global variables to detect errors during the walk. I'm open to some other mechanism if that seems important. Other options I can think of are to modify libcircle so that it has some return-value-like mechanism, or to use libcircle reductions.
Implement a test where dsync --delete is run, and the walk of the source tree is caused to fail by making a directory unreadable. dsync currently has a bug which causes it to fail that test, hpc#587. After the source walk fails, dsync continues with the dest walk followed by the copies and the deletes. Because the source walk failed, files or directories that are already on both source and destination, but which were not walked on the source, are missing from the source list. Those files or directories are then deleted from the destination tree. Signed-off-by: Olaf Faaland <[email protected]>
Record the occurrence of errors in mfu_flist_walk_param_paths() and friends. Alter mfu_flist_walk_path() and friends to return an int indicating success (0) or failure. Add (void) to mfu_flist_walk_param_paths() calls where we do not use the return value. In dsync, if there was an error during the walk of the source tree, the source flist generated is likely incomplete. In the case where the source and destination have the same files and directories before dsync --delete is run, an incomplete source flist results in files being incorrectly deleted from the destination. When an error was reported for the source walk, warn the user and disable the --delete option. Continue with the dsync, so that any new files identified are still copied and metadata is synced. See rsync(1) for the --delete option, which takes the same approach. Fixes hpc#587 Signed-off-by: Olaf Faaland <[email protected]>
In places where we disregard the return value of mfu_flist_walk_param_paths(), add (void) as a visual indicator. In these cases we still need to analyze what we would do differently, if anything, with the knowledge that the walk failed. Signed-off-by: Olaf Faaland <[email protected]>
d1b0a66 to dcd6e51
Cleanup at the end of the test had been commented out accidentally. I fixed that in the update to dcd6e51.
int all_rc;
MPI_Allreduce(&walk_rc, &all_rc, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
if (all_rc > 0) {
FYI there is a helper function for this.
Suggested change, replacing:
int all_rc;
MPI_Allreduce(&walk_rc, &all_rc, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
if (all_rc > 0) {
with:
if (!mfu_alltrue(walk_rc == 0, MPI_COMM_WORLD)) {
Woops, reviewed too late :)
Fixes issues identified in the code review of hpc#599 Since WALK_RESULT is an int and can only hold one error code, but any number of errors might be encountered during a walk, there is little value in recording errno. In addition, the value of errno might have changed unless it is captured immediately after the library or system call that failed, which would require deeper code changes. Instead, set to -1 as the indicator of a failure. Signed-off-by: Olaf Faaland <[email protected]>
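The reasoning in this commit message can be illustrated with a minimal sketch. It is not the code from the patch: `sketch_walk_result` and `sketch_count_entries` are hypothetical names standing in for WALK_RESULT and the walk routines, and a single `opendir()` call stands in for the whole walk. The point is only that the flag records that some failure happened, using -1, rather than trying to preserve any particular errno.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical stand-in for the walk-wide status flag:
 * 0 means no errors so far, -1 means at least one error occurred. */
int sketch_walk_result = 0;

/* Count the entries of one directory (hypothetical helper).
 * On failure, flag the walk as failed and return -1. */
long sketch_count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        /* Record only that something failed. errno is deliberately not
         * stored: a single int cannot hold several error codes, and errno
         * may be overwritten by later calls. */
        sketch_walk_result = -1;
        return -1;
    }

    long count = 0;
    while (readdir(dir) != NULL) {
        count++;
    }
    closedir(dir);
    return count;
}
```

In this sketch the flag is sticky: once set to -1 it stays that way, so a later successful directory read does not hide an earlier failure.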
@daltonbohning said:
Thanks for the feedback! I created a new PR, #600, for those fixes.
In dsync, if there was an error during the walk of the source tree, the source flist generated is likely incomplete.
In the case where the source and destination have the same files and directories before dsync --delete is run, an incomplete source flist results in files being incorrectly deleted from the destination.
When an error was detected for the source walk, warn the user and disable the --delete option. Continue with the dsync, so that any new files identified are still copied and metadata is synced.
See rsync(1) for the --delete option, which takes the same approach.
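The behavior described above can be sketched in plain C. This is an illustration under stated assumptions, not the dsync implementation: `struct sketch_opts`, `sketch_any_walk_failed`, and `sketch_guard_delete` are hypothetical names, and the loop over per-rank results is a stand-in for the collective MPI reduction that the real code performs across ranks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for dsync's options; only the field needed here. */
struct sketch_opts {
    bool delete_extraneous; /* models the --delete option */
};

/* Plain-C stand-in for the MPI reduction: returns true if any rank
 * reported a failed walk. Each entry is 0 on success, nonzero on failure. */
bool sketch_any_walk_failed(const int *rank_rc, size_t nranks)
{
    for (size_t i = 0; i < nranks; i++) {
        if (rank_rc[i] != 0) {
            return true;
        }
    }
    return false;
}

/* If the source walk failed while --delete is set, warn and turn deletion
 * off, so an incomplete source list cannot cause destination files to be
 * removed. Returns true when --delete was disabled by this call. */
bool sketch_guard_delete(struct sketch_opts *opts, bool src_walk_failed)
{
    if (src_walk_failed && opts->delete_extraneous) {
        fprintf(stderr,
                "warning: errors walking source; disabling --delete\n");
        opts->delete_extraneous = false;
        return true;
    }
    return false;
}
```

The sync itself is not aborted in this sketch; only deletion is switched off, which matches the stated intent that new files are still copied and metadata is still synced.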
Implement a test where dsync --delete is run, and the walk of the source tree is caused to fail by making a directory unreadable.
Before the patch, the test detects that files on the destination are deleted when they should not be, due to the error during the source walk. After the patch, the test shows the improper deletes do not occur.
Fixes #587