HDDS-12057. Implement command ozone debug replicas verify checksums #7748
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/debug/ReadReplicas.java
File dir = createDirectory(volumeName, bucketName, keyName);
OzoneKeyDetails keyInfoDetails = checksumClient.getKeyDetails(volumeName, bucketName, keyName);
Map<OmKeyLocationInfo, Map<DatanodeDetails, OzoneInputStream>> replicas =
    checksumClient.getKeysEveryReplicas(volumeName, bucketName, keyName);
Not totally related, but RpcClient.getKeysEveryReplicas() doesn't seem to create an input stream that refreshes the container cache upon failure. Could that lead to problems in such a corner case?
Hi @ptlrs
The Replicas.java implementation already exists under the name ReplicasDebug.java as part of the replicas package. Could you please move these files to that package?
Thanks @ptlrs for working on this.
6b649c9 to 53130e9
Thanks for the reviews @jojochuang @adoroszlai @sarvekshayr.
Thank you @ptlrs for updating the patch. Suggested two minor improvements below:
da5bfee to 4c05627
LGTM.
Thanks @ptlrs for improving the patch.
Just left some comments on the CLI construction. See HDDS-12206 for a description of how this should fit together.
4c05627 to d271183
Thanks for working on this @ptlrs. Overall looks good as a port of the existing checks. There is a lot of room for improvement within the existing code of those checks, though, so I'll file a follow-up Jira to redo the read replicas/checksum verification in a more efficient manner.
Usually we don't put log messages in CLI commands:
- Users may not see them depending on how the log4j config is set up.
- It's clunky to control log levels at the command line, and often a full set of levels is not needed.
- Usually just a two-level system, with regular messages plus additional messages controlled by a --verbose flag, will suffice.
There are some older commands using the LOG object from Handler that we should probably fix up later.
I think we can replace IOUtils#close with closeQuietly in this context, since we've already completed verification at this point and failure to close the streams is not actionable by the user.
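To make the two suggestions above concrete, here is a minimal, dependency-free sketch of a two-level output scheme (regular messages plus extra messages behind a verbose switch) and a quiet close. All names here (VerboseOutputSketch, info, detail) are hypothetical and are not taken from the Ozone code base; the closeQuietly helper below is a stand-in written for this sketch, not the IOUtils method itself.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from the Ozone code base. */
final class VerboseOutputSketch {
  private final PrintStream out;
  private final boolean verbose;

  VerboseOutputSketch(PrintStream out, boolean verbose) {
    this.out = out;
    this.verbose = verbose;
  }

  /** Regular message: always written to the user-facing stream. */
  void info(String message) {
    out.println(message);
  }

  /** Additional message: written only when the verbose switch is on. */
  void detail(String message) {
    if (verbose) {
      out.println(message);
    }
  }

  /** Closes every resource and ignores failures the user cannot act on. */
  static void closeQuietly(Iterable<? extends Closeable> resources) {
    for (Closeable resource : resources) {
      try {
        if (resource != null) {
          resource.close();
        }
      } catch (IOException ignored) {
        // Verification has already finished, so there is nothing to report.
      }
    }
  }

  public static void main(String[] args) {
    boolean verbose = Arrays.asList(args).contains("--verbose");
    VerboseOutputSketch cli = new VerboseOutputSketch(System.out, verbose);

    // A stand-in for a replica stream whose close() fails.
    List<Closeable> streams = new ArrayList<>();
    streams.add(() -> { throw new IOException("close failed"); });

    cli.detail("Opened " + streams.size() + " replica stream(s)");
    cli.info("Verification complete");
    closeQuietly(streams);
  }
}
```

Writing through a PrintStream handed to the command, rather than through a logger, keeps the output independent of the log4j configuration, which is the concern raised in the first bullet.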
…/replicas/ReplicasVerify.java Co-authored-by: Ethan Rose <[email protected]>
LGTM. We can improve the implementations of existing checks that were ported over in follow up tasks.
@CommandLine.Option(names = "--checksums",
    description = "Do client side data checksum validation of all replicas.",
    // value will be true only if the "--checksums" option was specified on the CLI
    defaultValue = "false")
private boolean doExecuteChecksums;

@CommandLine.Option(names = "--padding",
    description = "Check for missing padding in erasure coded replicas.",
    defaultValue = "false")
private boolean doExecutePadding;
The command does not do anything useful without one of these two options. They should be in an @ArgGroup(exclusive = false, multiplicity = "1").
Fixed
Thanks @ptlrs for updating the patch, LGTM.
$ ozone debug replicas verify --output-dir /tmp /
Error: Missing required argument(s): ([--checksums] [--padding])
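For readers unfamiliar with the rule being enforced here, the following is a dependency-free sketch of "at least one of --checksums or --padding must be given; both together are allowed". It does not use picocli and is not the code from this PR; the class VerifyOptionsSketch and its parse method are hypothetical, and the error text is simply copied from the output above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the "at least one check must be selected" rule. */
final class VerifyOptionsSketch {
  final boolean checksums;
  final boolean padding;

  private VerifyOptionsSketch(boolean checksums, boolean padding) {
    this.checksums = checksums;
    this.padding = padding;
  }

  /**
   * Accepts either flag or both (non-exclusive), but rejects a command
   * line that names neither, since the command would then do nothing.
   */
  static VerifyOptionsSketch parse(String... args) {
    List<String> argList = Arrays.asList(args);
    boolean checksums = argList.contains("--checksums");
    boolean padding = argList.contains("--padding");
    if (!checksums && !padding) {
      throw new IllegalArgumentException(
          "Missing required argument(s): ([--checksums] [--padding])");
    }
    return new VerifyOptionsSketch(checksums, padding);
  }

  public static void main(String[] args) {
    try {
      VerifyOptionsSketch options = parse(args);
      System.out.println("checksums=" + options.checksums
          + ", padding=" + options.padding);
    } catch (IllegalArgumentException e) {
      System.err.println("Error: " + e.getMessage());
    }
  }
}
```

In the actual command this validation is delegated to the argument group suggested in the review, so no hand-written check like the one in parse is needed there.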
What changes were proposed in this pull request?
This PR:
- ozone debug replicas verify command
- read-replicas to checksums command
- checksums command with the ability to walk the file tree and calculate checksums of all files
- replicas verify
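As a rough illustration of "walk the tree and calculate a checksum for every file", here is a sketch that uses a local directory as a stand-in for the Ozone namespace and CRC32 as a stand-in checksum. Both are assumptions made for the example; the real command walks volumes, buckets and keys through the Ozone client and validates the checksums stored with the data. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.CRC32;

/** Hypothetical sketch: a local directory stands in for the key namespace. */
final class TreeChecksumSketch {

  /** Streams one file through CRC32 without loading it fully into memory. */
  static long crc32(Path file) throws IOException {
    CRC32 crc = new CRC32();
    byte[] buffer = new byte[8192];
    try (InputStream in = Files.newInputStream(file)) {
      int read;
      while ((read = in.read(buffer)) != -1) {
        crc.update(buffer, 0, read);
      }
    }
    return crc.getValue();
  }

  /** Walks the tree under root and returns relative path -> checksum. */
  static Map<Path, Long> checksumTree(Path root) throws IOException {
    List<Path> files;
    try (Stream<Path> paths = Files.walk(root)) {
      files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    Map<Path, Long> result = new TreeMap<>();
    for (Path file : files) {
      result.put(root.relativize(file), crc32(file));
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    checksumTree(root).forEach((path, checksum) ->
        System.out.println(Long.toHexString(checksum) + "  " + path));
  }
}
```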
Note: This is a WIP change. Further enhancements and refactoring will be done and acceptance tests will be updated.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12057
How was this patch tested?
Manual testing in docker-compose environment
CI: https://github.com/ptlrs/ozone/actions/runs/12954171235