HDDS-10338. Implement a Client Datanode API to stream a block #6613
@@ -458,4 +485,77 @@ private static ContainerCommandRequestProto createContainerRequest(
        .setContainerID(containerID).setPipelineID(UUID.randomUUID().toString())
        .build();
  }

  @Test
  public void testReadBlock() throws IOException {
Could you add an end-to-end case that tests reading an empty file with an empty block?
Added testReadEmptyBlock in TestStreamBlockInputStream to test reading an empty block.
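For readers unfamiliar with the contract such a test checks, here is a minimal sketch. It does not use the PR's classes: java.io.ByteArrayInputStream over a zero-length array stands in for a stream over an empty block, and the class name EmptyReadSketch is hypothetical. The assumption is that a block stream follows the usual java.io.InputStream convention of returning -1 at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class EmptyReadSketch {
    /**
     * Reads from an empty stream twice: once byte-by-byte, once into a buffer.
     * Returns both results so a caller can check them.
     */
    public static int[] readEmpty() {
        // Stand-in for a stream over an empty block (assumption, not the PR's class).
        try (InputStream in = new ByteArrayInputStream(new byte[0])) {
            int single = in.read();             // single-byte read at end of stream
            int bulk = in.read(new byte[16]);   // bulk read at end of stream
            return new int[] {single, bulk};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = readEmpty();
        System.out.println("single=" + result[0] + ", bulk=" + result[1]);
    }
}
```

Both reads report end of stream immediately, which is the behaviour an empty-block test would be expected to assert.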
@chungen0126, thanks for the quick patch update. I will try to finish the review this week.
Thanks for the PR @chungen0126. I have a few comments.
private static final int CHUNK_SIZE = 100;
private static final int BYTES_PER_CHECKSUM = 20;
Can we add some tests that cover the cases where BYTES_PER_CHECKSUM is equal to CHUNK_SIZE and where it is greater than CHUNK_SIZE?
I don't think we need to add this test case. Logically, BYTES_PER_CHECKSUM should be smaller than CHUNK_SIZE. Otherwise, a single chunk file cannot be verified.
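To make the three cases in this exchange concrete, here is a sketch of the arithmetic only. It assumes each checksum covers up to bytesPerChecksum bytes of a chunk and that the last one may cover fewer; the class and method names are hypothetical and this is not the Ozone implementation.

```java
public class ChecksumLayout {
    /**
     * Number of checksum windows needed to cover chunkSize bytes,
     * computed with ceiling division.
     */
    public static int checksumCount(int chunkSize, int bytesPerChecksum) {
        if (chunkSize < 0 || bytesPerChecksum <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (chunkSize + bytesPerChecksum - 1) / bytesPerChecksum;
    }

    public static void main(String[] args) {
        // Values from the test constants under review: CHUNK_SIZE = 100, BYTES_PER_CHECKSUM = 20.
        System.out.println(checksumCount(100, 20));   // smaller than the chunk
        System.out.println(checksumCount(100, 100));  // equal to the chunk
        System.out.println(checksumCount(100, 200));  // greater than the chunk
    }
}
```

Under this assumption the constants in the test give five checksum windows per chunk, while the equal and greater cases both collapse to a single window. In the greater case that window is nominally larger than the chunk itself, which is the configuration the reply above argues should not occur.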
Hi @chungen0126 @guohao-rosicky, do we know how much of an improvement is being seen for files smaller than 1GB?
@ptlrs, further tests were conducted with checksum verification skipped. It turns out that checksum verification has a substantial impact on read time.
Can you resolve the code conflict? @chungen0126
Temporarily converted to draft and assigned to myself, to resolve conflicts.
Thanks @adoroszlai for fixing the conflicts. I was just about to address it.
What changes were proposed in this pull request?
To reduce round trips between the Client and Datanode when reading a block, we need a new read API.
It uses gRPC's support for bidirectional streaming so that the server can pipeline the chunks to the client without waiting for ReadChunk API calls. It also saves the client from creating multiple Chunk Stream Clients and should simplify the read path on the client side a bit.
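The round-trip saving described above can be illustrated with a toy model. This sketch uses no gRPC and none of the PR's classes: FakeDatanode, readChunk, and streamBlock are hypothetical names, and a "request" is simply a counted method call on an in-memory object.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StreamReadSketch {
    /** Hypothetical in-memory stand-in for a datanode holding one block. */
    public static class FakeDatanode {
        private final List<byte[]> chunks;
        private int requests = 0;

        public FakeDatanode(List<byte[]> chunks) { this.chunks = chunks; }

        /** One request per chunk, as in a ReadChunk-style API. */
        public byte[] readChunk(int index) { requests++; return chunks.get(index); }

        /** One request for the whole block; chunks are then pushed in order. */
        public Iterator<byte[]> streamBlock() { requests++; return chunks.iterator(); }

        /** Metadata lookup, not counted as a data request in this model. */
        public int chunkCount() { return chunks.size(); }

        public int requestCount() { return requests; }
    }

    /** Reads the block chunk by chunk and returns the number of bytes read. */
    public static int readPerChunk(FakeDatanode dn) {
        int total = 0;
        for (int i = 0; i < dn.chunkCount(); i++) {
            total += dn.readChunk(i).length;
        }
        return total;
    }

    /** Reads the block through a single stream and returns the number of bytes read. */
    public static int readStreamed(FakeDatanode dn) {
        int total = 0;
        Iterator<byte[]> it = dn.streamBlock();
        while (it.hasNext()) {
            total += it.next().length;
        }
        return total;
    }

    /** Builds count chunks of size bytes each. */
    public static List<byte[]> makeChunks(int count, int size) {
        List<byte[]> chunks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            chunks.add(new byte[size]);
        }
        return chunks;
    }

    public static void main(String[] args) {
        FakeDatanode perChunk = new FakeDatanode(makeChunks(4, 100));
        FakeDatanode streamed = new FakeDatanode(makeChunks(4, 100));
        System.out.println(readPerChunk(perChunk) + " bytes, " + perChunk.requestCount() + " requests");
        System.out.println(readStreamed(streamed) + " bytes, " + streamed.requestCount() + " requests");
    }
}
```

In this model both paths read the same number of bytes, but the per-chunk path issues one request per chunk while the streamed path issues one request per block. Real savings depend on network latency and on the checksum cost discussed in the comments.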
Please describe your PR in detail:
StreamBlockInput at client side, called from KeyInputStream to read a block from the container.
BlockInputStream.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10338
How was this patch tested?
There are existing tests for reading data.