-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-12557. Added progress indicator for checkpoint tarball in leader OM #8085
base: master
Are you sure you want to change the base?
Conversation
@hemantk-12 @swamirishi could you please take a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @SaketaChalamchala for the patch.
@@ -292,6 +304,29 @@ private boolean getFilesForArchive(DBCheckpoint checkpoint, | |||
AtomicLong copySize = new AtomicLong(0L); | |||
// Get the active fs files. | |||
Path dir = checkpoint.getCheckpointLocation(); | |||
|
|||
// Log estimated total data transferred on first request. | |||
if (logTotals) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New parameter logTotals
is not needed.
if (logTotals) { | |
if (sstFilesToExclude.isEmpty()) { |
estimateTotals(dir, totalFiles, totalSize); | ||
if (includeSnapshotData) { | ||
Set<Path> snapshotPaths = getSnapshotDirs(checkpoint, false); | ||
totalSnapshots = snapshotPaths.size(); | ||
for (Path snapshotDir: snapshotPaths) { | ||
estimateTotals(snapshotDir, totalFiles, totalSize); | ||
} | ||
} | ||
LOG.info("Transfer estimates to Checkpoint Tarball Stream - " + | ||
"Estimated Data size: {} MB, " + | ||
"Estimated number of SST files: {}, " + | ||
"Estimated number of Snapshots: {}", | ||
totalSize.get() / (1024 * 1024), totalFiles.get(), totalSnapshots); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We can use Commons IO for getting recursive file count and total size.
estimateTotals
can be removed. - Declare variables like
totalSnapshots
close to their usage. - Log size in same unit as in other places.
Counters.PathCounters counters = Counters.longPathCounters();
CountingPathVisitor visitor = new CountingPathVisitor(counters, SST_FILE_FILTER, TRUE);
Files.walkFileTree(dir, visitor);
int totalSnapshots = 0;
if (includeSnapshotData) {
Set<Path> snapshotPaths = getSnapshotDirs(checkpoint, false);
totalSnapshots = snapshotPaths.size();
for (Path snapshotDir: snapshotPaths) {
Files.walkFileTree(snapshotDir, visitor);
}
}
LOG.info("Estimates for checkpoint tarball stream: size: {} KB, SST files: {}{}",
counters.getByteCounter().get() / 1024, counters.getFileCounter().get(),
(includeSnapshotData ? ", snapshots: " + totalSnapshots : "")
);
with:
private static final PathFilter SST_FILE_FILTER = new SuffixFileFilter(ROCKSDB_SST_SUFFIX, IOCase.INSENSITIVE);
and imports:
import static org.apache.commons.io.filefilter.TrueFileFilter.TRUE;
import org.apache.commons.io.IOCase;
import org.apache.commons.io.file.Counters;
import org.apache.commons.io.file.CountingPathVisitor;
import org.apache.commons.io.file.PathFilter;
import org.apache.commons.io.filefilter.SuffixFileFilter;
Path f = entry.getKey(); | ||
if (!f.toFile().isDirectory()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please rename to path
and store result of toFile()
in another variable.
if (System.currentTimeMillis() - lastLoggedTime >= 30000) { | ||
LOG.info("Transferred {} KB, #files {} to checkpoint tarball stream...", | ||
bytesWritten / (1024), filesWritten); | ||
lastLoggedTime = System.currentTimeMillis(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace with System.currentTimeMillis
with Time.monotonicNow()
.
LOG.info("Completed checkpoint tarball transfer."); | ||
} | ||
LOG.info("Completed transfer of {} KB, #files {} " + | ||
"to checkpoint tarball stream.", | ||
bytesWritten / (1024), filesWritten); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could indicate in the Completed transfer of ...
message whether transfer is really completed
or not, and then Completed checkpoint tarball transfer
is not needed.
bytesWritten += Files.size(f); | ||
filesWritten++; | ||
} | ||
includeFile(f.toFile(), fixedFile, archiveOutputStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can avoid Files.size
lookup by changing includeFile
to return the number of bytes it copies.
AtomicInteger totalFiles, | ||
AtomicLong totalSize) throws IOException { | ||
List<Path> subDirs = new ArrayList<>(); | ||
Stream<Path> paths = Files.list(dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stream returned by Files.list
should be closed, so best using it in try (...)
.
(but estimateTotals
can be completely removed, so not applicable now)
throw new RuntimeException(e); | ||
} | ||
return f.toString(); | ||
}).collect(Collectors.toList()).size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collecting into a list just to get its size is not needed, Stream
has count()
.
(but estimateTotals
can be completely removed, so not applicable now)
What changes were proposed in this pull request?
No logs to indicate checkpoint tarball in leader OM.
Added logs to provide an initial estimate of the amount of data, number of files and snapshots to be transferred in the tarball stream.
Added progress indicator logs every 30s detailing # sst files and size of data trasferred per request.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-12557
How was this patch tested?
Manually tested with docker with a progress log interval of
500ms
Leader OM:
As usual the follower tracks the amount of data transferred
Follower OM: