Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-12608. Race condition in datanode version file creation #8093

Draft
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

adoroszlai
Copy link
Contributor

What changes were proposed in this pull request?

Multiple threads may create the VERSION file. After HDDS-12456 they use the same temporary file for AtomicFileOutputStream.

2025-03-15 14:06:39,828 [1653fbbe-77b3-4beb-944f-f44f2eab0adb-EndpointStateMachineTaskThread-scm3.org/172.25.0.118:9861-0 ] INFO utils.DatanodeStoreCache: Added db /data/hdds/hdds/CID-b6cecacf-2a78-4270-850e-f7a04b3207d8/DS-b262fcfa-1d4c-4a05-b9b2-8c973ff80dd4/container.db to cache
2025-03-15 14:06:39,828 [1653fbbe-77b3-4beb-944f-f44f2eab0adb-EndpointStateMachineTaskThread-scm3.org/172.25.0.118:9861-0 ] INFO volume.HddsVolume: SchemaV3 db is created and loaded at /data/hdds/hdds/CID-b6cecacf-2a78-4270-850e-f7a04b3207d8/DS-b262fcfa-1d4c-4a05-b9b2-8c973ff80dd4/container.db for volume DS-b262fcfa-1d4c-4a05-b9b2-8c973ff80dd4
2025-03-15 14:06:39,835 [1653fbbe-77b3-4beb-944f-f44f2eab0adb-EndpointStateMachineTaskThread-scm4.org/172.25.0.120:9861-0 ] WARN util.FileUtils: Failed to Atomic Files.move /data/metadata/dnlayoutversion/VERSION.tmp to /data/metadata/dnlayoutversion/VERSION with options [REPLACE_EXISTING]: java.nio.file.NoSuchFileException: /data/metadata/dnlayoutversion/VERSION.tmp -> /data/metadata/dnlayoutversion/VERSION
2025-03-15 14:06:39,836 [1653fbbe-77b3-4beb-944f-f44f2eab0adb-EndpointStateMachineTaskThread-scm1.org/172.25.0.116:9861-0 ] WARN util.FileUtils: Failed to Atomic Files.move /data/metadata/dnlayoutversion/VERSION.tmp to /data/metadata/dnlayoutversion/VERSION with options [REPLACE_EXISTING]: java.nio.file.NoSuchFileException: /data/metadata/dnlayoutversion/VERSION.tmp -> /data/metadata/dnlayoutversion/VERSION
2025-03-15 14:06:39,836 [1653fbbe-77b3-4beb-944f-f44f2eab0adb-EndpointStateMachineTaskThread-scm2.org/172.25.0.117:9861-0 ] WARN util.FileUtils: Failed to Atomic Files.move /data/metadata/dnlayoutversion/VERSION.tmp to /data/metadata/dnlayoutversion/VERSION with options [REPLACE_EXISTING]: java.nio.file.NoSuchFileException: /data/metadata/dnlayoutversion/VERSION.tmp -> /data/metadata/dnlayoutversion/VERSION

https://issues.apache.org/jira/browse/HDDS-12608

How was this patch tested?

CI:
https://github.com/adoroszlai/ozone/actions/runs/13874311835

@adoroszlai adoroszlai self-assigned this Mar 15, 2025
@adoroszlai adoroszlai requested a review from szetszwo March 15, 2025 16:49
@szetszwo
Copy link
Contributor

@adoroszlai , as mentioned in HDDS-12608, this seems a bug in the datanode code (or in the test) -- it should never have more than one threads writing to the VERSION file at the same time. So, we should fix them instead.

@adoroszlai
Copy link
Contributor Author

this seems a bug in the datanode code (or in the test) -- it should never have more than one threads writing to the VERSION file at the same time. So, we should fix them instead.

There is no test involved, it can be reproduced by simply starting a new datanode in HA cluster.

cd hadoop-ozone/dist/target/ozone-2.0.0-SNAPSHOT/compose/ozone-h
docker compose up -d --scale datanode=3
docker compose exec s3g ozone admin safemode wait -t 60
docker compose up -d --no-recreate --scale datanode=4
docker logs ozone-ha-datanode-4 2>&1 | grep 'Failed to Atomic Files.move'

I think there are two issues:

  1. Given that multiple threads may write file X, using the same temp file X.tmp for AtomicFileOutputStream introduces race condition. This can be fixed simply by this patch. The fix is in utils code, which is beneficial for other possible users of these methods. (It should also be fixed in Ratis, but currently we cannot upgrade (HDDS-12103) and do not want to wait for new release anyway.)
  2. The problem that datanode writes the VERSION file from multiple threads. The file includes clusterID, which datanode gets from SCM, so these "get version" tasks seem to be the right place to do it. They should probably check if the file exists, and use a shared lock while checking/writing. They may also need to check if clusterID from each SCM is the same.

This PR is limited to (1), we can fix (2) separately.

@szetszwo
Copy link
Contributor

Given that multiple threads may write file X, using the same temp file X.tmp for AtomicFileOutputStream introduces race condition. ...

The race condition is good thing since it has caught the bug. If we write to different tmp files, then the bug will happen silently. We don't have a legitimate case to allow multiple threads writing to the same file at the same time.

@adoroszlai adoroszlai marked this pull request as draft March 17, 2025 06:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants