HDDS-8660. Notify ReplicationManager when nodes go dead or out of service #7997

peterxcli · 2025-03-02T18:07:47Z

What changes were proposed in this pull request?

If someone triggers decommission or maintenance, there can be a lag of up to 5 minutes between the decommission process starting and RM noticing that containers need replication, because RM runs on a 5-minute interval. Similarly, by the time a node is marked dead it has already been gone for 10 minutes, and it can take up to another 5 minutes for RM to notice and process its containers.

It would be good to notify the RM thread and wake it up when these events happen, to reduce the time it takes to start repairing the problem.

One thing to keep in mind for any solution is that RM operates by:

1. Getting a list of all containers.
2. Processing the list.
3. Sleeping for 5 minutes.

If a node goes dead during step 2 and we notify the thread, it will already be running, so the notify will not do anything. Some of the containers from the node in question may already have been processed, or they may still be waiting to be processed; we don't really know. Perhaps this is OK rather than complicating the solution, since in general fixing decommission or under-replication takes a long time.
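The three-step loop and the "notify is lost while running" caveat can be illustrated with plain Java monitors. This is a simplified, hypothetical model, not Ozone's ReplicationManager: `SimpleRmLoop`, `runs`, `checkReplication` and `wakeUp()` are illustrative names, and the container list is a stand-in supplier.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical, simplified model of the three-step loop; not Ozone's
// ReplicationManager. SimpleRmLoop, runs and wakeUp() are illustrative names.
class SimpleRmLoop implements Runnable {
  private final Object monitor = new Object();
  private final long intervalMs;
  private final Supplier<List<String>> containers;
  final AtomicInteger runs = new AtomicInteger();
  private volatile boolean running = true;

  SimpleRmLoop(long intervalMs, Supplier<List<String>> containers) {
    this.intervalMs = intervalMs;
    this.containers = containers;
  }

  @Override
  public void run() {
    while (running) {
      // Steps 1 and 2: take a snapshot of all containers and process it.
      for (String container : containers.get()) {
        checkReplication(container);
      }
      runs.incrementAndGet();
      // Step 3: sleep for the interval unless woken up earlier.
      synchronized (monitor) {
        if (!running) {
          return;
        }
        try {
          monitor.wait(intervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    }
  }

  private void checkReplication(String container) {
    // Placeholder for the real under/over-replication checks.
  }

  // Wakes the loop early. A notify sent while the loop is still in steps 1-2
  // is simply lost, which is the caveat discussed above.
  void wakeUp() {
    synchronized (monitor) {
      monitor.notifyAll();
    }
  }

  void stop() {
    synchronized (monitor) {
      running = false;
      monitor.notifyAll();
    }
  }
}
```

Because `Object.notifyAll()` only affects threads currently blocked in `wait()`, a wake-up that arrives mid-pass has no effect; the next pass simply starts after the normal interval.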

It is also possible that several nodes go dead in quick succession, or several nodes go out of service quickly, resulting in several notify calls occurring. We don't want to wake up the thread too frequently if this happens, as it will result in a new replication queue getting created over and over. Perhaps if the queue is not empty, then there is replication work to do, and we should not run again.
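The "don't wake up while the queue still has work" rule reduces to a single check per event. A hypothetical sketch (`NotifyDebouncer` and its methods are illustrative, and the queue is a local stand-in for RM's replication queue):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration of the "only wake up if the queue is empty" rule.
// NotifyDebouncer and its members are illustrative names, not Ozone APIs.
class NotifyDebouncer {
  private final Queue<String> replicationQueue = new ArrayDeque<>();
  private int wakeUps = 0;

  void enqueue(String containerId) {
    replicationQueue.add(containerId);
  }

  String poll() {
    return replicationQueue.poll();
  }

  // Called once per node event; returns true if the RM thread would be woken.
  boolean onNodeEvent() {
    if (!replicationQueue.isEmpty()) {
      // Replication work is still pending: skip the wake-up instead of
      // rebuilding the queue again.
      return false;
    }
    wakeUps++;
    return true;
  }

  int wakeUps() {
    return wakeUps;
  }
}
```

With this rule, a burst of node events that arrives while work is still queued produces no extra passes; the next event after the queue drains wakes the thread again.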

Finally, we might want to consider notifying when a node comes back into service, as that could cause over-replication. However, over-replication is less of a problem than under-replication if it is not addressed quickly.

What has been done?

Added ReplicationManagerEventHandler, which handles the "REPLICATION_MANAGER_NOTIFY" message and notifies the RM thread.
When a node goes dead, DeadNodeHandler sends a "REPLICATION_MANAGER_NOTIFY" message to the event queue.
When a node's persisted operational state changes after a datanode report is received, SCMNodeManager sends a "REPLICATION_MANAGER_NOTIFY" message to the event queue.
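The decoupling described above (producers publish an event, a dedicated handler forwards it to RM) can be sketched with a tiny in-memory event bus. This is a simplified stand-in: `MiniEventQueue`, `FakeReplicationManager`, `notifyNodeStateChange()` and `EventFlowDemo` are hypothetical names, and Ozone's real EventQueue/EventHandler classes have a different API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified, hypothetical stand-in for an SCM-style event queue; Ozone's
// real event classes differ, only the publish/subscribe shape is similar.
class MiniEventQueue {
  private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();

  void addHandler(String event, Consumer<String> handler) {
    handlers.computeIfAbsent(event, k -> new ArrayList<>()).add(handler);
  }

  void fireEvent(String event, String payload) {
    handlers.getOrDefault(event, List.of()).forEach(h -> h.accept(payload));
  }
}

// Plays the role of ReplicationManager: counts how often it was notified.
class FakeReplicationManager {
  int notifications = 0;

  void notifyNodeStateChange() { // hypothetical method name
    notifications++;
  }
}

class EventFlowDemo {
  static final String REPLICATION_MANAGER_NOTIFY = "REPLICATION_MANAGER_NOTIFY";

  // ReplicationManagerEventHandler equivalent: subscribe and forward to RM.
  static FakeReplicationManager wire(MiniEventQueue queue) {
    FakeReplicationManager rm = new FakeReplicationManager();
    queue.addHandler(REPLICATION_MANAGER_NOTIFY, dn -> rm.notifyNodeStateChange());
    return rm;
  }

  // DeadNodeHandler equivalent: publish instead of calling RM directly.
  static void onDeadNode(MiniEventQueue queue, String datanode) {
    queue.fireEvent(REPLICATION_MANAGER_NOTIFY, datanode);
  }

  // SCMNodeManager equivalent: publish only if the persisted op state changed.
  static void onDatanodeReport(MiniEventQueue queue, String datanode,
      String oldOpState, String newOpState) {
    if (!oldOpState.equals(newOpState)) {
      queue.fireEvent(REPLICATION_MANAGER_NOTIFY, datanode);
    }
  }
}
```

The benefit of this shape is that neither the dead-node path nor the node manager needs a direct reference to RM; they only know the event name.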

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8660

How was this patch tested?

Unit tests covering the ReplicationManager notify call from NodeManager and DeadNodeHandler.
Added a ReplicationManager integration test that checks whether ReplicationManager is notified and starts processing over- and under-replicated containers when the persisted operational state or health state of nodes changes.

CI:
~~https://github.com/peterxcli/ozone/actions/runs/13785637929~~
https://github.com/peterxcli/ozone/actions/runs/13885375401

peterxcli · 2025-03-06T04:11:58Z

Temporarily converting to a draft, as I would like to introduce ReplicationActionHandler to act as an event subscriber and route all notify calls to ReplicationManager through the eventQueue.


peterxcli · 2025-03-11T11:33:22Z

Hi @adoroszlai @sodonnel,
This PR is ready for review. Whenever you have time, please take a look. Thanks!
I’ve also added more details to the description.

xichen01

@peterxcli Thanks for your patch, overall it looks good. A few comments for your reference.

xichen01 · 2025-03-16T13:21:52Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/SCMNodeManager.java

+    maybeNotifyReplicationManager(reportedDn, oldPersistedOpState, newPersistedOpState);
+  }
+
+  private void maybeNotifyReplicationManager(


We should only notify when the current SCM is the leader.
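A later commit in this PR ("Only notify when leader ready and out of safemode") adds this gate. A hypothetical sketch of such a gate; `LeaderGatedNotifier` and `maybeNotify()` are illustrative names, and the two `BooleanSupplier`s stand in for SCM's leader-ready and safe-mode checks, which are not reproduced here:

```java
import java.util.function.BooleanSupplier;

// Hypothetical gate in front of the REPLICATION_MANAGER_NOTIFY event. The two
// BooleanSuppliers stand in for the SCM's leader-ready and safe-mode checks;
// their real implementations are not reproduced here.
class LeaderGatedNotifier {
  private final BooleanSupplier isLeaderReady;
  private final BooleanSupplier isInSafeMode;
  private final Runnable publishNotify;

  LeaderGatedNotifier(BooleanSupplier isLeaderReady,
      BooleanSupplier isInSafeMode, Runnable publishNotify) {
    this.isLeaderReady = isLeaderReady;
    this.isInSafeMode = isInSafeMode;
    this.publishNotify = publishNotify;
  }

  // Returns true if the notification was published.
  boolean maybeNotify() {
    // Assumption: only a leader SCM that is out of safe mode acts on
    // replication work, so notifying anywhere else would be wasted.
    if (!isLeaderReady.getAsBoolean() || isInSafeMode.getAsBoolean()) {
      return false;
    }
    publishNotify.run();
    return true;
  }
}
```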

xichen01 · 2025-03-16T13:29:25Z

...r-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManager.java

+    // Only wake up the thread if there's no active replication work
+    // This prevents creating a new replication queue over and over
+    // when multiple nodes change state in quick succession
+    if (getQueue().isEmpty()) {


We can also check isThreadWaiting, since only a waiting thread can be notified.
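Combining the reviewer's suggestion with the existing `getQueue().isEmpty()` check might look like the sketch below. It is hypothetical, not Ozone's code: `WaitAwareLoop`, `threadWaiting`, `notifyIfIdle()` and `passes` are illustrative names. The flag is set and read under the same monitor used for `wait()`, so a notifier that sees `threadWaiting == true` knows the loop is parked.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Ozone's code: the loop records whether it is parked
// in wait(), and a notifier only wakes it when it is waiting AND the queue is empty.
class WaitAwareLoop implements Runnable {
  private final Object monitor = new Object();
  private final long intervalMs;
  private boolean threadWaiting = false; // guarded by monitor
  private boolean running = true;        // guarded by monitor
  final ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
  final AtomicInteger passes = new AtomicInteger();

  WaitAwareLoop(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  @Override
  public void run() {
    while (true) {
      passes.incrementAndGet(); // stands in for one full pass over all containers
      synchronized (monitor) {
        if (!running) {
          return;
        }
        threadWaiting = true;
        try {
          monitor.wait(intervalMs);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        } finally {
          threadWaiting = false;
        }
      }
    }
  }

  boolean isThreadWaiting() {
    synchronized (monitor) {
      return threadWaiting;
    }
  }

  // Returns true only if a notification was actually sent to a waiting thread.
  boolean notifyIfIdle() {
    synchronized (monitor) {
      if (!threadWaiting || !queue.isEmpty()) {
        return false; // busy, or replication work still queued
      }
      monitor.notifyAll();
      return true;
    }
  }

  void stop() {
    synchronized (monitor) {
      running = false;
      monitor.notifyAll();
    }
  }
}
```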

xichen01 · 2025-03-16T13:50:21Z

hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DeadNodeHandler.java

+      // Notify ReplicationManager
+      LOG.info("Notifying ReplicationManager about dead node: {}", 
+          datanodeDetails);
+      publisher.fireEvent(SCMEvents.REPLICATION_MANAGER_NOTIFY, datanodeDetails);


We should only notify when this node is not in IN_MAINTENANCE status.
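The review does not spell out the reason; a plausible one is that a node in maintenance is expected to go offline, so its death alone need not wake RM. A hypothetical sketch of the filter: `OpState` mirrors a subset of datanode operational states, and `DeadNodeNotifier` records what would be published instead of firing a real event.

```java
import java.util.ArrayList;
import java.util.List;

// Mirrors datanode operational states for illustration only.
enum OpState {
  IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED, ENTERING_MAINTENANCE, IN_MAINTENANCE
}

// Hypothetical dead-node path with the IN_MAINTENANCE filter. The list stands
// in for publishing REPLICATION_MANAGER_NOTIFY with the datanode as payload.
class DeadNodeNotifier {
  final List<String> notified = new ArrayList<>();

  void onDeadNode(String datanode, OpState state) {
    if (state == OpState.IN_MAINTENANCE) {
      return; // node is expected to be offline; skip the RM wake-up
    }
    notified.add(datanode);
  }
}
```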

xichen01 · 2025-03-16T13:51:22Z

...in/java/org/apache/hadoop/hdds/scm/container/replication/ReplicationManagerEventHandler.java

+  }
+
+  @Override
+  public void onMessage(DatanodeDetails datanodeDetails, EventPublisher eventPublisher) {


A leader check should be added.


peterxcli · 2025-03-16T23:45:10Z

@xichen01 Thanks for the review, addressed all comments, please take a look. Thanks!

adoroszlai requested a review from sodonnel March 3, 2025 21:35

peterxcli marked this pull request as draft March 6, 2025 04:12

peterxcli added 5 commits March 8, 2025 10:16

Add node state change notification for ReplicationManager

a06b7a4

Node state change in NodeManager should notify ReplicationManager

27ccf55

Notify ReplicationManager when a node becomes dead

caaa220

fix pmd

431c717

fix tests

bc7aa2f

peterxcli force-pushed the hdds8660-ReplicationManager-Notify-when-dead-nodes-or-nodes-go-out-of-service branch from 8c56c07 to bc7aa2f Compare March 8, 2025 13:16

peterxcli added 7 commits March 8, 2025 22:38

Introduce ReplicationManagerEventHandler to decouple NM and RM

3c9f838

Change notify call to RM to publish RM_NOTIFY event to queue

c969709

Some old code cleanup

f7363de

Add ReplicationManagerEventHandler to SCM event queue

00e526f

NPE in TestDeadNodeHandler

6a53c3a

Notify RM if persisted op state changed

32705a2

Add integration test for RM on RM being notify when node status changed scenario

bf17079

peterxcli marked this pull request as ready for review March 11, 2025 11:29

adoroszlai requested review from xichen01 and siddhantsangwan March 11, 2025 11:34

adoroszlai changed the title ~~HDDS-8660. ReplicationManager: Notify when dead nodes or nodes go out of service~~ HDDS-8660. Notify ReplicationManager when nodes go dead or out of service Mar 13, 2025

xichen01 reviewed Mar 16, 2025


peterxcli added 3 commits March 17, 2025 00:36

Addressed comment: Only notify when leader ready and out of safemode

6092de1

Addressed comment: don't notify when RM is running

b15af64

Addressed comment: Only notify when node is not IN_MAINTENANCE status in DeadNodeHandler

b2642bc

peterxcli requested a review from xichen01 March 16, 2025 22:37
