Update manage-zookeeper-snapshot.md

jieyan1269 · web-flow · commit 6e3abc3741bc · 2019-12-11T15:01:02.000-08:00
diff --git a/zookeeper/manage-zookeeper-snapshot.md b/zookeeper/manage-zookeeper-snapshot.md
@@ -36,9 +36,10 @@ Many services running on HDInsight clusters depend on ZooKeeper service. ZooKeep
    1) Log in to each ZooKeeper host and check ZooKeeper heap usage with the command “top | grep zookeep+”. By default, the ZooKeeper heap size is 1024MB.
    2) On each ZooKeeper host, check the ZooKeeper logs in the directory /var/log/zookeeper and look for “java.lang.OutOfMemoryError: GC overhead limit exceeded” or “java.lang.OutOfMemoryError: Java heap space”.
    3) To mitigate this problem, in Ambari, go to the ZooKeeper tab, click “Configs”, and search for “zk_server_heapsize”; the default value should be 1024MB. Increase this value, then restart all affected services from Ambari, as well as the service that has problems.
-4. Check if cleanup ZooKeeper snapshots can do the magic. ZooKeeper will not remove old snapshot files from its data directory, instead, it is a periodic task to be performed by users to maintain the healthiness of ZooKeeper. When the volume of snapshot files is large or snapshot files are corrupted, ZooKeeper server will fail to form a quorum, which causes ZooKeeper related services unhealthy. For more details, refer to https://zookeeper.apache.org/doc/r3.3.5/zookeeperAdmin.html#sc_strengthsAndLimitations
-   1) Login each ZooKeeper hosts, backup snapshots in /hadoop/zookeeper/version-2 and /hadoop/hdinsight-zookeeper/version-2, then cleanup the snapshots in these two directories.
-   2) Restart all 3 ZooKeeper servers in Ambari or restart all 3 ZooKeeper hosts. Then restart the service which has problems.
+4. Check whether cleaning up ZooKeeper snapshots and transaction logs resolves the problem. In HDInsight clusters, by default, the most recent 30 snapshots and their related transaction logs are retained, and older files are automatically purged every 24 hours. When the volume of snapshot and transaction log files is large, or the files are corrupted, the ZooKeeper server fails to start, which makes ZooKeeper-dependent services unhealthy. For more details, refer to https://zookeeper.apache.org/doc/r3.3.5/zookeeperAdmin.html#sc_strengthsAndLimitations
+   1) Check the status of the other ZooKeeper servers in the same quorum with the command "echo stat | nc {zk_host_ip} 2181" (or port 2182) to make sure they are working properly.
+   2) Log in to the problematic ZooKeeper host, back up the snapshots and transaction logs in /hadoop/zookeeper/version-2 and /hadoop/hdinsight-zookeeper/version-2, then clean up these files in both directories.
+   3) Restart the problematic ZooKeeper server from Ambari, or reboot the ZooKeeper host. Then restart the service that has problems.
 5. Check if ZooKeeper is refusing incoming connections from a certain host:
    1) On each ZooKeeper host, check the ZooKeeper logs in /var/log/zookeeper and look for “Too many connections from /{host_ip} - max is 60”.
    2) Log in to the host with that “host_ip” and run the command “echo mntr | nc {zk_host_ip} 2181”. If the command produces no output, run “netstat -nape | awk '{if ($5 == "{zk_host_ip}:2181") print $4, $9;}' | sort | uniq -c” to find which process is opening connections to ZooKeeper. Then restart the service corresponding to that process in Ambari.
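
Review note: the backup-and-cleanup part of step 4 could be sketched roughly as the shell function below. This is only an illustration, not the procedure the doc prescribes. It assumes the snapshot and transaction log files follow ZooKeeper's usual `snapshot.*` and `log.*` naming inside `version-2`. The function name `backup_and_clean` and the `BACKUP_ROOT` location are hypothetical. It should be run only on the problematic host, and only after the other quorum members have been confirmed healthy.

```shell
#!/usr/bin/env bash
# Sketch only: back up, then remove, snapshot and transaction log files
# from one ZooKeeper version-2 directory. Other files in the directory
# are left untouched.
backup_and_clean() {
  local data_dir="$1"    # e.g. /hadoop/zookeeper/version-2
  local backup_dir="$2"  # operator-chosen backup location (assumption)
  local f

  mkdir -p "$backup_dir" || return 1

  for f in "$data_dir"/snapshot.* "$data_dir"/log.*; do
    # Skip the literal pattern when a glob matches nothing.
    [ -e "$f" ] || continue
    # Delete the original only after the copy has succeeded.
    cp -p "$f" "$backup_dir"/ || return 1
    rm -f "$f"
  done
}

# Example invocation, left commented out so nothing runs by accident:
# BACKUP_ROOT="/tmp/zk-backup-$(date +%Y%m%d%H%M%S)"
# backup_and_clean /hadoop/zookeeper/version-2 "$BACKUP_ROOT/zookeeper"
# backup_and_clean /hadoop/hdinsight-zookeeper/version-2 "$BACKUP_ROOT/hdinsight-zookeeper"
```

Copying before deleting, and aborting on the first failed copy, keeps the originals in place if the backup location runs out of space.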