Latest commit: fe913f2 · Nov 1, 2017
title: One or more region servers dead | Microsoft Docs
description: Diagnosing and fixing dead region servers on hbase cluster
services: hdinsight
documentationcenter:
author: gkanade
manager: ashitg
ms.service: hdinsight
ms.custom: hdinsightactive
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data
ms.date: 10/04/2017
ms.author: gkanade

One or more dead region servers observed on an HBase cluster

If you are running an HDInsight 3.4 HBase cluster, you might have been hit by a potential bug caused by the upgrade of the JDK to version 1.7.0_151. The symptom is that a region server process starts using close to 200% CPU and is essentially rendered dead. This causes alerts to fire on the HBase Master process and keeps the cluster from functioning at full capacity. To check for this, run the top command; if a process is using close to 200% CPU, note its PID and confirm that it is a region server process by running ps -aux | grep <pid>.
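The top-then-ps check described above can be scripted. This is a minimal sketch under assumptions that are not in the original: top is run in batch mode (top -b -n 1) with its default procps column layout, where %CPU is the ninth field; 180% is an arbitrary cutoff for "close to 200%"; and a region server is recognized by the word regionserver appearing in its full command line. The function names hot_pids and is_regionserver are hypothetical.

```shell
#!/bin/sh
# Sketch only: flag processes near 200% CPU and check whether they are
# HBase region servers. Assumes procps `top` with default columns:
# PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

# hot_pids (hypothetical): reads `top -b -n 1` output on stdin and prints
# "PID %CPU" for every process at or above the threshold in $1 (default 180).
hot_pids() {
  threshold="${1:-180}"
  awk -v t="$threshold" '
    # Process rows begin with a numeric PID; summary and header rows do not.
    $1 ~ /^[0-9]+$/ && ($9 + 0) >= t { print $1, $9 }
  '
}

# is_regionserver (hypothetical): succeeds if the full command line of the
# PID in $1 mentions "regionserver", which is how this sketch identifies an
# HBase region server process.
is_regionserver() {
  ps -p "$1" -o args= 2>/dev/null | grep -qi 'regionserver'
}
```

On a worker node, top -b -n 1 | hot_pids 180 | while read -r pid cpu; do is_regionserver "$pid" && echo "region server $pid at $cpu% CPU"; done would print any region server that matches. Taking a single top snapshot, rather than reading %CPU from ps, matters here because ps reports CPU usage averaged over the whole life of the process, so a region server that only recently started spinning may not show up there.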

At a high level, the mitigation is as follows:

  1. Install JDK 1.8 on ALL nodes of the cluster. Run the script action https://raw.githubusercontent.com/Azure/hbase-utils/master/scripts/upgradetojdk18allnodes.sh, and be sure to select the option to run it on all nodes.

     Alternatively, log in to each node and run the following command:

     sudo add-apt-repository ppa:openjdk-r/ppa -y && sudo apt-get -y update && sudo apt-get install -y openjdk-8-jdk

  2. In the Ambari UI (https://<clusterdnsname>.azurehdinsight.net), go to HBase -> Configs -> Advanced -> Advanced hbase-env and set the JAVA_HOME variable as follows:

     export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

     Save the configuration change.

  3. [Optional but recommended] Flush all tables on the cluster. See https://blogs.msdn.microsoft.com/azuredatalake/2016/09/19/hdinsight-hbase-how-to-improve-hbase-cluster-restart-time-by-flushing-tables/ for details.

  4. From the Ambari UI, restart all HBase services that require a restart.

  5. Depending on the amount of data on the cluster, it can take anywhere from a few minutes to an hour for the cluster to reach a stable state. To confirm that it is stable, either check the HMaster UI from Ambari (after refreshing, all region servers should be listed as active), or run hbase shell on a headnode and then run the status command.
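The table flush and the status check in the procedure above can also be driven from a headnode by piping commands into hbase shell. In this sketch the function names flush_commands and dead_servers are hypothetical, the table names are assumed to be supplied one per line (for example, copied from the output of the shell's list command), and the parser assumes the one-line summary that the status command prints in HBase 1.x, such as "1 active master, 0 backup masters, 4 servers, 0 dead, 2.0000 average load".

```shell
#!/bin/sh
# Sketch only: helpers for the flush and status steps. Function names are
# hypothetical; the status line format is assumed from HBase 1.x output.

# flush_commands: reads table names (one per line) on stdin and prints an
# HBase shell `flush` command for each, followed by `exit` so that a piped
# `hbase shell` session terminates.
flush_commands() {
  while IFS= read -r table || [ -n "$table" ]; do
    if [ -n "$table" ]; then
      printf "flush '%s'\n" "$table"
    fi
  done
  echo "exit"
}

# dead_servers: reads `status` output on stdin and prints the number that
# precedes the word "dead" on the first line that contains it.
dead_servers() {
  sed -n 's/.* \([0-9][0-9]*\) dead.*/\1/p' | head -n 1
}
```

For example, flush_commands < tables.txt | hbase shell (where tables.txt is a hypothetical file listing your tables) flushes each table in turn, and echo "status" | hbase shell 2>/dev/null | dead_servers should print 0 once every region server is back. Generating the commands as text keeps the helper testable without a live cluster.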

To verify that the upgrade succeeded, check that the relevant HBase processes were started with the expected Java version. For example, for the region server, run ps -aux | grep regionserver and verify that the Java binary shown in the output is /usr/lib/jvm/java-8-openjdk-amd64/bin/java.
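The ps check above can be turned into a pass/fail test. This sketch assumes ps aux style output, where the executable is the first word of the COMMAND column (field 11), and treats the JAVA_HOME path set earlier, /usr/lib/jvm/java-8-openjdk-amd64, as the expected JDK location. The function names java_for and uses_jdk8 are hypothetical.

```shell
#!/bin/sh
# Sketch only: check which java binary runs a given HBase process.
# Assumes `ps aux` columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

# java_for (hypothetical): reads `ps aux` output on stdin and prints the
# distinct executables (field 11) of processes whose line contains $1
# (default "regionserver"), ignoring the grep command itself.
java_for() {
  pattern="${1:-regionserver}"
  awk -v p="$pattern" 'index($0, p) && $11 !~ /grep$/ { print $11 }' | sort -u
}

# uses_jdk8 (hypothetical): succeeds only if at least one matching process
# was found and every matching executable lives under the JDK 8 directory.
uses_jdk8() {
  bins="$(java_for "${1:-regionserver}")"
  [ -n "$bins" ] || return 1
  ! printf '%s\n' "$bins" | grep -qv '^/usr/lib/jvm/java-8-openjdk-amd64/'
}
```

On a worker node, ps aux | uses_jdk8 && echo "region server is running on JDK 8" reports success only when the region server was started from the JDK 8 path; passing another pattern, such as HMaster on a headnode, applies the same check to other HBase processes.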