Commit 01c2ccb

Update
1 parent f5dc03c commit 01c2ccb

6 files changed, +151 -6 lines changed


.gitignore

+1 -1
@@ -161,5 +161,5 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 #.idea/

-/RoughWork
+/roughWork
 /.idea

README.md

+3 -2
@@ -35,8 +35,9 @@ keywords = {query-optimization, query-processing},

 Instruction files:

-1. [Hadoop Setup in a Fully Distributed Mode](setup-hadoop.md)
-2. [Tutorial on HDFS and MapReduce in Java and Python](tutorial-hdfs-mapreduce.md)
+1. [Hadoop Setup in a Fully Distributed Mode](instructions/1-hadoop-hdfs-mapreduce/1-setup-hadoop.md)
+2. [Instructions on HDFS and MapReduce in Java and Python](instructions/1-hadoop-hdfs-mapreduce/2-hdfs-mapreduce.md)
+3. [Instructions on HBase (Standalone Mode)](instructions/2-hbase/hbase-standalone.md)

 Please refer to the "Lab Manual on Hadoop" for further instructions.

setup-hadoop.md → instructions/1-hadoop-hdfs-mapreduce/1-setup-hadoop.md

+2 -2
@@ -461,7 +461,7 @@ Remove the existing content and paste the following:
 </property>
 <property>
 <name>hbase.zookeeper.quorum</name>
-<value>zookeeper</value>
+<value>zookeeper:2181</value>
 </property>
 <property>
 <name>hbase.zookeeper.property.dataDir</name>
@@ -485,7 +485,7 @@ Remove the existing content and paste the following:
 </property>
 <property>
 <name>hbase.zookeeper.quorum</name>
-<value>zookeeper</value>
+<value>zookeeper:2181</value>
 </property>
 <property>
 <name>zookeeper.session.timeout</name>

tutorial-hdfs-mapreduce.md → instructions/1-hadoop-hdfs-mapreduce/2-hdfs-mapreduce.md

+1 -1
@@ -1,4 +1,4 @@
-# **Tutorial on HDFS and MapReduce in Java and Python**
+# **Instructions on HDFS and MapReduce in Java and Python**

 ## Store Data in Hadoop

File renamed without changes.
+144
@@ -0,0 +1,144 @@
# HBase in Standalone Mode

## 1. Pull the required Docker Images and use them to create and run the Docker Containers

Delete the 4 previous containers, then create and run the Docker containers specified in [Docker-Compose.yaml](Docker-Compose.yaml), i.e.,

* HBase-Master
* HBase-Regionserver
* Zookeeper

## 2. Connect to the HBase Shell hosted in the `hbase-master` Docker Container

HBase Shell is a JRuby-based command-line program you can use to interact with HBase.

```shell
docker exec -it hbase-master hbase shell
```

You can also confirm that HBase is running via its web UI: [http://localhost:16010/](http://localhost:16010/)

Execute the following statements in the HBase shell:

```shell
# To show the version of HBase (it should be version 2.1.3)
version

# To show the details of the servers running HBase:
# The output according to the setup should be:
# 1 active master, 0 backup masters, 1 servers, 0 dead, 2.0000 average load
status
```

## 3. Getting Help

To get guidance on a specific command:

```shell
# Replace COMMAND with the command you want guidance on
help 'COMMAND'
```

For general guidance on how to use table-referenced commands:

```shell
table_help
```

## 4. Create a Table

The table `emp` has 2 column families:

* personal data
* professional data

```shell
create 'emp', 'personal data', 'professional data'
create 'employee', 'Personally_Identifiable_Information_PII', 'KPI_Appraisal'
```

The table `wiki` has 1 column family:

* text

```shell
create 'wiki', 'text'
```

Verify that the tables have been created:

```shell
list
```

## 5. View the table's metadata

Execute the following to view the metadata of the created table:

```shell
describe 'wiki'
```

## 6. Insert data

We use the `put` command to insert data into HBase. The following statement inserts a new row with the key **`Home`**, storing **`Welcome to the wiki!`** in the column family `text` (written `text:` because no column is named). A specific column within the column family would be specified as `[column family]:[column]`.

```shell
put 'wiki', 'Home', 'text:', 'Welcome to the wiki!'
```

Unfortunately, the `put` command in the HBase shell allows you to insert only one column value at a time.
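
To make the `[column family]:[column]` notation concrete, here is a minimal Python sketch that splits such a specifier into its two parts. It is an illustration only and not HBase code: the helper name `split_column` and the column `text:body` are hypothetical.

```python
from typing import Tuple


def split_column(spec: str) -> Tuple[str, str]:
    """Split 'family:qualifier' into (family, qualifier).

    Hypothetical helper for illustration; not part of any HBase API.
    An empty qualifier, as in 'text:', is allowed.
    """
    family, _, qualifier = spec.partition(":")
    if not family:
        raise ValueError(f"column family is missing in {spec!r}")
    return family, qualifier


# 'text:' names the column family only; 'text:body' also names a column.
print(split_column("text:"))
print(split_column("text:body"))
```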

## 7. Retrieve data

We use the `get` command to retrieve data from HBase. `get` requires the **table name** and the **row key**; a column can optionally be given as a third argument, as in the example below.

```shell
get 'wiki', 'Home', 'text:'
```

We use the `scan` command to retrieve all the rows. This is compute-intensive for large tables and should be avoided in production. By default, HBase uses the current timestamp when inserting data and the most recent timestamp when retrieving data.

```shell
scan 'wiki'
```
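
The timestamp behaviour described above can be pictured with a toy in-memory model. The sketch below is plain Python, not the HBase client API; the names `cells`, `put`, and `get` are hypothetical stand-ins, and the second value `Welcome back!` is made up for the example.

```python
import time
from collections import defaultdict

# (row key, column) -> list of (timestamp, value); a toy stand-in for HBase cells
cells = defaultdict(list)


def put(row, column, value, timestamp=None):
    """Store a value, defaulting to the current time in milliseconds."""
    if timestamp is None:
        timestamp = int(time.time() * 1000)
    cells[(row, column)].append((timestamp, value))


def get(row, column):
    """Return the value with the most recent timestamp, or None if absent."""
    versions = cells.get((row, column))
    if not versions:
        return None
    return max(versions, key=lambda tv: tv[0])[1]


# Explicit timestamps keep the example deterministic.
put("Home", "text:", "Welcome to the wiki!", timestamp=1)
put("Home", "text:", "Welcome back!", timestamp=2)
print(get("Home", "text:"))  # the value written with the newer timestamp
```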

## 8. Altering Tables

Altering tables is computationally expensive because HBase creates a new column family with the chosen specifications and then copies all the data to the new column family.

* Disable the table

```shell
disable 'wiki'
```

By default, HBase 2.x keeps only 1 version of each value (releases before 0.96 kept 3), and each version is stored with its timestamp. This can be changed as follows:

```shell
alter 'wiki', { NAME => 'text', VERSIONS => org.apache.hadoop.hbase.HConstants::ALL_VERSIONS }
```
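
As a rough picture of what a `VERSIONS` limit means, the following Python sketch keeps only the newest entries of a cell's history. It is a toy model under stated assumptions, not HBase's actual retention logic: `keep_newest`, `KEEP_ALL`, and the sample history are hypothetical.

```python
# Stand-in for "keep every version"; an assumption made for this sketch.
KEEP_ALL = 2**31 - 1


def keep_newest(versions, max_versions):
    """Return the newest max_versions (timestamp, value) pairs, newest first."""
    ordered = sorted(versions, key=lambda tv: tv[0], reverse=True)
    return ordered[:max_versions]


history = [(1, "draft"), (2, "edit"), (3, "final")]

print(keep_newest(history, 1))         # only the newest version survives
print(keep_newest(history, KEEP_ALL))  # the full history survives
```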

We can also add a column family (while the table is still disabled). The new column family is called `revision`.

```shell
alter 'wiki', { NAME => 'revision', VERSIONS => org.apache.hadoop.hbase.HConstants::ALL_VERSIONS }
```

Similar to the `text` column family, the `revision` column family is added without any columns.

It is up to the user to honour the schema. If the user decides not to honour it, e.g., by adding data to `revision:new_column`, HBase will not stop them.
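
The point about the schema can be sketched in a few lines of Python: column families are fixed in the table definition, while columns inside a family are not checked. This is a toy model rather than HBase code; `families`, `row`, `put_cell`, and the `comments` family are hypothetical names.

```python
families = {"text", "revision"}  # fixed when the table is created or altered
row = {}                         # column -> value for a single row


def put_cell(column, value):
    """Accept any column inside a known family; reject unknown families."""
    family, _, _qualifier = column.partition(":")
    if family not in families:
        raise ValueError(f"unknown column family: {family}")
    row[column] = value  # the column itself is never validated


put_cell("revision:new_column", "anything")  # accepted, no schema check

try:
    put_cell("comments:first", "rejected")   # 'comments' is not a family
except ValueError as err:
    print(err)
```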

Lastly, we can set the compression method as follows:

```shell
alter 'wiki', {NAME=>'text', COMPRESSION=>'GZ', BLOOMFILTER=>'ROW'}
```

* Enable the table

```shell
enable 'wiki'
```
