1. Go into your Hadoop source directory.
a. vi src/mapred/org/apache/hadoop/mapred/MapTask.java
b. Around line 400, find the following code and comment it out (a grep tip follows this step's code). On older versions this block may not exist, in which case no change is needed.
/*
    int numReduceTasks = conf.getNumReduceTasks();
    LOG.info("numReduceTasks: " + numReduceTasks);
    MapOutputCollector<OUTKEY, OUTVALUE> collector = null;
    if (numReduceTasks > 0) {
      collector = new MapOutputBuffer<OUTKEY, OUTVALUE>(umbilical, job, reporter);
    } else if (job.getBoolean("map.sort.force", false)) {
      // run sort and combine in map-only jobs
      collector = new DirectMapOutputBufferedCollector<OUTKEY, OUTVALUE>(umbilical, job, reporter);
    } else {
      collector = new DirectMapOutputCollector<OUTKEY, OUTVALUE>(umbilical, job, reporter);
    }
    MapRunnable<INKEY,INVALUE,OUTKEY,OUTVALUE> runner =
      ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
    try {
      runner.run(in, new OldOutputCollector<OUTKEY, OUTVALUE>(collector, conf), reporter);
      collector.flush();
    } finally {
      //close
      in.close(); // close input
      collector.close();
    }*/
c. Insert the following code just before the commented-out block:
    int numReduceTasks = conf.getNumReduceTasks();
    LOG.info("numReduceTasks: " + numReduceTasks);
    MapOutputCollector collector = null;
    if (numReduceTasks > 0) {
      // give nativetask a chance to take over the map output collector;
      // fall back to the stock MapOutputBuffer when no delegate is available
      collector = TaskDelegation.tryGetDelegateMapOutputCollector(job, getTaskID(), mapOutputFile);
      if (collector == null)
        collector = new MapOutputBuffer(umbilical, job, reporter);
    } else {
      collector = new DirectMapOutputCollector(umbilical, job, reporter);
    }
    MapRunnable<INKEY,INVALUE,OUTKEY,OUTVALUE> runner =
      ReflectionUtils.newInstance(job.getMapRunnerClass(), job);
    try {
      runner.run(in, new OldOutputCollector(collector, conf), reporter);
      collector.flush();
    } finally {
      //close
      in.close(); // close input
      collector.close();
    }
  }
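Tip: the exact line number drifts between Hadoop versions, so searching for the first statement of the block is more reliable than counting lines:
grep -n "getNumReduceTasks" src/mapred/org/apache/hadoop/mapred/MapTask.java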
2. Copy delegate-hadoop-0.20.205.patch into the top-level directory of Hadoop 1.0.3 and apply it:
patch -p1 < delegate-hadoop-0.20.205.patch
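To preview whether the patch applies cleanly before touching any files (--dry-run is a standard GNU patch option):
patch -p1 --dry-run < delegate-hadoop-0.20.205.patch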
3. Build the Hadoop native library:
ant compile-native
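If compile-native fails for lack of a build toolchain, install one first; on a yum-based system the following are typically needed (package names are distro-dependent assumptions):
yum install gcc gcc-c++ autoconf automake libtool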
4. Copy the contents of the build/native/Linux-amd64-64 directory into lib/native/Linux-amd64-64:
cp -r build/native/Linux-amd64-64/* lib/native/Linux-amd64-64/
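To confirm the copy, list the destination; the Hadoop native library (normally named libhadoop.so on this branch) should be present:
ls lib/native/Linux-amd64-64/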
5. Build hadoop-core:
ant jar
6. Go into the nativetask top-level directory and edit its pom.xml. This line:
<systemPath>/Users/decster/projects/hadoop-20-git/build/hadoop-core-0.20.205.1.jar</systemPath>
must be changed to point at the hadoop-core jar produced by your own Hadoop build, for example:
<systemPath>$HADOOP_HOME/build/hadoop-core-1.0.3-Intel.jar</systemPath>
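Note that Maven does not expand shell variables such as $HADOOP_HOME inside <systemPath>; either paste in the literal absolute path (printed by the command below) or use Maven's ${env.HADOOP_HOME} property syntax:
echo $HADOOP_HOME/build/hadoop-core-1.0.3-Intel.jar    # paste this absolute path into <systemPath>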
7. Build nativetask. This step needs network access, and may also fail if the zlib or snappy libraries are missing:
mvn package
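If the build stops on missing zlib or snappy, installing the development headers usually resolves it (yum package names; assumptions for other distros):
yum install zlib-devel snappy-devel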
PS: A few possible errors in the nativetask source:
1. src/main/native/src/util/Process.cc needs the extra headers #include <sys/types.h> and #include <sys/wait.h>
2. src/main/native/examples/Streaming.cc needs the same extra headers #include <sys/types.h> and #include <sys/wait.h>
3. In src/main/native/test/util/TestSyncUtils.cc, line 83,
ParallelFor(*this, &TestParallelFor::add, 0ULL, n, threadnum);
should be changed to
ParallelFor(*this, &TestParallelFor::add, (uint64_t)0, n, threadnum);
8. After nativetask builds successfully, copy target/hadoop-nativetask-0.1.0.jar into Hadoop's lib directory; the jar must be distributed to every machine in the cluster (see the loop after this command):
cp target/hadoop-nativetask-0.1.0.jar $HADOOP_HOME/lib/
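One way to push the jar to every machine, assuming a conf/slaves host list and an identical $HADOOP_HOME path on every node (both are assumptions about your cluster):
for h in $(cat $HADOOP_HOME/conf/slaves); do scp target/hadoop-nativetask-0.1.0.jar $h:$HADOOP_HOME/lib/; done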
9. Copy the contents of target/native/target/usr/local/lib/ into Hadoop; these libraries must also be distributed to every machine (a sketch of the copy follows).
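A minimal sketch of the copy; the destination directory is an assumption (the guide does not name it), chosen to sit next to the other Hadoop native libraries from step 4:
cp target/native/target/usr/local/lib/* $HADOOP_HOME/lib/native/Linux-amd64-64/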
10. Every slave must have the snappy library installed; libsnappy is required by the native runtime.
a. With network access, install it from a package manager:
yum install snappy    (or the equivalent apt-get package, depending on the distribution)
b. Without network access, install it offline. Download snappy from:
http://code.google.com/p/snappy/
http://code.google.com/p/snappy/downloads/detail?name=snappy-1.0.5.tar.gz
then build and install it:
tar -zxvf snappy-1.0.5.tar.gz
cd snappy-1.0.5
./configure --prefix=/usr/
make
sudo make install
Verify the installation:
echo "int main(){ return 0;}" > /tmp/a.c && gcc /tmp/a.c -o /tmp/a.out -lsnappy
/tmp/a.out
If no error is reported, the installation succeeded. If instead the linker or loader reports that it cannot find the snappy library, the $LD_LIBRARY_PATH / ld.so search path is not configured correctly.
Fix:
1. echo "/usr/local/lib" >> /etc/ld.so.conf    # add the directory containing the needed library to this file
2. ldconfig
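After ldconfig, repeat the check from above; a clean exit means libsnappy now resolves:
gcc /tmp/a.c -o /tmp/a.out -lsnappy && /tmp/a.out && echo "snappy OK"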
11. Example invocation. At present nativetask only implements wordcount and terasort; for further development, contact the developers:
bin/hadoop jar lib/hadoop-nativetask-0.1.0.jar wordcount /input /output
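terasort is the other bundled example; by analogy with the wordcount invocation above (the argument layout here is an assumption, and the input must already exist, e.g. generated with Hadoop's teragen):
bin/hadoop jar lib/hadoop-nativetask-0.1.0.jar terasort /terasort-input /terasort-output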