Overview
Starting with this article, I will be posting an occasional series on building a Hadoop big data platform, covering (but not limited to) a fully distributed Hadoop setup, HBase, ZooKeeper, MySQL, Hive, Sqoop, Flink, and other big data cluster components and related integration software.
System environment
Three hosts running CentOS 7:
one master node and two slave nodes, named slave1 and slave2.
Required packages
jdk-8u151-linux-x64.tar.gz
```
https://repo.huaweicloud.com/java/jdk/8u151-b12/jdk-8u151-linux-x64.tar.gz
```
hadoop-3.2.1.tar.gz
```
https://archive.apache.org/dist/hadoop/core/hadoop-3.2.1/hadoop-3.2.1.tar.gz
```
1. Create a user with root privileges
```shell
[root@master ~]# adduser hadoop
[root@master ~]# passwd hadoop
Changing password for user hadoop.
New password:
BAD PASSWORD: The password fails the dictionary check - it is too simplistic/systematic
Retype new password:
passwd: all authentication tokens updated successfully.
[root@master ~]# vim /etc/sudoers
```
Grant the hadoop user sudo rights by adding a line below the root entry (/etc/sudoers is read-only, so save with :wq!; alternatively use visudo, which validates the syntax before saving):
```
root    ALL=(ALL)       ALL
hadoop  ALL=(ALL)       ALL
```
2. Configure the hosts file on all three hosts
```shell
[root@master ~]# vim /etc/hosts
```
Add the same entries on every node:
```
192.168.1.1 master
192.168.1.2 slave1
192.168.1.3 slave2
```
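After editing /etc/hosts on each machine, a quick loop can confirm that every cluster hostname actually resolves before moving on (hostnames are the ones used in this guide; `getent` is standard on CentOS 7):

```shell
# Check that every cluster hostname from /etc/hosts resolves to an address.
for h in master slave1 slave2; do
  if getent hosts "$h" >/dev/null; then
    echo "$h: resolves"
  else
    echo "$h: NOT in /etc/hosts yet"
  fi
done
```

Run this on all three nodes; any "NOT in /etc/hosts yet" line means that node's hosts file still needs the entry.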
3. Set up passwordless SSH between the three hosts
```shell
[root@master ~]# yum install openssh-server -y
[root@master ~]# service sshd restart
[root@master ~]# su - hadoop
[hadoop@master ~]$ ssh-keygen -t rsa
[hadoop@master ~]$ ssh-copy-id master
[hadoop@master ~]$ ssh-copy-id slave1
[hadoop@master ~]$ ssh-copy-id slave2
```
Repeat the ssh-keygen and ssh-copy-id steps as the hadoop user on slave1 and slave2 as well, so that every host can reach every other host without a password.
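Once the keys are distributed, passwordless login can be verified non-interactively; `BatchMode=yes` makes ssh fail immediately instead of falling back to a password prompt (hostnames assumed from this guide):

```shell
# Verify passwordless SSH to every node; BatchMode forbids password prompts.
for h in master slave1 slave2; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
    echo "$h: passwordless OK"
  else
    echo "$h: passwordless login FAILED"
  fi
done
```

All three lines should read "passwordless OK" before continuing; a FAILED line usually means ssh-copy-id was not run for that host.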
4. Install the JDK
```shell
[hadoop@master ~]$ sudo mkdir /tdsgpo
[hadoop@master ~]$ sudo mkdir /data
[hadoop@master ~]$ cd /tdsgpo
[hadoop@master tdsgpo]$ sudo cp /root/jdk-8u151-linux-x64.tar.gz /tdsgpo/
[hadoop@master tdsgpo]$ sudo tar -zxvf jdk-8u151-linux-x64.tar.gz
[hadoop@master tdsgpo]$ sudo vim /etc/profile
[hadoop@master tdsgpo]$ vim ~/.bashrc
```
Append the following to both files:
```
export JAVA_HOME=/tdsgpo/jdk1.8.0_151
export PATH=$PATH:$JAVA_HOME/bin
```
Reload the environment and verify:
```shell
[hadoop@master tdsgpo]$ source /etc/profile
[hadoop@master tdsgpo]$ source ~/.bashrc
[hadoop@master tdsgpo]$ java -version
java version "1.8.0_151"
```
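If `java -version` prints the wrong version or nothing at all, this quick check shows whether the exports in /etc/profile and ~/.bashrc actually took effect in the current shell:

```shell
# Show where the current shell is picking up Java from.
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
if command -v java >/dev/null; then
  echo "java found at: $(command -v java)"
  java -version 2>&1 | head -n 1
else
  echo "java is not on PATH; re-run 'source /etc/profile' and 'source ~/.bashrc'"
fi
```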
5. Configure Hadoop
1) Extract the archive and set environment variables
```shell
[hadoop@master tdsgpo]$ sudo tar -zxvf hadoop-3.2.1.tar.gz
[hadoop@master tdsgpo]$ sudo vim /etc/profile
[hadoop@master tdsgpo]$ vim ~/.bashrc
```
Update the exports in both files:
```
export HADOOP_HOME=/tdsgpo/hadoop-3.2.1
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
```
```shell
[hadoop@master tdsgpo]$ source /etc/profile
[hadoop@master tdsgpo]$ source ~/.bashrc
```
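On Hadoop 3.x the daemon scripts do not always inherit JAVA_HOME from the login shell; if the daemons later complain that JAVA_HOME is not set, also pin it in hadoop-env.sh (the path below is taken from this guide's JDK install):

```shell
# Add to /tdsgpo/hadoop-3.2.1/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/tdsgpo/jdk1.8.0_151
```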
2) Configure core-site.xml
```shell
[hadoop@master tdsgpo]$ cd /tdsgpo/hadoop-3.2.1/etc/hadoop/
[hadoop@master hadoop]$ sudo vim core-site.xml
```
Add the following inside the <configuration> element (fs.trash.interval keeps deleted files in the HDFS trash for 10080 minutes, i.e. 7 days):
```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
</property>
<property>
    <name>fs.trash.interval</name>
    <value>10080</value>
</property>
```
3) Configure hdfs-site.xml
```shell
[hadoop@master hadoop]$ sudo vim hdfs-site.xml
```
```xml
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/hadoop/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/hadoop/hdfs/datanode</value>
</property>
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
</property>
```
4) Configure yarn-site.xml
```shell
[hadoop@master hadoop]$ sudo vim yarn-site.xml
```
```xml
<property>
    <name>yarn.acl.enable</name>
    <value>true</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```
5) Configure mapred-site.xml
```shell
[hadoop@master hadoop]$ sudo vim mapred-site.xml
```
```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/tdsgpo/hadoop-3.2.1</value>
</property>
<property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/tdsgpo/hadoop-3.2.1</value>
</property>
<property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/tdsgpo/hadoop-3.2.1</value>
</property>
```
6) Configure workers
In Hadoop 3.x the workers file (formerly slaves) lists the hosts that run the DataNode and NodeManager daemons; listing master here means master doubles as a worker node.
```shell
[hadoop@master hadoop]$ sudo vim workers
```
```
master
slave1
slave2
```
6. Copy the Hadoop directory from master to the two slave nodes
Make sure /tdsgpo exists on both slaves first, then copy:
```shell
[hadoop@master hadoop]$ cd /tdsgpo
[hadoop@master tdsgpo]$ sudo scp -r hadoop-3.2.1/ slave1:/tdsgpo/
[hadoop@master tdsgpo]$ sudo scp -r hadoop-3.2.1/ slave2:/tdsgpo/
```
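The scp above copies only Hadoop; the slaves also need the JDK and the same environment variables, since nothing so far has installed Java on them. A sketch of that copy, with paths and hostnames assumed from this guide:

```shell
# Copy the unpacked JDK and the hadoop user's environment file to each slave.
for h in slave1 slave2; do
  echo "copying JDK and env files to $h"
  scp -r /tdsgpo/jdk1.8.0_151 "$h:/tdsgpo/" || echo "copy to $h failed; check SSH access"
  scp ~/.bashrc "$h:~/"                     || echo "bashrc copy to $h failed"
done
```

Also mirror the /etc/profile edits on the slaves (as root) so that login shells there pick up JAVA_HOME and HADOOP_HOME.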
7. Format the cluster
```shell
[hadoop@master tdsgpo]$ cd /tdsgpo/hadoop-3.2.1/bin/
[hadoop@master bin]$ hdfs namenode -format
```
Format the NameNode only once. If you ever need to reformat it later, clear the namenode and datanode directories under /data/hadoop/hdfs on every node first, otherwise the DataNodes will refuse to connect because of a cluster ID mismatch.
8. Start the Hadoop cluster
```shell
[hadoop@master bin]$ cd /tdsgpo/hadoop-3.2.1/sbin/
[hadoop@master sbin]$ sudo chmod -R 777 /tdsgpo/hadoop-3.2.1
[hadoop@master sbin]$ ./hadoop-daemon.sh start namenode
[hadoop@master sbin]$ ./hadoop-daemon.sh start datanode
[hadoop@master sbin]$ ./yarn-daemon.sh start resourcemanager
[hadoop@master sbin]$ ./yarn-daemon.sh start nodemanager
[hadoop@master sbin]$ ./mr-jobhistory-daemon.sh start historyserver
```
Start the DataNode and NodeManager on slave1 and slave2 the same way (or simply run ./start-dfs.sh and ./start-yarn.sh on master, which start the daemons on every host listed in workers). Then confirm the processes with jps:
```shell
[hadoop@master sbin]$ jps
24536 NameNode
32425 DataNode
26454 ResourceManager
13452 NodeManager
43245 JobHistoryServer
26546 Jps
```
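With everything up, a small HDFS smoke test confirms the cluster is actually usable end to end (the directory name below is just an example):

```shell
# Create and list a directory in HDFS as a basic end-to-end check.
if command -v hdfs >/dev/null; then
  hdfs dfs -mkdir -p /tmp/smoke-test
  hdfs dfs -ls /tmp
else
  echo "hdfs is not on PATH; source /etc/profile first"
fi
```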
That wraps up this article: a fully distributed Hadoop setup. If you built the hosts in VMware, this is a good moment to take a snapshot; if you built them from Docker images, run docker ps -a to list your three containers and docker commit to save them as images for next time. In the next article we will set up MySQL and HBase on top of this fully distributed Hadoop cluster.