Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog.

Apache Ambari provides centralized management of HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog. It is also one of the top five Hadoop management tools.


Building a cluster with Ambari is recommended only if you already have some knowledge of the Hadoop ecosystem components and their configuration parameters.

Comparison with CDH

1. Hortonworks differs from other Hadoop distributions (such as Cloudera) in that its products are 100% open source.
2. Cloudera offers a free edition and an enterprise edition; the enterprise edition only has a trial period.
3. Apache Hadoop is the native Hadoop.
4. The distributions currently popular in China are Apache Hadoop and Cloudera CDH; Hortonworks is also used.
5. Apache Ambari is a web-based tool for configuring, managing, and monitoring Apache Hadoop clusters. It supports Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a cluster health dashboard with features such as heatmaps, and the ability to view MapReduce, Pig, and Hive applications and diagnose their performance characteristics through a friendly user interface.

Preparation

Before installing, prepare CentOS 7.2, jdk-8u91, and MySQL 5.7.13.

Master node: master

Slave nodes: slave1, slave2, slave3

Notes: make sure all nodes are time-synchronized, and that all nodes can communicate with each other and access the external network.

Configure passwordless SSH login

On the master node (master), log in as root and run the following (generating a key pair first):

ssh-keygen -t rsa

cd ~/.ssh/

cat id_rsa.pub >> authorized_keys


On each slave node, log in as root and execute:

mkdir ~/.ssh/


Distribute the configured authorized_keys to each slave node:

scp /root/.ssh/authorized_keys root@slave1:~/.ssh/authorized_keys

scp /root/.ssh/authorized_keys root@slave2:~/.ssh/authorized_keys

scp /root/.ssh/authorized_keys root@slave3:~/.ssh/authorized_keys

Create the ambari system user and group

Operate on the master node only.

Add a user and group for installing and running Ambari. You can also skip creating a new user and use root or another system account directly.

adduser ambari

passwd ambari

Enable the NTP service

This must be done on every node in the cluster.

CentOS 7 commands:

yum install ntp

systemctl is-enabled ntpd

systemctl enable ntpd

systemctl start ntpd

Check DNS and NSCD

This must be set on all nodes.

Ambari requires fully qualified domain names during installation, so check DNS first.

vi /etc/hosts

Add entries mapping each node's IP address to its hostname: master, slave1, slave2, slave3.

Configure the FQDN on each node (note the FQDN naming convention: hostname + domain name). Taking the master node as an example:

vi /etc/sysconfig/network
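For illustration, the two files might contain entries like the following; the IP addresses and the example.com domain are placeholders (not values from the original), so substitute your own:

```
# /etc/hosts (same on all four nodes; IPs and domain are placeholders)
192.168.0.10  master.example.com  master
192.168.0.11  slave1.example.com  slave1
192.168.0.12  slave2.example.com  slave2
192.168.0.13  slave3.example.com  slave3

# /etc/sysconfig/network (on the master; FQDN = hostname + domain name)
NETWORKING=yes
HOSTNAME=master.example.com
```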


Turn off the firewall

This must be done on all nodes.

systemctl disable firewalld

systemctl stop firewalld

Disable SELinux

This must be done on all nodes.

Check the SELinux status:

sestatus

If the SELinux status parameter is enabled, SELinux is on:

SELinux status: enabled

Modify the configuration file (a machine restart is required for the change to take effect):

vi /etc/sysconfig/selinux
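After editing, the key line in the file should read as follows (disabled switches SELinux off entirely; permissive would only log violations instead of blocking them):

```
# /etc/sysconfig/selinux
SELINUX=disabled
```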


Make a local source

Making the local source only needs to be done on the master node.

Related preparation

Install the Apache HTTP server

Install the HTTP server, and allow the http service through the firewall (permanently):


yum install httpd

firewall-cmd --add-service=http

firewall-cmd --permanent --add-service=http


Start the Apache service and add it to the system layer so it starts automatically with the system:

systemctl start httpd.service

systemctl enable httpd.service

Install the tools for making a local source:

yum install yum-utils createrepo

Download the installation resources

Download the installation resources for Ambari 2.2.2 and HDP 2.4.2. This installation is on CentOS 7, so only the CentOS 7 resources are listed; for other systems, please use the corresponding resources.

Ambari 2.2.2 download resources




CentOS 7: Base URL, Repo File, Tarball (md5, asc)


HDP 2.4.2 download resources

HDP (CentOS 7): Base URL, Repo File, Tarball (md5, asc)

HDP-UTILS (CentOS 7): Base URL, Repo File

Download the packages in the list above. The tarballs to download are:

Ambari 2.2.2

HDP 2.4.2

HDP-UTILS


In the httpd site root directory (by default /var/www/html/), create an ambari directory, and extract the downloaded packages into the /var/www/html/ambari directory:

cd /var/www/html/

mkdir ambari

cd /var/www/html/ambari/

tar -zxvf ambari-

tar -zxvf HDP-

tar -zxvf HDP-UTILS-

Verify that the httpd website is available; you can use the links command, or open the following address directly in a browser:

Configure the local sources for ambari, HDP, and HDP-UTILS

First download the corresponding repo files from the resource list above, then modify their URLs to the local addresses. The configuration is as follows:





vi ambari.repo

vi hdp.repo
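As an illustrative sketch, a locally pointed ambari.repo might look like the following; the server name master, the section name, and the path under /ambari are assumptions based on where the tarballs were extracted above, so adjust the baseurl to match your actual layout:

```
# /etc/yum.repos.d/ambari.repo (illustrative; adjust baseurl to your server)
[ambari-2.2.2.0]
name=Ambari 2.2.2.0 (local)
baseurl=http://master/ambari/AMBARI-2.2.2.0/centos7/
gpgcheck=0
enabled=1
```

The hdp.repo file follows the same pattern, with one section per repository (HDP and HDP-UTILS).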


Place the modified sources above into /etc/yum.repos.d/.

Then execute the following commands:

yum clean all

yum list updates

yum makecache

yum repolist

Install the MySQL database

Ambari writes installation and other information to a database. A self-installed MySQL database is recommended; you can also use the default PostgreSQL database, which requires no separate installation.

For the MySQL installation process, please refer to the following article:

After installation, create the ambari database and user. Log in as root and execute the following statements:

CREATE DATABASE ambari CHARACTER SET utf8;

CREATE USER 'ambari'@'%' IDENTIFIED BY 'ambari';

GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';


Install the MySQL JDBC driver:

yum install mysql-connector-java

Install the JDK

Install the tarball version of the JDK. First download jdk-8u91-linux-x64.tar.gz from the official website, then execute:

tar -zxvf jdk-8u91-linux-x64.tar.gz -C /opt/java/

vi /etc/profile

export JAVA_HOME=/opt/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH

source /etc/profile

Install Ambari

Install Ambari 2.2.2:

yum install ambari-server

Configure Ambari

ambari-server setup

Make selections according to the prompts.

Notes on the selections: the ambari-server service account password is ambari.

The JDK path is the custom path /var/opt/jdk1.8.

The database configuration is the self-installed MySQL.

The database account and password are both ambari.

Import the Ambari script

Import the Ambari database script into the database.

If you use your own database, you must import Ambari's SQL script before starting the Ambari service.

Log in to MySQL as the ambari user (the user created above):

mysql -u ambari -p

use ambari;
source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql;

Start Ambari

Execute the start command to start the Ambari service:

ambari-server start

After a successful start, enter the Ambari address in a browser (by default, port 8080 on the Ambari server host):

The login interface appears. Log in with the default administrator account: username admin, password admin.

After a successful login the main interface appears; at this point, the Ambari installation has succeeded.

Ambari cluster configuration

Rack: each host is assigned to /default-rack.

Configuration notes:

The logs of the Hadoop ecosystem components are placed in each component's root directory by default, making them easy to find.

Ambari lets you build a Hadoop cluster quickly; after installation, each component is installed under the /usr/hdp/ directory by default.

The Ambari-server log directory is under /var/log.

The Ambari-agent log directory is under /var/log.

The file replication factor is generally 3.

The default HDFS block size is 128 MB; ideally, each file should be no smaller than 128 MB.

YARN's default memory configuration is 8 GB, i.e., each node needs at least 8 GB of memory and 8 CPU cores.

Reference:

When a Hadoop cluster runs, it pulls all the data-distribution records into memory. This means that as the cluster's data grows larger, and in big-data environments several TB or even PB of data are common, the number of data-distribution records grows with it, so more memory is needed. Here is a rule of thumb:

In general, 1 GB of memory can manage about one million block files.

Example: with a block size of 128 MB, a replication factor of 3, a 200-node cluster, and 4 TB of data per node, the NameNode memory needed is: 200 (servers) × 4,194,304 MB (4 TB of data) / (128 MB × 3) = 2,184,533.33 files ≈ 2.18 million files, so the memory needed is close to 2.2 GB.
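The example above can be reproduced with a short shell calculation; the cluster figures are the hypothetical ones from the text, not measurements:

```shell
# NameNode memory estimate: blocks = servers * data_per_server / (block_size * replication)
servers=200        # number of servers (hypothetical)
data_mb=4194304    # 4 TB of data per server, in MB
block_mb=128       # HDFS block size
replication=3      # replication factor
blocks=$(( servers * data_mb / (block_mb * replication) ))
echo "block files: ${blocks}"
# rule of thumb: ~1 GB of NameNode heap per million block files
awk -v b="$blocks" 'BEGIN { printf "approx. NameNode heap: %.1f GB\n", b / 1000000 }'
```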

Also, because a separate machine is used for backup, the SecondaryNameNode needs about the same amount of memory as the NameNode. Next comes the amount of memory required by each slave-node server.

First calculate the number of virtual cores (Vcores) of the current CPUs:

Number of virtual cores (Vcores) = number of CPUs × cores per CPU × HT (number of hyper-threads per core)

Then configure the memory capacity according to the number of virtual cores:

Memory capacity = number of virtual cores (Vcores) × 2 GB (at least 2 GB)
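As a sketch, applying the two formulas above to a hypothetical slave node (the hardware counts below are assumptions for illustration):

```shell
# Vcores = number of CPUs * cores per CPU * hyper-threads per core
cpus=2             # hypothetical: 2 CPU sockets
cores_per_cpu=8    # hypothetical: 8 cores per CPU
ht=2               # hypothetical: 2 hyper-threads per core
vcores=$(( cpus * cores_per_cpu * ht ))
# memory capacity = Vcores * 2 GB (at least 2 GB per vcore)
mem_gb=$(( vcores * 2 ))
echo "Vcores: ${vcores}"          # 32
echo "memory: ${mem_gb} GB"       # 64 GB
```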

As for the choice of CPU: because Hadoop performs distributed computing, its running model is basically intensive parallel computing, so CPUs with multiple sockets and multiple cores are recommended; if conditions permit, this applies to every node.

Also, in a large distributed cluster, note that distributed computing involves frequent communication and IO operations, which places demands on network bandwidth. It is therefore recommended to use network cards of gigabit capacity or above, and 10-gigabit cards if conditions permit; the same applies to switches.

Note: because some ZooKeeper components need to elect a leader and followers, the minimum number of nodes required is 3 or more, and the count should be odd. Otherwise, if a node goes down, the cluster cannot elect a leader, and the whole ZooKeeper ensemble cannot run.

Example:

As long as more than half of the machines in the cluster are working properly, the whole cluster is available for external use. That is, if there are 2 ZooKeeper nodes, then as soon as 1 dies, ZooKeeper cannot be used, because 1 is not more than half; so the death tolerance of 2 ZooKeeper nodes is 0. Likewise, if there are 3 ZooKeeper nodes and one dies, 2 remain, which is more than half, so the tolerance of 3 nodes is 1. In the same way you can list more: 2→0; 3→1; 4→1; 5→2; 6→2. You will find a pattern: 2n and 2n−1 nodes have the same tolerance, namely n−1, so for efficiency, why add the unnecessary extra ZooKeeper node?
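The failure-tolerance pattern above can be checked with a tiny shell function: an ensemble of n nodes needs a strict majority alive, so it tolerates (n − 1) / 2 failures (integer division), which is why an even count buys nothing over the next-lower odd count:

```shell
# ZooKeeper needs a strict majority alive, so n nodes tolerate (n-1)/2 failures
tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 2 3 4 5 6 7; do
    echo "nodes: $n -> tolerated failures: $(tolerance "$n")"
done
```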