Ambari

Introduction

Apache Ambari is a web-based tool that supports the provisioning, management, and monitoring of Apache Hadoop clusters. Ambari supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog.

Apache Ambari provides centralized management of HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog. It is also one of the five top-level Hadoop management tools.

Recommendation

You need some knowledge of the Hadoop ecosystem components, environment configuration, and parameters; only then is building with Ambari recommended.

Comparison (with CDH)

1. Hortonworks Hadoop differs from other Hadoop distributions (such as Cloudera's) in that Hortonworks' products are 100% open source.
2. Cloudera has free and enterprise editions; the enterprise edition only offers a trial period.
3. Apache Hadoop is the native Hadoop.
4. The distributions currently popular in China are Apache Hadoop and Cloudera CDH; of course, Hortonworks is also used.
5. Apache Ambari is a web-based tool for configuring, managing, and monitoring Apache Hadoop clusters. It supports Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop. Ambari also provides a cluster health dashboard with features such as heatmaps, the ability to view MapReduce, Pig, and Hive applications, and a friendly user interface for diagnosing their performance characteristics.
 

Preparation

Before installing, prepare CentOS 7.2, jdk-8u91, and MySQL 5.7.13.

Master node: master (172.26.99.126)

Slave nodes: slave1 (172.26.99.127), slave2 (172.26.99.128), slave3 (172.26.99.129)

Note: ensure that all nodes are time-synchronized, can communicate with each other, and can access the external network.

Configure passwordless SSH login

On the master node, log in as the root user and execute the following:

 

ssh-keygen

cd ~/.ssh/

cat id_rsa.pub >> authorized_keys

 

Log in to each slave node as root and execute:

mkdir ~/.ssh/

 

Distribute the configured authorized_keys to each slave node:

scp /root/.ssh/authorized_keys root@172.26.99.127:~/.ssh/authorized_keys

scp /root/.ssh/authorized_keys root@172.26.99.128:~/.ssh/authorized_keys

scp /root/.ssh/authorized_keys root@172.26.99.129:~/.ssh/authorized_keys
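To confirm that passwordless login works, you can run a quick check from the master; each command should print the slave's hostname without prompting for a password:

ssh root@172.26.99.127 hostname

ssh root@172.26.99.128 hostname

ssh root@172.26.99.129 hostname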

Create the ambari system user and group

This is done on the master node only.

Add a user and group for installing and running Ambari. Alternatively, you can skip creating a new user and use root or another system account directly.

adduser ambari

passwd ambari

Enable the NTP service

This must be done on all nodes in the cluster.

CentOS 7 commands:

yum install ntp

systemctl is-enabled ntpd

systemctl enable ntpd

systemctl start ntpd
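As an optional check that time synchronization is working, ntpq lists the NTP peers; once synchronization settles, an asterisk marks the currently selected time source:

ntpq -p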

Check DNS and NSCD

Set this on all nodes.

Full domain names need to be configured during the Ambari installation, so we need to check DNS:

vi /etc/hosts

172.26.99.126 master.chinadci.com master

172.26.99.127 slave1.chinadci.com slave1

172.26.99.128 slave2.chinadci.com slave2

172.26.99.129 slave3.chinadci.com slave3

Configure the FQDN on each node; take the master node as an example (note the FQDN naming convention: hostname + domain name):

vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=master.chinadci.com
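To verify the FQDN on each node, hostname -f should report the full name, for example master.chinadci.com on the master:

hostname -f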

Turn off the firewall

Set this on all nodes.

systemctl disable firewalld

systemctl stop firewalld

Disable SELinux

Set this on all nodes.

Check the SELinux status:

sestatus

If the SELinux status parameter is enabled, SELinux is on:

SELinux status: enabled

Modify the configuration file (a machine restart is required for the change to take effect):

vi /etc/sysconfig/selinux

SELINUX=disabled
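The file change above takes effect after a reboot; to also switch SELinux to permissive mode for the current session without rebooting, the following is commonly used:

setenforce 0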

Create a local repository

The local repository only needs to be created on the master node.

Related preparations

Install the Apache HTTP server

Install the HTTP server and allow the http service through the firewall permanently (if firewalld was disabled in the earlier step, the firewall-cmd commands can be skipped):

 

yum install httpd

firewall-cmd --add-service=http

firewall-cmd --permanent --add-service=http

 

Add the Apache service to the system layer so that it starts automatically with the system:

systemctl start httpd.service

systemctl enable httpd.service

Install the local repository tools:

yum install yum-utils createrepo

Download installation resources

Download the installation resources for Ambari 2.2.2 and HDP 2.4.2. This installation is on CentOS 7, so only the CentOS 7 resources are listed; for other systems, please use the corresponding resources.

Ambari 2.2.2 download resources:


| OS | Format | URL |
| --- | --- | --- |
| CentOS 7 | Base URL | http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0 |
| CentOS 7 | Repo File | http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari.repo |
| CentOS 7 | Tarball md5 asc | http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz |

HDP 2.4.2 download resources:


| OS | Repository Name | Format | URL |
| --- | --- | --- | --- |
| CentOS 7 | HDP | Base URL | http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0 |
| CentOS 7 | HDP | Repo File | http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/hdp.repo |
| CentOS 7 | HDP | Tarball md5 asc | http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/HDP-2.4.2.0-centos7-rpm.tar.gz |
| CentOS 7 | HDP-UTILS | Base URL | http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7 |
| CentOS 7 | HDP-UTILS | Tarball md5 asc | http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7/HDP-UTILS-1.1.0.20-centos7.tar.gz |

Download the packages in the lists above. The compressed packages to be downloaded are as follows:

Ambari 2.2.2

http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz

HDP 2.4.2

http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/HDP-2.4.2.0-centos7-rpm.tar.gz

HDP-UTILS 1.1.0

http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7/HDP-UTILS-1.1.0.20-centos7.tar.gz
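One way to fetch the three packages, assuming the master node has outbound internet access, is with wget:

wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari-2.2.2.0-centos7.tar.gz

wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/HDP-2.4.2.0-centos7-rpm.tar.gz

wget http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos7/HDP-UTILS-1.1.0.20-centos7.tar.gz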

In the httpd site root directory (the default is /var/www/html/), create a directory named ambari and extract the downloaded packages into the /var/www/html/ambari directory:

cd /var/www/html/

mkdir ambari

cd /var/www/html/ambari/

tar -zxvf ambari-2.2.2.0-centos7.tar.gz

tar -zxvf HDP-2.4.2.0-centos7-rpm.tar.gz

tar -zxvf HDP-UTILS-1.1.0.20-centos7.tar.gz

To verify that the httpd site is available, use the links command or point a browser directly at the following address:

http://172.26.99.126/ambari/
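If no graphical browser is available, curl serves as a quick check as well; a successful response returns the directory listing as HTML:

curl http://172.26.99.126/ambari/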

The result is a directory listing of the extracted repositories.

Configure the ambari, HDP, and HDP-UTILS local repositories

First download the repo files listed in the resource tables above, then modify the URLs in them to point to the local addresses. The configuration is as follows:

wget http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.2.2.0/ambari.repo

wget http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.4.2.0/hdp.repo

 

vi ambari.repo
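The edited ambari.repo might look like the sketch below; the section name and the build-number directory are illustrative, since the exact path depends on what the tarball extracted under /var/www/html/ambari (check it with ls):

[Updates-ambari-2.2.2.0]
name=ambari-2.2.2.0 - Updates
# Point baseurl at the local HTTP server; the path below is illustrative --
# use the directory the tarball actually extracted to.
baseurl=http://172.26.99.126/ambari/AMBARI-2.2.2.0/centos7/2.2.2.0-460
gpgcheck=0
enabled=1
priority=1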




vi hdp.repo
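Similarly, a sketch of hdp.repo after editing, again with illustrative local paths that should match the extracted directories:

[HDP-2.4.2.0]
name=HDP Version - HDP-2.4.2.0
baseurl=http://172.26.99.126/ambari/HDP/centos7/2.x/updates/2.4.2.0
gpgcheck=0
enabled=1
priority=1

[HDP-UTILS-1.1.0.20]
name=HDP-UTILS Version - HDP-UTILS-1.1.0.20
baseurl=http://172.26.99.126/ambari/HDP-UTILS-1.1.0.20/repos/centos7
gpgcheck=0
enabled=1
priority=1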



 

Place the modified repo files above into the /etc/yum.repos.d/ directory.
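For example, assuming the two repo files were edited in the current directory:

cp ambari.repo hdp.repo /etc/yum.repos.d/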

Then execute the following commands:

yum clean all

yum list updates

yum makecache

yum repolist

Install the MySQL database

The Ambari installation writes setup and other information to a database. Installing your own MySQL database is recommended, but you can also skip this and use the default PostgreSQL database.

For the MySQL installation process, refer to the following article:

http://blog.csdn.net/u011192458/article/details/77394703

After the installation, create the ambari database and user. Log in as the root user and execute the following statements:

create database ambari character set utf8 ;

CREATE USER 'ambari'@'%' IDENTIFIED BY 'ambari';

GRANT ALL PRIVILEGES ON *.* TO 'ambari'@'%';

FLUSH PRIVILEGES;

Install the MySQL JDBC driver

yum install mysql-connector-java

Install the JDK

Install the tarball version of the JDK. First download jdk-8u91-linux-x64.tar.gz from the official website, then execute the following commands:

tar -zxvf jdk-8u91-linux-x64.tar.gz -C /opt/java/

vi /etc/profile

export JAVA_HOME=/opt/java/jdk1.8.0_91

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

(Note: JAVA_HOME must point to the directory the JDK actually extracted to; for jdk-8u91 unpacked into /opt/java/ this is typically /opt/java/jdk1.8.0_91.)

source /etc/profile
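A quick confirmation that the JDK is picked up from the new PATH:

java -version   # should report 1.8.0_91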

Install Ambari

Install Ambari 2.2.2

Install Ambari:

yum install ambari-server

Configure Ambari

ambari-server setup

Follow the prompts and choose options as appropriate.

Points to note when making selections:

The ambari-server service account password is ambari.

The JDK path is the custom path where the JDK was installed earlier (/opt/java/jdk1.8.0_91 in this walkthrough).

The database configuration is the custom-installed MySQL.

The database account and password are both ambari.




Import the Ambari script

Import the Ambari database script into the database.

If you use a custom database, you must import Ambari's SQL script before starting the Ambari service.

Log in to MySQL as the ambari user (the user created above):

mysql -u ambari -p

use ambari;

source /var/lib/ambari-server/resources/Ambari-DDL-MySQL-CREATE.sql
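As an optional sanity check that the schema was imported, the ambari database should now contain tables:

mysql -u ambari -p -e 'use ambari; show tables;'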

Start Ambari

Execute the start command to start the Ambari service:

ambari-server start
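You can check whether the server came up with:

ambari-server status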

After a successful start, enter the Ambari address in a browser:

http://172.26.99.126:8080/

The login interface appears. Log in with the default administrator account: username admin, password admin.






After a successful login, the main interface appears, which means the Ambari installation succeeded.

Ambari Cluster configuration

Rack assignments:

slave1.chinadci.com 172.26.99.127 /default-rack

slave2.chinadci.com 172.26.99.128 /default-rack

slave3.chinadci.com 172.26.99.129 /default-rack

Configuration notes:

The logs of the Hadoop ecosystem components are placed in each component's root directory by default, making them easy to find.

Ambari makes it possible to build a Hadoop cluster quickly; after installation, each component is installed under the /usr/hdp/ directory by default.

The Ambari-server logs are written under the /var/log directory.

The Ambari-agent logs are also written under the /var/log directory.

Files are generally replicated 3 times.

The default HDFS block size is 128 MB; each file should preferably be no smaller than 128 MB.

YARN's default memory configuration is 8 GB, i.e., each node needs a minimum of 8 GB of memory and 8 CPU cores.

Reference: https://www.cnblogs.com/zhijianliutang/p/5731002.html

A running Hadoop cluster pulls all data-distribution records into memory. This means that as the cluster's data grows larger (in big-data environments, several TB or even PB of data is common), the data-distribution records also grow, so more memory is needed. Here is a reference:

Generally, 1 GB of memory can manage about one million block files.

For example: with a block size of 128 MB, a replication factor of 3, a 200-node cluster, and 4 TB of data per node, the NameNode memory needed is: 200 (number of servers) x 4,194,304 MB (4 TB of data) / (128 MB x 3) = 2,184,533.33 files ≈ 2.18 million files, so the memory needed is close to 2.2 GB.
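A quick sanity check of this arithmetic with bc, using the numbers from the example above:

echo "scale=2; 200 * 4194304 / (128 * 3)" | bc            # ~2184533.33 block files
echo "scale=2; 200 * 4194304 / (128 * 3) / 1000000" | bc  # ~2.18 GB at 1 GB per million blocks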

Likewise, because one machine is used for backup, the SecondaryNameNode needs about as much memory as the NameNode. Next comes the amount of memory required by each slave-node server.

First calculate the CPU's number of virtual cores (Vcores):

Number of virtual cores (Vcores) = number of CPUs x cores per CPU x HT (number of hyper-threads per core)

Then configure the memory capacity according to the number of virtual cores:

Memory capacity = number of virtual cores (Vcores) x 2 GB (at least 2 GB)
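As a rough illustration of these two formulas on a Linux node, nproc reports the number of online logical CPUs, which corresponds to the Vcore count:

vcores=$(nproc)
echo "Vcores: $vcores, suggested memory: $((vcores * 2)) GB"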

Regarding the choice of CPU: because Hadoop does distributed computing, its running model is basically intensive parallel computation, so multi-socket, multi-core CPUs are recommended; if conditions permit, this applies to every node.

Also note that in a large distributed cluster, distributed computing brings frequent communication and I/O operations, which places demands on network bandwidth. Network cards of gigabit capacity or above are therefore recommended (10-gigabit if conditions allow), and the same goes for switches.

Note: because some components rely on ZooKeeper to elect a leader and followers, at least 3 nodes (an odd number) are required; otherwise, if one node goes down, the cluster cannot elect a leader and the whole ZooKeeper ensemble stops working.

For example: as long as more than half of the machines in the ensemble are working properly, the whole ensemble is available for external use. That is to say, with 2 ZooKeeper nodes, as soon as 1 dies, ZooKeeper cannot be used, because 1 is not more than half; so the failure tolerance of 2 ZooKeeper nodes is 0. Similarly, with 3 ZooKeeper nodes, if one dies, 2 remain, which is more than half, so the tolerance of 3 nodes is 1. Listing a few more in the same way: 2->0, 3->1, 4->1, 5->2, 6->2. A pattern emerges: 2n and 2n-1 nodes have the same tolerance, namely n-1, so for efficiency's sake there is no reason to add the unnecessary even-numbered ZooKeeper node.