One, Getting started
1, Linux operating basics

* Introduction to Linux and Linux installation: installing the VMware Workstation virtualization software, installing a CentOS virtual machine
* Common Linux commands: introduction to common commands, using and practicing them (file operations, user management and permissions, passwordless login configuration, and network management)
* Basic principles of Linux process management and the use of related tools such as ps, pkill, top, and htop
* The Linux boot process, runlevels in detail, chkconfig in detail
* The vi and Vim editors: introduction, usage, and common shortcut keys
* Linux disk management, LVM logical volumes, NFS in detail
* Linux file permission management: introduction to file permissions, working with file permissions
* RPM package management on Linux: introduction to RPM packages; installing, uninstalling, and other operations
* The yum command, setting up a yum repository
* Linux networking: introduction to Linux networking, network configuration and maintenance, firewall configuration
* Shell programming: introduction to the shell, writing shell scripts
* Installing common software on Linux: installing the JDK, Tomcat, and MySQL; deploying a web project
* Advanced Linux text processing commands: cut, sed, awk
* Scheduled tasks with crontab

2, High-concurrency processing for large websites

Layer 4 load balancing

a) LVS load balancing i. Scheduling algorithms, NAT mode, direct routing mode (DR), tunnel mode (TUN)
b) Introduction to the F5 load balancer

Layer 7 load balancing
a) Nginx b) Apache

Tuning Tomcat and the JVM to increase concurrency

Cache optimization
a) Java caching frameworks i. OSCache, Ehcache
b) Cache databases i. Redis, Memcached

Building two-layer load balancing with LVS + Nginx + Tomcat + Redis/Memcached for ten-million-level concurrent processing

FastDFS: standalone storage and management of small files

Redis caching system a) Basic use of Redis b) Redis Sentinel high availability c) Friend recommendation algorithm with Redis

3, Lucene basics

Introduction to Lucene

The inverted index principle in Lucene

Building an index with IndexWriter

Searching with IndexSearcher

Sorting and filtering (filter)

Index optimization and highlighting
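
To make the inverted index topic above concrete, here is a minimal sketch in plain Python. It is not Lucene's implementation or API: the function names `build_inverted_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions chosen for illustration. The idea it shows is only that an inverted index maps each term to the documents containing it, so a query becomes a set intersection.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term in the query (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene builds an inverted index",
    2: "Solr is built on Lucene",
    3: "an index maps terms to documents",
}
index = build_inverted_index(docs)
```

Calling `search(index, "lucene index")` intersects the posting lists for the two terms. A real engine would additionally store positions and term frequencies for scoring and highlighting, which this sketch leaves out.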

4, Solr basics

* What is Solr
* Why use Solr in a project
* How Solr works
* How to run Solr in Tomcat
* How to index and search with Solr
* Solr query types
* Solr filters
* Solr sorting
* Solr highlighting
* Solr field statistics
* Solr range statistics
* Building a SolrCloud cluster

5, Distributed coordination service ZooKeeper

Introduction to ZooKeeper and its application scenarios
ZooKeeper cluster installation and deployment
ZooKeeper data nodes and command-line operations
Basic operations and event watching with the ZooKeeper Java client
ZooKeeper core mechanisms and data nodes
ZooKeeper use case: distributed shared-resource lock
ZooKeeper use case: dynamically detecting servers coming online and going offline
ZooKeeper data consistency principles and the leader election mechanism

6, Advanced Java features

Java multithreading basics
The Java synchronized keyword explained
Thread pools in the Java concurrency package and their use in open source software
Java: applications in open source software
Java JMS technology
Java dynamic proxies and reflection

Two, Offline computing system
1, Hadoop quick start
Background of Hadoop
Overview of distributed systems
Introduction to the offline data analysis workflow
Cluster setup
Getting started with the cluster

2, Advanced HDFS
HDFS concepts and characteristics
HDFS shell (command-line client) operations
How HDFS works
How the NameNode works
Java API operations
Case 1: developing a shell collection script

3, MapReduce in detail
Customizing Hadoop's RPC framework
MapReduce programming conventions and writing a sample program
MapReduce program run modes and debugging methods
Internal mechanics of MapReduce program run modes
The main workflow of the MapReduce computing framework
Serialization of custom objects
MapReduce programming cases

4, Advanced MapReduce
MapReduce sorting
Custom partitioner
The MapReduce combiner
The MapReduce working mechanism in detail

5, MapReduce in practice
MapTask parallelism mechanism: file splitting
Setting MapTask parallelism
Inverted index
Common friends
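
The "common friends" exercise above is usually solved as two chained map/reduce passes. The following is a minimal single-process sketch of that idea in plain Python, not Hadoop code: the function name `common_friends` and the sample friend lists are assumptions for illustration, and dictionaries stand in for the shuffle step that groups values by key.

```python
from collections import defaultdict
from itertools import combinations


def common_friends(friend_lists):
    """Return {(person_a, person_b): [friends both of them list]}."""
    # Pass 1: invert "person -> friends" into "friend -> people who list that friend".
    # In MapReduce terms: map emits (friend, person); reduce groups by friend.
    owners = defaultdict(list)
    for person, friends in friend_lists.items():
        for friend in friends:
            owners[friend].append(person)

    # Pass 2: every pair of people grouped under the same friend shares that friend.
    # In MapReduce terms: map emits ((a, b), friend); reduce groups by pair.
    pairs = defaultdict(list)
    for friend, people in owners.items():
        for a, b in combinations(sorted(people), 2):
            pairs[(a, b)].append(friend)

    return {pair: sorted(friends) for pair, friends in pairs.items()}


# Hypothetical input: each person mapped to the friends they list.
friend_lists = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["A", "C"],
}
result = common_friends(friend_lists)
```

Sorting each group before pairing keeps every key in a canonical order, so `("A", "C")` and `("C", "A")` never appear as separate keys. In a real job the same trick makes both records reach the same reducer.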

6, Introduction to federation and using Hive
Hadoop's HA mechanism
HA cluster installation and deployment
Cluster operations test: bringing DataNodes online and offline dynamically
Cluster operations test: managing NameNode state switching
Cluster operations test: data block balancing
HDFS API changes under HA
Introduction to Hive
Hive architecture
Hive installation and deployment
Getting started with Hive

7, Advanced Hive and introduction to Flume
HQL DDL basic syntax
HQL DML basic syntax
Hive joins
Hive parameter configuration
Hive custom functions and TRANSFORM
Case analysis of HQL execution in Hive
Hive best practices
Hive optimization strategies
Hive practical cases
Introduction to Flume
Flume installation and deployment
Case: collecting a directory into HDFS
Case: collecting files into HDFS

Three, Data migration tool Sqoop

* Introducing and configuring Sqoop
* Using the Sqoop shell
* Sqoop import a) DBMS to HDFS b) DBMS to Hive c) DBMS to HBase
* Sqoop export

Four, Flume distributed logging framework

* Introduction to Flume: basic knowledge
* Flume installation and testing
* Flume deployment modes
* Flume source configuration and testing
* Flume sink configuration and testing
* Flume selector configuration and case analysis
* Flume sink processor configuration and case analysis
* Flume interceptor configuration and case analysis
* Flume Avro client development
* Integrating Flume with Kafka

Five, In-memory database Redis
* Redis features and comparison with other databases
* How to install Redis
* How to use the command-line client
* The Redis string type
* The Redis hash type
* The Redis list type
* The Redis set type
* How to access Redis from Java [a. accessing Redis from Python, accessing Redis from Scala]
* Redis transactions
* Redis pipelining (pipeline)
* Redis persistence (AOF + RDB)
* Redis optimization
* Redis master-slave replication
* Redis Sentinel high availability
* twemproxy and Codis in practice
* Redis 3.x cluster installation and configuration

Six, Storm upstream/downstream and architecture integration

What is Kafka

Kafka architecture

Kafka configuration in detail

Installing Kafka

Kafka storage strategy

Kafka partitioning

Kafka publish and subscribe

ZooKeeper coordination management

Working with Kafka from Java

Working with Kafka from Scala

Integrating Flume with Kafka

Integrating Kafka with Storm

Seven, Storm from beginner to mastery

Basic concepts of Storm

Storm application scenarios

Comparison of Storm and Hadoop

Linux environment preparation for Storm cluster installation

ZooKeeper cluster setup

Storm cluster setup

Storm configuration file options explained

Solutions to common problems in cluster setup

Common Storm components and programming API: Topology, Spout, Bolt

Storm grouping strategies (stream groupings)

Developing a WordCount example with Storm

Debugging Storm programs in local mode, remote debugging of Storm programs

Storm transaction processing

Storm message reliability and fault tolerance

Storm with the Kafka message queue: basic message queue concepts (Producer, Consumer, Topic, Broker, etc.), Kafka usage scenarios, programming API for Storm with Kafka

Storm Trident concepts

How Trident state works

Trident development example

Introduction to Storm DRPC (distributed remote procedure call)

Storm DRPC in practice

Integrating Storm with Hadoop 2.x: Storm on YARN

Eight, Scala programming

* The Scala interpreter, variables, common data types, etc.
* Scala conditional expressions, input and output, and control structures such as loops
* Scala functions, default parameters, variable-length parameters, etc.
* Scala arrays, variable-length arrays, multidimensional arrays, etc.
* Scala maps, tuples, and related operations
* Scala classes, including bean properties, auxiliary constructors, primary constructors, etc.
* Scala objects, singleton objects, companion objects, extending classes, the apply method, etc.
* Scala packages, imports, inheritance, and related concepts
* Scala traits
* Scala operators
* Scala higher-order functions
* Scala collections
* Scala database connections

Nine, In-memory computing system Spark

* Introduction to Spark
* Spark application scenarios
* Comparison of Spark with Hadoop MR and Storm, and its advantages
* Transformation
* Action
* Computing PageRank with Spark
* Lineage
* Introduction to the Spark model
* Spark caching strategy and fault tolerance
* Wide and narrow dependencies
* Spark configuration explained
* Spark cluster setup
* Solutions to common problems in cluster setup
* Spark principles, core components, and common RDDs
* Data locality
* Task scheduling
* DAGScheduler
* TaskScheduler
* Reading the Spark source code
* Performance tuning
* Integrating Spark with Hadoop 2.x: how Spark on YARN works

Ten, Spark Streaming in practice
Introduction to Spark Streaming
Spark Streaming programming
In practice: StageFulWordCount
Combining Flume with Spark Streaming
Combining Kafka with Spark Streaming
Window functions
Introduction to the ELK technology stack
Elasticsearch installation and use
Storm architecture analysis
Storm programming model, Tuple source code, concurrency analysis
Storm WordCount case and analysis of commonly used APIs

Eleven, Machine learning algorithms
1, Python and the NumPy library
Introduction to machine learning
Machine learning and Python
Python language: quick start
Python language: data types explained
Python language: flow control statements
Python language: using functions
Python language: modules and packages
Python language: object-oriented programming
Python machine learning library: NumPy
Essential mathematics for machine learning: probability theory

2, Implementing common algorithms
kNN classification algorithm: principle
kNN classification algorithm: code implementation
kNN classification algorithm: handwritten character recognition case
Linear regression classification algorithm: principle
Linear regression classification algorithm: implementation and demo
Naive Bayes classification algorithm: principle
Naive Bayes classification algorithm: implementation
Naive Bayes classification algorithm: spam identification case
k-means clustering algorithm: principle
k-means clustering algorithm: implementation
k-means clustering algorithm: geographic location clustering application
Decision tree classification algorithm: principle
Decision tree classification algorithm: implementation
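
As an illustration of the kNN entries in the list above, here is a minimal sketch of the classifier in plain Python. It is not the course's implementation: the function name `knn_classify`, the choice of Euclidean distance, the default `k=3`, and the toy data set are all assumptions. It uses only the standard library rather than NumPy so that it runs anywhere with Python 3.8 or later.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k=3):
    """Label a query point by majority vote among its k nearest training samples."""
    # Pair every training sample's Euclidean distance to the query with its label,
    # then sort so the closest samples come first.
    distances = sorted(
        (math.dist(query, sample), label) for sample, label in zip(samples, labels)
    )
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: two points near (1, 1) and two near (0, 0).
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

near_origin = knn_classify((0.1, 0.2), samples, labels, k=3)
near_one = knn_classify((0.9, 0.9), samples, labels, k=3)
```

With this data, the point near the origin gets two "B" neighbours out of three and the point near (1, 1) gets two "A" neighbours. An odd `k` avoids ties in a two-class problem, which is one reason small odd values are a common starting point.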