<> Application scenarios
1. Stream processing
Storm can process new data and update databases in real time, with fault tolerance and scalability. In other words, Storm can process a continuous stream of messages and write the results to a store.
2. Continuous computation
Storm can run continuous queries and stream the results to clients as they are produced, for example sending trending topics on Twitter to browsers.
3. Distributed RPC
Storm can be used to parallelize intensive queries. A Storm topology is a distributed function that waits for invocation messages; when it receives one, it computes the query and returns the results. For example, distributed RPC can run parallel searches or operate on large data sets.
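The fan-out/fan-in pattern behind distributed RPC can be sketched without Storm itself. The toy Python sketch below is not the Storm DRPC API; the corpus, the function names, and the thread-pool workers are invented for illustration. It splits a search across parallel workers and merges their partial results, the way a DRPC topology parallelizes an intensive query:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy corpus; in a real DRPC topology the data would live on the cluster.
DOCUMENTS = [
    "storm processes streams in real time",
    "hadoop runs batch mapreduce jobs",
    "storm topologies run until killed",
]

def search_shard(args):
    """One parallel task searches only its shard of the data."""
    term, shard = args
    return [doc for doc in shard if term in doc]

def distributed_search(term, n_workers=2):
    # Fan out: split the data set across parallel workers.
    shards = [DOCUMENTS[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(search_shard, [(term, s) for s in shards])
    # Fan in: merge the partial results and return them to the caller.
    return [doc for part in partials for doc in part]
```

In real Storm DRPC, the caller invokes the distributed function by name, and the fan-out and result merging are performed by the topology's bolts across the cluster rather than by local threads.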
<> Procedure
<>1. Storm overview
Storm is a real-time, distributed, reliable stream-processing system. It works by delegating simple tasks, each handled independently, to various components. In a Storm cluster, the Spout component processes the input stream, and the Spout passes the data it reads to the Bolt components. A Bolt processes the tuples it receives and may in turn pass them on to the next Bolt. You can picture the cluster as a chain of Bolt components: data travels along the chain, and each Bolt processes the data as a node in that chain.
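The Spout-to-Bolt chain described above can be simulated in a few lines of plain Python. This is a toy sketch, not the Storm API; the filtering and counting components are invented for illustration:

```python
def spout():
    """Emits the raw input stream, one tuple at a time."""
    for line in ["error: disk full", "ok", "error: timeout"]:
        yield line

def filter_bolt(stream):
    """First bolt in the chain: keeps only error messages."""
    for tup in stream:
        if tup.startswith("error"):
            yield tup

def count_bolt(stream, store):
    """Last bolt in the chain: updates a 'database' (here a dict) in real time."""
    for tup in stream:
        store[tup] = store.get(tup, 0) + 1

store = {}
count_bolt(filter_bolt(spout()), store)
print(store)  # per-message counts of the error tuples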
Storm Ensure that every message is processed , And it's very fast , In a small cluster , Millions of messages can be processed per second .Storm
The processing speed is amazing ： Tested , Each node can process every second 100 10000 data tuples . Its main application areas are real-time analysis , Online machine learning , Continuous calculation , Distributed
RPC（ Far procedure call protocol , A service request from a remote computer program over a network , Without understanding the protocol of underlying network technology .）,ETL（ Data extraction , Converting and loading ） etc. .
Storm and Hadoop Clusters look similar on the surface , however Hadoop It's running MapReduce Jobs, And in Storm Topology is running on
Topology, It's very different between the two , The key difference is ：MapReduce It will end , And one Topology Will always run （ Unless you do it by hand kill
fall ）, let me put it another way ,Storm Real time data analysis oriented , and Hadoop For offline data analysis ,Storm stay HDP The location in is shown in the following figure .
<>2. Storm Cluster architecture
Storm Cluster of consists of one master node and multiple work nodes . The primary node runs a system named “Nimbus” Daemons for , Each work node runs a task named “Supervisor” Daemons for , The coordination between the two is carried out by ZooKeeper
To complete ,ZooKeeper For managing different components in a cluster ,Storm The cluster architecture is shown in the figure below .
<>2.1 Master node Nimbus
The primary node usually runs a background program ——Nimbus, Used to respond to nodes distributed in the cluster , Assign tasks and monitor faults , At a node
Supervisor After a breakdown , If the Worker Process aborted ,Nimbus Will terminate abnormally Worker Process assigned to other
Supervisor Continue running on node , This is similar to Hadoop In JobTracker.
<>2.2 Work node Supervisor
Each working node is running on a platform called Supervisor process .Supervisor Monitor from Nimbus Tasks assigned to it , It can also ensure normal operation
Worker You can restart the Worker.Nimbus and Supervisor The coordination between them is through ZooKeeper system .
<>2.3 Coordination service component Zookeeper
ZooKeeper It's done Nimbus and Supervisor
Coordinated services between . The real-time logic of the application is encapsulated in Storm In “topology”.Topology A set of Spout( data source ) and
Bolts( data processing ) adopt Stream Groupings Connecting diagram .
<>2.4 Working process Worker
Worker It's a Java process , Perform part of the topology . One Worker The process executes a Topology Subset of , It will start one or more Executor
Thread to execute a Topology Components of （Spout or Bolt）, As shown below .
<>3. Storm Use of
One dan storm Task on , So it's running all the time , Unless terminated manually .