We've been talking about concurrency for so long that we're all numb to it, so this time let's move on to the next topic: databases (corrections and additions are welcome).
After reading, ask yourself a question: how should I choose a NoSQL database?
Database ranking: https://db-engines.com/en/ranking
There are three main categories (however old the individual databases may be):
1. Traditional databases (SQL):
* Relational databases: SQLite, MySQL, SQL Server, PostgreSQL, Oracle...
2. High-concurrency products (NoSQL):
* Key-value stores: Redis, Memcached...
* Document databases: MongoDB, Couchbase, CouchDB, RavenDB...
* Column databases: Cassandra, HBase, BigTable...
* Search engines: Elasticsearch, Solr, Sphinx...
* Graph databases: Neo4j, ArangoDB, FlockDB, OrientDB, InfiniteGraph, InfoGrid...
PS: ArangoDB is a native multi-model database that flexibly combines documents, graphs, and key-value data
3. New-era products (TSDB):
* Time-series databases: InfluxDB, LogDevice, Graphite, OpenTSDB...
Let's look at an authoritative picture (the red ones are the recommended NoSQL products, the grey ones are traditional SQL):
First, to be clear: having NoSQL doesn't mean throwing traditional SQL away. NoSQL means "not only SQL".
1. Advantages and disadvantages of relational databases
Let's look at the benefits of traditional databases first:
* Transactions keep the data consistent
* Joins and other relational operations are supported
* Mature communities (if you hit a problem, a quick search usually solves it)
Of course, there are also shortcomings:
* Modifying the table structure is painful once the data volume is large. E.g., adding a field: if you also make that field an index, the operation can lock things up badly, and nobody dares run it during working hours
* It hurts even more when the columns aren't fixed. A schema is rarely perfect at design time; it gets refined over time, and even reserving placeholder fields up front is an ugly workaround
* Writing and processing big data is troublesome, e.g.:
  * When the data volume isn't too large, bulk writes are fine.
  * But once the volume itself is quite large, you set up master-slave replication. Reading from the slaves is no problem, but a flood of database connections is more than a single master can swallow, so you have to add another master.
  * That creates a new problem: although the load is now split across two masters, data inconsistency becomes easy to hit (the same row updated to different values on the two masters). At that point you have to shard, splitting databases and tables and distributing the tables across the different masters.
Is that the end of it? No, no, no. Now think about joins between those tables: isn't that a cross-database, cross-server join? It's robbing Peter to pay Paul, which is why all kinds of middleware were born. [SQL Server scales quite well in this respect, ships with column storage, and is cross-platform these days (running it in Docker is recommended) (click here for an article I wrote a few years ago <https://www.cnblogs.com/dunitian/p/6041323.html>)]
* Additions welcome~ (An honest word: for small and medium-sized companies, SQL Server is absolutely a solid choice; it saves a lot of time)
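As an aside on the sharding described above: the usual trick is to route all rows for one key to the same master so they can still be joined locally. Here is a toy Python sketch; the names `shard_for` and `SHARDS` are made up for illustration and don't come from any real middleware.

```python
# Toy sketch of hash-based sharding: route each user's rows to one of
# several "master" databases so related rows stay on the same server.
import hashlib

SHARDS = ["db_master_0", "db_master_1", "db_master_2"]

def shard_for(user_id: str) -> str:
    """Pick a shard deterministically from the user id."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Rows for the same user always land on the same shard,
# so per-user joins never have to cross servers.
assert shard_for("xiaoming") == shard_for("xiaoming")
```

Because the routing is deterministic, any query scoped to one user touches one server; only cross-user joins become the cross-server problem described above.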
Now let's talk about NoSQL (in fact, you can understand NoSQL as an extension and supplement to the original SQL):
* When splitting tables and databases, related tables are generally placed on the same server, which keeps join operations convenient. NoSQL doesn't support joins, so it isn't bound by that constraint, and data is much easier to distribute
* For heavy data processing, reading from traditional SQL doesn't have too many drawbacks; NoSQL is mainly used for cache handling. For bulk data writes, tests often put NoSQL far ahead of traditional SQL, and NoSQL is also extremely convenient to scale out
* NoSQL comes in types for many scenarios (key-value, document, column, graph)
If you still don't know how to choose a NoSQL database, let's go through the characteristics of each type in detail:
* Key-value stores: everyone is familiar with these; they mainly do key-value storage. Representative => Redis (supports persistence and data recovery; we'll talk about it later)
* Document databases: representative => MongoDB (Youku's online comments are built on MongoDB)
  * Generally no transactions (MongoDB 4.0 began supporting ACID transactions)
  * No joins (the value is a mutable JSON-like document, which makes changing the "table structure" easy)
* Column databases: representatives: Cassandra, HBase
  * Good at updating many rows but few columns (adding a new field or doing that kind of batch operation couldn't be more convenient~ reads and writes use the column as the unit)
  * Highly scalable; processing speed (especially writes) doesn't drop as the data grows
* Search engines: representative: Elasticsearch, a classic that needs no introduction (traditional fuzzy search can only use LIKE, which is far too weak, hence this category)
* Graph databases: representatives: Neo4j, FlockDB, ArangoDB (the data model is a graph structure, mainly used for designs with complex relationships, e.g., drawing a visual graph of QQ group relationships, or a Weibo follower relationship diagram)
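To make the key-value model concrete, here's a toy in-process sketch of the "value with expiry" idea that Redis provides. This is not the Redis API, just an illustration; a real deployment would talk to a Redis server through a client library.

```python
# A toy in-process key-value cache with TTL expiry, illustrating the
# model Redis offers (SET with EX, lazy key expiry). Not the Redis API.
import time

class TTLCache:
    def __init__(self):
        self._data = {}  # key -> (value, expire_at or None)

    def set(self, key, value, ttl=None):
        expire_at = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expire_at)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expire_at = item
        if expire_at is not None and time.monotonic() >= expire_at:
            del self._data[key]  # lazy expiry, like Redis's passive expiry
            return default
        return value

cache = TTLCache()
cache.set("hot_item", {"price": 99}, ttl=10)
assert cache.get("hot_item")["price"] == 99
```

The point of the model: reads and writes are O(1) dictionary lookups with no schema, which is why this type shines as a cache in front of a relational database.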
Now back to the remaining topics on concurrency. If you look carefully, you'll find that the underlying implementation is almost the same no matter the language. Take processes: the bottom layer is the OS.fork we covered in part one. As for inter-process (and inter-thread) communication, aren't PIPE, FIFO, Lock, and Semaphore rarely used directly? But Queue is used everywhere, so how do you read the source of its underlying implementation?
Remember the Java CountDownLatch mentioned back when Queue was introduced? And if you don't understand Condition, how would you quickly build for yourself a feature that Python doesn't ship with?
Knowing the what without the why is never advisable. When we get to MQ later we'll need this Queue knowledge again; it's rings within rings~
Since this isn't idle chat with the cute girl at the office, well~ improving your skills is up to you ^_^. That's it for now; at the end of the article I'll post a link to common solutions:
Python, NetCore common solutions (continuously updated)
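On the CountDownLatch question above, here is one way to sketch it yourself in Python with threading.Condition. Java ships this class; Python does not, so this is my own minimal version, not a standard-library API.

```python
# A minimal CountDownLatch built on threading.Condition:
# wait() blocks until count_down() has been called `count` times.
import threading

class CountDownLatch:
    def __init__(self, count: int):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            if self._count > 0:
                self._count -= 1
            if self._count == 0:
                self._cond.notify_all()  # wake every waiting thread

    def wait(self, timeout=None):
        # True once the count reaches 0, False if the timeout expires.
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)

# The main thread blocks until 3 workers have counted down.
latch = CountDownLatch(3)
for _ in range(3):
    threading.Thread(target=latch.count_down).start()
assert latch.wait(timeout=5)
```

`Condition.wait_for` re-checks the predicate on every wakeup, so the latch is safe against spurious wakeups without any extra loop of our own.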
ACID was mentioned in part one; this time let's actually talk about it, and then move on to CAP and data consistency.
Let's continue with the example of Xiaoming transferring money to Xiaozhang:
* A: Atomicity (Atomic)
  * Xiaoming transfers 1000 to Xiaozhang: Xiaoming -= 1000 => Xiaozhang += 1000. This (the transaction) is an indivisible whole; if something goes wrong after Xiaoming's -1000, that 1000 must be given back to Xiaoming
* C: Consistency (Consistent)
  * When Xiaoming transfers 1000 to Xiaozhang, the total of Xiaoming + Xiaozhang must stay unchanged (assuming no other transfer (transaction) interferes)
* I: Isolation (Isolated)
  * While Xiaoming is transferring to Xiaozhang, Xiaopan is also transferring to Xiaozhang; we need to ensure the two don't affect each other (isolation mainly matters under concurrency)
* D: Durability (Durable)
  * The bank should keep a record of Xiaoming's transfer to Xiaozhang, so that even if there's a dispute later, a statement can be produced [persistence after the transaction commits (even if the database crashes, it can recover from the log)]
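The four properties above can be seen in miniature with a sqlite3 transaction. This is a minimal sketch (in-memory database, account names mirroring the example), not production banking code:

```python
# The transfer example as a sqlite3 transaction: either both updates
# commit together, or a rollback gives Xiaoming his 1000 back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("xiaoming", 5000), ("xiaozhang", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    try:
        with conn:  # a transaction: commits on success, rolls back on error
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            # Atomicity: if this second update fails, the first is rolled back
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except sqlite3.Error:
        pass  # the 'with conn' block has already rolled back

transfer(conn, "xiaoming", "xiaozhang", 1000)
# Consistency: the total amount is unchanged, whatever happened inside.
total = conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]
assert total == 5000
```

Durability is the part sqlite3 handles for us via its journal: once `with conn:` exits cleanly, the change survives a crash.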
CAP <https://baike.baidu.com/item/CAP原则> names the three properties a distributed system has to weigh; a data-sharing system can satisfy at most two of them at once:
* C: Consistency
  * All nodes see the same, latest copy of the data (all replicas in the distributed system hold the same value at the same moment)
  * e.g.: after an update in a distributed system, every user should read the latest value
* A: Availability
  * After some nodes in the cluster fail, can the cluster still respond to clients' read and write requests? (high availability for data updates)
  * e.g.: in a distributed system, every operation always returns a result within a bounded time (a timeout doesn't count [ever waited forever while shopping online? A few servers going down in the machine room shouldn't matter])
* P: Partition tolerance
  * In practical terms, a partition is a time limit on communication. If the system can't reach data consistency within that limit, a partition has occurred, and it must then choose between C and A.
  * e.g.: in a distributed system, requests can still be accepted despite network delays (partitions), satisfying consistency and availability
Representative (CA, give up P): traditional relational databases
If you want to avoid partition-tolerance problems, one way is to put all the (transaction-related) data on a single machine. That doesn't 100% guarantee the system never fails, but you won't hit the negative effects of partitioning (though it severely limits the system's scalability).
For a distributed system, giving up P is equivalent to giving up distribution: once concurrency gets high, a single machine simply can't take the pressure. Many banking services really do give up P, relying on high-performance single minicomputers to guarantee availability. (All NoSQL databases assume P exists.)
Representatives (CP, give up A): Zookeeper, Redis (distributed databases, distributed locks)
Opposite to giving up "partition tolerance" is giving up availability: when a partition occurs, the affected services have to wait for the data to become consistent (and the system cannot serve requests while it waits).
Representative (AP, give up C): DNS (a distributed database mapping IPs to domain names; think about why changing an IP needs the TTL, roughly 10 minutes, before all resolvers pick up the new value)
Reverse DNS lookup: https://www.cnblogs.com/dunitian/p/5074773.html
Give up strong consistency and guarantee eventual consistency instead. All NoSQL databases sit somewhere between CP and AP, leaning toward AP as much as possible. (Traditional relational databases put data consistency first; for distributed processing of massive data, availability and partition tolerance take priority over consistency.)
Different data has different consistency requirements, e.g.:
* User comments are insensitive to consistency; staying inconsistent for a long time doesn't hurt the user experience
* But would you dare be that casual with things like product prices? There the consistency requirement is high: tolerance must be under 10s, and even with caching, the price in the order must be the latest (pay attention sometime to the cache note JD shows under an item; if even JD is like this, no need to mention the rest)
2.3. Data consistency
Traditional relational databases usually use pessimistic locks, but scenarios like flash sales simply can't keep up that way, so optimistic locks are often used instead (the CAS mechanism, mentioned earlier when we discussed concurrency and locking). As said above, different business needs have different consistency requirements, and CAP can't all hold at once. There are mainly two kinds:
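Here is a minimal in-memory sketch of the optimistic-lock (CAS) idea: each record carries a version, and a stale update is rejected so the caller re-reads and retries. `VersionedRecord` is a made-up name for illustration; in a real database this would be a version column checked in the UPDATE's WHERE clause.

```python
# Optimistic locking in miniature: an update succeeds only if the
# version is still the one the writer originally read (compare-and-swap).
import threading

class VersionedRecord:
    """A record guarded by an optimistic-lock version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # only to make the CAS step itself atomic

    def compare_and_swap(self, expected_version, new_value):
        with self._lock:
            if self.version != expected_version:
                return False  # someone else won the race: re-read and retry
            self.value = new_value
            self.version += 1
            return True

stock = VersionedRecord(10)
v = stock.version
assert stock.compare_and_swap(v, 9)       # first writer wins
assert not stock.compare_and_swap(v, 8)   # stale version is rejected
```

Unlike a pessimistic lock, nobody blocks while "holding" the record; conflicting writers just fail fast and retry, which is why this suits read-heavy flash-sale traffic.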
* Strong consistency: no matter which replica an update lands on, subsequent operations must all see the latest data. Keeping multiple replicas consistent requires distributed transactions (this is why interviewers keep asking about it)
* Eventual consistency: under this constraint, users are guaranteed to eventually read the latest data. A few examples:
  * Causal consistency: A, B, C are three independent processes. A modifies the data and notifies B, so B sees the latest data; since A didn't notify C, C may not have the latest
  * Session consistency: a user who submits an update can read the updated data for the rest of their session; after the session ends (or for other users) it may not be the latest (like updating the local value with jQuery after submitting, with no guarantee the server data is the latest)
  * Read-your-writes consistency: much the same as above, just not limited to a session. After updating, the user reads their own latest data; other users may lag behind (some delay)
  * Monotonic read consistency: once a user has read a value, subsequent reads never return an earlier version (newly read value >= value already read)
  * Monotonic write (timeline) consistency: all replicas of the database apply all update operations in the same order (a bit like Redis's AOF)
2.4. How consistency is implemented
Quorum NRW strategy (commonly used)
A quorum system is a collection A of subsets of a complete set U, such that any two sets B and C in A always intersect.
The NRW algorithm:
* N: the number of replicas the data has
* R: the minimum number of replicas that must be read for a read operation to complete (the minimum number of nodes participating in a read)
* W: the minimum number of replicas that must be written for a write operation to complete (the minimum number of nodes participating in a write)
* As long as R + W > N, strong consistency is guaranteed (the nodes read always overlap the nodes written synchronously). For example: N=3, W=2, R=2 (one node does both a read and a write)
* In a relational database with N=2, you can set W=2, R=1 (writes pay the performance cost); the system must update the data on both nodes before confirming the result to the user
* If R + W <= N, the read and write sets may not meet on any node, and the system can only guarantee eventual consistency. How long the replicas take to converge depends on how the system propagates updates asynchronously: inconsistency window = time from updating the first node until all nodes have been updated asynchronously
* The settings of R and W directly affect the system's performance, scalability, and consistency:
  * If W is set to 1, the write returns to the user once one replica is updated, and the remaining N-W nodes are then updated asynchronously
  * If R is set to 1, reading one replica completes the read. Smaller R and W hurt consistency; larger values hurt performance
  * W=1, R=N ==> the system favors writes, but reads are slow (if 1 of the N nodes is down, the read can't complete)
  * R=1, W=N ==> the system favors reads, but write performance is low (if 1 of the N nodes is down, the write can't complete)
* Common practice: set R = W = N/2 + 1, which is a good trade-off. E.g.: N=3, W=2, R=2 (of the 3 nodes ==> 1 writes, 1 reads, 1 reads and writes)
* Message receive time > message send time (remember to account for clock differences between servers ~ use a time-synchronization server)
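The NRW rule above can be checked mechanically: when R + W > N, every possible read quorum overlaps every possible write quorum, so some node in any read set has already seen the latest write. A brute-force sketch (illustrative only; real systems never enumerate quorums like this):

```python
# Verify the NRW overlap property by enumerating all read/write quorums.
from itertools import combinations

def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r intersects every write set of size w."""
    nodes = range(n)
    return all(set(reads) & set(writes)          # empty intersection is falsy
               for reads in combinations(nodes, r)
               for writes in combinations(nodes, w))

assert quorums_always_overlap(3, 2, 2)       # R + W = 4 > N = 3: strong consistency
assert not quorums_always_overlap(3, 1, 2)   # R + W = 3 <= N: only eventual
```

The common R = W = N/2 + 1 setting is simply the smallest symmetric choice that keeps R + W > N for any N.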