Sharding
The basic idea is to split a database into multiple parts and place them on different database servers, thereby relieving the performance pressure on a single database. Loosely speaking, for a database with massive data: if there are many tables holding a lot of data, vertical partitioning is appropriate, i.e. splitting off closely related tables (those belonging to the same module) onto a separate server. If there are not many tables but each table holds a great deal of data, horizontal partitioning is appropriate, i.e. splitting a table's rows across multiple databases (servers) according to some rule (for example, hashing by ID). Of course, in practice the two situations are usually mixed, and the choice must be made according to the actual circumstances; a combination of vertical and horizontal partitioning may also be used, turning the original database into a matrix-like array of database servers that can be scaled out almost without limit.
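To make the horizontal case concrete, here is a minimal sketch of routing a record to a database server by ID hash. The shard count and the `db_N` names are illustrative assumptions, not part of any particular middleware.

```python
N_SHARDS = 4  # number of database servers (illustrative)

def shard_for(user_id: int) -> str:
    """Pick the target database for a record by hashing (taking the modulus of) its ID."""
    return f"db_{user_id % N_SHARDS}"
```

Every record with the same ID always routes to the same server, which is what makes the rule usable for both writes and later lookups.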
Note that when vertical and horizontal partitioning are applied at the same time, the partitioning strategy changes subtly. When only vertical partitioning is considered, the tables split out together can keep arbitrary associations among themselves, so tables can be grouped by "functional module". Once horizontal partitioning is introduced, however, relationships between tables become heavily restricted: usually a shard may contain only one master table (the table whose ID is hashed to choose the shard) plus its secondary tables. In other words, when vertical and horizontal partitioning are combined, the vertical split can no longer follow "functional modules"; it requires a finer granularity, and that granularity coincides with, one could even say is identical to, the "aggregate" concept in domain-driven design: the master table of each shard is exactly the aggregate root of an aggregate. You will then find the database becomes quite fragmented (there are more shards, but few tables within each shard). To avoid managing too many data sources, and to make full use of each database server's resources, consider placing two or more shards that serve similar business and have similar data growth rates (master tables whose data volumes are of the same order of magnitude) into the same data source. Each shard remains independent, with its own master table hashed by that table's ID; the only constraint is that their hash moduli (node counts) must be identical.
Common sharding middleware
Easy-to-use components:
* sharding-jdbc (Dangdang) <https://github.com/dangdangdotcom/sharding-jdbc>
* TSharding (Mogujie) <https://github.com/baihui212/tsharding>
Powerful, heavyweight middleware:
* sharding <https://github.com/go-pg/sharding>
* TDDL, Smart Client style (Taobao) <https://github.com/alibaba/tb_tddl>
* Atlas (Qihoo 360) <https://github.com/Qihoo360/Atlas>
* Cobar (developed by Alibaba's B2B department) <https://github.com/alibaba/cobar>
* MyCAT (developed on top of Alibaba's open-source Cobar) <http://www.mycat.org.cn/>
* Oceanus (58.com's database middleware) <https://github.com/58code/Oceanus>
* OneProxy (developed by Fang Xin, former chief architect at Alipay)
* Vitess (database middleware developed at Google/YouTube) <https://github.com/youtube/vitess>
Problems that sharding must solve
1. Transaction problems
There are currently two feasible approaches to the transaction problem: distributed transactions, and transactions jointly controlled by the application and the database. A brief comparison of the two:
* Option 1: use distributed transactions
  * Advantage: managed by the databases themselves; simple and effective.
  * Disadvantage: high performance cost, especially as the number of shards grows.
* Option 2: joint control by the application and the database
  * Principle: split a distributed transaction that spans multiple databases into several small transactions, each on a single database, and let the application coordinate the small transactions.
  * Advantage: better performance.
  * Disadvantage: requires flexible transaction-control design in the application. If you rely on Spring's transaction management, changing over will be difficult.
2. Cross-node joins
As long as data is partitioned, cross-node joins are unavoidable, although good design and partitioning can reduce them. The common workaround is to query twice: extract the relevant ids from the first query's result set, then issue a second request with those ids to fetch the associated data.
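The two-step workaround can be sketched as follows. The `query(shard, sql, params)` helper is hypothetical, standing in for whatever driver or middleware call the application actually uses, and the table and column names are illustrative.

```python
def fetch_orders_with_users(shards, query):
    """Two-step cross-node 'join': gather rows, then fetch the associated rows by id."""
    # Step 1: collect the first result set from every shard.
    orders = []
    for shard in shards:
        orders += query(shard, "SELECT id, user_id FROM orders", ())
    # Step 2: extract the foreign-key ids and fetch the associated rows by id.
    user_ids = sorted({o["user_id"] for o in orders})
    users = {}
    for shard in shards:
        for u in query(shard, "SELECT id, name FROM users WHERE id IN (?)", user_ids):
            users[u["id"]] = u
    # Perform the join on the application side.
    return [{**o, "user": users.get(o["user_id"])} for o in orders]
```

The second round trip only carries the ids actually seen in the first result set, which is why good partitioning (keeping related rows on the same shard) shrinks this cost.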
3. Cross-node count, order by, group by and other aggregate functions
These are all one class of problem, because they must be computed over the entire data set, and most proxies do not merge the partial results automatically. The solution is similar to that for cross-node joins: obtain the results on each node and merge them on the application side. Unlike joins, each node's query can run in parallel, so this is often much faster than one large table; but if the result set is large, the application's memory consumption becomes a concern.
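As a minimal sketch of application-side merging, suppose each node has run the same `SELECT status, COUNT(*) ... GROUP BY status` and returned its partial counts. COUNT and SUM can simply be added across shards; note that AVG must be rebuilt from the merged SUM and COUNT, never by averaging per-shard averages.

```python
from collections import Counter

def merge_group_counts(per_shard_results):
    """Merge per-shard GROUP BY ... COUNT(*) results.

    per_shard_results: list of {group_key: count} dicts, one per shard.
    """
    total = Counter()
    for result in per_shard_results:
        total.update(result)  # Counter.update adds counts key-by-key
    return dict(total)
```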
4. Data migration, capacity planning, and scaling out
The Taobao integrated business platform team allocates data using forward-compatible moduli that are powers of two (a number whose remainder mod 4 is 1 also has remainder 1 mod 2). This avoids row-level data migration, but table-level migration is still required, and both the scale of each expansion and the number of sub-tables are constrained. In general, none of these schemes is ideal; each has shortcomings, which itself reflects how hard it is to scale a sharded system.
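A small sketch of why power-of-two moduli are forward compatible: when the shard count doubles, each old shard splits cleanly in two, and no row ever crosses into a different old shard's group, so migration stays at the table (split) level rather than arbitrary row-level reshuffling.

```python
def old_shard(record_id):
    """Routing before expansion: 2 shards."""
    return record_id % 2

def new_shard(record_id):
    """Routing after doubling to 4 shards."""
    return record_id % 4

# Forward compatibility: a row's new shard always belongs to the "family" of
# its old shard, i.e. new_shard(id) % 2 == old_shard(id) for every id.
```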
Reference: [Research on distributed transactions, two-phase commit, one-phase commit, the Best Efforts 1PC pattern, and transaction compensation](http://blog.csdn.net/bluishglc/article/details/7612811)
* Based on two-phase commit, it guarantees the "atomicity" of cross-database operations to the greatest extent possible, and is the strictest transaction implementation in a distributed system.
* Easy to implement, with little work. Because most application servers and some standalone distributed transaction coordinators already do a great deal of the encapsulation, the difficulty and effort of introducing distributed transactions into a project are negligible.
* It is, however, the arch-enemy of the system's "horizontal" scalability. A distributed transaction based on two-phase commit must coordinate across multiple nodes when committing, which delays the commit point and objectively lengthens the transaction's execution time. This raises the probability of conflicts and deadlocks when transactions access shared resources, and the trend worsens as database nodes are added, so it becomes a "shackle" on the system's ability to scale horizontally at the database level. That is the main reason many sharding systems do not adopt distributed transactions.
Transactions based on the Best Efforts 1PC pattern
See the implementation in spring-data-neo4j for reference. Given the performance advantages of the Best Efforts 1PC pattern and its relatively simple implementation, it is the approach adopted by most sharding frameworks and projects.
Transaction compensation (idempotence)
For systems with high performance requirements but low consistency requirements, real-time consistency is often not essential; reaching eventual consistency within an allowed time window is enough, which makes transaction compensation a feasible scheme. The transaction compensation mechanism was first proposed for handling "long-lived transactions", but it is also a good reference for consistency in distributed systems. In a nutshell, unlike rolling a transaction back immediately when an error occurs during execution, transaction compensation is an after-the-fact check-and-remedy measure: it only expects to reach a consistent final state within a permissible time window. Its implementation is closely tied to the system's business logic, and there is no standard approach. Common implementations include data reconciliation checks, log-based comparison, and periodic synchronization against an authoritative data source.
Once the database is partitioned across multiple physical nodes, we can no longer rely on the database's own primary-key generation mechanism. On the one hand, an ID generated by a single partition is not guaranteed to be globally unique; on the other hand, the application needs to obtain the ID before inserting the data, in order to perform SQL routing.
Some common primary-key generation strategies
Using a UUID as the primary key is the simplest solution, but its drawbacks are obvious: a UUID is very long, so besides taking a lot of storage space, the main problem lies in the index, where both index maintenance and index-based queries suffer performance problems.
Maintaining a Sequence table in the database
The idea of this scheme is also very simple: create a Sequence table in the database, with a structure similar to:

```sql
CREATE TABLE `SEQUENCE` (
  `table_name` varchar(18) NOT NULL,
  `nextid` bigint(20) NOT NULL,
  PRIMARY KEY (`table_name`)
) ENGINE=InnoDB;
```

Whenever a new record needs an ID for some table, take that table's nextid from the Sequence table, add 1 to nextid, and write it back for the next use. This scheme is simple, but its drawbacks are equally obvious: since every insert must access this table, it easily becomes the system's performance bottleneck, and it is also a single point of failure; once the database holding this table goes down, the entire application stops working. Master-slave synchronization has been proposed, but that only addresses the single point; it cannot relieve the access pressure of a read/write ratio of 1:1.
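A runnable sketch of the sequence-table scheme, using SQLite from the Python standard library in place of MySQL/InnoDB purely for illustration. `next_id` reads and bumps the counter for one logical table inside a single transaction.

```python
import sqlite3

def make_db():
    """Create the sequence table and seed a counter for the 'orders' table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sequence ("
        "  table_name TEXT PRIMARY KEY,"
        "  nextid INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO sequence VALUES ('orders', 1)")
    return db

def next_id(db, table_name):
    """Take the table's nextid and increment it for the next caller."""
    with db:  # one transaction: read, then bump
        (nid,) = db.execute(
            "SELECT nextid FROM sequence WHERE table_name = ?", (table_name,)
        ).fetchone()
        db.execute(
            "UPDATE sequence SET nextid = nextid + 1 WHERE table_name = ?",
            (table_name,),
        )
    return nid
```

In real MySQL deployments the read-and-bump must be made atomic under concurrency (e.g. `SELECT ... FOR UPDATE` or an atomic `UPDATE` that captures the value); the single-connection sketch above sidesteps that.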
Twitter's distributed auto-increment ID algorithm: Snowflake
In distributed systems there are many occasions that require globally unique IDs, and Twitter's Snowflake solves exactly this need. The implementation is quite simple; configuration aside, the core of an ID is 41 bits of millisecond timestamp, 10 bits of machine ID, and a 12-bit per-millisecond sequence:

0 | 41-bit millisecond timestamp | 5-bit datacenter ID | 5-bit worker ID | 12-bit sequence

The first bit is unused (it can in fact serve as the sign bit of the long), the next 41 bits hold the millisecond timestamp, then 5 bits identify the datacenter and 5 bits the machine (in practice this may identify a worker thread rather than a physical machine), and the final 12 bits count the IDs generated within the current millisecond. That adds up to exactly 64 bits, fitting one Long.
The benefits: IDs are sorted roughly by time overall, no ID collisions occur anywhere in the distributed system (the datacenter and machine IDs disambiguate), and generation is efficient; tests show Snowflake can generate about 260,000 IDs per second, which fully meets the need.
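The layout above can be sketched in a few lines. The custom epoch below is the value commonly attributed to Twitter's implementation, used here as an assumption; a production generator must also handle clock rollback, which this sketch deliberately omits.

```python
import time

EPOCH_MS = 1288834974657  # custom epoch (assumed; commonly cited for Twitter's service)

class Snowflake:
    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32  # 5 bits each
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0     # 12-bit counter within one millisecond
        self.last_ms = -1

    def next_id(self):
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # wrap at 4096
            if self.sequence == 0:
                # counter exhausted for this millisecond: spin until the next one
                while ms <= self.last_ms:
                    ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = ms
        # 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence
        return ((ms - EPOCH_MS) << 22) | (self.datacenter_id << 17) \
               | (self.worker_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs generated by one process are monotonically increasing, which keeps B-tree index inserts append-like.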
7. Sorted paging across shards
Generally, paging must be sorted by a specified field. When the sort field is the sharding field, the sharding rules make it easy to locate the right shard; when the sort field is not the sharding field, things get much more complicated. For the final result to be accurate, we have to sort and return the data within each shard node, gather the result sets returned by the different shards, sort them again on the application side, and finally return the result to the user, as shown in the figure below:
The picture above shows only the simplest case (fetching the first page of data), where the effect on performance seems modest. However, fetching, say, page 10 is much more complicated, as shown in the figure below:
Some readers may wonder why it cannot be as simple as fetching the first page (take the top 10 of each shard, then merge and sort). It is not hard to see why: the data on each shard node may be arbitrarily distributed, so for the sort to be accurate, the first N pages of data from every shard node must all be sorted and merged, and only then can the global sort be taken. Clearly such an operation consumes ever more resources, so the further back the user pages, the worse the system performs.
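The "first N pages from every shard" rule can be sketched as follows: to serve 1-based page `page` of size `size`, each shard must contribute up to `page * size` of its smallest rows, and the application merges and re-slices. The linear growth of `page * size` is exactly the cost described above.

```python
import heapq

def page_across_shards(shard_rows, key, page, size):
    """shard_rows: list of per-shard result lists, each already sorted by key."""
    top_n = page * size
    # Each shard contributes at most its first page*size rows.
    candidates = [rows[:top_n] for rows in shard_rows]
    # Merge the sorted per-shard slices, then cut out the requested page.
    merged = list(heapq.merge(*candidates, key=key))
    return merged[(page - 1) * size : page * size]
```

`heapq.merge` streams the sorted inputs instead of concatenating and re-sorting, which keeps the application-side merge cheap; the memory cost is dominated by the `page * size` rows fetched from each shard.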
How can the paging problem be handled under sharding? There are several approaches:
* If paging is offered in a front-end application, let users see only the first n pages. This restriction is usually reasonable in business terms, since pages far into the list rarely matter (if a user really must see them, ask the user to narrow the query and search again).
* If a background batch job needs to fetch data in batches, increase the page size, e.g. fetch 5,000 records at a time, which effectively reduces the number of page fetches (and of course run such offline reads against a standby database to avoid impacting the primary).
* A sharded design usually has a companion big-data platform that aggregates the records of all shards, and some paging queries can be pushed down to that platform.
8. Sharding strategy
Once the sharding dimension is determined, how are records distributed to the different databases? There are generally two ways:
* By value range: for example, records with user Id 1-9999 go to the first database, 10000-20000 to the second, and so on.
* By modulo: for example, take user Id mod n; records with remainder 0 go to the first database, those with remainder 1 to the second, and so on.
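The two strategies above can be sketched side by side. Both functions return a 0-based database index; the range size and shard count are illustrative assumptions.

```python
def route_by_range(user_id, range_size=10000):
    """Value-range routing: Ids 1-9999 -> db 0, 10000-19999 -> db 1, and so on."""
    return user_id // range_size

def route_by_mod(user_id, n=4):
    """Modulo routing: remainder 0 -> db 0, remainder 1 -> db 1, and so on."""
    return user_id % n
```

Range routing keeps related ids together and makes adding a new database trivial (new ranges go to the new node), but hot recent ids all land on one shard; modulo routing spreads load evenly but makes expansion harder, which is why the power-of-two modulus trick discussed earlier matters.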