Sharding

The basic idea of sharding is to split a database into multiple parts and place them on different servers, thereby alleviating the performance problems of a single database. Loosely speaking, for a database with massive data: if there are many tables, each with a lot of data, vertical partitioning is appropriate, i.e. moving a group of closely related tables out onto a separate server. If there are not many tables but each holds a large amount of data, horizontal partitioning is appropriate, i.e. splitting the rows of a table across multiple databases (servers) according to some rule (for example, hashing by ID). In practice the two situations are often mixed, and the choice must be made according to the actual circumstances; a combination of vertical and horizontal partitioning may also be used, turning the original database into a matrix-like array of servers that can be extended almost without limit.

Note that when vertical and horizontal partitioning are applied at the same time, the partitioning strategy changes subtly. When only vertical partitioning is considered, arbitrary associations can be preserved among the tables that are kept together, so tables can be grouped by "functional module". Once horizontal partitioning is introduced, however, the relationships between tables become heavily restricted: usually a shard may contain only one primary table (the table whose ID is hashed) plus its secondary tables. In other words, when both kinds of partitioning are used together, the vertical dimension can no longer be divided by "functional module"; it requires a finer granularity, and that granularity coincides with (one might even say is identical to) the "aggregate" concept in domain-driven design: the primary table of each shard is exactly the aggregate root of an aggregate. You will then find that the database becomes very fragmented (there will be many shards, but few tables in each shard). To avoid managing too many data sources and to make full use of each database server, consider placing two or more shards whose businesses are similar and whose data grows at a similar rate (their primary tables hold data of the same order of magnitude) into the same data source. Each shard remains independent, with its own primary table hashed by its own ID; the only constraint is that their hash moduli (the number of nodes) must be the same.

Common sharding middleware

Lightweight, easy-to-use components:

* Dangdang's sharding-jdbc
* Mogujie's TSharding

Powerful, heavyweight middleware:

* sharding
* TDDL (Smart Client mode, Taobao)
* Atlas (Qihoo 360)
* alibaba.cobar (developed by Alibaba's B2B department)
* MyCAT (developed on top of Alibaba's open-source Cobar)
* Oceanus (58.com's database middleware)
* OneProxy (developed by Alipay's chief architect Fang Xin)
* vitess (Google's database middleware)
Problems to be solved by sharding

1. Transaction issues

There are currently two feasible approaches to the transaction problem: distributed transactions, or transactions jointly controlled by the application and the database. A brief comparison of the two schemes:

* Scheme 1: use distributed transactions
  * Advantage: managed by the database itself; simple and effective
  * Disadvantage: high performance cost, especially as the number of shards grows
* Scheme 2: joint control by the application and the database
  * Principle: split a distributed transaction that spans multiple databases into several small transactions, each on a single database, and let the application coordinate these small transactions.
  * Advantage: better performance
  * Disadvantage: requires flexible transaction-control design in the application. If Spring's transaction management is already in use, the change will be difficult.
2. The cross-node join problem

As long as data is partitioned, cross-node joins are unavoidable, although good design and partitioning can reduce their frequency. The common solution is to query in two steps: find the relevant IDs in the result set of the first query, then issue a second request using those IDs to fetch the associated data.
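The two-step approach can be sketched as follows, with in-memory dicts standing in for two separately sharded tables (all table and field names here are hypothetical, chosen only for illustration):

```python
# Shard of the orders table (sharded by order id).
orders_shard = {
    1: {"id": 1, "user_id": 101, "amount": 50},
    2: {"id": 2, "user_id": 102, "amount": 75},
}

# Shards of the users table (sharded by user_id, modulus 2).
user_shards = [
    {102: {"id": 102, "name": "Bob"}},    # user_id % 2 == 0
    {101: {"id": 101, "name": "Alice"}},  # user_id % 2 == 1
]

def fetch_orders_with_users():
    # Step 1: query the orders shard and collect the foreign-key ids.
    orders = list(orders_shard.values())
    user_ids = {o["user_id"] for o in orders}

    # Step 2: route each id to its shard, fetch the associated rows,
    # and stitch the two result sets together on the application side.
    users = {uid: user_shards[uid % 2][uid] for uid in user_ids}
    return [dict(o, user_name=users[o["user_id"]]["name"]) for o in orders]
```

In a real system, step 2 would typically be a single `IN (...)` query per shard rather than one lookup per ID.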

3. Cross-node count, order by, group by, and aggregate functions

These form one class of problem, because they all need to be computed over the entire data set, and most proxies do not merge the partial results automatically. The solution resembles that of cross-node joins: obtain the result on each node and merge on the application side. Unlike joins, the per-node queries can run in parallel, so this is much faster than querying a single large table. If the result set is large, however, the application's memory consumption becomes a problem.
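A minimal sketch of application-side merging, with hypothetical per-shard results: `count` and `sum` merge trivially, but an average must be rebuilt from the merged sum and count, because averaging the per-shard averages would be wrong:

```python
# Partial aggregates returned by each shard (hypothetical data).
shard_results = [
    {"count": 3, "sum": 30},   # from shard 0: values 5, 10, 15
    {"count": 2, "sum": 50},   # from shard 1: values 20, 30
]

# Merge on the application side.
total_count = sum(r["count"] for r in shard_results)
total_sum = sum(r["sum"] for r in shard_results)

# Global average comes from merged sum/count, NOT from per-shard averages
# (mean of 10 and 25 would give 17.5, which is incorrect).
global_avg = total_sum / total_count
```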

4. Data migration, capacity planning, and scaling out

The Taobao integrated business platform team uses a forward-compatible power-of-two modulus (for example, a number whose remainder mod 4 is 1 also has remainder 1 mod 2) to allocate data. This avoids row-level data migration, but table-level migration is still required, and both the scale of expansion and the number of sub-tables are limited. On the whole, none of these schemes is ideal; each has its shortcomings, which reflects, from one angle, how difficult it is to scale out a sharded system.
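The forward-compatibility property can be checked in a few lines: if `id % 4 == r`, then `id % 2 == r % 2`, so when the shard count doubles from 2 to 4, each old shard's rows split between exactly two new shards and never mix with rows from the other old shard (this sketch only demonstrates the arithmetic, not Taobao's actual implementation):

```python
ids = range(1, 1001)

# A remainder mod 4 determines the remainder mod 2.
for i in ids:
    assert i % 2 == (i % 4) % 2

# Rows of old shard 1 (id % 2 == 1) land only in new shards 1 and 3,
# so expansion splits shards in place instead of reshuffling rows.
new_shards_for_old_1 = {i % 4 for i in ids if i % 2 == 1}
```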

5. Transactions

Distributed transactions

Reference: "On distributed transactions, two-phase commit, one-phase commit, the Best Efforts 1PC pattern, and transaction compensation mechanisms"

* Advantages
  * Based on two-phase commit, it guarantees the atomicity of cross-database operations as far as possible, and is the strictest transaction implementation model in a distributed system.
  * Easy to implement, small workload. Because most application servers and some standalone distributed transaction coordinators do a great deal of the encapsulation work, the difficulty and effort of introducing distributed transactions into a project are negligible.
* Disadvantages
  * The system's ability to scale is crippled. Distributed transactions based on two-phase commit must coordinate among multiple nodes at commit time, which delays the point at which the transaction commits and objectively lengthens its execution time. This increases the probability of conflicts and deadlocks when transactions access shared resources; as the number of database nodes grows, this trend worsens, shackling the system's ability to scale horizontally at the database level. This is the main reason many sharded systems do not adopt distributed transactions.
Transactions based on the Best Efforts 1PC pattern

See the implementation in spring-data-neo4j for reference. Given the performance advantages of the Best Efforts 1PC pattern and its relatively simple implementation, it is the approach adopted by most sharding frameworks and projects.
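The pattern can be sketched as follows (a simplified illustration of the idea, not the spring-data-neo4j code): each resource is committed in turn after all the work has succeeded; if a later commit fails, the earlier commits are NOT rolled back, and the failure is recorded so that a compensation job can repair it afterwards:

```python
class FakeResource:
    """Stand-in for a database connection (hypothetical)."""
    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.committed = False

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError(f"commit failed on {self.name}")
        self.committed = True

def best_efforts_commit(resources, failure_log):
    """Commit each resource in turn; on failure, keep going and record
    the error for later compensation (best-efforts semantics)."""
    for r in resources:
        try:
            r.commit()
        except RuntimeError as e:
            failure_log.append(str(e))
```

The trade-off is visible in the code: a window of inconsistency is accepted in exchange for avoiding the coordination cost of two-phase commit.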

Transaction compensation (idempotence)

For systems with high performance requirements but lower consistency requirements, real-time consistency is not always needed; it is enough to reach eventual consistency within an allowed time window, which makes transaction compensation a feasible scheme. The mechanism was first proposed for handling "long transactions", but it is also a good reference for consistency in distributed systems. In a nutshell, unlike a transaction that rolls back immediately when an error occurs during execution, transaction compensation is an after-the-fact check-and-remedy measure; it only expects a consistent result to be reached within a permitted period. The implementation of compensation is closely tied to the system's business logic, and there is no standard approach. Common implementations include data reconciliation checks, log-based comparison, and periodic synchronization with an authoritative data source.
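A toy reconciliation sketch, with dicts standing in for an authoritative store and a replica that missed some writes (all data is hypothetical). The repair is idempotent: running it any number of times converges to the same consistent state, which is exactly what an after-the-fact compensation job needs:

```python
source = {"o1": 100, "o2": 200, "o3": 300}   # authoritative data
replica = {"o1": 100}                        # o2, o3 lost during a failure

def reconcile(src, dst):
    """Copy any missing or mismatched entries from src to dst.
    Idempotent: entries that already match are left untouched."""
    repaired = []
    for key, value in src.items():
        if dst.get(key) != value:
            dst[key] = value
            repaired.append(key)
    return repaired
```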

6. The ID problem

Once the database is partitioned across multiple physical nodes, we can no longer rely on the database's own primary-key generation mechanism. On the one hand, an ID generated by one partition is not guaranteed to be globally unique; on the other hand, the application needs to know the ID before inserting the data, in order to route the SQL.

Some common primary-key generation strategies

UUID

Using a UUID as the primary key is the simplest solution, but the drawbacks are equally obvious. A UUID is very long, and besides taking up a lot of storage space, the main problem lies in indexing: both building the index and querying through it perform poorly.

Maintaining a Sequence table in the database

The idea of this scheme is also very simple: create a Sequence table in the database, with a structure similar to:

CREATE TABLE `SEQUENCE` (
  `table_name` varchar(18) NOT NULL,
  `nextid` bigint(20) NOT NULL,
  PRIMARY KEY (`table_name`)
) ENGINE=InnoDB

Whenever a new record needs an ID for some table, take the corresponding nextid from the Sequence table, add 1, and write it back for the next use. This scheme is simple, but the drawbacks are equally obvious: since every insert must access this table, it easily becomes a system performance bottleneck, and it is also a single point of failure; once the database holding this table fails, the whole application stops working. Master-slave synchronization has been proposed, but it only addresses the single point of failure, not the access pressure of a 1:1 read-write ratio.
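The read-and-advance step can be sketched in memory, with a lock playing the role of the database row lock that a real SELECT ... FOR UPDATE / UPDATE round-trip would take (the table name `t_order` is hypothetical; the dict mirrors the SEQUENCE table above):

```python
import threading

# table_name -> nextid, mirroring the SEQUENCE table.
sequence = {"t_order": 1}
lock = threading.Lock()

def next_id(table_name):
    """Return the current nextid and advance it by 1 for the next use."""
    with lock:  # in the real scheme, the DB row lock serializes this
        nid = sequence[table_name]
        sequence[table_name] = nid + 1
        return nid
```

The bottleneck described in the text is visible here: every caller serializes on the same lock (row), regardless of which shard the insert targets.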

Twitter's distributed auto-increment ID algorithm: Snowflake

In a distributed system there are many occasions where globally unique IDs must be generated, and Twitter's Snowflake solves this need. The implementation is quite simple: setting configuration details aside, the core of an ID is a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit per-millisecond sequence.

* 0 --- 0000000000 0000000000 0000000000 0000000000 0 --- 00000 --- 00000 --- 000000000000

In the layout above, the first bit is unused (it is effectively the sign bit of the long), the next 41 bits are the millisecond timestamp, then 5 bits of datacenter ID and 5 bits of machine ID (which in practice may identify a worker thread rather than a physical machine), and finally a 12-bit count within the current millisecond. That adds up to exactly 64 bits: one long.

The benefit of this scheme is that IDs are ordered roughly by time overall, no ID collisions occur across the distributed system (the datacenter and machine IDs disambiguate the generators), and it is efficient: tests show Snowflake can generate about 260,000 IDs per second, which fully meets most needs.
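A minimal generator following the bit layout described above can be sketched as follows (a simplified single-process version, not Twitter's production code; clock-rollback handling is omitted):

```python
import threading
import time

class Snowflake:
    """41-bit millisecond timestamp | 5-bit datacenter id |
    5-bit worker id | 12-bit per-millisecond sequence."""

    EPOCH = 1288834974657  # Twitter's epoch, in milliseconds

    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12 bits
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH) << 22
                    | self.datacenter_id << 17
                    | self.worker_id << 12
                    | self.sequence)
```

Because the timestamp occupies the high bits, IDs from one generator are strictly increasing, and the datacenter/worker bits keep generators from colliding with each other.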

7. Cross-shard sorting and paging

Generally, paging requires sorting by a specified field. When the sort field is the sharding field, the sharding rule easily locates the target shard; when the sort field is not the sharding field, things become more complicated. For the final result to be correct, the data must be sorted and returned within each shard node, and then the result sets returned by the different shards must be merged and sorted again before being returned to the user.

Fetching the first page this way does not seem to hurt performance much. But fetching, say, page 10 is far more complicated: every shard must return its first 10 pages of data.

Some readers may wonder why this cannot be as simple as fetching the first page (sort the first 10 pages within each shard, then merge and sort). It is not hard to see why: the data in each shard is arbitrary with respect to the global order, so for the sort to be correct, the first N pages of data from every shard node must be collected, merged, and sorted as a whole. Clearly, such an operation consumes ever more resources: the further back the user pages, the worse the system performs.
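The cost is easy to see in code: to serve global page `page_no`, each shard must return its own first `page_no * page_size` rows, which are then merged and sliced on the application side (the sample shard data is hypothetical, with rows already sorted ascending by the sort key):

```python
import heapq

def page_across_shards(shards, page_no, page_size):
    """Return global page `page_no` (1-based) across sorted shards."""
    need = page_no * page_size                 # rows needed PER SHARD
    partials = [sorted(s)[:need] for s in shards]
    merged = list(heapq.merge(*partials))      # k-way merge of sorted runs
    start = (page_no - 1) * page_size
    return merged[start:start + page_size]

# Three shards whose rows interleave in the global order.
shards = [[1, 4, 7, 10], [2, 5, 8, 11], [3, 6, 9, 12]]
```

Note that `need` grows linearly with the page number, which is exactly why deep paging degrades: page 10 pulls 10 pages of rows out of every shard just to return one.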
How can the paging problem be solved in a sharded database? There are several approaches:

If paging is provided in a user-facing application, let users see only the first n pages. Such a restriction is usually reasonable in business terms, since pages far beyond the first few are rarely meaningful (if users must see them, they can be asked to narrow the query and search again).

If a background batch task needs to fetch data in bulk, increase the page size, for example fetching 5,000 records at a time, which effectively reduces the number of page fetches (offline access should generally go to a standby replica to avoid impacting the primary).

A sharded design usually comes with a big-data platform that aggregates the records of all shards, so some paging queries can be handled on that platform instead.

8. Sharding strategies

Once the sharding dimension is determined, how are records distributed across the databases? There are generally two ways:

* By value range: for example, users with IDs 1-9999 go into the first database, 10000-20000 into the second, and so on.
* By modulus: for example, user ID mod n; records with remainder 0 go into the first database, remainder 1 into the second, and so on.
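The two routing strategies above can be written as one-line functions (the range size and shard count here are illustrative defaults, not values from the text):

```python
def route_by_range(user_id, range_size=10000):
    """Range routing: ids 1-9999 -> db 0, 10000-19999 -> db 1, ..."""
    return user_id // range_size

def route_by_mod(user_id, n=4):
    """Modulus routing: remainder 0 -> db 0, remainder 1 -> db 1, ..."""
    return user_id % n
```

Range routing makes expansion easy (just add a database for the next range) but can concentrate hot, recent IDs on one node; modulus routing spreads load evenly but, as discussed in the migration section, makes expansion harder unless the modulus grows in a forward-compatible way.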












There are quite a few sharding middleware products on the market today. Proxy-based ones include MySQL Proxy and Amoeba; Hibernate Shards is based on the Hibernate framework; Dangdang's sharding-jdbc is based on JDBC; Mogujie's TSharding is a Maven-plugin-style component based on MyBatis; and Cobar works by rewriting Spring's ibatis template class.