Sharding

The basic idea of sharding is to split a database into multiple parts and place them on different database servers, in order to alleviate the performance problems of a single database. Loosely speaking, for a database with massive data: if there are many tables and a lot of data, vertical partitioning is appropriate, i.e. closely related tables (for example, those belonging to the same module) are split out together and placed on one server. If there are not many tables, but each table holds a great deal of data, horizontal partitioning is appropriate, i.e. the rows of a table are split across multiple database servers according to some rule (for example, hashing by ID). Of course, in reality these two situations are usually mixed, and the choice must be made according to the actual situation; vertical and horizontal partitioning may also be combined, so that the original database can be divided into a matrix-like array of database servers that can be expanded almost without limit.

In particular, when vertical and horizontal partitioning are performed at the same time, the partitioning strategy changes subtly. When only vertical partitioning is considered, any association may be kept between the tables that are split out together, so tables can be partitioned by "functional module". Once horizontal partitioning is introduced, however, the relationships between tables become severely restricted: usually only one primary table (the table whose ID the shard hashes on) is allowed to keep associations with its secondary tables. In other words, when vertical and horizontal partitioning are combined, the vertical cut can no longer follow "functional modules"; it requires a finer granularity, and this granularity coincides with, or is even identical to, the concept of "aggregate" in Domain-Driven Design: the primary table of each shard is exactly the aggregate root of an aggregate! Partitioning this way, you will find the database becomes quite fragmented (there are more shards, but not many tables inside each shard). To avoid managing too many data sources and to make full use of each database server's resources, consider putting two or more shards whose business is similar and whose data grows at a similar rate (the primary tables' row counts are of the same order of magnitude) into the same data source. Each shard remains independent, with its own primary table, hashing on its own primary table's ID; the only difference is that their hash modulus (i.e. the number of nodes) must be the same.
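The constraint at the end of the paragraph above can be sketched in a few lines. The names and the node count here are illustrative assumptions, not from the text: two independent shard families ("aggregates") can share the same set of data sources only if they hash their own root IDs with the same modulus.

```python
# Two aggregates sharing 4 data sources: each routes by its OWN root
# ID, but both use the same modulus, so they land on the same nodes.
NUM_NODES = 4  # the shared hash modulus (illustrative)

def route(root_id):
    """Route any aggregate root to one of the shared data sources."""
    return root_id % NUM_NODES

order_node = route(12345)    # order aggregate, keyed by order id
customer_node = route(777)   # customer aggregate, keyed by customer id
assert 0 <= order_node < NUM_NODES and 0 <= customer_node < NUM_NODES
```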

Common database sharding middleware

Easy-to-use components:

* Dangdang sharding-jdbc
* Mogujie TSharding

Powerful, heavyweight middleware:

* sharding
* TDDL (Smart Client mode, Taobao)
* Atlas (Qihoo 360)
* alibaba.cobar (developed by Alibaba's B2B department)
* MyCAT (open source, developed on the basis of Alibaba's Cobar)
* Oceanus (58.com database middleware)
* OneProxy (developed by Fang Xin, former chief architect of Alipay)
* Vitess (database middleware developed by Google)
Problems that sharding must solve

1. Transaction problems

There are two feasible solutions to the transaction problem: distributed transactions, or joint transaction control by the application and the database.

* Scheme 1: use distributed transactions
  * Advantage: managed by the database; simple and effective.
  * Disadvantage: high performance cost, especially as the number of shards grows.
* Scheme 2: joint control by the application and the database
  * Principle: split a distributed transaction that spans multiple databases into several small transactions, each on a single database, and let the application control these small transactions.
  * Advantage: better performance.
  * Disadvantage: requires a flexible application-level design for transaction control. If Spring's transaction management is already in use, changing this will be difficult.
2. Cross-node join problems

As long as there is partitioning, cross-node joins are inevitable, though good design and partitioning can reduce them. The common way to solve this problem is to issue two queries: find the IDs of the associated data in the result set of the first query, then issue a second query with those IDs to fetch the associated data.
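The two-query approach can be sketched as follows. The "shards" here are plain dicts standing in for per-node databases, and the routing rule (user_id % 2) is an illustrative assumption:

```python
# Minimal sketch of the two-query approach to a cross-node join.
user_shards = [
    {2: "bob", 4: "dave"},     # shard 0: user_id % 2 == 0
    {1: "alice", 3: "carol"},  # shard 1: user_id % 2 == 1
]

orders = [  # result of the first query, executed on the orders node
    {"id": 10, "user_id": 1, "amount": 25},
    {"id": 11, "user_id": 2, "amount": 40},
]

def resolve_users(orders, user_shards):
    """Second query: fetch the associated users from their owning shards."""
    wanted = {o["user_id"] for o in orders}
    found = {}
    for shard_no, shard in enumerate(user_shards):
        # Only ask each shard for the IDs it actually owns.
        for uid in wanted:
            if uid % len(user_shards) == shard_no and uid in shard:
                found[uid] = shard[uid]
    # Join in application memory.
    return [dict(o, user=found[o["user_id"]]) for o in orders]

joined = resolve_users(orders, user_shards)
```

In a real system the second round would be one `SELECT ... WHERE id IN (...)` per shard, issued only to the shards that own at least one of the wanted IDs.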

3. Cross-node count, order by, group by and aggregate-function problems

These are all problems because they must be computed over the entire data set, and most proxies do not merge the results automatically. The solution is similar to that for cross-node joins: fetch the result on each node, then merge them on the application side. Unlike joins, the per-node queries can run in parallel, so this is often much faster than querying a single large table. If the result set is large, however, the memory it consumes in the application becomes a problem.

4. Data migration, capacity planning and expansion

A scheme from Taobao's integrated business platform team allocates data using forward-compatible multiples of 2 (a number with remainder 1 modulo 4 also has remainder 1 modulo 2), which avoids row-level data migration. Table-level migration is still required, however, and there are restrictions on the expansion scale and on the number of sub-tables. Generally speaking, none of these schemes is ideal; each has drawbacks of one kind or another, which reflects, from one angle, how hard it is to expand a sharded system.
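The forward-compatibility property mentioned above can be checked directly: when the node count doubles from 2 to 4, a key's new shard number agrees with its old one modulo 2, so each old shard splits cleanly into two new ones and no key crosses between the old shard pairs.

```python
# Why power-of-2 modulo expansion is "forward compatible":
# k % 4 == r  implies  k % 2 == r % 2.

def old_shard(key):  # 2 nodes before expansion
    return key % 2

def new_shard(key):  # 4 nodes after doubling
    return key % 4

for key in range(100):
    # every key stays within the pair of new shards
    # derived from its old shard
    assert new_shard(key) % 2 == old_shard(key)
```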

5. Transactions

Distributed transactions

Reference: *Research on distributed transactions, two-phase commit, one-phase commit, the Best Efforts 1PC pattern and transaction compensation mechanisms*.

* Advantages
  * Based on two-phase commit, it guarantees the "atomicity" of cross-database operations as far as possible and is the strictest transaction implementation in a distributed system.
  * Implementation is simple and the effort is small: most application servers and some standalone distributed transaction coordinators already do most of the encapsulation work, so the difficulty and effort of introducing distributed transactions into a project are negligible.
* Disadvantages
  * It is the enemy of "horizontal" scalability. A distributed transaction based on two-phase commit must coordinate among multiple nodes at commit time, which delays the commit point and objectively lengthens the transaction's execution time. This increases the probability of conflicts and deadlocks when transactions access shared resources, and the trend worsens as the number of database nodes grows, "shackling" the system's ability to scale horizontally at the database level. This is the main reason why many sharded systems do not use distributed transactions.
Transactions based on the Best Efforts 1PC pattern

See the spring-data-neo4j implementation for reference. Given the performance advantage of the Best Efforts 1PC pattern and its relatively simple implementation, it has been adopted by sharding frameworks and projects.

Transaction compensation (idempotence)

For systems with high performance requirements and low consistency requirements, real-time consistency is often not critical: as long as eventual consistency is reached within an allowed time window, transaction compensation becomes a feasible scheme. The transaction compensation mechanism was originally proposed for handling "long transactions", but it is also a good reference for guaranteeing consistency in distributed systems. In general, unlike rolling back a transaction immediately when an error occurs during execution, transaction compensation is an after-the-fact check-and-remedy measure that only expects to reach a consistent final result within an allowed time window. Its implementation is closely tied to the system's business logic, and there is no standard way to do it. Some common implementations are: reconciliation checks of the data; log-based comparison; periodic synchronization with a standard data source; and so on.

6. The ID problem

Once the database is split across multiple physical nodes, we can no longer rely on the database's own primary-key generation mechanism. On the one hand, an ID generated by one partitioned database is not guaranteed to be globally unique; on the other hand, the application needs to know the ID before inserting data, in order to route the SQL.
Some common primary key generation strategies


Using a UUID as the primary key is the simplest solution, but its drawbacks are also very obvious. Because a UUID is very long, besides taking up a lot of storage space, the main problem is the index: both building the index and querying through it suffer performance problems.

Maintaining a sequence table

The idea of this scheme is also very simple: create a sequence table whose structure is similar to:

```sql
CREATE TABLE `SEQUENCE` (
  `table_name` varchar(18) NOT NULL,
  `nextid` bigint(20) NOT NULL,
  PRIMARY KEY (`table_name`)
) ENGINE=InnoDB;
```

Whenever a new record of some table needs an ID, take nextid from the sequence table and write nextid + 1 back to the database for the next use. This scheme is simple too, but its drawbacks are obvious: because every insert must access this table, it easily becomes the system's performance bottleneck. It is also a single point of failure: once the table's database fails, the whole application stops working. Some propose using Master-Slave replication, but that only addresses the single point; it does not solve the access-pressure problem of a 1:1 read-write ratio.
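The take-and-increment step can be sketched with SQLite standing in for MySQL (in MySQL the read and update would typically be wrapped in a transaction with `SELECT ... FOR UPDATE` to stay atomic under concurrency):

```python
# Sequence-table ID generation, demonstrated against an in-memory DB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (table_name TEXT PRIMARY KEY,"
    " nextid INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sequence VALUES ('orders', 1)")

def next_id(conn, table_name):
    """Take nextid for table_name and advance the counter by one."""
    with conn:  # one transaction per ID
        (nid,) = conn.execute(
            "SELECT nextid FROM sequence WHERE table_name = ?",
            (table_name,),
        ).fetchone()
        conn.execute(
            "UPDATE sequence SET nextid = nextid + 1 WHERE table_name = ?",
            (table_name,),
        )
    return nid

ids = [next_id(conn, "orders") for _ in range(3)]  # -> [1, 2, 3]
```

Every insert funnels through this one row, which is exactly the bottleneck the paragraph above describes.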

Twitter's distributed auto-increment ID algorithm: Snowflake

In a distributed system there are many occasions where a globally unique ID must be generated. Twitter's Snowflake meets this need, and the implementation is very simple. Setting configuration information aside, the core of the ID is: 41 bits of millisecond timestamp, 10 bits of machine ID, and a 12-bit sequence number within the millisecond:

* 0 --- 0000000000 0000000000 0000000000 0000000000 0 --- 00000 --- 00000 --- 000000000000

In the layout above, the first bit is unused (it could in fact serve as the sign bit of the long), the next 41 bits are the time in milliseconds, then 5 bits of datacenter ID and 5 bits of machine ID (not so much an identifier as, in practice, a thread identifier), and finally a 12-bit count within the current millisecond. That adds up to exactly 64 bits, one Long.

The advantages are that, on the whole, IDs sort by increasing time; no ID collisions occur anywhere in the distributed system (the datacenter and machine IDs distinguish the generators); and it is efficient: in tests, Snowflake can generate roughly 260,000 IDs per second, which fully meets the need.
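A minimal Snowflake-style generator following the bit layout described above (a sketch, not Twitter's exact code; the epoch constant is Twitter's published custom epoch):

```python
import time

class Snowflake:
    EPOCH = 1288834974657  # Twitter's custom epoch, milliseconds

    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1

    def _now_ms(self):
        return int(time.time() * 1000)

    def next_id(self):
        ms = self._now_ms()
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
            if self.sequence == 0:         # sequence exhausted:
                while ms <= self.last_ms:  # spin to the next millisecond
                    ms = self._now_ms()
        else:
            self.sequence = 0
        self.last_ms = ms
        # 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit seq
        return ((ms - self.EPOCH) << 22) | (self.datacenter_id << 17) \
               | (self.worker_id << 12) | self.sequence

gen = Snowflake(datacenter_id=1, worker_id=3)
a, b = gen.next_id(), gen.next_id()
assert b > a  # IDs are strictly increasing on one generator
```

Because the datacenter and worker bits are baked into every ID, two generators configured with different IDs can never collide, no matter how their clocks interleave.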

7. Cross-shard sorting and paging

Generally speaking, paging requires sorting by a specified field. When the sort field is the sharding field, the sharding rule makes it easy to locate the specified shard; when the sort field is not the sharding field, things get more complicated. To keep the final result correct, the data must be sorted and returned within each shard node, then the result sets returned by the different shards must be merged, re-sorted, and finally returned to the user. As shown in the figure below:


This is the simplest case (taking the first page of data) and seems to have little impact on performance. However, fetching, say, the 10th page of data is much more complicated, as shown in the figure below:


Some readers may wonder why this cannot be as simple as fetching the first page (taking the first 10 sorted rows from each shard and merging them). It is not hard to see why: the data in each shard node may be distributed randomly, so for the sort to be correct, the first N pages of every shard must be sorted and merged before the overall sort is performed. Obviously, such an operation consumes resources, and the further back the user pages, the worse the system performs.
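The merge just described can be sketched as follows: to serve page N of size page_size, every shard must return its first N × page_size rows, and the application merges the sorted runs and slices out the page. The shard contents are invented sample data.

```python
import heapq

def fetch_page(shards, page, page_size):
    """Return page `page` (1-based) of the globally sorted data."""
    take = page * page_size
    # each "shard query" returns its own first `take` rows, pre-sorted
    partials = [sorted(rows)[:take] for rows in shards]
    merged = list(heapq.merge(*partials))  # merge the sorted runs
    start = (page - 1) * page_size
    return merged[start:start + page_size]

shards = [[9, 1, 5, 7], [2, 8, 3], [6, 4, 10]]
assert fetch_page(shards, page=2, page_size=3) == [4, 5, 6]
```

Note how `take` grows linearly with the page number: page 10 forces every shard to return 10 pages of data, which is exactly the cost the paragraph above describes.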
How can the paging problem be solved with sharded databases? There are several approaches:

* Provide paging in the front-end application and let the user see only the first n pages. This restriction is usually reasonable for the business: looking at later pages rarely makes sense (if the user really must, ask them to narrow the query conditions and search again).
* When a background batch task needs to fetch data in batches, the page size can be increased, for example to 5,000 records per fetch, which effectively reduces the number of pages (offline access generally goes to a standby database anyway, to avoid impact on the primary).
* When designing the sharded databases, there is usually a companion big-data platform that aggregates the records of all shards, so some paging queries can be delegated to the big-data platform.

8. Sharding strategy

Once the sharding dimension is determined, how are records distributed to the different databases?
There are generally two ways:

* By value range: for example, users with Id 1-9999 go to the first database, 10000-20000 to the second, and so on.
* By modulo: for example, user Id mod n: records with remainder 0 go to the first database, remainder 1 to the second, and so on.
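The two strategies above can be sketched as routing functions; the range size and shard count are illustrative values, not from the text:

```python
NUM_SHARDS = 4      # illustrative
RANGE_SIZE = 10000  # illustrative

def route_by_range(user_id):
    """Id 1-9999 -> shard 0, 10000-19999 -> shard 1, ..."""
    return user_id // RANGE_SIZE

def route_by_mod(user_id):
    """Remainder 0 -> shard 0, remainder 1 -> shard 1, ..."""
    return user_id % NUM_SHARDS

assert route_by_range(9999) == 0 and route_by_range(10000) == 1
assert route_by_mod(8) == 0 and route_by_mod(9) == 1
```

Range routing makes expansion trivial (new ranges go to new shards) but concentrates the newest, hottest data on one node; modulo routing spreads load evenly but makes expansion harder, as discussed in the migration section above.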

There are currently quite a few sharding middleware products on the market. Proxy-based ones include MySQL Proxy and Amoeba; Hibernate Shards is based on the Hibernate framework; Dangdang's sharding-jdbc is JDBC-based; Mogujie's TSharding is a MyBatis-based, Maven-plugin-style solution; and Cobar works by rewriting Spring's iBatis template class.