Sharding

The basic idea of sharding is to cut a database into multiple parts and place them on different databases (servers), so as to relieve the performance pressure on a single database. Roughly speaking, for a database holding massive data: if there are many tables with a lot of data overall, vertical partitioning is suitable, that is, splitting closely related tables (e.g. those belonging to the same functional module) out onto a separate server. If there are not many tables but each table holds a great deal of data, horizontal partitioning is suitable, that is, splitting the rows of a table across multiple databases (servers) according to some rule (for example, hashing by ID). Of course, in practice the two situations are usually mixed together, and the choice must be made according to the actual circumstances; a combination of vertical and horizontal partitioning may also be used, turning the original database into a matrix-like array of databases (servers) that can be extended almost without limit.


One point deserves attention: when vertical and horizontal partitioning are applied at the same time, the partitioning strategy changes subtly. When only vertical partitioning is considered, arbitrary associations can be kept between the tables that are split out together, so tables can be grouped by "functional module". Once horizontal partitioning is introduced, however, the relationships between tables become severely restricted: usually only one primary table (the table whose ID is hashed) plus its several secondary tables are allowed. In other words, when both kinds of partitioning are applied together, the vertical split can no longer follow "functional modules"; it requires a much finer granularity, one that coincides with, and is arguably identical to, the "aggregate" concept of domain-driven design: the primary table of each shard is exactly the aggregate root of an aggregate! You will then find the database split very finely (there are many shards, but few tables in each shard). To avoid managing too many data sources, and to make full use of each database server's resources, we can place two or more shards whose businesses are similar and whose data grows at a similar rate (their primary tables hold the same order of magnitude of data) into the same data source. Each shard remains independent, with its own primary table and its own hashing on that table's ID; the only constraint is that their hash moduli (the number of nodes) must be the same.

Common middleware for sharding databases and tables

Lightweight, easy-to-use components:

* Dangdang's sharding-jdbc <https://github.com/dangdangdotcom/sharding-jdbc>
* Mogujie's TSharding <https://github.com/baihui212/tsharding>

Powerful, heavyweight middleware:

* sharding <https://github.com/go-pg/sharding>
* TDDL, a smart-client approach (Taobao) <https://github.com/alibaba/tb_tddl>
* Atlas (Qihoo 360) <https://github.com/Qihoo360/Atlas>
* alibaba.cobar (developed by Alibaba's B2B department) <https://github.com/alibaba/cobar>
* MyCAT (a product built on Alibaba's open-source Cobar) <http://www.mycat.org.cn/>
* Oceanus (58.com's database middleware) <https://github.com/58code/Oceanus>
* OneProxy (developed by Fang Xin, Alipay's chief architect) <http://www.cnblogs.com/youge-OneSQL/articles/4208583.html>
* vitess (database middleware developed by Google/YouTube) <https://github.com/youtube/vitess>
Problems to be solved when sharding databases and tables

1. Transaction issues

At present there are two feasible solutions to the transaction problem: distributed transactions, or implementing transactions through joint control by the application and the databases. A brief comparison of the two schemes:

* Scheme 1: use distributed transactions
  * Advantage: managed by the databases themselves; simple and effective
  * Disadvantage: high performance cost, especially as shards multiply
* Scheme 2: joint control by the application and the databases
  * Principle: split a distributed transaction spanning multiple databases into several small transactions, each on a single database, and let the application program coordinate these small transactions
  * Advantage: better performance
  * Disadvantage: requires flexible transaction-control design in the application; if Spring's transaction management is already in use, changing it will be difficult
2. The cross-node join problem

As long as data is partitioned, cross-node joins are unavoidable, though good design and partitioning can reduce how often they occur. The common solution is to query twice: find the relevant ids in the result set of the first query, then issue a second request with those ids to fetch the associated data.
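The two-step approach above can be sketched as follows. This is a minimal illustration with made-up in-memory "shards" and record shapes (`order_shards`, `user_shards`, the mod-based routing) standing in for real databases:

```python
# Sketch of the "query twice" approach to a cross-node join.
# Orders live on order shards; user profiles live on user shards
# keyed by user_id mod shard count. All data here is hypothetical.

order_shards = [
    [{"order_id": 1, "user_id": 10}, {"order_id": 3, "user_id": 12}],
    [{"order_id": 2, "user_id": 11}],
]
user_shard_count = 2
user_shards = [
    {10: {"user_id": 10, "name": "alice"}, 12: {"user_id": 12, "name": "carol"}},
    {11: {"user_id": 11, "name": "bob"}},
]

def fetch_orders_with_users():
    # Step 1: run the first query and collect the foreign-key ids.
    orders = [o for shard in order_shards for o in shard]
    user_ids = {o["user_id"] for o in orders}
    # Step 2: route each id to its shard and fetch the associated
    # rows in a second round of queries.
    users = {uid: user_shards[uid % user_shard_count][uid] for uid in user_ids}
    # Join on the application side.
    return [{**o, "user_name": users[o["user_id"]]["name"]} for o in orders]
```

In a real system step 2 would typically be one `SELECT ... WHERE id IN (...)` per shard rather than per id.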

3. Cross-node count, order by, group by, and aggregate functions

These are all one class of problem, because they must be computed over the entire data set, and most proxies do not merge the results automatically. The solution is similar to the cross-node join: obtain a result on each node, then merge on the application side. Unlike a join, each node's query can execute in parallel, so this is much faster than querying a single large table. But if the result set is large, the application's memory consumption becomes a problem.
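A minimal sketch of this per-shard-then-merge pattern, using a hypothetical group-by-sum over made-up shard data; the parallel execution uses a thread pool in place of real concurrent database queries:

```python
# Per-shard aggregation merged on the application side. The shard
# contents and the (category, quantity) row shape are hypothetical.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

shards = [
    [("books", 3), ("toys", 1)],   # rows on shard 0
    [("books", 2), ("games", 5)],  # rows on shard 1
]

def shard_group_sum(rows):
    # Stands in for "SELECT category, SUM(qty) ... GROUP BY category"
    # executed locally on one shard.
    c = Counter()
    for category, qty in rows:
        c[category] += qty
    return c

def global_group_sum():
    # Each shard's query can run in parallel; the partial results
    # are then merged into a single global aggregate.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(shard_group_sum, shards))
    total = Counter()
    for p in partials:
        total += p
    return dict(total)
```

A global `ORDER BY` works the same way: sort within each shard, then merge the sorted partial results.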

4. Data migration, capacity planning, and scale-out

The scheme from Taobao's integrated business platform team exploits the fact that taking remainders modulo powers of 2 is forward compatible (for example, a number that is 1 mod 4 is also 1 mod 2) to allocate data. This avoids row-level data migration, but table-level migration is still required, and both the scale of each expansion and the number of partitioned tables are constrained. Generally speaking, none of these schemes is ideal; each has shortcomings of some kind, which in itself reflects the difficulty of scaling out a sharded system.
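The forward-compatibility property can be demonstrated directly. This sketch (function names are my own) checks that when the modulus doubles, a row from old bucket b can only land in bucket b or b + old_n, so no row ever moves between unrelated shards:

```python
# Why doubling the modulus is "forward compatible": if id % 2 == r,
# then id % 4 is either r or r + 2, so splitting a shard in two moves
# whole groups of rows predictably, never scattering them.

def bucket(record_id, n):
    return record_id % n

def split_targets(old_bucket, old_n, new_n):
    # Going from old_n to new_n = 2 * old_n shards, rows of old
    # bucket b can only end up in buckets b and b + old_n.
    assert new_n == 2 * old_n
    return {old_bucket, old_bucket + old_n}

# Spot-check the claim over a range of ids:
for i in range(1000):
    assert bucket(i, 4) in split_targets(bucket(i, 2), 2, 4)
```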

5. Transactions

Distributed transactions

Reference: [Research on distributed transactions, two-phase commit, one-phase commit, the Best Efforts 1PC pattern, and transaction compensation](http://blog.csdn.net/bluishglc/article/details/7612811)

* Advantages
  * Based on two-phase commit, it guarantees the atomicity of cross-database operations to the greatest extent; it is the strictest transaction implementation available in a distributed system.
  * Easy to implement, with little effort: most application servers and some standalone distributed transaction coordinators do a great deal of the encapsulation work, so the difficulty and effort of introducing distributed transactions into a project is negligible.
* Disadvantages
  * It is the arch-enemy of the system's horizontal scalability. A distributed transaction based on two-phase commit must coordinate among multiple nodes at commit time, which pushes the commit point as late as possible and objectively stretches the transaction's execution time. That increases the probability of conflicts and deadlocks when transactions access shared resources, and the trend worsens as database nodes are added, shackling the system's ability to scale horizontally at the database level. This is the main reason many sharded systems do not adopt distributed transactions.
Transactions based on the Best Efforts 1PC pattern

See the implementation in spring-data-neo4j
<https://github.com/SpringSource/spring-data-graph/blob/master/spring-data-neo4j/src/main/java/org/springframework/data/neo4j/transaction/ChainedTransactionManager.java>.
Given the performance advantages of the Best Efforts 1PC pattern and its relatively simple implementation, it is the approach adopted by most sharding frameworks and projects.
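The core idea of Best Efforts 1PC (as in the chained transaction manager referenced above) can be sketched as follows. The `Connection` class is a hypothetical stand-in for a real driver, not any actual API:

```python
# Best Efforts 1PC sketch: do all work inside open local transactions,
# then commit them one by one in a fixed order; if a commit fails,
# roll back whatever has not yet committed. The already-committed
# databases stay committed, so atomicity is "best effort" only.

class Connection:
    def __init__(self, name):
        self.name = name
        self.state = "open"
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled_back"

def best_efforts_1pc(connections, work):
    work(connections)          # all writes happen inside open transactions
    committed = []
    try:
        # The window for inconsistency is only the gap between the
        # first and the last commit, which is why this usually works.
        for conn in connections:
            conn.commit()
            committed.append(conn)
    except Exception:
        for conn in connections:
            if conn not in committed:
                conn.rollback()
        raise
```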

Transaction compensation (idempotence)

For systems with high performance requirements but low consistency requirements, real-time consistency is often not essential; reaching eventual consistency within an allowed time window is enough, which makes a transaction compensation mechanism a feasible scheme. Transaction compensation was first proposed for handling "long-running transactions", but it is also a good reference for consistency in distributed systems. In a nutshell, unlike a transaction that is rolled back immediately when an error occurs during execution, compensation is an after-the-fact checking and remedying measure; it only expects a consistent final result within an allowed time window. The implementation of transaction compensation is closely tied to the system's business, and there is no standard treatment. Common implementations include: checking and reconciling the data; log-based comparison; periodic synchronization against an authoritative data source; and so on.
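As one concrete shape of the "check and reconcile" approach, here is a minimal, hypothetical reconciliation job: compare a store against the authoritative source and issue compensating writes for divergent records. The dict-based stores are illustrative only:

```python
# After-the-fact reconciliation: a compensation-style job that repairs
# a replica from an authoritative source. Repairing by overwrite is
# naturally idempotent: running the job twice changes nothing more.

def reconcile(source_of_truth, replica):
    """Repair `replica` in place; return the keys that were fixed."""
    repaired = []
    for key, value in source_of_truth.items():
        if replica.get(key) != value:
            replica[key] = value       # compensating write
            repaired.append(key)
    return sorted(repaired)
```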

6. The ID problem

Once a database is partitioned onto multiple physical nodes, we can no longer rely on the database's own primary-key generation mechanism. On one hand, an ID self-generated by one partition is not guaranteed to be globally unique; on the other hand, the application needs to obtain the ID before inserting the data in order to route the SQL.

Some common primary-key generation strategies

UUID

Using a UUID as the primary key is the simplest scheme, but its drawbacks are very obvious. Because UUIDs are very long, besides occupying a lot of storage space, the main problem lies in the index: both building the index and querying through it suffer performance problems.

Maintaining a Sequence table in the database

The idea of this scheme is also very simple: create a Sequence table in the database, with a structure similar to:

CREATE TABLE `SEQUENCE` (
  `table_name` varchar(18) NOT NULL,
  `nextid` bigint(20) NOT NULL,
  PRIMARY KEY (`table_name`)
) ENGINE=InnoDB

Whenever an ID must be generated for a new record in some table, take that table's nextid from the Sequence table, then add 1 to nextid and write it back for the next use. This scheme is simple too, but its drawbacks are equally obvious: because every insert must access this table, it easily becomes the system's performance bottleneck, and it is also a single point of failure; once its database goes down, the entire application stops working. Master-Slave replication has been proposed, but that only addresses the single point; it cannot relieve the access pressure of a read-write ratio of nearly 1:1.
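The take-and-increment step can be sketched against that table. This uses SQLite (for a self-contained example) in place of MySQL/InnoDB; in MySQL the read would typically be `SELECT ... FOR UPDATE` inside a transaction so that concurrent callers are serialized:

```python
# Fetching the next id from a SEQUENCE table like the one above.
# SQLite stands in for MySQL; table and column names follow the text.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (table_name TEXT PRIMARY KEY, nextid INTEGER NOT NULL)"
)
conn.execute("INSERT INTO sequence VALUES ('t_order', 1)")

def next_id(table_name):
    with conn:  # one transaction per id fetch
        (nid,) = conn.execute(
            "SELECT nextid FROM sequence WHERE table_name = ?", (table_name,)
        ).fetchone()
        conn.execute(
            "UPDATE sequence SET nextid = nextid + 1 WHERE table_name = ?",
            (table_name,),
        )
    return nid
```

A common refinement is to grab a whole range at once (`nextid = nextid + 1000`) and hand out ids from memory, trading a small risk of gaps for far fewer hits on the bottleneck table.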

Twitter's distributed auto-increment ID algorithm, Snowflake
<http://blog.sina.com.cn/s/blog_6b7c2e660102vbi2.html>


In distributed systems there are many occasions that require generating a globally unique ID, and Twitter's snowflake solves this need. The implementation is quite simple: setting configuration aside, the core of the ID is 41 bits of millisecond timestamp, 10 bits of machine ID, and a 12-bit per-millisecond sequence:

* 0 --- 0000000000 0000000000 0000000000 0000000000 0 --- 00000 --- 00000 --- 000000000000

In the layout above, the first bit is unused (it could also serve as the sign bit of the long), the next 41 bits are the millisecond timestamp, then 5 bits of datacenter identifier and 5 bits of machine ID (which in practice often identifies a worker thread rather than a physical machine), and finally a 12-bit count within the current millisecond. That adds up to exactly 64 bits, one Long.


The advantage of this scheme is that the IDs are, on the whole, ordered by time, no ID collisions occur anywhere in the distributed system (the datacenter and machine IDs distinguish the generators), and it is efficient: in tests, snowflake can generate about 260,000 IDs per second, fully meeting the need.
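A minimal generator following the bit layout described above. The epoch constant is Twitter's published value, but any fixed past instant works; this is a single-threaded sketch (a production version would need locking):

```python
# Snowflake-style id: 41 bits of milliseconds since a custom epoch,
# 5 bits datacenter, 5 bits worker, 12 bits per-millisecond sequence.
import time

EPOCH_MS = 1288834974657  # Twitter's epoch; an arbitrary fixed choice

class Snowflake:
    def __init__(self, datacenter_id, worker_id):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0

    def next_id(self):
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
            if self.sequence == 0:        # 4096 ids used: wait for next ms
                while ms <= self.last_ms:
                    ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = ms
        return ((ms - EPOCH_MS) << 22) | (self.datacenter_id << 17) \
               | (self.worker_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, ids from one generator are strictly increasing, which keeps B-tree index inserts append-like, unlike UUIDs.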

7. Sorting and paging across shards

Generally, paging needs to sort by a specified field. When the sort field is the sharding field, the sharding rule easily locates the target shard; when it is not, things get more complicated. For the final result to be accurate, the data must be sorted and returned within each shard node, then the result sets returned by the different shards must be gathered, sorted again, and finally returned to the user.

Fetching the first page of data is the simplest case and does not seem to hurt performance much. But fetching, say, page 10 is much more complicated. Some readers may wonder why it cannot be as simple as the first page (sort out the first 10 pages per shard, then merge and sort). It is not hard to see why: because the data on each shard node may be arbitrary, for the sort to be accurate, the first N pages of data from all shard nodes must be sorted and merged, and then the overall result sorted again. Clearly, such an operation consumes considerable resources, so the further back the user pages, the worse the system performs.
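The page-N procedure above can be sketched as follows, with each "shard" a made-up, already-sorted list; `heapq.merge` does the k-way merge of the sorted partial results:

```python
# Page N across shards: each shard must return its first
# page * page_size rows (sorted); the application merges them all
# and only then takes the requested slice.
import heapq

def page_across_shards(shards, page, page_size):
    """shards: list of per-shard row lists, each already sorted ascending."""
    need = page * page_size
    # Every shard contributes its first `need` rows ...
    partials = [s[:need] for s in shards]
    # ... which are k-way merged (all inputs are sorted) ...
    merged = list(heapq.merge(*partials))
    # ... and only the requested window is returned.
    return merged[(page - 1) * page_size : page * page_size]
```

Note how the work grows with the page number: each shard must produce `page * page_size` rows even though only `page_size` of them are returned.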
How can the paging problem be solved in a sharded database? There are several approaches:

* If paging is offered in a user-facing application, let users see only the first n pages. This restriction is usually reasonable in business terms, since pages far back carry little meaning (if a user must see them, ask them to narrow the query and search again).
* If a background batch job needs to fetch data in bulk, increase the page size, for example fetching 5000 records at a time, which effectively reduces the number of page fetches (offline jobs should generally read from a standby database to avoid impacting the primary).
* A sharded design usually comes with a supporting big-data platform that aggregates the records of all shards, and some paging queries can be run on that platform instead.

8. Sharding strategy

After the sharding dimension is determined, how are records distributed to the different databases? There are generally two ways:

* By value range: for example, users with Id 1-9999 go to the first database, 10000-20000 to the second, and so on.
* By modulus: for example, user Id mod n; records with remainder 0 go to the first database, remainder 1 to the second, and so on.
Comparison:

| Criterion | Range-based sharding | Mod-based sharding |
| --- | --- | --- |
| Number of databases | Starts small, grows on demand with users/business | Fixed up front by the mod factor, usually fairly large |
| Access performance | Few databases early on: full-fleet queries cost little, single-db queries slightly slower | Many databases early on: full-fleet queries cost more, single-db queries slightly faster |
| Adjusting the count | Fairly easy: usually just add databases for new users; splitting an old database affects only that one | Hard: changing the mod factor migrates data across all databases |
| Data hotspots | New and old users differ in activity, so hotspots exist | New and old users spread evenly across databases, no hotspots |

In practice, mod-based sharding is chosen more often because it is simpler to handle. When re-sharding, to ease data migration, the count is generally increased by a multiple: for example 4 databases initially, split into 8, then 16. That way, for any given database, half its data moves to the new database and the rest stays put; by contrast, adding just one database at a time forces a large-scale reshuffle of all the data.

One addition: with mod-based sharding the record counts per database are usually fairly even, but some databases contain "super Ids" whose records far outnumber those of other Ids. In an advertising scenario, for example, one large advertiser's ads may account for a large share of the total; if sharding takes the advertiser Id modulo n, some databases will hold far more records than others. Such super Ids need a dedicated database to store their records.
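The super-Id override can be expressed as a small routing rule layered over the mod strategy. The ids, shard names, and counts below are all made up for illustration:

```python
# Mod-based routing with a dedicated shard for known hot keys
# ("super Ids"), as described above. All values are hypothetical.

SUPER_IDS = {42, 777}      # known hot advertisers, maintained by ops
HOT_SHARD = "shard_hot"
SHARD_COUNT = 4

def route(advertiser_id):
    if advertiser_id in SUPER_IDS:
        return HOT_SHARD                      # dedicated database
    return f"shard_{advertiser_id % SHARD_COUNT}"  # normal mod routing
```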

9. Number of shards

The number of shards depends first on how many records a single database can handle. Generally speaking, once a single MySQL database exceeds about 50 million records, or a single Oracle database exceeds about 100 million, the database comes under heavy pressure (capacity further depends on column count, access patterns, and record length).

Given that constraint, if there are too few shards, the goals of spreading storage and relieving database pressure are not met; if there are many shards, each database holds fewer records and single-database access performs well, but for queries spanning many databases the application must access them all, consuming precious thread resources in a concurrent model or sharply increasing execution time in a serial one.

Finally, the number of shards directly affects hardware investment: each shard generally runs on its own physical machine, so one more database means one more device. How many shards to create therefore requires an overall assessment; for a first sharding, 4-8 databases are generally recommended.

10. Routing transparency

Sharding in a sense means the DB schema has changed, which inevitably affects the application; but the change has nothing to do with the business, so sharding should be kept as transparent to application code as possible, with the sharding logic handled in the data access layer (DAL). Achieving this completely is difficult; as for what the DAL should own versus the application, some suggestions:

For single-database access, e.g. a query whose conditions specify the user Id, the SQL need only touch one particular database. The DAL layer should route to that database automatically; when the databases split again, only the mod factor needs changing, and application code is unaffected.

For simple multi-database queries, the DAL gathers the records returned by the individual databases, again staying transparent to the upper layers.
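A minimal sketch of a DAL router covering exactly those two cases; the in-memory `databases`, row shapes, and shard count are hypothetical stand-ins:

```python
# DAL-level routing: a query carrying the sharding key goes to one
# database; one without it fans out to all databases and the DAL
# merges the rows. Data and names here are illustrative only.

SHARD_COUNT = 2   # the mod factor: the only thing that changes on a split
databases = [
    [{"user_id": 0, "item": "pen"}],
    [{"user_id": 1, "item": "ink"}, {"user_id": 3, "item": "pad"}],
]

def query(user_id=None):
    if user_id is not None:
        # Single-database access: route by the mod factor. Callers
        # never see which database was hit.
        return [r for r in databases[user_id % SHARD_COUNT]
                if r["user_id"] == user_id]
    # Simple multi-database query: gather and merge the rows.
    return [r for db in databases for r in db]
```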

11. Use a framework or build your own?

There are quite a few sharding middlewares on the market today. Proxy-based ones include MySQL Proxy and Amoeba; Hibernate Shards builds on the Hibernate framework; Dangdang's sharding-jdbc <https://github.com/dangdangdotcom/sharding-jdbc> builds on JDBC; Mogujie's TSharding <https://github.com/baihui212/tsharding>, a Maven-plugin-like tool, builds on MyBatis; and Cobar Client works by rewriting Spring's iBatis template class. Each framework has its strengths and weaknesses, and an architect can choose after thorough investigation in light of the project's actual situation. Overall, though, I am personally cautious about choosing a framework. On one hand, most frameworks lack validation by success stories, so their maturity and stability are open to doubt. On the other hand, whether a framework open-sourced from a successful commercial product (such as some of Alibaba's and Taobao's open-source projects) suits your project is something the architect must research and analyze in depth. In the end, the choice must be determined by weighing project characteristics, team situation, technical barriers, learning cost, and other factors together.


Author: jackcooper
Link: http://www.jianshu.com/p/32b3e91aa22c
Source: Jianshu (简书)
Copyright belongs to the author. For commercial reproduction, contact the author for authorization; for non-commercial reproduction, credit the source.