This article is a translation of the official rsync technical report, "The rsync algorithm"
<https://rsync.samba.org/tech_report/tech_report.html>.
It covers the principles of the rsync algorithm and how rsync implements them. Translator's notes have been added in places that may otherwise be hard to follow.

Collection of my translations: <http://www.cnblogs.com/f-ck-need-u/p/7048359.html#mytranslations>

Here is the rsync series:
  1. rsync (1): basic commands and usage <http://www.cnblogs.com/f-ck-need-u/p/7220009.html>
  2. rsync (2): inotify+rsync and sersync in detail <http://www.cnblogs.com/f-ck-need-u/p/7220193.html>
  3. rsync algorithm principles and workflow analysis <http://www.cnblogs.com/f-ck-need-u/p/7226781.html>
  4. rsync technical report (translation) <http://www.cnblogs.com/f-ck-need-u/p/7220753.html>
  5. How rsync works (translation) <http://www.cnblogs.com/f-ck-need-u/p/7221535.html>
  6. man rsync translation (the rsync command manual in Chinese) <http://www.cnblogs.com/f-ck-need-u/p/7221713.html>

Rsync algorithm

Andrew Tridgell and Paul Mackerras
Department of Computer Science 
Australian National University 
Canberra, ACT 0200, Australia


1.1 Abstract

This report describes an algorithm for updating a file on one machine to be consistent with a file on another machine. We assume that the two machines are connected by a low-bandwidth, high-latency bidirectional link. The algorithm identifies the parts of the source file that are identical to parts of the destination file (translator's note: data blocks that can be matched) and sends only the parts that cannot be matched (translator's note: the parts where the two files differ). In effect, the algorithm computes a set of differences between two files located on two machines. It is very efficient when the files are similar, but it still works correctly and reasonably efficiently even when the two files are quite different.


1.2 The problem

Suppose you have two files A and B, and you want to update B so that it is identical to A. The most direct way is to copy A onto B.


But imagine that the two files are on machines connected by a very slow communication link, for example a dial-up IP link. If A is large, copying it to B will be slow. To speed things up, you could compress A before sending it, but that would usually only gain a factor of 2 to 4.


Now suppose that A and B are quite similar, perhaps both derived from the same original file. To really speed things up, you need to take advantage of this similarity. A common method is to send only the differences between A and B down the link and then use this list of differences to reconstruct the file (translator's note: this means you also need to build the list of differences, send it, and finally apply the differences to reassemble the file).


The problem with this general approach is that creating the set of differences between the two files relies on being able to open and read both files. It therefore requires that both files be available at the same end of the link before they can be compared. If one of them cannot be fetched from the other end, the method cannot be used (and if you actually did fetch the file from the other end, both files would then be on the same machine and you could simply copy it directly; there would be no need to compute differences at all). rsync addresses exactly this problem.


The rsync algorithm efficiently computes which parts of the source file match parts of the existing destination file (translator's note: i.e. which data blocks can be matched). Those parts do not need to be sent over the link; all that is needed is a reference to the matching part of the destination file (translator's note: the reference, or basis, file). Only the parts of the source file that cannot be matched are sent verbatim, byte by byte. Afterwards, the receiver can construct a copy of the source file from the existing matched parts and the verbatim data it received.

As a rule, the data sent to the receiver can additionally be compressed with any common compression algorithm for further speed improvements.


1.3 The rsync algorithm

Suppose we have two computers α and β. Computer α has access to a file A, β has access to a file B, and A and B are similar. α and β are connected by a slow communications link.

The rsync algorithm consists of the following steps:

1. β splits file B into a series of non-overlapping, fixed-size blocks of S bytes (translator's note: the authors note that 500 to 1000 bytes is a suitable size). The last block may, of course, be shorter than S bytes.


2. For each such block, β computes two checksums: a 32-bit weak rolling checksum and a 128-bit strong MD4 checksum (translator's note: current versions of rsync use a 128-bit MD5 checksum).

3. β sends these checksums to α.

4. α searches through file A to find all blocks of S bytes (searching from any offset, not only at block boundaries) whose checksums match the checksums of one of B's blocks. This searching and comparison can be done in a single pass very quickly, using a special property of the weak rolling checksum.


(Translator's note: take the string 123456 as an example. To search for all 3-character substrings at any offset, the most basic method is to start at position 1 and get 123, start at position 2 and get 234, start at position 3 and get 345, and so on until the end of the string. That is what "any offset" means: a block of length S may start at any position.)


(Translator's note again: why search at any offset? Consider two identical files A and B, and suppose a piece of data is inserted into the middle of A. From that point on, every following block of A is shifted back by some amount. If blocks were only compared at fixed offsets, then from the inserted data onward none of A's blocks would match B, which in rsync terms would mean all of those blocks had to be transmitted, even though the only real difference between A and B is the inserted segment in the middle.)


5. α sends β a sequence of instructions for constructing a copy of file A. Each instruction is either a reference to a block of B, or literal data. Literal data is sent only for those sections of A that could not be matched to any block of B (translator's note: i.e. the data blocks of A that differ from B).


The end result is that β obtains a copy of A, but only the parts of A that are not already present in B were sent over the link (plus a small amount of checksum and block-index overhead). The algorithm also requires only one round trip on the link (translator's note: the first transfer is β sending the checksums to α; the second is α sending the instructions, checksums, indexes, and the literal data of A that is not present in B back to β), which minimizes the impact of link latency.
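To make steps 1 to 3 concrete, here is a minimal sketch (not rsync's actual implementation) of what β might compute before anything is sent. The block size, the function names, and the use of MD5 in place of the paper's MD4 are all illustrative choices of the translator, not details taken from the report:

```python
import hashlib

S = 700              # illustrative block size; the authors suggest 500-1000 bytes
M = 1 << 16          # modulus used by the weak rolling checksum

def weak_checksum(block):
    """32-bit weak checksum s = a + 2^16 * b, in the spirit of adler-32."""
    a = b = 0
    n = len(block)
    for i, x in enumerate(block):
        a = (a + x) % M
        b = (b + (n - i) * x) % M
    return (b << 16) | a

def block_signatures(data, block_size=S):
    """Steps 1-2: split B into non-overlapping blocks and checksum each one.
    MD5 stands in here for the MD4 strong checksum described in the paper."""
    sigs = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]   # the last block may be shorter than S
        sigs.append((index, weak_checksum(block), hashlib.md5(block).digest()))
    return sigs

# Step 3: β would serialize the resulting list of
# (index, weak32, strong) tuples and send it to α over the link.
```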

The most important details of the algorithm are the rolling checksum and the associated multi-alternate search mechanism, which together allow the all-offsets checksum search (i.e. the search from any offset described in step 4 above) to proceed very quickly. They are discussed in more detail below.

(Translator's note: if you are not interested in the algorithm theory, you can skip the following details and read only the conclusions; it is enough to understand what the rsync algorithm does.)


1.4 Rolling checksum

The weak rolling checksum used in the rsync algorithm needs to have the property that it is very cheap to calculate the checksum of the buffer X_2 ... X_{n+1} given the checksum of the buffer X_1 ... X_n and the values of the bytes X_1 and X_{n+1}.

The weak checksum algorithm we use in rsync was inspired by Mark Adler's adler-32 checksum. Our checksum is defined by

$$a(k,l) = \Bigl(\sum_{i=k}^{l} X_i\Bigr) \bmod M$$

$$b(k,l) = \Bigl(\sum_{i=k}^{l} (l - i + 1)\,X_i\Bigr) \bmod M$$

$$s(k,l) = a(k,l) + 2^{16}\, b(k,l)$$

where s(k,l) is the rolling checksum of the bytes X_k ... X_l. For simplicity and speed of calculation, we use M = 2^16.

The most important property of this checksum is that successive values can be computed very efficiently using the recurrence relations

$$a(k+1, l+1) = \bigl(a(k,l) - X_k + X_{l+1}\bigr) \bmod M$$

$$b(k+1, l+1) = \bigl(b(k,l) - (l - k + 1)\,X_k + a(k+1, l+1)\bigr) \bmod M$$

Thus the checksum of a block of length S can be computed at every offset in the file with very little computation at each point.

Although this checksum is very simple, it is sufficient as a first-level check for a match between two file blocks. We found in practice that the probability of this checksum matching when the block contents differ is quite low. This is important, because the much more expensive strong checksum must be calculated for every block whose weak checksum matches. (Translator's note: in other words, blocks with different contents may still occasionally have the same weak checksum, i.e. the rolling checksum can collide, although the probability is very low; therefore blocks whose weak checksums match must be compared further using the strong checksum.)


1.5 Checksum searching


Once α has received the list of checksums of the blocks of B, it must search file A (at any offset) for blocks that match the checksum of some block of B. The basic strategy is to compute, at every byte offset of A in turn, the 32-bit weak rolling checksum of the S-byte block starting there, and for each such checksum to look for a match in the list of B's checksums. In our implementation, a simple 3-level search scheme is used (translator's note: a 3-step search process).


At the first level, a 16-bit hash is computed for each 32-bit weak rolling checksum, and a hash table with 2^16 entries is used. The checksum list (i.e. the checksums of the blocks of B that were received) is sorted according to this 16-bit hash. Each entry in the hash table points to the first element of the checksum list with that hash value (translator's note: a data block id), or is a null value if no element of the checksum list has that hash value. (Translator's note: one reason null checksums and corresponding null hash values can occur is that β sends a null checksum for files that exist on α but not on β; this way, when α searches, it knows the file does not exist on β and sends the whole file directly.)

For each offset in the file, the 32-bit rolling checksum and its 16-bit hash are calculated. If the hash table entry for that hash value is not null, the second-level check is invoked.

(Translator's note: in other words, the first level compares the 16-bit hash values; if they match, the block is considered a potential match, and the second-level check is entered starting from that entry.)


The second-level check scans the sorted checksum list, starting at the entry the hash table points to (translator's note: the entry corresponding to the block that passed the first-level check), looking for an entry whose 32-bit rolling checksum matches the current value. The scan ends either when such a match is found, or when it reaches an entry with a different 16-bit hash (translator's note: because repeated hash values and weak checksums are rare, usually only one or two further entries need to be scanned before a non-matching entry ends the scan). If a match is found, the third-level check is invoked.


(Translator's note: in other words, the second level compares the 32-bit weak rolling checksums; a match means the block is still a potential match, and the third-level check is entered from that position. If no matching weak checksum is found, the 16-bit hash value was a spurious collision, which the weak-checksum comparison has now ruled out.)


The third-level check computes the strong checksum of the block at the current offset in the file and compares it with the strong checksum in the checksum list. If the two strong checksums match, we assume that the block of A and the block of B are identical. Strictly speaking, the blocks could still be different, but the probability is extremely small, so in practice we consider this a reasonable assumption.


When a matching block is found, α sends β the offset of this block in file A together with the end offset of the previous matching block, as well as the index of the matching block of B (translator's note: its id, i.e. the chunk number). This data is sent as soon as a match is found (translator's note: it includes the reconstruction instruction referring to the matched block as well as the literal data of any unmatched region between two matched blocks), which allows us to overlap communication with further computation.


If the block at the current offset in the file does not match, the weak checksum is rolled down to the next offset and the search continues (translator's note: i.e. it rolls down by one byte). If a match is found, the weak-checksum search restarts at the end offset of the matched block (translator's note: i.e. it rolls down by a whole block). For two files that are almost identical (the most common case), this strategy saves a large amount of computation. In addition, for the common case in which a section of A matches a series of blocks of B in order, encoding the block index numbers is a simple matter.
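The following is a compact sketch of the three-level search as the translator reads it. The 16-bit hash function, the dictionary-based hash table, and the use of MD5 instead of MD4 are simplifications of the translator's, and a real implementation would roll the weak checksum forward instead of recomputing it at every offset:

```python
import hashlib

M = 1 << 16

def weak_checksum(block):
    a = b = 0
    n = len(block)
    for i, x in enumerate(block):
        a = (a + x) % M
        b = (b + (n - i) * x) % M
    return (b << 16) | a

def hash16(weak32):
    """Illustrative 16-bit hash of the 32-bit rolling checksum (the level-1 key)."""
    return (weak32 ^ (weak32 >> 16)) & 0xFFFF

def build_delta(a_data, signatures, S):
    """signatures: list of (block_index, weak32, strong) describing file B.
    Returns instructions: ('ref', block_index) or ('lit', byte_value)."""
    table = {}                                     # level 1: 16-bit hash -> candidate blocks
    for index, weak, strong in signatures:
        table.setdefault(hash16(weak), []).append((weak, strong, index))

    out, k = [], 0
    while k < len(a_data):
        block = a_data[k:k + S]
        weak = weak_checksum(block)                # a real implementation rolls this value
        matched = None
        for cand_weak, cand_strong, index in table.get(hash16(weak), []):
            if cand_weak != weak:                  # level 2: 32-bit weak checksum
                continue
            if hashlib.md5(block).digest() == cand_strong:   # level 3: strong checksum
                matched = index
                break
        if matched is not None:
            out.append(('ref', matched))           # reference to a block of B
            k += S                                 # restart the search after the matched block
        else:
            out.append(('lit', a_data[k]))         # unmatched byte goes out as literal data
            k += 1
    return out
```

Feeding this a signature list built as in section 1.3 yields a delta from which β can reconstruct A by concatenating the referenced blocks of B and the literal bytes.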


1.6 Pipelining

The preceding sections described the process of constructing a copy of one file on a remote system. If we have a number of files to copy, we can gain a considerable latency advantage by pipelining the process.


This involves β starting two independent processes. One of them generates and sends the checksums to α, while the other receives the difference information from α and reconstructs the file copies. (Translator's note: i.e. generator --> sender --> receiver.)

If the communications link is buffered, the two processes can proceed independently of each other, and for most of the time the link can be kept fully utilized in both directions.
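As an illustration only (threads and in-memory queues stand in for β's two processes and the buffered link; the names and the trivial placeholder "delta" are the translator's, not anything from the report), the point is that the signature generator keeps streaming checksums for later files while the receiver is still consuming deltas for earlier ones:

```python
import queue
import threading

def generator(files, to_alpha):
    """β, process 1: stream block signatures for every file without waiting for replies."""
    for name, data in files:
        sigs = [data[i:i + 700] for i in range(0, len(data), 700)]   # placeholder signatures
        to_alpha.put((name, sigs))
    to_alpha.put(None)

def alpha(to_alpha, to_beta):
    """Stand-in for α: answer each signature list with a trivial placeholder delta."""
    while (msg := to_alpha.get()) is not None:
        name, _sigs = msg
        to_beta.put((name, b"<delta>"))
    to_beta.put(None)

def receiver(to_beta):
    """β, process 2: rebuild each file as soon as its delta arrives."""
    while (msg := to_beta.get()) is not None:
        name, delta = msg
        print("rebuilt", name, "from", delta)

to_alpha, to_beta = queue.Queue(), queue.Queue()
files = [("a.txt", b"x" * 2000), ("b.txt", b"y" * 2000)]
workers = [threading.Thread(target=generator, args=(files, to_alpha)),
           threading.Thread(target=alpha, args=(to_alpha, to_beta)),
           threading.Thread(target=receiver, args=(to_beta,))]
for w in workers:
    w.start()
for w in workers:
    w.join()
```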


1.7 Results


To test the algorithm, tar files were created of the Linux kernel source code for two different kernel versions, 1.99.10 and 2.0.0. The tar files are approximately 24 MB in size and the two versions are separated by 5 released patch levels.

Out of the 2441 files in the 1.99.10 release, 291 files had changed in the 2.0.0 release, 19 files had been removed, and 25 files had been added.

Running the standard GNU diff program on the two tar files produced over 32,000 lines of output, totalling 2.1 MB.

The following table shows the results of rsync between the two files with a varying block size.


block size   matches   tag hits   false alarms      data    written      read
       300     64247    3817434            948   5312200    5629158   1632284
       500     46989     620013             64   1091900    1283906    979384
       700     33255     571970             22   1307800    1444346    699564
       900     25686     525058             24   1469500    1575438    544124
      1100     20848     496844             21   1654500    1740838    445204

In each case, the CPU time taken was less than the time it takes to run diff directly on the two files.

The columns in the table are as follows:

block size: the size, in bytes, of the checksummed blocks.

matches: the number of times a block of B was matched at some offset of A. (Translator's note: in contrast to the false alarms below, these are true matches.)

false alarms: the number of times the 32-bit weak rolling checksum matched but the strong checksum did not. (Translator's note: i.e. the number of times the rolling checksums of different blocks spuriously collided, which happens with low probability.)

tag hits: the number of times the 16-bit hash of a rolling checksum from file A matched an entry in B's hash table.

data: the amount of file data transferred verbatim, byte by byte, in bytes.

written: the total number of bytes written by α, including protocol overheads. This is almost entirely file data.

read: the total number of bytes read by α, including protocol overheads. This is almost entirely checksum information.


The results show that for block sizes above 300 bytes, only a small fraction (around 5%) of the file was transferred. The amount transferred was also considerably less than the size of the diff file that would have been transferred if the diff/patch method of updating the remote file had been used.


Each checksum pair takes up 20 bytes: 4 bytes for the weak rolling checksum plus 16 bytes for the 128-bit MD4 checksum. Thus the checksums themselves take up a considerable amount of space, although much less than the amount of literal file data transferred.
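As a rough back-of-the-envelope check by the translator (assuming the tar file is about 24 MB, as stated above), the checksum traffic implied by 20 bytes per block is consistent with the "read" column for the 1100-byte block size:

$$\frac{24 \times 2^{20}\ \text{bytes}}{1100\ \text{bytes/block}} \approx 22{,}900\ \text{blocks},\qquad 22{,}900 \times 20\ \text{bytes} \approx 458{,}000\ \text{bytes},$$

which is close to the 445,204 bytes read by α in that row.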

The false alarms were fewer than 1/1000 of the number of true matches, which shows that the 32-bit rolling checksum is quite good at screening out false matches between different blocks.

The number of tag hits indicates that the second level of the checksum search algorithm was invoked about once for every 50 characters (translator's note: take the block size = 1100 row as an example: roughly 500,000 tag hits × 50 bytes / 1024 / 1024 ≈ 23 MB, about the size of the file). This is quite high because the total number of blocks in the file is a large fraction of the size of the hash table. For smaller files we would expect the tag-hit ratio to be much closer to the number of matches. For extremely large files, we should probably increase the size of the hash table.


The next table shows the results for a much smaller set of files. In this case the files were not packed into a single tar file; instead, rsync was invoked with an option to recurse down the directory tree. The files used were extracted from two versions of another software package called Samba; the total source code size is 1.7 MB, and running diff on the two versions produced 4155 lines of output totalling 120 kB.


block size   matches   tag hits   false alarms      data    written      read
       300      3727       3899              0    129775     153999     83948
       500      2158       2325              0    171574     189330     50908
       700      1517       1649              0    195024     210144     36828
       900      1156       1281              0    222847     236471     29048
      1100       921       1049              0    250073     262725     23988