solve hash Three methods of conflict
By constructing hash function with good performance , Can reduce conflict , But generally, it is impossible to avoid conflicts completely , Therefore, conflict resolution is another key issue of hashifa . There are conflicts in creating and finding hash tables , The solution to the conflict should be the same in both cases . Let's take creating a hash table as an example , Explain how to resolve conflicts . There are four common conflict resolution methods ：
This method is also called rehash , Its basic idea is ： When keyword key Hash address of p=H（key） When there is a conflict , with p Based on , Generate another hash address p1, If p1 Still in conflict , And then p
Based on , Generate another hash address p2,…, Until you find a non conflicting hash address pi , Store the corresponding element in it . This method has a general form of rehash function ：
Hi=（H（key）+di）% m i=1,2,…,n
among H（key） Is a hash function ,m Table length ,di Called incremental sequence . Different value taking methods of incremental series , The corresponding rehash method is also different . There are mainly three types ：
Linear probe rehash
This method is characterized by ： At the time of conflict , Sequence view next cell in table , Until you find an empty unit or look through the entire table .
Second probe hashing
di=12,-12,22,-22,…,k2,-k2 ( k<=m/2 )
This method is characterized by ： At the time of conflict , Skip around the table , More flexible .
Pseudo random detection rehash
di= Pseudorandom number sequence .
Specific implementation , A pseudo-random number generator should be established ,（ as i=(i+p) % m）, And give a random number as the starting point .
for example , Known hash table length m=11, The hash function is ：H（key）= key % 11, be H（47）=3,H（26）=4,H（60）=5, Suppose the next keyword is 69, be H
（69）=3, And 47 conflict .
If we use linear probe to rehash conflict , The next hash address is H1=（3 + 1）% 11 = 4, Still in conflict , Find the next hash address as H2=（3 + 2）% 11 = 5
, Or conflict , Continue to find the next hash address H3=（3 + 3）% 11 = 6, No more conflicts at this time , take 69 fill 5 Unit No .
If the conflict is handled by hashing with two probes , The next hash address is H1=（3 + 12）% 11 = 4, Still in conflict , Find the next hash address as H2=（3 - 12）% 11 = 2
, No more conflicts at this time , take 69 fill 2 Unit No .
If we use pseudo-random detection to rehash the conflict , And the pseudo-random number sequence is ：2,5,9,…….. The next hash address is H1=（3 + 2）% 11 = 5, Still in conflict , Find the next hash address as
H2=（3 + 5）% 11 = 8, No more conflicts at this time , take 69 fill 8 Unit No .
This method is to construct several different hash functions at the same time ：
When hash address Hi=RH1（key） In case of conflict , Recalculation Hi=RH2（key）……, Until the conflict no longer occurs . This method is not easy to generate aggregation , But increased calculation time .
Chain address method
The basic idea of this method is to set all hash addresses as i A single list of synonym chains is formed by the elements of , And store the header pointer of the single chain table in the i
Of units , So find , Insertion and deletion are mainly in the synonym chain . Chain address method is suitable for frequent insertion and deletion .
Establish public overflow area
The basic idea of this method is ： The hash table is divided into two parts: basic table and overflow table , All elements in conflict with the basic table , Fill in the overflow table .
Advantages and disadvantages
Open hash （open hashing）/ Zipper method （ For bucket chain structure ）
1） advantage ： ① For frequent changes in the total number of records , Handled it well （ That is to avoid the overhead of dynamic adjustment ） ②
Because records are stored in nodes , And nodes are dynamic allocation , No memory waste , So it's especially suitable for the size of the record itself （size） Big picture , Because the cost of the pointer is negligible ③
When deleting records , More convenient , Directly through pointer operation 2） shortcoming ：
① Stored records are randomly distributed in memory , In this way, when querying records , Compact data types （ Like arrays ）, Jump access of hash table will bring extra time cost ② If all key-value
Yes, it can be predicted in advance , When it doesn't change （ That is, insertion and deletion are not allowed ）, You can create a perfect hash function that doesn't conflict （perfect hash
function）, The performance of closed hash is much better than that of open hash ③ Due to the use of pointers , Records are not easy to serialize （serialize） operation
Closed hash （closed hashing）/ Open addressing
1） advantage ： ① Records are easier to serialize （serialize） operation ② If the total number of records is predictable , Perfect hash function can be created , At this time, the efficiency of data processing is very high 2） shortcoming ： ①
The number of stored records cannot exceed the length of the bucket array , If it exceeds, it needs to be expanded , And capacity expansion will cause the time cost of an operation to soar , This may be a serious flaw in real-time or interactive applications
② Use detection sequence , It is possible that the calculated time cost is too high , Resulting in poor processing performance of hash table
③ Because the records are stored in the bucket array , There must be empty slots in the bucket array , So when you record your own size （size） When it's large and the total number of records is large , The space occupied by empty slots will lead to obvious memory waste
④ When deleting records , More trouble . For example, records need to be deleted a, record b Yes a Then insert the , But and record a There is a conflict , It's the address found by skipping through the detection sequence again , So if you delete it directly a,a The position of becomes empty , And empty slot is the termination condition of query record failure , This will result in a record b stay a Location of is not visible until data is reinserted , So it can't be deleted directly a, Instead, set the delete flag . This requires extra space and operation .