By constructing hash function with good performance , Can reduce conflict , But generally, it is impossible to avoid conflicts completely , Therefore, conflict resolution is another key issue of hashifa . There are conflicts in creating and finding hash tables , The solution to the conflict should be the same in both cases . Let's take creating a hash table as an example , Explain how to resolve conflicts . There are four common conflict resolution methods :

Open addressing

This method is also called rehash , Its basic idea is : When keyword key Hash address of p=H(key) When there is a conflict , with p Based on , Generate another hash address p1, If p1 Still in conflict , And then p
Based on , Generate another hash address p2,…, Until you find a non conflicting hash address pi , Store the corresponding element in it . This method has a general form of rehash function :

Hi=(H(key)+di)% m   i=1,2,…,n

among H(key) Is a hash function ,m  Table length ,di Called incremental sequence . Different value taking methods of incremental series , The corresponding rehash method is also different . There are mainly three types :

Linear probe rehash

dii=1,2,3,…,m-1

This method is characterized by : At the time of conflict , Sequence view next cell in table , Until you find an empty unit or look through the entire table .

Second probe hashing

di=12,-12,22,-22,…,k2,-k2    ( k<=m/2 )

This method is characterized by : At the time of conflict , Skip around the table , More flexible .

Pseudo random detection rehash

di= Pseudorandom number sequence .

 

Specific implementation , A pseudo-random number generator should be established ,( as i=(i+p) % m), And give a random number as the starting point .

for example , Known hash table length m=11, The hash function is :H(key)= key  %  11, be H(47)=3,H(26)=4,H(60)=5, Suppose the next keyword is 69, be H
(69)=3, And 47 conflict .

If we use linear probe to rehash conflict , The next hash address is H1=(3 + 1)% 11 = 4, Still in conflict , Find the next hash address as H2=(3 + 2)% 11 = 5
, Or conflict , Continue to find the next hash address H3=(3 + 3)% 11 = 6, No more conflicts at this time , take 69 fill 5 Unit No .

If the conflict is handled by hashing with two probes , The next hash address is H1=(3 + 12)% 11 = 4, Still in conflict , Find the next hash address as H2=(3 - 12)% 11 = 2
, No more conflicts at this time , take 69 fill 2 Unit No .

If we use pseudo-random detection to rehash the conflict , And the pseudo-random number sequence is :2,5,9,…….. The next hash address is H1=(3 + 2)% 11 = 5, Still in conflict , Find the next hash address as
H2=(3 + 5)% 11 = 8, No more conflicts at this time , take 69 fill 8 Unit No .

double hashing

This method is to construct several different hash functions at the same time :

Hi=RH1(key)  i=1,2,…,k

When hash address Hi=RH1(key) In case of conflict , Recalculation Hi=RH2(key)……, Until the conflict no longer occurs . This method is not easy to generate aggregation , But increased calculation time .

Chain address method

The basic idea of this method is to set all hash addresses as i A single list of synonym chains is formed by the elements of , And store the header pointer of the single chain table in the i
Of units , So find , Insertion and deletion are mainly in the synonym chain . Chain address method is suitable for frequent insertion and deletion .

 

Establish public overflow area

The basic idea of this method is : The hash table is divided into two parts: basic table and overflow table , All elements in conflict with the basic table , Fill in the overflow table .
   


Advantages and disadvantages

Open hash (open hashing)/ Zipper method ( For bucket chain structure )
1) advantage : ① For frequent changes in the total number of records , Handled it well ( That is to avoid the overhead of dynamic adjustment ) ②
Because records are stored in nodes , And nodes are dynamic allocation , No memory waste , So it's especially suitable for the size of the record itself (size) Big picture , Because the cost of the pointer is negligible ③
When deleting records , More convenient , Directly through pointer operation   2) shortcoming :
① Stored records are randomly distributed in memory , In this way, when querying records , Compact data types ( Like arrays ), Jump access of hash table will bring extra time cost ② If all key-value
Yes, it can be predicted in advance , When it doesn't change ( That is, insertion and deletion are not allowed ), You can create a perfect hash function that doesn't conflict (perfect hash
function), The performance of closed hash is much better than that of open hash ③ Due to the use of pointers , Records are not easy to serialize (serialize) operation
Closed hash (closed hashing)/ Open addressing
1) advantage : ① Records are easier to serialize (serialize) operation ② If the total number of records is predictable , Perfect hash function can be created , At this time, the efficiency of data processing is very high   2) shortcoming : ①
The number of stored records cannot exceed the length of the bucket array , If it exceeds, it needs to be expanded , And capacity expansion will cause the time cost of an operation to soar , This may be a serious flaw in real-time or interactive applications
② Use detection sequence , It is possible that the calculated time cost is too high , Resulting in poor processing performance of hash table
③ Because the records are stored in the bucket array , There must be empty slots in the bucket array , So when you record your own size (size) When it's large and the total number of records is large , The space occupied by empty slots will lead to obvious memory waste
④ When deleting records , More trouble . For example, records need to be deleted a, record b Yes a Then insert the , But and record a There is a conflict , It's the address found by skipping through the detection sequence again , So if you delete it directly a,a The position of becomes empty , And empty slot is the termination condition of query record failure , This will result in a record b stay a Location of is not visible until data is reinserted , So it can't be deleted directly a, Instead, set the delete flag . This requires extra space and operation .