By constructing hash function with good performance, Can reduce conflict, But generally, it is impossible to avoid conflicts completely, Therefore, conflict resolution is another key issue of hashifa. There are conflicts in creating and finding hash tables, The solution to the conflict should be the same in both cases. Let's take creating a hash table as an example, Explain how to resolve conflicts. There are four common conflict resolution methods:

Open addressing

This method is also called rehash, Its basic idea is: When keywordkey Hash address ofp=H(key) When there is a conflict, withp Based on, Generate another hash addressp1, Ifp1 Still conflict, Again withp
Based on, Generate another hash addressp2,…, Until you find a non conflicting hash addresspi , Store the corresponding element in it. This method has a general form of rehash function:

Hi=(H(key)+di)% m   i=1,2,…,n

amongH(key) Is a hash function,m  Table length,di Called incremental sequence. Different value taking methods of incremental series, The corresponding rehash method is also different. There are mainly three types:

Linear probe rehash

dii=1,2,3,…,m-1

This method is characterized by: At the time of conflict, Sequence view next cell in table, Until you find an empty unit or look through the entire table.

Second probe hashing

di=12,-12,22,-22,…,k2,-k2    ( k<=m/2 )

This method is characterized by: At the time of conflict, Skip around the table, More flexible.

Pseudo random detection rehash

di= Pseudorandom number sequence.

 

Specific implementation, A pseudo-random number generator should be established,( asi=(i+p) % m), And give a random number as the starting point.

for example, Known hash table lengthm=11, The hash function is:H(key)= key  %  11, beH(47)=3,H(26)=4,H(60)=5, Suppose the next keyword is69, beH
(69)=3, And47 conflict.

If we use linear probe to rehash conflict, The next hash address isH1=(3 + 1)% 11 = 4, Still conflict, Find the next hash address asH2=(3 + 2)% 11 = 5
, Or conflict? Continue to find the next hash addressH3=(3 + 3)% 11 = 6, No more conflicts at this time, take69 Fill5 Unit number.

If we use the second probe to rehash the conflict, The next hash address isH1=(3 + 12)% 11 = 4, Still conflict, Find the next hash address asH2=(3 - 12)% 11 = 2
, No more conflicts at this time, take69 Fill2 Unit number.

If we use pseudo-random detection to rehash the conflict, And the pseudo-random number sequence is:2,5,9,…….. The next hash address isH1=(3 + 2)% 11 = 5, Still conflict, Find the next hash address as
H2=(3 + 5)% 11 = 8, No more conflicts at this time, take69 Fill8 Unit number.

double hashing

This method is to construct several different hash functions at the same time:

Hi=RH1(key)  i=1,2,…,k

When hash addressHi=RH1(key) In case of conflict, Re calculationHi=RH2(key)……, Until the conflict no longer occurs. This method is not easy to generate aggregation, But increased calculation time.

Chain address method

The basic idea of this method is to set all hash addresses asi A single list of synonym chains is formed by the elements of, And store the header pointer of the single chain table in thei
Units in, So look for, Insertion and deletion are mainly in the synonym chain. Chain address method is suitable for frequent insertion and deletion.

 

Establish public overflow area

The basic idea of this method is: The hash table is divided into two parts: basic table and overflow table, All elements in conflict with the basic table, Fill in the overflow table.
   


Advantages and disadvantages

Open hash(open hashing)/ Zipper method( For bucket chain structure)
1) Advantage: ① For frequent changes in the total number of records, Handled it well( That is to avoid the overhead of dynamic adjustment) ②
Because records are stored in nodes, And nodes are dynamic allocation, No memory waste, So it's especially suitable for the size of the record itself(size) Big picture, Because the cost of the pointer is negligible ③
When deleting records, More convenient, Directly through pointer operation   2) shortcoming:
① Stored records are randomly distributed in memory, In this way, when querying records, Compact data types( For example, array), Jump access of hash table will bring extra time cost ② If all key-value
Yes, it can be predicted in advance, When it doesn't change( That is, insertion and deletion are not allowed), You can create a perfect hash function that doesn't conflict(perfect hash
function), The performance of closed hash is much better than that of open hash ③ Due to the use of pointers, Records are not easy to serialize(serialize) operation
Closed hash(closed hashing)/ Open addressing
1) Advantage: ① Records are easier to serialize(serialize) operation ② If the total number of records is predictable, Perfect hash function can be created, At this time, the efficiency of data processing is very high   2) shortcoming: ①
The number of stored records cannot exceed the length of the bucket array, If it exceeds, it needs to be expanded, And capacity expansion will cause the time cost of an operation to soar, This may be a serious flaw in real-time or interactive applications
② Use detection sequence, It is possible that the calculated time cost is too high, Resulting in poor processing performance of hash table
③ Because the records are stored in the bucket array, There must be empty slots in the bucket array, So when you record your own size(size) When it's large and the total number of records is large, The space occupied by empty slots will lead to obvious memory waste
④ When deleting records, More trouble. For example, records need to be deleteda, Recordb Is ina Then insert the, But and recorda Conflict, It's the address found by skipping through the detection sequence again, So if you delete it directlya,a The position of becomes empty, And empty slot is the termination condition of query record failure, This will result in a recordb staya Location of is not visible until data is reinserted, So it can't be deleted directlya, Instead, set the delete flag. This requires extra space and operation.