preface :

hashmap Is a very common data structure , It is easy to use , Next, I will give you a deep analysis of this data structure , Let us know when we use it , I know why .

One .Map

first , From the most basic point of view , Let's meet first map What is it . When we write programs, we often encounter data retrieval and other operations , For small programs with hundreds of data , The storage method or retrieval strategy of the data has little impact , But for big data , Efficiency will be far from good . Let's talk about it .

1. Linear search :

Linear retrieval is the most straightforward method , Go through all the data , And find the data you need . Its corresponding data structure is array , Linear structure such as linked list , This method is extremely inefficient for big data , Its time complexity is O(n).

2. Binary search :

Binary search is an improvement of linear search , For example, for 【1,2,3,4,5,6,7,8】, I want to search for a number ( Suppose it is 2), I'll first compare this number with 4( Generally, the number of selected digits is better ) compare , less than 4 Then in 4 To the left of 【1,2,3】 Find in , And 2 compare , equal , We found it , The advantage of this method is that it can save a lot of unnecessary searches , Find only half of the elements in the collection at a time . Its time complexity is O(logn). But there are limits , His number arrangement itself needs to be orderly .

3.Hash Find in table :

okay , Here comes the point ,Hash Watch comes on stage , This is a time complexity of O(1) Retrieval of , That is, no matter how much data you have, you only need to check it once to find the target data . Isn't it amazing ?? Well, it's actually retarded . Let's see the picture below .

You can see that the value in this array is equal to its subscript , For example, I want to save 11, I'll keep it a[11] inside , So when I want to find a number, I can directly correspond to its subscript . It's actually a way of sacrificing space for time , This will take up a lot of memory , But the retrieval speed is very fast , You only need to search once to find the target data .

4.Hash Table changes

Look at the top Hash You must want to ask , If I save only one number 10000, Then I don't want to exist a[10000], Isn't the rest of the space wasted , ok , Nonexistent .Hash The table already has its solution , That is Hash function .Hash The essence of a watch is that it can pass value The feature itself locates to the element subscript of the search set , To quickly find . General Hash The function is : Number to deposit
mod( To find redundancy ) Hash Array length . For example, for the length above 9 Array of ,12 The location of is 12 mod 9=3, That is, existence a3
<>, In this way, large data can be placed .

5.Hash Conflict resolution strategy

Read the above explanation , You must have found a problem , The address from the remainder may be the same . This is what we call Hash conflict , If the data volume is large and Hash The barrel is relatively small , This kind of conflict is very serious . We have taken the following measures to resolve the conflict .

We can see 12 and 0 's location conflicts , Then we change every element of the array into a chain header , Conflicting elements are placed in the linked list , In this way, after finding the corresponding chain header, it will follow the chain list , As for why to use linked list , To save space , Linked list is not stored continuously in memory , So we can make full use of memory .

Java of HashMap

It says so much , That's what we're talking about today HashMap What does it matter ?? All right, basin friends, no side , Get to the point . We know HashMap The values in are key,value Right , Actually, the storage here is very similar to the one above ,key Will be mapped to the address where the data is located , and value It's in the linked list with this address as the head , This kind of data structure can be acquired very quickly . But the problem here is that if hash Small barrel , Large amount of data , It will lead to a very long list . For example, the top one is 11 I want to put 1000 number , whether Hash How to refine functions , The linked list at the back will be very long , such Hash The advantages of the watch no longer exist , On the contrary, it tends to be linear retrieval . okay , Red and black trees show up .


stay jdk1.8 After version ,java Yes HashMap Improved , The length of linked list is greater than 8 When , Store the following data in the red black tree , To speed up retrieval , Let's talk about the red and black trees .

avl tree

To understand the red black tree , First of all avl tree , You know avl tree , The first thing to know is binary tree , It's very simple , A binary tree is one or two child nodes under each parent node , Roughly as shown in the figure below .

When we store data in a binary tree, we put the number larger than the parent node on the right node , Place a number smaller than the parent on the left node , In this way, we only need to compare a number with the parent node when looking for it , Large, go to the right and call recursively , Small, then left recursion . But it has some shortcomings , If it's bad luck, my data is in order , such as 【1,2,3,4,5,6,7】, This will lead to tree imbalance , Binary trees will degenerate into linked lists . So we launched avl tree .

avl Tree is balance tree , He improved the binary tree , Every time we insert a node , It must be ensured that the height difference between the left subtree and the right subtree corresponding to each node does not exceed 1. If it exceeds, adjust the balance , The specific balancing operation is not mentioned here , Four operations —— Sinistral , Left and right , Right and left . Finally, the height of the left and right sides of the binary tree is similar , In this way, when we search, we can search by binary search , It will not degenerate into a linked list .

Two or three trees

There are a lot of articles about red black trees on the Internet , There are all kinds of explanations , But bloggers like to put black and red trees together with two or three trees . Let's take a look at the two or three trees .

notes : This picture is from Baidu

Actually, it's easy to understand , The difference between a binary tree and a binary tree is that it has two nodes and three nodes . There are two sub nodes under two nodes , Two nodes can hold a value , There are three sub nodes under the three nodes , Three nodes can hold two values . Let's talk about the construction of the two and three numbers .

notes : The picture still comes from Baidu , Bloggers' pictures are rubbish .

In fact, the construction of two or three trees is very simple , As shown in the figure , In the picture M Node is a two node ,M sinister EJ Node is a three node . Still big data on the right , Small data to the left . At this time, we will weight the tree if the number can be directly put into two nodes , Go straight in , But if it happens to be in three nodes , Just like in the picture ,Z It's just going to be SX in . Then we need to split the node into two nodes , And the number in the middle is referred to the parent node , Like in the picture X It's on R side . Of course, when the child node is mentioned to the parent node, the number of the parent nodes exceeds two , Just keep going up , Until it's satisfied .


The red black tree is very similar to the two or three trees , Basically, it's a deformation of two or three trees .
The traditional definition of red black tree is to meet the following five characteristics :
(1) Every node or black , Or red .
(2) The root node is black .
(3) Each leaf node (NIL) It's black . [ be careful : Here is the leaf node , Is empty (NIL or NULL) Leaf node of !]
(4) If a node is red , Then its child nodes must be black .
(5) All paths from a node to its children contain the same number of black nodes .
Its characteristic is to add color attribute to each node of the number , Balance by color transformation and node rotation during insertion . In fact, bloggers don't like the above definition very much , Another perspective is to compare it with two or three trees .

Of course, the picture above was also searched .
Red black trees can also be described as :
⑴ Red links are left links .
⑵ No node is connected to two red links at the same time .
⑶ The tree is perfectly black balanced , That is to say, the number of black links on the path of any empty link to the root node is the same .
Here, the connection between nodes is divided into red connection and black connection , Replaced the definition of red node and black node ( The essence is the same ), Define previous black height equality as equal number of black connections . More intuitive .
As shown in the figure , In fact, every step of the operation of the red black tree corresponds to the operation of the two or three trees , If it's two nodes, it's black connection , If there are three nodes, the two numbers are connected in red .

Advantages of Mangrove

Red and black trees avl tree , The efficiency is almost the same when searching , It's all about finding two points by balancing . But the operation efficiency of insertion and deletion is greatly improved . The red and black trees are not like avl Tree like pursuit of absolute balance , He allows for a little partial imbalance , This has little effect on efficiency , But a lot of unnecessary balancing operations are omitted ,avl Tree tuning balance sometimes costs a lot , So it's not as efficient as the red black tree , In many places, there are red and black trees at the bottom ~


HashMap It's a structure of linked list and red black tree , In this way, the memory utilization of linked list and the efficient retrieval of red black tree are used , It's a very happy Data structure of .

Small welfare at the end of the article

I used to use C++ Handwritten avl Tree realization , For those who are a little confused about the course design of data structure in sophomore year, please refer to .

Next notice