Trie tree ( Dictionary tree )

One , introduce

What is a dictionary for ? Looking for a word's .

The dictionary tree also plays the role of searching . What to look for ? word .

Look at the following questions :

1, give n Words and m Questions , Ask one word at a time , Answer if the word appears in the word list .

answer : simple !map, not of imposing stature but strong and capable .

good . next

2, give n Words and m Questions , One prefix at a time , How many words are prefixes for answering questions .

answer :map, Take each word apart .

judge:n<=200000,TLE!

This requires a high-level data structure ——Trie tree ( Dictionary tree )

Two , principle

In this article , Suppose all words are made up of lowercase letters only

Yes cat,cash,app,apple,aply,ok Build a dictionary tree , After completion, as shown in the figure below



From this we can see that :

1, The dictionary tree represents letters with edges

2, Word common prefix nodes with the same prefix , Then we can get that each node has at most 26 Child node ( When a word contains only lowercase letters )

3, The root node of the whole tree is empty . Why? ? Easy to insert and find , This will be explained later .

4, Each word ends with a special character , Used in the figure ‘$’, Then from the root node to any one ‘$’ All the letters on the edge indicate a word .

Three , basic operation

A,insert, Insert a word

1. thinking

  It can be seen directly from the figure , Scan the word from left to right , If the letter does not appear under the corresponding root node , Just insert the letter ; Otherwise, go down the dictionary tree , Look at the next letter of the word .

  That raises a question : Where to insert ? The computer will not choose its own location , We need to give it a location , You need to number each letter .

  Let's set the array trie[i][j]=k, Indicates No i Of nodes j The children are numbered k Node of .

  What do you mean ?

  here you are 2 Species number , One is i,k Indicates the location number of the node , This is relative to the whole tree ; The other is j, Represent node i Of j Children of , This is the relative node i In terms of .

  incomprehension ? Look at the picture

  Or words cat,cash,app,apple,aply,ok 

  We'll number them in the first order , gules
Indicates number result . Because it was entered first cat, therefore c,a,t namely 1,2,3, And the input is cash, because c,a Is a common prefix , So from s Start editing ,s yes 4, and so on .

Note that the numbers of the same letters may be different here

 

  Second number , Number of relative nodes , Purple indicates number result .

Because each node has at most 26 Child node , We can follow their dictionary order from 0——25 number , That's their ASCLL code -a Of ASCLL code .

Note that the numbers of the same letters are the same here



  In fact, the children of each node should 0 To ——25, But you'll find a lot of things that don't work at all . For example, the root node in the figure above should be separated 26 Fork . Save space , Which points to use .

  What's the use of numbering like this ?

Back to array trie[i][j]=k.  array trie[i][j]=k, Indicates No i Of nodes j The children are numbered k Node of .

Then the second number is j, The first number is i,k

2, code
void insert()// Insert word s { len=strlen(s);// word s Length of root=0;// The root node number is 0 for(int i=0
;i<len;i++) { int id=s[i]-'a';// Second number if(!trie[root][id])// If you didn't root reach id Prefix of
trie[root][id]=++tot;// insert ,tot Is the first number root=trie[root][id];// Go down the dictionary tree } }
B,search, lookup

There are many ways to find , You can find a prefix , You can also find the whole word .

Again, let's take looking up whether a prefix has appeared as an example

1, thinking

  Scan each letter from left to right , Look down the dictionary tree , Can find the letter , Go down , Otherwise, end the search , There is no such prefix ; Prefix scanning finished , Indicates that there is this prefix .

2, code
bool find() { len=strlen(s); root=0;// Starting from the root node for(int i=0;s[i];i++) { int
x=s[i]-'a';// if(trie[root][x]==0) return false;// with root For head node x Letter does not exist , return 0
root=trie[root][x];// Prepare for next letter query , Go down } return true;// eureka }
3, If you are looking for a word , We use bool variable v[i] Represent node i Is it a sign of the end of a word .

    So finally return Yes v[root], So after inserting each word in the insert operation is , For the last letter of the word v[i] Set as true, Everything else false

4, If it's the number of times the query prefix appears , That's one sum[], Indicate location i Number of visits ,

    So finally return Yes sum[root], Every node accessed in the insert operation , We have to let him sum++

    Here, the number of prefixes is marked at the position after the last letter of the prefix .

  such as : prefix abc The number of occurrences is marked at c In the next position ,

 

Four , Full code

1, Whether the query appears
/* trie tree Storage mode of : Save letters on edge , The node of an edge connects the letters to it
trie[rt][x]=tot:rt Is the last node number ,x It's a letter ,tot Is the next node number */ #include<cstdio> #include<iostream>
#include<algorithm> #include<cstring> #define maxn 2000010 using namespace std;
int tot=1,n; int trie[maxn][26]; //bool isw[maxn]; To query the whole word void insert(char *s,
int rt) { for(int i=0;s[i];i++) { int x=s[i]-'a'; if(trie[rt][x]==0)//
The inserted letter does not appear at the same node before { trie[rt][x]=++tot;// Insert letters in a new position , Otherwise, we will not deal with it } rt=trie[rt][x];
// Prepare for the insertion of the next letter } /*isw[rt]=true; Mark the end of the last letter of the word , Used when querying the whole word */ } bool find(char *s,
int rt) { for(int i=0;s[i];i++) { int x=s[i]-'a'; if(trie[rt][x]==0)return false
;// with rt For head node x Letter does not exist , return 0 rt=trie[rt][x];// Prepare for next letter query } return true; //
When querying the whole word , should return isw[rt] } char s[22]; int main() { tot=0; int rt=1; scanf("%d"
,&n); for(int i=1;i<=n;i++) { cin>>s; insert(s,rt); } scanf("%d",&n); for(int i=
1;i<=n;i++) { cin>>s; if(find(s,rt))printf("YES\n"); else printf("NO\n"); }
return 0; } Array simulation
2, Number of occurrences of query prefix
#include<iostream> #include<cstring> #include<cstdio> #include<algorithm> using
namespace std; int trie[400001][26],len,root,tot,sum[400001]; bool p; int n,m;
char s[11]; void insert() { len=strlen(s); root=0; for(int i=0;i<len;i++) { int
id=s[i]-'a'; if(!trie[root][id]) trie[root][id]=++tot; sum[trie[root][id]]++;//
Prefix save root=trie[root][id]; } } int search() { root=0; len=strlen(s); for(int i=0
;i<len;i++) { int id=s[i]-'a'; if(!trie[root][id]) return 0; root=
trie[root][id]; }//root After this cycle, it becomes the position of the last letter of the prefix return sum[root]; } int main() {
scanf("%d",&n); for(int i=1;i<=n;i++) { cin>>s; insert(); } scanf("%d",&m); for(
int i=1;i<=m;i++) { cin>>s; printf("%d\n",search()); } } Array simulation Array simulation
#include<cstdio> #include<cstring> #include<iostream> #include<algorithm> using
namespace std; char s[11]; int n,m; bool p; struct node { int count; node *
next[26]; }*root; node * build() { node * k=new(node); k->count=0; memset(k
->next,0,sizeof(k->next)); return k; } void insert() { node * r=root; char *
word=s; while(*word) { int id=*word-'a'; if(r->next[id]==NULL) r->next[id]=
build(); r=r->next[id]; r->count++; word++; } } int search() { node * r=root;
char * word=s; while(*word) { int id=*word-'a'; r=r->next[id]; if(r==NULL)
return 0; word++; } return r->count; } int main() { root=build(); scanf("%d",&
n);for(int i=1;i<=n;i++) { cin>>s; insert(); } scanf("%d",&m); for(int i=1
;i<=m;i++) { cin>>s; printf("%d\n",search()); } } Pointer writing
Five , Template questions

hud 1251 Statistical conundrum  http://acm.hdu.edu.cn/showproblem.php?pid=1251
<http://acm.hdu.edu.cn/showproblem.php?pid=1251>

codevs 4189 Dictionaries  http://codevs.cn/problem/4189/ <http://codevs.cn/problem/4189/>