
1. Basics

Before introducing the I/O models, let's first walk through the "experience" of a piece of data during a typical I/O operation. As shown in the figure:



When a program, or an existing process/thread (hereafter referred to simply as a process), needs a piece of data, it can only access and modify memory in its own user space. This memory is called the app buffer. Suppose the required data is on disk. The process must first initiate the relevant system call to ask the kernel to load the file from disk. Normally, however, the data can only be loaded into a buffer in kernel space, which we will call the kernel buffer. After the data is loaded into the kernel buffer, it must also be copied into the app buffer. Only then can the process access and modify the data.

This raises a few questions.

(1). Why can't the data be loaded directly into the app buffer?

Actually, it can. To improve efficiency and performance, some programs or hardware implement kernel bypass: the kernel is taken out of the loop, and data is transferred directly between the storage device and the app buffer. RDMA, for example, is a technology that relies on this kind of kernel bypass.

However, in the most common cases, for safety and stability, data must first be copied into the kernel buffer in kernel space and then copied into the app buffer, to prevent processes from reaching directly into kernel space.

(2). Are the data copies mentioned above performed in the same way?

No. Today's storage devices (including network cards) virtually all support DMA. What is DMA (direct memory access)? In a nutshell, data can be transferred directly between memory and a device without involving the computer's CPU; the transfer is controlled by a chip on the hardware itself (which can be thought of as a small CPU of its own).


Suppose a storage device did not support DMA. Then for every transfer between memory and the device, the CPU would have to compute which addresses to read from, which addresses to copy to, how much data to copy (how many data blocks, and where they are), and so on. For even a single transfer, the CPU would have a lot of work to do. DMA frees the CPU from this work so it can handle other tasks.

The copy between the kernel buffer and the app buffer, however, is a transfer between two regions of memory, and this can only be driven by the CPU.

So, loading data from the hard disk into the kernel buffer is a DMA copy, while the copy from the kernel buffer to the app buffer is a copy that requires CPU participation.
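The kernel-buffer-to-app-buffer copy is exactly what a plain read() performs. Below is a minimal sketch (the path /tmp/io_demo.txt is an arbitrary choice for illustration): the kernel loads the file into its buffer (DMA), and read() then copies it into our app buffer (CPU copy).

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a small file, then read it back through the kernel:
 * disk -> kernel buffer (DMA copy) -> app buffer (CPU copy). */
ssize_t read_into_app_buffer(char *app_buffer, size_t len)
{
    const char *path = "/tmp/io_demo.txt";   /* arbitrary demo path */
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    write(fd, "hello io", 8);                /* app buffer -> kernel buffer */
    close(fd);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, app_buffer, len);   /* kernel buffer -> app buffer */
    close(fd);
    unlink(path);
    return n;
}
```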

(3). What if the data is to be transmitted over a TCP connection?

For example, when a web service sends response data to a client, it must be transmitted to the client over a TCP connection.

The TCP/IP stack maintains two buffers per socket: the send buffer and the recv buffer, collectively called the socket buffers. Data to be transmitted over a TCP connection must first be copied into the send buffer, then copied to the network card and sent out over the network. When data is received over a TCP connection, it first enters the recv buffer via the network card, and is then copied into the app buffer in user space.

Likewise, when data is copied into the send buffer, or copied from the recv buffer into the app buffer, the copy involves the CPU. The copy from the send buffer to the network card, or from the network card into the recv buffer, is a DMA operation.

The figure below shows the process of transferring data over a TCP connection.



(4). Must network data always go through the copies kernel buffer -> app buffer -> send buffer?

No. If the process does not need to modify the data and just sends it to the other end of the TCP connection, the data does not have to be copied from the kernel buffer to the app buffer; it can be copied directly to the send buffer. This is zero copy.

For example, when httpd does not need to inspect or modify any of the data, it could copy the local data into the app buffer and then copy it from there into the send buffer for transmission, but the copy into the app buffer can be omitted entirely. With zero-copy techniques, one copy is eliminated and efficiency improves.
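On Linux, one common zero-copy interface is sendfile(2), which moves data from one descriptor to another entirely inside the kernel, never touching the app buffer. A minimal sketch under the assumption of a modern kernel (since Linux 2.6.33 the output descriptor no longer has to be a socket, so a file-to-file copy is used here; in a real httpd the output would be the client socket):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Copy a file without ever pulling the data into user space:
 * the kernel buffer feeds the output descriptor directly. */
ssize_t zero_copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    int out = open(dst, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (in < 0 || out < 0)
        return -1;
    off_t off = 0;
    ssize_t sent = sendfile(out, in, &off, 1 << 20);  /* at most 1 MiB */
    close(in);
    close(out);
    return sent;
}
```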

Of course, there are several ways to implement zero copy; see my other article on the subject: Zero copy (zero-copy) technology
<http://www.cnblogs.com/f-ck-need-u/p/7615914.html>.

Here is the complete data flow when httpd handles a request for a file.



A general explanation: the client initiates a request for a file over a TCP connection. The request data enters the TCP recv buffer, and httpd reads it into the app buffer via the recv() function. The httpd worker process then parses the data and learns that a file is being requested, so it initiates a system call to read that file (for example read()). The kernel loads the file: the data is copied from disk into the kernel buffer and then into the app buffer. At this point httpd starts building the response; it may modify the data, for example adding a field to the response header. Finally, the modified (or unmodified) data is copied (for example via the send() function) into the send buffer and transmitted to the client over the TCP connection.
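That request/response flow can be sketched with recv() and send() over a connected socket pair (socketpair(2) stands in here for a real accepted TCP connection, and the "modification" is a hypothetical one-byte response marker prepended in user space):

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Read a "request" from the peer into the app buffer, modify it in
 * user space, and send the result back: recv buffer -> app buffer
 * (CPU copy), then app buffer -> send buffer (CPU copy). */
ssize_t handle_request(int sock)
{
    char app_buffer[128];
    ssize_t n = recv(sock, app_buffer, sizeof(app_buffer), 0);
    if (n <= 0)
        return -1;
    char resp[129];
    resp[0] = '+';                          /* hypothetical header byte */
    memcpy(resp + 1, app_buffer, (size_t)n);
    return send(sock, resp, (size_t)n + 1, 0);
}
```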


2. I/O Model

A so-called I/O model describes the state a process is in while waiting for I/O and how the data is handled. From the process's point of view there are two phases: data arriving in the kernel buffer, and data moving on to the app buffer. The process of copying data into the kernel buffer is called the data preparation phase, and the process of copying data from the kernel buffer into the app buffer is called the data copy phase. Keep these two concepts in mind; they are used throughout the descriptions of the I/O models below.


This article uses httpd serving a local file over a TCP connection as its running example. Please ignore whether httpd really works this way or has these exact functions, and ignore the details of how TCP processes the data; this is only an example for ease of explanation. Also, using local files as the subject of the I/O models is not entirely appropriate, since the interesting part is sockets. If you want to understand the socket I/O models involved in handling TCP/UDP, after reading this article combine it with my other article "You don't really know sockets and the TCP connection process <http://www.cnblogs.com/f-ck-need-u/p/7623252.html>" and revisit the I/O models.

Once again: transferring data from hardware into memory does not require the CPU, while memory-to-memory transfers do.


2.1 Blocking I/O Model

As shown in the figure:



Suppose a client requests the file index.html. httpd needs to load the data of index.html from disk into its own app buffer, then copy it into the send buffer to be sent out.

When httpd wants to load index.html, it first checks whether its own app buffer already contains the data for index.html. If not, it initiates a system call, such as read(), to ask the kernel to load the data. The kernel first checks whether its own kernel buffer already holds the data for index.html; if not, it loads the data from disk. Once the data is prepared in the kernel buffer, it is copied into the app buffer, and finally the httpd process can work on it.

With the blocking I/O model:

(1). Under the blocking I/O model, httpd is blocked for the whole duration, from initiating the system call until the copy completes.
(2). Only when the data has been fully copied into the app buffer, or an error occurs, is httpd woken up to process the data in the app buffer.
(3). There are two CPU context switches: user space to kernel space, then back to user space.
(4). Because the data preparation phase does not require the CPU, the CPU can handle tasks of other processes while the data is being prepared.
(5). The data copy phase does require the CPU; blocking httpd arguably helps the copy proceed faster.
(6). This is the easiest and simplest I/O model.
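Blocking is the default behavior of every descriptor. In the sketch below, a pipe stands in for the slow data source: the parent's read() does not return until the child has produced data, so the parent sleeps, blocked, for the entire wait.

```c
#include <assert.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* read() on a default (blocking) descriptor puts the caller to sleep
 * until the data-preparation and data-copy phases have finished. */
ssize_t blocking_read_demo(char *buf, size_t cap)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {                     /* child: the "slow" producer */
        close(pipefd[0]);
        usleep(100 * 1000);             /* data not ready for 100 ms */
        write(pipefd[1], "ready", 5);
        _exit(0);
    }
    close(pipefd[1]);
    ssize_t n = read(pipefd[0], buf, cap);  /* blocks until child writes */
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```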

As shown below:




2.2 Non-Blocking I/O Model

(1). When the descriptor is set to non-blocking, after httpd initiates the first system call (such as read()), the error value EWOULDBLOCK is returned immediately (as for whether read() on a regular file really returns EWOULDBLOCK, please set that aside; the I/O models are mainly about socket files, so just think of read() as recv()), instead of httpd being put to sleep. UNP describes exactly this:
When we set a socket to be nonblocking, we are telling the kernel "when an I/O
operation that I request cannot be completed without putting the process to
sleep, do not put the process to sleep, but return an error instead."
(2). Although read() returns immediately, httpd keeps issuing read() to ask the kernel: has the data been copied into the kernel buffer yet? This is called polling. On every poll, as long as the kernel has not prepared the data, read() returns the error EWOULDBLOCK.
(3). Once the data in the kernel buffer is ready, the next poll no longer returns EWOULDBLOCK; instead httpd is blocked while waiting for the data to be copied into the app buffer.
(4). httpd is not blocked during the data preparation phase, but keeps polling with read(); it is blocked during the data copy phase, handing the CPU to the kernel to copy the data into the app buffer.
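This polling loop can be sketched with a pipe whose read end is set to O_NONBLOCK (a forked child stands in for the kernel's data preparation; the delays are arbitrary demo values):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Poll a non-blocking descriptor: read() keeps failing with
 * EWOULDBLOCK/EAGAIN until the data-preparation phase completes.
 * Returns the number of failed polls, or -1 on error; *out_n
 * receives the bytes finally read. */
int nonblocking_poll_demo(char *buf, size_t cap, ssize_t *out_n)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;
    fcntl(pipefd[0], F_SETFL, O_NONBLOCK);  /* make reads non-blocking */

    pid_t pid = fork();
    if (pid == 0) {
        close(pipefd[0]);
        usleep(100 * 1000);                 /* data ready after 100 ms */
        write(pipefd[1], "ok", 2);
        _exit(0);
    }
    close(pipefd[1]);

    int polls = 0;
    ssize_t n;
    while ((n = read(pipefd[0], buf, cap)) == -1 &&
           (errno == EWOULDBLOCK || errno == EAGAIN)) {
        polls++;                            /* kernel buffer not ready yet */
        usleep(5 * 1000);
    }
    *out_n = n;
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return polls;
}
```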

As shown below:




2.3 I/O Multiplexing Model


Called the multiplexing I/O model, or I/O multiplexing, this means being able to check the waiting status of multiple I/O channels at once. There are three I/O multiplexing mechanisms: select, poll and epoll. They are all essentially functions used to monitor the readiness of specified file descriptors, where "ready" means a system call would no longer block; for read(), it means the data is ready. The readiness conditions are readable, writable and exceptional, where the readable condition includes whether data is ready. When a descriptor becomes ready, the process is notified and then issues the actual data-operation system call, such as read(). So these three functions only deal with whether the data is ready and how to notify the process. They can be combined with either blocking or non-blocking I/O: for example, in non-blocking mode, select()/poll()/epoll will not block on the corresponding descriptors, and the calling process/thread will not be blocked.


select() and poll() are nearly the same: their monitoring and notification mechanisms are identical, except that poll() is a bit smarter. So here only select() is briefly introduced, for I/O multiplexing on a single file request. The more specific details, monitoring multiple files, and how epoll works are covered at the end of this article.


(1). When httpd wants to load a file and initiates a read() system call, whether blocking or non-blocking, read()'s return depends on whether the data is ready. Could we instead actively monitor whether the data has arrived in the kernel buffer, or monitor whether new data can be written into the send buffer? That is exactly what select()/poll()/epoll do.

(2). With select(), httpd issues a select() call, and then the httpd process is "blocked" by select(). Since we assume here that only one requested file is monitored, select() wakes httpd directly as soon as the data is ready in the kernel buffer. "Blocked" is in quotes because select() takes a timeout option that controls how long it blocks: if set to 0, select() does not block, meaning it returns immediately but keeps polling to check readiness; it can also be set to block indefinitely.

(3). When the object select() is monitoring becomes ready, the httpd process is notified (polling case) or woken up (blocking case). httpd then issues the read() system call again, the data is copied from the kernel buffer into the app buffer, and read() succeeds.
(4). httpd is blocked after initiating this second system call (read()), and the CPU is handed entirely to the kernel to copy the data into the app buffer.

(5). When httpd handles only a single connection, the I/O multiplexing model is no better than the blocking I/O model, because it makes two system calls (select() and read()), and in the polling case it even consumes CPU continuously. The advantage of I/O multiplexing is being able to monitor many file descriptors at once.
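The single-descriptor case can be sketched as follows: select() blocks until the pipe becomes readable, then a plain read() performs the data copy phase (a forked child stands in for the data source).

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/wait.h>
#include <unistd.h>

/* Block in select() until the descriptor is readable, then read() it.
 * Returns the number of bytes read, or -1 on error. */
ssize_t select_then_read(char *buf, size_t cap)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0) {
        close(pipefd[0]);
        usleep(50 * 1000);                  /* data ready after 50 ms */
        write(pipefd[1], "hi", 2);
        _exit(0);
    }
    close(pipefd[1]);

    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(pipefd[0], &rset);
    /* NULL timeout: block until the descriptor is ready */
    int nready = select(pipefd[0] + 1, &rset, NULL, NULL, NULL);
    ssize_t n = -1;
    if (nready == 1 && FD_ISSET(pipefd[0], &rset))
        n = read(pipefd[0], buf, cap);      /* data copy phase */
    close(pipefd[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```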

As shown in the figure:



For a more detailed description, see the end of this article.


2.4 Signal-driven I/O Model


The signal-driven I/O model. With signal-driven I/O enabled, a signal-handling system call, such as sigaction(), is issued first, and this system call returns immediately. When the data is ready, a SIGIO signal is sent to the process; on receiving this signal, the process knows the data is ready and issues the system call that operates on the data, such as read().

After the signal-handling system call is issued, the process is not blocked; but while read() copies the data from the kernel buffer into the app buffer, the process is blocked. As shown in the figure:




2.5 Asynchronous I/O Model


The asynchronous I/O model. Under this model, httpd first issues an asynchronous system call (such as aio_read() or aio_write()) and returns immediately. This asynchronous call tells the kernel not only to prepare the data, but also to copy it all the way into the app buffer.

From the moment the call returns until the data has been copied into the app buffer, httpd is never blocked. When the copy into the app buffer completes, a signal is sent to notify the httpd process.

As shown in the figure :



It looks perfectly asynchronous, but note: copying the data from the kernel buffer into the app buffer requires the CPU, which means the unblocked httpd competes with the asynchronous operation for CPU. The greater the concurrency, the more connections httpd may be handling, the fiercer the CPU contention, and the slower the asynchronous call signals success. If this problem is not handled well, the asynchronous I/O model is not necessarily better.
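A minimal POSIX AIO sketch with aio_read(). This assumes glibc's implementation (older systems need linking with -lrt), and completion is detected here by polling aio_error() rather than by a signal, purely to keep the sketch deterministic:

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Submit an asynchronous read and return the number of bytes that
 * eventually land in the app buffer.  The caller never blocks in a
 * read(); it only checks completion status now and then. */
ssize_t aio_demo(char *app_buffer, size_t cap)
{
    const char *path = "/tmp/aio_demo.txt";   /* arbitrary demo path */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    write(fd, "async", 5);

    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = app_buffer;
    cb.aio_nbytes = cap;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0)                   /* returns immediately */
        return -1;
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);                         /* free to do other work */

    ssize_t n = aio_return(&cb);              /* bytes copied to app buffer */
    close(fd);
    unlink(path);
    return n;
}
```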


2.6 The distinction between synchronous and asynchronous I/O, blocking and non-blocking


Blocking, non-blocking, I/O multiplexing and signal-driven I/O are all synchronous I/O models, because the system call that actually operates on the data (read() in this article) blocks the process. Note that although the data preparation stage (loading data into the kernel buffer) may or may not block, the kernel buffer is what read() operates on, and "synchronous" here means keeping the kernel buffer and the app buffer in sync. Clearly, while the kernel buffer and the app buffer are being synchronized, the process must be blocked; otherwise read() would become an asynchronous read().

Only the asynchronous I/O model is truly asynchronous, because the asynchronous system call it issues (such as aio_read()) no longer cares when the kernel buffer has the data ready. Like a read running in the background, aio_read() simply waits for the data in the kernel buffer, and once it is ready, aio_read() copies it into the app buffer on its own.

As shown in the figure:




3. select(), poll() and epoll


As noted above, these three functions monitor the status of file descriptors. They can watch a set of events on a set of files; when an event satisfying the conditions occurs, the descriptor is considered ready (or in error). Events fall roughly into three classes: readable events, writable events and exceptional events. These functions are usually placed inside a loop for continuous monitoring.


select() and poll() are essentially similar in how they work, with poll() slightly more advanced, whereas epoll's approach is far more advanced than both. Of course, even the advanced one does not necessarily outperform the old-timers in every situation.


3.1 select() & poll()


First, the FD_SET macro builds the set of descriptors to monitor, and this descriptor set is passed as an argument to select(), along with an optional interval specifying how long select() may block. With that, select() has a monitoring object.

Besides ordinary file descriptors, sockets can also be monitored. Sockets are files too, so select() can monitor socket file descriptors: for example, whether the recv buffer has received data (the socket's readability), or whether the send buffer is full (the socket's writability). select() can monitor at most 1024 file descriptors by default, whereas poll() has no such limit.

select()'s timeout parameter comes in three flavors:
(1). Block for at most the specified interval, unless a readiness event occurs first.
(2). Block forever, until a readiness event occurs.
(3). Do not block at all, i.e. return immediately. Since select() usually sits inside a loop, this amounts to polling.


Once the monitoring object is created, the kernel monitors the descriptor set while the process that called select() is blocked (or polls). When a readiness condition is met (a monitored event occurs), select() is woken up (or stops polling) and returns the number of descriptors meeting the readiness conditions. It is a count, rather than a single descriptor, because several descriptors may become ready at the same time. Since only the count is returned, not which descriptor(s) are ready, after select() returns one typically iterates with the FD_ISSET macro inside an if statement in the loop until all ready descriptors are found. Finally, the descriptor set is copied back to user space via the relevant function so the process can handle it.

The overall process of monitoring a descriptor set is shown in the figure below, in which select() is just one step:



A rough description of this monitoring loop:

(1). First, initialize the descriptor set with the FD_ZERO macro. Each small square in the figure represents one file descriptor.
(2). Build the descriptor set with the FD_SET macro. Now the file descriptors in the set are switched on, i.e. they are the objects select() will monitor.
(3). Use select() to monitor the descriptor set. When some file descriptor satisfies a readiness condition, select() returns the number of ready descriptors in the set. The small squares marked yellow in the figure are the descriptors satisfying the readiness conditions.
(4). Walk the whole descriptor set with the FD_ISSET macro and hand the ready descriptors to the process. At the same time, use the FD_CLR macro to remove the ready descriptors from the set.
(5). Enter the next iteration, and again add new descriptors to monitor with the FD_SET macro. Then repeat steps (3) and (4).

In simple pseudocode:

FD_ZERO
for() {
    FD_SET()
    select()
    if() {
        FD_ISSET()
        FD_CLR()
    }
    writen()
}

This describes only one example of such a monitoring loop; the exact approach varies, but the general flow can be seen from it.
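The loop above can be fleshed out into compilable C. Two pipes stand in for two client connections, and the writen() step is replaced by a plain read() of whatever became ready:

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Monitor two descriptors with select() and drain whichever becomes
 * readable, following the FD_ZERO/FD_SET/FD_ISSET/FD_CLR pattern.
 * Returns the total number of bytes read from both descriptors. */
ssize_t select_loop_demo(void)
{
    int a[2], b[2];
    if (pipe(a) < 0 || pipe(b) < 0)
        return -1;
    write(a[1], "one", 3);                  /* both become ready */
    write(b[1], "three", 5);

    ssize_t total = 0;
    int pending = 2;
    while (pending > 0) {
        fd_set rset;
        FD_ZERO(&rset);                     /* (1) initialize the set */
        FD_SET(a[0], &rset);                /* (2) add monitored fds  */
        FD_SET(b[0], &rset);
        int maxfd = (a[0] > b[0] ? a[0] : b[0]) + 1;
        if (select(maxfd, &rset, NULL, NULL, NULL) <= 0)   /* (3) wait */
            break;
        int fds[2] = { a[0], b[0] };
        for (int i = 0; i < 2; i++) {
            if (FD_ISSET(fds[i], &rset)) {  /* (4) find ready fds */
                char buf[16];
                total += read(fds[i], buf, sizeof(buf));
                FD_CLR(fds[i], &rset);
                pending--;
            }
        }
    }
    close(a[0]); close(a[1]); close(b[0]); close(b[1]);
    return total;
}
```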


3.2 epoll

epoll is more advanced than poll() and select(). Consider the following points, and its advantages become clear:


(1). With the epoll instance created by epoll_create(), file descriptors of interest can be added and removed at any time via epoll_ctl(); there is no need to rebuild the descriptor-set data structure with FD_SET on every loop iteration, as with select().

(2). When epoll_create() creates the epoll instance, it also creates an epoll ready list. Each time epoll_ctl() adds a descriptor to the epoll instance, it also registers a callback for that descriptor. When a descriptor in the epoll instance satisfies a readiness condition, the callback fires and the descriptor is moved onto the ready list.

(3). When epoll_wait() is called to monitor, it only has to check whether the ready list contains anything. If it does, the entries are copied to user space for the process to handle; if not, epoll_wait() blocks. Of course, if the monitored object is set to non-blocking mode, it will not block but will keep checking instead.

In other words, with epoll's approach there is no need to traverse the descriptor set at all.
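That epoll_create / epoll_ctl / epoll_wait sequence can be sketched minimally on Linux, with a pipe standing in for a socket:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register one descriptor with an epoll instance and wait for it to
 * become readable; only ready descriptors are reported, so no set
 * traversal is needed.  Returns bytes read, or -1 on error. */
ssize_t epoll_demo(char *buf, size_t cap)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;
    int epfd = epoll_create1(0);            /* create the epoll instance */
    if (epfd < 0)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN;                    /* interested in readability */
    ev.data.fd = pipefd[0];
    epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);

    write(pipefd[1], "ep", 2);              /* descriptor becomes ready */

    struct epoll_event out;
    int nready = epoll_wait(epfd, &out, 1, 1000);  /* wait up to 1 s */
    ssize_t n = -1;
    if (nready == 1 && out.data.fd == pipefd[0])
        n = read(out.data.fd, buf, cap);
    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return n;
}
```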