
1. Basics

Before introducing the I/O models, let us first explain what a piece of data "goes through" while a process waits for I/O. As shown in the figure:



When a program, or more precisely a process or thread (hereafter simply called a process, without distinguishing the two), needs a piece of data, it can only access and modify that data in its own memory in user space. This memory is called the app buffer. Suppose the required data is on disk: the process must first issue the relevant system call to tell the kernel to load the file from disk. Normally, however, the data can only be loaded into a buffer inside the kernel, which we call the kernel buffer. Once the data has been loaded into the kernel buffer, it still has to be copied into the app buffer. Only then can the process access and modify it.
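From user space, both copies hide behind a single read() call. A minimal sketch, assuming a POSIX system; load_into_app_buffer is a hypothetical helper name, and whether the kernel has to go to disk or already has the data in its buffer is invisible to the caller:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `cap` bytes of the file at `path` into the caller's app buffer.
 * Each read() asks the kernel for data: the kernel first gets it into its own
 * buffer (loading it from disk if needed), then copies it into `app_buf`. */
ssize_t load_into_app_buffer(const char *path, char *app_buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, app_buf + total, cap - total);
        if (n < 0 && errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)                    /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);
    return (ssize_t)total;
}
```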

This raises a few questions.

(1). Why can't the data be loaded directly into the app buffer?

In fact, it can. To improve efficiency and performance, some programs or hardware implement kernel bypass: they keep the kernel out of the path and transfer data directly between the storage device and the app buffer. RDMA, for example, relies on this kind of kernel bypass.

However, in the most common case, for safety and stability, data must first be copied into the kernel buffer in kernel space and then copied into the app buffer, so that user processes cannot reach into kernel space and crash it.

(2). Are the two copies mentioned above performed the same way?

No. Today's storage devices (including NICs) basically all support DMA. What is DMA (direct memory access)? Put simply, data can be transferred directly between memory and a device without the CPU's participation; the transfer is controlled by a chip on the hardware (which can roughly be thought of as a small CPU).


Suppose a storage device did not support DMA. Then, to move data between memory and the device, the CPU would have to work out which address to fetch the data from, which address to copy it to, how much data to copy (how many blocks, and where they are), and so on. The CPU would have a lot to do for a single transfer. DMA frees the CPU from this work so it can handle other tasks.

Copying between the kernel buffer and the app buffer, on the other hand, is a transfer between two memory regions, and only the CPU can perform it.

Therefore, loading disk data into the kernel buffer is a DMA copy, while moving it from the kernel buffer to the app buffer is a copy that requires the CPU.

(3). What happens when the data has to be transmitted over a TCP connection?

For example, a web service has to send its response data to the client over a TCP connection.

The TCP/IP stack maintains two buffers, the send buffer and the recv buffer, collectively called the socket buffer. Data to be sent over a TCP connection must first be copied into the send buffer, then copied to the NIC and transmitted over the network. Data received over a TCP connection first enters the recv buffer via the NIC and is then copied into the app buffer in user space.

Likewise, copying data into the send buffer, or from the recv buffer into the app buffer, is a CPU copy. Copying from the send buffer to the NIC, or from the NIC into the recv buffer, is a DMA copy.
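The per-socket send and recv buffers can be inspected with getsockopt(). A minimal sketch, assuming a POSIX system with IPv4 available; tcp_buffer_sizes is a hypothetical name:

```c
#include <sys/socket.h>
#include <unistd.h>

/* Ask the kernel how large the send and recv buffers of a new TCP socket are.
 * Returns 0 on success, -1 on error. */
int tcp_buffer_sizes(int *sndbuf, int *rcvbuf)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    socklen_t len = sizeof *sndbuf;
    int rc = getsockopt(fd, SOL_SOCKET, SO_SNDBUF, sndbuf, &len);   /* send buffer */
    if (rc == 0) {
        len = sizeof *rcvbuf;
        rc = getsockopt(fd, SOL_SOCKET, SO_RCVBUF, rcvbuf, &len);   /* recv buffer */
    }
    close(fd);
    return rc;
}
```

The values are in bytes and depend on the system; on Linux the TCP defaults come from the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem sysctls.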

The figure below shows how data is transferred over a TCP connection.



(4). Must network data be copied from the kernel buffer into the app buffer and then into the send buffer?

No. If the process does not need to modify the data and just sends it to the other end of the TCP connection, the data does not have to be copied from the kernel buffer into the app buffer; it can be copied directly into the send buffer. This is zero copy.

For example, when httpd does not need to access or modify the data, copying the original local data into the app buffer and then into the send buffer before transmitting it is wasteful: the copy into the app buffer can be skipped. Zero-copy techniques eliminate that copy and improve efficiency.

Of course, there are many ways to implement zero copy; see my other article, "Zero copy technology" <http://www.cnblogs.com/f-ck-need-u/p/7615914.html>.
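As one example of zero copy (not necessarily the technique the linked article focuses on), Linux provides sendfile(2), which moves file data to another descriptor inside the kernel, so it never passes through an app buffer. A minimal sketch, assuming Linux; send_file_zero_copy is a hypothetical name:

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `count` bytes from the start of file `in_fd` to `out_fd` (typically a
 * connected socket) without copying them into a user-space app buffer.
 * Linux-only: uses sendfile(2). Returns bytes sent, or -1 on error. */
ssize_t send_file_zero_copy(int out_fd, int in_fd, size_t count)
{
    off_t offset = 0;          /* sendfile() advances this; in_fd's own offset is untouched */
    size_t sent = 0;
    while (sent < count) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, count - sent);
        if (n < 0)
            return -1;
        if (n == 0)            /* file is shorter than `count` */
            break;
        sent += (size_t)n;
    }
    return (ssize_t)sent;
}
```

In a server, out_fd would be the connected client socket and in_fd the file opened with open(); a socketpair() is enough to try it locally.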

The following figure shows the complete data flow when httpd handles a request for a file.



In outline: the client sends a request for a file over a TCP connection. The request data enters the TCP recv buffer, and httpd reads it into its app buffer with recv(). The httpd worker process parses the data and learns that a file is requested, so it issues a system call (for example read()) to read the file. The kernel loads the file: the data is copied from disk into the kernel buffer and then into the app buffer. httpd then builds the response, possibly modifying the data, for example by adding a field to the response header. Finally it copies the modified or unmodified data into the send buffer (for example with send()), and the data is transmitted to the client over the TCP connection.
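The flow above maps onto a handful of system calls. The following is a hypothetical, simplified handler, assuming POSIX sockets; request parsing and response headers are deliberately left out, and serve_file_once and its parameters are illustrative names, not httpd's actual code:

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical one-shot handler. `conn` is a connected socket; `path` stands
 * in for the file named by the request (parsing is omitted).
 * Returns 0 on success, -1 on error. */
int serve_file_once(int conn, const char *path)
{
    char req[512];
    if (recv(conn, req, sizeof req, 0) <= 0)          /* recv buffer -> app buffer */
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char app_buf[4096];
    ssize_t n;
    while ((n = read(fd, app_buf, sizeof app_buf)) > 0) {   /* disk -> kernel buffer -> app buffer */
        /* a real server could modify app_buf here, e.g. prepend headers */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = send(conn, app_buf + off, (size_t)(n - off), 0);   /* app buffer -> send buffer */
            if (w < 0) {
                close(fd);
                return -1;
            }
            off += w;
        }
    }
    close(fd);
    return n < 0 ? -1 : 0;
}
```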


2. I/O Model

An I/O model describes the state of a process while it waits for I/O, and how the data is handled. It centers on the process state across two phases: getting the data ready in the kernel buffer, and moving it into the app buffer. Copying the data into the kernel buffer is called the data preparation phase, and copying it from the kernel buffer into the app buffer is called the data copy phase. Remember these two terms; they are used throughout the descriptions of the I/O models below.


This article uses httpd handling a local file over a TCP connection as its example. Please don't worry about whether httpd really works exactly this way, and ignore the details of how a TCP connection processes data; this is just an example to make the explanation easier. Also, a local file is not an ideal object for explaining I/O models, which really matter for sockets. If you want to see the I/O models as they apply to TCP/UDP sockets, read this article first and then revisit the I/O models together with my other article "What you must know about sockets and the TCP connection process" <http://www.cnblogs.com/f-ck-need-u/p/7623252.html>.

To repeat: transferring data from hardware to memory (via DMA) does not require the CPU, whereas transferring data from memory to memory does.


2.1 Blocking I/O Model

As shown in the figure:



Suppose the client requests the file index.html. httpd needs to load the data of index.html from disk into its own app buffer, then copy it into the send buffer to send it out.

But when httpd wants to load index.html, it first checks whether its own app buffer already holds the data for index.html. If not, it issues a system call such as read() to have the kernel load the data. The kernel in turn first checks whether its kernel buffer holds the data for index.html; if not, it loads the data from disk. Once the data is prepared in the kernel buffer, it is copied into the app buffer, where the httpd process finally handles it.

If the blocking I/O model is used:

(1). In the blocking I/O model, httpd is blocked from the moment it issues the system call until the data has been copied into the app buffer.
(2). Only when the data has been copied into the app buffer, or an error occurs, is httpd woken up to handle the data in the app buffer.
(3). There are two context switches: from user space to kernel space, and back to user space.
(4). Because the first phase (data preparation) does not need the CPU, the CPU can run other processes' tasks while the data is being prepared.
(5). The data copy phase does need the CPU; blocking httpd hands the CPU to the copy, which to some extent helps it finish faster.
(6). This is the simplest I/O model.
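A minimal sketch of blocking I/O, assuming Linux/POSIX: a socketpair stands in for the TCP connection, and a forked child plays the client that only sends data after a delay. blocking_read_demo is a hypothetical name:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Demo of blocking I/O: the child writes only after a one-second delay, while
 * the parent sits inside read() the whole time.
 * Returns the number of bytes read into buf, or -1 on error. */
ssize_t blocking_read_demo(char *buf, size_t cap)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: data "arrives" one second later */
        close(sv[0]);
        sleep(1);
        ssize_t w = write(sv[1], "hello", 5);
        _exit(w == 5 ? 0 : 1);
    }

    close(sv[1]);
    ssize_t n = read(sv[0], buf, cap);  /* descriptors block by default: the parent sleeps here */
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

Between fork() and the child's write(), the parent does nothing but wait inside read(); the CPU is free to run the child in the meantime.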

The flow is shown in the figure below:




2.2 Non-Blocking I/O Model

(1). In non-blocking mode, after httpd issues the system call (e.g. read()) for the first time, the call immediately returns the error EWOULDBLOCK instead of putting httpd to sleep. (Whether read() on a regular file really returns EWOULDBLOCK can be ignored here: I/O models are mainly about socket files, so just treat read() as recv().) This is exactly what UNP (Unix Network Programming) describes:
    When we set a socket to be nonblocking, we are telling the kernel "when an I/O
    operation that I request cannot be completed without putting the process to
    sleep, do not put the process to sleep, but return an error instead."
(2). Although read() returns immediately, httpd keeps issuing read() to ask the kernel whether the data has been copied into the kernel buffer yet. This is called polling. On every poll, as long as the kernel has not finished preparing the data, read() returns the error EWOULDBLOCK.
(3). Once the data in the kernel buffer is ready, the next poll no longer returns EWOULDBLOCK; instead, httpd is blocked while the data is copied into the app buffer.
(4). httpd is not blocked during the data preparation phase, but it keeps issuing read() to poll. During the data copy phase it is blocked, and the CPU is handed to the kernel to copy the data into the app buffer.
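The polling loop can be sketched as follows, assuming Linux/POSIX; the helper names and the max_tries cap (added so the sketch cannot spin forever) are hypothetical:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Poll with read() until data arrives or max_tries is used up.
 * *attempts counts how many times the kernel answered "not ready yet". */
ssize_t read_by_polling(int fd, char *buf, size_t cap, int max_tries, int *attempts)
{
    *attempts = 0;
    for (int i = 0; i < max_tries; i++) {
        ssize_t n = read(fd, buf, cap);
        if (n >= 0)
            return n;                            /* data copied (or EOF) */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                           /* a real error */
        (*attempts)++;                           /* not ready: do other work, then retry */
    }
    errno = EAGAIN;
    return -1;
}
```

On Linux EAGAIN and EWOULDBLOCK have the same value; checking both keeps the code portable. A real program would do useful work, or sleep briefly, between retries rather than spinning.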

The flow is shown in the figure below:




2.3 I/O Multiplexing Model


This is called the I/O multiplexing model: a process can watch the I/O readiness of several descriptors at once. There are three I/O multiplexing interfaces: select, poll and epoll. They are all functions that monitor the readiness of the given file descriptors. "Ready" means that a system call will no longer block; for read(), for example, it means the data is ready. Readiness covers readability, writability and exceptional conditions, and the readability condition includes whether the data is ready. When a descriptor becomes ready, the process is notified and issues another system call, such as read(), to operate on the data. These three functions therefore only deal with whether the data is ready and how to notify the process. They can be combined with blocking or non-blocking I/O; the monitored descriptors are often set to non-blocking, while whether select()/poll()/epoll_wait() itself blocks is controlled by its timeout argument.


select() and poll() are very similar: they monitor and notify in the same way, and poll() is just somewhat smarter. So here I only briefly describe I/O multiplexing with select() monitoring a single file request; monitoring multiple files, and epoll, are covered at the end of this article.


(1). When httpd wants to load a file, it issues a read() system call, and whether in blocking or non-blocking mode, whether read() returns depends on whether the data is ready. So could we actively monitor whether the data has arrived in the kernel buffer, or whether new data has arrived in the recv buffer? That is the role of select()/poll()/epoll.

(2). With select(), httpd issues a select call, and the httpd process is then "blocked" by select(). Since we assume only one requested file is monitored here, select() wakes the httpd process directly once the data has arrived in the kernel buffer. "Blocked" is in quotes because select() takes a timeout argument that controls how long it blocks. If the timeout is set to 0, select does not block: it returns immediately, and calling it repeatedly amounts to polling for readiness. It can also be set to block indefinitely.

(3). When the object monitored by select() becomes ready, select() notifies (in the polling case) or wakes (in the blocking case) the httpd process. httpd then issues the read() system call, the data is copied from the kernel buffer into the app buffer, and read() succeeds.
(4). After httpd issues the second system call (read()), it is blocked, and the CPU is handed entirely to the kernel to copy the data into the app buffer.

(5). When httpd handles only one connection, the I/O multiplexing model is worse than blocking I/O, because it makes two system calls (select() and read()), and in the polling case it keeps consuming CPU. The advantage of I/O multiplexing is that it can monitor many file descriptors at the same time.

As shown in the figure:



For a more detailed description, see the end of this article.
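With a single descriptor, the two system calls described in (5) look like this. A minimal sketch, assuming POSIX select(); select_then_read is a hypothetical name:

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* I/O multiplexing on a single descriptor: first block in select() until fd
 * is readable, then issue a second system call, read(), to copy the data.
 * Returns bytes read, 0 on EOF, or -1 on error. */
ssize_t select_then_read(int fd, char *buf, size_t cap)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    /* system call 1: a NULL timeout means "block until ready" */
    if (select(fd + 1, &rset, NULL, NULL, NULL) < 0)
        return -1;

    /* system call 2: the data is ready, so read() only performs the copy phase */
    return read(fd, buf, cap);
}
```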


2.4 Signal-driven I/O Model


Signal-driven I/O model. When signal-driven I/O is enabled, the process first issues a system call that sets up signal handling, such as sigaction(); this call returns immediately. When the data is ready, the kernel sends a SIGIO signal. On receiving it, the process knows the data is ready and issues the system call that operates on the data, such as read().
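A minimal sketch of signal-driven I/O on a socket, assuming Linux: besides sigaction(), the descriptor needs an owner (F_SETOWN) and the O_ASYNC flag before the kernel will send SIGIO. The helper names and the roughly one-second wait in read_after_sigio are hypothetical:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t data_ready = 0;

static void on_sigio(int signo)
{
    (void)signo;
    data_ready = 1;                 /* only record the event; read() happens outside */
}

/* Install the SIGIO handler and ask the kernel to signal this process when
 * fd becomes readable. Returns 0 on success, -1 on error. */
int enable_signal_driven_io(int fd)
{
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = on_sigio;
    if (sigaction(SIGIO, &sa, NULL) < 0)         /* returns immediately */
        return -1;
    if (fcntl(fd, F_SETOWN, getpid()) < 0)       /* deliver SIGIO to this process */
        return -1;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);  /* enable signal-driven mode */
}

/* Wait up to about one second for SIGIO, then read the data. */
ssize_t read_after_sigio(int fd, char *buf, size_t cap)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 100 && !data_ready; i++)
        nanosleep(&ts, NULL);       /* the process is free to do other work here */
    if (!data_ready)
        return -1;
    data_ready = 0;
    return read(fd, buf, cap);      /* the copy into the app buffer still blocks */
}
```

The handler only sets a flag, because very few functions are async-signal-safe; the actual read() happens in normal program flow.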

After the signal-handling system call has been issued, the process is not blocked; but while read() copies the data from the kernel buffer into the app buffer, the process is blocked. As shown in the figure:




2.5 Asynchronous I/O Model


Asynchronous I/O model. In the asynchronous I/O model, httpd first issues an asynchronous call (such as aio_read() or aio_write()) that returns immediately. This call tells the kernel not only to prepare the data, but also to copy it into the app buffer.

From the moment the call returns until the data has been copied into the app buffer, httpd is never blocked. When the copy into the app buffer finishes, a signal is sent to notify the httpd process.

As shown in the figure:



This looks fully asynchronous, but note that copying the data from the kernel buffer into the app buffer requires the CPU. This means the unblocked httpd competes for the CPU with the asynchronous operation. Under high concurrency, httpd may be handling more connections, CPU contention becomes more severe, and the asynchronous operation takes longer to signal success. If this problem is not handled well, the asynchronous I/O model is not necessarily better.
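POSIX AIO offers aio_read(). A minimal sketch, assuming Linux with glibc; async_read_file is a hypothetical name. glibc implements POSIX AIO in user space with helper threads, which is one concrete form of the CPU competition described above, and on glibc older than 2.34 the program must be linked with -lrt. For simplicity the sketch requests no notification (SIGEV_NONE) and waits with aio_suspend(); setting sigev_notify to SIGEV_SIGNAL would ask for a signal on completion instead:

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `cap` bytes of `path` into buf with aio_read().
 * Returns the number of bytes read, or -1 on error. */
ssize_t async_read_file(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;                             /* destination: the app buffer */
    cb.aio_nbytes = cap;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;    /* no signal in this sketch */

    if (aio_read(&cb) < 0) {                      /* returns immediately */
        close(fd);
        return -1;
    }

    /* The process could do unrelated work here; this sketch just waits. */
    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    ssize_t n = aio_return(&cb);                  /* bytes already copied into buf */
    close(fd);
    return n;
}
```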


2.6 Synchronous vs. Asynchronous I/O, and Blocking vs. Non-Blocking


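Blocking, non-blocking, I/O multiplexing and signal-driven I/O are all synchronous I/O models, because the process is blocked during the system call that operates on the data (read() in this article). Note that although the process may or may not be blocked while the data is being prepared in the kernel buffer, the kernel buffer is what read() operates on, and "synchronous" here means synchronizing the data between the kernel buffer and the app buffer. Clearly, while the kernel buffer and the app buffer are being synchronized, the process must be blocked; otherwise read() would become an asynchronous read().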

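Only the asynchronous I/O model is truly asynchronous, because the asynchronous call it issues (such as aio_read()) no longer cares when the data in the kernel buffer becomes ready. Like a read running in the background, aio_read() can keep waiting for the data in the kernel buffer, and once the data is ready, aio_read() naturally copies it into the app buffer.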

As shown in the figure:




3. select(), poll() and epoll


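As mentioned earlier, these three functions monitor the state of file descriptors. They can watch a set of events on a set of files; when an event that meets the condition occurs, the descriptor is considered ready (or in error). Events fall roughly into three categories: readable, writable and exceptional. These functions are usually called inside a loop to monitor continuously.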


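select() and poll() work in essentially the same way, with poll() slightly more advanced, while epoll is far more advanced than both. Of course, even the advanced newcomer does not always outperform the old-timers.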


3.1 select() & poll()


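First, build the set of descriptors to monitor with the FD_SET macro and pass the set to select(); you can also specify how long select() may block. select() thereby sets up a monitoring object.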

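Besides regular file descriptors, select() can monitor sockets, because a socket is also a file. For example, it can check whether data has arrived in the recv buffer (whether the socket is readable) or whether the send buffer is full (whether the socket is writable). By default select() can monitor at most 1024 file descriptors (the FD_SETSIZE limit), while poll() has no such limit.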
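poll() describes the watched descriptors with an array of struct pollfd instead of a fixed-size fd_set, which is why it is not tied to the 1024 limit (on Linux the array length is bounded by the RLIMIT_NOFILE limit instead). A minimal sketch; count_readable is a hypothetical name:

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Watch every entry of `fds` for readability for up to timeout_ms
 * milliseconds (-1 blocks, 0 returns at once). Returns how many entries
 * have POLLIN set in revents, 0 on timeout, or -1 on error. */
int count_readable(struct pollfd *fds, nfds_t n, int timeout_ms)
{
    for (nfds_t i = 0; i < n; i++)
        fds[i].events = POLLIN;         /* interested in readability */

    int ready = poll(fds, n, timeout_ms);
    if (ready <= 0)
        return ready;

    int readable = 0;
    for (nfds_t i = 0; i < n; i++)      /* the kernel filled in revents */
        if (fds[i].revents & POLLIN)
            readable++;
    return readable;
}
```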

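The timeout argument of select() has three forms:
(1). Block for at most the given interval, unless a ready event occurs earlier.
(2). Block indefinitely, until a ready event occurs.
(3). Do not block at all, i.e. return immediately. Since select() is usually called inside a loop, this amounts to polling.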
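The three forms correspond to the last argument of select(): a pointer to a non-zero struct timeval, a NULL pointer, and a zeroed struct timeval. A minimal sketch, assuming POSIX; wait_readable is a hypothetical name:

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if fd is readable, 0 on timeout, -1 on error.
 * `limit` selects the mode:
 *   {sec, usec} -> block at most that long          (form 1)
 *   NULL        -> block until fd is ready          (form 2)
 *   {0, 0}      -> return at once, one polling pass (form 3) */
int wait_readable(int fd, const struct timeval *limit)
{
    fd_set rset;
    FD_ZERO(&rset);
    FD_SET(fd, &rset);

    struct timeval tv, *tvp = NULL;
    if (limit != NULL) {
        tv = *limit;        /* copy: Linux's select() may modify its timeout */
        tvp = &tv;
    }
    return select(fd + 1, &rset, NULL, NULL, tvp);
}
```

The timeout is copied before the call because Linux's select() updates it with the time left, so reusing the same variable across loop iterations would shorten later waits.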


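Once the monitoring object has been set up, the kernel watches the descriptor sets while the process that called select() is blocked (or polls). When a ready condition is met (a monitored event occurs), select() is woken up (or the polling stops) and returns the number of descriptors that are ready. It returns a count rather than a single descriptor because several descriptors may become ready at the same time. Before returning, select() copies the descriptor sets back to user space, with only the ready descriptors left set. Since only the count is returned, not which descriptors are ready, the program usually walks the set with the FD_ISSET macro inside an if statement in the loop after select() until it has found every ready descriptor, and the process then handles them.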

The overall process of monitoring a descriptor set is shown in the figure below; select() is only one step in it:



Here is a rough description of this monitoring loop:

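(1). First, initialize the descriptor set with the FD_ZERO macro. Each small square in the figure represents a file descriptor.
(2). Add descriptors to the set with the FD_SET macro. The descriptors switched on in the set are the ones select() will monitor.
(3). Monitor the set with select(). When some descriptors meet the ready condition, select() returns how many of them there are. The squares marked yellow in the figure are the ready descriptors.
(4). Walk the whole set with the FD_ISSET macro and hand the ready descriptors to the process. Meanwhile, remove the ready descriptors from the set with the FD_CLR macro.
(5). In the next iteration, use FD_SET again to add new descriptors to monitor, then repeat steps (3) and (4).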

In simple pseudocode:
    FD_ZERO
    for () {
        FD_SET()
        select()
        if () {
            FD_ISSET()
            FD_CLR()
        }
        writen()
    }
This is only one example of a monitoring loop; the concrete implementation can vary. Still, it shows the general sequence of steps.
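A runnable version of such a loop, assuming POSIX select(). It keeps a master set, copies it before every select() call (select() overwrites its argument with the ready descriptors), scans with FD_ISSET, and uses FD_CLR to stop watching a descriptor whose peer has closed. select_loop and its returning the total byte count are illustrative choices, not part of the original article:

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Watch every descriptor in `fds`, read whatever arrives, and stop watching a
 * descriptor once it reaches EOF. Returns the total number of bytes read after
 * all descriptors are done, or -1 on error. */
long select_loop(const int *fds, int n)
{
    fd_set master;
    FD_ZERO(&master);                            /* (1) initialise the set */
    int maxfd = -1, open_fds = n;
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &master);                 /* (2) add descriptors to watch */
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    long total = 0;
    char buf[256];
    while (open_fds > 0) {
        fd_set rset = master;                    /* select() overwrites its argument */
        int nready = select(maxfd + 1, &rset, NULL, NULL, NULL);   /* (3) wait */
        if (nready < 0)
            return -1;
        for (int i = 0; i < n && nready > 0; i++) {                /* (4) find the ready ones */
            if (!FD_ISSET(fds[i], &rset))
                continue;
            nready--;
            ssize_t got = read(fds[i], buf, sizeof buf);
            if (got > 0) {
                total += got;                    /* process the data here */
            } else {
                FD_CLR(fds[i], &master);         /* EOF or error: stop watching */
                open_fds--;
            }
        }
    }
    return total;
}
```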


3.2 epoll

epoll is more advanced than poll() and select(). Consider the following points and its advantages become clear:


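(1). An epoll instance created by epoll_create() can add and remove descriptors of interest at any time through epoll_ctl(), so there is no need, as with select(), to rebuild the descriptor set with FD_SET after every loop iteration.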

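(2). When epoll_create() creates an epoll instance, it also creates a ready list. Each time epoll_ctl() adds a descriptor to the instance, it also registers a callback for that descriptor. When a descriptor in the instance becomes ready, the callback fires and the descriptor is put on the ready list.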

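(3). When epoll_wait() is called to monitor, it only has to check whether the ready list contains anything. If it does, the ready entries are copied to user space for the process to handle; if not, epoll_wait() blocks. Of course, if epoll_wait() is given a timeout of 0, it returns immediately instead of blocking, and calling it in a loop amounts to continuous checking.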

In other words, epoll never needs to traverse the whole descriptor set.
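The same loop with epoll, assuming Linux: descriptors are registered once with epoll_ctl(), and epoll_wait() returns only the ready ones, so there is no scan over the whole set. epoll_loop is a hypothetical name, and the sketch uses epoll_create1(), the newer form of epoll_create():

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Register every descriptor once, then read from whichever ones epoll_wait()
 * reports until all of them reach EOF. Returns total bytes read, or -1. */
long epoll_loop(const int *fds, int n)
{
    int ep = epoll_create1(0);                       /* the epoll instance */
    if (ep < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
        if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) < 0) {   /* register once */
            close(ep);
            return -1;
        }
    }

    long total = 0;
    int open_fds = n;
    struct epoll_event events[16];
    char buf[256];
    while (open_fds > 0) {
        int nready = epoll_wait(ep, events, 16, -1); /* -1: block until something is ready */
        if (nready < 0) {
            close(ep);
            return -1;
        }
        for (int i = 0; i < nready; i++) {           /* only ready descriptors come back */
            int fd = events[i].data.fd;
            ssize_t got = read(fd, buf, sizeof buf);
            if (got > 0) {
                total += got;
            } else {
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);   /* EOF or error: stop watching */
                open_fds--;
            }
        }
    }
    close(ep);
    return total;
}
```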