1.1 Background: the complete path of network data transmission

In every network I/O operation, data has to pass through several buffers before it is finally sent out, as shown in the following figure:

The browser is on the right, and the left side uses an httpd server as an example.

* When the httpd service receives a request from the browser for index.html, the httpd child process/thread handling the request first issues a system call asking the kernel to load index.html from the storage device. The data is loaded into a kernel-space buffer (the kernel buffer), not directly into the memory of the process/thread. Because this is a transfer between a storage device and memory, the CPU does not take part in it; it is a DMA operation.
* When the data is ready, the kernel wakes up the httpd child process/thread, which uses read() to copy the data into its own buffer, the app buffer in the figure. Once in the app buffer, the data belongs to the process/thread, which can read it, modify it, and so on. This copy is performed by the CPU, so it consumes CPU resources. This phase also crosses from kernel space to user space, so a context switch takes place.
* When the data has been modified (or left untouched), it has to be sent back to the browser, which means transmitting it over the TCP connection. The TCP stack has its own buffers, and data must be written into one of them to be sent: the send buffer on the sending side, the recv buffer on the receiving side. So write() is used to copy the data from the app buffer to the send buffer. This is again a CPU copy, so it consumes CPU, and a context switch may occur here as well.
* Data that is not destined for the local host is ultimately transmitted by the network card, so the data in the send buffer is handed over to the network card by the kernel's network stack; the application does not need to make another call for this step. Because this is a transfer between memory and a device, the CPU does not take part in it, so this is also a DMA operation.
* When the network card of the browser's host receives the response data (which, of course, arrives as a continuous stream), the data is transferred to the TCP recv buffer. This is a DMA operation.
* Data keeps filling the recv buffer, but the browser does not necessarily read it right away. The browser process has to be notified so that it uses recv() to take the data out of the recv buffer. This is a CPU operation (it is not marked in the figure).
Note that on the httpd side, if the network is slow and the data the child process/thread has to send is large (larger than the send buffer), the socket buffer is likely to fill up. A blocking write() then puts the child process/thread to sleep until space is available, while on a non-blocking socket write() fails with EWOULDBLOCK or EAGAIN and the child process/thread has to wait before retrying.

On the browser side, if the browser process is slow to take data out of the socket buffer (the recv buffer), that socket buffer is likely to fill up as well.
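
The full-send-buffer case described above can be sketched as follows. This is only a minimal illustration, assuming a descriptor that may be in non-blocking mode; the function name write_all_nonblocking is hypothetical, and using poll() to wait is one possible choice, not something the original text prescribes.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd, waiting with poll()
 * whenever the kernel reports that the socket (or pipe) buffer is full.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_all_nonblocking(int fd, const char *buf, size_t len)
{
    size_t off = 0;

    while (off < len) {
        ssize_t w = write(fd, buf + off, len - off);
        if (w >= 0) {
            off += (size_t)w;          /* partial writes are normal */
            continue;
        }
        if (errno == EINTR)
            continue;                  /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Buffer is full: sleep until the descriptor is writable. */
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                     /* any other error is fatal here */
    }
    return (ssize_t)off;
}
```

On a blocking descriptor the EAGAIN branch is never taken, because write() itself sleeps until buffer space becomes available.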

Now look again at the journey the network data takes on the httpd side, as shown in the following figure:

Every time the process/thread needs a piece of data, the data is first copied to the kernel buffer, then to the app buffer, then to the socket buffer, and finally to the network card. In other words, it always goes through 4 copies.
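
The two CPU copies in this path are the ones an application performs itself with read() and write(). The sketch below shows that conventional loop; the function name copy_via_user_buffer and the 4096-byte buffer size are assumptions made for illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through a
 * user-space buffer. Returns the number of bytes copied, or -1 on error. */
ssize_t copy_via_user_buffer(int in_fd, int out_fd)
{
    char app_buf[4096];                /* the "app buffer" in user space */
    ssize_t total = 0;

    for (;;) {
        /* kernel buffer -> app buffer (CPU copy) */
        ssize_t n = read(in_fd, app_buf, sizeof app_buf);
        if (n == 0)
            return total;              /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        /* app buffer -> socket buffer (CPU copy); write() may be partial */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, app_buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

The two DMA transfers (storage device to kernel buffer, socket buffer to NIC) do not appear in the code because the kernel and the hardware carry them out.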

But consider this. Under normal circumstances the copy from the storage device to the kernel buffer is necessary, and so is the copy from the socket buffer to the NIC. But is the copy from the kernel buffer to the app buffer necessary? Does the process really have to access or modify the data? Not always. Even for a web service, if the HTTP response message does not need to be modified, the data need not pass through user space at all. In other words, it is no longer copied from the kernel buffer to the app buffer. This is the idea behind zero-copy.

Zero-copy means avoiding copies of data between kernel space and user space. Its main purpose is to cut unnecessary copies so that the CPU is not burdened with large amounts of data copying.

Note: the above describes the usual case only. For example, some hardware can do the work of the TCP/IP protocol stack itself, in which case data may bypass the socket buffer and move directly between the app buffer and the hardware. RDMA technology is built on this idea.


1.2 Zero-copy: mmap()

mmap() maps a file directly into the address space of a user program and, on success, returns a pointer to the mapped region. This memory can be used as shared memory between processes, and the kernel can also operate on it directly.

Mapping a file does not immediately copy any data into memory. Only when the mapped memory is accessed and the data is found to be missing does a page fault occur, and the data is then loaded into that region by DMA. The data in the region can be copied straight to the socket buffer without first being copied into a separate app buffer, which is why this counts as zero-copy. As shown in the figure:

The function prototype is as follows:
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
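
As a rough sketch of how mmap() can replace the read() step, the function below maps a file read-only and writes the mapped region to another descriptor. The name send_mapped_file and the choice of PROT_READ with MAP_PRIVATE are assumptions for illustration; a real server would also handle EINTR and files that change size while mapped.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the file at path and write its contents to
 * out_fd. Returns the number of bytes written, or -1 on error. */
ssize_t send_mapped_file(const char *path, int out_fd)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {             /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                         /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    /* No read() into an app buffer: write() takes the data straight
     * from the mapped pages. Pages are faulted in as they are touched. */
    size_t off = 0;
    while (off < len) {
        ssize_t w = write(out_fd, p + off, len - off);
        if (w < 0) {
            munmap(p, len);
            return -1;
        }
        off += (size_t)w;
    }
    munmap(p, len);
    return (ssize_t)off;
}
```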

1.3 Zero-copy: sendfile()

The man page describes this function as follows:
sendfile() copies data between one file descriptor and another. Because this
copying is done within the kernel, sendfile() is more efficient than the
combination of read(2) and write(2), which would require transferring data to
and from user space.

sendfile() copies data between file descriptors: it copies data directly from the file descriptor in_fd to the file descriptor out_fd, where in_fd is the data provider and out_fd is the data receiver. The operation is carried out entirely inside the kernel without passing through user space, so the data does not need to be copied to the app buffer, and zero-copy is achieved. As shown in the following figure:

The prototype of sendfile() is as follows:
#include <sys/sendfile.h>
ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);
However, the in_fd of sendfile() must refer to a file that supports mmap()-like operations, that is, a real file; it cannot be a socket, a pipe, or a similar object. Before Linux 2.6.33, out_fd was also required to be a descriptor referring to a socket, which is why people tend to think of sendfile() as being specifically for copying data to the network. Since Linux 2.6.33, however, out_fd can be any file, and if it is a regular file, sendfile() updates the file offset appropriately.
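
A minimal sketch of sending a whole file with sendfile() might look like the following. The function name send_whole_file is hypothetical; out_fd would typically be a connected socket, but per the note above any file works on Linux 2.6.33 and later.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: send the entire file at path to out_fd using
 * sendfile(). Returns the number of bytes sent, or -1 on error. */
ssize_t send_whole_file(const char *path, int out_fd)
{
    int in_fd = open(path, O_RDONLY);  /* must be a real file, not a socket */
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    off_t offset = 0;                  /* sendfile() advances this for us */
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            close(in_fd);
            return -1;
        }
        if (n == 0)
            break;                     /* file was truncated meanwhile */
    }
    close(in_fd);
    return (ssize_t)offset;
}
```

Because offset is passed explicitly, the file offset of in_fd itself is left unchanged; only the offset variable advances.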

Take nginx with sendfile and tcp_nopush enabled as an example. With tcp_nopush turned on, nginx first builds the response header in user space and places it in the socket send buffer, and then writes into the send buffer an identifier of the file to be loaded (for example, a statement that the data of a.txt will be read and sent later). These two parts are sent to the client first. The disk file is then loaded (in sendfile mode), and data is sent each time the send buffer fills up, until all of the data has been sent.


1.4 Zero-copy: splice()

The man page describes this function as follows:
splice() moves data between two file descriptors without copying between
kernel address space and user address space. It transfers up to len bytes of
data from the file descriptor fd_in to the file descriptor fd_out, where one
of the descriptors must refer to a pipe.

splice() moves data between two file descriptors, one of which must be a pipe descriptor. Because no data has to be copied between the kernel buffer and the app buffer, zero-copy is achieved. As shown in the figure:

Note: because one of the descriptors must be a pipe, if the other descriptor in the figure above is a socket file descriptor, the storage-->kernel buffer DMA operation does not occur.

The function prototype is as follows:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
ssize_t splice(int fd_in, loff_t *off_in, int fd_out, loff_t *off_out, size_t len, unsigned int flags);
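
Since one side of every splice() call has to be a pipe, moving a file to another descriptor takes two calls with a pipe in the middle. The sketch below illustrates this under a few assumptions: the function name splice_file_to_fd, the 65536-byte chunk size, and the SPLICE_F_MOVE flag are choices made for the example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: move the file at path to out_fd with splice(),
 * using a pipe as the required intermediate descriptor.
 * Returns the number of bytes moved, or -1 on error. */
ssize_t splice_file_to_fd(const char *path, int out_fd)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    int pipefd[2];
    if (pipe(pipefd) < 0) {
        close(in_fd);
        return -1;
    }

    ssize_t total = 0;
    int failed = 0;
    while (!failed) {
        /* file -> pipe: data stays inside the kernel */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL, 65536, SPLICE_F_MOVE);
        if (n == 0)
            break;                     /* end of file */
        if (n < 0) {
            failed = 1;
            break;
        }
        /* pipe -> out_fd: drain exactly what was just queued */
        while (n > 0) {
            ssize_t m = splice(pipefd[0], NULL, out_fd, NULL,
                               (size_t)n, SPLICE_F_MOVE);
            if (m <= 0) {
                failed = 1;
                break;
            }
            n -= m;
            total += m;
        }
    }

    close(pipefd[0]);
    close(pipefd[1]);
    close(in_fd);
    return failed ? -1 : total;
}
```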

1.5 Zero-copy: tee()

The man page describes this function as follows:
tee() duplicates up to len bytes of data from the pipe referred to by the file
descriptor fd_in to the pipe referred to by the file descriptor fd_out. It does
not consume the data that is duplicated from fd_in; therefore, that data can be
copied by a subsequent splice(2).

tee() copies data between two pipe descriptors. When data is duplicated from fd_in into the other pipe fd_out, it is not considered to have been consumed from fd_in, so after the copy fd_in can still be used with splice() to move the data. Because user space is not involved, zero-copy is achieved. As shown in the figure:

A tee-like program on Linux can be implemented by combining tee() with splice(), as the example in the tee(2) man page does: tee() first duplicates the data into a pipe, and splice() then moves the data to another file descriptor.

The function prototype is as follows:
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <fcntl.h>
ssize_t tee(int fd_in, int fd_out, size_t len, unsigned int flags);
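
The tee()-then-splice() combination can be sketched roughly as follows. The function name tee_then_splice and its parameter names are hypothetical, and the sketch handles only the data currently available in the input pipe rather than looping until end of input.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: duplicate the data waiting in in_pipe into
 * out_pipe with tee(), then move the original data to dest_fd with
 * splice(). in_pipe and out_pipe must both be pipe descriptors.
 * Returns the number of bytes duplicated, or -1 on error. */
ssize_t tee_then_splice(int in_pipe, int out_pipe, int dest_fd)
{
    /* Duplicate without consuming: the data stays readable in in_pipe. */
    ssize_t n = tee(in_pipe, out_pipe, 65536, 0);
    if (n <= 0)
        return n;

    /* Now consume the same n bytes from in_pipe by moving them on. */
    ssize_t left = n;
    while (left > 0) {
        ssize_t m = splice(in_pipe, NULL, dest_fd, NULL,
                           (size_t)left, SPLICE_F_MOVE);
        if (m <= 0)
            return -1;
        left -= m;
    }
    return n;
}
```

After a successful call, the same bytes can be read from the read end of out_pipe and from dest_fd, and neither copy passed through a user-space buffer.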

1.6 Copy-on-write (COW)

When a parent process forks a child process, a naive implementation copies all of the parent's memory pages. This causes at least two problems: it consumes a lot of memory, and the copying takes time. In particular, when exec is called after fork to load a new program, the memory space is reinitialized anyway, so the copy is almost entirely wasted.

With copy-on-write, fork does not copy the memory pages for the child process; the pages are shared instead (that is, the child process also points to the parent's physical memory). Only when the child process needs to modify some piece of data is that piece copied into the child's own memory and then modified. From then on it is the child's private data and can be freely accessed, modified, and copied. This achieves zero-copy to some extent: even when some data blocks are copied, they are copied gradually, only as they are needed.
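
The visible effect of copy-on-write can be observed with a small experiment, sketched below. It shows only the semantics (a write in the child does not affect the parent); it does not measure how many pages the kernel actually copies. The function name and the 1 MiB buffer size are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: returns 1 if the parent's buffer is unchanged
 * after the child modified its own view of it, 0 if it changed,
 * and -1 on error. */
int parent_unaffected_by_child_write(void)
{
    size_t len = 1 << 20;              /* 1 MiB, spanning many pages */
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 'P', len);

    pid_t pid = fork();                /* pages are shared, not copied */
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 'C';                  /* first write: the kernel gives the
                                          child a private copy of this page */
        _exit(buf[0] == 'C' ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    int ok = WIFEXITED(status) && WEXITSTATUS(status) == 0 && buf[0] == 'P';
    free(buf);
    return ok;
}
```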

There is much more to copy-on-write than this; the above is only a brief overview.