This article explains what happens to a socket at each stage of a TCP connection. I hope it helps readers without a network programming background understand what a socket is and what role it plays. If you find an error, please point it out.



1. The full socket format is {protocol, src_addr, src_port, dest_addr, dest_port}.

This is often called the five-tuple of a socket. Here protocol specifies whether the connection is TCP or UDP; the remaining fields specify the source address, source port, destination address, and destination port. But where do these values come from?
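As a minimal sketch of where the five fields can be observed, the following Python snippet connects a client to a listener on the loopback interface and reads the tuple back from the kernel on both ends. The helper name five_tuple, the address 127.0.0.1, and the use of port 0 (let the kernel pick a free port) are assumptions made for illustration only.

```python
import socket

def five_tuple(sock):
    """Return (protocol, src_addr, src_port, dest_addr, dest_port) for a connected TCP socket."""
    src_addr, src_port = sock.getsockname()[:2]
    dest_addr, dest_port = sock.getpeername()[:2]
    return ("tcp", src_addr, src_port, dest_addr, dest_port)

# Listener on loopback; port 0 asks the kernel for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

client_view = five_tuple(client)   # tuple as seen from the client end
server_view = five_tuple(conn)     # tuple as seen from the server end
print(client_view)
print(server_view)

for s in (client, conn, listener):
    s.close()
```

The two tuples mirror each other: the source of one end is the destination of the other.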

2. The TCP protocol stack maintains two socket buffers: the send buffer and the recv buffer.

Data to be sent over a TCP connection is first copied into the send buffer. It may be copied from an app buffer in user space or from a kernel buffer in the kernel. The copy is performed by the send() function. Because the write() function can also be used to write the data, this step is also called writing data, and the send buffer is correspondingly also known as the write buffer. The difference between the two calls is that send() takes an extra flags argument; with flags set to 0 it behaves the same as write().

The data ultimately leaves through the network card, so the data in the send buffer must be copied to the network card. Because one end is memory and the other end is a device, the copy can be done directly by DMA without CPU involvement. In other words, the data in the send buffer is copied to the network card by DMA and transmitted to the other end of the TCP connection: the receiver.

When data is received over a TCP connection, it first arrives through the network card and is copied by DMA into the recv buffer. The recv() function then copies the data from the recv buffer into the app buffer.
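The sizes of the two buffers can be inspected from user space. The sketch below, in Python, reads them with getsockopt(); the values printed depend on the operating system and its tuning, so no particular number should be expected.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_SNDBUF / SO_RCVBUF report the sizes of the send buffer and the recv buffer.
send_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
recv_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"send buffer: {send_buf} bytes, recv buffer: {recv_buf} bytes")

sock.close()
```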

The general process is as follows:

3. Two kinds of sockets: listening sockets and connected sockets.

A listening socket is created when the server process reads its configuration file, parses the address and port to listen on, and calls socket(). The process then calls bind() to bind the listening socket to that address and port. After that, the process/thread can call listen() to listen on the port (strictly speaking, to monitor this listening socket).

A connected socket is the socket returned by accept() after a TCP connection request has been received and the three-way handshake has completed. The process/thread then uses the connected socket to communicate with the client over TCP.

To distinguish the two socket descriptors returned by socket() and accept(), some people use listenfd and connfd to denote the listening socket and the connected socket respectively. The names are descriptive and are used occasionally below.
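The difference between the two descriptors can be seen in a short Python sketch. The variable names listenfd and connfd follow the convention above; the loopback address and the kernel-chosen port are assumptions for illustration.

```python
import socket

listenfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listenfd.bind(("127.0.0.1", 0))      # port 0: let the kernel choose a free port
listenfd.listen(5)

client = socket.create_connection(listenfd.getsockname())
connfd, client_addr = listenfd.accept()   # a new socket for this one connection

# Both sockets share the server's local address and port ...
same_local = listenfd.getsockname() == connfd.getsockname()
# ... but they are distinct descriptors with distinct roles.
distinct_fd = listenfd.fileno() != connfd.fileno()

connfd.sendall(b"hi")                # data flows over connfd, never over listenfd
reply = client.recv(16)
print(same_local, distinct_fd, reply)

for s in (client, connfd, listenfd):
    s.close()
```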

The functions are described next. Analyzing them also walks through the process of establishing and closing a connection.


Detailed analysis of the connection process

As shown in the following diagram:


socket() function

The socket() function creates a socket file descriptor, sockfd, that is used for communication (socket() creates an endpoint for communication and returns a descriptor). This descriptor can later be passed to bind() as the object to bind.
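In Python the same call looks as follows. This is only a sketch: the socket object wraps the descriptor, and fileno() exposes the underlying integer that the C-level socket() would return.

```python
import socket

# AF_INET + SOCK_STREAM requests an IPv4 TCP socket.
sockfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

fd_number = sockfd.fileno()                      # the integer descriptor from the kernel
is_tcp_stream = sockfd.type == socket.SOCK_STREAM
print(f"descriptor={fd_number}, stream socket={is_tcp_stream}")

sockfd.close()
closed_fd = sockfd.fileno()                      # -1 once the descriptor has been released
```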


bind() function

The server program parses its configuration file to find the address and port it should listen on. Together with the sockfd created by socket(), it can then call bind() to bind the socket to the "addr:port" combination it wants to listen on. A socket that has been bound to a port can then be passed to listen() as the object to listen on.

A socket bound to an address and port has a source address and source port (the source from the server's own point of view). Together with the protocol type specified in the configuration file, three of the five tuple elements are now known, namely:

{protocol, src_addr, src_port}

However, it is common for a service to be configured to listen on multiple addresses and ports in order to run multiple instances. This is done by calling socket() and bind() several times to create and bind multiple sockets.
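A minimal Python sketch of listening on several addresses: one socket() plus bind() per configured pair. The helper name open_listeners and the address list are hypothetical; a real service would take the pairs from its configuration file.

```python
import socket

def open_listeners(addresses, backlog=128):
    """Create, bind, and listen on one TCP socket per (host, port) pair."""
    listeners = []
    for host, port in addresses:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, port))
        s.listen(backlog)
        listeners.append(s)
    return listeners

# Port 0 stands in for configured ports; the kernel picks two different free ports.
listeners = open_listeners([("127.0.0.1", 0), ("127.0.0.1", 0)])
bound = [s.getsockname() for s in listeners]
print(bound)

for s in listeners:
    s.close()
```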


listen() and connect() functions

As the name suggests, listen() listens on the addr+port that was bound with bind(). Once listening starts, the socket changes from the CLOSED state to the LISTEN state, and from then on it serves as the entry point through which TCP connections are offered to the outside.

The connect() function initiates a connection request to a listening socket, that is, it starts the TCP three-way handshake. It follows that the side requesting the connection (such as a client) is the one that uses connect(). Before calling connect(), the initiator also needs to create its own sockfd, and it most likely uses a socket bound to a random port. Since connect() initiates a connection to a socket, it naturally has to be given the destination of the connection, i.e. the destination address and destination port, which are the address and port bound to the server's listening socket. The initiator also has its own address and port, which from the server's point of view are the source address and port of the connection request. As a result, the sockets at both ends of the TCP connection now have the complete five-tuple format.
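The random port mentioned above can be observed directly. In the Python sketch below, the client socket has no port before connect() and a kernel-chosen one afterwards. The reading of port 0 for an unbound socket is the behavior assumed on Linux; other platforms may report an unbound socket differently.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
server_addr = listener.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
port_before = client.getsockname()[1]   # 0: nothing bound yet (Linux behavior assumed)
client.connect(server_addr)             # the kernel runs the three-way handshake
port_after = client.getsockname()[1]    # ephemeral port chosen by the kernel

conn, peer = listener.accept()
client_local = client.getsockname()
print(f"before connect: {port_before}, after connect: {port_after}, server sees: {peer}")

for s in (client, conn, listener):
    s.close()
```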


A closer look at listen()

Now for listen() in more detail. If the server listens on multiple address+port pairs, it has to monitor multiple sockets, so the process/thread responsible for listening uses select() or poll() to poll them (epoll can of course be used as well). Even when only one socket is monitored, the same mechanism is used; select() or poll() simply has a single socket descriptor of interest.

Whether select() or poll() is used (epoll monitors in a different way and is not discussed here), the process/thread (the listener) blocks in select() or poll() while it is listening. This lasts until data (a SYN) is written into the sockfd it is listening on (that is, into the recv buffer). The kernel is then woken up (note: not the app process, because the TCP three-way handshake and four-way close are carried out by the kernel in kernel space, without user space involvement) and copies the SYN data to a kernel buffer for processing (for example, checking whether the SYN is valid). It then prepares the SYN+ACK data, which is copied from the kernel buffer into the send buffer and from there to the network card for transmission. At this point a new entry for the connection is created in the incomplete connection queue (syn queue) and set to the SYN_RECV state.

The listener then goes back to monitoring listenfd with select()/poll() until data is written to listenfd again and the kernel is woken once more. If the data written this time is an ACK, it means a client is responding to the SYN+ACK sent by the server kernel. The data is copied to the kernel buffer and processed, and the corresponding entry is moved from the incomplete connection queue to the completed connection queue (accept queue/established queue) and set to the ESTABLISHED state. If what arrives is not an ACK, it must be a SYN, i.e. a new connection request, which is handled as described above and placed in the incomplete connection queue.

Connections in the completed queue wait to be consumed by the kernel through accept() (the accept() system call is issued by the user space process, and the kernel completes the consume operation). Once a connection has been accept()ed, it is removed from the completed queue, which also means that the TCP connection has been established. The user space processes at both ends can now transfer real data over this connection, and the kernel's connection handling is not needed again until close() or shutdown() is called and the four-way close takes place. This is the loop in which the listener handles TCP connections.
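From user space, the listening loop described above can be sketched in Python with select(). The helper name accept_ready and the one-second timeout are assumptions for illustration; the handshake itself stays invisible to this code because the kernel completes it before the listening socket becomes readable.

```python
import select
import socket

def accept_ready(listeners, timeout=1.0):
    """Wait until any listening socket has a completed connection, then accept it."""
    readable, _, _ = select.select(listeners, [], [], timeout)
    return [(lst, *lst.accept()) for lst in readable]

# Two listening sockets monitored by the same listener.
listeners = []
for _ in range(2):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", 0))
    s.listen(8)
    listeners.append(s)

# A client connects to the second one only.
client = socket.create_connection(listeners[1].getsockname())
client_addr = client.getsockname()

accepted = accept_ready(listeners)
for lst, conn, peer in accepted:
    print(f"{lst.getsockname()} accepted a connection from {peer}")

for lst, conn, peer in accepted:
    conn.close()
client.close()
for s in listeners:
    s.close()
```

Only the socket that actually has a completed connection is reported as readable, so a single thread can serve any number of listening sockets.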
In other words, listen() also maintains two queues: the incomplete connection queue (syn queue) and the completed connection queue (accept queue). When the listener receives a SYN from a client and replies with SYN+ACK, an entry for this client is created at the tail of the incomplete connection queue with its state set to SYN_RECV. This entry must of course contain the client's address and port (possibly in hashed form; I'm not sure). When the server later receives the ACK from that client, the listener thread analyzes the data to find which entry in the incomplete connection queue the message belongs to, moves that entry to the completed connection queue, and sets its state to ESTABLISHED. Finally, the connection waits for the kernel to consume it through accept(). From then on the kernel steps off the stage until the four-way close.
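The movement of entries between the two queues can be modeled in a few lines of Python. This is a toy model only, not kernel code: the class name ListenQueues, its methods, and the example addresses are all hypothetical.

```python
from collections import OrderedDict, deque

class ListenQueues:
    """Toy model of the two queues behind a listening socket."""

    def __init__(self):
        self.syn_queue = OrderedDict()   # (addr, port) -> "SYN_RECV"
        self.accept_queue = deque()      # connections in the ESTABLISHED state

    def on_syn(self, peer):
        # SYN received and SYN+ACK sent: remember the half-open connection.
        self.syn_queue[peer] = "SYN_RECV"

    def on_ack(self, peer):
        # Final ACK of the handshake: move the entry to the accept queue.
        if self.syn_queue.pop(peer, None) is not None:
            self.accept_queue.append((peer, "ESTABLISHED"))

    def accept(self):
        # accept() consumes the oldest completed connection, if there is one.
        return self.accept_queue.popleft() if self.accept_queue else None

q = ListenQueues()
q.on_syn(("192.0.2.1", 40000))
q.on_syn(("192.0.2.2", 40001))
q.on_ack(("192.0.2.2", 40001))       # only the second client finishes the handshake
first = q.accept()
pending = list(q.syn_queue)
print(first, pending)
```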

When the incomplete connection queue is full, the listener blocks and no longer accepts new connection requests, and it waits via select()/poll() for the two queues to trigger a writable event. When the completed connection queue is full, the listener likewise stops accepting new connection requests, and any entry about to be moved into the completed connection queue is blocked. Before Linux 2.2, the backlog parameter of listen() set the maximum total length of these two queues (there was actually only one queue with two states; see "Little knowledge" below). Since Linux 2.2, this parameter only specifies the maximum length of the completed queue (accept queue), while /proc/sys/net/ipv4/tcp_max_syn_backlog sets the maximum length of the incomplete queue (syn queue/syn backlog). /proc/sys/net/core/somaxconn is a hard limit on the maximum length of the completed queue; its default is 128 (4096 since Linux 5.4). If the backlog parameter is greater than somaxconn, backlog is truncated to this hard limit.
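The truncation rule amounts to taking a minimum. The Python sketch below makes that explicit and reads the limit from /proc where available; the function names are hypothetical, and the fallback of 128 is simply the default quoted above.

```python
def effective_backlog(requested, somaxconn=128):
    """Accept queue limit after listen()'s backlog is clamped to somaxconn."""
    return min(requested, somaxconn)

def read_somaxconn(path="/proc/sys/net/core/somaxconn", default=128):
    """Read the system-wide limit on Linux; fall back to `default` elsewhere."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return default

limit = read_somaxconn()
print(f"somaxconn={limit}, listen(1024) is capped at {effective_backlog(1024, limit)}")
```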
When a connection in the completed queue has been accept()ed, the TCP connection is established, and the connection uses its own socket buffer to exchange data with the client. Both this socket buffer and the listening socket's socket buffer store data received and sent over TCP, but they no longer mean the same thing: the listening socket's socket buffer only receives the syn and ack data of TCP connection requests, whereas the socket buffer of an established TCP connection mainly holds the "real" data exchanged between the two ends, for example the response data built by the server or the HTTP request data sent by the client.
Little knowledge: two kinds of TCP socket implementation. There are actually two different kinds of TCP socket implementation. The two-queue kind described above is the one used since Linux 2.2. The other kind (BSD-derived) uses only one queue, which holds all connections that are in the middle of the three-way handshake, but each connection in the queue is in one of two states: syn-recv or established.

Explanation of Recv-Q and Send-Q

The Send-Q and Recv-Q columns of the netstat command show information related to the socket buffer. Below is the explanation from man netstat.

Recv-Q
    Established: The count of bytes not copied by the user program connected to this socket. Listening: Since Kernel 2.6.18 this column contains the current syn backlog.

Send-Q
    Established: The count of bytes not acknowledged by the remote host. Listening: Since Kernel 2.6.18 this column contains the maximum size of the syn backlog.

For a listening socket, Recv-Q is the current syn backlog, i.e. the number of SYN messages piled up, which is the current number of connections in the incomplete queue; Send-Q is the maximum size of the syn backlog, i.e. the maximum number of connections the incomplete connection queue can hold.
For an established TCP connection, Recv-Q is the amount of data in the recv buffer that the user process has not yet copied, and Send-Q is the amount of data for which the remote host has not yet returned an ACK.

The reason for distinguishing sockets of established TCP connections from listening sockets is that sockets in these two states use their socket buffers differently: a listening socket cares more about queue length, while the socket of an established TCP connection cares more about the amount of data received and sent.

    # netstat -tnl
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address    Foreign Address    State
    tcp        0      0*                                    LISTEN
    tcp        0      0*                                    LISTEN
    tcp6       0      0 :::80            :::*               LISTEN
    tcp6       0      0 :::22            :::*               LISTEN
    tcp6       0      0 ::1:25           :::*               LISTEN

    # ss -tnl
    State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
    LISTEN  0       128     *:22                *:*
    LISTEN  0       100     *:*
    LISTEN  0       128     :::80               :::*
    LISTEN  0       128     :::22               :::*
    LISTEN  0       100     ::1:25              :::*

Note that for sockets in the LISTEN state, the Send-Q column of netstat and the Send-Q column of ss show different values, because netstat does not report the maximum length of the incomplete queue at all. Therefore, to determine whether the queue still has free slots for new TCP connection requests, use the ss command rather than netstat whenever possible.
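A check for free slots can be scripted on top of ss. The Python sketch below parses one LISTEN row in the five-column layout that ss -tnl prints here, and follows this article's reading of the columns (Recv-Q as the current queue length, Send-Q as its maximum). The function names are hypothetical, and other ss versions may print additional columns.

```python
def parse_ss_listen(line):
    """Split one LISTEN row of `ss -tnl` into named fields."""
    state, recv_q, send_q, local, peer = line.split()
    return {
        "state": state,
        "recv_q": int(recv_q),   # current queue length
        "send_q": int(send_q),   # maximum queue length
        "local": local,
        "peer": peer,
    }

def queue_has_room(row):
    """True while the current length is below the maximum."""
    return row["recv_q"] < row["send_q"]

row = parse_ss_listen("LISTEN 0 128 :::80 :::*")
print(row["local"], queue_has_room(row))
```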


Impact of syn flood

In addition, if the listener has sent SYN+ACK but does not receive the ACK from the client, it is woken up by the timeout set on select()/poll() and resends the SYN+ACK to the client, in case the message was lost somewhere in the network. This retransmission causes a problem, however. If the client forged its source address when calling connect(), the SYN+ACK sent by the listener can never reach the real peer, which means the listener never receives the ACK and keeps resending the SYN+ACK. Both being woken again and again by the select()/poll() timeout and copying the data into the send buffer again and again require the CPU, and the SYN+ACK in the send buffer also has to be copied to the network card (this time by DMA, which does not need the CPU). If the client is an attacker that keeps sending thousands or tens of thousands of SYNs, the listener will all but collapse and the network card will become heavily congested. This is what is called a syn flood attack.
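To get a feel for how long one forged SYN can occupy a queue entry, the retransmission timeline can be computed. The Python sketch below assumes exponential backoff with a 1 second initial timeout and 5 retries, modeled on the Linux default for tcp_synack_retries; these numbers are assumptions and do not come from this article.

```python
def synack_schedule(retries=5, initial_timeout=1.0):
    """Return (retransmission times, give-up time) in seconds after the first SYN+ACK."""
    times, elapsed, timeout = [], 0.0, initial_timeout
    for _ in range(retries):
        elapsed += timeout          # wait out the current timeout, then resend
        times.append(elapsed)
        timeout *= 2                # assumed exponential backoff
    give_up = elapsed + timeout     # final wait after the last retransmission
    return times, give_up

times, give_up = synack_schedule()
print(times, give_up)
```

Under these assumptions each half-open entry is held for about a minute, which is why lowering the retry count shortens the time an attacker can hold a slot.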

There are many ways to deal with syn flood, for example: reducing the maximum length of the two queues maintained by listen(), reducing the number of syn+ack retransmissions, increasing the retransmission interval, shortening the timeout for waiting for the ack, using syncookies, and so on. But directly modifying TCP options in any of these ways hurts performance and efficiency. Therefore, filtering packets before the connection reaches the listener thread is an extremely important measure.


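The send() function copies data from the app buffer into the send buffer (it may also copy directly from a kernel buffer), while the recv() function copies data from the recv buffer into the app buffer.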
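Both copies can be seen in a short Python sketch over a loopback connection. The helper name recv_exact and the 4096-byte payload are assumptions for illustration. Note that send() reports how many bytes were actually copied into the send buffer, which may be fewer than requested.

```python
import socket

def recv_exact(sock, n):
    """Keep calling recv() until exactly n bytes have been copied out of the recv buffer."""
    chunks, remaining = [], n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

payload = b"x" * 4096
sent = client.send(payload)          # app buffer -> send buffer; returns bytes copied
received = recv_exact(conn, sent)    # recv buffer -> app buffer on the other end
print(sent, len(received))

for s in (client, conn, listener):
    s.close()
```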

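1. Close writing. No more data can be written to the send buffer; data already in the send buffer continues to be sent until it has all been sent.
2. Close reading. No more data can be read from the recv buffer; data already in the recv buffer can only be discarded.
3. Close both reading and writing. Neither reading nor writing is possible; data already in the send buffer is sent until done, but data already in the recv buffer is discarded.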
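The first mode can be demonstrated with a short Python sketch, on the assumption that the three modes above correspond to the SHUT_WR, SHUT_RD, and SHUT_RDWR arguments of shutdown(). The loopback setup and message contents are illustrative only.

```python
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

client.sendall(b"last words")
client.shutdown(socket.SHUT_WR)      # close the write side only

first = conn.recv(1024)              # data already sent is still delivered
eof = conn.recv(1024)                # b"" means the peer will send nothing more

conn.sendall(b"reply")               # the opposite direction is still open
reply = client.recv(1024)

try:
    client.send(b"more")             # writing after SHUT_WR is refused
    write_refused = False
except OSError:
    write_refused = True

print(first, eof, reply, write_refused)

for s in (client, conn, listener):
    s.close()
```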

For the listening process/thread, each reused socket is called a listener bucket; that is, each listening socket is a listener bucket.