1. Introduction

In this tutorial, we’ll discuss why the Real-time Transport Protocol (RTP) uses UDP instead of TCP. We’ll start with a brief introduction to RTP and explain the fundamental requirements of real-time communication. Then we’ll look at TCP and discuss why it’s not suitable for real-time communication. Finally, we’ll summarize the main points.

2. RTP in a Nutshell

The Real-time Transport Protocol (RTP) is a network protocol for real-time multimedia (audio and video) data transfer over IP networks. It is defined in RFC 3550 and is used in conjunction with the RTP Control Protocol (RTCP), which is defined in the same document. Together, the two protocols form a complete suite for real-time multimedia data transfer. While RTP is responsible for delivering real-time data, RTCP is used to monitor the quality of service (QoS) and synchronize the media streams.

Although RTP is called a transport protocol, it’s an application-level protocol that typically runs on top of UDP. In principle, it can run on top of other transport protocols as well. It is designed to be a general-purpose protocol for real-time multimedia data transfer and is used in many applications, notably in WebRTC, where it works together with RTCP.

RTP delegates the responsibility for splitting the real-time data into segments, timing recovery, loss detection, and content identification to the application layer. This approach is called Application Layer Framing (ALF). It matters because only the application knows the semantics of the data. For example, imagine video and audio streams being sent over RTP. The application knows the boundaries of the video frames and their timing relationship with the audio frames. Therefore, it can decide how to split the data into segments that are sent together over the network.

3. Real-time Communication Requirements

For real-time multimedia data, it’s important to know the timing relationship between the data segments and keep them in order. Therefore, RTP uses a timestamp and a sequence number to identify the data segments.
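To make the role of these two fields concrete, here is a minimal sketch of packing and parsing the 12-byte fixed RTP header described in RFC 3550. The helper names `pack_rtp_header` and `parse_rtp_header` are illustrative and not part of any library. The sketch assumes a packet without CSRC entries, padding, or a header extension.

```python
import struct

RTP_VERSION = 2
# Byte 0: V/P/X/CC, byte 1: M/PT, then sequence number, timestamp, SSRC.
_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build a fixed RTP header (hypothetical helper, no CSRC or extension)."""
    byte0 = RTP_VERSION << 6                          # P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet):
    """Read the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,          # 16 bits, wraps around at 65536
        "timestamp": timestamp,   # 32 bits, in media clock units
        "ssrc": ssrc,
    }
```

The sequence number is only 16 bits wide, so a receiver has to expect it to wrap around during a longer session.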

A video frame may be split into multiple segments. In that case, every segment carries the same timestamp, and it is important that these segments end up together in the playback buffer. The sequence number is used to detect lost packets and to restore the original packet order. RTP itself provides neither acknowledgment messages nor a way to request retransmission.
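The receiver-side logic described here can be sketched as follows. This is a simplified illustration, not a production implementation: packets are modeled as hypothetical `(sequence, timestamp, payload)` tuples, and all packets are assumed to lie within half of the 16-bit sequence space of the first one.

```python
def seq_distance(a, b):
    """Signed distance from 16-bit sequence number a to b, wrap-around aware."""
    return ((b - a + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_gaps(packets):
    """Sort packets by sequence number and list the missing sequence numbers."""
    if not packets:
        return [], []
    base = packets[0][0]
    ordered = sorted(packets, key=lambda p: seq_distance(base, p[0]))
    missing = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = seq_distance(prev[0], cur[0])
        # A gap larger than 1 means packets in between never arrived.
        missing.extend((prev[0] + i) & 0xFFFF for i in range(1, gap))
    return ordered, missing


def assemble_frames(ordered):
    """Group payloads that share a timestamp, i.e. belong to the same frame."""
    frames = {}
    for _seq, timestamp, payload in ordered:
        frames[timestamp] = frames.get(timestamp, b"") + payload
    return frames
```

Note that the sketch only reports missing sequence numbers. What to do about them (conceal the loss, skip the frame) is left to the application.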

For real-time multimedia data, a steady frame or sample rate is the most critical requirement. Therefore, multimedia applications use a playback buffer to smooth out jitter and to conceal network latency.
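As a rough illustration of how a playback buffer turns irregular arrivals into a steady output rate, the following sketch assigns each packet a playout deadline based on a fixed playout delay. The function name, the fixed delay, and the millisecond units are assumptions made for this example. Real implementations usually adapt the delay to the measured jitter.

```python
def schedule_playout(arrivals, playout_delay):
    """Compute playout times for packets under a fixed playout delay.

    arrivals: list of (media_time, arrival_time) pairs in milliseconds,
    in arrival order. Returns (schedule, late), where schedule maps
    media_time to its playout time and late lists the media times that
    arrived after their deadline and must be dropped.
    """
    if not arrivals:
        return {}, []
    first_media, first_arrival = arrivals[0]
    schedule, late = {}, []
    for media_time, arrival_time in arrivals:
        # The deadline keeps the original spacing between media times.
        deadline = first_arrival + playout_delay + (media_time - first_media)
        if arrival_time <= deadline:
            schedule[media_time] = deadline
        else:
            late.append(media_time)
    return schedule, late
```

With a 40 ms delay, packets that arrive with up to 40 ms of extra jitter are still played at evenly spaced times, while anything later is useless and gets dropped.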

For real-time data transmission, this means that the sender should send the data as soon as it is available, and the receiver should deliver the data to the application as soon as it arrives. Now let’s look at the TCP protocol and see how it works against these two requirements.

4. Is TCP Suitable for Real-time Communication?

The TCP protocol was designed to provide reliable data transfer over an unreliable network. It focuses primarily on the reliability of the data transfer. Its main strategy is to retransmit the data in case of packet loss until the receiver finally acknowledges the data. A time-out mechanism is used to detect lost packets and trigger the packet’s retransmission according to the protocol’s retransmission rules.

If this happens repeatedly, the retransmission timeout doubles after each attempt, so the total waiting time grows exponentially. This is called exponential back-off. Because TCP delivers data strictly in order, a single unacknowledged packet can block the whole data transfer for a long time: subsequent data segments pile up in the receive buffer, and the receiver can’t deliver them to the application. Given the fixed playback rate, the waiting data segments soon become obsolete and have to be discarded. This is the price of TCP’s reliability.

To get an idea of how long this can take, let’s look at the TCP defaults in the Linux kernel. The kernel parameter tcp_retries2, which is set to 15 by default, controls how many times TCP retransmits a packet on an established connection before it gives up. The retransmission timeout has a minimum of 200 ms and doubles after each retransmission, up to a maximum of 120 seconds. Starting from that minimum, the initial timeout plus 15 retransmissions add up to 924.6 seconds (more than 15 minutes) before TCP notifies the application about the broken connection.
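The 924.6-second figure can be reproduced with a few lines of arithmetic. The sketch below assumes that the timeout starts at the 200 ms minimum and that the initial transmission plus each of the 15 retransmissions waits for one full timeout. In practice, the starting timeout is derived from the measured round-trip time, so this is a lower bound rather than an exact prediction. The function name is illustrative.

```python
RTO_MIN = 0.2     # seconds, minimum retransmission timeout
RTO_MAX = 120.0   # seconds, maximum retransmission timeout


def total_backoff_time(retries, rto_min=RTO_MIN, rto_max=RTO_MAX):
    """Sum the timeouts of the initial send plus `retries` retransmissions."""
    total, rto = 0.0, rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)   # exponential back-off with a cap
    return total


print(round(total_backoff_time(15), 1))
```

The first ten timeouts (0.2 s up to 102.4 s) sum to 204.6 seconds, and the remaining six are capped at 120 seconds each, which adds another 720 seconds.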

TCP also provides flow control and congestion control mechanisms, which protect the receiver and the network from overload. Its congestion control is deliberately cautious: when congestion is detected, the congestion window shrinks multiplicatively (it is typically halved), whereas during congestion avoidance it grows only linearly. As a result, TCP is slow to increase the data transfer rate again after a loss.
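The asymmetry between shrinking and growing the congestion window can be illustrated with a simplified additive-increase/multiplicative-decrease (AIMD) model. This sketch assumes that the window is halved on every loss event and grows by one segment per round trip otherwise. It ignores slow start, fast recovery, and the differences between real congestion control algorithms.

```python
def aimd(events, cwnd=1.0, increase=1.0, decrease=0.5):
    """Trace the congestion window (in segments) over per-round-trip events.

    events: iterable of booleans, True if loss was detected in that round trip.
    """
    trace = []
    for loss in events:
        if loss:
            cwnd = max(1.0, cwnd * decrease)   # multiplicative decrease
        else:
            cwnd += increase                   # additive increase
        trace.append(cwnd)
    return trace


# Seven loss-free round trips, one loss, then two more loss-free round trips.
print(aimd([False] * 7 + [True] + [False] * 2))
```

In this model, a single loss event undoes four round trips of growth, which shows why the sending rate recovers slowly.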

Another obstacle for real-time communication is the use of Nagle’s algorithm on the sender side. The algorithm’s purpose is to reduce the number of small packets sent over the network, since they cause unnecessary overhead. It holds back small messages and accumulates them in the TCP send buffer until they can be sent over the network in larger segments. While this may be a good idea for file transfer, the added delay makes it unsuitable for real-time communication.
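For reference, applications that send latency-sensitive data over TCP commonly turn Nagle’s algorithm off with the standard TCP_NODELAY socket option. The sketch below shows this on an unconnected socket, and the helper name is illustrative. Disabling Nagle’s algorithm removes only this one source of delay; the retransmission and congestion control behavior described above remains.

```python
import socket


def make_low_latency_tcp_socket():
    """Create an unconnected TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY=1 tells the stack to send small writes immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```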

5. One-to-Many Communication

Multimedia applications often use one-to-many communication. For example, a video conference commonly hosts multiple participants, where each participant sees all the others, and all of them hear the one who is currently speaking.

For TCP, this means that for each participant, a separate TCP connection is required, and the traffic is multiplied. This causes a lot of overhead because the same data is sent in multiple channels over the same network link. Additionally, each TCP connection has its own congestion window and retransmission buffer, which causes a lot of memory consumption on the sender side.

On the other hand, UDP does not have such overhead simply because it’s a connectionless protocol, so there’s no per-connection state to maintain on the sender side. Moreover, with UDP, the data can be sent to multiple receivers in a single transmission by using IP multicast. This saves a lot of bandwidth and network resources.
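A minimal sketch of a multicast sender is shown below. The group address, port, and helper names are hypothetical values chosen for illustration, and the example assumes a network in which IP multicast is actually routed to the receivers, which is often not the case on the public Internet.

```python
import socket

MULTICAST_GROUP = ("239.1.2.3", 5004)   # hypothetical group address and port


def make_multicast_sender(ttl=1):
    """Create a UDP socket whose multicast datagrams may cross `ttl` hops."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def send_to_group(sock, packet, group=MULTICAST_GROUP):
    """Send one datagram; the network delivers copies to all group members."""
    return sock.sendto(packet, group)


# Usage (requires a multicast-capable network interface):
#   sender = make_multicast_sender(ttl=1)
#   send_to_group(sender, b"rtp packet bytes")
```

The sender issues a single `sendto()` call regardless of the number of receivers, and it keeps no per-receiver connection state.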

6. Summary

To understand why RTP uses UDP instead of TCP, we need to understand the fundamental requirements for real-time communication and the characteristics of both TCP and UDP. So let’s sum it all up in the table below:

| | Characteristics |
|---|---|
| Real-time communication | The timing relationship among received data is the highest priority. Information arriving too late is useless. |
| UDP | Doesn’t guarantee packets arriving in order, but RTP uses a timestamp and a sequence number to identify the data segments and put them in the correct order. Packet loss is not an issue as long as it stays within certain limits, especially if a mechanism like Forward Error Correction (FEC) is used. Can implement one-to-many communication in a single transmission by using multicast. |
| TCP | Prefers reliability over timeliness. Introduces long communication timeouts. Postpones sending of small data chunks. Can implement one-to-many communication only by stream multiplication. |

7. Conclusion

In this article, we discussed why the Real-time Transport Protocol uses UDP instead of TCP, and why TCP is not suitable for real-time communication. Today, many applications are communicating over the Internet and it’s important to understand their requirements and the characteristics of the underlying protocols to make the right choice for the data transfer protocol.