QUIC protocol from the monitoring perspective
The goal of this document is to describe the QUIC network protocol from the perspective of network monitoring. Compared to existing protocols, QUIC differs in several ways that we encountered when trying to monitor it. In addition to describing the protocol, the article presents measurements of the protocol taken in the large university network of Masaryk University (MUNI).
Our goal within the CONCORDIA project is, among other things, to address the analysis and monitoring of encrypted traffic. This area also includes monitoring the QUIC protocol, which is intended for encrypted web communication. We would like to create a tool that can recognize the QUIC protocol in network traffic. At the same time, the tool should obtain basic information about what kind of communication it is.
QUIC originally stood for Quick UDP Internet Connections and was designed by Google. It is now maintained by the IETF, which has standardized the protocol in several RFC documents (in the IETF specifications, QUIC is a name rather than an acronym). The primary purpose of the QUIC protocol is to improve the user experience (mainly by reducing web page load times) compared to connections using the TCP+TLS+HTTP/2 protocols.
Although QUIC was designed with web applications in mind, it is a general-purpose transport protocol that can be used by any network application. As its name says, it is based on the UDP protocol, and it typically uses port 443. Because UDP is a connectionless protocol without guaranteed delivery, it is up to QUIC to implement all the mechanisms needed to ensure correct data delivery.
Unfortunately, treating every packet transferred over UDP port 443 as a QUIC packet is generally not good practice. QUIC can be used on other ports, and other protocols can use the same port as QUIC. One example is OpenVPN, which can also use UDP port 443.
The QUIC protocol is not a new protocol at all and has already been with us for over eight years. It was first created by Google in 2012. In 2015, Google submitted the protocol's specification to the IETF for standardization. The QUIC Working Group was established in 2016, and the first IETF draft was published in November of the same year. Flowmon Networks was involved in the QUIC Working Group as a consultancy partner representing network monitoring vendors. The last draft, the 34th, was published in January 2021. Over this period, many changes were made to the protocol, and many development teams were waiting for the final specification, which was published as RFC 9000 in May 2021. During the standardization process, the QUIC and HTTP Working Groups agreed to designate HTTP over QUIC as the new version of the HTTP protocol, HTTP/3. As of June 2021, almost 20% of all websites supported HTTP/3 (source: https://w3techs.com/technologies/details/ce-http3), and HTTP/3 was supported by 73% of all web browsers (source: https://caniuse.com/?search=HTTP%2F3).
Right now, the QUIC is described in four RFC documents published on 27.05.2021:
- RFC 8999: Version-Independent Properties of QUIC
- RFC 9000: QUIC: A UDP-Based Multiplexed and Secure Transport
- RFC 9001: Using TLS to Secure QUIC
- RFC 9002: QUIC Loss Detection and Congestion Control
Unfortunately, based on the measurements described later in this text, the latest, standardized version of QUIC is not the only one currently in use. Monitoring devices should therefore support other versions as well. Even if all current services and applications migrate to the standardized QUIC version, there is still a good chance that new malware will use an implementation of an older version, because doing so may prevent analysis by some protection mechanisms and help the malware avoid detection.
Long and short headers
One of the most interesting and important properties of QUIC is that it uses two types of headers, long and short. The long header is used for initial connection establishment and was designed to be expressive and easily extensible. However, only a few fields from the long header are still needed in the packets that follow once the connection is established. For those packets QUIC uses the short header, which aims to be maximally efficient and contains only the necessary fields.
Information from long headers can even be stored, so the next time a client connects to the same server, it can reuse the previous information and start using short headers from the first packet. From a network monitoring perspective, this mechanism can be problematic because the information from long headers may be unavailable for two main reasons:
- The long headers were transferred a long time ago, and the monitoring system does not have the capacity to store headers for that long.
- Users can move their devices across multiple networks. Because the information from long headers is not specific to a network, a user can transfer long headers in one network only and use the information from them in other networks as well.
Figure 1 shows the data structures describing both types of header.
Unfortunately, the short header contains no information that we can use to unambiguously distinguish QUIC packets from other protocols. With a long header, however, it is possible to use the 4-byte Version field, which indicates the version of the protocol. Before the standardized version was introduced, each byte represented one ASCII character (e.g., “Q046”). In the standardized version, this field is interpreted as a single unsigned integer (e.g., 0x00000001). If this field contains any valid version (e.g., “Q046”), we consider all packets of the same network flow to belong to the QUIC protocol.
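The version check described above can be sketched in a few lines of Python. This is only an illustration, not the detection logic of our tool: the function name `quic_version_family` and the returned labels are hypothetical, and the sketch assumes the layout in which the first bit of the packet marks a long header and the Version field occupies the following four bytes. Older Google versions that place the version elsewhere in the packet would need separate parsing, which the sketch does not attempt.

```python
import struct
from typing import Optional


def quic_version_family(udp_payload: bytes) -> Optional[str]:
    """Classify the Version field of a long-header packet.

    Returns None when the payload cannot be a long-header packet,
    and "unknown" when the version matches none of the known types.
    """
    # A long header needs the first byte plus the 4-byte Version field,
    # and the most significant bit of the first byte must be set.
    if len(udp_payload) < 5 or not (udp_payload[0] & 0x80):
        return None

    raw = udp_payload[1:5]
    (version,) = struct.unpack("!I", raw)

    if version == 0x00000001:
        return "RFC 9000 version 1"
    if version >> 4 == 0xFACEB00:  # 0xfaceb001, 0xfaceb002, ...
        return "Facebook mvfst"
    # Pre-standard versions encode four ASCII characters, e.g. b"Q046".
    if raw[:1] == b"Q" and raw[1:].isdigit():
        return "Google QUIC (QUIC Crypto)"
    if raw[:1] == b"T" and raw[1:].isdigit():
        return "Google QUIC (TLS 1.3)"
    return "unknown"
```

A probe could call this function on the first packets of every UDP flow and, once it returns a known version type, label the whole flow as QUIC.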
Zero round trip time
By sharing information between network connections, it is possible to transfer smaller headers, which reduces the number of transferred bytes. However, this is not the only benefit. A client can also use 0-RTT (zero round trip time) connection resumption, because the new connection can use information stored from previous connections. With a 0-RTT connection, the client sends data immediately with the first packet. In contrast, traditional HTTP/2 connections that use the TCP and TLS protocols [1] require two round trip times before data can be transferred.
Before enabling the 0-RTT feature in the QUIC protocol, it is important to understand that 0-RTT introduces a security threat to network applications: replay attacks. Data sent over 0-RTT is replayable, which allows attackers to capture packets and resend them later. There are two possible solutions to this problem: 1) disabling the feature completely or 2) implementing a protection mechanism in the application layer so that duplicated packets do no harm.
The classical way of identifying a network connection is by using information from the Internet and transport layer (based on TCP/IP stack). Specifically, it is a 5-tuple: source and destination IP address, source and destination transport port, and transport protocol. However, in the current world, when network devices are being moved from one network to another (e.g., from Wi-Fi to cellular network), this approach is not ideal. By moving the device from one network to another, the network usually assigns a different IP address and therefore all previous network connections are terminated.
QUIC tries to solve this issue with a field that is independent of the lower network layers and is used for connection identification. The field is called Connection ID, and its length is variable. When the user moves a device from one network to another, the application can still use the same Connection ID and can therefore continue with previous connections without any interruption. It is also important to remember that the Connection ID can change during communication. Because of that, mapping information about the connection (such as the SNI) to the Connection ID may lead to inaccurate state.
Because an application can use information from previous connections, even from different networks, the first packet that the application sends to the destination may already use the short QUIC header. In that case, middleboxes cannot correctly analyze these packets because the information in the long headers was transferred over a different network.
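The effect on a flow-based monitor can be illustrated with a minimal sketch. It assumes a hypothetical flow table keyed by the classical 5-tuple, where a flow gets its label only from a long-header packet; the class and field names, as well as the documentation IP addresses, are made up for the example.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Classical 5-tuple (one direction only, for brevity)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


class FlowTable:
    def __init__(self) -> None:
        self._labels: Dict[FlowKey, str] = {}

    def observe(self, key: FlowKey, long_header_version: Optional[str]) -> str:
        """Return the label for a packet.

        long_header_version is set only for long-header packets with a
        valid version; short-header packets pass None.
        """
        if long_header_version is not None:
            self._labels[key] = long_header_version
        return self._labels.get(key, "unclassified")


table = FlowTable()
wifi = FlowKey("192.0.2.10", "198.51.100.5", 50000, 443, "UDP")
cellular = FlowKey("203.0.113.77", "198.51.100.5", 41000, 443, "UDP")

# Handshake seen on Wi-Fi: the flow is labeled and stays labeled.
on_wifi_first = table.observe(wifi, "RFC 9000 version 1")
on_wifi_later = table.observe(wifi, None)

# After moving to the cellular network the 5-tuple changes, and the
# short-header packets carry nothing that would let us label the flow.
on_cellular = table.observe(cellular, None)
```

In this sketch the packets sent after the move end up as "unclassified", even though they belong to the same QUIC connection.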
All application data transferred over the QUIC protocol is always encrypted. This is great for privacy, but monitoring requires at least basic knowledge of the transferred application data. To allow middleboxes to work properly with QUIC packets, the QUIC version and Connection ID are not encrypted in transmitted packets. Fortunately, it is still possible to get the SNI (Server Name Indication) from the packets; it is just a little complicated. First, it is necessary to determine the version of the QUIC protocol, from which the initial salt is derived. The packet carrying the Client Hello is protected with keys derived from this version-specific initial salt and the Destination Connection ID of the client's first packet, so a passive observer can compute the same keys. In all new versions, the format of the Client Hello is based on the TLS 1.3 protocol, whereas older implementations used QUIC Crypto. Once the Client Hello is decrypted, the SNI can be extracted, which may help identify the network connection.
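For the standardized version, the key derivation step can be sketched as follows. The sketch follows the procedure from RFC 9001 (HKDF with SHA-256 and the TLS 1.3 label format) and uses the initial salt of QUIC version 1; other versions use different salts, which are not listed here. The function names are our own, and the remaining steps (removing header protection, AES-128-GCM decryption, and parsing the Client Hello) are not shown.

```python
import hashlib
import hmac
import struct

# Initial salt of QUIC version 1 as given in RFC 9001.
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand_label(secret: bytes, label: bytes, length: int) -> bytes:
    """HKDF-Expand-Label from TLS 1.3 with an empty context."""
    full_label = b"tls13 " + label
    info = (
        struct.pack("!H", length)
        + bytes([len(full_label)]) + full_label
        + b"\x00"  # zero-length context
    )
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(
            secret, block + info + bytes([counter]), hashlib.sha256
        ).digest()
        okm += block
        counter += 1
    return okm[:length]


def client_initial_keys(dcid: bytes, salt: bytes = INITIAL_SALT_V1) -> dict:
    """Derive the keys protecting the client's Initial packets.

    dcid is the Destination Connection ID from the client's first packet.
    """
    initial_secret = hkdf_extract(salt, dcid)
    client_secret = hkdf_expand_label(initial_secret, b"client in", 32)
    return {
        "key": hkdf_expand_label(client_secret, b"quic key", 16),  # AEAD key
        "iv": hkdf_expand_label(client_secret, b"quic iv", 12),    # AEAD IV
        "hp": hkdf_expand_label(client_secret, b"quic hp", 16),    # header protection
    }


keys = client_initial_keys(bytes.fromhex("8394c8f03e515708"))
```

The Connection ID in the last line is the one used in the test vectors in Appendix A of RFC 9001, which can be used to check an implementation like this one.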
As described in the previous section, the client can change the Connection ID during communication with the same server. Therefore, pairing the decrypted SNI with Connection ID will not assign the correct SNI to packets with a new Connection ID.
Several other benefits of QUIC over TCP+TLS+HTTP/2 are:
- Improved congestion control – Each QUIC packet carries a new packet number, even when it retransmits data. Acknowledgements also carry the delay between receiving a packet and sending its acknowledgement. Together, these mechanisms give the communicating parties a better picture of what has been transferred and allow a more exact round trip time calculation.
- Multiplexing without head-of-line blocking – TCP treats data as a single stream of bytes. When a single packet is lost, no HTTP/2 stream on that connection can make forward progress until that packet is retransmitted. In QUIC, on the other hand, packet loss in one stream does not affect other streams.
- Forward error correction – An optional feature of early Google versions of QUIC that is not part of RFC 9000. When packet loss is expected, it is possible to periodically send an FEC (Forward Error Correction) packet containing parity that can be used to recover one lost packet from the FEC group without retransmission.
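The parity idea behind forward error correction can be shown with a small sketch. We assume the simplest possible scheme here, a byte-wise XOR over all packets of a group; the text above does not say which scheme a particular QUIC implementation used, and the function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Build one parity packet for a group (shorter packets are zero-padded)."""
    size = max(len(p) for p in packets)
    return reduce(xor_bytes, (p.ljust(size, b"\x00") for p in packets))


def recover(received: List[Optional[bytes]], parity: bytes) -> bytes:
    """Recover the single missing packet (marked as None) of a group.

    The result is zero-padded to the parity length if the lost packet
    was shorter than the longest packet in the group.
    """
    if sum(p is None for p in received) != 1:
        raise ValueError("XOR parity can recover exactly one lost packet")
    present = (p.ljust(len(parity), b"\x00") for p in received if p is not None)
    # XOR-ing the parity with every packet that arrived leaves the lost one.
    return reduce(xor_bytes, present, parity)


group = [b"pkt-one!", b"pkt-two!", b"pkt-3333"]
parity = make_parity(group)
restored = recover([group[0], None, group[2]], parity)
```

With this scheme, one extra packet per group is enough to repair a single loss, but two losses in the same group still require retransmission.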
Avoiding the HTTP
One of the potential problems of the QUIC protocol is that it is a (relatively) new protocol and is therefore not supported by old middleboxes and security appliances. This creates a security hole for many organizations. Current solutions that monitor web traffic are implemented to work only with HTTP and HTTPS traffic transferred over TCP and TCP+TLS. After analyzing the traffic, security appliances may filter access to some web pages, generate statistics, monitor performance, or perform deep packet inspection. However, none of this is possible if the monitoring solution does not understand the QUIC protocol. New malware families may also exploit this potential security hole by using QUIC to avoid detection or deep analysis.
One of the ways to prevent having QUIC inside the network before upgrading all network equipment to support this new protocol is to block all traffic on UDP port 443. However, as mentioned at the beginning of this document, this filtering may also stop OpenVPN from working.
Monitoring QUIC in MUNI network
Unfortunately, we have noticed that many QUIC versions are used simultaneously, which brings several difficulties related to network monitoring. To be able to properly monitor the protocol, all used versions should be supported. Therefore, we have started measuring the occurrence of each version in real network traffic. The monitoring took place at Masaryk University in Czechia between 1.1.2021 and 31.8.2021.
As a large organization with more than 40,000 users, including more than 35,000 students and 6,000 employees, Masaryk University has a vast computer network with more than 25,000 communicating IP addresses every day. The university is connected to the CESNET2 academic network using two connection points. CESNET2 is one of the first large-scale fiber-optic networks in the world to implement cutting-edge technology to maximize transmission capabilities. It further builds on the pan-European multi-gigabit network GÉANT2, which forms the backbone of all European research and education networks and provides connectivity to similar networks overseas. The monitoring point for measuring QUIC protocol usage and its parameters is located at the university's connection to the CESNET2 academic network. The total number of flows changes markedly over time because the number of students on the network depends on the part of the semester. Various COVID-19 restrictions also affected the number of students and employees in the Masaryk University network.
Figures 2 and 3 show the number of QUIC flows and their distribution by version. The figures do not show individual versions, only values aggregated by version type. In total, we detected four valid version types:
- Versions beginning with the letter Q, where the number that follows identifies the Google QUIC version. These versions use QUIC Crypto to initialize the encrypted connection.
- Version T051, which was created before the release of RFC 9000. It is similar to version Q050 but uses the TLS 1.3 protocol instead of QUIC Crypto to initialize the encrypted connection.
- Versions with the prefix 0xfaceb00 indicate a proprietary implementation of the protocol from Facebook.
- The latest version is version 0x00000001, which is described in RFC 9000.
The main conclusion of our measurement is that, to properly monitor QUIC traffic, it is necessary to support not just the latest version but older versions as well. The oldest detected version was Q039. However, this conclusion may change because, after the release of version 1, we expect a continuous migration away from the previous versions. This trend has already begun, as shown by the decreasing number of flows using older versions in July and August.
Figures 4 and 5 show the individual representation of versions starting with the character Q. For version Q039, the number of flows is so small that the value is barely visible in the figures. Table 1 at the end of the article gives the exact values.
During our monitoring, we noticed some unexpected versions of QUIC, specifically versions 0xfaceb001 and 0xfaceb002. Figure 6 shows the number of flows for these two versions. We found that they belong to a proprietary implementation of the QUIC protocol created by Facebook, which calls it “mvfst” (source: https://github.com/facebookincubator/mvfst). The protocol is used by the Facebook and Instagram web pages. According to a blog post from 21.10.2020 (source: https://engineering.fb.com/2020/10/21/networking-traffic/how-facebook-is-bringing-quic-to-billions/), 75% of Facebook traffic uses this version of QUIC, and the goal is to use the QUIC protocol exclusively for Facebook traffic. Unfortunately, we do not know whether Facebook will migrate its services from its own version of QUIC to the standardized version.
All the figures in this chapter were generated from the numbers shown in Table 1. For clarification, the RFC 9000 version of the protocol shows zeros in the early months because it was published only in May 2021.
This article describes the new QUIC protocol, focusing on the possibilities of monitoring it in the network. The text presents the most important advantages of the protocol over connections using the TCP+TLS+HTTP/2 protocols and describes their possible effects on monitoring. Because there are many protocol versions, we started measuring QUIC traffic on the MUNI network. Because it is not very realistic to implement every version of the protocol on monitoring probes (originally, new versions were released every few months), we decided to use this monitoring to determine which versions are actually used. During the monitoring period, the final version 1 was released, which is described in RFC 9000. We anticipate that existing applications will gradually migrate to this version.
[1] With support of TLS Session Resumption.
(By Flowmon and Masaryk University)