Passive TCP Identification for Wired and Wireless Networks: A Long-Short Term Memory Approach

04/09/2019 ∙ by Xiaoyu Chen, et al. ∙ Shanghai University

Transmission control protocol (TCP) congestion control is one of the key techniques for improving network performance. TCP congestion control algorithm identification (TCP identification) can be used to significantly improve network efficiency. Existing TCP identification methods apply only to a limited number of TCP congestion control algorithms and focus on wired networks. In this paper, we propose a machine learning based passive TCP identification method for wired and wireless networks. After comparing three typical machine learning models, we conclude that the 4-layer Long Short Term Memory (LSTM) model achieves the best identification accuracy. Our approach achieves better than 98% accuracy in both wired and wireless networks and works for newly proposed TCP congestion control algorithms.




I Introduction

Transmission control protocol / Internet protocol (TCP/IP) lays the foundation of today’s information society. After years of research effort, many TCP algorithms (for convenience, we refer to a TCP congestion control algorithm as a TCP algorithm in this paper) have been devoted to preventing network congestion and transmission packet losses. Because typical packet transmission over the Internet is based on a wired architecture, traditional TCP algorithms, such as NewReno[1], Cubic[2] and Vegas[3], focus on network congestion events. With the rapid development of wireless transmission, more recent TCP algorithms, e.g. Sprout[4] and Verus[5], jointly consider the transmission packet loss in the wireless environment and achieve a better trade-off between transmission delay and throughput.

Ideally, an intelligent network would automatically identify the TCP algorithms in use and adapt the network routing policy and transmission resources to maximize the network utility. However, even the initial step of TCP algorithm identification is regarded as a challenging task, and considerable research effort has been devoted to it[6, 7, 8, 9, 10, 11]. According to the way data is collected, TCP identification can be divided into two main categories, namely active detection and passive measurement. The former relies on observing the behavior of injected redundant packets, while the latter relies on observations at intermediate nodes and does not affect the current network traffic. Because passive measurement has minimal effect on the network, it is more practical than active detection and is commonly used in recent work[12, 13, 6].

Among the existing passive TCP identification methods, a cluster-based scheme has been proposed in [12], which can distinguish any 2 out of 14 TCP algorithms and achieves 85% identification accuracy. However, because it is cluster-based, the scheme is difficult to extend to newly proposed algorithms, and its accuracy is affected by the calculation of manually designed features. According to [6], congestion avoidance algorithm identification (CAAI) can identify 15 TCP algorithms available in major operating systems and reaches 96.98% overall identification accuracy. The inputs of the model are the multiplicative decrease parameter and the window growth function (the offset window sizes), which the authors believe remain the same for most TCP algorithms. However, this assumption is invalid for newly proposed algorithms such as BBR[14], so the method cannot be applied to them directly. Another kind of passive TCP identification method utilizes the back-off factor by inferring the size of the congestion window (CWND) during wired transmissions, and can identify 3 TCP algorithms with 95% accuracy as reported in [13]. Since the back-off factor of delay-based algorithms varies over time, this type of method is only valid for loss-based algorithms. As far as we are aware, previous works are based on the wired environment, and new types of features need to be exploited to identify newly proposed algorithms in wireless networks.

In this paper, we propose a machine learning based passive TCP identification method to address the above issues, and the main contributions are summarized as follows.

  • Unified Passive TCP Identification Framework.

    We propose a unified passive TCP identification framework, which consists of a feature extraction block and an identification block. Unlike traditional identification algorithms, the proposed framework can easily be expanded to support the identification of new types of TCP algorithms.

  • Joint consideration of wired and wireless performance.

    In order to jointly support wired and wireless environments, we propose an LSTM based passive TCP identification method, which can extract the time-domain correlations that reflect the congestion-based features (mainly for wired networks) and the transmission loss features (mainly for wireless networks). By fully utilizing the LSTM and densely connected neural network architecture, our proposed method can identify 5 TCP algorithms with 99.8% accuracy in wired networks and 6 TCP algorithms with 98.2% accuracy in wireless networks.

The rest of the paper is arranged as follows. In Section II, the problem formulation and limitations are analyzed. Our system model is introduced in Section III, including features selection and identification model. In Section IV, we introduce the experiment settings and evaluation results. Finally, the conclusion and future work are presented in Section V.

II Problem Formulation

TCP identification methods are typically based on maximum a posteriori (MAP) estimation, which can be expressed as (1) according to the Bayes formula.

$$P(T_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid T_i)\, P(T_i)}{p(\mathbf{x})} \tag{1}$$

Here $T_i$ is the variant of TCP algorithm and $\mathbf{x}$ is the input vector. $P(T_i \mid \mathbf{x})$ is the probability of the TCP algorithm $T_i$ when the input is the vector $\mathbf{x}$, and $p(\mathbf{x} \mid T_i)$ is the probability distribution function of $\mathbf{x}$ when the TCP algorithm is $T_i$. We can further write equation (1) in the following form.

$$P(T_i \mid \mathbf{x}) \propto p(\mathbf{x} \mid T_i)\, P(T_i) \tag{2}$$

$P(T_i)$ is the proportion of TCP algorithm $T_i$ in the training set. For a Bayesian generative model, the training set and testing set are assumed to be sampled from the same probability distribution[15], so $P(T_i)$ is a constant here. $p(\mathbf{x})$ is the normalization parameter, which represents the distribution probability of the input vector $\mathbf{x}$ in the entire input set $X$. As equation (2) is a probability distribution function over the TCP algorithm $T_i$, the probability of $\mathbf{x}$ can be considered as a constant. Thus $P(T_i \mid \mathbf{x})$ can be written as the right hand side of equation (2). The problem is transformed into finding the maximum probability over all TCP algorithms for each input vector $\mathbf{x}$. The TCP algorithm with the maximum probability is considered as the sender’s TCP variant under the current input conditions. The optimization function is expressed as shown in (3).

$$\hat{T} = \arg\max_{T_i}\; p(\mathbf{x} \mid T_i)\, P(T_i) \tag{3}$$

$p(\mathbf{x} \mid T_i)$ is the conditional probability density function of the input $\mathbf{x}$ under TCP algorithm $T_i$, and $P(T_i)$ is the discrete probability mass function of the TCP algorithm. The important part of TCP identification is to construct the probability function.
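
The MAP decision rule in (3) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the likelihood of each algorithm is a one-dimensional Gaussian over a single feature; the function names, class parameters and priors below are hypothetical and are not taken from the paper.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian, used here as a stand-in likelihood p(x | T_i)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def map_identify(x, likelihood_params, priors):
    """Return the label maximising p(x | T_i) * P(T_i), together with all scores."""
    scores = {
        label: gaussian_pdf(x, mean, std) * priors[label]
        for label, (mean, std) in likelihood_params.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical per-algorithm likelihoods over one scalar feature.
params = {"Cubic": (0.7, 0.05), "Reno": (0.5, 0.05)}
priors = {"Cubic": 0.5, "Reno": 0.5}  # class proportions in the training set

label, scores = map_identify(0.68, params, priors)
print(label)
```

The normalization term $p(\mathbf{x})$ is never computed, because it is the same for every candidate algorithm and does not change the arg max.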

Previous methods infer the back-off factor $\beta$ to identify TCP algorithms[13]. For example, $\beta$ is 0.7 for Cubic[2] and 0.5 for Reno. Referring to the traditional Additive Increase Multiplicative Decrease (AIMD) mechanism, the core function is (4).

$$W_{t+1} = \begin{cases} W_t + \alpha, & \text{if a normal ACK is received} \\ \beta\, W_t, & \text{if a loss or timeout has occurred} \end{cases} \tag{4}$$

$W_t$ is the value of CWND at time $t$. $\alpha$ and $\beta$ are constants: $\alpha$ is used to increase the CWND and $\beta$ is used to decrease the CWND quickly. An ACK indicator selects the branch in (4): it marks either a normal ACK or a loss or timeout in the last transmission. Two methods are used to infer $\beta$: (i) using packet loss events; (ii) exploiting timeout events. By comparing the CWND value before and after the loss or timeout event occurs, $\beta$ can be inferred. However, this method does not work well even for traditional delay-based algorithms. The biggest difference is that their back-off factors change over time. Referring to Vegas[3], the common expressions of $\alpha$ and $\beta$ in delay-based algorithms are shown as (5).
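
Inferring the back-off factor by comparing the CWND before and after a loss event can be sketched as follows. This is a minimal illustration that assumes a CWND trace and the time steps of the loss events are already known; the function names and the synthetic traces are hypothetical, and in a passive setting the CWND would first have to be estimated from captured packets.

```python
def simulate_aimd(w0, alpha, beta, loss_steps, n_steps):
    """Generate a CWND trace: add alpha per step, multiply by beta on a loss step."""
    trace = [w0]
    for t in range(1, n_steps):
        prev = trace[-1]
        trace.append(prev * beta if t in loss_steps else prev + alpha)
    return trace

def infer_backoff(trace, loss_steps):
    """Estimate beta as the mean ratio of CWND right after a loss to CWND just before it."""
    ratios = [trace[t] / trace[t - 1] for t in sorted(loss_steps) if 0 < t < len(trace)]
    if not ratios:
        raise ValueError("no usable loss events in the trace")
    return sum(ratios) / len(ratios)

losses = {20, 45, 70}  # hypothetical loss events
trace_a = simulate_aimd(w0=10.0, alpha=1.0, beta=0.7, loss_steps=losses, n_steps=100)
trace_b = simulate_aimd(w0=10.0, alpha=1.0, beta=0.5, loss_steps=losses, n_steps=100)

print(round(infer_backoff(trace_a, losses), 3))
print(round(infer_backoff(trace_b, losses), 3))
```

The estimate is only meaningful when the back-off factor is constant, which is why the approach breaks down for algorithms whose control factors vary over time.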


The quantities in (5) are the total bytes sent in the last time step, the minimum round trip time (RTT) observed in the period, the expected transmission rate and the actual transmission rate. When the actual transmission rate is less than the expected rate, the CWND increases; otherwise it decreases. Several of the remaining parameters are constants, while one is decided by a function of the recently observed RTT. More factors that change over time are thus introduced to control the CWND. According to [6], the window growth function is introduced to construct the probability function and to realize the identification of delay-based algorithms.

Newly proposed TCP algorithms, which are based on different theories and models, are more difficult to identify. The reason is that they have more control factors that change over time, and these factors are not completely independent of each other[14, 5, 4], which makes the construction of the probability function difficult. The identification model needs to deal with multiple control factors in order to identify multiple algorithms. The differences and relations among these factors have to be considered, which is not easy to address with traditional methods. Machine learning is able to construct such complex functions and is therefore used in our method. We give a further introduction in Sec. III-B.

III System Model

In this section, we introduce the system model of our proposed method, as shown in Fig. 1. To extract the features used to train the network, the data passively captured from the user side is first pre-processed. Next, the pre-processed data is sent to the data processing block so that it can be aligned in the time domain. After that, the data is fed into the well-tuned identification model to identify its TCP algorithm. Feature selection and the learning-based identification model are introduced in detail below.

Fig. 1: The system model of our proposed method, including the passive data collection used to extract the features and the well-tuned neural-network-based identification model that can accurately identify TCP variants.

III-A Features Selection

Fig. 2: Three features of BBR and Cubic in wired networks. The bottleneck buffer is set to 2 BDP, the bandwidth is 10 Mbps and the RTT is 50 ms.
Fig. 3: The selected features of BBR and Cubic in wireless networks. The number of RBs is 6 and the RTT is set to 50 ms.

For the convenience of actual deployment, the data is collected passively, which has minimal effect on the current traffic. The collection points differ between network scenarios: the data is collected at the receiver in wired networks, and at the UE and the base station in wireless networks.

III-A1 Features in wired networks

Higher accuracy can be achieved if the nodes used for data collection are closer to the sender[13]. However, a server usually runs several service VMs, each of which may use a different TCP algorithm, which means there are at least two TCP flows on the links from the server to the closest router. One TCP flow may suffer interference from another. We cannot change the configuration of the router to optimize for a single TCP flow, as that would be unfair to other flows. In contrast, reconfiguring the user side is easier and more practical, so the data is collected passively at the receiver in wired networks.

The features extracted from raw TCP packets are passively collected at the receiver side using the tcpdump tool, and include inflight, throughput and one-way RTT, as shown in Fig. 2. To collect the one-way RTT, the TCP Timestamps option is enabled at both the server and the user side. The throughput is calculated as (6):

$$\text{Throughput} = \frac{\text{PacketSize}}{\text{TimeDif}} \tag{6}$$

where PacketSize is the size of the received packet and TimeDif is the difference in arrival time between two adjacent received packets. The features of each TCP algorithm have unique behaviors and relationships, which are exploited to train our identification model.
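
The per-packet throughput in (6) can be computed from a capture as in the sketch below. It assumes the arrival timestamps (in seconds) and packet sizes (in bytes) have already been parsed from the tcpdump output; the function name and the sample values are hypothetical.

```python
def per_packet_throughput(arrival_times, packet_sizes):
    """Return throughput samples in bytes per second for packets 1..n-1.

    Each sample divides a packet's size by the gap since the previous arrival.
    Packets with a non-positive gap (identical timestamps) are skipped.
    """
    samples = []
    for i in range(1, len(arrival_times)):
        time_dif = arrival_times[i] - arrival_times[i - 1]
        if time_dif > 0:
            samples.append(packet_sizes[i] / time_dif)
    return samples

# Hypothetical capture: arrival times in seconds, sizes in bytes.
times = [0.000, 0.001, 0.003, 0.004]
sizes = [1500, 1500, 1500, 500]
samples = per_packet_throughput(times, sizes)
print(samples)
```

Because the samples are spaced by packet arrivals rather than by a fixed clock, they still need to be aligned in time before being fed to a sequence model.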

III-A2 Features in wireless networks

Unlike in wired networks, in wireless networks the state information of the base station (BS) is collected in addition to the raw TCP packets from the user equipment (UE). The performance of TCP degrades seriously over wireless links, and the state information from the BS reflects, to some extent, the conditions experienced by TCP in the wireless channel. The accuracy of identification will therefore be improved if it is combined with the state information from the BS.

The features extracted from the UE are throughput and one-way RTT (the inflight behaves similarly to the RLC buffer size, so it is not considered in wireless networks). The state information from the BS includes the buffer size of the radio link control (RLC) layer and the delay of the packet data convergence protocol (PDCP) layer. The size of the RLC buffer reflects the overall network congestion. The delay of the PDCP layer represents the queuing delay of the air interface and the delay caused by re-transmissions in the wireless channel, such as automatic repeat request (ARQ) in RLC acknowledged mode (AM). Compared with wired networks, the wireless channel has a high bit error rate, which causes more fluctuation in the features’ behaviors, as shown in Fig. 3, and poses a challenge to TCP identification.

III-B Learning-Based Identification Model

Fig. 4: The network model of LSTM
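
| | DNN | CNN | LSTM |
| --- | --- | --- | --- |
| Input Layer | | | |
| Layer 1 | Dense 1024 | Conv2D 5x5x128 (s2, PReLU) | LSTM 600 |
| Layer 2 | Dense 512 | Conv2D 5x5x64 (s2, PReLU) | LSTM 600 |
| Layer 3 | Dense 256 | Conv2D 3x3x32 (s2, PReLU) | Dense 256 |
| Layer 4 | Dense 128 | Dense 128 | Dense 128 |
| Output Layer | Dense 6 (None, softmax) | Dense 6 (None, softmax) | Dense 6 (None, softmax) |
| Accuracy | 0.968 | 0.942 | 0.982 |

TABLE I: An Overview of Network Configurations and results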

Machine learning has the ability to extract features and has been widely used in image classification and natural language processing. We combine machine learning with TCP identification as shown in Fig. 1. The data collected from the user side is fed into the neural network to train the machine learning model for the identification of TCP algorithms.

Three typical machine learning models are exploited and compared in wireless networks, including a dense neural network (DNN), a convolutional neural network (CNN) and an LSTM. We use the same dataset to train and test them (to accelerate training, we install Tensorflow on our server with an Intel(R) Xeon(R) CPU E5-3680 and an NVIDIA Tesla P100 GPU), and all network models have been well tuned to achieve their best accuracy. The network configurations and results are listed in Table I. The 4-layer LSTM network has the highest identification accuracy, and its model is shown in Fig. 4.

The reason for the weaker performance of the DNN is that the output of a fully connected DNN is decided only by the current state, which means that timing information is not considered. Although the CNN has a strong ability to extract features, it does not consider the features at different times either. The LSTM model combines the information from the current state with that of previous states to produce the output, so it is well suited to identification because it takes the time series into account. In our well-tuned model, the number of previous states considered is 131. The model is also highly scalable: when a new TCP algorithm is added, the model can be fine-tuned using most of the existing parameters except for the output softmax layer, which means the model converges quickly. The training properties of the LSTM used in this paper are introduced below, including the input and output, training dataset, data processing, loss function, and other implementation details.
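
One way a feature time series might be cut into fixed-length inputs for a sequence model is sketched below. The sketch assumes each sample is a window of 131 consecutive time steps, matching the number of previous states quoted above; the helper name `make_windows`, the stride parameter and the toy series are hypothetical, and the paper does not state how its windows are formed.

```python
def make_windows(series, window=131, stride=1):
    """Split a list of per-step feature vectors into overlapping windows.

    Returns a list of windows, each a list of `window` consecutive feature vectors.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [series[s:s + window] for s in range(0, len(series) - window + 1, stride)]

# Toy wired-network series: [PacketSize, TimeDif, OnewayDelay, Inflight] per step.
series = [[1500.0, 0.005, 0.025, float(step)] for step in range(200)]
windows = make_windows(series, window=131, stride=10)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each window has shape (time steps, features), which is the layout recurrent layers conventionally expect for one sample.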

Input and output. The input data is a time series of features, and the feature set differs between wired and wireless networks. In wired networks, the input consists of PacketSize, TimeDif, OnewayDelay and Inflight. In wireless networks, PacketSize, TimeDif, OnewayDelay, PDCPDelay, and RLCBufferSize are included. The output is the TCP algorithm variant of the input data.

Training dataset. The dataset is generated using mininet and ns-3 which will be introduced in Sec IV.A. We use 16800 sets of data to train wired identification model and 4320 sets of data to train wireless model.

Data process. The data obtained directly from the networks is not aligned in the time domain, which would affect the training and testing of the LSTM model. We re-sample the data using linear interpolation, with the sampling interval set to 5. The exponentially weighted moving average method is then used to reduce the impact of the interpolation operation and the discontinuity of the input data.
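
The two pre-processing steps, linear-interpolation re-sampling followed by exponentially weighted moving average smoothing, can be sketched as follows. This is a minimal illustration: the function names, the smoothing factor `alpha=0.5` and the sample data are assumptions, and the interval simply uses whatever time unit the timestamps are in.

```python
def resample_linear(times, values, interval):
    """Linearly interpolate samples onto a uniform grid starting at times[0].

    `times` must be strictly increasing; `interval` uses the same unit as `times`.
    """
    if len(times) != len(values) or len(times) < 2 or interval <= 0:
        raise ValueError("need >= 2 equal-length samples and a positive interval")
    grid, resampled = [], []
    j, k = 0, 0
    while times[0] + k * interval <= times[-1]:
        t = times[0] + k * interval
        # Move to the segment [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        grid.append(t)
        resampled.append(values[j] + frac * (values[j + 1] - values[j]))
        k += 1
    return grid, resampled

def ewma(values, alpha):
    """Exponentially weighted moving average: s[k] = alpha*x[k] + (1-alpha)*s[k-1]."""
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed

raw_times = [0, 3, 10, 12]            # irregular arrival times (hypothetical)
raw_values = [0.0, 3.0, 10.0, 8.0]    # one feature observed at those times
grid, uniform = resample_linear(raw_times, raw_values, interval=5)
smooth = ewma(uniform, alpha=0.5)
print(grid, uniform, smooth)
```

Applying the same grid to every feature gives time-aligned rows that can be stacked into one input vector per step.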

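Loss function. The loss function we use is the cross entropy, calculated as shown in equation (7), where $\mathbf{x}$ is the input data of the network, $\mathbf{y}$ is the label, and $f_i(\mathbf{x})$ denotes the softmax output of the network for class $i$.

$$L(\mathbf{x}, \mathbf{y}) = -\sum_{i} y_i \log f_i(\mathbf{x}) \tag{7}$$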
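
For a one-hot label, the cross entropy reduces to the negative log of the probability assigned to the true class. The sketch below shows this for a 6-class softmax output; the logits are hypothetical values chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label_index, eps=1e-12):
    """Cross entropy for a one-hot label: -log of the true class probability."""
    return -math.log(max(probs[label_index], eps))

logits = [2.0, 0.5, 0.1, -1.0, 0.0, 0.3]  # hypothetical scores for 6 classes
probs = softmax(logits)
print(round(cross_entropy(probs, 0), 4))
```

The `eps` floor only guards against taking the log of zero and does not affect well-behaved outputs.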


Other implementation details. The LSTM model is trained for a total of 500 epochs with a batch size of 32. We use the Adam optimizer and reduce its learning rate from the initial value by a factor of ten at two points during the total epochs.
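
A step-wise learning-rate reduction of this kind can be sketched as follows. The initial learning rate and the points at which the rate drops are not given in the text here, so the values `1e-3`, 50% and 75% below are placeholders chosen only for illustration; only the 500 epochs and the factor of ten come from the description above.

```python
def step_decay_lr(epoch, total_epochs=500, initial_lr=1e-3,
                  milestones=(0.5, 0.75), factor=0.1):
    """Learning rate at a 0-based epoch, multiplied by `factor` at each milestone passed.

    `initial_lr` and `milestones` are placeholder values, not the paper's settings.
    """
    lr = initial_lr
    for fraction in milestones:
        if epoch >= int(fraction * total_epochs):
            lr *= factor
    return lr

for epoch in (0, 249, 250, 375, 499):
    print(epoch, step_decay_lr(epoch))
```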

IV Experiments and Evaluations

In this section, the experiment settings for the wired and wireless network topologies and the evaluation results are given in detail.

Fig. 5: The topology of wired and wireless networks.

IV-A Networks Topology and Parameters Setup

IV-A1 Wired Network Topology

The wired network topology is constructed using mininet[16], as shown in Fig. 5 (a). The server connects to a router, and Netem is used to set the link delay from the server to the receiver. The rate limit and buffer size at the router are configured with the Token-Bucket Filter[17]. Tcpdump is used to passively collect data at the receiver.

The TCP algorithms used to generate the training dataset include Cubic, NewReno, Hybla, Vegas and BBR, which are commonly used in current networks (the Linux kernel used in the simulation is 4.13, which contains the TCP congestion control algorithms we need). The maximum bandwidth is 5 to 10 Mbps, and the link RTT ranges from 40 to 100 ms. The size of the bandwidth delay product (BDP) is 200 to 1000 packets (the maximum transmission unit is set to 1500 bytes). We repeat each simulation 10 times for 60 s and generate 16,800 sets of data for training and 900 sets of data for testing.

IV-A2 Wireless Network Topology

ns-3[18] is used to construct the wireless network topology, as shown in Fig. 5 (b). Compared with the wired network, Westwood is added because of its good performance in wireless channels. Implementations of Cubic[19] and BBR[20] have been added to the original ns-3. RLC AM mode is used in actual deployments, so it is also used in the simulation. Because the maximum size of the RLC buffer is unlimited in the original ns-3, we modified the AM implementation to support the drop-tail mechanism[21]. The tracing system of ns-3 is turned on to collect the TCP packets at the receiver side. The PDCP layer delay and the RLC buffer size are extracted from log files.

In terms of bandwidth, the number of resource blocks is set to 6, 15 and 25, respectively. The RTT is 20 to 100 ms, and the delay of the link between the server and the closest router is adjusted accordingly. The delay of the air interface between the BS and the UE is taken as 3 ms. The maximum RLC buffer size is set to 100 to 700 packets. The error model is MiErrorModel and the fading model is EVA60kmph. Bulksend is used to generate data at the server side. Finally, 4320 sets of data are generated for training and 1440 sets for testing.

IV-B Performance Evaluations

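| Actual / Predicted | BBR | Cubic | NewReno | Hybla | Vegas |
| --- | --- | --- | --- | --- | --- |
| BBR | 179 | 1 | 0 | 0 | 0 |
| Cubic | 0 | 179 | 1 | 0 | 0 |
| NewReno | 0 | 0 | 180 | 0 | 0 |
| Hybla | 0 | 0 | 0 | 180 | 0 |
| Vegas | 0 | 0 | 0 | 0 | 180 |

TABLE II: The confusion matrix in wired networks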

| | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| BBR | 1.000 | 0.994 | 0.998 | 180 |
| Cubic | 0.994 | 0.994 | 0.994 | 180 |
| NewReno | 0.995 | 1.000 | 0.997 | 180 |
| Hybla | 1.000 | 1.000 | 1.000 | 180 |
| Vegas | 1.000 | 1.000 | 1.000 | 180 |
| Average/Total | 0.998 | 0.998 | 0.998 | 900 |

Accuracy: 0.998

TABLE III: The identification results in wired networks
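
| Actual / Predicted | BBR | Cubic | NewReno | Hybla | Vegas | Westwood |
| --- | --- | --- | --- | --- | --- | --- |
| BBR | 240 | 0 | 0 | 0 | 0 | 0 |
| Cubic | 0 | 227 | 13 | 0 | 0 | 0 |
| NewReno | 0 | 0 | 238 | 0 | 2 | 0 |
| Hybla | 0 | 2 | 9 | 229 | 0 | 0 |
| Vegas | 0 | 0 | 0 | 0 | 240 | 0 |
| Westwood | 0 | 0 | 0 | 0 | 0 | 240 |

TABLE IV: The confusion matrix in wireless networks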

| | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| BBR | 1.000 | 1.000 | 1.000 | 240 |
| Cubic | 0.991 | 0.946 | 0.968 | 240 |
| NewReno | 0.915 | 0.992 | 0.952 | 240 |
| Hybla | 1.000 | 0.954 | 0.976 | 240 |
| Vegas | 0.992 | 1.000 | 0.996 | 240 |
| Westwood | 1.000 | 1.000 | 1.000 | 240 |
| Average/Total | 0.983 | 0.982 | 0.982 | 1440 |

Accuracy: 0.982

TABLE V: The identification results in wireless networks

The identification results of our proposed method are analyzed in this section. In terms of accuracy, we mainly use the confusion matrix and the criteria of precision, recall and F1-score. The identification results in wired networks are listed in Table II and Table III, from which we can see that almost all TCP algorithms are accurately identified and the overall accuracy is 99.8%.

Table IV and Table V show that the model identifies BBR, Vegas and Westwood with high accuracy in wireless networks. The identification accuracy (recall) of the other three algorithms is no less than 94.6%, which also demonstrates that our proposed method performs well in wireless networks.
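
The per-class precision, recall and F1-score follow directly from the confusion matrix. The sketch below recomputes them from the wireless confusion matrix in Table IV, assuming rows are the actual algorithms and columns are the predicted ones; the function names are illustrative.

```python
LABELS = ["BBR", "Cubic", "NewReno", "Hybla", "Vegas", "Westwood"]
# Rows: actual algorithm, columns: predicted algorithm (values from Table IV).
CONFUSION = [
    [240, 0, 0, 0, 0, 0],
    [0, 227, 13, 0, 0, 0],
    [0, 0, 238, 0, 2, 0],
    [0, 2, 9, 229, 0, 0],
    [0, 0, 0, 0, 240, 0],
    [0, 0, 0, 0, 0, 240],
]

def per_class_metrics(matrix):
    """Return {class_index: (precision, recall, f1)} for a square confusion matrix."""
    n = len(matrix)
    metrics = {}
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum
        actual = sum(matrix[k])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[k] = (precision, recall, f1)
    return metrics

def accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total

metrics = per_class_metrics(CONFUSION)
for k, (p, r, f1) in metrics.items():
    print(f"{LABELS[k]:9s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"accuracy={accuracy(CONFUSION):.3f}")
```

The low NewReno precision comes from the 13 Cubic and 9 Hybla flows that are misclassified as NewReno, which is visible in the NewReno column of the matrix.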

Fig. 6: The impact of selected features on identification accuracy. (a) shows the impact of input features on the accuracy of each TCP algorithm in wired networks and (b) shows their impact in wireless LTE networks.

The impact of the selected features on the identification accuracy of each TCP algorithm is analyzed in Fig. 6, which provides a reference for selecting appropriate inputs in future optimization. Fig. 6 (a) shows that all features achieve high accuracy on BBR. However, their performance differs on the other algorithms, especially for throughput. The reason is that, with loss-based algorithms, the throughput is steady as long as no loss occurs. Fig. 6 (b) shows that the accuracy of the selected features fluctuates widely, although the overall accuracy is above 94.6%; this is caused by the error-prone wireless channel. The accuracy of throughput is higher in wireless networks than in wired networks. In contrast, the RTT performs worse than in wired networks.

V Conclusion and Future Work

In this paper, a machine learning based passive TCP identification method is proposed. The proposed method achieves high identification accuracy for loss-based, delay-based and even newly proposed algorithms, with high scalability. Our method can be used in both wired and wireless networks, which covers most scenarios in current networks. An identification accuracy of at least 98.2% is achieved in both network scenarios.

This work will be extended in the following aspects: 1) combining the identification method with an actual system to test its performance, including optimizing configurations after identifying the TCP algorithm variant; 2) updating the identification model to handle competition among multiple flows.


This work was supported by the National Natural Science Foundation of China (NSFC) Grants under No. 61701293 and No. 61871262, the National Science and Technology Major Project Grants under No. 2018ZX03001009, the Huawei Innovation Research Program (HIRP), and research funds from Shanghai Institute for Advanced Communication and Data Science (SICS).


  • [1] S. Floyd, T. Henderson, and A. Gurtov, “The NewReno modification to TCP’s fast recovery algorithm,” Tech. Rep., 2004.
  • [2] S. Ha, I. Rhee, and L. Xu, “CUBIC: a new TCP-friendly high-speed TCP variant,” ACM SIGOPS operating systems review, vol. 42, no. 5, pp. 64–74, 2008.
  • [3] L. S. Brakmo and L. L. Peterson, “TCP vegas: End to end congestion avoidance on a global internet,” IEEE Journal on selected Areas in communications, vol. 13, no. 8, pp. 1465–1480, 1995.
  • [4] K. Winstein, A. Sivaraman, and H. Balakrishnan, “Stochastic forecasts achieve high throughput and low delay over cellular networks,” in Proc. NSDI’13, 2013, pp. 459–471.
  • [5] Y. Zaki, T. Pötsch, J. Chen, L. Subramanian, and C. Görg, “Adaptive congestion control for unpredictable cellular networks,” in ACM SIGCOMM Computer Communication Review, vol. 45, no. 4.   ACM, 2015, pp. 509–522.
  • [6] P. Yang, J. Shao, W. Luo, L. Xu, J. Deogun, and Y. Lu, “TCP congestion avoidance algorithm identification,” IEEE/ACM Transactions On Networking, vol. 22, no. 4, pp. 1311–1324, 2014.
  • [7] J. Padhye and S. Floyd, “On inferring TCP behavior,” ACM SIGCOMM Computer Communication Review, vol. 31, no. 4, pp. 287–298, 2001.
  • [8] A. Medina, M. Allman, and S. Floyd, “Measuring the evolution of transport protocols in the internet,” ACM SIGCOMM Computer Communication Review, vol. 35, no. 2, pp. 37–52, 2005.
  • [9] S. Jaiswal, G. Iannaccone, C. Diot, J. Kurose, and D. Towsley, “Inferring TCP connection characteristics through passive measurements,” in Proc. INFOCOM’04, vol. 3, 2004, pp. 1582–1592.
  • [10] S. Rewaskar, J. Kaur, and F. D. Smith, “A performance study of loss detection/recovery in real-world TCP implementations,” in Proc. ICNP’07, 2007, pp. 256–265.
  • [11] F. Qian, A. Gerber, Z. M. Mao, S. Sen, O. Spatscheck, and W. Willinger, “TCP revisited: a fresh look at TCP in the wild,” in Proc. SIGCOMM’09, 2009, pp. 76–89.
  • [12] J. Oshio, S. Ata, and I. Oka, “Identification of different TCP versions based on cluster analysis,” in Proc. ICCCN’09, 2009, pp. 1–6.
  • [13] D. H. Hagos, P. E. Engelstad, A. Yazidi, and Ø. Kure, “General TCP State Inference Model from Passive Measurements Using Machine Learning Techniques,” IEEE Access, 2018.
  • [14] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson, “BBR: Congestion-based congestion control,” Queue, vol. 14, no. 5, p. 50, 2016.
  • [15] C. M. Bishop, Pattern recognition and machine learning. Springer, 2006.
  • [16] Mininet. [Online]. Available:
  • [17] D. Scholz, B. Jaeger, L. Schwaighofer, D. Raumer, F. Geyer, and G. Carle, “Towards a Deeper Understanding of TCP BBR Congestion Control.”
  • [18] Network Simulator 3. [Online]. Available:
  • [19] B. Levasseur, M. Claypool, and R. Kinicki, “A TCP CUBIC implementation in ns-3,” in Proc. WNS3’14, 2014, p. 3.
  • [20] M. Claypool, J. W. Chung, and F. Li, “BBR’: an implementation of bottleneck bandwidth and round-trip time congestion control for ns-3.” in Proc. WNS3’18, 2018, pp. 1–8.
  • [21] R. Robert, E. Atxutegi, A. Arvidsson, F. Liberal, A. Brunstrom, and K.-J. Grinnemo, “Behaviour of common TCP variants over LTE,” in Proc. GLOBECOM’16, 2016, pp. 1–7.