I Introduction
The Domain Name System (DNS) is one of the most important technologies of the Internet: it converts domain names into IP addresses. Without this service, the Internet would not be deployed as widely as it is now. DNS messages are normally carried over UDP and, unlike with TCP, it is easy to forge the source address of UDP packets. As a result, DNS requests with a fake source address can easily be sent to a DNS server. In principle, any DNS server can answer any domain name resolution request; there are no protocol requirements that limit or filter request messages from client nodes. When DNS was invented, malicious activity utilizing DNS servers as packet reflectors was not widespread; however, as the Internet grew, attackers started to exploit this open operating policy to send traffic to victim nodes by forging DNS message source addresses. To prevent this activity, recent DNS servers are configured to answer requests originating only from specific client nodes, typically filtered by source IP address. Unfortunately, there are still more than a few improperly configured DNS servers in the wild; these are called open resolvers (see the DNS Scanning Project: https://dnsscan.shadowserver.org). The DNS protocol thus remains one of the major attack vectors [2][1][4].

In this paper, we propose a method of classifying a DNS server, according to whether or not it is used as a reflector, by monitoring the incoming DNS messages. We collect a series of DNS packets sent from a DNS server and build a feature matrix of the server, assuming that a reflector shows a packet sequence pattern different from that of a normal DNS server. The preliminary results show that our method can classify reflectors with an F1 score greater than 0.9 when the test and training data are generated within the same day. The trained model can also classify the data of the same day that was not used in the training and testing phases with an F1 score greater than 0.7.
II DNS Server Feature Matrix
The basic idea behind this proposal originates from [5], a method designed to detect malicious nodes by investigating a series of TCP SYN packets sent from those nodes. TCP SYN packets are grouped by the source IP address of the TCP streams, and a feature matrix is generated as an image. In that study, it was assumed that the images have different shapes depending on the activity of the malicious host, for example, scanning or DoS. The images generated from SYN packets were then used as training data for a deep learning model based on a CNN.
We follow a similar process in our proposal. The difference is that we use DNS response packets received from servers as an input for building the feature matrix.
To apply our method, we first create training data. To split the DNS messages into good messages and suspicious messages, we use the mechanism proposed in [3]. We monitor DNS messages at the boundary of an organization’s network and check all request and response messages. If a DNS server is being used as a reflector and is sending unintended response messages, we will not see any matching request messages sent from within the organization.
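As an illustration only, and not the exact implementation of [3], the following sketch matches observed responses against previously observed requests; the matching key (client and server addresses, client port, and DNS transaction ID) and the use of scapy are our assumptions for the example.

```python
from scapy.all import rdpcap, IP, UDP, DNS

def label_responses(pcap_path):
    """Label DNS responses as 'good' (a matching request was seen at the
    monitoring point) or 'suspicious' (no matching request observed)."""
    packets = rdpcap(pcap_path)          # capture taken at the network boundary
    seen_requests = set()
    labeled = []                          # (response packet, label) pairs

    for pkt in packets:
        if not (IP in pkt and UDP in pkt and DNS in pkt):
            continue
        dns = pkt[DNS]
        if dns.qr == 0:                   # query from a client
            key = (pkt[IP].src, pkt[IP].dst, pkt[UDP].sport, dns.id)
            seen_requests.add(key)
        else:                             # response: look up the matching query
            key = (pkt[IP].dst, pkt[IP].src, pkt[UDP].dport, dns.id)
            label = "good" if key in seen_requests else "suspicious"
            labeled.append((pkt, label))
    return labeled
```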
The values we used to generate a feature matrix are shown in TABLE I.
| Type | Description |
|---|---|
| Timestamp | Timestamp of a packet |
| Port # | Source port # of a packet |
| Size | Size of a DNS message |
| OPCODE field | Indicating the DNS message type |
| AA field | Indicating Authoritative Answer or not |
| TC field | Indicating if a packet is truncated |
| RD field | Indicating if a recursive query is desired |
| RA field | Indicating if recursive resolution is available |
| Z field | Reserved field; should be 0 |
| RCODE field | Indicating the result code |
| QDCOUNT | # of query items |
| ANCOUNT | # of answer records |
| NSCOUNT | # of name server (authority) records |
| ARCOUNT | # of additional records |
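As a concrete sketch, the fourteen values of TABLE I could be extracted from a captured response packet as follows. This is an illustrative helper, not code from the paper; the field names follow scapy's DNS layer.

```python
from scapy.all import IP, UDP, DNS

def packet_features(pkt):
    """Return the 14 per-packet values of TABLE I for one DNS response."""
    dns = pkt[DNS]
    return [
        float(pkt.time),        # Timestamp
        pkt[UDP].sport,         # Source port #
        len(bytes(dns)),        # Size of the DNS message
        dns.opcode,             # OPCODE
        dns.aa, dns.tc,         # AA, TC flags
        dns.rd, dns.ra,         # RD, RA flags
        dns.z,                  # Z (reserved)
        dns.rcode,              # RCODE
        dns.qdcount,            # QDCOUNT
        dns.ancount,            # ANCOUNT
        dns.nscount,            # NSCOUNT
        dns.arcount,            # ARCOUNT
    ]
```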
The captured messages are grouped by source IP address (in this case, the DNS server’s IP address), sorted by timestamp, and divided into groups of 100 packets. Fig. 1 shows an example of a DNS server feature matrix.
The order of the rows is the same as the order presented in TABLE I, and the values are normalized per row. Each column represents one DNS response message; because a feature matrix is created for every 100 packets, the number of columns is 100.
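A minimal sketch of this construction is shown below, assuming per-packet feature vectors such as those returned by the hypothetical packet_features() above and a simple row-wise min-max normalization (the paper only states that values are normalized per row, so the exact scaling is an assumption).

```python
import numpy as np
from collections import defaultdict

MATRIX_WIDTH = 100   # one column per DNS response message
NUM_FEATURES = 14    # rows, in the order of TABLE I

def build_matrices(records):
    """records: iterable of (server_ip, feature_vector) pairs."""
    per_server = defaultdict(list)
    for server_ip, feats in records:
        per_server[server_ip].append(feats)        # group by server address

    matrices = []
    for server_ip, rows in per_server.items():
        rows.sort(key=lambda f: f[0])               # sort by timestamp
        for i in range(0, len(rows) - MATRIX_WIDTH + 1, MATRIX_WIDTH):
            m = np.array(rows[i:i + MATRIX_WIDTH], dtype=float).T  # 14 x 100
            mins = m.min(axis=1, keepdims=True)
            ranges = np.ptp(m, axis=1, keepdims=True)
            ranges[ranges == 0] = 1.0               # constant rows stay at 0
            matrices.append((server_ip, (m - mins) / ranges))
    return matrices
```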
III Learning with SVM
The feature matrix image shown in Fig. 1 is based on messages sent from a suspicious DNS server. This particular server kept sending unsolicited DNS response messages, and we can guess its behavior by observing the image. The smoothly changing timestamp row means that messages are sent periodically. Most packets have the same shape except for the source port number. Rows that are almost entirely white or black indicate that, in most cases, the corresponding field takes the same value.
Fig. 2 shows a feature matrix of a good DNS server.
In contrast to the case shown in Fig. 1, the fields indicating the number of resource records (such as ARCOUNT) take several different values across the response packets. This is plausible because the contents of DNS request messages sent to a specific DNS server vary from client to client; the responses may also vary, depending on the request messages.
The datasets used with the SVM consist of single-day traffic of a certain research network, captured on 24th August 2019 and 25th August 2019. The sizes of the datasets are listed in TABLE II.
Date | # of Good / Bad DNS pkts | # of Good / Bad matrices |
---|---|---|
24th Aug. | 33,824,531 / 2,863,321 | 323,269 / 28,291 |
25th Aug. | 30,238,481 / 1,148,935 | 291,730 / 6,105 |
The hyper-parameters, selected by grid search, were penalty (C) = 10, gamma = 0.01, and an RBF kernel. The model was trained and tested with 20,000 randomly selected good matrices and 80% of the bad matrices for each day; for example, for the dataset of the 24th, we randomly sampled 20,000 of the good matrices and 80% of the bad matrices. The ratio of training data to test data was 0.8 to 0.2.
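The training step can be sketched with scikit-learn as follows; this is an illustrative setup, not the paper's code. The matrices are flattened into vectors, labels are 0 (good) and 1 (bad), and the parameter grid is an assumption that happens to contain the selected values C = 10 and gamma = 0.01.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

def train_reflector_classifier(matrices, labels):
    """matrices: list of 14x100 feature matrices; labels: 0 = good, 1 = bad."""
    X = np.array([m.reshape(-1) for m in matrices])   # flatten 14x100 -> 1400
    y = np.array(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)   # 80/20 split

    grid = GridSearchCV(
        SVC(kernel="rbf"),
        param_grid={"C": [1, 10, 100], "gamma": [0.001, 0.01, 0.1]},
        cv=5, n_jobs=-1)
    grid.fit(X_train, y_train)

    print(grid.best_params_)                            # e.g. {'C': 10, 'gamma': 0.01}
    print(classification_report(y_test, grid.predict(X_test),
                                target_names=["Good", "Bad"]))
    return grid.best_estimator_
```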
TABLE III and IV present the classification results of sampled data for each day. As we can see from the tables, as long as we focus on the sampled data, the classification accuracy is high enough.
TABLE III: Classification results for the sampled data of 24th Aug.

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Good | 1.00 | 1.00 | 1.00 | 3,987 |
| Bad | 1.00 | 1.00 | 1.00 | 4,540 |
| Accuracy | | | 1.00 | 8,527 |
| Macro Avg. | 1.00 | 1.00 | 1.00 | 8,527 |
| Weighted Avg. | 1.00 | 1.00 | 1.00 | 8,527 |
TABLE IV: Classification results for the sampled data of 25th Aug.

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Good | 1.00 | 1.00 | 1.00 | 3,993 |
| Bad | 0.98 | 1.00 | 0.99 | 984 |
| Accuracy | | | 1.00 | 4,977 |
| Macro Avg. | 0.99 | 1.00 | 0.99 | 4,977 |
| Weighted Avg. | 1.00 | 1.00 | 1.00 | 4,977 |
Next, we evaluated the rest of the data in the datasets not used in the training phase for each day. The results are shown in TABLE V and VI.
TABLE V: Classification results for the remaining data of 24th Aug. (not used for training or testing).

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Good | 1.00 | 1.00 | 1.00 | 303,269 |
| Bad | 0.85 | 1.00 | 0.92 | 5,659 |
| Accuracy | | | 1.00 | 308,928 |
| Macro Avg. | 0.92 | 1.00 | 0.96 | 308,928 |
| Weighted Avg. | 1.00 | 1.00 | 1.00 | 308,928 |
TABLE VI: Classification results for the remaining data of 25th Aug. (not used for training or testing).

| | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Good | 1.00 | 1.00 | 1.00 | 271,730 |
| Bad | 0.54 | 1.00 | 0.70 | 1,221 |
| Accuracy | | | 1.00 | 272,951 |
| Macro Avg. | 0.77 | 1.00 | 0.85 | 272,951 |
| Weighted Avg. | 1.00 | 1.00 | 1.00 | 272,951 |
The precision values decreased on both days, which means that more false positives are observed. The F1-score for the 24th is still acceptable; however, the score for the 25th is largely degraded. Since the recall values for both good matrices and bad matrices remain high, we can still detect bad matrices with sufficiently high probability.
Fig. 3 shows a feature matrix of a DNS server that was labeled as bad because we did not observe any request messages corresponding to the response messages sent from the server. The shape looks quite similar to that of the good feature matrix shown in Fig. 2, in the sense that the contents of the response messages show a wide variety of patterns. The server shown in Fig. 3 was one of the DNS servers of the host organization of the datasets where we captured the packets. Considering the competence of the organization’s security operators, it is unlikely that the server was used as a reflector. Our guess is that the request messages reached the server over a path different from the one on which we were monitoring the traffic.
Cleansing of source data is one of the important phases for achieving reliable results with machine learning techniques, and at the same time, it is one of the hardest tasks, especially when the data are large and the contents are dynamic and changing. Since the Internet is an open system and traffic trends undoubtedly change every day, assigning correct labels to a training dataset is not an easy task. In these preliminary experiments, we did not perform intensive data cleansing because of time constraints. For example, the matrix pattern shown in Fig. 3 may be a benign pattern. We continue to investigate the contents of the dataset in more detail to achieve better labels.
IV Conclusion
We attempted to classify DNS servers according to whether or not they were being used as reflectors by capturing a small number of DNS response messages sent from them. We used a method similar to the one proposed in [5] to build a DNS server feature matrix. The preliminary results of classification using an SVM show sufficient precision as long as training and test data from the same day are used. At this moment, the trained model does not achieve as high a classification performance when applied to the rest of the data that was not used for training and testing. One possible reason is improper labeling of the data. As described above, we labeled each matrix based on the technique described in [3]. That method can find all unsolicited DNS response messages, assuming we can monitor the entire DNS message exchange. In our preliminary experiments, we observed unsolicited DNS response messages sent from servers located inside the host organization, which may be benign servers. Assigning correct labels to data is important when using the data as a training dataset for machine learning algorithms. We plan to investigate the contents in more detail to create better training datasets.

The classification method we used in this paper was SVM. SVM is a simple and easy-to-use tool for data analysis; however, more advanced algorithms are now available. Therefore, in the future, we plan to make the results more stable by investigating data and matrix generation approaches (e.g., what values to use to build a matrix), and by investigating classification algorithms (including deep learning technologies) to achieve superior performance.
Acknowledgement
This work was supported by JST CREST Grant Number JPMJCR1783, Japan.
References
- [1] (2016) An overview of DDoS attacks based on DNS. In 2016 International Conference on Information and Communication Technology Convergence (ICTC 2016), pp. 276–280.
- [2] (2014) An Internet-wide view of Internet-wide scanning. In 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, pp. 65–78.
- [3] (2007) Detecting DNS amplification attacks. In International Workshop on Critical Information Infrastructures Security, pp. 185–196.
- [4] (2015) Going Wild: Large-Scale Classification of Open DNS Resolvers. In Proceedings of the 2015 Internet Measurement Conference (IMC’15), pp. 355–368.
- [5] (2018) Malicious host detection by imaging SYN packets and a neural network. In Proceedings of the IEEE International Symposium on Networks, Computers and Communications (ISNCC 2018).