Evading classifiers in discrete domains with provable optimality guarantees

10/25/2018 · by Bogdan Kulynych et al.

Security-critical applications such as malware, fraud, or spam detection, require machine learning models that operate on examples from constrained discrete domains. In these settings, gradient-based attacks that rely on adding perturbations often fail to produce adversarial examples that meet the domain constraints, and thus are not effective. We introduce a graphical framework that (1) formalizes existing attacks in discrete domains, (2) efficiently produces valid adversarial examples with guarantees of minimal cost, and (3) can accommodate complex cost functions beyond the commonly used p-norm. We demonstrate the effectiveness of this method by crafting adversarial examples that evade a Twitter bot detection classifier using a provably minimal number of changes.







1 Model and goal

We introduce a framework to efficiently find adversarial examples suitable for constrained discrete domains such as those underlying security applications of machine learning. As an illustrative example we consider a toy Twitter bot detection classifier that takes as input the days since the account was created, and the total number of replies to the tweets made by this account, and outputs a binary decision: bot or not. In this setting the adversary’s goal is to, starting from an arbitrary account, create a bot that evades the detector by only modifying these two features. The adversary wishes to keep these modifications to a minimum: increasing the lifetime of the account costs time, and increasing the number of replies to the account’s tweets requires engaging real users or deploying other bots.

Various works have tried to tackle this type of problem using greedy search methods (Papernot et al., 2016; Grosse et al., 2016; Gao et al., 2018; Kolosnjaji et al., 2018; Jia and Gong, 2018) and evolutionary algorithms (Xu et al., 2016; Dang et al., 2017). These approaches do not provide formal guarantees of their effectiveness, and thus cannot be used to obtain principled adversarial robustness guarantees for the models. In contrast, our approach builds adversarial examples in a way that provides such guarantees.

Formally, let the target classifier have a discriminant function of the form $f(x) = w^\top \varphi(x)$, where $\varphi: \mathcal{X} \to \mathbb{R}^m$ is a feature mapping. Let $F(x) = \mathbb{1}[f(x) > 0]$ denote the binary decision of the classifier for input $x$, produced by thresholding the discriminant function at zero. This encompasses several families of models in machine learning, e.g., logistic regression, SVM, and neural network-based classifiers.

The adversary uses the “mimicry” strategy (Demontis et al., 2017), i.e., she starts with a known initial example $x$ and applies structure-preserving transformations until the transformed example, $x'$, causes a misclassification. Following security practices, we assume a worst-case adversary that has full knowledge of the target model parameters, including $w$ and the feature mapping $\varphi$.

The adversary’s goal is to find an adversarial example that incurs minimal manipulation cost. This problem can be formulated as an optimization problem:

$$x^* = \operatorname*{arg\,min}_{x' \in \mathcal{X} \,:\, F(x') \neq F(x)} c(x, x') \qquad (1)$$

where $x$ is a given initial example, and $c(x, x')$ is the adversarial cost. It models the “price” that an adversary pays to transform the example $x$ into $x'$. The minimal $c(x, x')$ for a given $x$ is known as (pointwise) robustness (Bastani et al., 2016; Fawzi et al., 2016), or minimal adversarial cost (MAC) (Lowd and Meek, 2005).

We formalize the transformations an adversary can perform as a transformation graph. This is a directed weighted graph $G = (V, E)$, with $V \subseteq \mathcal{X}$. Each edge $(x, x') \in E$ represents the modification of an example $x$ into an example $x'$. For each edge, the function $\mu(x, x') > 0$ defines the manipulation cost associated to that modification. For a given path $p = (v_1, v_2, \ldots, v_n)$ in the graph, we define the path cost, representing the cost of performing that chain of transformations, as the sum of edge costs along it:

$$c(p) = \sum_{i=1}^{n-1} \mu(v_i, v_{i+1})$$

Within the graphical framework, the problem in Equation 1 reduces to minimizing the manipulation cost as defined by the graph $G$, narrowing the search space to only those $x' \in V$ that are reachable from $x$:

$$x^* = \operatorname*{arg\,min}_{x' \in V \,:\, F(x') \neq F(x)} c_G(x, x') \qquad (2)$$

where $c_G(x, x')$ is defined as the minimal path cost over all paths from $x$ to $x'$:

$$c_G(x, x') = \min_{p \,\in\, \mathrm{paths}(x,\, x')} c(p)$$

Example 1.

Let us consider a transformation graph for the toy Twitter bot classification problem. For each feature vector there exist up to four children in the graph: an example with the number-of-days-since-account-creation feature incremented by one, or decremented by one; and analogously two children for the number of replies to the tweets. Let all edges have cost 1. In such a graph the cost of a transformation chain is the number of edges traversed, e.g., incrementing the number of days since account creation by three is equivalent to a path traversing three edges (path cost of 3). The adversary’s goal is to find the path with the lowest cost (minimal number of transformations) that flips the classifier’s decision. The resulting account is the solution to Equation 2.

To illustrate its generality, we show in Table 1 how several existing attacks in discrete domains can be cast as instances of best-first search (Hart et al., 1968) within this graphical framework.

2 Minimal-cost attacks using heuristic graph search

One way to find an optimal, or admissible, solution to Equation 2 that incurs minimal cost is using uniform-cost search (Hart et al., 1968). However, this approach may be inefficient or even infeasible. Let us constrain the transformation graph in Example 1, where the branching factor is 4, to performing at most 30 decrements or increments to any of the features. The number of nodes in this graph is bounded by $4^{30} \approx 1.15 \times 10^{18}$. Given that uniform-cost search needs to expand every node in the worst case, if a single expansion takes a nanosecond, full graph traversal would take about 36 years.
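The back-of-the-envelope estimate can be checked directly, using the branching factor, depth, and one-nanosecond-per-expansion figures from the text:

```python
# Worst case: uniform-cost search may expand on the order of b**d nodes.
branching, depth = 4, 30
expansions = branching ** depth          # about 1.15e18 nodes
seconds = expansions * 1e-9              # one expansion per nanosecond
years = seconds / (365.25 * 24 * 3600)   # years ~= 36.5
```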

For certain settings, however, it is possible to use heuristics to identify the best direction in which to traverse the graph, significantly speeding up the search. Heuristic search algorithms, like A* (Hart et al., 1968) or iterative-deepening A* (Korf, 1985), can find the solution to the problem in Equation 2. To ensure that these algorithms find the admissible solution $x^*$, it is sufficient that the heuristic is admissible (Dechter and Pearl, 1985):

Definition 2.1 (Admissible heuristic).

Let $G = (V, E)$ be a weighted directed graph with edge costs $\mu(x, x') > 0$. A heuristic $h: V \to \mathbb{R}_{\geq 0}$ is admissible if for any $x' \in V$ and any goal node $x^*$ it never overestimates the minimal path cost:

$$h(x') \leq c_G(x', x^*)$$

We now detail one realistic setting for which there exists an admissible heuristic. Let the input domain $\mathcal{X}$ be a discrete subset of the vector space $\mathbb{R}^m$, and let the cost of an edge in the transformation graph be the $\ell_p$ distance between the examples $x$ and $x'$:

$$\mu(x, x') = \|x - x'\|_p \qquad (3)$$

This is similar to typical cost functions in adversarial machine learning (Sharif et al., 2018). We show in Section 3, however, that the structure of the transformation graph can encode more complex cost functions even if the edge cost is an $\ell_p$ distance.

Let $\hat{\mathcal{X}}$ be a superset of $\mathcal{X}$; e.g., $\hat{\mathcal{X}}$ can be a continuous closure of the discrete $\mathcal{X}$. Let $\widehat{\mathrm{MAC}}(f, x)$ denote the MAC of the classifier at input $x$, with the cost set to $\|x - x'\|_p$, and the search space being $\hat{\mathcal{X}}$. Using the fact that the search space is a subset of $\mathbb{R}^m$, $\widehat{\mathrm{MAC}}$ can be simplified from Equation 1 to the following:

$$\widehat{\mathrm{MAC}}(f, x) = \min_{x' \in \hat{\mathcal{X}} \,:\, F(x') \neq F(x)} \|x - x'\|_p \qquad (4)$$
For some models, this can be either computed using formal methods (Carlini et al., 2017; Katz et al., 2017; Bastani et al., 2016), or bounded analytically, yielding a lower bound $\widetilde{\mathrm{MAC}}(f, x) \leq \widehat{\mathrm{MAC}}(f, x)$ (Tsuzuku et al., 2018; Peck et al., 2017; Hein and Andriushchenko, 2017). The known methods usually perform the computation over a box-constrained $\hat{\mathcal{X}} = [a_1, b_1] \times \cdots \times [a_m, b_m]$ for some contiguous intervals $[a_i, b_i]$. For linear models, $\widehat{\mathrm{MAC}}(f, x)$ can be computed exactly and efficiently as the $\ell_p$ distance from a point to the decision hyperplane of the classifier (Fawzi et al., 2018).

Any lower bound $\widetilde{\mathrm{MAC}}(f, x')$ on the $\widehat{\mathrm{MAC}}$ over any $\hat{\mathcal{X}}$ such that $\mathcal{X} \subseteq \hat{\mathcal{X}}$ can be used to construct an admissible heuristic for a given starting example $x$:

$$h(x') = \begin{cases} \widetilde{\mathrm{MAC}}(f, x') & \text{if } F(x') = F(x) \\ 0 & \text{otherwise} \end{cases}$$

This heuristic gives a lower bound on the path cost from an example $x'$ to any adversarial example. Figure 2 in Appendix C illustrates this when the classifier is a linear model.

Statement 2.1 (Admissibility of $h$).

Let the transformation graph have edge costs $\mu(x, x') = \|x - x'\|_p$, and let the initial example be $x \in V$. Then $h$ is an admissible heuristic for the graph-search problem from Equation 2. (Proof in Appendix A)

ε-admissible relaxations. A number of works are dedicated to bounded relaxations of the admissibility property of A* search, trading off the optimality guarantees for computational efficiency (Pohl, 1970, 1973; Pearl and Kim, 1982; Likhachev et al., 2003). In this paper, we employ static weighting (Pohl, 1970) for its simplicity. In this approach the heuristic is multiplied by a factor of $1 + \varepsilon$ for some $\varepsilon > 0$. This results in adversarial examples that have at most $(1 + \varepsilon)$ times higher cost than the MAC.
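The search procedure described in this section can be sketched as a generic weighted A*: static weighting multiplies the heuristic by $1 + \varepsilon$, and $\varepsilon = 0$ recovers plain A*. This is a minimal illustration under assumed interfaces (callables for children, edge cost, goal test, and heuristic), not the authors' implementation.

```python
import heapq

def weighted_a_star(start, children, edge_cost, is_goal, h, eps=0.0):
    """Best-first search over a transformation graph, scoring nodes by
    g + (1 + eps) * h.  With eps = 0 and an admissible heuristic h the
    returned cost is minimal; with eps > 0 it is at most (1 + eps) * MAC."""
    frontier = [((1 + eps) * h(start), 0.0, start)]
    best_g = {start: 0.0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry
        if is_goal(node):
            return node, g
        for child in children(node):
            g_child = g + edge_cost(node, child)
            if g_child < best_g.get(child, float("inf")):
                best_g[child] = g_child
                heapq.heappush(
                    frontier, (g_child + (1 + eps) * h(child), g_child, child)
                )
    return None, float("inf")
```

On a toy two-feature grid with unit edge costs and the heuristic set to a lower bound on the remaining distance to the decision boundary, the procedure returns a minimal-cost transformation chain.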

3 Evaluation

We now evaluate our framework against a realistic instantiation of Twitter bot detection. We use a dataset of extracted features for Twitter bot classification by Gilani et al. (2017). Each example in the dataset represents aggregated information about a Twitter account in April of 2016. Accounts are human-labeled as bots or real humans. We report results for accounts with under 1,000 followers. We obtained similar results for more popular accounts.

Each account has the following associated features: the number of tweets, retweets, favourites, lists, and replies, the average number of URLs, the size of attached content, average likes and retweets per tweet, and the list of apps that were used to post tweets (summarized in Table 2, Appendix C).

We use a linear model as the target classifier, since it allows us to compute the exact value of the heuristic (Fawzi et al., 2018). See Appendix B for details on computing this heuristic. We use regularized logistic regression, trained using 5-fold cross-validation on 90% of the data, and tested on the remaining 10%.

Discretization. We bucketize all the numerical features (e.g., size of attached content) into a number of buckets that correspond to quantiles, as computed on the training data. We run the attacks using between 5 and 100 buckets, which, as explained below, effectively defines the size of the transformation graph. After quantization, we one-hot encode these features. The list of apps used to post the tweets is represented as follows: each app (6 in total) is encoded with two bits, the first set if the app was used, and the second if it was not.

Transformation graph and adversarial cost. For each bucketized feature in a feature vector we define two transformations: change it to the next larger bucket, and, respectively, to the next smaller one. For the buckets at the extremes only one transformation is possible. Thus, the more buckets are used for discretizing numerical features, the larger the graph. In a transformed example, the further the changed bucket is from the original value, the more transformations are needed, and the larger the path cost. For the list-of-apps feature, we define one transformation per app, flipping the bits that represent whether the app was used or not.

We set the edge weights to the $\ell_1$ distance between feature vectors. Because of the way we encode the features, each transformation has a cost of 2 (one bit is set to zero, and another bit to one). Hence, a path cost in this graph is proportional to the number of feature changes, whether of bucketized features or of the list-of-apps feature. We note that other $\ell_p$ costs could be used, yielding a similar path cost model. We empirically found that the heuristic corresponding to the $\ell_1$ edge cost performs best.
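A quick check of the cost-of-2 claim, with illustrative vectors: moving a one-hot encoded feature to an adjacent bucket flips exactly two bits, so its ℓ1 cost is 2.

```python
def l1(u, v):
    """l1 (Manhattan) distance between two encoded feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# One bucketized feature, one-hot encoded over four buckets.
before = [0, 1, 0, 0]  # value in bucket 1
after  = [0, 0, 1, 0]  # moved to the adjacent bucket 2
```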

Performance. For each example in the full dataset we run uniform-cost search (UCS), to obtain a baseline for the cost, and A* with the admissible heuristic from Section 2. For A*, we also run ε-bounded relaxations with different values of ε. In Figure 1 (left) we show the details for the best-performing model, the one using 20 buckets for discretization. We find that for examples that do not require many changes, both A* and UCS traverse parts of the graph of similar sizes. When the number of nodes that need to be traversed grows, the speed-up of A* becomes more prominent. Moreover, increasing the weight ε significantly speeds up the search, by up to two orders of magnitude. Table 3, Appendix C presents some of the obtained adversarial examples and the corresponding feature transformations.

Figure 1: Left, node expansions for different search algorithms (the axis is logarithmic); right, relative increase of the path cost over the minimal adversarial cost (each point is an adversarial example; the distributions are concentrated at zero, and all points above that are outliers)

Provable guarantees. We evaluate the increase in cost of adversarial examples found with ε-bounded relaxations of A* over that of optimal MAC adversarial examples found with UCS or A*. We show in Fig. 1 (right) the increase in costs relative to the MAC. The results show that ε-relaxations of A* through static weighting give an extremely pessimistic cost sub-optimality guarantee. Indeed, the weighting only guarantees that adversarial examples have at most $(1 + \varepsilon)$ times the MAC; in practice, most do not suffer any increase at all, and the maximum observed increase stays far below the bound. On the other hand, the weighting significantly speeds up the search, at the same time producing adversarial examples that are close to optimal: for the value of ε that yields a two-orders-of-magnitude speed-up, all adversarial examples have optimal cost in our experiments.

4 Conclusions

In this paper we have proposed a graphical framework to formalize evasion attacks in discrete domains that casts existing attacks as instances of best-first search over a graph. In settings with a discrete input space $\mathcal{X}$, and the adversary’s transformation cost being the $\ell_p$ distance between examples, this framework obtains adversarial examples that incur minimal cost for an adversary with white-box knowledge. This method produces a focused pointwise robustness guarantee for a given model of adversarial capabilities. We evaluated the attack against a Twitter bot classifier, showing that it can produce adversarial examples within a $(1 + \varepsilon)$ factor of the minimal adversarial cost, using significantly lower computational runtime than exhaustive search. We found that static ε-weighting gives a pessimistic optimality bound. More work is needed to produce tighter optimality bounds, e.g., by applying other relaxations of A*.

| Attack | Domain | Adversary knowledge | Expansions | Cost | Heuristic | Scoring function / Search algorithm | Admissibility |
|---|---|---|---|---|---|---|---|
| Papernot et al. (2016) | Text | White-box | Word substitutions | Number of substitutions | Forward gradient-based | Heuristic / Greedy best-first | |
| Kulynych (2017) | Text | White-box | Word substitutions, character substitutions, insertions, deletions | Semantic dissimilarity | Forward gradient-based | Non-linear combination of cost and heuristic / Best-first | |
| Liang et al. (2018) | Text | White-box | Insertion of common phrases, word removal, character substitution, homoglyph insertions | | Forward gradient-based | Heuristic / Greedy best-first | |
| Ebrahimi et al. (2018) | Text | White-box | Character/word substitution, insertion, deletion | | Forward gradient-based | Heuristic / Beam | |
| Gao et al. (2018) | Text | Black-box | Character substitution, insertion, deletion | | Confidence-based | Heuristic / Greedy best-first | |
| Grosse et al. (2016) | Malware | White-box | Bit flips | | Forward gradient-based | Heuristic / Greedy best-first | |
| Jia and Gong (2018) | App recommendations | White-box | Item addition, modification | | Forward gradient-based | Heuristic / Greedy best-first | |
| Overdorf et al. (2018) | Credit scoring | Black-box | Category modifications | Number of modifications | | Cost / Uniform-cost | |
| This work | * | White-box | * | Graph path cost with $\ell_p$ edge costs | Lower bound on minimal adversarial cost over continuous domain | Linear combination of cost and heuristic / A* | |

Table 1: Instantiations of existing attacks within the graphical framework


  • Papernot et al. (2016) Papernot, N., McDaniel, P. D., Swami, A., and Harang, R. E. (2016). Crafting adversarial input sequences for recurrent neural networks. In J. Brand, M. C. Valenti, A. Akinpelu, B. T. Doshi, and B. L. Gorsic, editors, 2016 IEEE Military Communications Conference, MILCOM 2016, Baltimore, MD, USA, November 1-3, 2016, pages 49–54. IEEE.
  • Grosse et al. (2016) Grosse, K., Papernot, N., Manoharan, P., Backes, M., and McDaniel, P. D. (2016). Adversarial perturbations against deep neural networks for malware classification. CoRR, abs/1606.04435.
  • Gao et al. (2018) Gao, J., Lanchantin, J., Soffa, M. L., and Qi, Y. (2018). Black-box generation of adversarial text sequences to evade deep learning classifiers. In 2018 IEEE Security and Privacy Workshops, SP Workshops 2018, San Francisco, CA, USA, May 24, 2018, pages 50–56. IEEE.
  • Kolosnjaji et al. (2018) Kolosnjaji, B., Demontis, A., Biggio, B., Maiorca, D., Giacinto, G., Eckert, C., and Roli, F. (2018). Adversarial malware binaries: Evading deep learning for malware detection in executables. CoRR, abs/1803.04173.
  • Jia and Gong (2018) Jia, J. and Gong, N. Z. (2018). Attriguard: A practical defense against attribute inference attacks via adversarial machine learning. In W. Enck and A. P. Felt, editors, 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, August 15-17, 2018., pages 513–529. USENIX Association.
  • Xu et al. (2016) Xu, W., Qi, Y., and Evans, D. (2016). Automatically evading classifiers: A case study on PDF malware classifiers. In 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, California, USA, February 21-24, 2016. The Internet Society.
  • Dang et al. (2017) Dang, H., Huang, Y., and Chang, E. (2017). Evading classifiers by morphing in the dark. In B. M. Thuraisingham, D. Evans, T. Malkin, and D. Xu, editors, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 119–133. ACM.
  • Demontis et al. (2017) Demontis, A., Melis, M., Biggio, B., Maiorca, D., Arp, D., Rieck, K., Corona, I., Giacinto, G., and Roli, F. (2017). Yes, machine learning can be more secure! A case study on android malware detection. CoRR, abs/1704.08996.
  • Bastani et al. (2016) Bastani, O., Ioannou, Y., Lampropoulos, L., Vytiniotis, D., Nori, A. V., and Criminisi, A. (2016). Measuring neural net robustness with constraints. In (Lee et al., 2016), pages 2613–2621.
  • Fawzi et al. (2016) Fawzi, A., Moosavi-Dezfooli, S., and Frossard, P. (2016). Robustness of classifiers: from adversarial to random noise. In (Lee et al., 2016), pages 1624–1632.
  • Lowd and Meek (2005) Lowd, D. and Meek, C. (2005). Adversarial learning. In SIGKDD, pages 641–647.
  • Hart et al. (1968) Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Systems Science and Cybernetics, 4(2), 100–107.
  • Korf (1985) Korf, R. E. (1985). Iterative-Deepening-A*: An optimal admissible tree search. In A. K. Joshi, editor, Proceedings of the 9th International Joint Conference on Artificial Intelligence, Los Angeles, CA, USA, August 1985, pages 1034–1036. Morgan Kaufmann.
  • Dechter and Pearl (1985) Dechter, R. and Pearl, J. (1985). Generalized best-first search strategies and the optimality of A*. J. ACM, 32(3), 505–536.
  • Sharif et al. (2018) Sharif, M., Bauer, L., and Reiter, M. K. (2018). On the suitability of Lp-norms for creating and preventing adversarial examples. CoRR, abs/1802.09653.
  • Carlini et al. (2017) Carlini, N., Katz, G., Barrett, C., and Dill, D. L. (2017). Provably minimally-distorted adversarial examples. CoRR, abs/1709.10207.
  • Katz et al. (2017) Katz, G., Barrett, C. W., Dill, D. L., Julian, K., and Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In R. Majumdar and V. Kuncak, editors, Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 of Lecture Notes in Computer Science, pages 97–117. Springer.
  • Tsuzuku et al. (2018) Tsuzuku, Y., Sato, I., and Sugiyama, M. (2018). Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. CoRR, abs/1802.04034.
  • Peck et al. (2017) Peck, J., Roels, J., Goossens, B., and Saeys, Y. (2017). Lower bounds on the robustness to adversarial perturbations. In (Guyon et al., 2017), pages 804–813.
  • Hein and Andriushchenko (2017) Hein, M. and Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In (Guyon et al., 2017), pages 2263–2273.
  • Fawzi et al. (2018) Fawzi, A., Fawzi, O., and Frossard, P. (2018). Analysis of classifiers’ robustness to adversarial perturbations. Machine Learning, 107(3), 481–508.
  • Pohl (1970) Pohl, I. (1970). Heuristic search viewed as path finding in a graph. Artif. Intell., 1(3), 193–204.
  • Pohl (1973) Pohl, I. (1973). The avoidance of (relative) catastrophe, heuristic competence, genuine dynamic weighting and computational issues in heuristic problem solving. In N. J. Nilsson, editor, Proceedings of the 3rd International Joint Conference on Artificial Intelligence. Standford, CA, USA, August 20-23, 1973, pages 12–17. William Kaufmann.
  • Pearl and Kim (1982) Pearl, J. and Kim, J. H. (1982). Studies in semi-admissible heuristics. IEEE Trans. Pattern Anal. Mach. Intell., 4(4), 392–399.
  • Likhachev et al. (2003) Likhachev, M., Gordon, G. J., and Thrun, S. (2003). ARA*: Anytime A* with provable bounds on sub-optimality. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada], pages 767–774. MIT Press.
  • Gilani et al. (2017) Gilani, Z., Kochmar, E., and Crowcroft, J. (2017). Classification of twitter accounts into automated agents and human users. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ASONAM ’17, pages 489–496, New York, NY, USA. ACM.
  • Kulynych (2017) Kulynych, B. (2017). textfool: Plausible looking adversarial examples for text classification.
  • Liang et al. (2018) Liang, B., Li, H., Su, M., Bian, P., Li, X., and Shi, W. (2018). Deep text classification can be fooled. In J. Lang, editor, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden., pages 4208–4215. ijcai.org.
  • Ebrahimi et al. (2018) Ebrahimi, J., Rao, A., Lowd, D., and Dou, D. (2018). Hotflip: White-box adversarial examples for text classification. In I. Gurevych and Y. Miyao, editors, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 2: Short Papers, pages 31–36. Association for Computational Linguistics.
  • Overdorf et al. (2018) Overdorf, R., Kulynych, B., Balsa, E., Troncoso, C., and Gürses, S. (2018). POTs: Protective optimization technologies. CoRR, abs/1806.02711.
  • Lee et al. (2016) Lee, D. D., Sugiyama, M., von Luxburg, U., Guyon, I., and Garnett, R., editors (2016). Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain.
  • Guyon et al. (2017) Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors (2017). Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 4-9 December 2017, Long Beach, CA, USA.

Appendix A Proof of Statement 2.1

Observe that if $F(x') \neq F(x)$, the heuristic is equal to zero, and hence is trivially admissible. Indeed, it cannot overestimate, due to the fact that $h(x') = 0$ and $c_G(x', x^*) \geq 0$ for any $x', x^* \in V$.

It is therefore sufficient to show that if $F(x') = F(x)$, the lower bound $\widetilde{\mathrm{MAC}}(f, x')$ on the adversarial robustness at $x'$ over $\hat{\mathcal{X}}$ never overestimates $c_G(x', x^*)$:

$$\widetilde{\mathrm{MAC}}(f, x') \leq c_G(x', x^*) \qquad (5)$$

The following sequence holds:

$$\widehat{\mathrm{MAC}}(f, x') = \|x' - \hat{x}\|_p \leq \|x' - x^*\|_p \leq c_G(x', x^*)$$

(for some $\hat{x} \in \hat{\mathcal{X}}$ s.t. $F(\hat{x}) \neq F(x')$)

The first equality is by definition of $\widehat{\mathrm{MAC}}$ (see Equation 4).

Since $\|x' - \hat{x}\|_p$ is the norm of the smallest adversarial perturbation over $\hat{\mathcal{X}}$, the distance from $x'$ to $\hat{x}$ is smaller than the distance from $x'$ to any other $x'' \in \hat{\mathcal{X}}$ that also flips the decision of the classifier:

$$\|x' - \hat{x}\|_p \leq \|x' - x''\|_p$$

In particular, this holds for $x'' = x^*$; hence the first inequality.

By Equation 3, $c_G(x', x^*)$ is a path cost for some path $x' = v_1, v_2, \ldots, v_n = x^*$:

$$c_G(x', x^*) = \sum_{i=1}^{n-1} \|v_i - v_{i+1}\|_p$$

By the triangle inequality of the $\ell_p$ metric, the second inequality holds:

$$\|x' - x^*\|_p \leq \sum_{i=1}^{n-1} \|v_i - v_{i+1}\|_p$$

Hence, $\widetilde{\mathrm{MAC}}(f, x') \leq \widehat{\mathrm{MAC}}(f, x') \leq c_G(x', x^*)$, which implies Equation 5, and concludes the proof.

Appendix B Details of the heuristic for linear models

Let the model be a linear model over the input space $\mathcal{X} \subseteq \mathbb{R}^m$, that is, $f(x) = w^\top x + b$ and $\varphi$ is an identity mapping. Then, $\widehat{\mathrm{MAC}}(f, x)$ is the $\ell_p$ distance from the point $x$ to the linear decision hyperplane defined by the discriminant function $f$:

$$\widehat{\mathrm{MAC}}(f, x) = \frac{|f(x)|}{\|w\|_q}$$

where $q$ is the Hölder conjugate of $p$: $\frac{1}{p} + \frac{1}{q} = 1$.
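The closed-form distance can be sketched as follows; `mac_linear` is a hypothetical helper name, and the formula is the one stated in this appendix, with the $p \in \{1, \infty\}$ corner cases handled explicitly.

```python
def mac_linear(w, b, x, p=2.0):
    """l_p distance from x to the hyperplane w.x + b = 0,
    i.e. |f(x)| / ||w||_q with 1/p + 1/q = 1."""
    fx = sum(wi * xi for wi, xi in zip(w, x)) + b
    if p == 1.0:
        w_q = max(abs(wi) for wi in w)        # q = infinity
    elif p == float("inf"):
        w_q = sum(abs(wi) for wi in w)        # q = 1
    else:
        q = p / (p - 1)
        w_q = sum(abs(wi) ** q for wi in w) ** (1 / q)
    return abs(fx) / w_q
```

For example, with $w = (3, 4)$, $b = 0$, and $x = (3, 4)$, the $\ell_2$ distance to the hyperplane is $25 / 5 = 5$.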

Appendix C Supplementary figures

Figure 2: An admissible heuristic for graph search. Circles represent feasible examples $x \in \mathcal{X}$, with $\mathcal{X}$ being a subset of 2-dimensional Euclidean space. The color and a plus or minus sign show the true class of an example. Arrows represent transformations in the graph, with the length of an arrow being the cost of the transformation. The MAC at a point over the continuous closure $\hat{\mathcal{X}}$ is the distance from the point to the decision boundary of the target linear model. The value of the admissible heuristic at that point is this distance to the decision boundary.
| Feature | Column name | Type | Note |
|---|---|---|---|
| Number of tweets | user_tweeted | Integer | |
| Number of retweets | user_retweeted | Integer | |
| Number of replies | user_replied | Integer | |
| Age of account, days | age_of_account_in_days | Integer | |
| Total number of URLs in tweets | urls_count | Integer | |
| Number of favourites (normalized) | user_favourited | Float | |
| Number of lists (normalized) | lists_per_user | Float | |
| Average number of likes per tweet | likes_per_tweet | Float | |
| Average number of retweets per tweet | retweets_per_tweet | Float | |
| Size of CDN content, kB | cdn_content_in_kb | Float | |
| Apps used to post the tweets | source_identity | Set of categories | 6 categories |
| Total number of apps used | sources_count | Integer | |
| Followers-to-friends ratio | follower_friend_ratio | Float | Dropped |
| Favourites-to-tweets ratio | favourite_tweet_ratio | Float | Dropped |
| Average cumulative tweet frequency | tweet_frequency | Float | Dropped |
  • We have dropped this feature from the dataset, since the effect of transformations on it cannot be computed without having original tweets. We were not able to obtain the original dataset of the tweets for compliance reasons.

Table 2: Features from the Twitter bot classification dataset by Gilani et al. (2017)
| source_identity | user_tweeted | user_retweeted | user_favourited | user_replied | lists_per_user | age_of_account_in_days | urls_count |
|---|---|---|---|---|---|---|---|
| [other, browser, mobile] | (6.0, 7.0] | [0, 1.0] | (52.0, 207.8] | [0, 1.0] | (0.011, 0.0143] | (2127.598, 2264.455] | (5.0, 6.0] |
| [browser, mobile] | (6.0, 7.0] | [0, 1.0] | (207.8, 476.143] | [0, 1.0] | (0.011, 0.0143] | (2127.598, 2264.455] | (5.0, 6.0] |
| [other] | [0, 1.0] | (19.0, 32.0] | [0, 4.0] | (26.0, 42.0] | (0.024, 0.0329] | (766.122, 849.108] | (16.0, 24.2] |
| [mobile, osn] | [0, 1.0] | (12.0, 19.0] | [0, 4.0] | (26.0, 42.0] | (0.0188, 0.024] | (766.122, 849.108] | (16.0, 24.2] |
| [other, browser, mobile] | (20.0, 27.0] | (19.0, 32.0] | (35990.095, 211890.704] | (18.0, 26.0] | (0.024, 0.0329] | (2450.073, 3332.802] | (16.0, 24.2] |
| [browser, mobile] | (20.0, 27.0] | (12.0, 19.0] | (35990.095, 211890.704] | (18.0, 26.0] | (0.024, 0.0329] | (2450.073, 3332.802] | (16.0, 24.2] |
| [marketing] | (55.6, 766.0] | [0, 1.0] | (772.24, 1164.881] | [0, 1.0] | (0.05, 0.0962] | (1384.307, 1492.455] | (56.0, 806.0] |
| [mobile, osn] | (27.0, 55.6] | [0, 1.0] | (772.24, 1164.881] | [0, 1.0] | (0.05, 0.0962] | (1384.307, 1492.455] | (56.0, 806.0] |
| [automation] | (13.0, 16.0] | [0, 1.0] | (4.0, 52.0] | [0, 1.0] | (0.05, 0.0962] | (1177.447, 1269.787] | (12.0, 16.0] |
| [mobile, osn] | (13.0, 16.0] | [0, 1.0] | (4.0, 52.0] | (1.0, 2.0] | (0.05, 0.0962] | (1269.787, 1384.307] | (12.0, 16.0] |
| [automation] | (16.0, 20.0] | [0, 1.0] | [0, 4.0] | [0, 1.0] | (0.00373, 0.00485] | (503.551, 645.233] | (16.0, 24.2] |
| [mobile] | (16.0, 20.0] | [0, 1.0] | [0, 4.0] | (1.0, 2.0] | (0.00485, 0.00594] | (503.551, 645.233] | (16.0, 24.2] |
| [automation] | (55.6, 766.0] | [0, 1.0] | [0, 4.0] | [0, 1.0] | (0.024, 0.0329] | (1939.803, 2127.598] | (56.0, 806.0] |
| [mobile] | (27.0, 55.6] | [0, 1.0] | [0, 4.0] | (1.0, 2.0] | (0.0143, 0.0188] | (1939.803, 2127.598] | (56.0, 806.0] |
| [marketing] | (55.6, 766.0] | (2.0, 3.0] | [0, 4.0] | [0, 1.0] | (0.0962, 62.702] | (849.108, 956.554] | (56.0, 806.0] |
| [mobile] | (27.0, 55.6] | (2.0, 3.0] | [0, 4.0] | (1.0, 2.0] | (0.0962, 62.702] | (849.108, 956.554] | (24.2, 56.0] |
| [other] | (55.6, 766.0] | [0, 1.0] | (4.0, 52.0] | [0, 1.0] | (0.05, 0.0962] | (1177.447, 1269.787] | (56.0, 806.0] |
| [mobile, osn] | (27.0, 55.6] | [0, 1.0] | (4.0, 52.0] | (1.0, 2.0] | (0.05, 0.0962] | (1269.787, 1384.307] | (24.2, 56.0] |
| [other] | (7.0, 9.0] | [0, 1.0] | [0, 4.0] | [0, 1.0] | (0.05, 0.0962] | (17.321, 209.213] | (7.0, 9.0] |
| [mobile, osn] | (7.0, 9.0] | [0, 1.0] | [0, 4.0] | (1.0, 2.0] | (0.05, 0.0962] | (17.321, 209.213] | (7.0, 9.0] |
  • Features that were not changed in any adversarial example are not presented.

Table 3: Examples of feature transformations that cause the classifier (<1000 followers, bucketization parameter 20) to misclassify a Twitter bot account as a human account. The first line in every pair corresponds to the initial feature values; the second line shows the feature values in a MAC adversarial example. The feature values that differ within a pair are the transformed ones.