API Misuse Correction: A Statistical Approach

08/18/2019
by   Tam The Nguyen, et al.
Auburn University

Modern software development relies heavily on Application Programming Interface (API) libraries. However, there are often certain constraints on using API elements in such libraries. Failing to follow such constraints (API misuse) could lead to serious programming errors. Many approaches have been proposed to detect API misuses, but they still have low accuracy and cannot repair the detected misuses. In this paper, we propose SAM, a novel approach to detect and repair API misuses automatically. SAM uses statistical models to describe five factors involved in any API method call: related method calls, exceptions, pre-conditions, post-conditions, and values of arguments. These statistical models are trained from a large repository of high-quality production code. Then, given a piece of code, SAM verifies each of its method calls with the trained statistical models. If a factor has a sufficiently low probability, the corresponding call is considered an API misuse. SAM performs an optimal search for editing operations to apply on the code until it has no API issues.


I Introduction

In modern software development, programmers often rely heavily on Application Programming Interface (API) frameworks and libraries to shorten time-to-market and upgrade cycles of software systems. For example, a prior study reports that some Android mobile apps have up to 42% of their external dependencies on the Android API and 68% on the Java API [20]. However, programmers lacking programming experience and relevant documentation (including code examples) could use APIs incorrectly (API misuse). For example, one can forget to check that hasNext() returns true before calling next() on an Iterator from the Java API.
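The Iterator example above can be made concrete. The following minimal sketch (method name `firstOrNull` is illustrative, not from the paper) contrasts the misuse with the correct guarded call:

```java
import java.util.Iterator;
import java.util.List;

public class IteratorMisuse {
    // Misuse would be calling it.next() unconditionally: on an empty list
    // this throws NoSuchElementException. The correct usage guards next()
    // with hasNext(), as required by the Iterator protocol.
    static String firstOrNull(List<String> items) {
        Iterator<String> it = items.iterator();
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstOrNull(List.of("a", "b"))); // prints "a"
        System.out.println(firstOrNull(List.of()));         // prints "null"
    }
}
```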

API misuse is a common root cause of software errors, crashes, and vulnerabilities [6, 11, 19, 5, 12, 7, 2, 14]. Thus, researchers have proposed and developed several API misuse detectors. Unfortunately, recent studies show that those detectors often have low accuracy [3]. In addition, to the best of our knowledge, no current approach can repair the detected API misuses automatically.

To understand the nature of API misuses, we studied a dataset of 144 API misuses publicly provided by Amann et al. [2]. We found that the root causes of those API misuses can be classified into five groups: incorrect temporal order of method calls, incorrect handling of exceptions, missing pre-conditions, missing post-conditions, and incorrect values of arguments. Table I shows the total number of each kind of misuse in the dataset.

Incorrect temporal order of method calls 75
Incorrect handling of exceptions 27
Missing pre-condition 22
Missing post-condition 15
Incorrect argument value 5
TABLE I: Root causes of 144 API misuses

II Approach

In this paper, we propose SAM (“Statistical Approach for API Misuses”), a statistical approach for detecting and repairing API misuses automatically. As suggested by Table I, SAM uses statistical models for five factors involved in the usage of any API method call: the temporal order between method calls, exception handling, pre-conditions, post-conditions, and argument values. SAM trains those statistical models on a massive amount of high-quality production code available in open-source repositories and app stores.

After training, SAM can detect and repair API misuses in a given code snippet. It first validates every method call in the code using its trained statistical models. If one usage factor of a method call has a sufficiently low probability (e.g., less than a threshold), SAM considers the call an API misuse. SAM then casts the API misuse correction problem as an optimal search problem in which it searches for repair actions that optimally eliminate those low-probability usage factors.

Fig. 1: Reading a file with File and FileInputStream

To demonstrate the model, Figure 1 shows a code example of API usage that involves two Java API objects, java.io.File and java.io.FileInputStream. We define five types of usage factors that represent the usages of API method calls and label them as follows.
Temporal Order: The temporal order specifies the order constraints between API method calls. In particular, the temporal order factor of an API method call m is defined as the API method call that appears before m in the API usage. For example, the method call fis.read(bytes) in Figure 1 requires the initialization of the FileInputStream variable fis. P_o(m|C) is the probability of the temporal order factor of m given the code context C, or P(m|m'), where the method m' appears right before m in the code context.
Precondition: The precondition factors of an API method call m are defined as all the preconditions that need to be checked on the calling object or parameters before calling m. For instance, before initializing the FileInputStream variable fis, the parameter file needs a null check. P_pre(p, m|C) is the precondition probability of the parameter p in the method call m in the context C. Each parameter has its own precondition probability; thus, P_pre(m|C) is the product of those probabilities.
Postcondition: The postcondition factors of an API method call m are defined as all the postconditions that need to be checked on the calling object, parameters, or the return value after calling m. For instance, after reading from the file using the fis.read(bytes) method, the return value is compared with a constant. P_post(m|C) represents the probability of the postcondition factors of the method call m.
Argument Value: The argument value factor specifies the value of an argument when it is passed to an API method call. The argument value is limited by the type of the argument and the specification of the method call. For example, in Figure 1, the charset used in the String constructor is UTF-8. If the argument value is not the name of a supported charset, the method call will throw an UnsupportedEncodingException. P_a(m|C) represents the probability of the argument value factors of the method call m.
Exception: The exception factors of an API method call m are defined as the exceptions that need to be handled when calling m. For instance, when calling fis.read(bytes), it is required to handle an IOException. P_e(m|C) represents the probability of the exception factors of the method call m.
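Since Figure 1 is not reproduced in this text, the following sketch shows what such a File/FileInputStream usage might look like, with the five factor types marked in comments (the method name `readFirstBytes` and the buffer size are illustrative assumptions, not taken from the paper):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ReadFileUsage {
    static String readFirstBytes(File file) {
        if (file == null || !file.exists()) {       // precondition: null/existence check on the parameter
            return null;
        }
        // temporal order: fis must be initialized before fis.read(...)
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] bytes = new byte[16];
            int n = fis.read(bytes);
            if (n == -1) {                           // postcondition: compare the return value with a constant
                return "";
            }
            return new String(bytes, 0, n, "UTF-8"); // argument value: charset name must be supported
        } catch (IOException e) {                    // exception: IOException must be handled
            return null;
        }
    }
}
```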

SAM represents the usage of an API method call m given the context C as the combination of its usage factors. Given SAM, we can detect API misuses in code: an API method call is considered a misuse if at least one of its usage factors has a sufficiently low probability (e.g., less than a threshold). Thus, the API misuse detector simply checks all probabilities of usage factors to detect API misuses.
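The detection rule reduces to a threshold check over the per-factor probabilities. A minimal sketch, assuming the factor probabilities have already been computed and using an illustrative cutoff value (the paper does not fix a threshold):

```java
import java.util.Map;

public class MisuseDetector {
    static final double THRESHOLD = 0.01; // illustrative cutoff, not specified by the paper

    // A call is flagged as a misuse if any of its usage-factor
    // probabilities falls below the threshold.
    static boolean isMisuse(Map<String, Double> factorProbabilities) {
        return factorProbabilities.values().stream().anyMatch(p -> p < THRESHOLD);
    }

    public static void main(String[] args) {
        // e.g., a call whose exception-handling factor is highly improbable
        System.out.println(isMisuse(Map.of("order", 0.8, "exception", 0.001))); // prints "true"
    }
}
```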

The API misuse correction problem is modeled as an optimal search problem. The algorithm to solve the problem is described in Figure 2. The input of the algorithm includes the code C that contains the API misuse(s) and the current editing length l. In line 2, the function Detect-API-Misuses(C) finds the set X of all the usage factors that lead to the API misuse(s). If X is empty, the correction is finished and C is returned. Next, the algorithm compares the current editing length l with the maximum editing length in line 6. It then generates repair actions A based on X with the function Generate-Repair-Actions(X). Each repair action in A is applied on C in line 9. The function Correct-API-Misuses() is called recursively to further edit the code by one edit length. The algorithm stops when there are no API misuses in the code or the maximum editing length is reached.

1:  function Correct-API-Misuses(Code C, EditLength l)
2:      X ← Detect-API-Misuses(C)
3:      if X = ∅ then
4:          return C
5:      end if
6:      if l < MaxEditLength then
7:          A ← Generate-Repair-Actions(X)
8:          for each Action a in A do
9:              C' ← Modify(C, a)
10:             Correct-API-Misuses(C', l + 1)
11:         end for
12:     end if
13: end function
Fig. 2: The API misuse correction algorithm
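The recursive search of Fig. 2 can be sketched in Java. The stand-ins below are deliberately toy components, not SAM's actual implementation: a "program" is modeled as a list of misuse labels, and each repair action removes one label, so the sketch only exercises the control flow of the algorithm:

```java
import java.util.ArrayList;
import java.util.List;

public class MisuseCorrector {
    static final int MAX_EDIT_LENGTH = 3; // illustrative bound on the edit-search depth

    // Toy stand-in for the detector: every label in the "program" is a misuse.
    static List<String> detectApiMisuses(List<String> code) {
        return new ArrayList<>(code);
    }

    // Toy stand-in: one candidate repair action per detected misuse.
    static List<String> generateRepairActions(List<String> misuses) {
        return misuses;
    }

    // Toy stand-in for Modify: applying an action removes one misuse label.
    static List<String> modify(List<String> code, String action) {
        List<String> edited = new ArrayList<>(code);
        edited.remove(action);
        return edited;
    }

    // Recursive search mirroring Fig. 2: return the code once no misuses
    // remain, otherwise try each repair action and recurse with l + 1,
    // giving up when the maximum editing length is reached.
    static List<String> correctApiMisuses(List<String> code, int editLength) {
        List<String> misuses = detectApiMisuses(code);
        if (misuses.isEmpty()) {
            return code;
        }
        if (editLength < MAX_EDIT_LENGTH) {
            for (String action : generateRepairActions(misuses)) {
                List<String> result = correctApiMisuses(modify(code, action), editLength + 1);
                if (result != null) {
                    return result;
                }
            }
        }
        return null; // no fix found within the edit budget
    }
}
```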

Training SAM requires calculating the probability distribution of each usage factor of every API method call m. For example, the probability distribution of the temporal order factor of the method call m requires computing all probabilities P(m|m'), where the method m' appears right before m. We have

P(m|m') = count(m', m) / count(m'),

where count(m', m) is the number of occurrences of the bigram (m', m) and count(m') is the number of occurrences of m'. Thus, the training of SAM involves counting those occurrences. This is an advantage of SAM, as it can easily be trained on a massive amount of code to improve its accuracy.
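The bigram estimate above amounts to plain counting over call sequences extracted from the training corpus. A minimal sketch (class and method names are illustrative, not SAM's code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TemporalOrderModel {
    private final Map<String, Integer> unigramCounts = new HashMap<>();
    private final Map<String, Integer> bigramCounts = new HashMap<>();

    // Train on one method-call sequence: count each call and each
    // adjacent pair (bigram) of calls.
    void train(List<String> callSequence) {
        for (int i = 0; i < callSequence.size(); i++) {
            unigramCounts.merge(callSequence.get(i), 1, Integer::sum);
            if (i > 0) {
                String bigram = callSequence.get(i - 1) + " " + callSequence.get(i);
                bigramCounts.merge(bigram, 1, Integer::sum);
            }
        }
    }

    // P(m | m') = count(m', m) / count(m'), as in the formula above.
    double probability(String prev, String call) {
        int prevCount = unigramCounts.getOrDefault(prev, 0);
        if (prevCount == 0) {
            return 0.0;
        }
        return bigramCounts.getOrDefault(prev + " " + call, 0) / (double) prevCount;
    }
}
```

Training on two sequences such as [new FileInputStream, read, close] and [new FileInputStream, close] yields P(read | new FileInputStream) = 1/2, since the constructor occurs twice and is followed by read once.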

III Related Work

Researchers have proposed several API-misuse detectors, including [23, 16, 10, 24, 8, 18, 17, 1]. Jadet [23] and GroumMiner [16] detect API misuses by detecting violations of mined object usage patterns. DMMC [10] is a misuse detector for Java specialized in missing method calls. Tikanga [24] is a misuse detector for Java that builds on Jadet [23]. Amann et al. [3] provide a systematic evaluation of static API misuse detectors. They also provided MuBench [2], a benchmark for evaluating API misuse detectors, and proposed MuDetect [4], an improved API misuse detector based on GroumMiner [16].

Car-Miner [22] is a detector of wrong error handling for C++ and Java. ExAssist [15] is a tool for detecting exception handling bugs and automatically fixing them. Alattin [21] is a detector for Java that detects missing null checks, missing value or state conditions not involving literals, and missing calls required in checks. DroidAssist [13] is a detector for Dalvik bytecode. MAD-API [9] is an approach that detects and fixes out-of-date Android API usages.

IV Conclusions

In this paper, we propose SAM, a statistical approach for detecting and repairing API misuses automatically. Given a piece of code, SAM verifies each of its method calls with the trained statistical models. If a factor has a sufficiently low probability, the corresponding call is considered an API misuse. SAM performs an optimal search for editing operations to apply on the code until it has no API issues.

References

  • [1] M. Acharya and T. Xie (2009) Mining api error-handling specifications from source code. In Proceedings of the 12th International Conference on Fundamental Approaches to Software Engineering: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, FASE ’09, Berlin, Heidelberg, pp. 370–384. External Links: ISBN 978-3-642-00592-3, Link, Document Cited by: §III.
  • [2] S. Amann, S. Nadi, H. A. Nguyen, T. N. Nguyen, and M. Mezini (2016-05) MUBench: a benchmark for api-misuse detectors. In 2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR), Vol. , pp. 464–467. External Links: Document, ISSN Cited by: §I, §I, §III.
  • [3] S. Amann, H. A. Nguyen, S. Nadi, T. N. Nguyen, and M. Mezini (2018) A systematic evaluation of static api-misuse detectors. IEEE Transactions on Software Engineering (), pp. 1–1. External Links: Document, ISSN 0098-5589 Cited by: §I, §III.
  • [4] S. Amann, H. A. Nguyen, S. Nadi, T. N. Nguyen, and M. Mezini (2019) Investigating next steps in static api-misuse detection. In Proceedings of the 16th International Conference on Mining Software Repositories, MSR ’19, Piscataway, NJ, USA, pp. 265–275. External Links: Link, Document Cited by: §III.
  • [5] M. Egele, D. Brumley, Y. Fratantonio, and C. Kruegel (2013) An empirical study of cryptographic misuse in android applications. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, CCS ’13, New York, NY, USA, pp. 73–84. External Links: ISBN 978-1-4503-2477-9, Link, Document Cited by: §I.
  • [6] S. Fahl, M. Harbach, T. Muders, L. Baumgärtner, B. Freisleben, and M. Smith (2012) Why eve and mallory love android: an analysis of android ssl (in)security. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12, New York, NY, USA, pp. 50–61. External Links: ISBN 978-1-4503-1651-4, Link, Document Cited by: §I.
  • [7] M. Georgiev, S. Iyengar, S. Jana, R. Anubhai, D. Boneh, and V. Shmatikov (2012) The most dangerous code in the world: validating ssl certificates in non-browser software. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12, New York, NY, USA, pp. 38–49. External Links: ISBN 978-1-4503-1651-4, Link, Document Cited by: §I.
  • [8] Z. Li and Y. Zhou (2005) PR-miner: automatically extracting implicit programming rules and detecting violations in large software code. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, New York, NY, USA, pp. 306–315. External Links: ISBN 1-59593-014-0, Link, Document Cited by: §III.
  • [9] T. Luo, J. Wu, M. Yang, S. Zhao, Y. Wu, and Y. Wang (2018) MAD-api: detection, correction and explanation of api misuses in distributed android applications. In Artificial Intelligence and Mobile Services – AIMS 2018, M. Aiello, Y. Yang, Y. Zou, and L. Zhang (Eds.), Cham, pp. 123–140. External Links: ISBN 978-3-319-94361-9 Cited by: §III.
  • [10] M. Monperrus, M. Bruch, and M. Mezini (2010) Detecting missing method calls in object-oriented software. In ECOOP 2010 – Object-Oriented Programming, T. D’Hondt (Ed.), Berlin, Heidelberg, pp. 2–25. External Links: ISBN 978-3-642-14107-2 Cited by: §III.
  • [11] M. Monperrus and M. Mezini (2013-03) Detecting missing method calls as violations of the majority rule. ACM Trans. Softw. Eng. Methodol. 22 (1), pp. 7:1–7:25. External Links: ISSN 1049-331X, Link, Document Cited by: §I.
  • [12] S. Nadi, S. Krüger, M. Mezini, and E. Bodden (2016-05) ”Jumping through hoops”: why do java developers struggle with cryptography apis?. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), Vol. , pp. 935–946. External Links: Document, ISSN 1558-1225 Cited by: §I.
  • [13] T. T. Nguyen, H. V. Pham, P. M. Vu, and T. T. Nguyen (2015-11) Recommending api usages for mobile apps with hidden markov model. In 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), Vol. , pp. 795–800. External Links: Document, ISSN Cited by: §III.
  • [14] T. T. Nguyen, P. M. Vu, and T. T. Nguyen (2019) An empirical study of exception handling bugs and fixes. In Proceedings of the 2019 ACM Southeast Conference, ACM SE ’19, New York, NY, USA, pp. 257–260. External Links: ISBN 978-1-4503-6251-1, Link, Document Cited by: §I.
  • [15] T. T. Nguyen, P. M. Vu, and T. T. Nguyen (2019) Recommending exception handling code. In ICSME, Cited by: §III.
  • [16] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. M. Al-Kofahi, and T. N. Nguyen (2009) Graph-based mining of multiple object usage patterns. In Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC/FSE ’09, New York, NY, USA, pp. 383–392. External Links: ISBN 978-1-60558-001-2, Link, Document Cited by: §III.
  • [17] M. K. Ramanathan, A. Grama, and S. Jagannathan (2007-05) Path-sensitive inference of function precedence protocols. In 29th International Conference on Software Engineering (ICSE’07), Vol. , pp. 240–250. External Links: Document, ISSN 0270-5257 Cited by: §III.
  • [18] M. K. Ramanathan, A. Grama, and S. Jagannathan (2007) Static specification inference using predicate mining. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’07, New York, NY, USA, pp. 123–134. External Links: ISBN 978-1-59593-633-2, Link, Document Cited by: §III.
  • [19] J. Sushine, J. D. Herbsleb, and J. Aldrich (2015) Searching the state space: a qualitative study of api protocol usability. In Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension, ICPC ’15, Piscataway, NJ, USA, pp. 82–93. External Links: Link Cited by: §I.
  • [20] M. D. Syer, M. Nagappan, A. E. Hassan, and B. Adams (2013) Revisiting prior empirical findings for mobile apps: an empirical case study on the 15 most popular open-source android apps. In Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research, CASCON ’13, Riverton, NJ, USA, pp. 283–297. External Links: Link Cited by: §I.
  • [21] S. Thummalapenta and T. Xie (2009-11) Alattin: mining alternative patterns for detecting neglected conditions. In 2009 IEEE/ACM International Conference on Automated Software Engineering, Vol. , pp. 283–294. External Links: Document, ISSN 1938-4300 Cited by: §III.
  • [22] S. Thummalapenta and T. Xie (2009) Mining exception-handling rules as sequence association rules. In Proceedings of the 31st International Conference on Software Engineering, ICSE ’09, Washington, DC, USA, pp. 496–506. External Links: ISBN 978-1-4244-3453-4, Link, Document Cited by: §III.
  • [23] A. Wasylkowski, A. Zeller, and C. Lindig (2007) Detecting object usage anomalies. In Proceedings of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC-FSE ’07, New York, NY, USA, pp. 35–44. External Links: ISBN 978-1-59593-811-4, Link, Document Cited by: §III.
  • [24] A. Wasylkowski and A. Zeller (2009) Mining temporal specifications from object usage. In Proceedings of the 2009 IEEE/ACM International Conference on Automated Software Engineering, ASE ’09, Washington, DC, USA, pp. 295–306. External Links: ISBN 978-0-7695-3891-4, Link, Document Cited by: §III.