The choice of log severity level can be challenging and cause problems in producing reliable logging data. However, there is a lack of specifications and practical guidelines to support this challenge. In this study, we present a multivocal systematic mapping of log severity levels. We analyzed 19 severity levels from 27 studies and 40 logging libraries, drawing on peer-reviewed literature, logging libraries, and practitioners' views. Our results show redundancy and semantic similarity between the levels and a tendency for the levels to converge toward a total of six. Our contributions help improve the reliability of log entries: (i) a mapping of the literature about log severity levels, (ii) a mapping of the severity levels in logging libraries, and (iii) a set of six synthesized definitions and four general purposes for severity levels. We recommend that developers use a standard nomenclature, and we suggest that logging library creators provide accurate and unambiguous definitions of log severity levels.
Logs are often the primary source of information for system developers and operators to understand and diagnose the behavior of a software system IST/EL2020/systematic. According to Lin et al. ICSE/LIN2016/log-clustering, “engineers need to examine the recorded logs to gain insight into the failure, identify the problems, and perform troubleshooting”. According to El-Masri et al. IST/EL2020/systematic, each log entry is usually composed of a time-stamp, a severity level, a software component, and a log message. Severity levels indicate the degree of severity of the log message SPE/KIM2020/automatic. For example, a less severe level is used to indicate that the system behaves as expected, while a more severe level is used to indicate that a problem has occurred ICSE/CHEN2017/characterizing-antipatterns.
The choice of severity level impacts the amount of log data that a software system produces ICSE/LIN2016/log-clustering ICSE/CHEN2017/characterizing-antipatterns ESE/CHOWDHURY2018/exploratory ESE/ZENG2019/studying. For example, if a system is set to Warn level, only statements marked with Warn levels and higher levels (e.g., Error, Fatal) will be output ICSE/CHEN2017/characterizing-antipatterns.
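The threshold behavior described above can be illustrated with a minimal sketch using Python's standard `logging` module. The logger name `severity_demo` and the messages are hypothetical, and the in-memory stream is used only so that the emitted entries can be inspected.

```python
import io
import logging

# In-memory stream so the emitted entries can be inspected afterwards.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("severity_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False          # keep the demo isolated from the root logger
logger.setLevel(logging.WARNING)  # threshold: Warn and more severe levels only

logger.debug("cache miss")            # below the threshold, suppressed
logger.info("request served")         # below the threshold, suppressed
logger.warning("disk usage is high")  # emitted
logger.error("could not write file")  # emitted
logger.critical("shutting down")      # emitted

emitted = stream.getvalue().splitlines()
print(emitted)
```

With the threshold at Warn, only three of the five statements produce log entries, which is why the level chosen for each statement directly affects log volume.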
In this sense, when a developer chooses severity levels inappropriately, the system can produce more log entries than it should or, conversely, fewer log entries ESE/HASSANI2018/studying. In both scenarios, the wrong choice of severity level can cause problems in software system performance ICSE/CHEN2017/characterizing-antipatterns ESE/LI2017/LogLevelChoose ICSE/YUAN2012/characterizingLoggingPractices and maintenance ESE/LI2017/LogLevelChoose ASE/HE2018/characterizingNaturalLanguageDescriptions, as well as affect log-based monitoring and diagnostics ESE/HASSANI2018/studying ESE/LI2017/LogLevelChoose ASWEC/RONG2018/logging.
Developers spend significant time adjusting log severity levels ESE/KABINNA2018/examining. After an initial choice, developers may modify the severity level after re-evaluating how critical an event is ICSE/YUAN2012/characterizingLoggingPractices SPSP/ZHAO2017/log20. They can re-evaluate whether a statement initially classified as Info should actually be at the Error level, or at an intermediate level between the two, that is, Warn SPSP/ZHAO2017/log20. Among the factors that make choosing the severity level a challenge are: (i) lack of knowledge of how logs will be used COMACM/OLINER2012/advances; (ii) lack of understanding of how critical an event is ESE/ZENG2019/studying; (iii) the ambiguity of certain events that seem to be related to multiple levels of severity ICSE/LIN2016/log-clustering SPSP/ZHAO2017/log20.
In addition, there is a lack of specifications and practical guidelines for performing logging tasks in projects and industry ICSME/ANU2019/verbosityloglevels ASE/HE2018/characterizingNaturalLanguageDescriptions ASWEC/RONG2018/logging. The consequence is that “personal experience and preferences play an important role in logging practices” in software development projects ASWEC/RONG2018/logging.
Considering the lack of guidelines and specifications for logging practices, we found studies that focus on where to log ICSE/FU2014/developers SPSP/ZHAO2017/log20 TSE/LI2020/qualitative and what to log ESE/LI2017/LogLevelChoose when searching the literature for studies on log severity levels. However, we came across a gap in studies that specifically analyze the log severity levels. Thus, we address the following research question: What are log severity levels?
To answer this question, we studied the state of the art and practice of log severity levels, surveying their nomenclatures, definitions, and descriptions, using three different sources: (1) peer-reviewed literature, (2) logging libraries, and (3) practitioners’ point of view.
Our results provide a landscape of log severity levels and show a convergence between academia and industry regarding their definitions. We observed that, when the set of nomenclatures and definitions is put in perspective, there is a convergence toward four purposes: Debugging, Informational, Warning, and Failure. Furthermore, we proposed definitions for those purposes and for the six severity levels that characterize the state of logging practice. Our study addresses the need for guidelines and specifications reported in the literature, supporting developers and system operators in generating reliable log data entries. Our study also supports logging library creators in their design process.
The main contributions of this study are:
a mapping of the literature on log severity levels;
a mapping of severity levels in the logging libraries;
a set of synthesized definitions for six log severity levels, and the suggestion of four general purposes for severity levels.
This paper is organized as follows. The following Section presents the multivocal mapping of log severity levels covering the peer-reviewed literature, logging libraries and practitioners’ views. Section 3 presents the log severity levels synthesis. Section 4 presents a discussion of the main findings, our recommendations, and the threats to validity. Section 5 closes the study presenting our conclusions and future work.
We conducted our research on log severity levels using three sources: peer-reviewed literature (PRL), to capture the state of the art; log libraries (LL), to capture the vision of library creators; and Stack Overflow (https://stackoverflow.com/), a Q/A website (QA), to capture how practitioners understand and use log severity levels. This study aims to identify and summarize the log severity levels, mapping the knowledge about their current use. All data are available in our reproducibility package at https://github.com/Log-Severity-Level.
We performed a two-stage systematic search in order to identify the current literature on log severity levels, as shown in Fig. 1. In Stage 1, we adopted automated search as the search strategy. According to Keele et al. KEELE2007/guidelines, automated search is the most commonly used search strategy to identify relevant studies for a Systematic Mapping. Our search query was: ("log level" OR "log severity" OR "logging level" OR "logging severity" OR ("severity level" AND (logging OR log))).
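To make the boolean structure of the Stage 1 query concrete, the following sketch expresses it as a Python predicate over a single text field. The function name `matches_query` is hypothetical, and the matching is plain case-insensitive substring and whole-word search, which is an assumption and does not reproduce how a bibliographic database tokenizes or stems terms.

```python
import re

# The four quoted phrases that satisfy the query on their own.
PHRASES = ("log level", "log severity", "logging level", "logging severity")

def matches_query(text: str) -> bool:
    """Return True if `text` satisfies the boolean structure of the query."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in PHRASES):
        return True
    # Last clause: "severity level" AND (logging OR log), as whole words.
    has_log_term = re.search(r"\b(logging|log)\b", lowered) is not None
    return "severity level" in lowered and has_log_term

print(matches_query("Which log level should developers choose?"))
print(matches_query("Severity level prediction for bug reports"))
print(matches_query("Severity level of logging statements"))
```

In the study the query was applied to titles, abstracts, and keywords; the same predicate could be applied to each of those fields in turn.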
We executed our search query on Scopus (https://www.scopus.com), using three metadata fields (title, abstract, and keywords), and we found 291 studies. Then we applied the inclusion (IC) and exclusion (EC) criteria, specifically:
IC1: The study must be a conference paper or article;
IC2: The study must be of the Computer Science area;
IC3: The study must be a primary study;
IC4: The study should address logging practices;
IC5: The study should describe the use of log severity levels or define them;
EC1: The study is not written in English;
EC2: The study is a duplicate;
EC3: The study does not present a link between logging practices and the use of log severity levels.
After applying the IC1, IC2, EC1, and EC2 we obtained 40 studies. We read the title and abstract of each of them and, after filtering by IC3, IC4, IC5, and EC3 we obtained the seed dataset with 9 studies.
In Stage 2, we used our initial set of nine studies as our seed set to perform three rounds of snowballing, backward and forward, detailed in Fig. 1.
The final dataset included 27 studies (Table 1).
|[P01]||COMACM/OLINER2012/advances||Advances and challenges in log analysis||2012|
|[P02]||ICSE/YUAN2012/characterizingLoggingPractices||Characterizing logging practices in open-source software||2012|
|[P03]||WCCCT/GOMATHY2014/developing||Developing an error logging framework for ruby on rails application using AOP||2014|
|[P04]||ICSE/FU2014/developers||Where do developers log? An empirical study on logging practices in industry||2014|
|[P05]||ESE/SHANG2015/studying||Studying the relationship between logging characteristics and the code quality of platform software||2015|
|[P06]||ICSE/LIN2016/log-clustering||Log clustering based problem identification for online service systems||2016|
|[P07]||ESE/LI2017/LogLevelChoose||Which log level should developers choose for a new logging statement?||2017|
|[P08]||ICSE/CHEN2017/characterizing-antipatterns||Characterizing and Detecting Anti-Patterns in the Logging Code||2017|
|[P09]||ESE/CHEN2017/characterizing-logging-practices||Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation||2017|
|[P10]||SPSP/ZHAO2017/log20||Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold||2017|
|[P11]||ESE/LI2017/towards||Towards just-in-time suggestions for log changes||2017|
|[P12]||ASWEC/RONG2018/logging||How is logging practice implemented in open source software projects? A preliminary exploration||2018|
|[P13]||ASE/HE2018/characterizingNaturalLanguageDescriptions||Characterizing the natural language descriptions in software logging statements||2018|
|[P14]||ESE/HASSANI2018/studying||Studying and detecting log-related issues||2018|
|[P15]||ESE/CHOWDHURY2018/exploratory||An exploratory study on assessing the energy impact of logging on android applications||2018|
|[P16]||ESE/KABINNA2018/examining||Examining the stability of logging statements||2018|
|[P17]||CLOUD/YUAN2019/approach||An approach to cloud execution failure diagnosis based on exception logs in Openstack||2019|
|[P18]||ICSME/ANU2019/verbosityloglevels||An Approach to Recommendation of Verbosity Log Levels Based on Logging Intention||2019|
|[P19]||ESE/ZENG2019/studying||Studying the characteristics of logging practices in mobile apps: a case study on F-Droid||2019|
|[P20]||ICSE/LI2019/dlfinder||DLFinder: Characterizing and Detecting Duplicate Logging Code Smells||2019|
|[P21]||ESE/CHEN2019/extracting||Extracting and studying the Logging-Code-Issue-Introducing changes in Java-based large-scale open source software systems||2019|
|[P22]||TSE/LI2020/qualitative||A Qualitative Study of the Benefits and Costs of Logging from Developers Perspectives||2020|
|[P23]||SPE/KIM2020/automatic||Automatic recommendation to appropriate log levels||2020|
|[P24]||ICTSS/BHARKAD2020/optimizing||Optimizing Root Cause Analysis Time Using Smart Logging Framework for Unix and GNU/Linux Based Operating System||2020|
|[P25]||ICDC/OBRKEBSKI2019/log||Log Based Analysis of Software Application Operation||2020|
|[P26]||ACMSAC/GHOLAMIAN2020/logging||Logging statements’ prediction based on source code clones||2020|
|[P27]||ACMCASE/LI2020/where-shall||Where Shall We Log? Studying and Suggesting Logging Locations in Code Blocks||2020|
Distribution of included studies. Most of the publications that include log severity levels were published in the last four years: five studies in 2017, five studies in 2018, five studies in 2019, and six studies in 2020. These 21 studies represent 78% of our set of included studies and indicate that interest in the subject has been increasing in recent years.
Most of the included studies deal with severity levels in general, presenting severity levels to illustrate and clarify logging processes as a whole. Other studies address severity levels as one aspect of their research on logging practices [P02][P09][P12][P19] or in the study of log statements [P13][P16][P26]. Some studies deal with specific problems related to severity levels, such as where to log [P04][P10], which severity level to choose [P07], and automatic recommendation of severity levels [P23][P27].
Finding #1: Research on log severity levels has grown in recent years.
Presence of log severity levels. All 27 studies mention log severity levels (e.g., Debug, Info, Warn,…); 23 studies (85%) mention at least three severity levels. In contrast, only eight studies (30%) have definitions or descriptions for severity levels. As shown in Fig. 2, it is possible to distinguish two groups of severity levels: the most mentioned and the least mentioned. The first group, formed by the Error (26), Debug (25), Info (23), Warn (20), Fatal (17), and Trace (14) levels, makes up 93% of the mentions, and the second, formed by the Notice, Critical, Alert, Verbose, Panic, and Failure levels, makes up the remaining 7%. The group with the most mentions is also the one with the most definitions.
Finding #2: Error, Debug, Info, Warn, Fatal, and Trace are the levels that stand out in the log severity level research.
Categorization. Some studies propose categorizations related to log statements. He et al. ASE/HE2018/characterizingNaturalLanguageDescriptions group the logging descriptions (defined as “(…) the textual part of a log statement, excluding variables” ASE/HE2018/characterizingNaturalLanguageDescriptions) into three main categories: (i) “description for program operation”, (ii) “description for error condition”, and (iii) “description for high-level code semantics”. The descriptions of the first category appear related to the Info severity level, describing three types of operations: complete operation, current operation, and next operation. The second category describes the occurrence of an error/exception; severity levels related to this category are Info and Error. In the third category, the logging descriptions essentially describe the code, e.g., variables, functions, and branches, such as if-else blocks; all examples of this category use Debug level.
Yuan et al. ICSE/YUAN2012/characterizingLoggingPractices comment on two classes of levels: “error-level (e.g., error, fatal) (…) and non-error (also non-fatal levels), such as info and debug”. In the same way, they comment on two other classes considering the average logging level (a metric calculated by transforming each log severity level into a quantitative measure ESE/SHANG2015/studying):
“Intuitively, high-level logs are for system operators and lower-level logs are for development purposes. (…) The higher-level logs are used typically by administrators and lower-level logs are used by developers and testers”. ESE/SHANG2015/studying
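As a rough illustration of how an average logging level could be computed, the sketch below maps each level to an ordinal score and takes the mean. The scores in `LEVEL_SCORE`, the function name, and the two example files are assumptions made for illustration; the exact values used in ESE/SHANG2015/studying are not given in this text.

```python
from statistics import mean

# Assumed ordinal scores, from least to most severe (illustrative only).
LEVEL_SCORE = {"trace": 1, "debug": 2, "info": 3, "warn": 4, "error": 5, "fatal": 6}

def average_logging_level(levels):
    """Mean ordinal score of the severity levels of a set of log statements."""
    return mean(LEVEL_SCORE[level.lower()] for level in levels)

# Hypothetical files: one dominated by lower levels, one by higher levels.
file_a = ["debug", "debug", "info"]
file_b = ["warn", "error", "error"]

print(average_logging_level(file_a))
print(average_logging_level(file_b))
```

Under this assumed mapping, a lower average suggests logs aimed at developers and testers, and a higher average suggests logs aimed at operators, in line with the quote above.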
Finding #3: Studies seek to categorize the severity levels or elements of log sentences.
Definitions. Table 2 presents the definitions and descriptions found for the log severity levels. Four studies ([P02], [P12], [P14], [P24]) have definitions for a defined set of levels (4 or more levels). The other four present descriptions for only one or two levels to contextualize the idea of log severity level [P20][P23][P25][P27].
Finding #4: Only 15% of selected studies have a set of four or more definitions for log severity levels.
Next, we comment on the levels with more than one definition in Table 2.
As its name suggests, four of the definitions associate the Debug level with debugging tasks [P02][P12][P14][P24]. It targets the development phase of the software process and is consequently consumed mainly by developers [P23][P24]. The Debug level appears related to expressions like “verbose,” “fine-grained information,” “details of events,” and “useful for developers.” Kim et al. SPE/KIM2020/automatic describe it as “broadly used to designate the state of the variable.”
The Trace level is described as “more refined than the Debug level”.
Info level messages are described as “important but normal events” [P02][P14][P24], used to highlight and describe the application’s progress [P12][P25] “at coarse-grained level,” in circumstances that do not require any action [P24].
Unlike the Info level, the Warn level definitions describe it as a severity level that requires action to be taken [P24] because it designates potentially harmful situations [P12] capable of causing system problems [P27].
The definitions for the Error level do not say much beyond their goal of logging errors or failed operations [P02][P12][P14][P24]. However, according to [P12], the Error level “designates error events that might still allow the application to continue running.”
The expressions used in the Fatal level definitions are aborting processes or applications [P02][P12][P14], very severe errors [P12], and critical problems [P23].
|Trace||[P12]||“Trace / Finest: This level designates finer-grained informational events than the ’Debug’."|
|[P14]||“and trace (tracing steps of the execution, most fine-grained information)"|
|Debug||[P02]||“debug (i.e., verbose logging only for debugging)"|
|[P12]||“Debug / Fine / Finer: This level designates fine-grained informational events that are most useful to debug an application."|
|[P14]||“debug (verbose logging only for debugging)"|
|[P23]||“Moreover, the log level debug is broadly used to designate the state of the variable during the development phase with the corresponding message."|
|[P24]||“Level 7 Debug: Messages at debug level contains more details of events, debug level log messages are more useful for developers and for debugging an application."|
|Info||[P02]||“info (i.e., record important but normal events)"|
|[P12]||“Info / Config: This level designates informational messages that highlight the progress of the application coarse-grained level."|
|[P14]||“info (record important but normal events)"|
|[P24]||“Level 6 Information: Normal operation messages are at this level, no action is required to take."|
|[P25]||“Info level entries describe application operation, e.g. details of creating services."|
|Notice||[P24]||“Level 5 Notice: Unusual event is mentioned, but not, an error is shown."|
|Warn||[P12]||“Warn / Warning: This level designates potentially harmful situations."|
|[P24]||“Level 4 Warning: Warning messages indicate that an error may occur if action is not taken."|
|[P27]||“The logging statement is at the warn level, which is the level for recording information that may potentially cause system oddities"|
|Error||[P02]||“error (i.e., record error events)"|
|[P12]||“Error / Severe: This level designates error events that might still allow the application to continue running."|
|[P14]||“error (record error events)"|
|[P20]||“The logging statement is at the error level, which is the level for recording failed operations."|
|[P24]||“Level 3 Error: Occurred error information is shown in this kind of log messages."|
|Critical||[P24]||“Level 2 Critical: Critical level messages, is written in the log file when a critical situation occurs in the normal execution of the system."|
|Alert||[P24]||“Level 1 Alert: Alert level messages indicate that respective one should be corrected immediately."|
|Fatal||[P02]||“fatal (i.e., abort a process after logging)"|
|[P12]||“Fatal: This level designates very severe error events that will presumably lead the application to abort."|
|[P14]||“fatal (abort a process after logging)"|
|[P23]||“For example, the log level fatal is used to indicate that a critical problem has occurred around the position of the log statement, where the developer tries to leave an appropriate log message as a clue to treat it later. "|
|The levels are distributed from top to bottom, from the least severe to the most severe.|
We used the PYPL index (https://pypl.github.io/PYPL.html), a ranking of programming languages, as a starting point for library selection, selecting languages with a “share value” greater than 1.0%. This yielded 16 languages. For each language, we queried Google Search with the term "logging library" concatenated with the name of the language and used the first result page for each query. We found 160 hits (blogs, forums, code repositories), and from them, we mapped 60 libraries. We inspected code repositories (when available), documentation, and library guidelines to apply our inclusion and exclusion criteria:
IC1: The library/language has a set of log severity levels;
EC1: The library does not create log statements with log severity levels;
EC2: The library is on GitHub and has fewer than 1000 stars.
After applying the above criteria, we obtained 37 libraries. We manually added Java Util Logging, PHP logging, and Syslog-ng to the set. Our final dataset included 40 libraries (Table 3) and 63 sources of source code, documentation, and library guidelines. Among the libraries, three appear in two versions: Log4J [L13], versions 1 and 2; Loguru, versions C++ [L27] and Python [L34]; and PHP Logging, versions Linux and Windows [L36].
Table 3 presents the logging libraries selected for the study and the 19 different severity levels found in them. The levels are distributed from left to right, from the least severe to the most severe. The table does not show the pseudo-levels (e.g., All, Off, Notset, Log4Net_Debug) and groups the variant nomenclatures for the same level (Info/Informational, and Warn/Warning) [L17][L40].
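The grouping applied when building the table can be sketched as a small normalization step: pseudo-levels are dropped and variant nomenclatures are mapped to a single canonical name. The sets below contain only the examples named above; the function name and the input list are hypothetical.

```python
# Pseudo-levels named above (lowercased); they are excluded from the mapping.
PSEUDO_LEVELS = {"all", "off", "notset", "log4net_debug"}

# Variant nomenclatures grouped under one canonical name.
ALIASES = {"informational": "info", "warning": "warn"}

def normalize_levels(names):
    """Drop pseudo-levels and map variant names to canonical ones, keeping order."""
    result = []
    for name in names:
        key = name.strip().lower()
        if key in PSEUDO_LEVELS:
            continue
        key = ALIASES.get(key, key)
        if key not in result:
            result.append(key)
    return result

# Hypothetical input resembling the level names exposed by a library.
print(normalize_levels(["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]))
```

Normalizing names in this way is what allows levels from different libraries to be placed in the same column.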
Distribution of levels by library. Of the selected libraries, 91% have between five and eight severity levels: 39.5% have six levels, 23.3% have five levels, 14% have eight levels, and 14% have seven levels. The libraries with the lowest number of levels, Google Glog [L01] and Golang Glog [L02], have the same four levels: Info, Warn, Error, and Fatal. The libraries with the most levels are Log4C [L39] and Log4Net [L40], with nine and 15 severity levels, respectively.
Finding #5: The lowest number of log severity levels in libraries is four, and the highest number is 15.
Representativeness of levels. When aggregating the data from Table 3, we see that six levels are present in more than 50% of the libraries, four of which are present in more than 90%: Info (100%), Warn (98%), Error (98%), Debug (93%), Trace (55%), and Fatal (52%).
Finding #6: Six levels are present in over 50% of libraries, among them four in over 90%: Info (100%), Warn (98%), Error (98%), Debug (93%), Trace (55%), and Fatal (52%).
Three libraries have one severity level that is unique to them: Fault in OSLogging [L03], Config in Java Util Logging [L29], and Success in Loguru [L34]. Six levels are only present in up to 10% of libraries: Verbose (10%), Emergency (10%), Finer (10%), Finest (7%), Fine (5%), and Severe (5%).
Finding #7: Config, Success, Fault, Severe, Fine, Finest, Finer, Emergency and Verbose have low occurrence in libraries compared to other severity levels, equal to or less than 10%.
The most severe level. The most severe level in 48% of the libraries is the Fatal level, followed by the Error (19%), Critical (14%), Emergency (10%), and Alert (5%) levels. The eight libraries in which the Error level is the most severe have five severity levels.
Number of severity levels over time. Fig. 3 presents the median number of levels of libraries by release year. The number of severity levels provided by logging libraries shows no linear trend over time and varies only slightly. The only outlier is the year 2004, which features the 15 levels of the Log4Net library. Despite this high number, its documentation states that it “categorizes logging into levels: DEBUG, INFO, WARN, ERROR and FATAL” [L40].
Finding #8: There is no trend towards a decrease or increase in the number of log severity levels in future libraries.
Numeric Values. In our dataset, 38 libraries (95%) use numeric values associated with levels to sort them according to their specific degrees of severity (Table 3), as indicated by the following quote:
“Levels have a numeric value that defines the relative ordering between levels.” [L40]
Table 3 shows that it is possible to organize the severity levels of almost all libraries equivalently when considering their numerical values, except for libraries [L17] and [L32]. [L17] presents the only variation in the numerical ordering between Alert and Critical. [L32] has a Debug level numbering different from all other libraries. Furthermore, 60% of libraries (25) have their levels sorted in ascending order and 31% (11) in descending order.
Finding #9: There is consistency in the numerical ordering of severity levels across logging libraries.
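The idea that ascending and descending numeric schemes can yield the same severity ordering can be sketched as follows. The ascending values are taken from Python's standard `logging` module and the descending values follow the syslog severity numbering (0 for Emergency through 7 for Debug); the dictionary keys use normalized names, and the helper `by_severity` is hypothetical.

```python
import logging

# Ascending scheme: larger value means more severe (Python's logging module).
python_levels = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warn": logging.WARNING,
    "error": logging.ERROR,
    "critical": logging.CRITICAL,
}

# Descending scheme: smaller value means more severe (syslog numbering).
syslog_levels = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warn": 4, "notice": 5, "info": 6, "debug": 7,
}

def by_severity(levels, larger_is_more_severe):
    """Return level names ordered from least severe to most severe."""
    return sorted(levels, key=levels.get, reverse=not larger_is_more_severe)

python_order = by_severity(python_levels, larger_is_more_severe=True)
syslog_order = by_severity(syslog_levels, larger_is_more_severe=False)

# Compare the relative order of the levels both schemes have in common.
shared_python = [name for name in python_order if name in syslog_levels]
shared_syslog = [name for name in syslog_order if name in python_levels]
print(shared_python == shared_syslog)
```

Once the direction of each scheme is taken into account, the levels shared by both schemes appear in the same relative order.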
“Two Levels with the same value are deemed to be equivalent.” [L40]
Some libraries assign the same numerical value to different severity levels: (i) the Critical and Fatal levels in the Python library [L17]; (ii) Info and Notice, and (iii) Error, Critical, and Alert, in the PHP library [L36] when running on the Windows operating system; (iv) Finest and Verbose, (v) Finer and Trace, and (vi) Debug and Fine in Log4Net [L40]. The fact that different severity levels have the same numerical value indicates redundancy of the log levels, which is well exemplified by the following quote:
“Why doesn’t the org.slf4j.Logger interface have methods for the FATAL level? The Marker interface (…) renders the FATAL level largely redundant. If a given error requires attention beyond that allocated for ordinary errors, simply mark the logging statement with a specially designated marker which can be named ‘FATAL’ or any other name to your liking.” [L06]
Finding #10: Three libraries have redundancy in the numeric values of their log severity levels.
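The first case of redundancy can be checked directly, assuming that [L17] corresponds to Python's standard `logging` module. The helper `redundant_groups` is hypothetical and simply groups level names that share a numeric value.

```python
import logging
from collections import defaultdict

def redundant_groups(levels):
    """Return the groups of level names that share the same numeric value."""
    groups = defaultdict(list)
    for name, value in levels.items():
        groups[value].append(name)
    return [names for names in groups.values() if len(names) > 1]

# Level constants exposed by the standard logging module.
python_levels = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL", "FATAL")
}

print(redundant_groups(python_levels))
print(logging.getLevelName(logging.FATAL))
```

In this module the two names refer to the same value, so a statement logged as Fatal is reported under the Critical name.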
The “Marker interface”, mentioned in [L06], is an option provided by libraries [L06] and [L13] to add more context to a log statement and avoid redundancy, which allows using only the necessary log levels.
|Fine*, Verb.||[L09]||“All of FINE, FINER, and FINEST are intended for relatively detailed tracing. The exact meaning of the three levels will vary between subsystems, but in general, FINEST should be used for the most voluminous detailed output, FINER for somewhat less detailed output, and FINE for the lowest volume (and most|
|[L12]||“These levels designate fine-grained informational events that are most useful to debug an application.”|
|Verb.||[L14]||Verbose is the noisiest level, rarely (if ever) enabled for a production app.|
|Trace||[L04]||“Designates very low priority, often extremely verbose, information”|
|[L05]||“The TRACE level designates informational events of very low importance.”|
|[L13]||“A fine-grained debug message, typically capturing the flow through the application.”|
|[L14]||“(…) more detailed information. Expect these to be written to logs only.”|
|[L15]||“Logging from external libraries used by your app or very detailed application logging.”|
|[L16]||“For trace debugging; begin method X, end method X.”|
|[L40]||“The Trace level designates fine-grained informational events that are most useful to debug an application.”|
|[L13]||“Logs that contain the most detailed messages. (…) may contain sensitive application data. (…) should never be enabled in a production environment”|
|Debug||[L04]||“Designates lower priority information.”|
|[L05]||“The DEBUG level designates informational events of lower importance.”|
|[L13]||“A general debugging event.”|
|[L14]||“Detailed information on the flow through the system. Expect these to be written to logs only.”|
|[L15]||“Anything else, i. e. too verbose to be included in ‘info’ level.”|
|[L16]||“For debugging; executed query, user authenticated, session expired.”|
|[L35]||“The message is only for debugging purposes.”|
|[L40]||“The Debug level designates fine-grained informational events that are most useful to debug an application.”|
|[L13]||“Logs that are used for interactive investigation during development. (…) should primarily contain information useful for debugging and have no long-term value”|
|[L14]||“Debug is used for internal system events that are not necessarily observable from the outside, but useful when determining how something happened”|
|Info||[L04]||“Designates useful information.”|
|[L05]||“The INFO level designates informational messages highlighting overall progress of the application.”|
|[L13]||“An event for informational purposes.”|
|[L14]||“Interesting runtime events (startup/shutdown). Expect these to be immediately visible on a console, so be conservative, and keep to a minimum.”|
|[L15]||“Detail on regular operation.”|
|[L16]||“Normal behavior like mail sent, user updated profile etc.”|
|[L29]||“INFO is a message level for informational messages. Typically INFO messages will be written to the console or its equivalent. So the INFO level should only be used for reasonably significant messages that will make sense to end users and system administrators."|
|[L35]||“The message is purely informational.”|
|[L40]||“The Info level designates informational messages that highlight the progress of the application at coarse-grained level.”|
|[L13]||“Logs that track the general flow of the application. These logs should have long-term value.”|
|[L14]||“(…) things happening in the system that correspond to its responsibilities and functions. (…) observable actions the system can perform.”|
|Notice||[L03]||“Captures information that is essential for troubleshooting problems. For example, capture information that might result in a failure.”|
|[L35]||“The message describes a normal but important event.”|
|[L36]||“normal, but significant, condition.”|
|[L40]||“The Notice level designates informational messages that highlight the progress of the application at the highest level.”|
|Warn||[L04]||“Designates hazardous situations.”|
|[L05]||“The WARN level designates potentially harmful situations.”|
|[L13]||“An event that might possible lead to an error.”|
|[L14]||“Use of deprecated APIs, poor use of API, ‘almost’ errors, other runtime situations that are undesirable or unexpected, but not necessarily ‘wrong’. Expect these to be immediately visible on a status console.”|
|[L15]||“A note on something that should probably be looked at by an operator eventually.”|
|[L16]||“Something unexpected; application will continue.”|
|[L29]||“WARNING is a message level indicating a potential problem. In general WARNING messages should describe events that will be of interest to end users or system managers, or which indicate potential problems.”|
|[L35]||“Warning conditions / The message is warning.”|
|[L40]||“The Warn level designates potentially harmful situations.”|
|[L13]||“Logs that highlight an abnormal or unexpected event in the application flow, but do not otherwise cause the application execution to stop.”|
|Error||[L04]||“Designates very serious errors.”|
|[L05]||“The ERROR level designates error events which may or not be fatal to the application.”|
|[L13]||“An error in the application, possibly recoverable.”|
|[L14]||“Other runtime errors or unexpected conditions. Expect these to be immediately visible on a status console.”|
|[L15]||“Fatal for a particular request, but the service/app continues servicing other requests. An operator should look at this soon(ish).”|
|[L16]||“Something failed; application may or may not continue.”|
|[L35]||“The message describes an error.”|
|[L40]||“The Error level designates error events that might still allow the application to continue running.”|
|[L13]||“(…) highlight when the current flow of execution is stopped due to a failure. These should indicate a failure in the current activity, not an application-wide failure”|
|Severe||[L40]||“The Severe level designates very severe error events.”|
|Critical||[L17]||“Houston, we have a %s", "major disaster".”|
|[L35]||“The message states a critical condition.”|
|[L40]||“The Critical level designates very severe error events. Critical condition, critical.”|
|[L13]||“Logs that describe an unrecoverable application or system crash, or a catastrophic failure that requires immediate attention”|
|Alert||||“Action must be taken immediately ”|
|[L36]||“action must be taken immediately.”|
|[L40]||“The Alert level designates very severe error events. Take immediate action, alerts.”|
|Fatal||[L13]||“A severe error that will prevent the application from continuing.”|
|[L14]||“Severe errors that cause premature termination. Expect these to be immediately visible on a status console.”|
|[L15]||“The service/app is going to stop or become unusable now. An operator should definitely look into this soon.”|
|[L16]||“Something bad happened; application is going down.”|
|[L40]||“The Fatal level designates very severe error events that will presumably lead the application to abort.”|
|Emerg.||[L35]||“The message says the system is unusable.”|
|[L36]||“system is unusable.”|
|[L40]||“The Emergency level designates very severe error events. System unusable, emergencies.”|
Late Trace. The Trace level is present in 55% of the selected libraries. However, the release notes of these libraries show that, in at least three of them, the Trace level was not present in the first versions: it was added to Log4J [L13] in 2005, SLF4J [L06] in 2007, and JS-Logger [L09] in 2018. According to the SLF4J FAQ page, the Trace level was used in several projects
“to disable logging output from certain classes without needing to configure logging for those classes. (…) Thus, in many of cases the TRACE level carried the same similar semantics meaning as DEBUG.” [L06]
Finding #11. There may be semantic similarity in using the Trace and Debug levels.
Definitions. Table 4 presents the definitions and descriptions found for the log severity levels in the libraries.
According to the libraries, the Debug level describes detailed [L14][L15][L40] and low-priority or low-importance [L04][L05] information that supports debugging activities [L13][L16][L36][L40]. [L15] describes Debug as “too verbose to be included in ‘info’ level,” suggesting that the two levels share a similar purpose. Only one of the libraries uses the word “problem” to describe this level [L17].
The Trace level is described with the same characteristics as Debug but with an even lower priority: “very low priority” [L04], “very low importance” [L05].
The Info level designates normal behaviors [L16] and regular operations [L15], and its messages “highlight (overall) the progress of the application” [L05] [L40] “at coarse-grained level” [L40]. [L14] advises keeping them to a minimum, as messages at this level will generate data immediately. [L29] highlights that they have value for end users and system administrators. [L17] also uses the word “problem” to refer to this level.
The Warn level is described as a “hazardous situation” [L04], highlighting a potential problem [L05][L17][L29][L40] that could lead to an error [L13]. It is also described as “almost errors” [L14], given that the application is still running despite the unexpected event [L14][L16]. Operators, end users, and system managers are likely to be interested in messages at this level [L15][L29].
The Notice level resembles Info level in three definitions: it describes normal events [L35] [L36] and highlights the application’s progress [L40]. However, according to [L03], it can describe potential failures, likening the Notice level to the Warn level.
The expressions used to describe the Error level are “major problem” [L17], “very serious error” [L04], and “unexpected conditions” [L14]. In addition, some descriptions state that the registered event may or may not interrupt the application’s operation [L05] [L16] [L40]. Even if it does not stop the application as a whole, it can impede the progress of a particular request [L06]. When logs at this level occur, an operator should look into them as soon as possible [L15].
In the library definitions, five levels relate to very severe error events: Severe [L40], Critical [L40], Alert [L40], Fatal [L13] [L14] [L40], and Emergency. Critical level definitions suggest that a “disaster” has occurred [L08]. Beyond the Critical level, the Alert level requires immediate action [L35] [L36] [L40]. For the Fatal level, we find more descriptive phrases for the event: it “prevent the application from continuing” [L13], it “cause premature termination” [L14], the application “is going down/to stop” [L15] [L16], “lead to abort” [L40] or “become unusable” [L15].
Finding #12. Severe, Critical, Alert, Fatal, and Emergency have similar descriptions regarding severe error events in the libraries.
Descriptions for these levels appear in three libraries. In [L9], they are described similarly, without precise terms to distinguish their differences, such as the “most voluminous detailed”, “somewhat less detailed”, and “lowest volume” output. For [L12], these levels are useful to debug an application.
For [L32], Basic is an alias for the Debug level. In [L29], Config level describes messages “intended to provide a variety of static configuration information, to assist in debugging problems.” [L34] is the only library to offer the Success severity level, but it does not provide a definition. Numerically, it lies between the Info and Warn levels. For [L03], Fault level “captures information about faults and bugs in your code”; the library, as far as we know, does not provide numerical values for the levels.
Finding #13. From the libraries, we found 19 severity levels. Despite the diversity of nomenclature, the concepts of levels are consistent across the various libraries, suggesting a convergence towards concepts of greater granularity.
In Garousi et al.’s guidelines IST/GAROUSI2019/guidelines, the importance of contextual information in a study suggests the inclusion of grey literature. Therefore, we adopted an automated search on Stack Overflow, “a major forum where practitioners post questions and discuss technical issues” ICEASE/GAROUSI2016/need-for-multivocal, as the search strategy to capture the practitioners’ view. Our search query was: log levels, using the filter is:question. We found 742 hits (as of this writing: Jun. 2021). Next, we applied the following inclusion (IC) and exclusion (EC) criteria:
IC1: The question/answer explains when to use at least five of the logging levels.
IC2: The question/answer must have at least two votes.
EC1: The question/answer is not original (copied from another source such as log libraries or RFCs).
EC2: The question/answer consists of exemplifying messages characteristic of log levels.
EC3: Questions not approved and closed by StackOverflow.
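The criteria above can be read as a simple predicate over candidate posts. The following is a minimal sketch of that reading, not the procedure actually used in the study: the `Post` record and all of its field names are hypothetical, and in practice several of these judgments (originality, "explains when to use") require manual review.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record describing one candidate question/answer."""
    levels_explained: int        # number of log levels whose use is explained
    votes: int                   # vote count of the question/answer
    is_original: bool            # False if copied from library docs, RFCs, etc.
    only_example_messages: bool  # True if it merely lists sample messages
    closed: bool                 # True if the question was not approved/closed


def is_selected(post: Post) -> bool:
    """Return True when both inclusion criteria hold and no exclusion applies."""
    included = post.levels_explained >= 5 and post.votes >= 2      # IC1, IC2
    excluded = (
        not post.is_original          # EC1
        or post.only_example_messages  # EC2
        or post.closed                 # EC3
    )
    return included and not excluded


# Hypothetical candidates, for illustration only.
kept = Post(levels_explained=6, votes=12, is_original=True,
            only_example_messages=False, closed=False)
dropped = Post(levels_explained=6, votes=12, is_original=False,
               only_example_messages=False, closed=False)
```

Encoding the criteria this way makes the conjunction explicit: a post must satisfy every inclusion criterion and trigger none of the exclusion criteria.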
After applying the criteria, we obtained 4 questions with 9 relevant answers (Table 5).
|[QSO1]||When to use the different log||https://bit.ly/2SQhCE8||[ASO1][ASO2]|
|[QSO2]||Logging levels - Logback rule-of-thumb to assign log levels||https://bit.ly/3hNei5d||[ASO4][ASO5][ASO6][ASO7]|
|[QSO3]||Difference between logger.info||https://bit.ly/3hgKKhg||[ASO8]|
|[QSO4]||How to use log levels in Java||https://bit.ly/3wa1UBn||[ASO9]|
In the selected answers from StackOverflow, six levels of log severity are described, among which the most discussed are Debug (9), Error (9), Warn (8), and Info (8); the other two levels are Trace (5) and Fatal (3).
Finding #14. The severity levels discussed in the selected responses corroborate the most cited and defined levels in the peer-reviewed literature and logging libraries, respectively.
Definitions. Table 6 presents the definitions and descriptions found for the log severity levels in the selected answers.
|Trace||[ASO1]||“Only when I would be "tracing" the code and trying to find one part of a function specifically.”|
|[ASO2]||“Trace is by far the most commonly used severity and should provide context to understand the steps leading up to errors and warnings. (…)”|
|[ASO3]||“The TRACE messages are intended for developers when they don’t need to log state variables.”|
|[ASO4]||“We don’t use this often, (…) extremely detailed and potentially high volume logs that you don’t typically want enabled even during normal development. (…)”|
|[ASO5]||“Trace is something i have never actually used”|
|Debug||[ASO1]||“Information that is diagnostically helpful to people more than just developers (IT, sysadmins, etc.).”|
|[ASO2]||“We consider Debug Trace. (…) we discourage use of Debug messages (…) this makes log files almost useless (…)”|
|[ASO3]||“The DEBUG messages are intended for developers when they need to log state variables.”|
|[ASO4]||“(…) any message that is helpful in tracking the flow through the system and isolating issues, especially during the development and QA phases. (…)”|
|[ASO5]||“Debug means that something normal and insignificant happened; (…)”|
|[ASO6]||“Shouldn’t be used at all (and certainly not in production) (…)”|
|[ASO7]||“variable contents relevant to be watched permanently”|
|[ASO8]||“If you want to print the value of a variable at any given point, you might call Logger.debug”|
|[ASO9]||“As the name says, debug messages that we only rarely turn on. (…)”|
|Info||[ASO1]||“Generally useful information to log (service start/stop, configuration assumptions, etc). (…) I want to always have available but usually don’t care about under normal circumstances. This is my out-of-the-box config level.”|
|[ASO2]||“This is important information that should be logged under normal conditions such as successful initialization, services starting and stopping or successful completion of significant transactions. (…)”|
|[ASO3]||“The INFO messages are intended for system operators and describe expected states”|
|[ASO4]||“Things we want to see at high volume in case we need to forensically analyze an issue. System lifecycle events (system start, stop) go here. (…) Typical business exceptions can go here (…)”|
|[ASO5]||“Info means that something normal but significant happened; the system started, the system stopped, (…)”|
|[ASO6]||“Anything else that we want to get to an operator.(…) log message per significant operation (…).”|
|[ASO7]||“used in functions/methods first line, to show a procedure that has been called or a step gone ok, (…)”|
|[ASO9]||“Anything that we want to know when looking at the log files, e.g. when a scheduled job started/ended (…)”|
|Warn||[ASO1]||“Anything that can potentially cause application oddities, but for which I am automatically recovering. (…)”|
|[ASO2]||“This MIGHT be problem, or might not. (…) Viewing a log filtered to show only warnings and errors may give quick insight into early hints at the root cause of a subsequent error. Warnings should be used sparingly so that they don’t become meaningless. (…)”|
|[ASO3]||“The WARN messages are intended for system operators when the process can continue in an unwanted state”|
|[ASO4]||“An unexpected technical or business event happened, customers may be affected, but probably no immediate human intervention is required. (…) Basically any issue that needs to be tracked but may not require immediate intervention.”|
|[ASO5]||“Warn means that something unexpected happened, but that execution can continue, perhaps in a degraded mode;(…) Something is not right, but it hasn’t gone properly wrong yet - warnings are often a sign that there will be an error very soon.”|
|[ASO6]||“This component has had a failure believed to be caused by a dependent component (…). Get the maintainers of THAT component out of bed.”|
|[ASO7]||“not-breaking issues, but stuff to pay attention for. Like a requested page not found”|
|[ASO9]||“Any message that might warn us of potential problems, (…)”|
|Error||[ASO1]||“Any error which is fatal to the operation, but not the service or application (…) These errors will force user (administrator, or direct user) intervention. (…)”|
|[ASO2]||“Definitely a problem that should be investigated. SysAdmin should be notified automatically, but doesn’t need to be dragged out of bed. (…)”|
|[ASO3]||“The ERROR messages are intended for system operators when, despite the process cannot continue in an unwanted state, the application can continue.”|
|[ASO4]||“The system is in distress, customers are probably being affected (or will soon be) and the fix probably requires human intervention. The "2AM rule" applies here-if you’re on call, do you want to be woken up at 2AM if this condition happens? If yes, then log it as ‘error’”|
|[ASO5]||“Error means that the execution of some task could not be completed; (…) Something has definitively gone wrong.”|
|[ASO6]||“This component has had a failure and the cause is believed to be internal (…). Get me (maintainer of this component) out of bed.”|
|[ASO7]||“critical logical errors on application, like a database connection timeout. Things that call for a bug-fix in near future”|
|[ASO8]||“When responding to an Exception, you might call Logger.error”|
|[ASO9]||“Any error/exception that is or might be critical. Our Logger automatically sends an email for each such message on our servers”|
|Fatal||[ASO1]||“Any error that is forcing a shutdown of the service or application to prevent data loss (or further data loss).”|
|[ASO2]||“Overall application or system failure that should be investigated immediately.(…) wake up the SysAdmin. (…) this severity should be used very infrequently(…)”|
|[ASO3]||“The FATAL messages are intended for system operators when the application cannot continue in an unwanted state.”|
In the selected answers from StackOverflow, the similarity between Trace and Debug found in the first two sources of our mapping is not as clear-cut. Part of the answers prefers Debug over Trace [ASO1][ASO4][ASO5], and another part prefers the opposite [ASO2][ASO6][ASO9]. For [ASO3], both levels are intended for developers, Debug being the one that records variable values. For [ASO1], the Trace level is used to find a specific piece of code, while the Debug level is classified as “helpful to people more than just developers.”
Among the levels discussed in the selected answers, the Info level shows the greatest convergence in its definitions. All answers describe it as a record of operations that start and/or end, describing “normal but significant situations” of the system [ASO1][ASO2][ASO5][ASO6], that is, “expected situations” [ASO3]. [ASO4] points out that this level also describes typical business exceptions, and according to [ASO6], operators are the audience of Info messages.
As in the libraries, the selected answers describe Warn messages as potential problems or unexpected events [ASO1][ASO2][ASO4][ASO5][ASO9] that can cause complications for the system [ASO1][ASO5] and therefore need to be observed. Despite these events, the system keeps running [ASO1][ASO3][ASO5], without the need for immediate human intervention [ASO4]. For [ASO3], operators are the audience interested in this severity level.
There is also divergence regarding the Error level. For some of the responses [ASO1][ASO2][ASO3][ASO6], the Error level indicates a failure that did not stop the system execution but should be investigated by the system operators [ASO2][ASO3]. However, for another two responses, the degree of severity is more critical, and the “interested person” should “get out of bed” [ASO4][ASO5]. This is the same degree of severity that [ASO1][ASO2][ASO3] attribute to the Fatal level: errors that force the application to “shut down” and require immediate action.
Analyzing our three sources, we observe redundancy in the numerical values of the levels, semantic similarity in their definitions, and low occurrence of some levels in the libraries. To reduce similar or redundant levels, we abstracted the 19 severity levels into six levels, which constitute the state of practice of logging. Below, we explain the steps of our synthesis, as shown in Fig. 4.
Finest, Verbose, Finer, Trace, Debug, Basic, Fine, Config. Fine and Verbose levels are present in six libraries. [L29] and [L32] provide the Fine level; [L10], [L12], and [L19] provide the Verbose level; and [L40] provides both levels. Besides the semantic similarity between their definitions and descriptions in the six libraries, in [L40] they (Fine, Verbose) have the same numerical value. These facts suggest that they can be merged into the same level. In Fig. 4, we chose to merge into the level with the highest numerical value. We performed the same process for the Finer and Trace levels, as well as for the Debug and Fine levels. The Basic level, present only in [L32], is equated with the Debug severity level in its documentation and is therefore merged into Debug. The resulting Verbose and Trace levels still have definitions with substantial semantic similarity, so we abstracted Verbose into Trace. The low-occurrence Config level, which provides information for debugging [L29], was merged into the Debug level.
Info, Success. We observed that the Success level has a numerical value between the Info and Warn levels in [L34]. Besides, the Success level has low occurrence and does not have a definition. We merged it into Info, since a success event runs counter to Warn’s purpose.
Notice, Warn. In the libraries, the Notice level has low occurrence. Also, it resembles the Info level and, at the same time, the Warn level. Considering the combination of its definition found in the literature and the level nomenclature, we merged the Notice level into the Warn level.
Error, Fault. The Fault level is another level of low occurrence. For [L03], Fault level describes bugs in running software systems, and furthermore, it ranks before the Critical severity level. Consequently, we merged Fault to Error.
Fatal, Alert, Critical, Emergency, Severe. The Fatal, Critical, Alert, and Emergency levels are the most severe levels in 81% of the selected libraries, and their definitions have a prominent semantic similarity. In our abstraction, the Fatal level is the level that describes failure situations, and it is the most significant among the levels analyzed (Fatal, Critical, Alert, Severe, and Emergency). The Fatal and Critical levels have the same value in [L17]. Finally, the Severe level has low occurrence, but its definition is semantically similar to that of the Fatal level. These facts led us to merge Fatal, Critical, Alert, Severe, and Emergency into the Fatal level.
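The merge steps described above can be summarized as a lookup table from the 19 observed level names to the six abstracted ones. This is only an illustrative sketch: the names `MERGE_TARGET` and `normalize_level` are hypothetical, and because the text is ambiguous about how Fine and Finest relate to Verbose, placing Finest under Trace and Fine under Debug is an assumption.

```python
# The six abstracted levels, from least to most severe.
CANONICAL_LEVELS = ("Trace", "Debug", "Info", "Warn", "Error", "Fatal")

# Hypothetical lookup table: observed level name -> abstracted level.
MERGE_TARGET = {
    "Finest": "Trace",    # assumption: the text does not state this explicitly
    "Verbose": "Trace",   # Verbose was abstracted into Trace
    "Finer": "Trace",
    "Trace": "Trace",
    "Fine": "Debug",      # assumption: follows the "Debug and Fine" merge
    "Basic": "Debug",     # alias for Debug in one library
    "Config": "Debug",    # configuration information that assists debugging
    "Debug": "Debug",
    "Info": "Info",
    "Success": "Info",
    "Notice": "Warn",
    "Warn": "Warn",
    "Error": "Error",
    "Fault": "Error",
    "Fatal": "Fatal",
    "Critical": "Fatal",
    "Alert": "Fatal",
    "Emergency": "Fatal",
    "Severe": "Fatal",
}


def normalize_level(name: str) -> str:
    """Map an observed level name (any letter case) to its abstracted level."""
    key = name.strip().capitalize()
    try:
        return MERGE_TARGET[key]
    except KeyError:
        raise ValueError(f"unknown severity level: {name!r}") from None
```

For example, `normalize_level("NOTICE")` yields `"Warn"`, and a name outside the 19 mapped levels raises `ValueError` rather than being silently guessed.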
The synthesis process resulted in six state-of-practice log severity levels: Trace, Debug, Info, Warn, Error, and Fatal. Considering the results obtained from the three sources, we synthesized combined definitions for the six abstracted levels as follows.
Info severity level describes normal events, which inform the expected progress and state of a software system.
Warn severity level describes potentially dangerous situations caused by unexpected events and states. For this reason, they must be observed, even if they do not interrupt the execution of a software system.
Error severity level describes the occurrence of unexpected behavior of a software system. For this reason, they must be investigated, even if they do not interrupt the execution of a software system.
Fatal severity level describes critical events that bring a software system to failure.
Debug severity level describes variable states and details about interesting events and decision points in the execution flow of a software system, which helps developers to investigate internal system events.
Trace severity level broadly tracks variable states and events in a software system.
We observed a convergence of log severity levels across the three sources after the processes of abstraction and synthesis, and we notice four main purposes (or meta-levels) for log severity levels:
it describes levels used to record the expected behavior of a software system.
it describes levels used to warn unexpected behavior of a software system.
it describes levels used to record failures of a software system.
it describes levels used to log variable states and events internal to the behavior of a software system.
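One possible reading of these four purposes is as a grouping of the six abstracted levels. The sketch below is an interpretation, not a mapping stated in the text: the `Purpose` member names are hypothetical labels, and the assignment of each level to a purpose is inferred from the synthesized definitions.

```python
from enum import Enum


class Purpose(Enum):
    """Hypothetical labels for the four purposes listed above."""
    EXPECTED_BEHAVIOR = "record expected behavior"
    UNEXPECTED_BEHAVIOR = "warn about unexpected behavior"
    FAILURE = "record failures"
    INTERNAL_STATE = "log variable states and internal events"


# Assumed assignment, inferred from the synthesized definitions.
PURPOSE_OF = {
    "Info": Purpose.EXPECTED_BEHAVIOR,
    "Warn": Purpose.UNEXPECTED_BEHAVIOR,
    "Error": Purpose.FAILURE,
    "Fatal": Purpose.FAILURE,
    "Debug": Purpose.INTERNAL_STATE,
    "Trace": Purpose.INTERNAL_STATE,
}


def levels_for(purpose: Purpose) -> list[str]:
    """Return the levels grouped under a purpose, in table order."""
    return [level for level, p in PURPOSE_OF.items() if p is purpose]
```

Under this assumed grouping, `levels_for(Purpose.FAILURE)` returns the Error and Fatal levels.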
68% of log severity levels can be considered “prosaic” specializations of the most representative 32% of severity levels in the peer-reviewed literature, logging libraries, and practitioners’ point of view. Excessive specialization of levels can make it challenging to choose an appropriate one and can affect the amount of data generated and its reliability. We recommend keeping a standard nomenclature for the levels: Trace, Debug, Info, Warn, Error, and Fatal. Moreover, developers should limit the number of severity levels, so we recommend using only those six. We also recommend creating a policy for using severity levels, with practical examples to guide the choice of severity levels effectively.
We observed a lack of precision in the definitions of log severity levels. For example, some severity levels in library definitions have no distinct purpose and are distinguished only by adjectives and superlatives. This lack of precision can cause severity levels to be misunderstood and can hinder logging practices. Thus, we suggest that logging library creators provide precise and unambiguous definitions that take the purposes of the log levels into account.
The vast majority of logging libraries associate a numeric value with each severity level. In the absence of precise definitions, these values clarify the creators’ intent regarding the degree of severity of each level. We suggest that logging library creators continue to provide these values. We also suggest that these values be ascending, following the order of the six state-of-practice severity levels from the least severe to the most severe: Trace, Debug, Info, Warn, Error, and Fatal.
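As a minimal sketch of what ascending numeric values make possible, the following assigns an increasing value to each of the six levels and filters by threshold. The concrete numbers and the names `Severity`, `is_enabled`, and `emitted` are illustrative assumptions and are not taken from any particular library.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Six levels with ascending, purely illustrative numeric values."""
    TRACE = 10
    DEBUG = 20
    INFO = 30
    WARN = 40
    ERROR = 50
    FATAL = 60


def is_enabled(message_level: Severity, threshold: Severity) -> bool:
    """A message is emitted when its level is at or above the threshold."""
    return message_level >= threshold


def emitted(threshold: Severity) -> list[str]:
    """List the level names that pass a given threshold."""
    return [level.name for level in Severity if is_enabled(level, threshold)]
```

With the threshold set to `Severity.WARN`, `emitted` returns `["WARN", "ERROR", "FATAL"]`. A single integer comparison is enough to decide whether a statement is output, which is the practical benefit of keeping the values ordered.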
Using a limited set of levels can lead to difficulties when there is a need to identify specific log data. We recommend that developers explore the library features like the “Marker interface”, which can add semantics to the log levels used in practice. We also recommend studying the logging library settings to avoid solving with severity levels what could be solved with configuration practices.
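The idea of attaching extra semantics to a record instead of inventing a new level can be sketched with Python's standard `logging` module. This is only an analogy to a marker feature: Python's `logging` has no Marker type, so the `marker` attribute, the `MarkerFilter` class, and the `ListHandler` class are hypothetical helpers built on the standard `extra` argument and `logging.Filter`.

```python
import logging


class MarkerFilter(logging.Filter):
    """Pass only records that carry the given (hypothetical) marker attribute."""

    def __init__(self, marker: str):
        super().__init__()
        self.marker = marker

    def filter(self, record: logging.LogRecord) -> bool:
        return getattr(record, "marker", None) == self.marker


class ListHandler(logging.Handler):
    """Collect formatted messages in memory so the example is self-contained."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("marker_demo")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger

audit_handler = ListHandler()
audit_handler.addFilter(MarkerFilter("AUDIT"))
logger.addHandler(audit_handler)

# Both records use the Info level; only the marker distinguishes them.
logger.info("user alice logged in", extra={"marker": "AUDIT"})
logger.info("cache warmed up")
```

Here the audit handler receives only the first message, although both are logged at the same severity level, so the set of levels stays small while specific log data remains identifiable.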
The logging community recognizes that choosing the log severity level can be challenging and can affect systems in development and production. Therefore, in recent years, log severity levels have been investigated more broadly. Despite this growing interest, researchers have not explored what severity levels are and what they should be in our logging practices. From a Software Engineering point of view, it is necessary to discuss the purposes of log severity levels and how many severity levels are needed to generate reliable log data.
In this study, we do not validate the definitions and purposes against actual logging entries. Empirical studies are needed to observe how well these definitions and propositions adhere to logging practices in real software systems, in addition to controlled experiments to assess the effectiveness of these definitions.
The fact that our work does not cover an exhaustive set of libraries is a factor that can reduce the validity of the results. However, we aimed to obtain a representative set of them. Further, we apply well-defined and validated inclusion and exclusion criteria, selecting only libraries to increase the validity of our results.
The choice of log severity level can be challenging and cause problems in producing reliable logging data. In this study, we present a state-of-the-art and state-of-the-practice mapping of log severity levels. We extracted data from three sources: peer-reviewed literature, logging libraries, and practitioners’ views, through a Q&A website. Our study systematically mapped the selected sources and empirically analyzed the definitions, descriptions, and documentation of log severity levels.
To summarize, we analyzed 19 severity levels from 27 studies and 40 logging libraries. Our results showed that there is redundancy and semantic similarity between the levels. However, they also showed a tendency for the severity levels to converge to a total of six levels. Besides, the ordering of the levels is consistent across the different libraries, and the levels are permeated with specific purposes. Our main contributions are: (i) mapping of the peer-reviewed literature of studies dealing with logging severity levels; (ii) mapping of the severity levels in the logging libraries; (iii) a set of synthesized definitions for log severity levels and the suggestion of four general purposes for severity levels. The results of our study (mapping, definitions, and purposes) provide evidence to create guidelines for choosing log severity levels that increase data reliability. Logging library creators can also use our study to improve their conception processes. Finally, we present recommendations about log severity levels.
In future work, we plan to expand this systematic multivocal mapping by adding more sources of grey literature, such as other Q&A websites and technical blogs that discuss log severity levels. Furthermore, we aim to leverage the results of this mapping to answer “what is an appropriate (reliable?) severity level”, identifying a conceptual framework that supports developers’ and system operators’ logging practices, in addition to metrics and approaches. Finally, we plan to organize a catalogue of log entry patterns, presenting metadata for each severity level, such as intent and practical examples.