Under international financial reporting standards, an operating segment is a component of a business entity that has discrete financial information available and whose results are regularly evaluated by the entity’s management for the purpose of performance assessment (https://www.accountingtools.com/articles/2017/5/13/operating-segment). A company’s operating segments can be its products, services, business divisions, geographic locations, or assets (such as mines, reserves, wells, and oilfields). Note that not all assets qualify as operating segments: an asset is considered an operating segment only if it functions as an active source of revenue.
Credit analysts consistently monitor the performance of a business within each of its operating segments in order to identify major areas of risk or growth. For instance, if a certain product is the main driver of profit for a company, then a fall in net sales of that product may pose a financial risk to the company. Monitoring the performance of operating segments requires reading through lengthy financial reports and manually extracting each segment and its corresponding performance metric from tables.
In this paper, we introduce SPot (or S&P Operating segmenT extractor), a tool that ingests financial reports from public U.S. companies in real-time, processes each table, and identifies each row header or column header that is likely to express an operating segment. The corresponding rows/columns are then extracted, aggregated, and displayed to the end-user on an interactive UI that allows them to study, trace, and adjust the performance indicators associated with each operating segment.
At the most basic level, SPot’s main task reduces to a binary classification problem at the table-header level. Concretely, given a header, the system must determine whether it is likely to be associated with an operating segment or with a non-operating item, such as a financial metric, the name of a board member, an office location, or a debt schedule. Several challenges complicate this task:
Operating segments are company-specific, so a taxonomy-driven approach would not scale to unseen companies.
Named-entity recognition cannot be used because certain types of operating segments (such as business divisions or types of service) are not always named entities. On the other hand, actual named entities (such as the names of executive leaders) are often not operating segments.
Positional cues and co-occurrence metrics fall short because, even though operating segments tend to be expressed in the same tables, they are often co-located with non-operating items. For instance, an Income Statement table might begin with operating segments and then move on to standard financials such as Total Revenue and R&D Expense.
Table 4 lists a few examples illustrating the above challenges. To address them, we use a multi-stage process that first filters tables down to those likely to include operating segments. Each row is then classified using a sequence model that applies selective masking to reduce overfitting to company-specific vocabulary.
2. System Design
Figure 2 shows the data flow and components of SPot, which the following subsections describe in more detail. (Figures 1, 2, and 3 show fabricated examples and do not reflect any company’s actual financial reports.)
2.1. Ingestion and classification of documents
SPot ingests earnings reports filed by public U.S. companies with the SEC (https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent). As 8-K filings are posted, they are ingested into the system through a proprietary sourcing service that uses the SEC’s RSS service. Unlike 10-Q and 10-K filings, 8-K filings are not limited to earnings reports; they can cover any material event that a company releases to the public. A taxonomy-based classifier identifies the 8-K filings that include earnings reports (Nourbakhsh et al., 2020).
2.2. Normalization of numeric tables
The reports are then processed through a normalization pipeline that performs the following steps: (1) Periods are identified and mapped to the company’s fiscal calendar. For instance, “Three Months Ended March 30, 2020” may be normalized to “Q1 2020”. (2) Financial numbers are identified and normalized to the scale expressed inside or in the vicinity of the table. For instance, “$USD 14MM” may be normalized to “14,000,000.00 (USD)”. (3) All other numbers are normalized to their raw form. For instance, “30 percent” is normalized to “30%”. A more detailed description of the ingestion and normalization pipeline is available in Nourbakhsh et al. (2020).
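As a rough illustration, the period and amount normalizations above might look like the following sketch. The function names, regular expressions, and scale table are hypothetical simplifications, not the production pipeline (which, for instance, resolves the scale from context around the table).

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_TO_QUARTER = {3: "Q1", 6: "Q2", 9: "Q3", 12: "Q4"}

def normalize_period(text):
    """Map e.g. 'Three Months Ended March 30, 2020' to 'Q1 2020',
    assuming a calendar-aligned fiscal year."""
    m = re.search(r"Three Months Ended\s+([A-Z][a-z]+)\s+\d{1,2},\s*(\d{4})", text)
    if not m:
        return text
    month = MONTHS.index(m.group(1)) + 1
    # Round up to the nearest quarter-ending month: a period ending
    # March 30 still belongs to Q1.
    quarter = MONTH_TO_QUARTER[month + (-month) % 3]
    return f"{quarter} {m.group(2)}"

# 'MM' must precede 'M' so the longer suffix wins in the alternation.
SCALE = {"MM": 1_000_000, "M": 1_000_000, "B": 1_000_000_000, "K": 1_000}

def normalize_amount(text):
    """Expand a currency amount such as '$USD 14MM' to '14,000,000.00 (USD)'."""
    m = re.search(r"\$?(?:USD\s*)?([\d.]+)\s*(MM|M|B|K)?", text)
    if not m:
        return text
    value = float(m.group(1)) * SCALE.get(m.group(2) or "", 1)
    return f"{value:,.2f} (USD)"
```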
2.3. Identification of tabular structure
It is important for the system to understand the structure of each table, including the distinction between the body of the table and row/column headers. This is done using a rules-based method that finds the largest rectangle in a table that includes numeric information. That rectangle is treated as the body of the table, and the cells that fall outside of the rectangle are treated as row headers or column headers (see Figure 1 for an example).
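A simplified version of this rectangle search can be sketched as follows, assuming each table arrives as a list of rows of cell strings. The helper names are illustrative, and the production rules are more involved (e.g. tolerating footnote markers inside the body).

```python
def is_numeric(cell):
    """Loose test for numeric table content ($, commas, %, parentheses allowed)."""
    stripped = (cell.replace("$", "").replace(",", "").replace("%", "")
                    .replace("(", "-").replace(")", "").strip())
    try:
        float(stripped)
        return True
    except ValueError:
        return False

def split_headers(table):
    """Treat the largest lower-right block of numeric (or empty) cells as the
    table body; cells above it become column headers and cells to its left
    become row headers. A simplified stand-in for the rules-based method."""
    n_rows, n_cols = len(table), len(table[0])

    def numeric_block(r0, c0):
        return all(is_numeric(cell) or cell == ""
                   for row in table[r0:] for cell in row[c0:])

    best = None  # (area, r0, c0) of the largest qualifying rectangle
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if numeric_block(r0, c0):
                area = (n_rows - r0) * (n_cols - c0)
                if best is None or area > best[0]:
                    best = (area, r0, c0)
    if best is None:
        return None
    _, r0, c0 = best
    return {"col_headers": table[:r0],
            "row_headers": [row[:c0] for row in table[r0:]],
            "body": [row[c0:] for row in table[r0:]]}
```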
Indentation and spatial information are often used to indicate hierarchy in tables (see Figure 1 for an example). We use the headless Selenium WebDriver (https://www.selenium.dev/selenium/docs/api/py/index.html) with PhantomJS integration (https://phantomjs.org/) to render each table in a background process. This allows us to locate the x- and y-coordinates of every cell in the table, which determine the alignment of each cell with its row and column headers. This information is used to infer the hierarchy of headers. As an example, the second row header in Figure 1 is normalized to “Net sales --> Products”.
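Once cell coordinates are available, the hierarchy inference can be approximated with an indentation stack, sketched below. The `header_paths` helper and its `(text, x_offset)` input format are illustrative assumptions; the system derives the offsets from the rendered page rather than raw HTML.

```python
def header_paths(row_headers):
    """Infer header hierarchy from indentation.
    row_headers: one (text, x_offset) pair per row, where a larger x_offset
    means deeper indentation. Returns each header normalized into a
    'parent --> child' path, as in Figure 1."""
    paths, stack = [], []  # stack holds (x_offset, text) of open ancestors
    for text, x in row_headers:
        # Close every ancestor that is at the same depth or deeper.
        while stack and stack[-1][0] >= x:
            stack.pop()
        stack.append((x, text))
        paths.append(" --> ".join(t for _, t in stack))
    return paths
```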
3. Operating Segment Identification
We first narrow the pool of tables down to those likely to include operating segments in them. This is done in two steps:
Tables that do not include any financial data, currencies, or periods (such as those listing names of board members or office locations) are removed from the pool.
Tables with boilerplate language (i.e., those unlikely to include any company-specific language) are removed from the pool by following Algorithm 1. The intuition is that, by treating each company as a document, TF-IDF weights can be calculated for each term in each table. These weights indicate how specific a term is to the company. Tables with a higher aggregate TF-IDF weight are likelier to contain operating segments.
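The TF-IDF weighting behind this filtering step can be sketched as follows. This is a simplified stand-in for Algorithm 1: `company_tfidf` and `table_specificity` are hypothetical names, and the thresholding details are omitted.

```python
import math
from collections import Counter

def company_tfidf(term_counts_by_company):
    """Compute per-company TF-IDF weights, treating each company's pooled
    table text as one 'document'.
    term_counts_by_company: {company: Counter of terms}."""
    n = len(term_counts_by_company)
    df = Counter()  # number of companies each term appears in
    for counts in term_counts_by_company.values():
        df.update(counts.keys())
    weights = {}
    for company, counts in term_counts_by_company.items():
        total = sum(counts.values())
        weights[company] = {t: (c / total) * math.log(n / df[t])
                            for t, c in counts.items()}
    return weights

def table_specificity(table_terms, company_weights):
    """Aggregate TF-IDF weight of a table's terms; tables scoring below a
    threshold would be treated as boilerplate and dropped."""
    return sum(company_weights.get(t, 0.0) for t in table_terms)
```

A term like “revenue” that appears for every company receives zero weight, while a company-specific term like “iPhone” is weighted highly, pushing tables that mention it above the boilerplate threshold.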
3.1. Data Collection
We collected 225 earnings reports published between May 1, 2016 and May 1, 2019. The reports belonged to 149 publicly traded U.S. companies across 6 sectors: three in consumer-focused industries (Technology, Media, Retail) and three in the commodities space (Oil/Gas, Metals/Mining, Chemicals). The sectors were determined according to S&P’s standard industry classification (https://www.spglobal.com/ratings/en/sector/corporates/corporate-sector). From these filings, we extracted 3,124 tables in total, from which we collected 51,937 individual row/column headers. Four human annotators manually labeled each header as including or not including an operating segment. To avoid data leakage between the training and test sets, 30 companies were set aside for testing; no filings from these companies were included in the training set. Table 1 summarizes the statistics of the splits.
3.2. Header Classification
To address the challenges mentioned in Section 1, we approached the problem as one of identifying headers that do not include operating segments. This allows us to focus on metrics that are not company-specific, i.e., those that can occur in any financial report. To do this, we trained a recurrent model with bidirectional GRU units (Cho et al., 2014) with the parameters listed in Table 3. During training, we first built a vocabulary using tokens from the non-operating headers. Next, we iterated through the operating headers and compared each token to the vocabulary. Tokens present in the vocabulary were left unchanged; tokens not found in the vocabulary were masked with the “<UNK>” token. This allowed the model to distinguish between common financial terms (such as “revenue”) and those likelier to appear in operating segments (such as “iPhone”).
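The selective-masking step can be sketched as follows; this is a minimal illustration (whitespace tokenization, a hypothetical `mask_headers` helper), and the tokenization in the actual system may differ.

```python
def mask_headers(operating_headers, non_operating_headers):
    """Build the vocabulary from non-operating headers only, then replace
    any token in an operating header that falls outside this vocabulary
    with <UNK>. Company-specific segment names (e.g. 'iPhone') collapse to
    <UNK>, while generic financial terms (e.g. 'revenue') stay visible,
    discouraging the model from memorizing company-specific vocabulary."""
    vocab = {tok.lower() for h in non_operating_headers for tok in h.split()}
    masked = []
    for header in operating_headers:
        masked.append(" ".join(
            tok if tok.lower() in vocab else "<UNK>" for tok in header.split()))
    return masked
```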
Two configurations were tested, one using pre-trained GloVe (Pennington et al., 2014) 300d embeddings and one using pre-trained ELMo (Peters et al., 2018) embeddings. The resulting models were benchmarked against the suite of baselines listed in Table 2. To increase the precision of the models, the operating-segment class was treated as the negative class, and evaluation was aimed at high recall for the positive class, which in turn yields high precision for the negative class. As Table 2 shows, the recurrent model with pre-trained GloVe embeddings outperformed all baselines in both precision and recall.
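The high-recall evaluation setup can be illustrated with a simple threshold-selection sketch. The `choose_threshold` helper is hypothetical; the actual tuning procedure may differ.

```python
import math

def choose_threshold(scores, labels, min_recall=0.99):
    """Pick the highest decision threshold whose recall on the positive
    (non-operating) class stays at or above min_recall, where a header is
    predicted positive when its score >= threshold. Because operating-segment
    headers form the negative class, keeping positive recall high keeps
    false negatives rare, i.e. few operating segments are flagged wrongly."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1),
                       reverse=True)
    if not positives:
        return 0.0
    # Keep at least ceil(min_recall * n) positive headers above the cut.
    keep = math.ceil(min_recall * len(positives))
    return positives[keep - 1]
```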
Table 5 shows a detailed view of the model’s performance per sector. F1 performance is relatively consistent overall, with consumer industries performing slightly better than commodities. This may be because operating segments are far less company-specific in the commodity market (e.g., “natural gas”) than in the consumer market (e.g., “iPhone”).
4. User Interface
Figure 3 illustrates how the system filters operating segments and displays them to the end user. Users are presented with a split-screen view: the left-hand panel displays data associated with the operating segments, and the right-hand panel connects the data to the earnings reports from which it was extracted. Users can review, adjust, and export the data for their analytical purposes.
5. Conclusion and Future Work
In this paper, we presented SPot, a tool for extracting operating segments from earnings reports in real-time. The tool allows us to trace and record company performance at a granular level. We hope to further enhance SPot’s capabilities by normalizing the operating segments into an ontological structure. The insights extracted by SPot can be used for predicting the future performance of a company, identifying potential competitors in the market, and analyzing sector-level trends.
References
- Cho et al. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of EMNLP, pp. 1724–1734.
- Nourbakhsh et al. (2020). SPread: Automated financial metric extraction and spreading tool from earnings reports. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), pp. 853–856.
- Pennington et al. (2014). GloVe: Global vectors for word representation. In Proceedings of EMNLP, pp. 1532–1543.
- Peters et al. (2018). Deep contextualized word representations. In Proceedings of NAACL-HLT, pp. 2227–2237.