Log In Sign Up

Now You're Thinking With Structures: A Concept for Structure-based Interactions with Spreadsheets

by   Patrick Koch, et al.

Spreadsheets are the go-to tool for computerized calculation and modelling, but are hard to comprehend and adapt after reaching a certain complexity. In general, cognition of complex systems is facilitated by having a higher order mental model of the system in question to work with. We therefore present a concept for structure-aware understanding of and interaction with spreadsheets that extends previous work on structure inference in the domain. Following this concept, structural information is used to enrich visualizations, reactively enhance traditional user actions, and provide tools to proactively alter the overall spreadsheet makeup instead of individual cells The intended systems should, in first approximation, not replace common spreadsheet tools, but provide an additional layer of functionality alongside the established interface. In ongoing work, we therefore implemented a tool for structure inference and visualization along the common spreadsheet layout. Based on this framework, we plan to introduce the envisioned proactive and reactive interaction mechanics, and finally provide structure-aware unctionality as an add-in for common spreadsheet processors. We believe that providing the tools for thinking about and interacting with spreadsheets in this manner will benefit users both in terms of productivity and overall spreadsheet quality.


k-simplex2vec: a simplicial extension of node2vec

We present a novel method of associating Euclidean features to simplicia...

ProLiVis: Protein-Protein Interaction Literature Visualization System

We provide a visualization model that targets the visualization of Prote...

Understanding Visualization: A Formal Approach using Category Theory and Semiotics

This article combines the vocabulary of semiotics and category theory to...

DimDraw -- A novel tool for drawing concept lattices

Concept lattice drawings are an important tool to visualize complex rela...

Does Interacting Help Users Better Understand the Structure of Probabilistic Models?

Despite growing interest in probabilistic modeling approaches and availa...

A Study of Mental Maps in Immersive Network Visualization

The visualization of a network influences the quality of the mental map ...

GIFT: Generalizable Interaction-aware Functional Tool Affordances without Labels

Tool use requires reasoning about the fit between an object's affordance...

I Introduction

Tools for end-user programming, first of all spreadsheet processors like Microsoft Excel, enjoy great popularity with business and private users alike. An analysis of a 2012 survey involving 5010 US citizens [1] concluded that 45 % of respondents used spreadsheets at work at least monthly, and workers who used spreadsheets earned more than their peers, all else being equal. Indeed, the combination of tangible units of calculation in cells and rapid feedback due to instantaneous formula evaluation in spreadsheets provides a powerful interface for on-demand development of custom calculation models. However, the same calculation-result-first paradigm that poses such an easy point of access to spreadsheets turns out to also be a major drawback in terms of understandability for advanced spreadsheet models [2]. In cases where the functionality of a sheet is not immediately apparent by the displayed numbers and layout, further understanding often requires inspection of formulas and reference chains. However, due to overly complex formulas and poor spreadsheet layouts, inferring the functionality of a spreadsheet by formula inspection can be challenging even for professional programmers.

To alleviate this issue, a number of different approaches for improving spreadsheet QA have been developed in recent years [3]. Continuing this intent, we propose to reinforce a more abstract mindset of spreadsheets, that focuses on overarching design and cohesion instead of functionalities of individual cells. To support this perspective, we outline an interaction scheme for spreadsheets that expands on the functionality of conventional spreadsheet processors by means of information about higher-order structures that are already present in the sheets. The required structural information is provided by an automated structure inference process that was presented in previous work [4]. In particular, we propose to: (1) enhance spreadsheet visualization by adding structure information either in-place or via an additional, complementary UI element; (2) include reactive procedures that trigger after common user actions and automatically restore structural soundness; and (3) provide a set of proactive methods for directly altering the overall calculation structure of a spreadsheet. The basic ideas in this list are not new. We do however believe that providing users with a concise combination of information and tools to interact with the overall structure of their spreadsheets will lead to more awareness for spreadsheet quality in general and will consequently result in more efficient and less error prone spreadsheets. In ongoing work, we tested preliminary variants of different visualization schemes that follow the outlined guidelines in a research prototype111download via, and found that visualization of coherent cell groups can for example greatly assist in comprehension of spreadsheet models.

The remainder of the paper is structured as follows: We first describe the fundamental structure primitives along a motivating example in Section II. The subsequent sections each consider one of the three suggested areas for improvement along the terms of Design Intent, Technical Considerations, an Example, and Impact & Discussion: Section III suggests spreadsheet structure visualization approaches; Section IV outlines a reactive procedure that monitors user actions in spreadsheets; and Section V discusses proactive spreadsheet structure interactions. In Section VI, we briefly summarize previous works, and Section VII concludes the work.

Ii Motivating Example

Figure 1 shows a car loan calculation spreadsheet. Given the value view of the sheet only, as illustrated by Subfigure 0(b), the calculations of the example seem straightforward at first glance. However, when investigating the example in greater detail, as provided by the formula view in Subfigure 0(b), it becomes apparent that the end result is depending on a number of interwoven and, most importantly, inconsistent calculations.

The highlighting of Formula groups facilitates distinguishing between the different calculations that take place. These groups summarise a number of neighboring cells that fulfil the same purpose in the sheet by applying the same type of calculation. For example, the formula group B3:B9 provides the Start balance of the current year by referring to the End balance of the previous year. Each cell within a formula group might refer to other cells via a number of references.

Reference groups, in turn, are defined as the set of cells that are referred to by the cells of a formula group via one specific reference of the group’s formula. Subfigure 0(c) shows the reference groups that are inferred for the colour-linked formula groups in Subfigure 0(b) using hatched borders. For example, the formula group D2:D8 of the End balance column refers to the reference group C2:C8 via the second reference () in the group’s formula (=+-5000). Note that another reference group, B2:B8, is also inferred by the first reference of the group (), but is not displayed in the example, as it coincides with the borders of other groups for the same cells. This illustrates one major difference between formula groups and reference groups: each cell can only be part of one formula group, but can be part of multiple coinciding reference groups. Moreover, reference groups can but do not necessarily have to coincide with formula groups that contain the same cells.

(a) Value view.
(b) Formula view with highlighted formula groups.
(c) Formula view with highlighted reference groups.
Fig. 1: Example spreadsheet of a car loan.

Iii Structure Visualization in Spreadsheets

The following section addresses proposed enhancements of spreadsheet visualization schemes by addition of structure information. Design Intent. The main goal of structure visualization approaches should be to point out relevant information about the positions, locations, and extents of the different structures that are present in a given spreadsheet. The UI should also be able to adequately express the connections in-between those structures, and should also make apparent which further structure-based interactions (see Section V) are available, and what effects these interactions would have.

Technical Considerations. In common spreadsheet UIs, feedback about membership of individual cells in any structure is likely best represented by either using matching cell background colours or cell borders to highlight cells. Another approach would be to include some form of additional graphical interface component which provides contextual information either for the current sheet or the currently selected cell. In either case, the amount of available data to display would make some form of information filtering necessary. We therefore suggest to provide different, user-selectable perspectives, that highlight a certain structural aspect (e.g. the formula groups) of either the current worksheet, or a specific selected cell.

Example. Figures 0(b) and 0(c) provide examples for simple but effective visualizations. Background colour and borders are used as stylistic devices to point out membership of different cells in either formula groups or reference groups.

Spreadsheet structures, e.g. formula groups, could also be represented in a supplemental graph layout, that connects the different groups based on inter-group references. The UI could then navigate to and highlight a group or connection in the common UI, that was selected within the graph visualization.

Impact & Discussion. Visual representations are the primary means of relaying information to the users of software systems. Therefore, adequate visual emphasis of the different inherent structural properties of a given spreadsheet is the first and foremost directive for raising awareness and support for structural aspects of spreadsheet design.

The main concern for the implementation of such a visualization scheme would likely be not to overwhelm users with the new and unexpected information. Ideally, this information would seamlessly blend in the already familiar spreadsheet UI. As the available information is likely to exceed the available real estate for tactful integration in existing graphical interfaces, some form of guided user interaction to select the currently relevant information would be beneficial. In addition, spreadsheet processors could provide some form of user-initiated process that checks for structural issues, and highlights individual issues along with a matching visualization.

Iv Structure-preserving Procedures

The following section discusses reactive procedures that automatically restore structural soundness. Design Intent. The intent of this core theme is to provide some form of process that, after each user action that altered a given spreadsheet, checks whether the overall structure of the sheet is still sound. Otherwise, the process suggests some form of additional action to performed which restores structural soundness. The system should provide appropriate feedback based on the analysis result, along with the possibility to fix structural issues automatically when possible, and to give suggestions about necessary manual interventions when not.

Technical Considerations. The first obvious consideration regards the functional requirements to a system that supports the envisioned checks and alterations. Such a system would have to either preprocess and store an abstract representation of the structural makeup of a sheet for each cell, or be able to deduce the necessary structural relations for any given alteration in real time. It would also require some defined notion of structural soundness that is expressible for the inferred structures, and it would need the capability to check for violations of soundness, as well as to automatically generate and apply candidates for repair operations. In many cases, e.g. the expansion of a formula to include a new reference to an hitherto empty cell, the system would also be required to be able to prompt the user for some form of input.

The second consideration regards the question as to how such a system could be integrated into common spreadsheet tools. We assume that any new interaction affects a spreadsheet that was structurally well-formed before the interaction (this could be established in an initial analysis and repair step). In that a case, the addition or deletion of cells that are part of existing formula groups or reference groups can automatically be cascaded by addition or removal of similar cells within other, functionally related cell groups. In cases where user input is required, the intended alteration could either be displayed by highlighting the affected cell(s), and either prompting textual input, or providing predefined suggestions in form of a drop-down list. In general, the feature should remain optional, allowing users to deactivate the functionality, if no automatic checking and alteration is requested.

Example. Every action that directly influences the overall structure of a spreadsheet would require a subsequent soundness check. This pertains in particular actions that add, delete, or alter the content of an individual cell. Actions that affect multiple cells at once can be assumed to be consecutive alterations of individual cells. In our running example in Figure 1, the user could alter the formula in cell C6 to apply another interest rate. A suggested reaction would then be to apply the same interest rate for the remainder of the group C5:C9. Deletion of the cell C6 would be more challenging: as the cells are connected by an ongoing chain of references, all subsequent calculations would normally be missing a reference. However, using structural information, we can deduce that the respective cells in the other formula groups in Row 6 can also be matched to the same calculation chain, and therefore can be removed as well. This allows us to reconnect the remaining cell references accordingly. However, in many cases such alterations can be fixed by a number of repair candidates, which makes user involvement necessary.

Impact & Discussion. When using spreadsheet systems that check and establish a form of structural soundness, we could guarantee the absence of specific fault types (e.g. typos) in the created spreadsheet models. However, most users are likely overwhelmed when being confronted unexpectedly with the such a feature. Some form of guided introduction for the process and its notion of soundness would be required.

Moreover, following such an approach also severely limits the design freedom of the spreadsheet paradigm. Especially for the development of small prototype sheets or specialized calculations, the constant interference of the corrective script would pose a hinderance, and should therefore be deactivatable. In practice, some form of guided transition mechanism would be desirable, which takes a spreadsheet that was developed free-form as input, and converts it to a structurally sound version that provides the same functionality.

V Structure-altering Operations

The following section discusses methods for altering the overall structure of a spreadsheet. Design Intent. This last proposed enhancement for spreadsheet interfaces aims at providing tools to deliberately alter the overall makeup of their spreadsheets, that go beyond alteration of individual cells. Such operations allow users to add, alter, relocate, and remove entire structural components of a given sheet, while keeping its layout and calculation integrity intact. Such tools are, however, closely dependent on an adequate visual representation that points out such options for interaction and also allows for assessment of possible consequences (see Section III).

Technical Considerations. Technically, structure-altering operations can, in simple cases, be lead back to incremental application of atomic changes and handling the cascading automatic adaptations accordingly (see Section IV). In general, however, these operations represent deliberate changes in the overall structure of a sheet, and we thus can make use of all structural information to plan the actions beforehand.

Example. An example for such an operation could be splitting up a formula group into two sets of linked calculations: we fist determine which external reference groups are referred to by the group in question, and which other formula groups refer to it. Based on this information, we can then introduce a new space for the second formula group next to their existing position, initializing it with a copy of the base group. Next, we decide at which point to split the base formula, and alter the formulas of the two groups accordingly. Lastly, we re-arrange the formula-references of all participating formula groups based on the new layout and calculation order.

Impact & Discussion. The proposed tools can be interpreted as refactoring operations for spreadsheets that operate on a global, structural scope. Using these operations, specific shortcomings and issues in the overall composition of a spreadsheet can be handled with relative ease. Such issues are indeed relatively common, and can be detected in the form of spreadsheet smells [5, 6, 7]. In general, providing these tools invites users to preserve an overall good spreadsheet quality, as individual maintenance actions can be performed without fear of unexpected consequences due to overlooked dependencies. This, in turn, raises awareness for good spreadsheet quality, which is likely to reflect also in better constructed spreadsheets in the future, that are less susceptible to errors.

Vi Related Work

In terms of previous works that analyzed structural properties of spreadsheets, Hermans et al. extracted class diagrams and dataflow diagrams from spreadsheets [8, 9], Abraham and Erwig inferred areas of calculation and related headers [10], Mittermeir and Clermont inferred logical areas and semantic classes from spreadsheets [11], Kankuzi and Ayalev worked on a graph-based visualizations of spreadsheets based on Markov Clustering (MCL) [12], and Schmitz and Jannach provide structure-considering tool support for spreadsheet debugging [13]. In our own research, we inferred groups, blocks, and related headers for spreadsheets [4].

Abraham end Erwig introduced formal descriptions for spreadsheet templates [14]. Cunha et al. adapted the idea, provided approaches to define and infer formal spreadsheet models, and also proposed safe spreadsheet operations and refactorings based on inferred model information [15, 16, 17, 18, 19].

Previous research on impact aware editing and refactoring in spreadsheets was conducted by Hermans et al. [20, 21], Badame and Dig [22], Cunha et al. [15], and O’Beirne [23].

Vii Conclusions & Outlook

We presented a three-pronged conceptual approach for extending interactions with spreadsheet systems by means of structural information. Major challenges that we identified are the filtering of appropriate information as to not overwhelm users, and the conceptualization of adequate user interaction methods where user input is required. Nevertheless, we believe that providing structure-enhanced information and tools can benefit the way users handle and think about spreadsheets.

In intermediate future work, we plan to further investigate structure visualization approaches and structure-aware refactoring operations for spreadsheets. Furthermore, thinking about spreadsheet in structures opens up a number of interesting further potential applications. For example, machine-learning approaches can exploit inferred structural representations to learn which features are conductive for good spreadsheet design, and consequently offer automated design feedback or even spreadsheet synthesis. Another interesting application could be the inference of topic and contextual information, by linking the extracted structures with semantic databases. Looking beyond the spreadsheet domain, raising awareness for structural connections in general helps users to better conceptualize and work with computational systems.


The work described in this paper has been been funded by the Austrian Science Fund (FWF) project DEbugging Of Spreadsheet programs (DEOS) under contract number I2144.


  • [1] C. Scaffidi, “Workers who use spreadsheets and who program earn more than similar workers who do neither,” in VL/HCC.   IEEE Computer Society, 2017, pp. 233–237.
  • [2] D. G. Hendry and T. R. G. Green, “Creating, comprehending and explaining spreadsheets: a cognitive interpretation of what discretionary users think of the spreadsheet model,” Int. J. Hum.-Comput. Stud., vol. 40, no. 6, pp. 1033–1065, 1994.
  • [3] D. Jannach, T. Schmitz, B. Hofer, and F. Wotawa, “Avoiding, finding and fixing spreadsheet errors - A survey of automated approaches for spreadsheet QA,” Journal of Systems and Software, vol. 94, pp. 129–150, 2014.
  • [4] P. W. Koch, B. Hofer, and F. Wotawa, “Static spreadsheet analysis,” in ISSRE Workshops.   IEEE Computer Society, 2016, pp. 167–174.
  • [5] F. Hermans, M. Pinzger, and A. van Deursen, “Detecting code smells in spreadsheet formulas,” in ICSM.   IEEE Computer Society, 2012, pp. 409–418.
  • [6] ——, “Detecting and visualizing inter-worksheet smells in spreadsheets,” in ICSE.   IEEE Computer Society, 2012, pp. 441–451.
  • [7] J. Cunha, J. P. Fernandes, H. Ribeiro, and J. Saraiva, “Towards a catalog of spreadsheet smells,” in ICCSA (4), ser. Lecture Notes in Computer Science, vol. 7336.   Springer, 2012, pp. 202–216.
  • [8] F. Hermans, M. Pinzger, and A. van Deursen, “Automatically extracting class diagrams from spreadsheets,” in ECOOP, ser. Lecture Notes in Computer Science, vol. 6183.   Springer, 2010, pp. 52–75.
  • [9] ——, “Breviz: Visualizing spreadsheets using dataflow diagrams,” CoRR, vol. abs/1111.6895, 2011.
  • [10] R. Abraham and M. Erwig, “Ucheck: A spreadsheet type checker for end users,” J. Vis. Lang. Comput., vol. 18, no. 1, pp. 71–95, 2007.
  • [11] R. T. Mittermeir and M. Clermont, “Finding high-level structures in spreadsheet programs,” in WCRE.   IEEE Computer Society, 2002, pp. 221–232.
  • [12] B. Kankuzi and Y. Ayalew, “An end-user oriented graph-based visualization for spreadsheets,” in Proceedings of the 4th international workshop on End-user software engineering.   ACM, 2008, pp. 86–90.
  • [13] T. Schmitz and D. Jannach, “An al-based interactive tool for spreadsheet debugging,” in VL/HCC.   IEEE Computer Society, 2017, pp. 333–334.
  • [14] R. Abraham and M. Erwig, “Inferring templates from spreadsheets,” in ICSE.   ACM, 2006, pp. 182–191.
  • [15] J. Cunha, J. Saraiva, and J. Visser, “Discovery-based edit assistance for spreadsheets,” in VL/HCC.   IEEE Computer Society, 2009, pp. 233–237.
  • [16] J. Cunha, M. Erwig, and J. Saraiva, “Automatically inferring classsheet models from spreadsheets,” in VL/HCC.   IEEE Computer Society, 2010, pp. 93–100.
  • [17] J. Cunha, J. P. Fernandes, J. Mendes, and J. Saraiva, “Embedding, evolution, and validation of model-driven spreadsheets,” IEEE Trans. Software Eng., vol. 41, no. 3, pp. 241–263, 2015.
  • [18] J. Cunha, M. Erwig, J. Mendes, and J. Saraiva, “Model inference for spreadsheets,” Autom. Softw. Eng., vol. 23, no. 3, pp. 361–392, 2016.
  • [19] J. Cunha, J. P. Fernandes, P. Martins, J. Mendes, R. Pereira, and J. Saraiva, “Evaluating refactorings for spreadsheet models,” Journal of Systems and Software, vol. 118, pp. 234–250, 2016.
  • [20] F. Hermans and D. Dig, “Bumblebee: a refactoring environment for spreadsheet formulas,” in SIGSOFT FSE.   ACM, 2014, pp. 747–750.
  • [21] F. Hermans, M. Pinzger, and A. van Deursen, “Detecting and refactoring code smells in spreadsheet formulas,” Empirical Software Engineering, vol. 20, no. 2, pp. 549–575, 2015. [Online]. Available:
  • [22] S. Badame and D. Dig, “Refactoring meets spreadsheet formulas,” in ICSM.   IEEE Computer Society, 2012, pp. 399–409.
  • [23] P. O’Beirne, “Spreadsheet refactoring,” CoRR, vol. abs/1009.1412, 2010. [Online]. Available: