A Static Analyzer for Detecting Tensor Shape Errors in Deep Neural Network Training Code

by Ho Young Jhoo, et al.

We present an automatic static analyzer PyTea that detects tensor-shape errors in PyTorch code. The tensor-shape error is critical in deep neural net code; much of the training cost and the intermediate results are lost once a tensor shape mismatch occurs in the midst of the training phase. Given the input PyTorch source, PyTea statically traces every possible execution path, collects the tensor shape constraints required by the tensor operation sequence of the path, and decides if the constraints are unsatisfiable (hence a shape error can occur). PyTea's scalability and precision hinge on the characteristics of real-world PyTorch applications: the number of execution paths after PyTea's conservative pruning rarely explodes, and loops are simple enough to be circumscribed by our symbolic abstraction. We tested PyTea against the projects in the official PyTorch repository and some tensor-error code questioned on StackOverflow. PyTea successfully detects tensor shape errors in these programs, each within a few seconds.








1. Introduction

1.1. Our Goal

Tensor shape mismatch is a critical bug in deep neural network machine learning applications. Training a neural network is an expensive process that terminates only when it finishes processing a huge amount of data through a sequence of tensor operations. If, in the middle of this time-consuming training process, the shape of an input datum fails to fit a tensor operation, the whole process abruptly stops, wasting the entire training cost spent thus far and losing any intermediate results.

Our goal is to automatically predict at compile-time such run-time tensor-shape mismatch errors in PyTorch neural network training code.

1.2. Structure of PyTorch Programs

Figure 1. Typical structure of neural network training code in PyTorch.

It shows four parts of PyTorch code. Left to right: define the network structure, initialize the model with initialization parameters, preprocess the dataset, and run the main loop.

 2  class Net(nn.Module):
 3    def __init__(self, out_classes):
 4      super(Net, self).__init__()
 5      self.layers = nn.Sequential(
 6        nn.Linear(28 * 28, 120),
 7        nn.ReLU(),
 8        nn.Linear(120, out_classes)
 9      )
11    def forward(self, x):
12      x = x.reshape(x.shape[0], -1)
13      x = self.layers(x)
14      return x
17  model = Net(out_classes=10)
20  data = dataset.MNIST('./data', train=True,
21      transform=[ToTensor()])
22  loader = DataLoader(data, batch_size=16)
25  for epoch in range(10):
26    for batch, label in loader:
27      # model(batch) == model.forward(batch)
28      output = model(batch)
29      loss = F.nll_loss(output, label)
30      loss.backward()

Basic PyTorch training code.

[Basic PyTorch neural network]Simplified Python/PyTorch neural network code that shows the general structure of PyTorch training.
Contemporary machine learning frameworks such as PyTorch (pytorch), TensorFlow (tensorflow), and Keras (keras) use Python APIs to build neural networks. Training a neural network with such frameworks mostly follows a standard procedure, illustrated in Figure 1. Typical PyTorch neural network training code can be divided into four stages. The code listing above shows an example: a simplified image classification program taken from the official PyTorch MNIST classification example (pytorch_example). We first define a series of neural network layers and assemble them into a single neural network module. To assemble the layers correctly, the tensor returned by each layer must satisfy the input requirements of the next layer; we will see those requirements in the next section. The network is then instantiated with initialization parameters called hyperparameters, e.g., the number of hidden layers. Next, the input dataset is preprocessed and adjusted to the requirements of the network; at this stage, every dataset is cut into smaller same-sized chunks called minibatches. Finally, the main loop starts, and the minibatches are fed to the network in sequence. One epoch is a single pass in which the entire dataset goes through the network, and the number of epochs usually differs depending on the purpose and structure of the neural network. Including the number of epochs, the iteration counts in training code are constants in most cases; the main exception is the training loop itself, whose count depends on the size of the dataset.
1.3. Tensor Shape Errors
 1  class Net(nn.Module):
 2    def __init__(self):
 3      super(Net, self).__init__()
 4      self.layers = nn.Sequential(
 5        ## 'B' represents batch size
 6        ## [B x 784] * [784 x 120] -> [B x 120]
 7        nn.Linear(28 * 28, 120),
 8        ## [B x 120] -> [B x 120]
 9        nn.ReLU(),
10        ## [B x 120] * [80 x 10] -> ERROR!
11        nn.Linear(80, 10))

(a) Error on the network structure.

[PyTorch error sample 1]Network initialization code that contains a dimension mismatch in matrix multiplication.
 1  class Net(nn.Module):
 2    def __init__(self, batch_size):
 3      self.batch_size = batch_size
 4      # ...
 5    def forward(self, x):
 6      x = x.reshape(self.batch_size, -1)
 7      # ...
 9  ## some models may require exact batch size
10  model = Net(batch_size=64)
13  ##    argument 'drop_last=True' is essential
14  loader = DataLoader(data, batch_size=64)
15  # loader = DataLoader(data, batch_size=64,
16  #                           drop_last=True)
18  for epoch in range(10):
19    for batch, label in loader:
20      out = model(batch)
22      ##     last batch size: 32 (!= 64)

(b) Error on the last minibatch.

[PyTorch error sample 2]Data feed loop that crashes on the last batch because of the wrong batch size.

1  ## POTENTIAL ERROR 2: channel size can be 3
2  img = PIL.Image.open('./image.png').resize([28, 28])
3  # img = img.convert('L')
6  tensor = to_tensor(img).reshape(28 * 28)
7  out = model(tensor)

(c) Insufficient data preprocessing.

[PyTorch error sample 3]Insufficient data preprocessing which does not consider both RGB and monochrome images.

Figure 2. Various types of tensor shape errors. [PyTorch error samples]Sample code of various types of tensor shape errors.

Figure 3. Overall architecture of PyTea.

[PyTea architecture]Overall architecture of PyTea. It takes Python code and instantiation info (i.e., command-line arguments), translates it into PyTea IR, extracts every path and its constraint set, then feeds them into an SMT solver.

Figure 4. Constraint generation example.

[Constraint generation example]Four strategies of path generation. Top to bottom: constraint generation, conservative path pruning, loop unrolling for constant-bound loops, and unknown-length data loops.

 1  class RandBlock(nn.Module):
 2    def __init__(self):
 3      super(RandBlock, self).__init__()
 4      self.layer = nn.Linear(32, 32)
 6    def forward(self, x):
 7      rand_num = random.randint(0, 1)
 9      if rand_num == 1:
10        result = self.layer(x)
11      else:
12        result = x
14      return result
16  model = nn.Sequential(
17    *[RandBlock() for _ in range(24)])

Figure 5. Path explosion example.

[Path explosion example]A neural network block that has a runtime random variable in its feed-forward path.

Figure 2(a) presents a typical tensor shape error, introduced as a slight modification of the basic training code in Figure 1. The second Linear layer (line 8 of Figure 1), modified here so that it multiplies the input with an 80×10 matrix, requires a specific shape of tensor as input. The first layer (line 6), however, returns a tensor of a different shape, and the overall pipeline malfunctions. This kind of error is called a tensor shape mismatch error, or simply a shape error. A shape error is hard to find manually and is usually detected only by running the program with an actual input. Indeed, the most notorious errors for machine learning engineers are those that occur only after an immense amount of machine-hours.

Figure 2(b) shows another example. Its declaration of the training data loader (line 14) hides a shape error. The DataLoader class slices the dataset sequentially by batch_size and passes the chunks to the model. If the total length of the dataset is not divisible by batch_size, however, the size of the last minibatch is the non-zero remainder of the total length. See line 16: because the third parameter drop_last is missing, the model assumes a consistent batch size (lines 10 and 6), hence the program will crash on the residual minibatch, losing the whole training hours. Recent massive networks like GPT-3 (gpt3) require more than hundreds of machine-hours to train; this type of error must be noticed before the run.

Figure 2(c) illustrates another shape error, one that arises from a dataset rather than the structure of the model. The code does not take input from the pre-defined MNIST dataset but reads an image from a file. If the read image is RGB, which has 3 × H × W dimensions, it will not fit the reshape method, which requires a tensor of 28 × 28 = 784 elements. That means we have to convert it to a monochrome image before feeding it to the network. Even if the code has been successfully tested with monochrome images, a user may run it with an RGB image, crashing the execution.

Though several works (ariadne; pythia; shapeflow; pytropos; semistatic) have reported tools to detect shape mismatch errors in machine learning libraries, especially for TensorFlow (tensorflow), none of them presents a static analysis tool that statically detects shape errors for realistic Python ML applications. Real-world machine learning applications heavily utilize third-party libraries, external datasets, and configuration parameters, and handle their control flow with subtle branch conditions and loops, but the existing tools still lack support for some of these elements and thus fail to analyze even a simple ML application. To ensure that a shape error will not happen for any input data, we should statically infer a precise yet conservative range of each tensor shape and track its transformations through all possible execution paths.
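The residual-minibatch arithmetic behind the Figure 2(b) crash can be checked directly. The sketch below assumes the MNIST training split of 60,000 samples; any dataset length not divisible by the batch size triggers the same bug.

```python
# Arithmetic behind the Figure 2(b) crash (assumes the 60,000-sample
# MNIST training split; any non-divisible length behaves the same).
dataset_len = 60_000
batch_size = 64

full_batches, residual = divmod(dataset_len, batch_size)
print(full_batches)  # 937 regular minibatches of size 64
print(residual)      # 32: the size of the last, residual minibatch
```

With drop_last=True, DataLoader skips the residual minibatch, so every batch the model sees has the batch size it hard-codes.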

2. Overview of PyTea Analyzer

To find shape errors before runtime, we present the static analyzer PyTea (PyTorch Tensor Error Analyzer). PyTea statically scans PyTorch applications and detects possible shape errors. PyTea analyzes the full training and evaluation paths of real-world Python/PyTorch applications, including their data processing and mixed usage of other libraries (e.g., Torchvision (torchvision), NumPy (numpy), PIL (pillow)). Figure 3 illustrates the overall architecture of the PyTea analyzer. It first translates the original Python code into a kernel language, the PyTea Internal Representation (PyTea IR). Then, it tracks every possible execution path of the translated IR and collects the constraints over tensor shapes that dictate the conditions for the code to run without a shape error. The collected constraint sets are given to the Satisfiability Modulo Theories (SMT) solver Z3 (z3) to judge whether those constraints are satisfiable for every possible input shape. Following the result of the solver, PyTea concludes whether each path can contain a shape error. If the constraint-solving by Z3 takes too much time, PyTea stops and answers "don't know".

2.1. Assumptions

Given the typical structure of PyTorch neural network training code (Section 1.2), we assume the following about PyTea's input:

  • (A1) Other than the training or evaluation dataset, every input value required to execute the code is injected by command-line arguments.

  • (A2) There is no infinite loop or recursion. Every loop bound except those over the datasets is fixed to a constant.

  • (A3) The only unknown loop bound is the size of each dataset in an epoch, and every iteration runs either with a fixed-sized minibatch of the dataset or with a smaller, residual minibatch.

  • (A4) String-manipulation expressions have no effect on tensor shapes.

These assumptions are based on our observation that most PyTorch networks and code can be statically resolved to fixed structures once precise command-line arguments are given. Real-world PyTorch applications mostly construct their structure from command-line arguments or external configuration files such as JSON files. Therefore, PyTea analyzes programs only with exact command-line arguments. For the few networks that do not resolve to a single fixed structure, we consider all possible structures. The number of possible structures is controlled by our path-pruning technique and, for inevitable cases, by timeout.

2.2. Handling path explosions

The number of possible paths is exponential in the number of branches in sequence. For some complex neural networks, such path explosion is possible. For example, Neural Architecture Search (nas) or Networks with Stochastic Depth (stochastic) have branches inside the networks themselves. Figure 5 shows a representative path-explosion case that uses a runtime random variable. Notice that the feed-forward function (forward(self, x)) has two execution paths in its body. The final structure of the network is built from 24 such blocks (line 17), which yields 2^24 ≈ 16M paths. We handle this exponential blow-up by means of conservative path pruning and simple-minded timeouts. If we can determine that the binding scope of the feed-forward function is pure (i.e., does not change any global value) and that its bound value is equal for every path and independent of the branch conditions, we can safely ignore all paths but one. If a path explosion still arises, we fall back to a timeout. See Section 3.2.3 for more details.

2.3. Handling Loops

For the loops in typical PyTorch neural network programs, as discussed in Section 1.2 and assumed accordingly in Section 2.1, we do not need the full power of static analysis (staticanalysis). PyTea unrolls constant-bound loops (Assumption A2 in Section 2.1) and analyzes their straight-line code version. For the unknown-bound loops over datasets, PyTea analyzes the loop body for just two cases, following assumption A3: one for the loop with a fixed-sized regular minibatch of an epoch, and the other for the loop with the residual minibatch. For example, consider the code in Figure 4. In the third code box of Figure 4, we can unroll the loop expression into three identical expressions. If we do not know the length of the dataset, as in the fourth code box of Figure 4, we use assumption A3 and consider only the two cases for the two different sizes of minibatches.

3. Analysis Steps

3.1. PyTea IR

Figure 6. Abstract syntax of PyTea IR. (For explanatory purposes, we do not include function calls and definitions; see the supplementary material for the detailed definitions of PyTea IR.) Currently, we have implemented 34 basic tensor expressions, and every other PyTorch API is constructed from these basic expressions: Torch.__init__, Torch.__getitem__, isSameShape, scalar, identity, broadcast, matmul, mm, bmm, item, repeat, expand, expand_as, transpose, reduce, topk, view, conv2d, conv_transpose2d, pool2d, batchnorm2d, cross_entropy, cat, stack, unsqueeze, squeeze, diag, flatten, narrow, pixel_shuffle, layer_norm, pad, adaptive, interpolate.

[Abstract syntax of PyTea IR]Formal definitions of PyTea IR expressions

Figure 7. A tensor that has shape (2, 3, 4). The rank of this tensor is 3, and each dimension has size 2, 3, and 4.

[Tensor shape example]Stacked cubes aligned with the X, Y, and Z axes; 24 cubes in total.

Figure 8. Abstract syntax of constraints.

[Abstract syntax of constraints]Formal definitions of PyTea IR constraints.

As the first step of the analysis, the input Python code is translated into the kernel language, PyTea IR. See Figure 6. PyTorch APIs are translated into tensor expressions that only define shape transformations, which PyTea IR focuses on. The second step of the analysis is to scan the PyTea IR code and generate constraints.

3.2. Constraint generation

Constraints are the conditions required by a PyTorch application so that it can be executed without any tensor shape error. For example, the two operands of a matrix multiplication must share a common middle dimension. For each tensor operation (mm, reshape, readImage, etc. of Figure 6), the shape of the input tensor must obey the requirement of the corresponding operation. Figure 8 shows the abstract syntax of the constraints. A value expression represents the value of a PyTea IR expression, which can be used inside shape constraints. When PyTea analyzes PyTea IR code, it traces tensor shapes and primitive Python values and constructs symbolic value expressions. A shape expression represents the shape of a tensor, which is basically a tuple of integers (d1, ..., dn). Figure 7 shows an example of a tensor with shape (2, 3, 4). Each integer di is a dimension size, and we call the number of dimensions the rank of the shape. We can slice a shape expression (s[i:j]) or concatenate (·) two shape expressions. For example, suppose a PyTea IR variable t has shape (2, 3, 4). The expression t[0], which means the first sub-tensor of t along the first axis, can be represented inside constraints as (2, 3, 4)[1:], or simply (3, 4). If the shape of t is an unknown symbol S, the shape of the sub-tensor is represented as S[1:].
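The rank, slice, and concatenation operations on shape expressions can be illustrated with plain integer tuples (an illustrative encoding, not PyTea's internal syntax):

```python
# Shape expressions as integer tuples (illustrative only).
shape = (2, 3, 4)     # the tensor of Figure 7
rank = len(shape)     # rank: the number of dimensions
sub = shape[1:]       # shape of t[0]: slicing drops the first axis
cat = (5,) + shape    # concatenating two shape expressions

print(rank)  # 3
print(sub)   # (3, 4)
print(cat)   # (5, 2, 3, 4)
```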

3.2.1. Constraint generation rules for PyTea IR

To capture Python semantics and PyTorch shape transformations, PyTea follows the static semantics of PyTea IR. The judgment σ ⊢ e ⇒ (v, C) means that the PyTea IR expression e is statically approximated by a symbolic value expression v under the environment σ, provided that the constraint set C is satisfied. The environment σ is a finite table that maps variables to symbolic value expressions. Constraints are introduced by branch expressions and PyTorch APIs (see Section 3.2.2); all other expressions simply collect the constraints of their subexpressions. For example, for an add expression e1 + e2, the result value is the symbolic expression v1 + v2, where v1 and v2 are the symbolic results of e1 and e2, and the result constraint set is the union C1 ∪ C2 of their constraint sets. Every symbolic variable originates from an external input, e.g., a random function or a dataset; every expression in the constraints is constructed from these variables and constant values.
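The constraint-collecting evaluation of expressions can be sketched as a small recursive function. The Expr/symbol representation below is an assumption for illustration, not PyTea's actual data structures:

```python
def analyze(env, expr):
    """Statically approximate expr to a (symbolic value, constraint set) pair."""
    kind = expr[0]
    if kind == 'const':            # constants introduce no constraints
        return expr[1], set()
    if kind == 'var':              # look up the symbolic value of a variable
        return env[expr[1]], set()
    if kind == 'add':              # union the subexpressions' constraint sets
        v1, c1 = analyze(env, expr[1])
        v2, c2 = analyze(env, expr[2])
        return ('+', v1, v2), c1 | c2
    raise NotImplementedError(kind)

env = {'x': 'x0'}                  # x is bound to the symbol x0
v, c = analyze(env, ('add', ('var', 'x'), ('const', 1)))
print(v)  # ('+', 'x0', 1)
print(c)  # set(): add itself introduces no new constraints
```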

3.2.2. Constraint types

In order to help the constraint resolution engine Z3 come up with a sensible counter-example that violates the derived constraints, we classify the constraints into two exclusive classes:

soft and hard constraints. When Z3 generates counterexamples, soft constraints may be violated, while hard constraints must not be. Hard constraints are, for example, those derived from branch conditions or from the value range of an input. See Figure 4 again. The Python built-in random.randint function generates an unknown random variable within the given range [0, 1]; we record that bound as a hard constraint. On the other hand, the torch.mm API demands that its two input tensors be rank-2 and that the second dimension of the first tensor equal the first dimension of the second. This condition can be violated depending on the shapes of the inputs, hence we mark it as a soft constraint.

Hard constraint generation

Hard constraints are those for inputs and branch conditions. Input conditions restrict the initial range of each input, and branch conditions split each path into two. Consider first the rule for readImage.

The readImage API fetches an image and creates a new rank-3 tensor representing color channels, height, and width. The number of color channels ranges from 1 to 4, i.e., monochrome to RGBA, hence the hard constraint on the channel dimension in the rule; the symbolic value is a tensor of shape (c, h, w). As another case, consider the rule for randInt.

The randInt API generates a new random variable bounded by the given two numbers; this expression backs the Python API random.randint. For branching, see the rule for if below.

The if expression creates two paths depending on its branch condition, each recording the condition (or its negation) as a hard constraint. If the branch condition evaluates to a constant boolean, we can safely drop one branch.
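The introduction of fresh symbolic variables with range-bounding hard constraints can be sketched as follows. The tuple encoding of constraints and the fresh-name scheme are illustrative assumptions, not PyTea's actual representation:

```python
import itertools

_fresh = itertools.count(1)
def fresh(prefix):
    """Create a new symbolic variable name."""
    return f'{prefix}{next(_fresh)}'

def read_image():
    # readImage: rank-3 tensor (channels, height, width) with the channel
    # count hard-constrained to 1..4 (monochrome to RGBA).
    c, h, w = fresh('c'), fresh('h'), fresh('w')
    return ('tensor', (c, h, w)), {('<=', 1, c), ('<=', c, 4)}

def rand_int(lo, hi):
    # randInt: a fresh symbol hard-bounded by the given range,
    # as used for Python's random.randint.
    n = fresh('n')
    return n, {('<=', lo, n), ('<=', n, hi)}

value, hard = rand_int(0, 1)
print(hard)  # the bounds 0 <= n and n <= 1, recorded as hard constraints
```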

Soft constraint generation

Soft constraints are the conditions with which PyTorch APIs must comply in order to run without a shape error. For instance, the two operands of a matrix multiplication have to share the same middle dimension, and the reshape operation requires that the number of elements of the input tensor match the number of elements of the target shape. Each PyTorch API has its own requirements on input conditions, and PyTea collects these requirements as soft constraints. With the following three rules, for example, PyTea collects such constraints from three representative APIs (mm, reshape, and transpose).

The mm API computes a matrix multiplication of two rank-2 matrices. By the basic rules of linear algebra, the second dimension of the first matrix must be equal to the first dimension of the second matrix. The reshape API redefines the shape of a tensor. Reshaping neither changes nor drops the values of a tensor, so the target shape must contain exactly the same number of elements as the original shape (i.e., the products of the dimension sizes must be equal):

The transpose API swaps two dimensions of a tensor, the i-axis and the j-axis. Unlike a normal rank-2 matrix transposition, transpose slices a tensor along the (i, j)-plane and transposes each matrix on each cross-section:

In this rule, we only consider the shape of the result, not the movement of the values inside the tensor.
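The three rules admit a direct executable paraphrase over concrete shapes (a sketch; PyTea emits symbolic constraints rather than checking concrete tuples):

```python
from math import prod

def mm_constraint(a, b):
    # mm: both operands rank-2, inner dimensions must agree
    return len(a) == 2 and len(b) == 2 and a[1] == b[0]

def reshape_constraint(src, dst):
    # reshape: total element counts must match
    return prod(src) == prod(dst)

def transpose_shape(shape, i, j):
    # transpose: only the result shape matters; swap axes i and j
    s = list(shape)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

print(mm_constraint((16, 120), (80, 10)))              # False: Figure 2(a)'s error
print(reshape_constraint((16, 1, 28, 28), (16, 784)))  # True: 12544 == 12544
print(transpose_shape((2, 3, 4), 0, 2))                # (4, 3, 2)
```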

3.2.3. Handling path explosion

Splitting execution paths whenever the analyzer encounters a branch can make the analysis cost grow exponentially. We can discard some paths using the online constraint check, but not those branching on run-time input values. However, we can still avoid a path split if both paths behave identically in terms of tensor shape. The conservative conditions are as follows:

  1. Constraints collected from each path are not dependent on the branch condition, and

  2. Each path has no global side-effect, and

  3. Two paths’ result symbolic values are the same.

PyTea checks the above conditions locally, within the boundary of the let expression containing each branch. When PyTea cannot statically decide any of the three conditions, it safely assumes the conditions do not hold. Most branches in PyTorch neural network blocks satisfy the conditions: a network block should produce a tensor with a fixed shape that matches the requirement of the next block or of the training target tensor, so such blocks' feed-forward paths are translated into nested let blocks with branches that return same-shaped tensors.
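The three-way check can be sketched as a predicate over two analyzed paths; the dictionary layout for a path's analysis result is an assumption for illustration:

```python
def can_prune(path_a, path_b, branch_syms):
    """Conservatively decide whether a two-way branch can collapse to one path."""
    no_dependence = branch_syms.isdisjoint(                      # condition 1
        path_a['constraint_syms'] | path_b['constraint_syms'])
    pure = not (path_a['side_effect'] or path_b['side_effect'])  # condition 2
    same_value = path_a['value'] == path_b['value']              # condition 3
    return no_dependence and pure and same_value

# Figure 5's RandBlock: both branches yield a tensor of shape (B, 32),
# neither touches global state, and neither constraint set mentions
# rand_num, so the two paths collapse into one.
taken = {'value': ('tensor', ('B', 32)), 'constraint_syms': {'B'}, 'side_effect': False}
skipped = {'value': ('tensor', ('B', 32)), 'constraint_syms': set(), 'side_effect': False}
print(can_prune(taken, skipped, {'rand_num'}))  # True
```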

3.3. Constraint check

3.3.1. Online constraint check

To reduce the number of constraints and paths, our analyzer eagerly simplifies symbolic expressions and constraints using primitive arithmetic and comparisons. Through this eager, online constraint check, the range of each symbol can sometimes be determined and used to judge subsequent constraints. If a branch condition can be simplified to constant true or false, we trace only a single branch without splitting the path. If a constraint can be simplified to constant false, we immediately report that the path is unsafe.
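The eager simplification amounts to constant folding over constraints: those whose operands are already constants fold to a boolean immediately, and everything else is deferred to the offline Z3 pass. The tuple representation below is illustrative:

```python
OPS = {'==': lambda a, b: a == b,
       '<':  lambda a, b: a < b,
       '<=': lambda a, b: a <= b}

def simplify(constraint):
    """Fold a constraint to a boolean if both operands are constants."""
    op, lhs, rhs = constraint
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)
    return constraint  # still symbolic: keep for the offline (Z3) pass

print(simplify(('==', 120, 80)))  # False: the path is reported unsafe now
print(simplify(('<=', 'b', 64)))  # ('<=', 'b', 64): unresolved, passed to Z3
```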

Algorithm 1. Offline Constraint Check with SMT Solver
Input: H and S, the conjunctions of the hard and soft constraint sets of a path
Output: valid, invalid, dontknow, or unreachable
Function Check(H, S):
  if Unsat(H) then return unreachable
  else if Unsat(H ∧ ¬S) then return valid
  else if Sat(H ∧ ¬S) then return invalid
  else return dontknow

3.3.2. Offline Constraint check

Figure 9. Test result of PyTea command-line tool.

Captured image of the result of the PyTea command-line tool. PyTea analyzed an erroneous dcgan example and reported that the code has two invalid paths.

PyTea feeds the collected constraints of each path to Z3. Algorithm 1 describes how we classify the Z3’s result. The final result of PyTea analyzer can be divided into four cases:

  • Valid: Soft constraints are always satisfied under the hard constraints. It guarantees that shape error will not occur from this path.

  • Invalid: A possible shape error is detected. There is a counterexample that makes soft constraints false under the hard constraints. We also report the generation position of the first broken constraint.

  • Don’t know: Z3 failed to decide whether constraints are satisfiable or not.

  • Unreachable: There is a conflict among the hard constraints on this path. In other words, it is impossible to reach this path under the given conditions. This can happen if a path has passed through two contradictory branches.

If every path results in either unreachable or valid path, we can conclude that the input program has no tensor shape error.

4. Evaluation

Our experiments show PyTea's practical performance on real-world applications. To assess PyTea's practicality, we collected several complete PyTorch applications and shape-related PyTorch bugs. First, we analyzed the official PyTorch example projects from the GitHub repository pytorch/examples (pytorch_example). This repository consists of 11 complete PyTorch applications covering major machine learning tasks, from Generative Adversarial Networks (GANs) to Natural Language Processing. We also collected PyTorch shape mismatch errors from StackOverflow questions and ran PyTea to detect them statically. Finally, we conducted case analyses of several fully-functional, hand-made PyTorch applications such as Stochastic ResNet.


Experiment Settings

The PyTea analyzer is written mainly in TypeScript (typescript) and communicates with Python scripts to run Z3. We also used Pyright (pyright) to parse and track Python syntax. The experiments were conducted on an R7 5800X CPU, node.js 16.0.0 with TypeScript 4.2.4, and Python 3.8.8 with Z3Py. We fixed the epoch size to 1 via the command-line arguments but used default values for all other settings. We measured the total elapsed time from cold boot to the termination of PyTea. The full options and code are given in the supplementary materials (https://sf.snu.ac.kr/pytea/).

PyTea command-line tool

Figure 9 shows an example snapshot of the analysis result of the PyTea command-line tool, which has analyzed one of the PyTorch example projects; the tool prints the result of each phase of PyTea. It first prints the online constraint check results and categorizes each path into three cases: potential success, potential unreachable, and immediate fail. The last one indicates that the online checker has found a constraint that can be false on that path. A potential unreachable path is a path on which the online checker has found a false constraint while certain branch conditions remain unresolved; such a path is checked in the next phase, where PyTea examines whether the path has conflicting constraints within the hard constraint set alone, which would mean the path is unreachable from the beginning. In the second step, PyTea delivers the collected constraint set of each path to the Z3 solver and runs the offline constraint check. The offline check reports the first conflicting constraint and its creation position, i.e., the exact tensor expression or PyTorch API that causes the error. If the solver does not find any conflicting constraint, PyTea concludes that all paths are valid, hence no tensor shape error is possible.

4.1. Results

4.1.1. PyTea for PyTorch Examples

Network LOC (main + lib) PyTea Hattori et al. (semistatic) Total time (s)
dcgan 3714 (214 + 3500) 1.75
fast_neural_style 4394 (338 + 4056) 2.40
imagenet 3820 (320 + 3500) 2.40
mnist 3607 (116 + 3491) 1.59
mnist_hogwild 3620 (129 + 3491) 1.94
reinforcement_learning 180 (180 + -) -
super_resolution 3886 (193 + 3693) 1.57
snli 223 (223 + -) -
time_sequence_prediction 3333 (88 + 3245) 1.88
vae 3593 (102 + 3491) 1.70
word_language_model 3278 (361 + 2912) 1.81
Table 1. Analysis result of pytorch/examples code repository. The lines of library APIs encapsulated with the analyzer were counted separately. : Analysis succeeded and found injected errors, : Analysis succeeded but requires a modification of the main code (e.g., provide explicit input tensor), : Failed to analyze.
Question PyTea Hattori et al. (semistatic)
Case 1 (66995380)
Case 2 (60121107)
Case 3 (55124407)
Case 4 (62157890)
Case 5 (59108988)
Case 6 (57534072)
Table 2. Analysis result of the StackOverflow questions. The numbers in parenthesis denote the URL id of each question.

For the experiment, we passed each project to the analyzer twice. In the first pass, PyTea analyzed the main code unmodified, and we checked that it reports no false positives. In the second pass, we injected an artificial shape error, subtracting one from the first dimension of the target tensor right before the neural network’s loss calculation, and checked that PyTea reports no false negatives. This simple injection method was chosen on purpose: it exercises the full tensor-operation sequence of the main network, so it confirms both that PyTea tracks the tensor operations thoroughly and that its analysis time is low enough for integration into code editors such as VSCode. We compared PyTea against the other PyTorch analyzer of Hattori et al. (semistatic). Table 1 shows the overall results. Among the 11 projects, PyTea successfully analyzed 6 without any modification of the original source code. For three projects with a complex data preprocessing stage, PyTea needed a bypass (i.e., a code modification) of that stage to infer the shapes of input tensors. PyTea also succeeded in finding the injected errors. As these results show, PyTea is quick and effective enough to be integrated into code editors. Meanwhile, Hattori et al.’s analyzer failed on almost all benchmarks; moreover, since their semi-static approach requires an explicit shape for the input tensor, we had to feed it the exact network model and input tensors to compare its performance with PyTea’s. Although we aimed to analyze the code without any modification, two projects depend heavily on third-party data-managing libraries such as OpenAI-Gym (gym). Because we are focusing on the analysis of PyTorch-centered applications, we decided not to support those libraries for now; supporting more libraries is straightforward and is future work.
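The fault injection above amounts to a one-line shape perturbation. A minimal sketch of the idea (the example shape is hypothetical, not taken from any specific benchmark):

```python
def inject_shape_error(shape):
    # The injected fault described above: subtract one from the
    # first (batch) dimension of the target tensor's shape.
    return (shape[0] - 1,) + tuple(shape[1:])

# e.g. a [64 x 10] target no longer matches a [64 x 10] output
assert inject_shape_error((64, 10)) == (63, 10)
```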

4.1.2. PyTea for StackOverflow questions

1class LSTM(nn.Module):
2  def __init__(self, ...):
3    # 7 lines...
4  def forward(self, tokens):
5    # 5 lines ...
6    return out_scores
8model = LSTM(embedding_matrix=np.zeros((1181, 100)))
9loss_function = nn.NLLLoss()
10optimizer = optim.Adam(model.parameters())
13input = torch.ones(256, 4, dtype=torch.long)
14target = torch.ones(256, 4, dtype=torch.long)
15output = model(input)
18# output: [256 x 4 x 1181], target: [256 x 4]
19#     SHAPE MISMATCH: [256 x 1181] != [256 x 4]
20loss = loss_function(output, target)
22## FIXED
23# output: [1024 x 1181], target: [1024]
24loss = loss_function(output.reshape(256*4, 1181), target.reshape(256*4))
Figure 10. Example code of a StackOverflow question (Case 2).

To show that PyTea can identify yet another set of real-world shape mismatches, we collected PyTorch shape errors from StackOverflow questions. Recent TensorFlow analyzers (pythia; shapeflow) used a TensorFlow error dataset collected by Zhang et al. (zhangbug), but we gathered PyTorch shape-mismatch cases manually rather than using their dataset, because of the fundamental structural differences between TensorFlow and PyTorch. We also considered porting the TensorFlow error dataset to PyTorch, but concluded that the ported code would be fairly old and artificial and would not reflect the standard way of building a PyTorch application. Table 2 gives the analysis results for the 6 questions we collected. PyTea detected every shape-mismatch case in these questions, and from its reports we could locate the exact error positions and fix the mismatches. For example, the main code of Case 2 (Figure 10) does not satisfy the shape conditions for the inputs of NLLLoss (line 9): the NLLLoss module requires that the shape of its first input tensor with the second dimension removed equal the shape of its second input tensor. PyTea found that this call to NLLLoss can generate a shape error. We then fixed the code according to the StackOverflow answer, and PyTea checked that every path became valid.
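The NLLLoss shape condition just described can be stated as a small predicate. The following is a simplified sketch of the rule only, not PyTea's actual constraint representation:

```python
def nll_loss_shapes_ok(input_shape, target_shape):
    # NLLLoss requires that the input shape with its second
    # dimension (the class dimension) removed equal the target shape.
    return tuple(input_shape[:1]) + tuple(input_shape[2:]) == tuple(target_shape)

# the buggy call of Figure 10: [256 x 4 x 1181] vs. [256 x 4]
assert not nll_loss_shapes_ok((256, 4, 1181), (256, 4))
# the fixed call: [1024 x 1181] vs. [1024]
assert nll_loss_shapes_ok((256 * 4, 1181), (256 * 4,))
```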

4.2. Discovered Errors in PyTorch Applications

We applied PyTea to several realistic PyTorch applications that contain potential shape errors or path explosions. The shape errors PyTea found include the typical kinds of shape errors introduced in Section~\ref{sec:tensor-error}. The complete projects and experiment scripts of this section are included in the supplementary material.

4.2.1. Detecting insufficient data preprocessing

We found a potential error in the data preprocessing stage of the fast_neural_style application in the pytorch/examples repository. As shown in Figure~\ref{lb-fast2}, Image.open does not guarantee that the loaded image has 3 channels, i.e., that it is an RGB image. Therefore, any training or inference stage will fail on a monochrome image if the channel-converting call at line 4 is missing. This error remained from the initial version of the preprocessing script and was fixed only in its latest commit (a3f28a2).
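The shape dependence behind this bug can be illustrated with PIL image modes. The sketch below uses the standard PIL mode names and a channels-first shape convention; it is an illustration, not code from the application:

```python
def image_tensor_shape(mode, height, width):
    # PIL image modes determine the channel count of the resulting
    # tensor; "L" (grayscale) yields 1 channel, so a network that
    # expects 3-channel input crashes unless .convert("RGB") is applied.
    channels = {"L": 1, "RGB": 3, "RGBA": 4}[mode]
    return (channels, height, width)

assert image_tensor_shape("L", 224, 224) == (1, 224, 224)
assert image_tensor_shape("RGB", 224, 224) == (3, 224, 224)
```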

4.2.2. Handling path explosion

\begin{figure}
  \centering
  \begin{lstlisting}[style=mypystyle]
def load_image(filename, size=None, scale=None):
  # POTENTIAL ERROR: channel size can be 1.
  img = Image.open(filename)
  # img = Image.open(filename).convert('RGB')
  # ...
  return img\end{lstlisting}
  \caption{Insufficient preprocessing of an image file.}
  \label{lb-fast2}
\end{figure}
\begin{figure}
  \centering
  \begin{lstlisting}[style=mypystyle]
def forward(self, x):
  residual = x

  if self.training:
    # sample random float value
    sample = self.m.sample().item()


    if sample > 0:
      out = self.conv1(x)
      out = self.bn1(out)
      out = self.relu1(out)
      out = self.conv2(out)
      out = self.bn2(out)

      if self.downsample is not None:
          residual = self.downsample(x)
      out = out + residual
    else:
      if self.downsample is not None:
          residual = self.downsample(x)
      out = residual
  # ...

  out = self.relu2(out)
  return out\end{lstlisting}
  \caption{Path explosion in Stochastic ResNet block.}
  \label{fig:sto-res}
\end{figure}
\begin{figure}
  \centering
  \begin{tabular}{c}
    \begin{lstlisting}[style=mypystyle]
class NTXentLoss(torch.nn.Module):
  def __init__(self, batch_size, temperature):
    super(NTXentLoss, self).__init__()
    self.batch_size = batch_size
    # ...

  def forward(self, zis, zjs):
    batch = self.batch_size
    repr = torch.cat([zjs, zis], dim=0)
    sim = self.similarity_function(repr)

    ## zis: [B x N], sim: [2B x 2B]
    ## CONSTRAINT: -sim.shape[0] <= b <= sim.shape[0]
    l_pos = torch.diag(sim, b)

    # ...
    diag = torch.eye(2 * b)
    l1 = torch.diag(torch.ones(b), -b)
    l2 = torch.diag(torch.ones(b), b)
    mask = diag + l1 + l2
    mask = (1 - mask).type(torch.bool)
    # 'mask' tensor has (4b^2 - 4b) True values.

    negatives = sim[mask].view(2 * b, -1)
    # shape of 'negatives': (2b, 2b - 2)
    # ...

# ...
train_loader = DataLoader(
  train_dataset,
  batch_size=256,
  # drop_last=True, # ERROR
)
losses = train(net, train_loader)\end{lstlisting}
  \end{tabular}
  \caption{Shape inference which requires the exact values of a tensor.}
  \label{fig:lb-simclr}
\end{figure}
For a neural network model containing a runtime path explosion, PyTea finished the analysis without a timeout. The \verb+stochastic-resnet+ example uses several deep learning techniques, mainly stochastic depth training~\cite{stochastic}; see Figure~\ref{fig:sto-res}. The building block of this network contains runtime branches (line 9) that can cause a path explosion. PyTea's path-handling algorithm successfully prunes those branches and finishes without a timeout. (Caveat: because the overall data handling is hard to follow, we did not automatically reduce the repeat count of the main training loop; instead, we explicitly reduced the length of the dataset with a configuration file ({\verb|pyteaconfig.json|}), without modifying the code itself.)
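The intuition behind the pruning can be sketched as follows. This is a simplified illustration of merging branches whose arms yield identical symbolic shapes, not PyTea's actual algorithm:

```python
def merge_branch_paths(paths_then, paths_else):
    # A two-way runtime branch normally doubles the number of paths;
    # if both arms yield the same set of output shapes (as in the
    # stochastic block, where every arm emits a tensor shaped like
    # the input x), the branch collapses into a single merged path.
    merged = list(paths_then)
    for shape in paths_else:
        if shape not in merged:
            merged.append(shape)
    return merged

# both arms of the stochastic block produce the input shape,
# so no new path is added
assert merge_branch_paths([(64, 128, 8, 8)], [(64, 128, 8, 8)]) == [(64, 128, 8, 8)]
```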
\subsubsection{Handling both regular and residual batch sizes in the training loop}
PyTea considers a residual minibatch in the training loop that leads to a shape error, as discussed in Section~\ref{sec:overview}. We simplified the \verb+SimCLR+~\cite{simclr, simclr_repo} application into a single PyTorch-only script. As line 4 of Figure~\ref{fig:lb-simclr} shows, the main network class {\verb|NTXentLoss|} takes an exact batch size at initialization. Hence, if we omit the {\verb|drop_last|} parameter that removes the last batch at line 32, the last residual minibatch will crash the training whenever the total data size is not divisible by the batch size. PyTea finds that the inequality between the two batch sizes at line 14 of Figure~\ref{fig:lb-simclr} generates a shape error.
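The residual-minibatch arithmetic is simple to state. The sketch below mirrors the batching behavior of a DataLoader; the dataset length of 1000 is a hypothetical example:

```python
def batch_sizes(dataset_len, batch_size, drop_last=False):
    # Mirror of DataLoader batching: the final batch is smaller
    # whenever dataset_len is not divisible by batch_size.
    full, rem = divmod(dataset_len, batch_size)
    sizes = [batch_size] * full
    if rem and not drop_last:
        sizes.append(rem)
    return sizes

# with 1000 samples and batch_size=256, the last batch has 232
# samples, mismatching an NTXentLoss initialized with batch size 256
assert batch_sizes(1000, 256)[-1] == 232
assert batch_sizes(1000, 256, drop_last=True)[-1] == 256
```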
\subsection{Limitations of PyTea}
The main focus of PyTea is the detection of shape errors, so it does not perform general value analyses such as tracking the values of tensors or detecting array index out-of-bound exceptions.
If the shape of a tensor depends on the values of another tensor, PyTea can miss a shape error. For instance, the \verb+view+ method at line 18 of Figure~\ref{fig:lb-simclr} requires that the element count of the input tensor be divisible by $2b$. Masking a tensor with a boolean tensor (\verb+similarity_matrix[mask]+) returns a 1-D tensor whose length equals the number of \verb+True+ entries in the masking tensor. Although lines 10 to 14 guarantee that the masking tensor has $4b^2-4b$ \verb+True+ entries, PyTea cannot know that the \verb+view+ call will succeed, since it does not track the exact values of a tensor.
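The arithmetic that a value-tracking analysis would need in order to verify this view call can be worked out by hand; a sketch:

```python
def negatives_shape(b):
    # A (2b x 2b) similarity matrix masked with the main diagonal
    # and the +/-b off-diagonals removed keeps 4b^2 - 4b entries;
    # view(2b, -1) succeeds only because this count is divisible
    # by 2b, yielding 2b - 2 columns.
    true_count = 4 * b * b - 4 * b
    assert true_count % (2 * b) == 0
    return (2 * b, true_count // (2 * b))

# with batch size b = 256, 'negatives' gets shape (512, 510)
assert negatives_shape(256) == (512, 510)
```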
\section{Related Works}
There is only one prior work~\cite{semistatic} that statically detects shape mismatches in PyTorch applications. Hattori et al.~\cite{semistatic} presented a semi-static analysis of PyTorch applications that requires explicit tensor inputs. Because of its path-insensitive and semi-static approach, their tool is too premature to analyze real-world applications fully statically; as Table~\ref{tbl-github} shows, its performance is impractical.
For TensorFlow applications, the latest static analyzer is Pythia~\cite{pythia}, following the same group's previous work Ariadne~\cite{ariadne}. Pythia depends on the Doop framework~\cite{doop, ptaint} for Java pointer analysis and on the Datalog language. Since Python itself is not Pythia's primary target, its coverage of Python and TensorFlow is still insufficient to handle real-world applications; for example, Pythia cannot analyze the integer modulo operation or tensor indexing and slicing, as shown in Figure~\ref{fig:pythia}. ShapeFlow~\cite{shapeflow} is a tester, i.e., a dynamic analyzer with fake TensorFlow libraries that only track shape transformations. Its dynamic approach achieves better performance and coverage than Ariadne and Pythia, but it requires a reduced dummy dataset to run and cannot detect shape mismatches caused by untested input datasets.
Several works address the shape mismatch problem~\cite{ariadne, pythia, shapeflow, pytropos, semistatic}, but they all have fundamental limitations in analyzing PyTorch machine learning applications, such as the lack of support for external data, branches, and loops. Moreover, most of them target TensorFlow applications.
%PyTorch constructs its graph dynamically which external input value controls the branches and shape of the graph. Any static PyTorch code analyzer has to handle those dynamic semantics. TensorFlow~\cite{tensorflow} is notoriously hard to debug as it constructs the networks statically; their development team had decided to change its basis to dynamic construction like PyTorch framework in TF version 2.0. The migration to 2.0 means it will outdate the previous TensorFlow analyzers, and we expect that our work can be adapted to TF 2.0.
Static analyses for Python programs have also been reported~\cite{fromherz, pytropos}. Notably, Cruz-Camacho's thesis~\cite{pytropos} contains a shape analysis of NumPy~\cite{numpy} array operators; however, its coverage of Python syntax is restricted, and custom function and class declarations are not supported. PyExZ3~\cite{pyexz3} is a value analyzer for Python that implements a dynamic symbolic executor with a Z3 backend; porting it to the shape mismatch problem would require a sizeable overhaul.
\section{Conclusion and Significance}
We have developed PyTea, an automatic static analyzer that detects tensor-shape mismatch errors in PyTorch deep neural network code. Our experiments have shown that PyTea's performance is practical.
\paragraph{Significance} The tensor-shape mismatch error is a critical bug in deep neural network training code, yet it is hard for programmers to detect statically. We presented a solution to this problem: the automatic static analyzer PyTea, whose performance is realistic. Its analysis design strikes a balance among cost, accuracy, and coverage, with a focus on the typical program structure of PyTorch deep neural network code. PyTea is classified as a bug-finder, not a verifier: in principle it can have false positives or false negatives, yet we observed no such cases in our experiments. PyTea is ready for public release.
\begin{figure}
  \centering
  \begin{lstlisting}[style=mypystyle]
import tensorflow as tf

one = 1
four = 1
if one == 1:
  four = 4

target = tf.ones((4, 5))

with tf.Session() as sess:
  t0 = tf.ones((3, 4))       # [3 x 4] * [4 x 5]
  p0 = tf.matmul(t0, target) # Pass: Correct

  t1 = tf.ones((3, 5))       # [3 x 5] * [4 x 5]
  p1 = tf.matmul(t1, target) # Error: Correct

  t2 = tf.ones((3, 5 % 2))   # [3 x 2] * [4 x 5]
  p2 = tf.matmul(t2, target) # Pass: False Negative

  t3 = tf.ones((3, 5))[0]    # [5] * [4 x 5]
  p3 = tf.matmul(t3, target) # Pass: False Negative

  t4 = tf.ones((3, 5))[0:1]  # [1 x 5] * [4 x 5]
  p4 = tf.matmul(t4, target) # Pass: False Negative

  t5 = tf.ones((3, four))    # [3 x 4] * [4 x 5]
  p5 = tf.matmul(t5, target) # Error: False Positive
  # ...\end{lstlisting}
  \caption{Basic tensor operations that Pythia~\cite{pythia} fails to analyze correctly.}
  \label{fig:pythia}
\end{figure}
This work was partially supported by Korea Institute for Information \& Communications Technology Promotion (No. 2021-0-00059), NAVER CLOVA (No. 0536-20200005) and Supreme Prosecutors Office of the Republic of Korea (No. SPO2020A1103DIGITALB, No. SPO2021A1103DIGITALB). This work was also supported by BK21 FOUR Intelligence Computing (Dept. of Computer Science and Engineering, SNU) funded by the National Research Foundation of Korea (NRF) (4199990214639).