Ideally, robotic computation should be highly accurate, responsive, and fast, as well as compute-and-power-efficient. Modern robots, however, face the challenge of selecting from an array of heterogeneous compute resources, each with a unique trade-off between accuracy and compute cost. For example, should a factory robot trust the perception results from an on-board deep neural network (DNN) or ask a busy human supervisor for help? Likewise, should a small drone compute its motion plan locally, or wait for a higher-fidelity plan from a remote server? At their core, these scenarios are instances of a compute model selection problem, where a robot must gracefully balance task-relevant accuracy with compute time, power, or network and human-processing delay.
Figure 1 illustrates the model selection problem addressed in this paper. Given the sensor observations at each time step, a robot’s model selection policy must dynamically invoke either a fast, compute-and-power-efficient model or a slower, more accurate model based on a high-level task’s required accuracy. Variants of this problem have been studied for perception tasks in cloud robotics [chinchali2019network, rahman2017motion] and human-robot collaboration [cakmak2012designing, whitney2017reducing, kaipa2016enhancing]. However, existing works either offer specialized point-solutions (e.g., for perception [eshratifar2020runtime, dorka2020modality]) that do not readily generalize to other domains, use hand-engineered heuristics, or employ uninterpretable, learning-based algorithms [chinchali2019network, dinh2018learning]. Our key contribution is to provide a unified, interpretable, and theoretically-grounded framework for compute model selection in robotics.
The fundamental principle behind selecting an appropriate compute model is to perform a cost-benefit analysis. Our key insight is that a robot’s model selection algorithm can leverage the statistical, and often analytical, correlation between the accuracy of the fast and the slow compute models. This correlation can enable us to perform reliable and interpretable cost-benefit analysis between compute cost and gain in accuracy for the different models. Crucially, such correlations between fast and slow models are now possible even for state-of-the-art DNNs, due to recent advances that compress large DNNs with provable approximation guarantees [baykal2019datadependent, liebenwein2020provable].
Literature Review: Our work is broadly related to computational offloading in cloud robotics as well as teacher feedback for human-robot interaction. The closest work to ours is [chinchali2019network], which develops a deep reinforcement learning (RL) policy to select between a fast, less accurate deep neural network (DNN) and a slower, more accurate DNN running at a cloud computing server. Indeed, we explicitly build upon [chinchali2019network] in our formulation by considering fast and slow compute models with a hierarchy of compute costs. In stark contrast to [chinchali2019network], however, we avoid uninterpretable, RL-based model selection policies. Instead, we leverage statistical correlations between fast and slow computational models, such as those established by compression algorithms for DNNs [baykal2019datadependent, liebenwein2020provable]. Further, unlike [chinchali2019network], we introduce a theoretically-grounded cost-benefit analysis for model selection, which generalizes beyond DNNs to high-dimensional linear regression and even sampling-based reachability problems, as shown in our evaluation.
Our work is also inspired by methods to compress large DNNs for efficient inference on compute-and-power limited robots. For example, the EfficientNet [tan2019efficientnet] suite of vision models provides 7 model variants that trade off accuracy against model size and latency. Importantly, recent methods utilize coreset theory to train fast, compressed DNNs that provably approximate a slower DNN by pruning convolutional filters based on sensitivity analyses [baykal2019datadependent, liebenwein2020provable].
Finally, our work is related to scenarios where a robot must selectively ask a human teacher for clarification during active learning tasks [cakmak2012designing, whitney2017reducing] or remote assistance for manipulation [kaipa2016enhancing]. In principle, our framework applies to such settings if a robot can accurately correlate its confidence with the marginal accuracy gain it receives from human feedback. In practice, however, such correlations can often be learned from historical interaction data but are hard to analytically quantify.
Contributions: Given prior literature, our contributions are three-fold. First, we design an interpretable model selection algorithm which leverages analytical correlations between fast and slow model performance to dynamically decide which model to invoke. Second, we show how our algorithm can naturally leverage recent advances that compress large DNNs with provable approximation guarantees that relate fast and slow models. Third, we show strong experimental performance of our algorithm on diverse domains ranging from robotic perception to sampling-based reachability analysis for a simulated rover navigating Martian terrain data.
Organization: This paper is organized as follows. In Sec. II, we introduce a general formulation for model selection to gracefully trade off task accuracy and compute costs. In Sec. III, we provide theoretical guarantees for model selection, and in Sec. IV we instantiate them for applications in perception and reachability analysis. Finally, we provide our experimental results in Sec. V and conclude in Sec. VI.
II Problem Statement
In this section, we formally define the problem of model selection depicted in Fig. 1 by introducing compute models, an accuracy metric, and a performance criterion.
Compute Model Input: The input to the compute model, at time , is denoted by . We denote the input data distribution by , that is, . In practice, could represent a depth-camera image or laser scan.
Compute Models: The compute models are denoted by , where . Given an input , the output is denoted by . The cost associated with is given by . The cost is context-dependent, such as battery consumption, compute inference latency, or even communication latency for cloud robotics tasks. For example, the compute models could be a DNN, with image input and corresponding segmentation . The distribution of outputs is denoted by , that is, . The ground-truth output associated with input is denoted by .
Loss Function: The loss function formally quantifies the quality of the output returned by a compute model compared to the ground-truth result. A lower loss value indicates a more accurate output. In practice, the loss function is context-dependent, such as the cross-entropy loss for image classification.
Model Selection Policy: Given an input , the model selection policy decides whether to use the slow compute model , or the fast model . We assume that the policy has access to the results of the fast model (without this information, the policy would be purely random). Therefore, the problem of model selection is to infer whether or not to additionally invoke the slow model to enhance task accuracy if the fast model results are insufficient for a robot’s high-level goal. The challenge is that the robot only has access to the input and the fast model output and thus must estimate the accuracy benefit of the slow model before invoking it.
Formally, we define the model selection policy as . Given input and fast model prediction , we define the action as . The action indicates selecting the fast model , and indicates selecting the slow model . We define the cost associated with each action as :
Reward: To simultaneously achieve high task accuracy while minimizing the cost of compute, we introduce a per-timestep reward. Given input , the output of the fast model , and model selection , the corresponding reward is:
where , are user-defined weights to balance the emphasis on accuracy and cost. These can be flexibly set by a roboticist given the unique requirements of a high-level task. For example, a fleet of low-power, compute-limited warehouse robots that rarely interact with humans might have a higher emphasis on cost to minimize how many times they query a shared central server or remote human supervisor. Conversely, robots that operate in safety-critical scenarios will have a much higher emphasis on accuracy given by .
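As a minimal sketch of how such weights shape the decision, the snippet below assumes the reward is a negatively weighted sum of loss and compute cost. This functional form, the names `RewardWeights` and `reward`, and all numeric values are illustrative assumptions, not the exact definition used in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    accuracy: float  # assumed weight on the loss term
    cost: float      # assumed weight on the compute-cost term


def reward(loss: float, cost: float, w: RewardWeights) -> float:
    """Assumed per-timestep reward: both loss and cost are penalized."""
    return -w.accuracy * loss - w.cost * cost


# Hypothetical weightings for the two kinds of robots discussed above.
warehouse = RewardWeights(accuracy=1.0, cost=5.0)
safety_critical = RewardWeights(accuracy=50.0, cost=1.0)

# Illustrative numbers: the slow model halves the loss but costs 2.5x as much.
fast = {"loss": 0.2, "cost": 1.0}
slow = {"loss": 0.1, "cost": 2.5}

for name, w in [("warehouse", warehouse), ("safety_critical", safety_critical)]:
    r_fast = reward(fast["loss"], fast["cost"], w)
    r_slow = reward(slow["loss"], slow["cost"], w)
    print(name, "prefers", "slow" if r_slow > r_fast else "fast")
```

With these made-up numbers, the cost-focused weighting favors the fast model while the accuracy-focused weighting favors the slow one, which is the qualitative behavior the weights are meant to control.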
II-A Formal Problem Definition
Given a stream of inputs, , our goal is to propose an optimal model selection policy , that provably maximizes the expected cumulative reward:
Intuitively, achieves the optimal balance between the cost and accuracy over the given period of time steps. We now formally define the model selection problem.
Problem 1 (Model Selection for Inference)
Given fast model , slow model , loss function , and model selection cost , find the optimal model selection policy , which maximizes the reward (Equation 1) over a finite horizon :
II-B Discussion on the Problem Definition
The model selection problem is broadly applicable in robotics since it is agnostic to the nature of the compute models, the loss function, or even costs. For example, the models could represent small quantized and large, compute-intensive DNNs or even small and large databases or random forests. Further, the costs could represent battery consumption or communication delay or model inference time.
The main challenge of this problem is the limited information available to the selection policy, namely the input , fast model output , and the cost function . In particular, the policy must estimate the accuracy of the slow model, , before even invoking it, which motivates our key technical approach to statistically relate both models’ accuracy.
III An Algorithmic Approach to Model Selection
In this section, we provide an optimal solution to the model selection problem (Problem 1). First, we make the following practically-motivated assumption.
Assumption 1 (Action and State Independence)
Given a model input at any time , the model selection of policy does not affect the next robot measurement .
Our assumption is practical in many robotics scenarios, since is simply a choice of a compute model to process inputs, not a physical actuation decision. For example, a robot can run a fast perception DNN on images at every timestep and its choice to optionally consult a slower DNN does not affect the new image observation , which is instead largely affected by its ego-motion and surroundings. Our assumption will not hold for fast-moving robots whose control decisions are heavily dependent on the perception model they invoke, which we discuss in our future work.
Theorem 1
The optimal model selection policy that solves Problem 1 is of the form:
By Assumption 1, the action at every time does not affect the next state . Thus, given any input , the actions are independent, so to maximize the cumulative reward it suffices to maximize the reward at every time-step independently. Recall from Equation 1 that the reward depends on two choices of , that is, . Therefore, Problem 1 can be rewritten as:
Substituting in the reward definition (Equation 1), we see that we should choose the slow model only when the associated reward is higher than continuing with the fast model. Thus, we choose only when:
Simplifying, we arrive at the desired result:
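One way to write out these steps, assuming the reward is a negatively weighted sum of loss and compute cost. The symbols $\alpha$, $\beta$, $c_{\text{fast}}$, $c_{\text{slow}}$, and the binary action encoding are notation assumed here for illustration:

```latex
% Assumed notation: a_t \in \{0, 1\}, with 0 = fast model and 1 = slow model;
% \hat{y}^{a_t}_t is the output of the model selected by a_t;
% \alpha > 0 weights accuracy and \beta \ge 0 weights cost.
\begin{align*}
R(a_t) &= -\alpha\, L\big(\hat{y}^{a_t}_t, y_t\big) - \beta\, c(a_t),
  \qquad c(0) = c_{\text{fast}},\quad c(1) = c_{\text{slow}}, \\
\text{choose slow} &\iff
  -\alpha\, L\big(\hat{y}^{\text{slow}}_t, y_t\big) - \beta\, c_{\text{slow}}
  > -\alpha\, L\big(\hat{y}^{\text{fast}}_t, y_t\big) - \beta\, c_{\text{fast}} \\
&\iff
  L\big(\hat{y}^{\text{fast}}_t, y_t\big) - L\big(\hat{y}^{\text{slow}}_t, y_t\big)
  > \frac{\beta}{\alpha}\,\big(c_{\text{slow}} - c_{\text{fast}}\big).
\end{align*}
```

Under this assumed form, the last line is a threshold test on the accuracy gap, and dividing by $\alpha$ is valid only because $\alpha$ is taken to be strictly positive.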
Theorem 1 suggests a simple model selection policy, which estimates the model accuracy gap , and only chooses the slow model if the gap is greater than a threshold that depends on the relative compute costs and weights of accuracy via . However, the key challenge is that calculating requires querying the slow model and knowledge of the ground-truth value . We now transition to two practical approaches to directly instantiate the guarantees from Theorem 1 in practice.
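The threshold rule described above can be sketched as follows. This is a minimal illustration that assumes a reward of the form "negative weighted loss minus weighted cost"; the function names, the weight parameters, and the numeric values are hypothetical, and the accuracy gap is passed in as an estimate because the true gap is not available at decision time.

```python
def slow_model_threshold(cost_fast, cost_slow, w_accuracy, w_cost):
    """Accuracy gap that the slow model must exceed to be worth its extra cost."""
    if w_accuracy <= 0:
        raise ValueError("the accuracy weight must be strictly positive")
    return (w_cost / w_accuracy) * (cost_slow - cost_fast)


def select_model(estimated_gap, cost_fast, cost_slow, w_accuracy, w_cost):
    """Return 'slow' only if the estimated loss reduction beats the threshold."""
    threshold = slow_model_threshold(cost_fast, cost_slow, w_accuracy, w_cost)
    return "slow" if estimated_gap > threshold else "fast"


# Illustrative values: costs of 1.0 and 2.5 (e.g. seconds per inference),
# with accuracy weighted ten times more heavily than cost.
params = dict(cost_fast=1.0, cost_slow=2.5, w_accuracy=1.0, w_cost=0.1)
print(slow_model_threshold(**params))               # threshold on the loss gap
print(select_model(estimated_gap=0.05, **params))   # small gap
print(select_model(estimated_gap=0.40, **params))   # large gap
```

Raising the cost weight or the cost difference raises the threshold, so the slow model is consulted less often; raising the accuracy weight has the opposite effect.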
First, we note that in many practical deployment scenarios, the ground-truth oracle values are not present. In such practical settings, the more accurate slow model simply serves as the ground-truth, such as when a slow human supervisor makes ground-truth decisions. In the absence of human annotations, a large, compute-intensive DNN can serve as the slow model and ground-truth. Thus, we present the following lemma of Theorem 1.
Lemma 1
The optimal model selection policy that solves Problem 1, when at all times is:
The proof is the same as Theorem 1, where we note that when the oracle and slow models are identical. For our evaluation, we use Lemma 1 as our model selection policy as it best reflects practical autonomous deployments. The key challenge to directly applying Theorem 1 and Lemma 1 is to accurately estimate the loss between fast and slow models solely using predictions from the fast model. However, we now show we can indeed compute the expected accuracy benefit for a broad class of fast and slow models that are related by provable approximation guarantees. Specifically, in Subsection III-A, we instantiate the guarantees of Lemma 1 to provide a closed-form, analytic model selection policy for linear regression problems. Crucially, we then extend our analysis to DNN inference in Subsection III-B.
Lemma 1 provides a general framework for model selection. For a novel setting, it can be instantiated by (i) selecting the compute models, (ii) selecting the loss function, (iii) specifying the compute model cost, and (iv) characterizing the statistical relationship between the compute models to derive the selection policy. Sections III-A and III-B instantiate Lemma 1 for the specific cases of linear regression and DNN inference.
III-A Analytical Results for Linear Regression
We now apply the guarantees from Lemma 1 to an illustrative warm-up example of high-dimensional linear regression. Recall that our challenge is to estimate the expected value of from the information available to the selection policy, namely input and fast model prediction . To overcome this challenge, we apply results to approximate linear regression models using coresets [Boutsidis2012RichCF], which are importance-ranked subsets of a large training dataset. Importantly, a model trained on just the coreset will provably approximate the predictions of one trained on the full dataset. For example, a fast model could be trained on only a core-set of local data on-board a robot while a large one could be trained on multiple robots’ data in the cloud.
Compute Models: Let the fast and slow compute models be linear regression models , where , , and . We assume the slow model is learned on a full set of training samples from a joint distribution on , while the fast model is only trained on a coreset of the original data.
Loss Function: Let the loss function be the standard norm loss: ; where . The following coreset guarantees follow from [Boutsidis2012RichCF]:
Property 1 (Relation between fast and slow models [Boutsidis2012RichCF])
For all , given input , denote the compute model outputs as and . Then, there exists an such that:
where and are the input and output distributions, meaning , and . [Boutsidis2012RichCF] provides the approximation factor based on the relative size of the core-set compared to the full training set.
Property 1 allows us to relate the fast and slow model predictions as:
Thus, the loss function can be upper bounded as:
We stress that Lemma 1 provides the optimal solution and the above solution is an approximation since we upper-bound the loss function between fast and slow models. However, our subsequent experiments show this is a very tight bound and implementing Eq. 6 yields very close performance to an unrealizable oracle solution that has perfect knowledge of the fast and slow model predictions.
III-B Analytical Results for Deep Neural Networks (DNNs)
We now provide a similar analysis to the linear regression scenario for the important case when a robotic perception DNN has been compressed using recently developed coreset guarantees [baykal2019datadependent, liebenwein2020provable]. Specifically, [baykal2019datadependent] compresses fully connected DNNs with ReLU activations by selectively removing weights with low relative importance via coresets. [liebenwein2020provable] extends this work to convolutional neural networks (CNNs) by using coresets to remove the convolutional filters to which a prediction is least sensitive, which enables a compressed DNN to provably approximate its original counterpart.
Compute Models: Let the models : , where be DNNs. Both models are trained on a set of samples drawn from a joint data distribution on .
Loss Function: As for linear regression, the loss function is an norm loss, such as for depth estimation from a perception CNN.
We now use the following guarantees for DNNs.
Property 2 (Relation between fast and slow models)
For all , given , , and , there exists an , such that the following holds [baykal2019datadependent, liebenwein2020provable]:
where is an upper bound on the error described below. and depend on the extent of DNN compression. Further, input and outputs .
Equation 8 and bound arise from the observation that in practical engineering scenarios, the outputs of a neural network and thus the loss will be bounded since they have physical meaning. For example, for a regression loss with perception, could be derived from the largest depth-reading a depth sensor can register. Likewise, for classification, is naturally bounded by 1 since the outputs are softmax scores from a cross-entropy loss.
We now use the core-set relationship to analyze Lemma 1 for DNN inference as follows:
Thus, using Equation 9, the loss can be upper bounded as:
Therefore, the expectation of the loss function is:
As for linear regression, we emphasize Lemma 1 is optimal and Eq. 12 is an approximation since we are bounding the expectation using the core-set guarantee. However, our experiments show that implementing Eq. 12 as a proxy for Lemma 1 works well in practice. More broadly, we emphasize that core-set guarantees are simply one way to instantiate the general policy provided in Lemma 1. For example, a roboticist could also use other practically-relevant models such as random forests or even approximate databases if they can reliably relate fast and slow model accuracy.
IV Application Scenarios
We now describe example application scenarios of high-dimensional linear regression, DNN inference, and reachable set computation for a simulated Mars Rover to demonstrate the theoretical guarantees from Section III.
IV-A Linear Regression
Using the analytical results from Subsection III-A, we demonstrate our model selection policy by simulating it on a toy example of linear regression. Let be any general-purpose linear regression model. The amount of time takes to generate an output is 2.5 seconds. Using coresets, we compress the linear regression model , to a faster linear regression model . The model takes 1 second to generate an output. The compression is such that the relation in Equation 3 holds with .
IV-B Compute Efficient Robotic Perception
We now stress-test our algorithm on a scenario, inspired by [taxinet], where an aircraft must autonomously track a runway center-line using a wing-mounted camera for state estimation. This scenario, henceforth referred to as the TaxiNet scenario as per [taxinet], uses a DNN to map from camera images to an estimate of the aircraft’s lateral distance from the runway center-line and heading angle , which are linearly combined to create the aircraft’s steering control. We chose the TaxiNet scenario since the central idea is broadly applicable to resource-constrained robotics, such as low-power drones that use efficient vision models to estimate their real-time pose relative to a landing site.
We trained a ResNet-18 DNN [resnet18] to serve as the slow perception model using over 50K images from the standard X-Plane simulator [XPlane] using a publicly-available dataset [taxinetDataset]. The ResNet-18 achieved a low MSE loss of 0.038 on an independent test dataset of 18,372 images, where each image took 0.17 seconds for inference on a CPU. We compressed to yield a quantized ResNet-18 as , which was 47.21% faster but had a 64% higher loss, illustrating a clear need for model selection.
IV-C Reachable Set Computation
In this subsection, we apply our model selection policy to safety assessment for robot navigation. Consider a robot, such as a Mars rover, navigating an unexplored environment. The robot has to assess whether its maneuvers are safe while considering environment uncertainties such as the coefficient of friction, wind disturbances, etc. This is done by computing a reachable set, that is, a set that contains all the states the rover can potentially reach. The robot can make the maneuver safely if the reachable set does not overlap with any obstacles. The reachable set computed by the fast compute model has a confidence that is an order of magnitude lower than that of the reachable set computed by the slow model.
We assume that the closed loop dynamics of the robot is given as a nonlinear system. For computing the reachable sets, we approximate the nonlinear dynamics locally as an uncertain linear system, where the coefficients in the dynamics belong to a bounded range.
Consider the discrete uncertain linear dynamical system where = where is the state, is the next state, and represents either the modeling uncertainty or a parameter. Given an initial state , the reachable set of the uncertain linear system includes the set of states reached by the system for any value of in the interval for a specified time horizon .
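To make the notion of a reachable set under a bounded uncertain parameter concrete, the sketch below samples the parameter, propagates the state, and records an axis-aligned bounding box per time step. This plain Monte Carlo bounding is only an illustration: it is not the statistical method of [ghosh2020], it carries no formal guarantee, and the two-state system, the function names, and all numbers are hypothetical.

```python
import random


def dynamics_matrix(p):
    # Hypothetical 2-state system; p is the single uncertain parameter.
    return [[1.0, 0.1],
            [0.0, p]]


def step(A, x):
    # One step of x_next = A x.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def sampled_reach_boxes(x0, p_low, p_high, horizon, n_samples, seed=0):
    """Bounding box (low, high) of the sampled states at each step 0..horizon."""
    rng = random.Random(seed)
    dim = len(x0)
    lows = [[float("inf")] * dim for _ in range(horizon + 1)]
    highs = [[float("-inf")] * dim for _ in range(horizon + 1)]
    for _ in range(n_samples):
        # Assumption: the parameter is drawn once and held fixed along a trajectory.
        A = dynamics_matrix(rng.uniform(p_low, p_high))
        x = list(x0)
        for t in range(horizon + 1):
            for i in range(dim):
                lows[t][i] = min(lows[t][i], x[i])
                highs[t][i] = max(highs[t][i], x[i])
            x = step(A, x)
    return list(zip(lows, highs))


coarse = sampled_reach_boxes([1.0, 1.0], 0.8, 1.0, horizon=5, n_samples=20)
fine = sampled_reach_boxes([1.0, 1.0], 0.8, 1.0, horizon=5, n_samples=2000)
print("coarse box at final step:", coarse[-1])
print("fine box at final step:  ", fine[-1])
```

Because both calls share a seed, the larger sample set contains the smaller one, so the fine boxes always contain the coarse boxes. This mirrors the trade-off in the text: more samples cost more compute but cover more of the true reachable set.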
Prior work [inproceedings, 7318279, 10.1145/3358229] has shown that computing reachable sets for linear systems with uncertainties is a computationally expensive process. Recently, a statistical approximation of the reachable set has been presented in [ghosh2020]. The confidence of the statistical approximation can be tuned by the user according to her performance and accuracy requirements. Leveraging the flexibility of this statistical approach, we generate a fast compute model that has medium confidence and a slow model that has high confidence over the computed reachable sets. Given the various constraints on robot resources, the model selection policy should invoke the appropriate compute model to guarantee safety while minimizing the cost.
Formally, consider a linear dynamical system with uncertainties, represented as , where is an uncertain dynamical matrix. The reachable set of the current state up to a time horizon , is denoted as . Though the dynamics is given as , it can encompass the open loop behavior (where are uncertain matrices), if a control sequence is provided. In such cases, the uncertain matrices are combined together to by concatenating the state and open-loop control.
A system is unsafe if the reachable set intersects with the unsafe set, such as obstacles. That is, given an unsafe set , a system is unsafe if and only if . Given and a set , we denote the uniform expansion (bloating) of set by as .
We now formally present the model selection problem for safety assessment of robot navigation.
Computation Models: The compute models , denoted as , compute approximations of the reachable set of an uncertain linear system defined by [ghosh2020]. The statistical guarantee associated with is as follows:
has a type I error of . Here, the confidence and allowable type I error are user-given parameters to the models. Intuitively, means the probability that the reachable set of any sample dynamics is contained within the reachable set is at least . Computing high-confidence approximations of the reachable set requires more statistical samples and therefore a higher computational time and cost. In particular, the required confidence set by a user for statistical guarantee is directly proportional to the required number of samples. Thus, we set the slow model to be a high-confidence reachable set and the fast model to be a lower-confidence approximation, so , , and therefore . We denote the outputs of the fast and slow models as and .
The crux of our selection policy is that we can relate the reachable sets returned by both models by a factor of :
Property 3 (Relationship between fast and slow models)
Given and , for all , there exists an such that:
In a calibration dataset, we can compute the fast and slow model reachable sets for all time steps. Then, we can set to be the minimum factor to bloat the robot’s set such that the bloated version over-approximates the slow model’s reachable set at all times. Thus, a robot can quickly run the fast model, bloat it by , and continue planning if the bloated set does not intersect an unsafe region, as formalized below.
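The calibration step described above can be sketched as follows, under the simplifying assumption that every reachable set is an axis-aligned box stored as a `(low, high)` pair of coordinate lists and that bloating expands every face by the same amount. The function names and the calibration data are hypothetical.

```python
def min_bloat(fast_box, slow_box):
    """Smallest uniform expansion of fast_box that contains slow_box."""
    f_lo, f_hi = fast_box
    s_lo, s_hi = slow_box
    eps = 0.0
    for fl, fh, sl, sh in zip(f_lo, f_hi, s_lo, s_hi):
        eps = max(eps, fl - sl, sh - fh)
    return eps


def calibrate_bloat(fast_boxes, slow_boxes):
    """Minimum bloating factor that works at every calibration time step."""
    return max(min_bloat(f, s) for f, s in zip(fast_boxes, slow_boxes))


def bloat(box, eps):
    lo, hi = box
    return ([v - eps for v in lo], [v + eps for v in hi])


def contains(outer, inner):
    o_lo, o_hi = outer
    i_lo, i_hi = inner
    return all(ol <= il and ih <= oh
               for ol, oh, il, ih in zip(o_lo, o_hi, i_lo, i_hi))


# Hypothetical calibration data: one box per time step for each model.
fast_boxes = [([0.0, 0.0], [1.0, 1.0]), ([0.5, 0.0], [1.5, 1.0])]
slow_boxes = [([-0.125, 0.0], [1.0, 1.25]), ([0.5, -0.0625], [1.625, 1.0])]

eps = calibrate_bloat(fast_boxes, slow_boxes)
print("calibrated bloating factor:", eps)
print(all(contains(bloat(f, eps), s) for f, s in zip(fast_boxes, slow_boxes)))
```

Taking the maximum over time steps is what makes the bloated fast set an over-approximation of the slow set at all calibration times, at the price of being more conservative at steps where the two sets already agree closely.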
Loss Function: Given the safety-critical nature of navigation, the loss is 0 when the reachable set doesn’t intersect an unsafe set and otherwise. Defining the reachable sets used to compute intersections with obstacles as and , the loss for any model is:
Intuitively, the above policy exploits the relationship between fast and slow models by first bloating the fast model’s reachable set by a factor of to create a guaranteed over-approximation of the slow model’s reachability computation. If the over-approximation does not intersect obstacles, we are guaranteed safety and simply proceed. If not, we need to invoke the slow model to assess its higher-fidelity reachable set and re-plan a trajectory if it indicates unsafety. While we implemented our policy with to prioritize safety, safety is also heavily emphasized in the loss function (Eq. 15) since the penalty is for collisions.
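A minimal sketch of this decision procedure, assuming reachable sets and obstacles are axis-aligned boxes stored as `(low, high)` pairs and that the slow model is available as a callable that is only evaluated on demand. The names `assess_safety` and `slow_model`, and the geometry below, are illustrative assumptions.

```python
def bloat(box, eps):
    lo, hi = box
    return ([v - eps for v in lo], [v + eps for v in hi])


def intersects(box_a, box_b):
    """Axis-aligned boxes overlap iff their intervals overlap in every dimension."""
    (a_lo, a_hi), (b_lo, b_hi) = box_a, box_b
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi))


def assess_safety(fast_box, unsafe_boxes, eps, slow_model):
    """Return (is_safe, model_used); the slow model runs only when needed."""
    over_approx = bloat(fast_box, eps)
    if not any(intersects(over_approx, u) for u in unsafe_boxes):
        return True, "fast"  # bloated fast set is clear of all obstacles
    slow_box = slow_model()  # expensive call, made on demand
    is_safe = not any(intersects(slow_box, u) for u in unsafe_boxes)
    return is_safe, "slow"


obstacle = ([2.0, 0.0], [3.0, 1.0])
eps = 0.25  # assumed to come from an offline calibration step

# Far from the obstacle: the fast model alone certifies safety.
print(assess_safety(([0.0, 0.0], [1.0, 1.0]), [obstacle], eps,
                    slow_model=lambda: ([0.0, 0.0], [1.0, 1.0])))

# Close to the obstacle: the bloated set touches it, so the slow model decides.
print(assess_safety(([0.8, 0.0], [1.8, 1.0]), [obstacle], eps,
                    slow_model=lambda: ([0.75, 0.0], [1.9, 1.0])))
```

Passing the slow model as a callable keeps its cost out of the common case, which is the source of the efficiency gain: the expensive computation is paid for only when the conservative over-approximation cannot rule out a collision.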
IV-D Benchmark Algorithm
We evaluate the performance of our model selection policy against the following benchmark policies:
Fast: This policy always uses the fast model with prediction for all .
Slow: This policy always uses the slow model with prediction for all .
Random: The robot randomly chooses between the fast and slow model with equal probability.
Oracle: This strategy assumes that the slow model’s output is available to the model selection function at the time of inference. Thus, this strategy only selects the slow model when that decision has a better reward than using the fast model. The oracle is an upper-bound, unrealizable strategy since it assumes privileged knowledge of the slow model.
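The four benchmark policies can be sketched as small functions and compared on a synthetic stream. The reward form (negative weighted loss minus weighted cost), the weights, the costs, and the synthetic loss distribution are all assumptions made for illustration; they are not the experimental setup of this paper.

```python
import random

COST = {"fast": 1.0, "slow": 2.5}  # illustrative compute costs
W_ACC, W_COST = 10.0, 1.0          # assumed reward weights


def reward(choice, losses):
    return -W_ACC * losses[choice] - W_COST * COST[choice]


def always_fast(losses, rng):
    return "fast"


def always_slow(losses, rng):
    return "slow"


def coin_flip(losses, rng):
    return rng.choice(("fast", "slow"))


def oracle(losses, rng):
    # Privileged: sees both losses, so it can pick the better reward directly.
    return max(("fast", "slow"), key=lambda c: reward(c, losses))


def synthetic_stream(n, seed=1):
    rng = random.Random(seed)
    stream = []
    for _ in range(n):
        slow_loss = rng.uniform(0.0, 0.05)
        fast_loss = slow_loss + rng.uniform(0.0, 0.4)  # fast is never more accurate
        stream.append({"fast": fast_loss, "slow": slow_loss})
    return stream


def cumulative_reward(policy, stream, seed=0):
    rng = random.Random(seed)
    return sum(reward(policy(losses, rng), losses) for losses in stream)


stream = synthetic_stream(500)
for policy in (always_fast, always_slow, coin_flip, oracle):
    print(policy.__name__, round(cumulative_reward(policy, stream), 2))
```

Since the oracle maximizes the reward at every step, its cumulative reward is an upper bound on that of any other policy evaluated on the same stream, which is why it serves as the reference point for a realizable selector.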
The principal objective of our evaluation is to show that our model selection policies from Lemma 1 and Equations 6, 12, and 16 achieve a significantly higher reward than benchmark model selection policies. Further, we show how our policy achieves better accuracy with a lower cost than competing benchmarks on simulations of linear regression, aircraft taxiing with state-of-the-art DNN perception models, and rover navigation with real Martian terrain data. All our code (in Python) and models are publicly available at [modelselection].
V-A Linear Regression Results
We now evaluate our selection policy for linear regression, as described in Equation 6 and Subsection IV-A. The key highlight is that our policy achieves 245.4% higher reward than benchmarks in 100 trials, each of duration timesteps with stochastic Gaussian inputs . Figures 2 (Left) and 3 (Left) show the cumulative rewards and trade-off between accuracy and cost, respectively, of all algorithms.
V-B Deep Neural Networks (DNN)
We now evaluate our model selection policy for the TaxiNet aircraft taxiing scenario from Subsection IV-B. Our key result on 18,372 test images is shown in Figure 2 (Center), where our policy (Our Selector) achieves 22.22% higher reward than competing benchmarks and is within 10.18% of the performance of an upper-bound Oracle. Moreover, Figure 3 (Center) shows that our model selection policy achieves low loss with low cost unlike competing policies. This is because our policy leverages the statistical correlation between models to mostly rely on the fast model to reduce cost, but also opportunistically queries the slow model for higher accuracy. However, our policy is careful to only invoke the slow, accurate model when there is a substantial accuracy gain, leading it to be queried only 68.6% of the time.
V-C Reachable Set Computation
We now demonstrate the performance of our model selection policy (Equation 16) to determine the safety of a simulated Mars Rover navigating steep obstacles on terrain from NASA’s HiRise Dataset [hirise, nakanoya2020taskrelevant]. A low-power rover must always be safe, but also fast and compute-and-power-efficient, while accounting for reachable sets during planning.
The rover is assumed to follow a linearized bicycle model with bounded perturbations in the dynamics matrix for yaw angle. Given an intended path, we use our model selection policy (Equation 16) to determine safety given uncertain dynamics while minimizing compute time. Specifically, given a start set, desired goal, and a set of way-points, we compute a reference trajectory using a cubic spline planner, which is followed using Model Predictive Control (MPC). Using the planned states and controls at every time, our model selection policy must determine the trajectory’s safety by invoking either a fast or slow reachable set computation model as described in Subsection IV-C.
Figure 4 (Left, first two images) shows how our policy (Equation 16) safely, but efficiently, follows two different paths near a red obstacle indicating an unsafe terrain gradient above 20 degrees. The key benefit of our approach is that the robot mostly uses the fast reachable set computation (blue) for high-efficiency and only intelligently consults the higher-fidelity slower model during tricky turns close to an obstacle. Indeed, Figure 4 (Left, third image) precisely shows how our policy (Equation 16) exploits the relationship between fast and slow models to selectively query the slow model only when required during key turns. The fast model’s reachable set result is in blue, the slow model’s result is in black, and the over-approximation from bloating the fast model’s result by is in cyan.
Clearly, even the over-approximation rarely intersects unsafe obstacles and it is only necessary to consult a fine-grained result from the slow model (black) when the over-approximation is too conservative and needs to be refined. In all scenarios, we rigorously verified the simulated rover is safe and never hits an obstacle despite dynamics uncertainties. Figures 2 (Right) and 3 (Right) quantitatively illustrate the superior efficiency and accuracy (safety) of our policy, since it achieves the highest reward, never hits an obstacle, and efficiently queries the slow model only on-demand near critical obstacles.
Limitations of Our Work: In the future, we plan to account for more sophisticated nonlinear dynamics using Hamilton-Jacobi-Bellman reachability analysis. Finally, future work should address multi-step decision-making, where model selection decisions affect subsequent measurements and control decisions.
To scale the deployment of low-power robotic swarms, it is increasingly important to optimize for compute energy, cost, and latency alongside standard metrics of task accuracy and resiliency. This paper presents a general algorithm for robots to flexibly trade off task accuracy and compute cost in an interpretable manner with provable statistical guarantees. Our key insight is to leverage the statistical correlations between models to predict the marginal accuracy gain of a large model and balance it with additional compute costs. This general principle allows our framework to apply widely to cloud robotics, DNN perception, and reachability analysis.
In the future, we plan to address safety guarantees and investigate whether we can co-train large and small DNNs such that we can synthesize an interpretable run-time monitor that can transfer authority to a trusted controller if the DNNs are operating in uncertain regimes. Overall, we anticipate our model-selection results will become stronger with future advances in DNN verification and compression with approximation guarantees.