Estimating Robot Strengths with Application to Selection of Alliance Members in FIRST Robotics Competitions

Since the inception of the FIRST Robotics Competition and its special playoff system, robotics teams have longed to quantify the strengths of individual robots. During the alliance selection phase, arguably the most game-changing part of the competition, robotics teams rely on quantitative measures to gauge which teams would make a good fit for the playoffs. In this paper, we explore the current methods of measuring the strengths of robots in a three-versus-three competition. We introduce new methods of measuring robot strength via clustering analysis and use measured strength to predict outcomes of playoff matches. Finally, we attempt to answer the question of how many matches are needed in a qualification phase to get an accurate picture of the strengths of a group of robots.




1 Introduction

The FIRST Robotics Competition (FRC) is a competition in which robots complete assigned tasks within timed matches. It was started by Dean Kamen and Woodie Flowers [1] in 1992, and each year an increasing number of teams of high school students design robots to accomplish tasks and win matches according to that year’s specific game mechanics. The competition has since grown to a pool of close to 4000 robots in 2018, playing in local and state tournaments for a chance to qualify for the world championships of around 800 robots. FIRST divides its world championships into two championship tournaments (Houston and Detroit in 2018) with about 400 robots each. As of the 2018 competitions, each championship site is composed of six divisions: Carver, Galileo, Hopper, Newton, Roebling, and Turing in Houston, and Archimedes, Carson, Curie, Daly, Darwin, and Tesla in Detroit. Each division runs a mini-tournament to determine a division champion. The six division champions advance to the Einstein field, where they play round-robin matches in the semifinals. The top two semifinalists on the Einstein field, based on win-loss record, then advance to the final round and play three matches to determine the tournament champion for the site.

As in all tournaments throughout the season, each division mini-tournament is divided into two stages: qualification and playoffs. In the qualification stage, robots compete in randomly assigned matches of six robots, with three robots in a blue alliance playing against three in a red alliance. In each match, the robots in an alliance play to win the match and earn ranking points. After the qualification matches, the eight robots with the most ranking points form eight alliances of four robots each by choosing from the pool of robots in the division’s qualification stage. In the playoff stage, each alliance plays three of its four robots in each match, and the eight alliances compete in a best-of-three knockout format until a division champion is declared.

As the FRC continues to progress, so do participants’ questions about improving the quality of competition. Over the years, teams have created new systems to collect statistics on robots and new methods of measuring the overall strengths of those robots based on the collected statistics. The general idea is to determine the individual strengths of robots from a sufficient number of matches in the qualifying stage, and to use the measured strengths to help determine the best robots to form an alliance with in the playoff stage. With this in mind, we aim to improve on today’s best methods for the benefit of future robotics teams, leading to better-quality competitions.

Additionally, many events sponsored by FIRST tend to run behind schedule. During the qualification stage, robots may play anywhere from six to twelve matches (depending on the event), the results of which may contribute to the overall perception of a robot by the others. In this paper, we will look at whether or not the number of scheduled matches is excessive or insufficient in terms of trying to estimate the strengths of robots adequately. We hope to make a recommendation on an optimal number of matches in future tournaments to improve planning and logistics.

To further illustrate the mechanics of a tournament, we introduce some notation. Let $K$ and $M$ denote the number of robots and the number of matches in a tournament, respectively. In each match $s$, $s = 1, \dots, M$, three robots forming the blue alliance, $B_s$, go against three other robots forming the red alliance, $R_s$. Given this setup, in the qualification phase of a tournament the first $M_1$ matches are designed to ensure that each robot plays one match, the first $M_2$ matches, where $M_2 > M_1$, are designed to ensure that each robot plays two matches, and in total, $M = M_m$ matches are designed to ensure that each robot plays $m$ matches. For example, in the Carver division of the 2018 Houston Championship, we have $K = 68$ and $m = 10$. Each robot played at least ten matches for a total of $M = 114$ matches. Exactly four robots played eleven matches, and the third match for each of these four robots was not considered in their ranking in the qualification stage.
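As a quick consistency check on these counts, assume 68 robots with ten counted matches each plus the four extra appearances noted above (the figure of 68 is an assumption, chosen to agree with the 67 to 68 robots per division quoted later). Every match fills six robot slots, so the number of matches follows directly:

```latex
\[
6M = 10 \times 68 + 4 = 684
\quad\Longrightarrow\quad
M = \frac{684}{6} = 114 .
\]
```

With 67 robots the same count would give $674/6$, which is not an integer, so this combination of robots and matches is the one that fits the description.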

The following are questions of interest in this study:

  1. Let $\beta = (\beta_1, \dots, \beta_K)^\top$ represent the strengths of robots with corresponding IDs $1, \dots, K$. How do we determine the “strength” or “rating” of each robot?

  2. How many matches are required for each robot to ensure the stability of strength measurement and match outcome prediction?

The rest of the paper is organized into five sections. In Section 2, we review existing measures of robot strength and related measurements of individual contributions to team games such as basketball and online multi-player games. Section 3 describes two proposed methodologies to measure robot strengths. Section 4 provides numerical examples based on the 2018 FRC Detroit Championship. Section 5 observes rank correlation of robot strengths with increasing number of matches played. We conclude with some discussions in Section 6.

2 Review of Literature

Throughout history, measurement of individual strength has always intrigued analysts of team sports, e.g., individual stats of NBA players in basketball. This is no different in FRC as in the last 25 years, both FRC teams and FIRST itself have developed methods of measuring the strengths of robots. In this section, we review such methods and express some of the drawbacks of each method. We also review existing methods of measuring individual strengths in team sports such as basketball and online multi-player games.

2.1 Existing Measures of Robot Strength

The widely-used measures of robot strengths currently in existence are Ranking Points, Average Score, Offensive Power Rating, Defensive Power Rating, Calculated Contribution to Winning Margin, and Winning Margin Power Rating. The paper [3] by Gardner discussed these methods in detail and comparisons of such methods as applied to past tournaments were presented. In what follows, we provide an overview of each method.

Ranking Points (RP). Every year, the kickoff event that starts the build and competition seasons introduces two in-game objectives, the completion of which earns teams RP. Additionally, winning a match nets a team two RP, while losing produces no RP. Pre-playoff rankings, which decide the teams allowed to form alliances in the playoffs and their order of selecting alliance members, are determined by the total RP earned in the qualification stage. Consequently, some teams design their robots specifically to complete the in-game objectives, sometimes sacrificing scoring ability. As the goal of playoff matches is actual points and not the RP objectives, many teams decide not to rely on the RP system when making alliance selections.

Average Score (AS). This is one of the first methods that robotics teams used for measuring strength. When there are ties within the RP system, they are broken using average scores. The average score is calculated by adding up a specific robot’s match scores and dividing by the number of matches played during the qualifying stage and the number of robots in each alliance, i.e., for robot $i$ playing $m$ matches,

$$\mathrm{AS}_i = \frac{1}{3m} \sum_{s=1}^{M} \left[ I(i \in B_s)\, S_s^{B} + I(i \in R_s)\, S_s^{R} \right],$$

where $I(\cdot)$ is the indicator function and $S_s^{B}$ and $S_s^{R}$ represent the scores of the blue alliance and the red alliance, respectively, in the $s$-th match. An obvious drawback of the average score is that the random match assignment may pair robots in such a way that an obviously weak robot is overrated. Similarly, a strong robot may be underrated because it is paired with weak robots. One could argue that with enough matches, AS should be a good representation of the strength of a robot, but there are typically ten matches in the qualifying stage, and it is an open question whether ten matches are enough for AS to reflect the strength of a robot.
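The average score can be sketched in a few lines of Python. The schedule, scores, and the function name `average_scores` below are hypothetical and only illustrate the calculation; the sketch divides by each robot's own match count, which equals the common number of matches when every robot plays equally often.

```python
from collections import defaultdict

# Hypothetical toy schedule: (blue robots, red robots, blue score, red score).
matches = [
    ((1, 2, 3), (4, 5, 6), 30, 18),
    ((1, 4, 5), (2, 3, 6), 24, 27),
    ((1, 2, 6), (3, 4, 5), 21, 15),
]

def average_scores(matches, alliance_size=3):
    """Return {robot: summed alliance scores / (alliance_size * matches played)}."""
    total = defaultdict(float)   # sum of alliance scores over the robot's matches
    played = defaultdict(int)    # number of matches the robot appeared in
    for blue, red, blue_score, red_score in matches:
        for robot in blue:
            total[robot] += blue_score
            played[robot] += 1
        for robot in red:
            total[robot] += red_score
            played[robot] += 1
    return {r: total[r] / (alliance_size * played[r]) for r in total}

avg = average_scores(matches)
```

In this toy schedule robot 1 is credited with (30 + 24 + 21) / 9 points per match regardless of how much it actually contributed, which is the pairing effect described above.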

Offensive Power Rating (OPR). In 2004, Karthik Kanagasabapathy of FRC Team 1114 created a metric to determine an individual robot’s contribution to an alliance’s match score [2]. The result of Kanagasabapathy’s efforts was a new system that uses systems of equations to estimate individual robot performance. The computation was first described in detail in a post by Weingart [4] in 2006. The Blue Alliance (TBA), a website that publishes FRC statistics, displays the leading OPRs per tournament for public consumption. The OPR method relies on the assumption that robots’ contributions (in strengths or OPR) to an alliance’s total score are additive. For example, if robots $i$, $j$, and $k$ were in the red alliance for match $s$, their predicted score for that match would be

$$\hat{S}_s^{R} = \beta_i + \beta_j + \beta_k.$$

In other words, alliance final scores across matches are modeled as the sum of the contribution strengths or OPRs of the alliance’s robots, the $\beta_i$’s:

$$S_s^{B} = \sum_{i \in B_s} \beta_i + \varepsilon_s^{B}, \qquad S_s^{R} = \sum_{i \in R_s} \beta_i + \varepsilon_s^{R}, \qquad s = 1, \dots, M, \tag{2.2}$$

where $E[\varepsilon_s^{B}] = 0$ and $E[\varepsilon_s^{R}] = 0$. While such a method allows for the approximate decomposition of match scores into individual robot strengths in a logical manner, it completely ignores the negative effect of defense resulting from the actions of the robots in the opposing alliance.
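A minimal sketch of the OPR computation as a least squares problem, using NumPy. The six-robot schedule and scores are hypothetical and noise-free (generated from assumed strengths 10, 8, 6, 5, 3, 1), so the fit can be checked against known values; `opr_design` is an illustrative helper name, not part of any FRC tool.

```python
import numpy as np

# Toy data (hypothetical): six robots, five matches, noise-free scores.
# Each entry: (blue robots, red robots, blue score, red score).
matches = [
    ((0, 1, 2), (3, 4, 5), 24, 9),
    ((0, 3, 4), (1, 2, 5), 18, 15),
    ((0, 1, 5), (2, 3, 4), 19, 14),
    ((0, 2, 4), (1, 3, 5), 19, 14),
    ((0, 3, 5), (1, 2, 4), 16, 17),
]
K = 6

def opr_design(matches, K):
    """Stack one indicator row per alliance per match (2M rows, K columns)."""
    X = np.zeros((2 * len(matches), K))
    y = np.zeros(2 * len(matches))
    for s, (blue, red, blue_score, red_score) in enumerate(matches):
        X[2 * s, list(blue)] = 1.0       # blue alliance members
        y[2 * s] = blue_score
        X[2 * s + 1, list(red)] = 1.0    # red alliance members
        y[2 * s + 1] = red_score
    return X, y

X, y = opr_design(matches, K)
# Least squares fit of alliance scores on alliance membership.
opr, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the toy scores contain no noise and the design matrix has full column rank, the estimates reproduce the assumed strengths. With real match data the same call returns the least squares compromise across all alliances.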

Calculated Contribution to Winning Margin (CCWM). CCWM is a method developed by Ed Law [5] in 2008 to address the shortcomings of OPR by accounting for the effect of defense on the outcome of a match. Robot strengths in the CCWM method are computed as in the OPR method, except that instead of modeling an alliance’s match score, say $S_s^{B}$, the method models the alliance’s margin $D_s^{B}$, i.e., for match $s$,

$$D_s^{B} = S_s^{B} - S_s^{R}, \qquad D_s^{R} = S_s^{R} - S_s^{B}.$$

Thus, the CCWM model is given by

$$D_s^{B} = \sum_{i \in B_s} \beta_i + \varepsilon_s^{B}, \qquad D_s^{R} = \sum_{i \in R_s} \beta_i + \varepsilon_s^{R}, \qquad s = 1, \dots, M, \tag{2.4}$$

where $E[\varepsilon_s^{B}] = 0$ and $E[\varepsilon_s^{R}] = 0$. By solving such a model, Law accounted for the net effect of an opposing alliance’s score on the reference alliance’s score. The resulting $\beta_i$ represents the contribution of robot $i$ to the winning margin of every match it participates in. It is worth noting that around the time Law developed CCWM, there were proposals to consider a measure analogous to OPR, the Defensive Power Rating (DPR), to measure the defensive strengths of robots, but Law showed in [5] that DPR can be calculated from OPR and CCWM via

$$\mathrm{DPR}_i = \mathrm{OPR}_i - \mathrm{CCWM}_i.$$
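The link between the three ratings can be checked numerically. The sketch below assumes that DPR is the least squares fit of the opposing alliance's score on the same design matrix, a definition that is not spelled out in the text. Under that assumption the relation DPR = OPR - CCWM follows from the linearity of least squares in the response. The match data and the helper name `lse` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: (blue robots, red robots, blue score, red score).
matches = [
    ((0, 1, 2), (3, 4, 5), 25, 10),
    ((0, 3, 4), (1, 2, 5), 17, 16),
    ((0, 1, 5), (2, 3, 4), 20, 12),
    ((0, 2, 4), (1, 3, 5), 18, 15),
    ((0, 3, 5), (1, 2, 4), 14, 19),
]
K = 6

# One row per alliance per match: its members, its score, its opponent's score.
X = np.zeros((2 * len(matches), K))
own = np.zeros(2 * len(matches))
opp = np.zeros(2 * len(matches))
for s, (blue, red, blue_score, red_score) in enumerate(matches):
    X[2 * s, list(blue)] = 1.0
    X[2 * s + 1, list(red)] = 1.0
    own[2 * s], opp[2 * s] = blue_score, red_score
    own[2 * s + 1], opp[2 * s + 1] = red_score, blue_score

def lse(X, y):
    """Least squares estimate via NumPy's SVD-based solver."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

opr = lse(X, own)          # fit to the alliance's own score
ccwm = lse(X, own - opp)   # fit to the winning margin
dpr = lse(X, opp)          # fit to the opposing score (assumed DPR definition)
```

Since the margin is the own score minus the opposing score and all three fits share one design matrix, `ccwm` equals `opr - dpr` up to floating point error.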

Winning Margin Power Rating (WMPR). The WMPR method is another commonly used method of measuring robot strengths. Like CCWM, it models the score margin from the point of view of an alliance, but unlike CCWM and OPR, it explicitly accounts for the effect of a robot’s strength on the opposing alliance. Robot strengths using WMPR are inferred from the following model:

$$S_s^{B} - S_s^{R} = \sum_{i \in B_s} \beta_i - \sum_{i \in R_s} \beta_i + \varepsilon_s, \qquad s = 1, \dots, M, \tag{2.5}$$

where $E[\varepsilon_s] = 0$.

As discussed in [3], when any of the above models is solved using the least squares method, the result is over-fitting, because the number of matches played is small relative to the number of parameters to be estimated. For example, in the 2018 FRC Championship divisions in Detroit and Houston, the qualifying stage in each division has about 112 to 114 matches involving 67 to 68 robots. Since OPR and CCWM use two observations per match (one per alliance) while WMPR uses one, this means an observation-to-parameter ratio of under four for OPR and CCWM and under two for WMPR.
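For instance, taking 114 matches and 68 robots from the ranges quoted above (one illustrative combination), and assuming two observations per match for OPR and CCWM and one per match for WMPR, the ratios work out roughly as follows:

```latex
\[
\frac{2 \times 114}{68} \approx 3.35 \quad (\text{OPR, CCWM}),
\qquad
\frac{114}{68} \approx 1.68 \quad (\text{WMPR}).
\]
```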

It is also worth noting that for some FRC games, elements of individual matches may not be constant. For example, the 2016 game, Stronghold, included obstacles in the middle of the field that changed match by match based on audience selection. Since some robots would have an advantage depending on which obstacles are in their matches, one must include an environmental factor when estimating match scores. None of the above methods includes such a factor, so it might not be reasonable to assume that the errors have mean zero, i.e., $E[\varepsilon] = 0$, in any of the models described above. Therefore, it is likely that the derived estimators of robot strengths using AS, OPR, CCWM, and WMPR could be biased.

2.2 Measures of Strength in Other Team Games

We now turn our attention to measures of strength used in other team games. WMPR above is actually similar to the model developed by Rosenbaum [7] to derive the Adjusted Plus/Minus (APM) statistic, which measures the contribution to net scores per possession by each player in a basketball game relative to all other players. A basic difference between the APM model and WMPR is the assumption of a home-court advantage factor, typical of many team sports but not applicable to robotics competitions. Ilardi and Barzilai [6] reinterpreted and further improved the APM model by coding the role of each player as offensive or defensive in each possession rather than as the original home or away player. Macdonald [8] implemented the APM model in the sense of Ilardi and Barzilai to measure the contribution of hockey players when they play offensively, defensively, or as a goalie in National Hockey League (NHL) games. Fearnhead and Taylor [9] extended the APM model to account for the varying contribution of NBA players across seasons. Finally, Deshpande and Jensen [10] take the approach of estimating an NBA player’s contribution to the change in winning probability via APM and solve the regression problem with a Bayesian approach to address issues of multicollinearity.

While the APM approach seems to be gaining success in the NBA and NHL, the context is quite different from robotics competitions. As mentioned earlier, in a robotics tournament, the ratio of matches to teams is under two for WMPR, while the ratio of possessions to players in a season is about sixteen in the NBA. Even with a better number of observations to parameter ratio in traditional team sports, the APM model still suffers from two drawbacks in applications: (1) some players often play at the same time, and (2) there are not enough observations for a significant number of players.

Finally, we observe that the well-known Bradley-Terry [11] model of predicting outcomes of games of two parties based on historical outcomes and players involved is actually related to the APM and WMPR models with the dependent variable in the regression replaced with an indicator of win instead of net score. The Bradley-Terry model has been used to rank players in various sports and games including basketball and football [14], cricket [18], online multi-player games [15, 16], and bridge [17]. It was also shown that under appropriate update functions, the Elo rating [12] used in ranking chess players and players in other sports is also related to the Bradley-Terry model (see Aldous [13] for example).

3 Models for the Strengths of Latent Clusters of Robots

We extend OPR, CCWM and WMPR models for FRC tournaments to consider the clustering feature of robot strengths. The formulation of linear models with random effects is further used to characterize the strengths of latent clusters of robots. Under the proposed models, two estimation approaches are also developed in this section.

3.1 Regression Models for OPR, CCWM, and WMPR

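The Offensive Power Rating (OPR) model predicts the score of an alliance in a match as the sum of the alliance’s robots’ strengths. Let

$$x_{si}^{B} = I(i \in B_s), \qquad x_{si}^{R} = I(i \in R_s),$$

for $s = 1, \dots, M$ and $i = 1, \dots, K$, and

$$X = \begin{pmatrix} x_{11}^{B} & \cdots & x_{1K}^{B} \\ x_{11}^{R} & \cdots & x_{1K}^{R} \\ \vdots & & \vdots \\ x_{M1}^{B} & \cdots & x_{MK}^{B} \\ x_{M1}^{R} & \cdots & x_{MK}^{R} \end{pmatrix}, \qquad Y = (S_1^{B}, S_1^{R}, \dots, S_M^{B}, S_M^{R})^\top, \qquad \varepsilon = (\varepsilon_1^{B}, \varepsilon_1^{R}, \dots, \varepsilon_M^{B}, \varepsilon_M^{R})^\top.$$

Based on the above notation, the OPR regression model given by (2.2) can be expressed as

$$Y = X\beta + \varepsilon.$$

In contrast to the OPR model, the Calculated Contribution to Winning Margin (CCWM) model regresses the net score gained by an alliance in a match on the sum of the alliance’s robots’ strengths. Let

$$Y = (D_1^{B}, D_1^{R}, \dots, D_M^{B}, D_M^{R})^\top, \qquad D_s^{B} = S_s^{B} - S_s^{R}, \quad D_s^{R} = S_s^{R} - S_s^{B},$$

and using the same definition for $X$ as in the OPR model, the CCWM regression model given by (2.4) can be expressed as

$$Y = X\beta + \varepsilon.$$

For both the OPR and CCWM models, under the assumption of $E[\varepsilon] = 0$, $\beta$ can be estimated by the least squares estimator (LSE)

$$\hat{\beta} = (X^\top X)^{-1} X^\top Y.$$

Since $X$ is full rank, it can be shown that

$$E[\hat{\beta}] = \beta.$$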

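Like the CCWM model, the Winning Margin Power Rating (WMPR) model regresses the net score gained by an alliance; however, the net score is modeled as the sum of the strengths of robots in the reference alliance minus the sum of the strengths of robots in the opposing alliance. Let

$$x_{si} = I(i \in B_s) - I(i \in R_s),$$

for $s = 1, \dots, M$ and $i = 1, \dots, K$, and

$$X = (x_{si})_{M \times K}, \qquad Y = (S_1^{B} - S_1^{R}, \dots, S_M^{B} - S_M^{R})^\top.$$

Using the above notation, the WMPR regression model given by (2.5) can be expressed as

$$Y = X\beta + \varepsilon. \tag{3.1}$$

Since each row of $X$ sums to zero, a constraint

$$\sum_{i=1}^{K} \beta_i = 0$$

is imposed to avoid the singularity problem in estimation. Thus, (3.1) can be rewritten as

$$Y = \tilde{X}\tilde{\beta} + \varepsilon,$$

with $X_i$ being the $i$-th column of $X$, and $\tilde{X} = (X_1 - X_K, \dots, X_{K-1} - X_K)$, $\tilde{\beta} = (\beta_1, \dots, \beta_{K-1})^\top$ satisfying

$$\beta_K = -\sum_{i=1}^{K-1} \beta_i.$$

Note that $\beta$ in the above model represents the relative strengths rather than the intrinsic strengths of the robots.

Under the assumption of $E[\varepsilon] = 0$, $\tilde{\beta}$ can be estimated by the LSE

$$\hat{\tilde{\beta}} = (\tilde{X}^\top \tilde{X})^{-1} \tilde{X}^\top Y.$$

Since $\tilde{X}$ is full rank, it can be shown that

$$E[\hat{\tilde{\beta}}] = \tilde{\beta},$$

and, thus, $\beta$ is naturally estimated by

$$\hat{\beta} = \Big(\hat{\tilde{\beta}}^\top,\; -\sum_{i=1}^{K-1} \hat{\tilde{\beta}}_i\Big)^\top.$$

We note that each of the OPR, CCWM, and WMPR models can be expressed in the form $Y = X\beta + \varepsilon$, and an LSE for $\beta$ can be computed using $\hat{\beta} = (X^\top X)^{-1} X^\top Y$ with appropriately defined $X$ and $Y$.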
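The sum-to-zero reparameterization for WMPR can be sketched as follows. The seven-match schedule, the scores, and the helper name `wmpr_design` are hypothetical; the sketch drops the last robot's coefficient by subtracting its column from the others and recovers it afterwards from the constraint.

```python
import numpy as np

# Hypothetical toy data: (blue robots, red robots, blue score, red score).
matches = [
    ((0, 1, 2), (3, 4, 5), 25, 10),
    ((0, 3, 4), (1, 2, 5), 17, 16),
    ((0, 1, 5), (2, 3, 4), 20, 12),
    ((0, 2, 4), (1, 3, 5), 18, 15),
    ((0, 3, 5), (1, 2, 4), 14, 19),
    ((1, 2, 3), (0, 4, 5), 21, 13),
    ((2, 3, 5), (0, 1, 4), 12, 22),
]
K = 6

def wmpr_design(matches, K):
    """One row per match: +1 for blue robots, -1 for red robots."""
    X = np.zeros((len(matches), K))
    y = np.zeros(len(matches))
    for s, (blue, red, blue_score, red_score) in enumerate(matches):
        X[s, list(blue)] = 1.0
        X[s, list(red)] = -1.0
        y[s] = blue_score - red_score    # blue-minus-red margin
    return X, y

X, y = wmpr_design(matches, K)

# Every row of X sums to zero, so X is rank deficient. Impose the constraint
# beta_K = -(beta_1 + ... + beta_{K-1}) by subtracting the last column.
X_tilde = X[:, :-1] - X[:, [-1]]
beta_tilde, *_ = np.linalg.lstsq(X_tilde, y, rcond=None)

# Recover the full vector of relative strengths.
wmpr = np.append(beta_tilde, -beta_tilde.sum())
```

The fitted margins `X @ wmpr` coincide with `X_tilde @ beta_tilde`, and the recovered ratings sum to zero, which is why they are only meaningful relative to each other.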

3.2 Extended Model Formulation

As noted earlier, the traditional models of robot strengths tend to over-fit because few observations are available per robot on average. As we analyzed data from recent competitions, we observed that robot strengths may exhibit clustering features. This motivates us to propose regression models in which robots whose strengths are close to each other are grouped into the same cluster.

For appropriately defined $X$ and $Y$ in a model of robot strengths with an LSE for $\beta$ estimated using $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, suppose that the $K$ robots in a tournament can be grouped into $c$ clusters, and let $g_i$ be the cluster containing robot $i$, for $i = 1, \dots, K$. Let

with , and appropriately transformed for the WMPR model into

In the above setup, the true number of clusters, say $c_0$, and the clusters of robot teams, say $G_1, \dots, G_{c_0}$, are unknown.

A random effects model can be formulated as follows:


where if , ,
, and

The parameters and are unknown in the above model formulation. Also, when dealing with the WMPR model, since , a constraint is imposed in the proposed model, which implies that

The matrix representation of the model (3.4) is given by:


where and for the WMPR model. In fact, the above model can be expressed in the form

with and , , and if , . It follows that the LSEs of (as appropriate of ) has the property . In light of this, the proposed clustering approaches described next can be ensured to be valid.

3.3 Estimation Methods

We now describe two approaches to computing the strengths of clusters of robots. The idea is that robots grouped in the same cluster, based on the distances between the $\hat{\beta}_i$’s, i.e., the estimated strengths from the original OPR, CCWM, or WMPR models, share the same regression coefficient.

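Method 1

  1. Choose a traditional model of robot strength and appropriately define $X$ and $Y$.

  2. Fit a regression model $Y = X\beta + \varepsilon$. This can be accomplished by computing the LSE $\hat{\beta}$ of $\beta$.

  3. Apply a cluster analysis using $\hat{\beta}$ to group robots into $c$ clusters, in which the corresponding matrix for the $c$ clusters of robots is denoted by $Z_c = (z_{ij})$, where $z_{ij} = 1$ if robot $i$ belongs to cluster $j$ and $z_{ij} = 0$ otherwise, with $i = 1, \dots, K$ and $j = 1, \dots, c$.

  4. Fit a regression model $Y = X_c \gamma_c + \varepsilon$, where

    $$X_c = X Z_c, \qquad \gamma_c = (\gamma_{c1}, \dots, \gamma_{cc})^\top.$$

    For each $c$, compute the LSE $\hat{\gamma}_c$ of $\gamma_c$.

  5. Compute the prediction accuracy rate

    $$\mathrm{PAR}(c) = \frac{1}{M} \sum_{s=1}^{M} I\big(\hat{W}_s^{(c)} = W_s\big),$$

    where $W_s$ is the observed winner of match $s$ and $\hat{W}_s^{(c)}$ is the winner predicted by the fitted $c$-cluster model.

  6. Determine the optimal number of clusters, $\hat{c}$, by maximizing $\mathrm{PAR}(c)$ over $c$.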
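The steps of Method 1 can be sketched on simulated data. Several choices here are assumptions rather than details from the text: the OPR model as the traditional model, a plain one-dimensional k-means as the cluster analysis, in-sample rather than cross-validated accuracy, and the hypothetical helper names `kmeans_1d`, `membership`, and `accuracy`.

```python
import numpy as np

rng = np.random.default_rng(0)
K, rounds = 12, 8
true_strength = np.repeat([30.0, 20.0, 10.0], 4)   # three hypothetical tiers

# Simulated schedule: each round shuffles the robots into two 3-vs-3 matches.
rows, scores = [], []
for _ in range(rounds):
    order = rng.permutation(K)
    for start in (0, 6):
        for alliance in (order[start:start + 3], order[start + 3:start + 6]):
            row = np.zeros(K)
            row[alliance] = 1.0
            rows.append(row)
            scores.append(true_strength[alliance].sum() + rng.normal(0, 5))
X, y = np.array(rows), np.array(scores)   # consecutive row pairs form a match

def lse(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def kmeans_1d(values, c, iters=50):
    """Plain Lloyd iterations in one dimension; labels are 0..c'-1 with c' <= c."""
    centers = np.quantile(values, (np.arange(c) + 0.5) / c)
    for _ in range(iters):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(c):
            if np.any(labels == j):
                centers[j] = values[labels == j].mean()
    return np.unique(labels, return_inverse=True)[1]   # drop empty clusters

def membership(labels):
    """Indicator matrix Z with Z[i, j] = 1 if robot i is in cluster j."""
    Z = np.zeros((labels.size, labels.max() + 1))
    Z[np.arange(labels.size), labels] = 1.0
    return Z

def accuracy(X, y, beta):
    """Share of matches whose predicted winner equals the observed winner."""
    pred = X @ beta
    pred_margin = pred[0::2] - pred[1::2]
    obs_margin = y[0::2] - y[1::2]
    return float(np.mean(np.sign(pred_margin) == np.sign(obs_margin)))

beta_hat = lse(X, y)                       # Step 2: traditional model
results = {}
for c in range(1, K + 1):                  # Steps 3 to 5 for each candidate c
    Z = membership(kmeans_1d(beta_hat, c))
    gamma_hat = lse(X @ Z, y)              # one coefficient per cluster
    results[c] = accuracy(X, y, Z @ gamma_hat)
best_c = max(results, key=results.get)     # Step 6
```

Swapping in a different clustering routine only changes `kmeans_1d`; the rest of the loop depends on the clustering solely through the membership matrix.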

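Method 2

  1. Choose a traditional model of robot strength and appropriately define $X$ and $Y$.

  2. Fit a regression model $Y = X\beta + \varepsilon$. This can be accomplished by computing the LSE $\hat{\beta}$ of $\beta$.

  3. For $c = K-1, K-2, \dots, 1$,

    1. Group robots into $c$ clusters by combining the two robot clusters $j$ and $j'$ with minimum distance $|\hat{\gamma}_{c+1,j} - \hat{\gamma}_{c+1,j'}|$ into one cluster, where $\hat{\gamma}_K = \hat{\beta}$. Update the matrix $Z_c$ as defined in Method 1 Step 3 with $j = 1, \dots, c$.

    2. Fit a regression model $Y = X_c \gamma_c + \varepsilon$, where

      $$X_c = X Z_c, \qquad \gamma_c = (\gamma_{c1}, \dots, \gamma_{cc})^\top.$$

      Compute the LSE $\hat{\gamma}_c$ of $\gamma_c$.

  4. For each $c$, compute the prediction accuracy rate

    $$\mathrm{PAR}(c) = \frac{1}{M} \sum_{s=1}^{M} I\big(\hat{W}_s^{(c)} = W_s\big),$$

    where $W_s$ is the observed winner of match $s$ and $\hat{W}_s^{(c)}$ is the winner predicted by the fitted $c$-cluster model.

  5. Determine the optimal number of clusters, $\hat{c}$, by maximizing $\mathrm{PAR}(c)$ over $c$.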
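Method 2 can be sketched as a successive merging loop. As before, the simulated data, the OPR model, the in-sample accuracy, and the helper names `fit` and `accuracy` are assumptions made for illustration; the merge distance is taken to be the absolute difference between the current cluster coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
K, rounds = 12, 8
true_strength = np.repeat([30.0, 20.0, 10.0], 4)   # three hypothetical tiers

# Simulated schedule: each round shuffles the robots into two 3-vs-3 matches.
rows, scores = [], []
for _ in range(rounds):
    order = rng.permutation(K)
    for start in (0, 6):
        for alliance in (order[start:start + 3], order[start + 3:start + 6]):
            row = np.zeros(K)
            row[alliance] = 1.0
            rows.append(row)
            scores.append(true_strength[alliance].sum() + rng.normal(0, 5))
X, y = np.array(rows), np.array(scores)   # consecutive row pairs form a match

def fit(X, y, labels):
    """LSE with one coefficient per cluster; labels[i] is robot i's cluster."""
    Z = np.zeros((labels.size, labels.max() + 1))
    Z[np.arange(labels.size), labels] = 1.0
    gamma = np.linalg.lstsq(X @ Z, y, rcond=None)[0]
    return gamma, Z @ gamma               # cluster and per-robot strengths

def accuracy(X, y, beta):
    """Share of matches whose predicted winner equals the observed winner."""
    pred = X @ beta
    pred_margin = pred[0::2] - pred[1::2]
    obs_margin = y[0::2] - y[1::2]
    return float(np.mean(np.sign(pred_margin) == np.sign(obs_margin)))

labels = np.arange(K)                     # start: every robot is its own cluster
gamma, beta = fit(X, y, labels)           # Step 2: traditional model
results = {K: accuracy(X, y, beta)}
for c in range(K - 1, 0, -1):             # Step 3: merge down to one cluster
    ranked = np.argsort(gamma)
    gaps = np.diff(gamma[ranked])         # closest pair is adjacent once sorted
    j = int(gaps.argmin())
    a, b = ranked[j], ranked[j + 1]
    labels[labels == b] = a               # combine the two closest clusters
    labels = np.unique(labels, return_inverse=True)[1]   # relabel 0..c-1
    gamma, beta = fit(X, y, labels)       # refit with c clusters
    results[c] = accuracy(X, y, beta)     # Step 4
best_c = max(results, key=results.get)    # Step 5
```

Because each merge uses coefficients refitted at the previous step, the cluster memberships here depend on the whole merge path, in contrast to the single clustering of the initial estimates in Method 1.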

The main difference between Method 1 and Method 2 is the approach to clustering robots. While Method 1 clusters robots using traditional cluster analysis on the estimated strengths, Method 2 successively combines the two clusters of robots with the closest strengths into one cluster. We make the following notes on the approaches:

  • In Method 1, the $Z_c$’s are defined using the estimates $\hat{\beta}$. In Method 2, the $Z_c$’s are defined using the estimates $\hat{\gamma}_{c+1}$, for $c = K-1, \dots, 1$.

  • Method 1 relies heavily on the choice among the various approaches in cluster analysis to define the $Z_c$’s, while there is no such issue in Method 2.

  • In practical implementation, Method 1 is computationally efficient in deriving the $Z_c$’s.

  • While we recommend using the prediction accuracy rate to pick the best clustering, other criteria are available. For example, the prediction error sum of squares, the Akaike Information Criterion, or the Bayesian Information Criterion can all be used for selecting the best clustering.

We will illustrate the use of the above clustering approaches in conjunction with the OPR, CCWM and WMPR models in the next section.

4 Numerical Examples

In this section, we report the results of applying OPR, CCWM, WMPR models in conjunction with Methods 1 and 2 on data sets from six divisions of the 2018 FRC Detroit Championships. The six divisions with associated number of robots and matches are as shown in Table 1.

Table 1: 2018 FRC Detroit Championships information on robots and matches.

We use as training data results from the qualifying stage in each division to fit models and compute prediction accuracy rates based on win/loss using the cross-validation criterion. We use as test data results from the playoff stage in each division and compute the prediction accuracy rates using the robot strengths from the models fitted with data from the qualifying stage. We fit nine models: the traditional OPR, CCWM, WMPR, clustering with Method 1 OPR-1, CCWM-1, WMPR-1, and clustering with Method 2 OPR-2, CCWM-2, WMPR-2. Table 2 shows the prediction accuracy rate on the qualifying matches data sets across various divisions.

Table 2: Prediction Accuracy Rate on qualifying matches in various divisions.

The prediction accuracy of the clustering methods is significantly better than that of the corresponding traditional approaches. Across models, WMPR-1 provides the best prediction accuracy, while WMPR results are the weakest. The story, however, differs when we look at the data set from the playoff stage, as shown in Table 3. In the playoff stage, the prediction accuracy rates are significantly lower, with the lowest rates across all models in the Darwin division. The highest rates across most models are in the Carson division. While the lowest prediction accuracy rate on the qualifying stage data set is at least 75% for the clustering methods across divisions, the prediction accuracy rate on the playoffs data set is no higher than 76%.

Table 3: Prediction Accuracy Rate on test data in various divisions.

While the prediction accuracy rates are generally low for the playoff stage, the clustering approaches show promise versus their associated traditional models. The low rates are likely due to the fact that the 32 robots in the playoff matches may not actually have interacted with each other in the qualifying rounds, whereas we are using strengths computed from the qualifying matches, which account for each robot interacting with at most 50 other robots. The discrepancy in the prediction accuracy rates is particularly pronounced for the WMPR models, where the robot strengths or ratings are computed relative to each other rather than as a robot’s intrinsic strength. Across models, OPR-2 has the best rates, followed by OPR, which is lower than OPR-2 in one division. Save for one division each, both CCWM-1 and CCWM-2 have higher rates than CCWM. WMPR-1 matched WMPR in all divisions except the Tesla division, where it significantly outperformed WMPR. Note that it makes sense to compare the clustering approaches against the traditional models, as they use the same input data structure.

Generally, the WMPR based models have the lowest prediction accuracy rates on the playoff matches. We surmise that, besides the issue on relative strengths of robots mentioned earlier, this is also due to two possibilities: (1) with fewer observations, the WMPR models are generally over-fitting; and (2) the 2018 FRC format did not allow for much negative effect between alliances, i.e., the strength of a robot does not have much negative contribution to the opposing alliance, which is evidenced by the fact that both OPR and CCWM based models outperform WMPR.

5 Robot Strengths and Ranking Stability

We propose using Kendall’s rank correlation coefficient to determine the stability of robot strengths as measured for successive number of matches per robot. As the rank correlation coefficient ranges from -1 (perfect discordance) to +1 (perfect concordance) for two sets of robot strengths, we hypothesize that as the number of matches increases in a tournament, the rank correlation should approach 1.

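Let $X^{(k)}$ and $Y^{(k)}$ be appropriately defined matrices for any of the OPR, CCWM, or WMPR models representing the first $M_k$ matches, i.e., those by which each robot has played $k$ matches, for $k = 6, \dots, m$. We assume a minimum of six matches for each robot is required for the modeling to be meaningful. Suppose $\hat{\beta}^{(k)}$ is the estimated strength based on the first $M_k$ matches, i.e., $\hat{\beta}^{(k)}$ is the LSE for $\beta$ such that $Y^{(k)} = X^{(k)}\beta + \varepsilon$.

Given $\hat{\beta}^{(k-1)}$ and $\hat{\beta}^{(k)}$, Kendall’s rank correlation coefficient is computed as:

$$\tau_k = \frac{2}{K(K-1)} \sum_{1 \le i < j \le K} \mathrm{sgn}\big(\hat{\beta}_i^{(k-1)} - \hat{\beta}_j^{(k-1)}\big)\, \mathrm{sgn}\big(\hat{\beta}_i^{(k)} - \hat{\beta}_j^{(k)}\big),$$

where $\mathrm{sgn}(u) = 1$ if $u > 0$, $0$ if $u = 0$, and $-1$ if $u < 0$. Using various models of robot strengths, we expect to see diminishing returns on Kendall’s rank correlation as the number of matches increases. We found such evidence using data sets from six divisions of the 2018 FRC Detroit Championships.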
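For a small worked case, Kendall's rank correlation can be computed directly from its pairwise definition. The two strength vectors below are hypothetical, and `kendall_tau` is an illustrative helper implementing the tau-a form (no tie correction) rather than a library routine.

```python
from itertools import combinations

def sign(u):
    """Return 1, 0, or -1 according to the sign of u."""
    return (u > 0) - (u < 0)

def kendall_tau(a, b):
    """Kendall's tau-a between two equally long sequences of strengths."""
    pairs = list(combinations(range(len(a)), 2))
    concordance = sum(sign(a[i] - a[j]) * sign(b[i] - b[j]) for i, j in pairs)
    return concordance / len(pairs)

# Hypothetical strengths of five robots after 9 and after 10 matches each.
after_9 = [21.0, 15.5, 12.0, 9.0, 4.5]
after_10 = [20.0, 16.0, 8.5, 10.0, 5.0]

tau = kendall_tau(after_9, after_10)   # 0.8: one of the ten pairs swaps order
```

Only the third and fourth robots trade places between the two estimates, so nine pairs are concordant and one is discordant, giving (9 - 1) / 10 = 0.8.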

We implemented three models, OPR, CCWM, and WMPR, on the qualifying stage of the Detroit data sets. With $m = 10$ and $k = 10, 9, \dots, 6$, we apply the models to data from the first 10 down to the first 6 matches per robot and compute the rank correlations of robot strengths from data sets with successively fewer matches. Tables 4, 5, and 6 show the rank correlation for each model in each division.

Table 4: Rank correlation of robot strengths using the OPR model on data from increasing number of matches in the qualifying stage.
Table 5: Rank correlation of robot strengths using the CCWM model on data from increasing number of matches in the qualifying stage.
Table 6: Rank correlation of robot strengths using the WMPR model on data from increasing number of matches in the qualifying stage.

With a few exceptions, the rank correlation of robot strengths estimated from an incremental number of matches generally increases with the number of matches, as we hypothesized. The rank correlations from OPR and CCWM are over 90%, while those of WMPR are close to 90%, for strengths estimated from 10 matches versus those from the first 9 matches. We also observe that the rank correlations based on OPR and CCWM seem to be saturated at 9 versus 8 games, as we see minimal changes relative to those of 10 versus 9 games. With only half the observations of OPR and CCWM, the rank correlation of WMPR does not seem to be saturated at 10 games versus 9, implying that more matches might be needed to stabilize the robot rankings by strength.

6 Conclusion

We proposed in this study a clustering approach to measuring the strengths of robots participating in FIRST Robotics Competitions. With prediction accuracy rates used to pick the optimal number of clusters, the clustering approach shows promise, compared to the associated traditional models of measurement, in reducing over-fitting and improving prediction accuracy on data from the 2018 FRC Detroit Championships. In the future, we would like to further refine the clustering approaches using other criteria such as AIC, BIC, rank correlation, or prediction error sum of squares. For now, the robot strengths measured from our clustering approach in conjunction with one of the traditional measurement methods can be used for alliance selection.

On the number of matches required to achieve stability in the estimates of robot strengths, we observed that the answer is sensitive to the measurement method and likely to the amount of observed data. With the OPR and CCWM models, the rank correlation across successive increases in matches seems to be saturated at 9 games, while we did not observe saturation at 10 games for WMPR. Further studies with another data set, perhaps from the 2018 FRC Houston Championship, may be needed for more conclusive results.

We note that the current approaches to measuring robot strengths do not consider the individual characteristics of robots. For example, the speed or height of a robot may be a factor in a robot’s contribution to an alliance’s score, so the models of strengths may improve by incorporating individual robot characteristics. However, FRC games change every year, so models that incorporate robot characteristics will have to be developed each year according to the game dynamics.

Finally, we have a few directions to consider for future work. One constant factor each year that affects the final outcome of matches is the notion of penalty scores gained from opposing robots violating game rules. The penalty decision and points gained are subjectively decided by referees, so excluding the penalty points in the input data may improve the prediction accuracy. Another potential extension of this study is to address the prediction accuracy on the playoff matches using strengths computed from the qualifying matches. While we know that the relative robot strengths from the WMPR based models likely contributed to the issue, we also need to find out what is causing significantly lower accuracy rates from OPR and CCWM based models as well. The best of three matches format in the playoff stage and a fourth reserved robot not playing probably also contributed to the change in dynamics. We hope to study this in the future with historical FRC data across years.