Resolving Marker Pose Ambiguity by Robust Rotation Averaging with Clique Constraints

09/26/2019 · by Shin-Fang Ch'ng, et al.

Planar markers are useful in robotics and computer vision for mapping and localisation. Given a detected marker in an image, a frequent task is to estimate the 6DOF pose of the marker relative to the camera, which is an instance of planar pose estimation (PPE). Although there are mature techniques, PPE suffers from a fundamental ambiguity problem, in that there can be more than one plausible pose solution for a PPE instance. Especially when localisation of the marker corners is noisy, it is often difficult to disambiguate the pose solutions based on reprojection error alone. Previous methods choose between the possible solutions using heuristic criteria, or simply ignore ambiguous markers. We propose to resolve the ambiguities by examining the consistencies of a set of markers across multiple views. Our specific contributions include a novel rotation averaging formulation that incorporates long-range dependencies between possible marker orientation solutions that arise from PPE ambiguities. We analyse the combinatorial complexity of the problem, and develop a novel lifted algorithm to effectively resolve marker pose ambiguities without discarding any marker observations. Results on real and synthetic data show that our method is able to handle highly ambiguous inputs, and provides more accurate and/or complete marker-based mapping and localisation.


I Introduction

In many robotic vision pipelines, fiducial markers are often employed to simplify feature extraction. In particular, planar markers [25, 7, 8, 9, 17, 11], which are designed to be easily detected and associated across images, find extensive use in laboratory and commercial settings (factories, warehouses, mines, etc.). In applications that perform planar marker-based SfM or SLAM [19, 14, 5, 13], there is a basic need to estimate the 6DOF pose of an observed marker relative to the camera coordinate frame. This is often solved as a special case of planar pose estimation (PPE), which determines the relative pose between a plane of known dimensions and its projection onto the image [16, 18, 3].

While in theory 6DOF pose can be determined uniquely from four non-collinear but co-planar points, the situation is less clear in non-ideal conditions where perspective effects are not apparent, e.g., when the imaged marker is small or the marker is at a distance significantly larger than the focal length. In such conditions there is a two-fold rotational ambiguity that corresponds to an unknown reflection of the plane about the z-axis of the camera [16, 18, 3]. For one observed planar marker (specifically its four corners), state-of-the-art PPE methods [18, 3] may return two physically plausible pose solutions, with one of them being the correct one (i.e., the one closer to the ground truth pose).

Fig. 1 shows an example from the dataset of [5]. Note that the two solutions returned by PPE can be very different; it is thus unwise to arbitrarily choose one of the two poses, or to take the midpoint of the two solutions as the pose estimate.

A common way to disambiguate the two returned poses $\mathbf{T}^{(1)}$ and $\mathbf{T}^{(2)}$ is to compute the reprojection error of each pose

$$ r(\mathbf{T}) = \sum_{j=1}^{4} \left\| \mathbf{u}_j - \pi\!\left( \mathbf{X}_j; \mathbf{K}, \mathbf{T} \right) \right\|_2, \qquad (1) $$

where $\mathbf{X}_j$ and $\mathbf{u}_j$ are the reference 3D position and 2D observation of the $j$-th of the 4 corners of the detected marker, $\mathbf{K}$ is the camera intrinsic matrix, and $\pi(\cdot)$ projects $\mathbf{X}_j$ onto the image with camera pose $\mathbf{T}$. The PPE pose with the lower reprojection error is then selected.

However, comparing reprojection errors is not foolproof [26, 13]: if the corner localisation is noisy, $r(\mathbf{T}^{(1)})$ and $r(\mathbf{T}^{(2)})$ can be very close. In fact, the correct solution can have the higher reprojection error; see Fig. 1.

In practice, marker pose ambiguity occurs regularly [14]. Fig. 2(a) is the histogram of the reprojection error ratio

$$ \rho = \frac{\min\!\left( r(\mathbf{T}^{(1)}), \; r(\mathbf{T}^{(2)}) \right)}{\max\!\left( r(\mathbf{T}^{(1)}), \; r(\mathbf{T}^{(2)}) \right)} \qquad (2) $$

of the PPE-derived poses for all the markers detected in sequence Hotel2 (H2) from [4]. About 25% of the PPE solutions are considered ambiguous (ratio value above the 0.6 threshold of [14]).
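To make the heuristic concrete, the following is a minimal sketch of the reprojection error (1) and ratio test (2), assuming OpenCV's cv2.projectPoints as the projection function; the function and variable names are illustrative, not from the authors' implementation.

```python
import numpy as np
import cv2


def reproj_error(rvec, tvec, corners_3d, corners_2d, K):
    """Sum of reprojection residual norms over the 4 marker corners, cf. Eq. (1)."""
    proj, _ = cv2.projectPoints(corners_3d, rvec, tvec, K, None)
    return np.linalg.norm(proj.reshape(-1, 2) - corners_2d, axis=1).sum()


def ratio_test(pose1, pose2, corners_3d, corners_2d, K, tau=0.6):
    """Ratio test of [14], cf. Eq. (2): return the lower-error pose if the
    ratio is below tau, else None (ambiguous detection, discarded)."""
    e1 = reproj_error(*pose1, corners_3d, corners_2d, K)
    e2 = reproj_error(*pose2, corners_3d, corners_2d, K)
    if min(e1, e2) / max(e1, e2) < tau:
        return pose1 if e1 <= e2 else pose2
    return None
```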

Fig. 1: (a) A detected marker with bounding box from a frame in the dataset of [5]. (b) The two poses $\mathbf{T}^{(1)}$ (yellow) and $\mathbf{T}^{(2)}$ (blue) returned by PPE [3] have reprojection errors $r(\mathbf{T}^{(1)})$ and $r(\mathbf{T}^{(2)})$ respectively. Though $\mathbf{T}^{(1)}$ has the lower error, it is an incorrect pose, cf. the ground truth pose (green).

While current theory and algorithms for PPE [18, 3] have characterised the ambiguity issue and are able to compute all physically plausible solutions stably, using the PPE outputs under ambiguity, particularly in marker-based SfM or SLAM pipelines, remains a fundamental challenge. In the following, we further survey efforts to deal with marker pose ambiguity, before outlining the proposed solution.

(a) Histogram of error ratio (2).
(b) Histogram of weight ratio (21).
Fig. 2: Histogram of reprojection error ratio (2) and weight ratio (21) from proposed method (Sec. IV-C) for all markers detected in Hotel2 [4].

I-A Related work

Tanaka et al. [22, 21] modified the conventional planar marker design to directly incorporate orientation information. They attached two one-dimensional moiré patterns onto the marker to obtain appearance variation for pose disambiguation, as well as lenticular lenses that introduce 3D deviations to the marker surface. Though this largely alleviates the ambiguity problem, the marker fabrication is non-trivial.

For planar target camera tracking, a filtering method with a well-tuned camera motion model [26, 24] can be exploited to disambiguate the marker poses. However, this assumes temporal continuity in the images, which may not be valid in SfM with wide baseline images; moreover, there are no mature filtering methods for marker SLAM. Jin et al. [12] showed improved marker pose estimation accuracy by fusing depth information. However, this requires an RGBD camera.

Marker-based SfM/SLAM is an active research area [19, 15, 14, 5, 13]. Marker ambiguity is not dealt with explicitly in [15, 19, 5], though [5] combined feature-based SfM with marker-based SfM. Munoz-Salinas et al. applied the ratio test of [3] in their marker-based SfM [14] and SLAM pipeline [13]. Basically, if the ratio (2) is below a threshold (default is 0.6 [14]), the PPE solution with the lower reprojection error is used in subsequent SfM/SLAM processing; else, the marker detection is discarded. A weakness of this approach is the sensitivity to the threshold. If it is too low, many marker detections will be excluded, leading to data wastage or even SfM/SLAM failure. On the other hand, a high threshold risks using bad marker poses (recall that the pose with the lower reprojection error may not be the correct one) for SfM/SLAM. Sec. VI will demonstrate this shortcoming.

I-B Our contributions

Unlike previous works that have used a per-marker approach to resolve marker ambiguity, we exploit multi-view constraints for disambiguation. From the input marker detections, we first construct a multigraph of relative rotation measurements, which incorporates all PPE pose ambiguities. Then, we formulate a novel rotation averaging problem with clique constraints that respects consistency (details later) between subsets of relative pose measurements. We examine the combinatorial complexity of the new problem, and develop a lifted optimisation method to solve it efficiently. Then, a series of small maximum weighted clique problems is solved to make the final pose selections. Our method allows all valid PPE pose combinations to be examined, and leads to more accurate and/or complete marker-based SfM.

II Problem formulation

Consider $N$ input images $I_1, \dots, I_N$ that observe a set of markers of known sizes in a static scene. We assume calibrated cameras. A standard marker detection and identification algorithm [1] is applied to each image. Denote by

$$ \mathcal{D}_i = \left\{ m_k \;\middle|\; \text{marker } m_k \text{ is detected in } I_i \right\} \qquad (3) $$

the set of markers detected in $I_i$. Using a PPE technique [18, 3] on the corners of $m_k$ detected in $I_i$, the marker-to-camera (M2C) relative pose of $m_k$ to $I_i$ is computed, which can potentially yield two solutions

$$ \mathbf{T}^{(1)}_{i,k} = \left( \mathbf{R}^{(1)}_{i,k}, \mathbf{t}^{(1)}_{i,k} \right) \quad \text{and} \quad \mathbf{T}^{(2)}_{i,k} = \left( \mathbf{R}^{(2)}_{i,k}, \mathbf{t}^{(2)}_{i,k} \right). \qquad (4) $$

Without loss of generality, we assume that each marker observation has exactly two relative pose solutions. Note that the pose ambiguity is due to orientation ambiguity, thus the translation component is the same, i.e.,

$$ \mathbf{t}^{(1)}_{i,k} = \mathbf{t}^{(2)}_{i,k}. \qquad (5) $$

Given the set of all M2C relative pose measurements

$$ \mathcal{T} = \left\{ \mathbf{T}^{(1)}_{i,k}, \mathbf{T}^{(2)}_{i,k} \;\middle|\; i = 1, \dots, N, \; m_k \in \mathcal{D}_i \right\}, \qquad (6) $$

our overall aim is SfM, i.e., to find the absolute poses of the markers and cameras. To do so, pose ambiguity must be resolved, i.e., for each $(i, k)$ such that $m_k \in \mathcal{D}_i$, choose either $\mathbf{T}^{(1)}_{i,k}$ or $\mathbf{T}^{(2)}_{i,k}$ for SfM computations.

Previous pipelines [14, 13] make the choice using per-marker heuristics, or discard the marker observation. This "preprocessing" yields the reduced measurement set

$$ \tilde{\mathcal{T}} = \left\{ \tilde{\mathbf{T}}_{i,k} \right\} \subset \mathcal{T}, \qquad (7) $$

where each $\tilde{\mathbf{T}}_{i,k}$ is either $\mathbf{T}^{(1)}_{i,k}$ or $\mathbf{T}^{(2)}_{i,k}$, and detections deemed too ambiguous are excluded. The reduced measurement set is then subjected to the rest of the SfM/SLAM pipeline. Our new method exploits multi-view consistency to disambiguate the PPE marker poses in a way that avoids premature decisions; details follow.

III Multigraph with rotational ambiguity

Since the ambiguity lies in the orientations, it is natural to model the ambiguity using only the M2C relative rotations

$$ \left\{ \mathbf{R}^{(1)}_{i,k}, \mathbf{R}^{(2)}_{i,k} \;\middle|\; i = 1, \dots, N, \; m_k \in \mathcal{D}_i \right\}. \qquad (8) $$

To this end, we construct a multigraph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the vertices $\mathcal{V}$ are the set of markers, and the edges $\mathcal{E}$ indicate covisibility between the markers. More specifically, if $m_k$ and $m_l$ are detected in $I_i$, four edges

$$ e^{11}_{k,l,i}, \quad e^{12}_{k,l,i}, \quad e^{21}_{k,l,i}, \quad e^{22}_{k,l,i} \qquad (9) $$

connect vertices $m_k$ and $m_l$ in $\mathcal{G}$; assuming $k < l$, the edges correspond to the marker-to-marker (M2M) relative rotations

$$ \mathbf{R}^{ab}_{k,l,i} = \left( \mathbf{R}^{(b)}_{i,l} \right)^{T} \mathbf{R}^{(a)}_{i,k}, \qquad a, b \in \{1, 2\}. \qquad (10) $$

Fig. 3 shows an example. Since multiple edges can connect two vertices, $\mathcal{G}$ is a multigraph. We summarise (9) and (10) as

$$ e^{ab}_{k,l,i} \;\longleftrightarrow\; \mathbf{R}^{ab}_{k,l,i}, \qquad (11) $$

where $ab$ is a string composed of the two binary indicators $a$ and $b$. The edges in $\mathcal{G}$ are undirected; if $k > l$, the edge $e^{ab}_{k,l,i}$ has the associated M2M relative rotation

$$ \mathbf{R}^{ab}_{k,l,i} = \left( \mathbf{R}^{ba}_{l,k,i} \right)^{T}. \qquad (12) $$

Thus, in our notation,

$$ e^{ab}_{k,l,i} \equiv e^{ba}_{l,k,i}. \qquad (13) $$

The set of all edges (without repetitions) is thus

$$ \mathcal{E} = \left\{ e^{ab}_{k,l,i} \;\middle|\; i = 1, \dots, N; \; m_k, m_l \in \mathcal{D}_i, \; k < l; \; a, b \in \{1, 2\} \right\}, \qquad (14) $$

and, similarly, the set of unique M2M relative rotations is

$$ \left\{ \mathbf{R}^{ab}_{k,l,i} \;\middle|\; e^{ab}_{k,l,i} \in \mathcal{E} \right\}. \qquad (15) $$

The existence of four M2M relative rotations per covisible marker pair is a direct consequence of the ambiguity in marker pose estimation, and the string $ab$ selects a particular combination of M2C relative rotations to derive the M2M relative rotation.

Fig. 3: Multigraph and consistent cliques. (a) The scene has 4 markers captured in 3 images. All markers were detected in one of the images, while only subsets were detected in the other two. (b) Multigraph $\mathcal{G}$ with the edges labelled following (9). A pair of markers that is covisible in two of the images is connected by 8 edges. (c) Two consistent cliques (red and blue) for one of the images.

Note that our multigraph construction method is a significant extension of that in [14]: our multigraph incorporates all ambiguous marker poses, whereas [14] builds its graph from the preprocessed data (7), which contains no ambiguities.
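For concreteness, a minimal sketch of this construction is given below, under the notation of (9) and (10); the input layout (each detection carrying its two PPE rotation matrices) and all names are our assumptions.

```python
import numpy as np


def build_multigraph(detections):
    """detections: dict (image_id, marker_id) -> (R1, R2), the two PPE
    rotations of Eq. (8) as 3x3 arrays.
    Returns edges (k, l, i, a, b, R_ab) with k < l, cf. Eqs. (9)-(10)."""
    by_image = {}
    for (i, k), rots in detections.items():
        by_image.setdefault(i, []).append((k, rots))
    edges = []
    for i, dets in by_image.items():
        for x in range(len(dets)):
            for y in range(x + 1, len(dets)):
                (k, Rk), (l, Rl) = sorted([dets[x], dets[y]], key=lambda d: d[0])
                for a in (1, 2):
                    for b in (1, 2):
                        # M2M relative rotation composed from the selected
                        # M2C rotations, cf. Eq. (10).
                        R_ab = Rl[b - 1].T @ Rk[a - 1]
                        edges.append((k, l, i, a, b, R_ab))
    return edges
```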

Iii-a Consistent cliques

We assume that the multigraph $\mathcal{G}$ is connected, i.e., there is a path that connects every pair of vertices (markers) in $\mathcal{G}$.

Definition 1

(Consistent clique) Given multigraph $\mathcal{G}$ as defined above, a consistent clique for image $I_i$ is a fully connected subgraph $\mathcal{C}_i = (\mathcal{V}_i, \mathcal{E}_i)$ of $\mathcal{G}$ such that

  • $\mathcal{V}_i = \mathcal{D}_i$;

  • every two vertices $m_k, m_l \in \mathcal{V}_i$ are connected by exactly one edge $e^{ab}_{k,l,i}$, where $ab$ is one of $\{11, 12, 21, 22\}$; and

  • for every two vertices $m_k, m_l$ that are connected to a vertex $m_j$, the associated edges $e^{ab}_{j,k,i}$ and $e^{a'c}_{j,l,i}$ satisfy the condition $a = a'$.

Fig. 3 provides examples. Intuitively, a consistent clique for image $I_i$ corresponds to a set of M2M relative rotations that are composed using a constant selection of one of the two M2C relative poses for each marker detected in $I_i$.

Since there are multiple valid combinations of constant M2C relative pose selections, there are multiple consistent cliques for an image. Assuming that $P$ markers are detected in each image, there are $2^P$ consistent cliques per image. For $N$ images, there are thus $2^{PN}$ unique combinations of consistent cliques across the images.
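The combinatorics can be illustrated directly: each consistent clique of an image corresponds to one selection of a PPE solution per detected marker, so enumerating the selections enumerates the cliques. A tiny sketch, with hypothetical marker ids:

```python
from itertools import product

detections = ["m1", "m2", "m3"]  # P = 3 markers detected in one image
cliques = list(product((1, 2), repeat=len(detections)))
print(len(cliques))              # 2**3 = 8 consistent cliques for this image
# e.g., (1, 2, 1) selects PPE solution 1 for m1, solution 2 for m2, etc.
```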

IV Disambiguation with rotation averaging

Based on the multigraph, our technique resolves the ambiguities by first solving a novel rotation averaging formulation, then, based on the averaging results, building and solving a maximum weighted clique problem. The key outcome of this step is marker pose disambiguation; Sec. V will incorporate this step into a marker-based SfM pipeline.

Iv-a Rotation averaging with clique constraints

While standard rotation averaging is defined over a simple graph of relative rotations [10, 2], extending the formulation to a multigraph of relative rotations is straightforward, and existing algorithms (we used [2]) can be applied with minor adjustments. Let $\mathbf{R}_1, \dots, \mathbf{R}_K$ be the absolute rotations of the markers. A rotation averaging problem over multigraph $\mathcal{G}$ is

$$ \min_{\mathbf{R}_1, \dots, \mathbf{R}_K} \; \sum_{e^{ab}_{k,l,i} \in \mathcal{E}} \psi\!\left( d\!\left( \mathbf{R}^{ab}_{k,l,i}, \; \mathbf{R}_l \mathbf{R}_k^{T} \right) \right), \qquad (16) $$

where $\psi$ is a robust norm and $d(\cdot, \cdot)$ measures the distance between two rotations. The motivation behind (16) is to attempt to identify the incorrect poses from PPE as the contributors to outlying measurements in the averaging task.

However, our tests (Sec. VI) suggest that this approach is ineffective for disambiguation, most probably because (16) does not enforce clique consistency (Def. 1). Thus, error terms that are regarded as inliers could correspond to choosing both PPE poses for the same marker detection.

To enforce clique consistency in rotation averaging, we introduce a set of binary indicator variables

$$ \Theta = \left\{ \theta_{i,k} \in \{1, 2\} \;\middle|\; i = 1, \dots, N, \; m_k \in \mathcal{D}_i \right\}, \qquad (17) $$

where the setting $\theta_{i,k} = 1$ implies selecting M2C relative rotation $\mathbf{R}^{(1)}_{i,k}$ for the detection of $m_k$ in $I_i$, while $\theta_{i,k} = 2$ implies selecting $\mathbf{R}^{(2)}_{i,k}$. We then formulate the clique-constrained rotation averaging problem

$$ \min_{\mathbf{R}_1, \dots, \mathbf{R}_K, \; \Theta} \; \sum_{e^{ab}_{k,l,i} \in \mathcal{E}} \mathbb{1}\!\left( \theta_{i,k} = a \right) \mathbb{1}\!\left( \theta_{i,l} = b \right) \psi\!\left( d\!\left( \mathbf{R}^{ab}_{k,l,i}, \; \mathbf{R}_l \mathbf{R}_k^{T} \right) \right), \qquad (18) $$

where $\mathbb{1}(\cdot)$ is the indicator function. Intuitively, $\Theta$ selects the M2C relative rotations to compose the M2M relative rotations in a consistent way. Searching over $\Theta$ thus allows different consistent cliques in all images to be examined. Finally, since the absolute rotations $\mathbf{R}_k$ are shared across images, multi-view consistency is exploited to choose the best combinations of the PPE relative rotations.

IV-B Efficient algorithm using lifting approach

A naive method to solve (18) is to enumerate $\Theta$, and for each instantiation, collect the non-zero terms in (18) and solve the resulting rotation averaging problem. Then, return the $\Theta$ with the lowest optimised error as the disambiguation decision. Since there are $2^{PN}$ possible instantiations of $\Theta$ (assuming $P$ markers seen per image across $N$ images), this is infeasible.

To enable an efficient algorithm for (18), we apply the lifting approach [20]. First, we relax the indicator variables to be real-valued, $\theta_{i,k} \in \mathbb{R}$, and replace the indicator functions in (18) with a sigmoid function

$$ s(\theta) = \frac{1}{1 + e^{-\theta}}, \qquad (19) $$

which yields the "smoothed" version of (18)

$$ \min_{\mathbf{R}_1, \dots, \mathbf{R}_K, \; \Theta} \; \sum_{e^{ab}_{k,l,i} \in \mathcal{E}} s_a(\theta_{i,k}) \, s_b(\theta_{i,l}) \, \psi\!\left( d\!\left( \mathbf{R}^{ab}_{k,l,i}, \; \mathbf{R}_l \mathbf{R}_k^{T} \right) \right), \qquad (20) $$

where $s_1(\theta) = s(\theta)$ and $s_2(\theta) = 1 - s(\theta)$.

Intuitively, the contribution of an error term in (20) is now weighted according to the estimated correctness of the corresponding M2C relative poses that define the error term.

Problem (20) can be solved using an iterative non-linear optimiser (e.g., fmincon in MATLAB). We initialise $\mathbf{R}_1, \dots, \mathbf{R}_K$ via a minimum spanning tree on $\mathcal{G}$, choosing the M2M relative rotations with the lower combined reprojection errors for chaining, and $\Theta$ is set to reflect these choices. As we will show in Sec. VI, our method is not biased by such an initialisation, since it is capable of providing more accurate disambiguation than comparing reprojection errors alone.
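A minimal sketch of the lifted objective (20) is given below, using scipy.optimize.minimize in place of fmincon; the edge layout, the Geman-McClure-style robust norm, and all names are our assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation


def geodesic(Ra, Rb):
    """Angular distance (radians) between two rotation matrices."""
    c = (np.trace(Ra.T @ Rb) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))


def lifted_cost(x, edges, num_markers, c=0.1):
    """Sigmoid-weighted, robustified rotation averaging cost, cf. Eq. (20).
    edges: list of (k, l, det_k, det_l, a, b, R_ab), where det_k/det_l index
    the theta entries of the corresponding (image, marker) detections."""
    rotvecs = x[:3 * num_markers].reshape(num_markers, 3)
    thetas = x[3 * num_markers:]
    R = [Rotation.from_rotvec(v).as_matrix() for v in rotvecs]
    s = 1.0 / (1.0 + np.exp(-thetas))              # sigmoid relaxation, Eq. (19)
    total = 0.0
    for (k, l, det_k, det_l, a, b, R_ab) in edges:
        w_k = s[det_k] if a == 1 else 1.0 - s[det_k]
        w_l = s[det_l] if b == 1 else 1.0 - s[det_l]
        r = geodesic(R_ab, R[l] @ R[k].T)
        total += w_k * w_l * (r**2 / (r**2 + c))   # robust norm (assumed form)
    return total


# x0 stacks the spanning-tree rotation initialisation (as rotation vectors)
# and thetas reflecting the lower-reprojection-error choices:
# res = minimize(lifted_cost, x0, args=(edges, num_markers), method="L-BFGS-B")
```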

IV-C Selecting the marker poses

Let $\Theta^* = \{\theta^*_{i,k}\}$ be the optimised relaxed indicator variables from solving (20). For the same sequence used in Fig. 2(a), we plot in Fig. 2(b) the histogram of the ratios

$$ \rho_{i,k} = \frac{\min\!\left( s(\theta^*_{i,k}), \; 1 - s(\theta^*_{i,k}) \right)}{\max\!\left( s(\theta^*_{i,k}), \; 1 - s(\theta^*_{i,k}) \right)} \qquad (21) $$

for all $(i, k)$. Similar to (2), the ratio (21) indicates how "disambiguable" the PPE poses are for each marker detection (smaller ratios are better), but now based on the value of $\theta^*_{i,k}$. Although $s(\theta^*_{i,k})$ is not discrete, the percentage of marker poses that are still ambiguous is now significantly reduced.

To conclusively select one PPE pose per detected marker, a simple solution would be to threshold each $s(\theta^*_{i,k})$; however, we would like to avoid such a per-marker decision. To this end, for each image $I_i$ we construct the multigraph $\mathcal{G}_i = (\mathcal{V}_i, \mathcal{E}_i)$, where $\mathcal{V}_i = \mathcal{D}_i$, and

$$ \mathcal{E}_i = \left\{ e^{ab}_{k,l,i} \in \mathcal{E} \;\middle|\; m_k, m_l \in \mathcal{D}_i, \; k < l \right\}. \qquad (22) $$

Note that $\mathcal{G}_i$ is a submultigraph of $\mathcal{G}$, and there exist consistent cliques in $\mathcal{G}_i$ (see Sec. III-A). Further, each edge $e^{ab}_{k,l,i}$ in $\mathcal{G}_i$ has the weight

$$ w^{ab}_{k,l,i} = s_a(\theta^*_{i,k}) \, s_b(\theta^*_{i,l}). \qquad (23) $$

Given $\mathcal{G}_i$, define edge indicator variables $y^{ab}_{k,l,i} \in \{0, 1\}$ for each edge in $\mathcal{E}_i$, and the maximum weighted clique (MWC) problem

$$ \max_{\{y^{ab}_{k,l,i}\}} \; \sum_{e^{ab}_{k,l,i} \in \mathcal{E}_i} w^{ab}_{k,l,i} \, y^{ab}_{k,l,i} \quad \text{s.t. the selected edges form a consistent clique in } \mathcal{G}_i. $$

Basically, the aim of the MWC problem is to find a consistent clique in $\mathcal{G}_i$ with the largest total edge weight. Though MWC is intractable in general [23], each instance here is small, since the number of detected markers per image is small.

We use the efficient clique solver of [6] on each $\mathcal{G}_i$. The optimised $\{y^{ab}_{k,l,i}\}$ provides a consistent selection of the PPE poses for all markers detected in $I_i$. Specifically, for each $m_k$ detected in $I_i$, find an incident edge with nonzero $y^{ab}_{k,l,i}$, and set $\tilde{\mathbf{T}}_{i,k} = \mathbf{T}^{(1)}_{i,k}$ if the indicator for $m_k$ on that edge is 1, or $\tilde{\mathbf{T}}_{i,k} = \mathbf{T}^{(2)}_{i,k}$ otherwise.
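Since every consistent clique of $\mathcal{G}_i$ corresponds to one binary PPE choice per marker detected in $I_i$, and few markers are detected per image, the per-image selection can even be sketched as a brute-force enumeration; the dedicated solver of [6] used in the paper is more efficient, and the names below are illustrative.

```python
from itertools import product


def select_poses(markers, s_weight):
    """markers: marker ids detected in image I_i.
    s_weight: dict marker_id -> sigmoid weight s(theta*) for PPE solution 1.
    Returns dict marker_id -> selected PPE solution (1 or 2)."""
    best_choice, best_score = None, -1.0
    for choice in product((1, 2), repeat=len(markers)):
        score = 0.0
        # Sum the edge weights (23) of the induced consistent clique.
        for u in range(len(markers)):
            for v in range(u + 1, len(markers)):
                w_u = s_weight[markers[u]] if choice[u] == 1 else 1 - s_weight[markers[u]]
                w_v = s_weight[markers[v]] if choice[v] == 1 else 1 - s_weight[markers[v]]
                score += w_u * w_v
        if score > best_score:
            best_choice, best_score = choice, score
    return dict(zip(markers, best_choice))
```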

Algorithm 1 summarises the proposed method for marker pose disambiguation.

V Marker-based SfM pipeline

To carry out marker-based SfM using our marker pose disambiguation method, we largely follow the pipeline of the state-of-the-art MarkerMapper [14]. Briefly, a robust pose graph optimisation is first invoked on the resolved M2C relative poses (7) from Algorithm 1 to yield absolute marker poses; in our case, the absolute rotation component is initialised using the output from solving (20). Then, each camera pose is initialised using single pose averaging from the M2C poses, before all marker and camera poses are refined simultaneously by bundle adjustment on the observed corners of all detected markers. We refer to [14] for details of the SfM pipeline.

Input: M2C relative poses (6) with PPE ambiguity.
1: Construct multigraph $\mathcal{G}$ from the input (Sec. III).
2: Solve (20) based on $\mathcal{G}$ (Sec. IV-B).
3: for $i = 1, \dots, N$ do
4:     Solve the MWC problem on $\mathcal{G}_i$ (Sec. IV-C).
5:     Based on the optimised $\{y^{ab}_{k,l,i}\}$, select one of the two M2C poses for all markers in $\mathcal{D}_i$ (Sec. IV-C).
Output: One M2C relative pose per detected marker.
Algorithm 1: Method for marker pose disambiguation

VI Results

To assess the efficacy of the proposed marker pose disambiguation technique, we compared the following methods:


  • Reprojection error (M1): For each marker detection, select the PPE solution with the lower reprojection error.

  • Strict ratio test (M2): A threshold stricter than the default is applied on the reprojection error ratio (2) (see Sec. I-A for details).

  • Default ratio test (M3): The threshold of 0.6 is applied on the reprojection error ratio (the default setting in [14]).

  • Robust rotation averaging and post hoc clique consistency enforcement (M4): Solve (16) by IRLS [2], then use the IRLS-optimised weights for the error terms as inputs to our M2C pose selection method in Sec. IV-C.

  • Proposed method (Ours): As described in Sec. IV.

When applying the above disambiguation methods to perform marker-based SfM, we simply used them to preprocess the input marker detections, then executed the rest of the pipeline of MarkerMapper [14] (see Sec. V). All experiments were conducted on a machine with a 3.5 GHz CPU and 8 GB of RAM.

VI-A Experiments on hybrid data

VI-A1 Data generation

We used the ScanNet dataset [4], which contains a number of sequences with ground truth 6DOF camera poses and depth. A test sequence was created from an original sequence by warping a number of ArUco markers [9, 17], based on known (ground truth) M2C relative poses, onto parts of the images that correspond to planar surfaces; see the supplementary video (https://www.youtube.com/watch?v=LtwavEeCkQ4&t=) for a sample sequence. Using the ground truth camera absolute pose and the known M2C relative pose, the ground truth marker absolute pose is obtained by composition.

Seq (#markers, #frames) | Precision (%): M1 / M2 / M3 / M4 / Ours | # markers mapped: M1 / M2 / M3 / M4 / Ours | # cameras localised: M1 / M2 / M3 / M4 / Ours
B (3, 31) | 94.32 / 100 / 92.31 / 31.82 / 100 | 3 / 0 / 3 / 3 / 3 | 31 / 0 / 31 / 31 / 31
H1 (5, 41) | 80.68 / 100 / 82.61 / 22.16 / 100 | 5 / 0 / 5 / 5 / 5 | 41 / 0 / 40 / 41 / 41
O1 (7, 51) | 77.08 / 96.97 / 78.80 / 14.58 / 96.52 | 7 / 7 / 7 / 7 / 7 | 51 / 41 / 51 / 51 / 51
O2 (6, 91) | 92.64 / 100 / 98.95 / 37.94 / 99.41 | 6 / 4 / 6 / 6 / 6 | 91 / 46 / 91 / 91 / 91
H2 (14, 151) | 93.42 / 98.94 / 97.89 / 48.16 / 100 | 14 / 13 / 14 / 14 / 14 | 151 / 101 / 151 / 151 / 151
TABLE I: Precision in pose disambiguation on hybrid data. Each sequence is listed with its number of unique markers and number of frames.
Seq | Average marker pose error (deg, cm): M1 / M2 / M3 / M4 / Ours | Average camera pose error (deg, cm): M1 / M2 / M3 / M4 / Ours
B | (5.4, 11.7) / - / (6.3, 15.0) / (19.0, 37.5) / (2.3, 2.2) | (7.0, 15.9) / - / (11.9, 19.5) / (32.0, 10.0) / (0.8, 2.0)
H1 | (11.7, 13.0) / - / (12.5, 15.0) / (39.1, 26.3) / (3.3, 8.6) | (14.8, 27.5) / - / (17.6, 41.6) / (37.9, 28.8) / (5.0, 3.2)
O1 | (26.2, 30.3) / (15.2, 8.0) / (25.4, 29.0) / (55.3, 120.9) / (3.5, 4.3) | (17.3, 69.8) / (7.6, 16.0) / (19.2, 69.4) / (85.8, 49.7) / (5.7, 13.7)
O2 | (8.7, 6.6) / (4.4, 4.2) / (4.1, 2.6) / (28.0, 63.2) / (4.2, 2.4) | (6.2, 10.5) / (0.8, 2.4) / (17.4, 4.0) / (41.6, 40.1) / (1.3, 3.4)
H2 | (4.3, 5.1) / (7.7, 3.1) / (5.4, 5.5) / (20.3, 14.2) / (3.6, 4.9) | (4.3, 3.8) / (2.2, 2.3) / (3.3, 3.1) / (32.0, 10.0) / (3.4, 2.4)
TABLE II: SfM accuracy for different pose disambiguation methods on hybrid data; each cell is (rotation error in deg, translation error in cm). '-' denotes failed reconstruction.

VI-A2 Marker detection

Using the steps above, we generated five testing sequences from Bedroom (B), Hotel1 (H1), Hotel2 (H2), Office1 (O1) and Office2 (O2). We used [9] to detect, identify and localise the corners of each marker in each frame; see Table I for the number of frames and unique detected markers in each sequence. Though the markers were synthetically warped into the images, our analysis suggests that corner localisation suffered from errors of 1–7 pixels.

TABLE III: Qualitative reconstruction results for marker-based SfM methods M1, M3, M4, and Ours, and for the feature- and marker-based SfM method FM [5] (one column per method). Row 1: ece floor4 wall; Row 2: ece floor5 stairs; Row 3: cee night cw. For the marker-based methods, red = reconstructed reference marker, blue = reconstructed markers, green = estimated camera positions.
Fig. 4: Comparison of camera position error (relative to FM) of M1, M3, M4 and Ours.
Dataset | Mean err. (m): M1 / M3 / M4 / Ours | Median err. (m): M1 / M3 / M4 / Ours
ece floor4 wall | 5.28 / 2.72 / 20.95 / 2.56 | 5.35 / 2.03 / 18.09 / 2.12
ece floor5 stairs | 1.58 / 3.18 / 4.07 / 1.14 | 0.96 / 2.64 / 3.72 / 0.82
cee night cw | 30.21 / 34.79 / 75.57 / 19.06 | 19.25 / 24.21 / 76.42 / 10.12
TABLE IV: Mean and median camera position error (m), relative to FM.

VI-A3 Ground truth M2C pose selection

On the noisy corner localisations, PPE [3] is invoked, which yields two M2C relative poses for each detected marker. To decide the ground truth selection, we compute the angular difference between each PPE rotation $\mathbf{R}^{(c)}_{i,k}$, $c \in \{1, 2\}$, and the ground truth rotation $\mathbf{R}^{*}_{i,k}$ as

$$ \alpha^{(c)}_{i,k} = \angle\!\left( \left( \mathbf{R}^{(c)}_{i,k} \right)^{T} \mathbf{R}^{*}_{i,k} \right), \qquad (24) $$

where $\angle(\cdot)$ returns the rotation angle of its argument. The ground truth selection of the PPE poses is taken as the one with the lower angular difference $\alpha^{(c)}_{i,k}$.
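A minimal sketch of this selection rule, under the assumption that the angular difference in (24) is the geodesic angle between rotation matrices:

```python
import numpy as np


def angular_difference(R_est, R_gt):
    """Geodesic distance (degrees) between two rotation matrices."""
    c = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))


def ground_truth_choice(R1, R2, R_gt):
    """Return 1 or 2 depending on which PPE rotation is closer to R_gt."""
    return 1 if angular_difference(R1, R_gt) <= angular_difference(R2, R_gt) else 2
```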

VI-A4 Results

For the hybrid data experiment, we evaluated all the approaches on two main aspects; see the supplementary video for a demonstration of our pose disambiguation method.

Precision in pose disambiguation

For each testing sequence, precision in pose disambiguation is defined as

$$ \text{Precision} = \frac{\text{\# marker detections with correctly selected PPE pose}}{\text{\# marker detections}} \times 100\%. \qquad (25) $$

Table I shows that Ours generally has higher precision than the others. The fact that M4 (the control method) is much poorer than Ours shows that enforcing the proposed clique consistency is crucial for disambiguating the PPE poses. Amongst the per-marker disambiguation methods (M1–M3), M1 has the lowest precision, validating observations in previous works that comparing reprojection errors alone is not foolproof. Adding a ratio test to avoid decisions on cases that are too ambiguous helps to improve precision in M2 and M3. In particular, the precision of M2 is on par with Ours. However, as we show next, this gain by M2 comes at a cost.

Completeness and accuracy of SfM

To assess the effects of marker pose disambiguation on SfM, we evaluate

  • the number of markers mapped and cameras localised; and

  • the error (in deg and cm) of the marker and camera poses

estimated by marker-based SfM from the disambiguated PPE poses, in Tables I and II respectively. Although M2 is precise, it yields a much sparser map than the others; moreover, as it has pruned away many useful detections, there are insufficient data to allow accurate SfM. Using our pose disambiguation technique leads to more complete and accurate maps.

VI-B Real world dataset experiment

Testing was performed on sequences from [5]. We selected 3 indoor scenes with different difficulty levels: ece floor4 wall, ece floor5 stairs and cee night cw. A number of unique markers were placed in the scene in each sequence. To enable comparisons, we invoked [5] (denoted FM), which conducts both feature- and marker-based SfM on the sequences. Since SfM with M2 failed in all 3 sequences due to insufficient data for optimisation, it is excluded from the comparison.

Qualitative results in Table III show that Ours is more accurate than M1 and M3 in marker-based SfM. Of course, Ours is visibly not as complete as FM, but the latter uses features on top of markers, which entails heavier computation. Using the estimated camera positions by FM as reference, we obtain the position errors (in m) of the marker-based SfM methods, normalised and plotted as a cumulative density in Fig. 4. It is apparent that Ours is much more accurate in camera localisation, especially in the most challenging sequence cee night cw. Table IV lists the mean and median position errors relative to FM.

References

  • [1] G. Bradski (2000) The OpenCV Library. Dr. Dobb’s Journal of Software Tools. Cited by: §II.
  • [2] A. Chatterjee and V. Madhav Govindu (2013) Efficient and robust large-scale rotation averaging. In Proceedings of the IEEE International Conference on Computer Vision, pp. 521–528. Cited by: §IV-A, 4th item.
  • [3] T. Collins and A. Bartoli (2014) Infinitesimal plane-based pose estimation. International Journal of Computer Vision 109 (3), pp. 252–286. Cited by: Fig. 1, §I-A, §I, §I, §I, §II, §VI-A3.
  • [4] A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner (2017) ScanNet: richly-annotated 3D reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5828–5839. Cited by: Fig. 2, §I, §VI-A1.
  • [5] J. DeGol, T. Bretl, and D. Hoiem (2018) Improved structure from motion using fiducial marker matching. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 273–288. Cited by: Fig. 1, §I-A, §I, §I, §VI-B, TABLE III.
  • [6] D. Eppstein and D. Strash (2011) Listing all maximal cliques in large sparse real-world graphs. In International Symposium on Experimental Algorithms, pp. 364–375. Cited by: §IV-C.
  • [7] M. Fiala (2004) ARTag, an improved marker system based on artoolkit. National Research Council Canada, Publication Number: NRC 47419, pp. 2004. Cited by: §I.
  • [8] M. Fiala (2005) ARTag, a fiducial marker system using digital techniques. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), Vol. 2, pp. 590–596. Cited by: §I.
  • [9] S. Garrido-Jurado, R. Muñoz-Salinas, F. J. Madrid-Cuevas, and M. J. Marín-Jiménez (2014) Automatic generation and detection of highly reliable fiducial markers under occlusion. Pattern Recognition 47 (6), pp. 2280–2292. Cited by: §I, §VI-A1, §VI-A2.
  • [10] R. Hartley, J. Trumpf, Y. Dai, and H. Li (2013) Rotation averaging. International journal of computer vision 103 (3), pp. 267–305. Cited by: §IV-A.
  • [11] D. Hu, D. DeTone, and T. Malisiewicz (2019) Deep ChArUco: Dark ChArUco Marker Pose Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8436–8444. Cited by: §I.
  • [12] P. Jin, P. Matikainen, and S. S. Srinivasa (2017) Sensor fusion for fiducial tags: highly robust pose estimation from single frame rgbd. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5770–5776. Cited by: §I-A.
  • [13] R. Munoz-Salinas, M. J. Marín-Jimenez, and R. Medina-Carnicer (2019) SPM-SLAM: simultaneous localization and mapping with squared planar markers. Pattern Recognition 86, pp. 156–171. Cited by: §I-A, §I, §I, §II.
  • [14] R. Munoz-Salinas, M. J. Marin-Jimenez, E. Yeguas-Bolivar, and R. Medina-Carnicer (2018) Mapping and localization from planar markers. Pattern Recognition 73, pp. 158–171. Cited by: §I-A, §I, §I, §II, §III, §V, 3rd item, §VI.
  • [15] M. Neunert, M. Bloesch, and J. Buchli (2016) An open source, fiducial based, visual-inertial motion capture system. In 2016 19th International Conference on Information Fusion (FUSION), pp. 1523–1530. Cited by: §I-A.
  • [16] D. Oberkampf, D. F. DeMenthon, and L. S. Davis (1993) Iterative pose estimation using coplanar points. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 626–627. Cited by: §I, §I.
  • [17] F. J. Romero-Ramirez, R. Muñoz-Salinas, and R. Medina-Carnicer (2018) Speeded up detection of squared fiducial markers. Image and vision Computing 76, pp. 38–47. Cited by: §I, §VI-A1.
  • [18] G. Schweighofer and A. Pinz (2006) Robust pose estimation from a planar target. IEEE transactions on pattern analysis and machine intelligence 28 (12), pp. 2024–2030. Cited by: §I, §I, §I, §II.
  • [19] K. Shaya, A. Mavrinac, J. L. A. Herrera, and X. Chen (2012) A self-localization system with global error reduction and online map-building capabilities. In International Conference on Intelligent Robotics and Applications, pp. 13–22. Cited by: §I-A, §I.
  • [20] N. Sünderhauf and P. Protzel (2012) Towards a robust back-end for pose graph slam. In 2012 IEEE International Conference on Robotics and Automation, pp. 1254–1261. Cited by: §IV-B.
  • [21] H. Tanaka, K. Ogata, and Y. Matsumoto (2017) Solving pose ambiguity of planar visual marker by wavelike two-tone patterns. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 568–573. Cited by: §I-A.
  • [22] H. Tanaka, Y. Sumi, and Y. Matsumoto (2014) A solution to pose ambiguity of visual markers using moire patterns. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3129–3134. Cited by: §I-A.
  • [23] E. Tomita and T. Seki (2003) An efficient branch-and-bound algorithm for finding a maximum clique. In International Conference on Discrete Mathematics and Theoretical Computer Science, pp. 278–289. Cited by: §IV-C.
  • [24] Y. Uematsu and H. Saito (2007) Improvement of accuracy for 2d marker-based tracking using particle filter. In 17th International Conference on Artificial Reality and Telexistence (ICAT 2007), pp. 183–189. Cited by: §I-A.
  • [25] J. Wang and E. Olson (2016) AprilTag 2: Efficient and robust fiducial detection. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4193–4198. Cited by: §I.
  • [26] P. Wu, J. Lai, J. Wu, and S. Chien (2012) Stable pose estimation with a motion model in real-time application. In 2012 IEEE International Conference on Multimedia and Expo, pp. 314–319. Cited by: §I-A, §I.