## I Introduction

Obtaining high-resolution 3D models of real-world scenes is a common task. The necessary observations may be captured by a variety of robotic platforms (e.g., wheeled, articulated or aerial platforms) in a variety of environments (e.g., outdoors, inside pipes, etc.).

The individual observations can then be combined into a single 3D representation (e.g., a triangulated 3D mesh). The quality of this model depends on how well the observations capture the scene, i.e., the number and distribution of the individual measurements. The problem of selecting and planning sensor views to obtain high-resolution models is known as Next Best View (NBV) planning.

NBV planning approaches can be classified as either scene-model-based or scene-model-free. Model-based approaches [Bircher2015, Kaba2016] use a priori knowledge of the scene structure to compute a set of views from which the scene (i.e., an object or environment) is observed. These approaches work for a given scene but do not generalise well to other scenes. Model-free approaches often use a volumetric [Connolly1985] or surface representation [Hollinger2012]. Volumetric representations discretise the scene into voxels and can obtain high observation coverage with a small voxel size, but do not produce high-resolution models of large scenes. Surface representations estimate surface geometry from observations and can obtain high-quality models of large scenes, but often require tuning of unintuitive parameters or multiple survey stages.

This paper presents the Surface Edge Explorer (SEE), a scene-model-free approach to NBV planning that uses a density representation. This representation uses a given resolution and measurement density to define a frontier between fully and partially observed surfaces. Sensor views are proposed to observe this frontier and expand the fully observed surfaces. NBVs are selected and new measurements are obtained until the entire scene is observed at the chosen resolution and measurement density.

This density representation does not require an a priori discretisation of the scene, as used by volumetric approaches, and scales with the number of measurements obtained and not the size of the scene. This makes SEE appropriate for large-scale observations (e.g., inspecting a bridge with an aerial vehicle). SEE uses a more intuitive parameterisation than many surface representations and does not require multiple survey stages.

SEE is evaluated in simulation on four standard models [Krishnamurthy1996a, Turk1994, Curless1996, Newell1975] and a full-scale model of the Radcliffe Camera in Oxford [Boronczyk2016] (Fig. 1). The results show that it achieves higher surface coverage in less computational time than the evaluated state-of-the-art volumetric approaches [Vasquez-Gomez2015, Kriegel2015, Delmerico2017] while requiring the sensor to travel equivalent distances.

Section II presents an overview of NBV planning literature. Section III presents SEE. Section IV presents an experimental comparison of SEE with state-of-the-art volumetric approaches on four standard models and a full-scale model of the Radcliffe Camera. Sections V and VI present a discussion of the results and our plans for future work.

## II Related Work

Existing NBV planning work covers a variety of scene sizes, from small objects (e.g., the Stanford Bunny [Turk1994]) [Vasquez-Gomez2015, Kriegel2015, Delmerico2017, Dierenbach2016, Karaszewski2016, Khalfaoui2013, Pito1996, Yuan1995, Connolly1985] to buildings [Yoder2016, MENG2017, Bircher2016, Song2017, Hollinger2012, Bissmarck2015, Bircher2015, Kaba2016, Roberts2017].

Surveys of NBV planning literature [Tarabanis1995, Scott2003a, Karaszewski2016a] categorise approaches based on their scene representation. The most widely used categorisation [Scott2003a] classifies approaches as either scene-model-based or scene-model-free. Model-based approaches [Bircher2015, Kaba2016] require an a priori scene model and do not generalise well. Within the class of model-free approaches there are global, volumetric and surface representations.

Global representations [Pito1996, Yuan1995] consider all observations as part of a single connected surface. Pito [Pito1996] generates a tessellated view space and selects NBVs to observe the boundaries of a partial mesh until those boundaries are closed. This obtains high-resolution models but requires a fixed workspace and a known sensor model. Yuan [Yuan1995] estimates the geometry of surface patches and selects views to observe the unknown space between them and obtain a single surface, but only demonstrates this on simple surface geometries.

Volumetric representations [Vasquez-Gomez2015, Delmerico2017, Connolly1985, Yoder2016, Bircher2016, Song2017, MENG2017, Bissmarck2015] discretise a bounded scene volume into a voxel grid from which view selection metrics can be computed. Seminal work by Connolly [Connolly1985] uses a metric that counts the number of unseen voxels visible from potential views on a tessellated sphere encompassing the scene. View metrics in later work [Vasquez-Gomez2015, Delmerico2017] consider multiple factors but still sample views from a tessellated surface. Vasquez-Gomez et al. [Vasquez-Gomez2015] rank potential views based on reachability, distance, overlap with previous views and the number of visible unseen voxels. Delmerico et al. [Delmerico2017] use information gain metrics to evaluate views based on voxel visibility, observability and proximity to existing observations.

The model resolution obtained from a volumetric representation depends on the resolution of the voxel grid and the number of potential views. Smaller voxels and more potential views allow for greater model detail but require higher computational costs to raytrace each view. These representations are difficult to scale to large scenes without lowering the model quality or increasing the computation time.

Volumetric representations [Yoder2016, Bircher2016, Song2017, MENG2017, Bissmarck2015] have been applied to large scenes despite these limitations. Most approaches mitigate the increase in computation time by reducing the number of potential views. Yoder et al. [Yoder2016] only sample views to observe the frontier between seen and unseen voxels and select NBVs with a view selection metric that balances view utility and travel distance. Meng et al. [MENG2017] similarly only sample views that observe frontier voxels and select NBVs with an information gain metric. Bircher et al. [Bircher2016] use the RRT algorithm [LaValle1998] to plan paths through known voxels and sample views at the vertices of the RRT tree to observe unknown voxels. The NBV is selected from the sampled views with an information gain metric. Song et al. [Song2017] present an approach similar to [Bircher2016], using the RRT* algorithm [Karaman2011] to plan a path to the NBV that maximises the observation of frontier voxels. Potential views are sampled within a given radius of the RRT* path and the subset that provides the greatest coverage is selected.

Reducing the number of potential views can mitigate the increased computational cost of large scenes, but the resolution of the voxel grid is still limited by the raytracing complexity. Bissmarck et al. [Bissmarck2015] compare raytracing algorithms that consider voxel observability, frontier voxels, sparse ray casting and a hierarchy of voxel grid resolutions to reduce this complexity. They demonstrate that these algorithms outperform simple raycasting in terms of computation time, but an NBV planning approach using them for view selection is not presented.

Surface representations [Dierenbach2016, Khalfaoui2013, Roberts2017, Hollinger2012] estimate surface geometry from sensor observations (e.g., by triangulating measurements into a mesh) and compute views to extend the surface boundaries and improve the surface quality. Some approaches incrementally extend the surface representation with new observations [Dierenbach2016, Khalfaoui2013] while others use a multistage survey to iteratively refine a surface model of the scene [Roberts2017, Hollinger2012].

Dierenbach et al. [Dierenbach2016] estimate surface geometry by training a neural network to generate a simplified mesh from sensor measurements. Point density is computed locally around the mesh vertices and views are proposed to extend the mesh and obtain a given point density. Khalfaoui et al. [Khalfaoui2013] apply density-based clustering to sensor observations and propose views to observe the cluster boundaries until the maximum distance between cluster centres is below a given threshold. These approaches can obtain high-resolution models but require tuning of unintuitive parameters.

Multistage approaches [Roberts2017, Hollinger2012] refine an existing surface mesh that is often obtained manually or with a preplanned path. Hollinger et al. [Hollinger2012] represent the mesh uncertainty as a Gaussian process and propose views to improve the surface estimation. Roberts et al. [Roberts2017] sample potential views within a given distance of the mesh surface, select the minimal subset that can provide complete coverage and plan the shortest path between them.

Some work [Kriegel2015, Karaszewski2016] presents approaches using both volumetric and surface representations. Kriegel et al. [Kriegel2015] combine a volumetric representation with an information gain view selection metric and a surface representation that selects views to extend the boundaries of a surface mesh and obtain a given point density. Karaszewski et al. [Karaszewski2016] obtain an initial scene survey with a volumetric representation and then fill discontinuities in the observed surfaces based on the local point density. The local measurement density is also considered by SEE, but without the complexity of using a different underlying representation.

SEE is an NBV planning approach that uses a density representation. Unlike volumetric representations, it scales well to large scenes and is shown to obtain accurate and complete models of scenes at any scale (i.e., both *bunnies* and *buildings*). Unlike surface representations, it does not require multistage surveys or have unintuitive parameters. SEE instead uses only measurement density and resolution.

## III Surface Edge Explorer (SEE)

SEE seeks to observe an entire scene with a minimum measurement density. This measurement density is defined by the resolution and target density used to detect frontiers in the measurements. Frontiers are detected by classifying sensor measurements (i.e., points) based on the number of neighbouring points within the resolution distance. Points with sufficient neighbours (i.e., a local density greater than or equal to the target density) are classified as *core* points and those without are classified as *outliers*. Outlier points with both core and outlier neighbours are then classified as *frontier* points (Fig. 2). These frontier points represent the boundary between fully and partially observed surfaces (Sec. III-A).

The scene observation is expanded by taking measurements at these frontiers. Potential views are proposed by estimating the local surface geometry around frontier points as a plane described by a set of orthogonal vectors (Fig. 3). These vectors describe the normal to the local surface, the density boundary and the direction of partial observation (i.e., the frontier) (Sec. III-B). Views are proposed orthogonal to this locally estimated surface plane to maximise sensor coverage (Fig. 4). The view distance can be specified by the user or defined as a function of the sensor parameters and desired resolution (Sec. III-C).

The NBV is selected from these *view proposals* to reduce the distance from both the current sensor position and the first observation of the scene. This guides observations to expand one frontier at a time and decreases the total distance travelled by the sensor (Sec. III-D).

The proposed views will not expand frontiers on discontinuous or highly non-planar surfaces. These views are iteratively adjusted in response to new observations until the frontier point is observed or a sufficient number of attempts have been made to classify it as an outlier. Points classified as outliers will not be reprocessed unless a new point is observed nearby (Sec. III-E).

SEE continues to select NBVs until there are no more frontier points and all measurements have been classified as core or outlier points. This can be achieved in unbounded real-world problems by discarding all measurements outside of a predefined scene boundary (Sec. III-F).

### III-A Frontier Detection

Frontiers between fully and partially observed surfaces are detected by performing density-based classification of sensor measurements (i.e., points). Points are classified as either core, frontier or outlier based on the number of neighbouring points within a given radius of the point (Fig. 2). The number of observed points in this ball is compared with the minimum number of points necessary to satisfy the desired point density.

This density-based classification approach is based on DBSCAN [Ester1996]. DBSCAN classifies a set of sensor measurements, $\mathcal{P} \subset \mathbb{R}^3$, as core points, $\mathcal{P}_c$, frontier points, $\mathcal{P}_f$, or outlier points, $\mathcal{P}_o$. These labels are complete and unique such that

$$\mathcal{P}_c \cup \mathcal{P}_f \cup \mathcal{P}_o = \mathcal{P}, \qquad \mathcal{P}_c \cap \mathcal{P}_f = \mathcal{P}_f \cap \mathcal{P}_o = \mathcal{P}_c \cap \mathcal{P}_o = \emptyset.$$

A point, $\mathbf{p} \in \mathcal{P}$, is classified as a core point if it has more than $\tau$ neighbours within a distance $r$,

$$\mathbf{p} \in \mathcal{P}_c \iff |N_r(\mathbf{p})| > \tau,$$

where $N_r(\mathbf{p}) := \{\, \mathbf{q} \in \mathcal{P} \mid \lVert \mathbf{q} - \mathbf{p} \rVert_2 \leq r \,\}$ is the set of points within $r$ of $\mathbf{p}$, $\lVert \cdot \rVert_2$ is the $\ell^2$-norm and $|\cdot|$ is set cardinality.

A point is classified as a frontier point if it is not a core point but has both core and outlier neighbours,

$$\mathbf{p} \in \mathcal{P}_f \iff |N_r(\mathbf{p})| \leq \tau \,\wedge\, N_r(\mathbf{p}) \cap \mathcal{P}_c \neq \emptyset \,\wedge\, N_r(\mathbf{p}) \cap \mathcal{P}_o \neq \emptyset.$$

It is otherwise classified as an outlier point,

$$\mathbf{p} \in \mathcal{P}_o \iff \mathbf{p} \notin \mathcal{P}_c \,\wedge\, \mathbf{p} \notin \mathcal{P}_f.$$

This paper modifies DBSCAN to classify measurements obtained from incremental observations (Alg. 1). When a new sensor observation is obtained, the set of new measurements is combined with the existing core, frontier and outlier point sets (Line 1). Each new point is processed and added to either the core, frontier or outlier point set (Line 3). Any new point that has not yet been classified is added to a (re)classification queue along with its neighbouring points (Lines 4–5). If a point in the queue is not a core point then it is (re)classified based on the new measurements (Lines 6–7). Points with insufficient neighbours to be core are classified as frontier points if they have both core and outlier neighbours, or otherwise as outlier points (Lines 9–14). Points with sufficient neighbours are classified as core points (Line 16). If the point was previously unclassified then its neighbourhood is added to the (re)classification queue and it is marked as classified (Lines 19–21).
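As a concrete sketch, the classification above can be approximated with a batch version in Python. The brute-force neighbour search, the function and threshold names, and the provisional treatment of still-unlabelled neighbours as outliers are illustrative assumptions, not the paper's implementation (which processes points incrementally with a queue and a spatial index).

```python
from math import dist

CORE, FRONTIER, OUTLIER = "core", "frontier", "outlier"

def neighbours(points, p, radius):
    """Indices of all points within `radius` of point `p` (brute force)."""
    return [i for i, q in enumerate(points) if dist(p, q) <= radius]

def classify(points, radius, min_pts):
    """Label each point as core, frontier or outlier.

    A point is core if at least `min_pts` points (itself included) lie
    within `radius`; a non-core point with both core and non-core
    neighbours is a frontier point, otherwise it is an outlier.
    """
    labels = [None] * len(points)
    # First pass: core vs non-core from local density alone.
    for i, p in enumerate(points):
        if len(neighbours(points, p, radius)) >= min_pts:
            labels[i] = CORE
    # Second pass: split the remaining points into frontier and outlier,
    # provisionally treating still-unlabelled neighbours as outliers.
    for i, p in enumerate(points):
        if labels[i] == CORE:
            continue
        nbr_labels = [labels[j] for j in neighbours(points, p, radius)]
        has_core = CORE in nbr_labels
        has_sparse = any(l != CORE for l in nbr_labels)
        labels[i] = FRONTIER if (has_core and has_sparse) else OUTLIER
    return labels
```

A dense run of points followed by a sparse tail produces a frontier label exactly where the density drops below the threshold.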

### III-B Surface Geometry Estimation

Good observations require knowledge of the surface geometry. The surface around a frontier point is approximated as locally planar through the eigendecomposition of a matrix representation of its neighbourhood (i.e., the covariance of the neighbouring points).

The eigendecomposition of this square matrix produces a set of eigenvalues and their corresponding eigenvectors satisfying the eigenequation. As the matrix of eigenvectors is a real orthogonal matrix, the eigenvectors form an orthonormal basis (i.e., three mutually orthogonal unit vectors) of $\mathbb{R}^3$. Each eigenvector describes one component of the observed surface geometry (Fig. 3). The normal vector is orthogonal to the surface plane. The boundary vector points along the boundary between partially and fully observed surfaces. The frontier vector lies in the surface plane and points in the direction of partial observation. The surface geometry components are determined sequentially from the eigenvectors, eigenvalues, view orientation and the mean of the nearby points.

#### III-B1 Normal vector

The normal vector is assigned as the eigenvector corresponding to the minimum eigenvalue (i.e., the direction of least surface variance). Its direction is chosen to be opposite the view orientation, such that the normal points from the surface towards the sensor.

#### III-B2 Frontier vector

The frontier vector is the eigenvector perpendicular to the boundary of the partially observed surface. It is assigned as the remaining eigenvector which maximises the magnitude of its dot product with the mean point of the neighbourhood. Its direction is chosen to point away from the mean of the frontier point neighbourhood, into the partially observed region of the point cloud.

#### III-B3 Boundary vector

The remaining eigenvector is locally tangential to the boundary between the density regions and is referred to as the boundary vector. Its direction is given by the cross product of the normal and frontier vectors.
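The three steps above can be sketched with NumPy. Using the neighbourhood covariance matrix, and orienting the frontier vector by the offset of the frontier point from the neighbourhood mean, are assumptions consistent with the text rather than the authors' exact formulation; the function and argument names are illustrative.

```python
import numpy as np

def surface_frame(frontier_pt, nbr_points, view_dir):
    """Estimate normal, frontier and boundary vectors at a frontier point.

    `nbr_points` is an (n, 3) array of neighbouring points and `view_dir`
    is the direction along which the sensor observed the point.
    """
    P = np.asarray(nbr_points, dtype=float)
    mean = P.mean(axis=0)
    # Eigendecomposition of the 3x3 neighbourhood covariance gives an
    # orthonormal basis describing the local surface geometry.
    vals, vecs = np.linalg.eigh(np.cov(P.T))  # eigenvalues ascending
    # Normal: direction of least variance, oriented against the view.
    normal = vecs[:, 0]
    if np.dot(normal, view_dir) > 0:
        normal = -normal
    # Frontier vector: remaining eigenvector most aligned with the offset
    # from the neighbourhood mean, pointing into the sparse region.
    offset = np.asarray(frontier_pt, dtype=float) - mean
    frontier = max((vecs[:, 1], vecs[:, 2]),
                   key=lambda v: abs(np.dot(v, offset)))
    if np.dot(frontier, offset) < 0:
        frontier = -frontier
    # Boundary vector: completes the right-handed orthonormal frame.
    boundary = np.cross(normal, frontier)
    return normal, frontier, boundary
```

For a flat patch of points observed from above and extending towards negative x, the frame recovers an upward normal, a frontier vector along +x and a boundary vector along +y.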

### III-C View Generation

View proposals are generated to maximise sensor coverage of the estimated planar surface around each frontier point. A view proposal is defined by a view position and an orientation.

The view position lies at the view distance along the normal vector from the frontier point. The view distance may be user specified or defined as a function of the sensor parameters and desired resolution.

The view orientation is given by the inverse of the normal vector (i.e., pointing towards the surface).
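This construction is direct; a minimal sketch, with a user-chosen view distance:

```python
def propose_view(frontier_pt, normal, view_distance):
    """Propose a view along the estimated surface normal.

    The position lies `view_distance` along the normal from the frontier
    point; the orientation is the negated normal, pointing at the surface.
    """
    position = tuple(p + view_distance * n for p, n in zip(frontier_pt, normal))
    orientation = tuple(-n for n in normal)
    return position, orientation
```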

### III-D NBV Selection

The NBV is selected from the set of view proposals generated from the frontier points (Sec. III-C).

SEE observes the scene while reducing total travel distance by selecting NBVs based on their *incremental* and *origin* distances. The incremental distance of a proposed view is the distance between the current view position and the position of the proposed view. The origin distance of a proposed view is the distance between the position of the proposed view and the first scene observation.

The NBV is selected to minimise a global distance, combining the incremental and origin distances, over the subset of view proposals within a given radius of the current view. If there are no nearby view proposals then the NBV that minimises the local distance is selected.
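A sketch of this selection logic in Python. The specific distance functions (a sum of incremental and origin distances for the global score, and the incremental distance alone for the local fallback) are assumptions for illustration, as are the function names.

```python
import math

def select_nbv(proposals, current, origin, radius):
    """Select the next best view from proposed view positions.

    Among proposals within `radius` of the current view, pick the one
    minimising a 'global' score (distance from the current view plus
    distance from the first observation); if none are nearby, fall back
    to the proposal closest to the current view.
    """
    def incremental(v):
        return math.dist(v, current)

    def global_score(v):
        return incremental(v) + math.dist(v, origin)

    nearby = [v for v in proposals if incremental(v) <= radius]
    if nearby:
        return min(nearby, key=global_score)
    return min(proposals, key=incremental)
```

Biasing towards proposals near both the current pose and the first observation keeps the sensor expanding one frontier at a time instead of oscillating between distant frontiers.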

### III-E Local View Adjustment

Real surfaces have discontinuities and occlusions that invalidate the locally planar assumption and prevent expansion of the frontier. In these situations, SEE incrementally adapts the current view until either the frontier point is observed or sufficient attempts have been made to classify it as an outlier.

The locally planar assumption is often violated by surface discontinuities (e.g., edges or corners) or occlusions by other surfaces. When the frontier point is near a discontinuity, the view must be adjusted to observe both sides of it (i.e., to see around the corner). When the frontier point is occluded by another surface, the view must be adjusted to avoid the occlusion (i.e., to see around the occluding surface). These views are not orthogonal to the locally estimated surface. SEE attains such views by iteratively using new measurements to translate and rotate the current view, moving the centre of the observed points towards the frontier point.

The magnitude of the translation and rotation for each axis is determined by the displacement between the centre of the observed points and the frontier point along that axis, computed after a rotation into the local surface frame.

The view is first translated along the frontier vector and rotated around the boundary vector by amounts determined by this displacement. It is then similarly translated along the boundary vector and rotated around the frontier vector.

A distance factor determines the magnitude of the translation and rotation for the view adjustment. SEE scales it exponentially with the number of view adjustments made for a given frontier point. This stops the size of the view adjustment from converging to zero as the centre of the observed points moves closer to the frontier point.

The position and orientation of the adjusted view are then obtained by applying these translations and rotations to the current view. The rotation matrices are computed with Rodrigues’ rotation formula [RodriguesO.1840a] using the frontier and boundary axes and their respective rotation angles,

$$\mathbf{R} = \mathbf{I} + \sin(\theta)\,\mathbf{K} + (1 - \cos(\theta))\,\mathbf{K}^2,$$

where $\mathbf{K}$ is the skew-symmetric cross-product matrix of the unit rotation axis, $\theta$ is the rotation angle and $\mathbf{I}$ is the identity matrix.
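Rodrigues’ formula itself is standard and can be sketched directly; the axis argument here stands in for the frontier or boundary axis of the adjustment above.

```python
import numpy as np

def rodrigues(axis, angle):
    """Rotation matrix for a rotation of `angle` radians about `axis`.

    Implements R = I + sin(t) K + (1 - cos(t)) K^2, where K is the
    skew-symmetric cross-product matrix of the unit rotation axis.
    """
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
```

A quarter turn about the z-axis maps the x-axis onto the y-axis, as expected.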

The sensor is moved to the adjusted view and another observation is obtained. This process is repeated iteratively until the frontier is expanded (i.e., the other side of the surface discontinuity is observed) or the Euclidean distance between the frontier point and the centre of the observed points stops reducing. If this termination criterion is reached then the view is reinitialised on the viewing axis from which the frontier point was observed (i.e., where no occluding surface exists), but at a distance from the surface no greater than that of the observing view.

The new view is positioned on this axis, oriented along it towards the frontier point. When starting the view adjustment from the observation viewing axis, the distance factor is reinitialised and adjustment is again performed until termination. If this process also reaches the termination criterion then the frontier point is reclassified as an outlier point.

### III-F Completion

SEE completes the observation of a scene when the final frontier point has been observed and all points are classified as either core points or outliers. This termination criterion assumes that the observable scene is finite. In the real world this condition can be met by defining a scene boundary and discarding all measurements outside it.

## IV Evaluation

*Fig. 5: Noise-free measurements obtained by SEE are presented in the left-most column to illustrate each model. The graphs present the mean performance calculated from fifty independent trials on each model. Left to right, they present the mean surface coverage vs. the number of views, the mean computational time required to plan NBVs and the mean distance travelled by the sensor. The error bars denote one standard deviation around the mean. These results show that SEE achieves higher surface coverage in less computational time and with near-equivalent travel distances when compared to the evaluated volumetric approaches.*

SEE is compared to state-of-the-art NBV approaches with volumetric representations, Area Factor (AF) [Vasquez-Gomez2015], Average Entropy (AE) [Kriegel2015], Occlusion Aware (OA) [Delmerico2017], Unobserved Voxel (UV) [Delmerico2017], Rear Side Voxel (RSV) [Delmerico2017], Rear Side Entropy (RSE) [Delmerico2017] and Proximity Count (PC) [Delmerico2017], on four standard models (the Stanford Armadillo [Krishnamurthy1996a], the Stanford Bunny [Turk1994], the Stanford Dragon [Curless1996] and the Newell Teapot [Newell1975]) and on a full-scale model of the Radcliffe Camera [Boronczyk2016]. The implementations of the volumetric approaches are provided by [Delmerico2017].

### IV-A Simulation Environment

Measurements are simulated from a depth sensor by raycasting into a triangulated mesh of a scene model and adding Gaussian noise to the ray intersections to simulate a noisy 3D range sensor. These measurements are given to the NBV algorithms as sensor observations. The process is repeated for each view requested by the algorithm.
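The noise model is simple to sketch; the noise parameters below are illustrative defaults, not the values used in the paper's experiments.

```python
import random

def simulate_depths(true_depths, mean=0.0, sigma=0.005):
    """Add Gaussian noise to raycast depths to mimic a noisy range sensor.

    `mean` and `sigma` (in metres) are illustrative assumptions; a real
    evaluation would set them from the sensor being simulated.
    """
    return [d + random.gauss(mean, sigma) for d in true_depths]
```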

The depth sensor is defined by a field-of-view in radians and image dimensions in pixels. The simulation environment contains no ground plane and the sensor can move unconstrained in three dimensions with six degrees of freedom. The sensor is prevented from moving inside scene surfaces by checking for intersections between the sensor path and the scene model.

### IV-B Evaluation Parameters

Potential views for the volumetric approaches are sampled from a given view surface (i.e., a view sphere) surrounding the scene, as in [Vasquez-Gomez2015, Delmerico2017]. Kriegel et al. [Kriegel2015] do not restrict views to a view surface, but we use the implementation provided by [Delmerico2017], which does. The radius of the view sphere is defined as half the diagonal of the scene bounding box plus a chosen offset for each model. The view distance for SEE is set to the radius of the view sphere.

SEE uses a chosen measurement density (in points per square metre) and resolution for the standard models and for the Radcliffe Camera. The volumetric approaches use the same resolutions for their voxel grids.

Every algorithm was run fifty times on each model for a given number of views. SEE was run until its completion criterion was satisfied. The view limit for the information gain approaches on each model is set to the maximum number of views used by SEE to demonstrate their convergence. The number of views sampled on the view sphere is defined as the view limit, as in [Delmerico2017].

### IV-C Evaluation Metrics

The algorithms are evaluated by calculating their relative surface coverage, computational time and sensor travel distance. These values are averaged across fifty experiments on each model (Fig. 5).

#### IV-C1 Surface Coverage

The surface coverage of an approach is measured as the ratio of observed model points to total model points. A point is considered observed if there is a measurement within the registration distance of the point. This registration distance is chosen for the standard models as in [Delmerico2017], with a separate value for the Radcliffe Camera model.
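The metric is a simple ratio; a brute-force sketch (a real evaluation would use a spatial index for the nearest-measurement query):

```python
import math

def surface_coverage(model_points, measurements, registration_dist):
    """Fraction of model points with a measurement within the
    registration distance (brute force for clarity)."""
    observed = sum(
        1 for m in model_points
        if any(math.dist(m, s) <= registration_dist for s in measurements)
    )
    return observed / len(model_points)
```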

#### IV-C2 Time

The time taken to compute next best views is measured and added to a cumulative total. The time required for sensor travel is not considered.

#### IV-C3 Distance

The distance travelled by the sensor is measured by summing the Euclidean distances between the positions of subsequent views.
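Equivalently, as a one-line sketch over the ordered sequence of view positions:

```python
import math

def travel_distance(view_positions):
    """Total Euclidean distance along the sequence of view positions."""
    return sum(math.dist(a, b)
               for a, b in zip(view_positions, view_positions[1:]))
```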

## V Discussion

The experimental results demonstrate that SEE outperforms the evaluated state-of-the-art volumetric approaches (Fig. 5), requiring less computational time to plan views that obtain greater surface coverage with near-equivalent travel distances, regardless of scene complexity and scale. SEE is shown to consistently obtain high surface coverage for models with different surface complexities and scales, while the volumetric approaches demonstrate varying performance.

Standard models with significant self-occlusion (e.g., the ears of the Stanford Bunny and the handle of the Newell Teapot) demonstrate the advantages of the adaptable views used by SEE. The evaluated volumetric approaches perform worse on these problems as they do not adjust their views to account for occlusions. The view selection metric presented in [Kriegel2015] does adapt views to handle occlusions, but this is not included in the implementation provided by [Delmerico2017].

The Radcliffe Camera model demonstrates the difficulty of scaling volumetric approaches to large scenes. The coarse voxel resolution necessary for tractable raytracing allows voxels to be marked as observed by discontinuous measurements (Fig. 1).

The experiments show that the computational performance of SEE is logarithmically better than that of the volumetric approaches. The poor performance of the volumetric approaches is due to the computational complexity of raytracing a high-resolution voxel grid from every view on the view sphere when selecting an NBV. The limited scalability of the volumetric approaches with scene size is demonstrated by the difference in computational performance between the standard models and the Radcliffe Camera model.

While SEE travels a larger distance per view in the experiments, it initially achieves equivalent surface coverage per unit distance. The volumetric approaches then appear to continue to travel without significantly improving coverage, while SEE continues to increase coverage as it travels. As a result, by the time SEE terminates it has travelled distances equivalent to many of the other approaches but has achieved higher surface coverage.

## VI Conclusion

SEE is a scene-model-free approach to NBV planning that uses a density representation. The representation defines a frontier between fully and partially observed surfaces based on a user-specified resolution and measurement density. View proposals are generated to observe this frontier and extend the scene coverage. NBVs are selected and new measurements are obtained until the scene is fully observed at the specified resolution and measurement density.

The density representation used by SEE has a number of advantages over volumetric and surface representations. Unlike volumetric representations, the complexity of SEE scales only with the number of measurements and not with scene scale, making it possible to obtain high-resolution models of large scenes. In contrast to many surface approaches, the measurement density and resolution parameters can be specified intuitively and only a single survey stage is required.

Experimental results show that SEE outperforms state-of-the-art volumetric approaches in terms of surface coverage and computation time. It takes less computation time to propose views that achieve greater surface coverage with an equivalent travel distance.

SEE was only compared to publicly available volumetric approaches as we were unable to obtain implementations of relevant surface approaches. We plan to implement state-of-the-art surface (e.g., [Dierenbach2016]) and/or combined approaches (e.g., [Kriegel2015]) and present comparisons with them in future work. SEE may be made available to other researchers upon request to facilitate comparisons. We are also working to deploy and test SEE on real-world problems with an aerial platform.
