luvHarris: A Practical Corner Detector for Event-cameras

05/24/2021
by   Arren Glover, et al.
Istituto Italiano di Tecnologia

There have been a number of corner detection methods proposed for event cameras in recent years, as event-driven computer vision has become more accessible. Current state-of-the-art methods have either unsatisfactory accuracy or unsatisfactory real-time performance when considered for practical use: random motion using a live camera in an unconstrained environment. In this paper, we present yet another method to perform corner detection, dubbed look-up event-Harris (luvHarris), that employs the Harris algorithm for high accuracy but manages an improved event throughput. Our method has two major contributions: 1. a novel "threshold-ordinal event-surface" that removes certain tuning parameters and is well suited to Harris operations, and 2. an implementation of the Harris algorithm such that the computational load per event is minimised and the computationally heavy convolutions are performed only 'as-fast-as-possible', i.e. only as computational resources become available. The result is a practical, real-time, and robust corner detector that runs at more than 2.6× the speed of the current state-of-the-art; a necessity when using high-resolution event-cameras in real-time. We explain the considerations taken for the approach, compare the algorithm to the current state-of-the-art in terms of computational performance and detection accuracy, and discuss the validity of the proposed approach for event cameras.


1 Introduction

Corner detection is used for motion estimation and feature point identification, among other machine vision tasks [7]. In general, corners can be used as informative features that are consistently identifiable over time. For motion estimation, the presence of the two orthogonal edges that define a corner disambiguates the unobservable motion in the direction parallel to a single edge's orientation, i.e. it solves the aperture problem.

Several proposed corner detection methods for event-cameras investigate interesting ideas for event processing, but we find them insufficient for actual use in a complete visual pipeline. However, event-cameras remain a promising technology for the task, as they produce a low-latency, sparse visual signal and have the potential to enable high-frequency, reduced-computation visual algorithms in a wide range of applications. An event-camera achieves these advantages because it has independent, asynchronously firing pixels rather than a global or rolling shutter: when a change in light intensity is detected, each individual pixel outputs an event encoding the pixel position and the direction of the intensity change. As such, further development of corner detection algorithms for event-cameras is worthwhile.

Event-driven corner detectors have been used in motion estimation pipelines, see [14, 16], and a number of corner detection solutions have been proposed [15, 9, 1, 5, 8]. eHarris [15] employs the accurate Harris algorithm but is too computationally heavy for high-resolution event-cameras. FAST [9] and ARC [1] aim to reduce the computational requirements, but sacrifice accuracy to do so. While their event-throughput improves, they may still fail to obtain real-time performance when using the latest-generation, high-resolution event-cameras.

In this paper, we present yet another method for performing corner detection with event-cameras, which we dub luvHarris for look-up event-Harris. Its focus is to produce a real-time, robust, and accurate corner detector based on the Harris algorithm [4]. To do so, luvHarris decouples the event throughput from the heavy computation of the Harris algorithm. Only a small (non-corner-related) computation is performed per event, and the result is an event throughput of 8.6 M events/s (a more than 2.6× improvement over the state-of-the-art). Other detectors, while lightweight, perform a full corner detection for each and every event. We instead take advantage of OpenCV [2] while still maintaining an asynchronous event-driven input and output, and discuss why this is a valid solution, particularly for corner detection. The detection accuracy is also improved over the previous eHarris [15] by eliminating the fixed-time event integration window from the pipeline.

This paper has two main algorithmic contributions: 1. a novel event-surface that is compatible with the Harris algorithm, and 2. a look-up pipeline that performs computation as fast as possible and decouples corner detection processing from the event-stream. Additionally, this paper explains the reasons for the failure of other detectors, from both an accuracy and an event-throughput perspective, and adds to the discussion around the balance between event-driven and batch processing that enables practical on-line vision algorithms for event-driven cameras.

2 Background

The event-based adaptation of the Harris algorithm, eHarris [15], became the benchmark early on as it was simple, based on a known method, and open-source code was available. The algorithm created a binary surface indicating the occurrence of an event in the recent past, over which the Harris score was calculated. The algorithm was event-by-event in that, for each incoming event, the surface was incrementally updated and the Harris response was computed only locally around the position on the surface at which the event occurred. The Harris response depends on the eigenvalues of the image-derivative matrix, computed in a square patch around the event position:

  • The partial per-pixel derivatives, $I_x$ and $I_y$, are calculated by applying the Sobel operator to the surface patch.

  • The mean-square derivatives of the patch, $\overline{I_x^2}$, $\overline{I_y^2}$, and $\overline{I_x I_y}$, are calculated by element-wise multiplication followed by smoothing with a box filter.

  • The Harris response is related to the eigenvalues of the matrix $M = \bigl[\begin{smallmatrix}\overline{I_x^2} & \overline{I_x I_y}\\ \overline{I_x I_y} & \overline{I_y^2}\end{smallmatrix}\bigr]$ and is calculated as $R = \det(M) - \kappa\,\mathrm{tr}(M)^2$, where $\det$ and $\mathrm{tr}$ are the determinant and trace, respectively.

An event is classified as a corner-event if the Harris response $R$ is above a threshold $\theta$.
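As an illustration only (not the authors' released code), the three steps above can be sketched with OpenCV as follows; the function name, the patch handling, and the Harris constant k = 0.04 are our own assumptions:

```cpp
#include <opencv2/imgproc.hpp>

// Illustrative Harris response for a single patch of an event surface,
// following the per-patch steps listed above.
double harrisResponse(const cv::Mat &patch, double k = 0.04)
{
    cv::Mat p, Ix, Iy;
    patch.convertTo(p, CV_64F);            // work in floating point
    cv::Sobel(p, Ix, CV_64F, 1, 0, 3);     // partial derivative in x
    cv::Sobel(p, Iy, CV_64F, 0, 1, 3);     // partial derivative in y

    // Mean-square derivatives: a box filter over the whole patch reduces to the mean.
    cv::Mat Ix2 = Ix.mul(Ix), Iy2 = Iy.mul(Iy), IxIy = Ix.mul(Iy);
    double mxx = cv::mean(Ix2)[0];
    double myy = cv::mean(Iy2)[0];
    double mxy = cv::mean(IxIy)[0];

    // R = det(M) - k * tr(M)^2, with M = [[mxx, mxy], [mxy, myy]].
    double det   = mxx * myy - mxy * mxy;
    double trace = mxx + myy;
    return det - k * trace * trace;
}
```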

The accuracy of corner classification using the eHarris method was comparable to a 'frame-based' ground-truth; however, the processing speed was sub-par [9, 1] for high-rate event-streams. eHarris was developed for the first-generation DVS [6] (128×128 resolution), and the reported real-time operation for a data stream of 160k events/s cannot handle the higher-resolution sensors (e.g. the qVGA ATIS [11] or the DAVIS [3]) that were becoming the standard. Nowadays, HD event-cameras are also available.

The subsequent FAST [9] and ARC [1] algorithms were proposed to provide a solution for higher-resolution event-cameras, and do so by moving away from traditional image processing techniques towards techniques that are highly compatible with the asynchronous, fine temporal resolution of event-cameras. Indeed, these techniques reported throughputs of up to 3.3M events/s, greatly expanding the conditions in which real-time processing is maintained. FAST reported corner detection accuracy comparable to eHarris [9], while ARC took the approach of reporting more overall detections (including more false positives) and relying on robust downstream off-line corner tracking algorithms to refine the results [1].

The FAST algorithm identifies corners by classifying the spatio-temporal pattern on an event-surface (a simple event-surface sets the value of each pixel equal to the timestamp of the last event fired at that pixel), shown in Fig. 1. A continuous arc of recent timestamps covering an approximately 90 degree angle results in a classified corner, while broken arcs, or arcs that cover an angle close to 180 degrees, are rejected. ARC follows the same idea, and the two algorithms differ slightly in the logic used to define a continuous arc.
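A minimal sketch of such a timestamp-based surface (the event struct and function name are our own, hypothetical, and not taken from either paper) is:

```cpp
#include <opencv2/core.hpp>

// Hypothetical event type for illustration.
struct Event { int x, y; double t; bool polarity; };

// Simple surface of active events (SAE): each pixel stores the time of the
// most recent event fired there, as used by arc-based detectors.
void updateSAE(cv::Mat &sae, const Event &e)   // sae: CV_64FC1, one value per pixel
{
    sae.at<double>(e.y, e.x) = e.t;
}
```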

(a)
(b)
(c)
Fig. 1: Example of false positive corner detections using the ARC algorithm, showing the event position on the surface in the left column and the arc values in the right column. An arc-based detection algorithm considers only the ring of purple events. (a) is an example of two close edges (from the desk) instigating a ‘second-wave’ of events, (b) is an example of a random event firing in-front of an edge, and (c) is an example of random noise passing the ARC detection logic. In all cases, comparing the blue to black pixel visualisation over the square patch, no corner is present.

2.1 ARC Pre-processing

To apply the ARC corner detection logic, a pre-processing stage is required to remove multiple events occurring in quick succession at the same pixel location (an artificial refractory period). Such signal patterns are common, as the contrast change due to an edge occurs over a non-discrete time period, allowing a pixel to fire at more than one point in time. Such camera output can appear as a 'secondary-wave' behind the initial edge. Secondary events often match the correct corner pattern when considering only an arc of events, see Fig. 1. Several sources of motion can trigger secondary events, such as:

  • The previously described secondary-wave.

  • Objects with edges or stripes in close proximity.

  • Moving an object back-and-forth across the same pixels.

The authors of [1] suggest a 50 ms artificial refractory period. In practice, we found that under common circumstances 50 ms also removed large portions of useful signal, such that corners were missed. With such a period, the maximum firing rate corresponds to a 20 Hz repeating signal, in which case an RGB camera may be preferable. Instead, a 5 ms refractory period (i.e. 200 Hz) seemed suitable for typical signals from a hand-held, or robot-mounted, camera; however, more false positives were then observed. The requirement of such pre-processing imposes restrictions on the use-cases of the ARC algorithm.
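For concreteness, an artificial refractory-period filter of the kind described above can be sketched as follows; this is our own illustration under assumed names, not the ARC authors' implementation:

```cpp
#include <vector>

// Hypothetical event type for illustration.
struct Event { int x, y; double t; bool polarity; };

class RefractoryFilter {
public:
    RefractoryFilter(int width, int height, double refractorySec)
        : last(height, std::vector<double>(width, -1.0)), refractory(refractorySec) {}

    // Returns true if the event is kept (passes the filter).
    bool accept(const Event &e) {
        double &t = last[e.y][e.x];
        if (t >= 0.0 && e.t - t < refractory)
            return false;                    // the same pixel fired too recently
        t = e.t;
        return true;
    }

private:
    std::vector<std::vector<double>> last;   // last accepted timestamp per pixel
    double refractory;                       // e.g. 0.005 s (5 ms), as discussed above
};
```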

2.2 Missed corners by FAST

FAST, as defined in [9], avoids false positive detections of secondary-wave events as it excludes 270 degree corners in the classification stage. FAST does not require a pre-processing stage, but also cannot detect certain corner patterns (i.e. those produced by 270 degree corners) that are true corners.

In general, FAST and ARC use less information (fewer pixels) than Harris to determine corner patterns, and are therefore more susceptible to noise in those pixels. Fig. 1 shows several examples of false positive corner detections of an arc-based detection which, when considering the entire patch, are not patterns of data that correspond to corners.

2.3 Considerations for Applying the Harris algorithm to event-data

The dense computation of spatial derivatives required by Harris has the downside of a higher processing requirement. Reducing the load of the Harris pipeline has been suggested by using FAST as a candidate detector and applying Harris only to positive candidates [5]. We instead take a different approach: to enable a real-time Harris-based detector by exploiting the principles of the algorithm.

The original Harris corner detection is applied over an image, in which pixel values are typically bounded between 0 and 255. The relative spatial placement of high and low values gives rise to the corner. The magnitude of the corner score, $R$, is proportional to the difference between high and low values, $\Delta$. The consistent and bounded value of $\Delta$ within the image allows a fixed threshold $\theta$ to be set. In the event domain, a timestamp-based surface of active events (SAE) is used by the FAST and ARC algorithms; however, it does not produce a consistent $\Delta$. High values correspond to the current clock-tick, while low values are equal to the time in the past when the pixel last fired, which could be on the order of milliseconds, or minutes. The resulting unbounded, non-normalised $\Delta$ makes it impossible to fix $\theta$ for timestamp-based surfaces. For this reason, the binary image used by the original eHarris [15] was not a simplistic, naive implementation, but rather a calculated decision to force the recent (high) and past (low) values to be bounded. On the other hand, the fixed integration window that was used only produces best results when the scene motion matches the chosen temporal duration, e.g. movement of ~3 pixels every 10 ms. A surface that does not require fixing such a parameter would be more robust to a wider variety of conditions.

A second characteristic of the Harris algorithm comes from the spatial pattern of the corners; a corner is defined by the relative values between neighbouring pixels. In synchronous implementations for frame-based cameras, computation can be saved by sharing the convolution results between neighbouring convolutions. As such, the convolution computation required for two neighbouring pixels is not simply double that of a single convolution, $2C$, but closer to $(2-o)C$, where $o$ is the proportion of overlap between the two regions. Once the event-rate is high enough, as in a typical dynamic scene, computing all spatial convolutions simultaneously across the image becomes faster than computing the convolution independently for each event. Performing Harris event-by-event throws away the intermediate calculations that could be re-used for neighbouring convolutions. An interesting alternative to larger-area convolutions is to calculate convolutions incrementally, as in [13]; however, the second pass over the Sobel-images needed to extract corners results in a batch operation, and the fixed high-pass filter acts similarly to a fixed temporal window, limiting this implementation's usability.
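A rough, back-of-the-envelope version of this argument (our own sketch; the symbols $N_e$, $n$, $W$, $H$, and $c$ are not from the original text) is:

\[
C_{\text{event-by-event}} \approx N_e\, n^2 c,
\qquad
C_{\text{full-frame}} \approx W H c,
\]

where $N_e$ is the number of events, $n$ the side-length of the per-event patch, $W \times H$ the sensor resolution, and $c$ the cost of a single per-pixel convolution. A single full-frame pass is therefore cheaper whenever $N_e\, n^2 > W H$, which is easily reached in textured scenes at high event-rates.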

Considering the above, luvHarris has been designed with the following principles:

  • A full patch of the surface gives more accurate corner classification than an arc, therefore the Harris method is chosen.

  • The surface over which Harris is applied needs to have consistent maximum and minimum values to enable a fixed classification threshold to be used and, for robustness, should avoid ambiguous parameters (i.e. a temporal window).

  • A corner is a spatial pattern, therefore spatial convolutions (or other pattern matching) are required. Redundant processing can be avoided by re-using convolution results for corner classification of neighbouring pixels.

3 Look-up event-Harris

The proposed luvHarris is comprised of two parts:

  • An event-by-event update of a threshold-ordinal surface (TOS). The TOS is designed to accumulate visual data such that it is compatible with the Harris corner detection algorithm, and does so without the need for an arbitrary temporal parameter.

  • An as-fast-as-possible computation of a Harris-score look-up-table, $L$: when one instance of the Harris computation is finished, a new one begins, independently of the number of events that need to be processed.

Our goal is to implement an on-line system in which events are streamed live from a camera, and the processing of corners must be real-time for any possible event-rate. Events are streamed into the processing module, and events tagged as corners (and not-corners) are streamed out; hence the asynchronous nature of events is fully maintained in luvHarris. An overview of the main algorithm components is illustrated in Fig. 2, and each block is explained below.

Fig. 2: The flow of data through the system. To maintain real-time operation, the input event-batch must consist of all events produced by the camera that have yet to be processed; the output consists of corner-tagged events. A single-core CPU can run the algorithm by alternating Phase 1 and Phase 2, while a multi-core CPU can process the blue and green operations in separate threads simultaneously.

3.1 Event-by-event computation: Threshold-ordinal Surface

The TOS, visualised in Fig. 3, provides a coherent and bounded spatial representation of the asynchronous events, maintaining the information about their temporal order. A comparison of several surface methods [8] concluded that an ordinal method (the surface value corresponds to the order of event arrival) is the most suitable for corner detection. In this case, a new event is set to the maximum value (255) and all other pixels in a local $(2k+1) \times (2k+1)$ region around the event are decremented by 1. However, this ordinal method does not constrain the surface well, as old events have an almost arbitrary value between 254 and 0. We propose the threshold-ordinal method: after a pixel is reduced below a zero-threshold, $\theta_{TOS}$, its value is instead set to 0 in order to force a large, fixed range between high and low values. The TOS update, for an event at $(x_e, y_e)$, is defined as:

\[
T(x, y) \leftarrow
\begin{cases}
255 & \text{if } (x, y) = (x_e, y_e) \\
T(x, y) - 1 & \text{if } T(x, y) - 1 \geq \theta_{TOS} \\
0 & \text{otherwise}
\end{cases}
\quad \forall (x, y) \text{ in the } (2k+1) \times (2k+1) \text{ region around } (x_e, y_e)
\tag{1}
\]

and for any input event $(x_e, y_e)$, the full processing to calculate the corner classification is defined as:

  Input: event (x_e, y_e)
  for x = x_e - k to x_e + k
    for y = y_e - k to y_e + k
      T(x, y) ← T(x, y) - 1
      if T(x, y) < θ_TOS
        T(x, y) ← 0
  T(x_e, y_e) ← 255
  c ← ( L(x_e, y_e) > θ )
Algorithm 1: Event-by-event computation (TOS update and corner look-up for one event)

where $L$ is the Harris-score look-up-table explained below, and $\theta$ is the corner classification threshold.

The TOS is asynchronously updated event-by-event and forms a representation that attempts to capture the most up-to-date position of edges in the scene. To form an edge 2 pixels thick, the zero-threshold is set as $\theta_{TOS} = 255 - 2(2k+1)$; as an edge passes through a spatial region, the maximum number of non-zero pixels is twice the side-length of the region. As the edge moves, new events force the past positions of the edge to fall below $\theta_{TOS}$, leaving the most up-to-date position of the edge regardless of the edge velocity.
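The per-event TOS update of Algorithm 1 can be sketched in a few lines of OpenCV-based C++; this is an illustrative sketch under assumed names, not the released implementation:

```cpp
#include <algorithm>
#include <opencv2/core.hpp>

// Threshold-ordinal surface update for one event at (xe, ye), following Eq. (1).
// thresholdTOS would typically be 255 - 2*(2*k + 1), as discussed above.
void updateTOS(cv::Mat &tos, int xe, int ye, int k, int thresholdTOS)
{
    // Clamp the (2k+1)x(2k+1) neighbourhood to the image borders.
    int x0 = std::max(xe - k, 0), x1 = std::min(xe + k, tos.cols - 1);
    int y0 = std::max(ye - k, 0), y1 = std::min(ye + k, tos.rows - 1);

    for (int v = y0; v <= y1; v++) {
        uchar *row = tos.ptr<uchar>(v);
        for (int u = x0; u <= x1; u++) {
            // Decrement neighbours; force values below the threshold to zero.
            int value = row[u] - 1;
            row[u] = (value < thresholdTOS) ? 0 : static_cast<uchar>(value);
        }
    }
    tos.at<uchar>(ye, xe) = 255;   // the newest event takes the maximum value
}
```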

(a)
(b)
Fig. 3: Examples of the threshold-ordinal surface visualised as an image. Strong edges and corners are present in the visual signal, while blank regions are either zero (black), or filled with random noise, neither of which produce a strong corner response.

3.2 As-fast-as-possible computation: Harris Look-up

The TOS can be used at any given point in time as an 'image', with values between 0 and 255, on which to perform the Harris calculation using the cv::cornerHarris function from OpenCV [2]. The output of cv::cornerHarris is another 2D array, $L$, in which the value at each element is the Harris score.

  L ← cv::cornerHarris(TOS)
Algorithm 2: As-fast-as-possible computation (regeneration of the look-up-table L)

In traditional computer vision, a threshold is applied to $L$, and all locations above the threshold are identified as corners. In luvHarris, $L$ is instead used as a look-up-table. For each event, $L$ is simply queried at the single event location; if $L(x_e, y_e) > \theta$, the event is tagged as a corner. Therefore, while the full look-up-table is generated, it is only queried sparsely and asynchronously.
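A minimal sketch of this look-up step (parameter values and function names are our own illustrative assumptions) is:

```cpp
#include <opencv2/imgproc.hpp>

// Phase 2: regenerate the Harris-score look-up-table L over the whole TOS.
cv::Mat computeLookup(const cv::Mat &tos)
{
    cv::Mat L;
    cv::cornerHarris(tos, L, /*blockSize=*/5, /*ksize=*/3, /*k=*/0.04);
    return L;                                 // CV_32FC1: one Harris score per pixel
}

// Per-event query: a single read from L decides the corner tag.
bool isCornerEvent(const cv::Mat &L, int xe, int ye, float thetaHarris)
{
    return L.at<float>(ye, xe) > thetaHarris;
}
```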

One key aspect of luvHarris is that the data flow of events is completely decoupled from the generation of $L$: the TOS is updated asynchronously event-by-event, and $L$ is computed over the most up-to-date TOS as frequently as computation allows. As $L$ cannot feasibly be generated for each single event, a single instance of $L$ is used for multiple events. The temporal error between the time of an event and the time at which $L$ was generated determines the likelihood of event mis-classification, which increases with this error. However, the relationship is not linear, and there exists a reasonable period over which the error results in little to zero change in the Harris score. If a corner completely passes through a spatial region (i.e. one defined by the region size $k$) between updates of $L$, that corner will be 'missed'; however, the Harris algorithm is tolerant for several pixels on either side of the true corner position. Typically, to maintain accurate corner detection, $L$ need only be updated once or twice as a corner travels a distance equal to the region size.

3.3 Throughput Limitations

Real-time processing is achieved despite a variable number of input events, as the processing for each event (the TOS update) is decoupled from the heavy algorithm processing (Harris). The total processing, $C_{total}$, of an event-based algorithm is:

\[
C_{total} = C_{event}\, N_e + C_{other}
\tag{2}
\]

where $C_{event}$ is the number of computations required per event, $N_e$ is the total number of events, and $C_{other}$ is the number of computations performed for non-event-based processes. If $C_{total} > C_{max}$, where $C_{max}$ is the maximum number of operations the CPU can perform in the period in question, then the algorithm does not operate in real-time.

All previous state-of-the-art corner detection methods perform all corner detection calculations within $C_{event}$, and there are no non-event-based computations, i.e. $C_{other} = 0$. Therefore, the total computational requirements are directly proportional to the number of events generated by the camera. In general, fully event-by-event algorithms suffer a reduced event throughput as the algorithm complexity increases (given a fixed $C_{max}$). While FAST and ARC are computationally light algorithms, their maximum event throughputs (2.0M and 3.3M events/s, respectively) are less than the event-rates of next-generation event-cameras (typically over 4M events/s).

The general contribution of luvHarris to event-driven algorithms is to shift processing from $C_{event}$ to $C_{other}$, minimising the computational requirements that depend on the event-rate without completely removing $C_{event}$ (which would result in fully batch-based computation). For luvHarris, $C_{event}$ corresponds to updating the TOS and looking up a single pixel value in $L$ for each event, while $C_{other}$ encapsulates the Harris computation, which is independent of the number of events that must be processed. luvHarris maintains asynchronous event input and output.

The computational implementation can be thought of as a two-step process: Algorithm 1 is processed sequentially with Algorithm 2, as shown in Fig. 2. To maintain real-time constraints, the input event-batch between updates of $L$ must comprise all events produced by the camera that have yet to be processed. This concept is the key component of real-time luvHarris. As the event-rate increases, the number of events in the batch increases and corner misclassification can occur as more events are associated with a single $L$; however, the overall algorithm latency is minimised (under the assumption that the CPU is powerful enough to process Algorithm 1 for each and every event; otherwise luvHarris will not achieve real-time performance either).

Finally, luvHarris also lends itself well to parallel operation. As Algorithm 1 and Algorithm 2 are completely decoupled, they can easily be processed on separate CPU cores simultaneously, as illustrated in Fig. 2. Such an implementation allows Algorithm 1 to run continuously without waiting for Algorithm 2 to finish first. The throughput is increased, but the algorithm is not dependent on such a parallel operation.
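A conceptual two-thread sketch of this decoupling (our own assumed structure and resolution, not the released code) is:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <opencv2/imgproc.hpp>

// Shared state between the event thread (Algorithm 1) and the Harris thread
// (Algorithm 2). Resolution here is illustrative.
std::mutex mtx;
cv::Mat TOS(240, 304, CV_8UC1, cv::Scalar(0));
cv::Mat L(240, 304, CV_32FC1, cv::Scalar(0));
std::atomic<bool> running{true};

// Phase 2: regenerate L over a snapshot of the TOS, as fast as possible.
void harrisThread()
{
    while (running) {
        cv::Mat snapshot, newL;
        { std::lock_guard<std::mutex> lk(mtx); snapshot = TOS.clone(); }
        cv::cornerHarris(snapshot, newL, 5, 3, 0.04);     // heavy batch computation
        { std::lock_guard<std::mutex> lk(mtx); newL.copyTo(L); }
    }
}

// Phase 1 (event thread, not shown in full): for each incoming event, lock,
// update the TOS around the event pixel, read L at the event pixel, unlock,
// and emit the corner-tagged event.
```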

4 Experiments and Results

The luvHarris algorithm is compared to the eHarris, FAST, and ARC algorithms in terms of real-time operation and corner detection accuracy. The shapes_6dof, poster_6dof, boxes_6dof, and dynamic_6dof datasets [10] are used, as in [9, 1]. (We used only the 6-DoF datasets, as the literature has shown them to be the most challenging; the boxes_6dof and poster_6dof datasets were trimmed due to technical limitations in loading the full dataset into memory.) Additionally, we perform on-line experiments with an ATIS Gen3 (HVGA resolution) camera running all algorithms live simultaneously, with an (unfiltered) event-rate of over 10M events/s.

Computation is performed on an Intel Core i7-9750H CPU @ 2.60GHz × 12. The region parameter $k$ was set to 3, giving an approximate region size equal to that of FAST and ARC, which use two rings of radius 3 and 4 as in their respective publications. eHarris was implemented by forming an event-by-event sliding-window binary image of 10 ms and applying the OpenCV cv::cornerHarris method locally, event-by-event. Despite eHarris and luvHarris both employing the Harris algorithm, luvHarris may achieve even better performance than eHarris, as the TOS is not sensitive to speed.

The full results are also shown as a video (https://zenodo.org/record/4739290).

A filtered version of each dataset was also produced by removing events that occur consecutively at the same pixel location within a short time-window (i.e. an artificial refractory period), or that are not correlated with their neighbours (i.e. salt-and-pepper noise), as explained in [1]. Results indicate whether the filtered or non-filtered data was used for each experiment.

4.1 Event Throughput

Table I shows the maximum event-throughput measured for all algorithms by operating the algorithms at their limits. We verify that our implementations of the state-of-the-art algorithms are computationally on-par with available implementations, as the values agree with those presented in [9, 1], accounting for some variation in the exact implementation and the hardware used. Our metric considers the event-throughput of the algorithms themselves, independently of any filtering steps, which could be applied identically to all algorithms. The luvHarris method has an approximately 2.6× speed improvement over the next-best ARC algorithm. The major computational requirement of luvHarris comes from the update of the TOS; therefore, increasing the region parameter $k$ will decrease the event-throughput of the luvHarris algorithm.

Algorithm    Measured (M events/s)    Reported (M events/s)
eHarris      0.16                      0.14
FAST         1.98                      1.67
ARC          3.27                      7.52*
luvHarris    8.59                      -

TABLE I: Maximum event throughput measured, compared to that reported in the literature [1]. Inconsistencies arise from the exact method of measurement and the hardware used; however, there is general agreement. *ARC throughput was previously reported considering the event-filter, which reduces the effective event-rate, as part of the algorithm, whereas here we report the throughput of the algorithm alone. Throughput reported here is measured identically for all algorithms, independently of whether or not a noise-filter is used.

Instantaneous delay arises when, at any point in time, the event-rate of the camera exceeds the maximum event-throughput listed in Table I. Delay was measured during operation by calculating the difference between the timestamp of the current processed packet of events and the amount of time passed since the beginning of the experiment. Both the dataset experiments and those with a real camera were performed identically, in an on-line fashion: by directly connecting the camera, or streaming the datasets with precise timing, to the corner detection modules. The major difference from typical off-line processing is that in our experiments events cannot be processed before they have been produced; the algorithms cannot compensate for high-throughput periods by also processing low-throughput periods and taking the average.
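As a small illustration of the delay metric just described (our own formulation, with assumed names), a positive value means the algorithm is falling behind real-time:

```cpp
#include <chrono>

// Delay of the packet being processed: wall-clock time elapsed since the start
// of the experiment minus the packet's own timestamp (both in seconds).
double algorithmDelay(double packetTimestampSec,
                      std::chrono::steady_clock::time_point experimentStart)
{
    using seconds = std::chrono::duration<double>;
    double elapsed = seconds(std::chrono::steady_clock::now() - experimentStart).count();
    return elapsed - packetTimestampSec;
}
```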

Fig. 4 quantitatively analyses the real-time performance of all algorithms for each dataset. The eHarris implementation has the highest delay accumulation and is not real-time for any dataset. FAST achieves real-time for some datasets, but not for high-texture datasets during periods of fast motion, e.g. the second half of poster_6dof, as indicated by a non-zero delay. Both ARC and luvHarris maintained real-time operation for all instances in all datasets. The delay results completely agree with the real-time assessment in [17].

The maximum event-rate of the poster_6dof dataset was below 3M events/s (see Fig. 4(c)) after applying the refractory-period pre-processing; from Table I it is therefore expected that both ARC and luvHarris perform in real-time. The event-rate of the on-line experiment (see Fig. 4(e)) reached over 3M events/s, at which point the ARC algorithm also experienced instantaneous algorithm delay (of over 1 second). Despite the maximum throughput of luvHarris being much higher than the live-experiment event-rate, luvHarris also experienced some instantaneous delay, which could arise from on-line conditions. The delay of luvHarris is several orders of magnitude smaller than that of the other algorithms, given the log scale of the delay axis.

(a) boxes_6dof
(b) dynamic_6dof
(c) poster_6dof
(d) shapes_6dof
(e) live with ATIS gen3
Fig. 4: Computed algorithm delay when streaming the noise-filtered datasets into the algorithm module. Any value above zero implies the algorithm is not running in real-time. It is not practically possible to run eHarris on-line with the Gen3 ATIS.

4.2 Corner Accuracy

Ground-truth

corners (used as the basis for analysis; we acknowledge the method is not infallible, and the task of deciding which events are corners will always be somewhat ambiguous) were computed for all datasets by first creating a high-temporal-resolution intensity-image video using the e2vid [12] algorithm. The ground-truth corner score was then assigned to each event using the output of the OpenCV cornerHarris method applied to the temporally closest frame, at the event's pixel location. Finally, a threshold was applied to the scores to produce the set of true corner events, under the assumption that corner events comprise 20% of all events. Intensity frames were generated every 1000-3000 events, depending on the texture in the dataset; from a qualitative assessment, the temporal quantisation error this introduces was insignificant compared to other possible sources of error, such as artefacts in the frame reconstruction. Fig. 5(a) shows an example e2vid frame for each dataset, as well as the slice of events associated with the single frame, indicating those classified as corners by the original Harris algorithm running on the reconstructed frames.
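The final thresholding step can be made concrete with a small, hypothetical helper (ours, not the authors'): the threshold is chosen as the quantile of per-event Harris scores that leaves the top 20% labelled as corners.

```cpp
#include <algorithm>
#include <vector>

// Pick the ground-truth score threshold so that the top `cornerFraction` of
// events (scores taken from the temporally closest reconstructed frame) are
// labelled as corner events.
double groundTruthThreshold(std::vector<double> scores, double cornerFraction = 0.20)
{
    if (scores.empty()) return 0.0;
    size_t idx = static_cast<size_t>((1.0 - cornerFraction) * scores.size());
    idx = std::min(idx, scores.size() - 1);
    std::nth_element(scores.begin(), scores.begin() + idx, scores.end());
    return scores[idx];     // events scoring above this value are corner events
}
```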

(a) The boxes_6dof, dynamic_6dof, poster_6dof, and shapes_6dof datasets [10]. The background image is the output of the e2vid [12] network, with blue indicating corner positions. Green pixels indicate events associated with the particular frame for assigning ground-truth corner scores. (Overlaps between blue and green are shown as cyan.)
(b) Precision-recall for each of the above datasets without applying filtering.
(c) Precision-recall for each of the above datasets with the filtering applied, as in [1].
Fig. 5: A comparison of detection accuracy and real-time performance for eHarris, luvHarris, FAST, and ARC on RPG 6DoF Datasets [10].

Precision-recall

plots show algorithm accuracy as each algorithm's corner decision parameter is varied. For example, an algorithm with a high threshold will result in only a few detected corners (low recall), but should produce fewer false positive detections (high precision). The precision-recall metric allows a better comparison between algorithms that use different parameters for the corner decision. The FAST and ARC algorithms use the arc length as the decision parameter: as the accepted arc angle is relaxed from a strict, corner-like arc to a much wider range of angles, the algorithm goes from high-precision to high-recall. Instead, eHarris and luvHarris use the corner score from the Harris equation. Algorithms that maintain a high precision as the recall increases are objectively better under a precision-recall metric, i.e. towards the top-right corner of Fig. 5(b) and Fig. 5(c).
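For reference, the standard definitions used for these curves (not restated in the original text) are

\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN},
\]

where $TP$ counts events correctly tagged as corners, $FP$ counts non-corner events tagged as corners, and $FN$ counts ground-truth corner events missed by the detector.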

Fig. 5(b) shows that in almost all cases the FAST algorithm results in the lowest corner precision at any recall, as it trades off accuracy for faster computation compared to eHarris [9]. The ARC algorithm achieves an accuracy on-par with eHarris, possibly because both are able to detect 90 degree and 270 degree corners, compared to FAST detecting only 90 degree corners. luvHarris achieves a higher accuracy, in general, across all datasets. The difference between luvHarris and eHarris, in this case, comes from the use of the threshold-ordinal surface.

Further analysis

the algorithm precision was compared at the 50% recall mark. That is, the decision threshold was tuned to detect 50% of the ground-truth corner events for all algorithms, and the number of correct corners was compared. The caption boxes in Fig. 5(b) and Fig. 5(c) indicate the percentage improvement over the eHarris algorithm. In many cases, luvHarris shows the biggest precision improvement, with up to an 87% improvement, compared to ARC's 31% on the same dataset.

Qualitative corner quality

Figs. 6, 7 and 8 show corner traces for the shapes_6dof and boxes_6dof datasets, and for the live experiment, respectively. In simple scenes, e.g. Fig. 6 and Fig. 8(a), all algorithms are selective to corners, which validates a correct implementation of the algorithms. However, in cluttered scenes the selectivity of all algorithms to corners is questionable. Fig. 8(b) shows that FAST misses many corners, and Fig. 8(c) shows that, as expected, ARC produces many false positives, given it is designed to favour false positives over missed detections [1], such that subsequent tracking can filter out the false positive detections. However, in Fig. 7 both ARC and FAST produce noisy responses such that it is hard to discern any consistent corner traces that could be tracked well. Instead, luvHarris provides consistent, selective detections over time, in addition to some noisy detections. Finally, eHarris produces more noise than luvHarris in situations in which the camera motion does not match eHarris's fixed temporal window.

(a) ARC
(b) eHarris
(c) luvHarris
(d) FAST
Fig. 6: Qualitative corner trails over a 100 ms window for the shapes_6dof dataset. The (b, c) Harris-based algorithms produce consistent, wider trails, while the (a, d) arc-based algorithms are more affected by missed corner events and by falsely classifying edges as corners.
(a) ARC
(b) eHarris
(c) luvHarris
(d) FAST
Fig. 7: Qualitative corner trails over a 100 ms window for the boxes_6dof dataset. In cluttered conditions, exactly what constitutes a "corner" is more ambiguous; however, it is clear that (c) luvHarris detects somewhat consistent trails, while (a) ARC and (d) FAST do not. (b) eHarris also does not consistently detect corners for datasets in which its temporal integration parameter is not compatible.
(a) [@12s, 3M events/s]
(b) [@21s, 5M events/s]
(c) [@28s, 8M events/s]
Fig. 8: A qualitative visualisation over a 100 ms window of luvHarris, ARC, and FAST run on-line simultaneously. It indicates that all algorithms are selective to corners in simple scenes, but in complex scenes ARC and FAST produce more false positives and less consistent detections over time compared with luvHarris. eHarris was not run for the on-line experiment as it was too computationally heavy. The result is best seen in video format.

5 Discussion

Useability

from a practical assessment of the previous state-of-the-art with live camera data, it was clear that the arc-based methods simply did not produce strong and consistent corner detections, as shown in the qualitative results. Notably, ARC produced too many false positives for useful results, and it is arguable whether a further corner-tracking layer would function well; indeed, ARC was designed in conjunction with a corner tracker, but we argue that better underlying detections will always also produce better tracking. The low event-throughput of eHarris, instead, made it unsuitable for higher-resolution cameras, and its fixed temporal window also led to noisy results under certain conditions. Quantitatively, luvHarris improved over the state-of-the-art in both accuracy and event-throughput; more importantly, the qualitative output shows consistent trails of corners that should be more easily tracked for motion estimation tasks.

Achieving real-time operation at HVGA resolution is an improvement over the previous state-of-the-art; however, the maximum event-rate of luvHarris might still be less than satisfactory for even higher-resolution cameras. Considering our results, it may actually be impossible to perform event-by-event computation for such cameras without specific neuromorphic hardware.

Comparison to literature

a recent comparison of event-based corner detectors [17] similarly concludes that ARC was the only real-time detector on the RPG datasets, but did not test higher-resolution event-cameras such as the ATIS generation 3. Unfortunately, it presents only the true positive rate as an accuracy metric, which does not give as full an understanding of performance as precision-recall curves. Instead, [1] presented both true and false positive rates, and similarly indicates that more than 50% precision is not to be expected for event-based algorithms on the RPG datasets. The exact value is, however, highly dependent on the thresholds used to classify the ground-truth. Previous literature stating true and false positive rates is effectively selecting a single point along the precision-recall curve to display as a result; we instead propose that the full curve gives a better overall picture of the true algorithm accuracy.

Dataset validity

from our results, the RPG boxes_6dof and poster_6dof datasets are questionable choices for a quantitative comparison of corner detection. The flat precision-recall curves obtained indicate the algorithms are not performing much better than chance; however, on shapes_6dof the algorithms clearly are selective to corners. Such a result could indicate an incorrect ground-truth, but we suggest the datasets are simply too cluttered to be easily used to measure performance. For example, it does not matter exactly what an algorithm decides is a 'corner' as long as it consistently selects the same position over time; in such cluttered datasets, it is hard to precisely determine a ground-truth of corner and not-corner. Indeed, the results should also be slightly biased towards luvHarris on these datasets, as the ground-truth also uses the cv::cornerHarris method: the same 'type' of corners may be selected despite the very different images used for the ground-truth and for luvHarris. While we suggest the quantitative results are questionable, identifying consistent detections over time in these datasets is still important, as we show in Fig. 7.

Known issues

the luvHarris algorithm will result in a higher computational demand during periods of low event-rate, as Algorithm 2 will still be computed 'as-fast-as-possible' over the entire retina despite only very little change occurring. In such situations, sparse event-by-event algorithms will be more power efficient. A power-conscious implementation of luvHarris could throttle the rate of Algorithm 2 to the event-rate during periods of little motion, with minor accuracy loss.

The TOS is still defined by a parameter, the region size $k$ (and its associated zero-threshold $\theta_{TOS}$), similarly to the eHarris temporal window. However, this parameter is defined by the application rather than by external conditions (e.g. object speed). In this case, we set it to give a clear edge that promotes corner detection.

Event-by-event v.s. batch

an ongoing discussion concerns the validity of batch computation for event-based cameras. The evidence in this study suggests that, for CPU processing of high-resolution cameras, only very limited processing can be performed for every single event. Indeed, the bottleneck of luvHarris was actually the event-by-event TOS update, rather than the rate of cv::cornerHarris over the full retina. We therefore propose that the hybrid concept presented in this paper offers a good compromise between the two: complex algorithmic computations are performed in batch, while the event-stream is still read and output asynchronously. The events flow in and out just as in any fully event-by-event algorithm. On dedicated neuromorphic hardware, fully event-by-event algorithms can still be realised.

6 Conclusion

We have presented a practical corner detector for event-based cameras, specifically addressing problems with limits on event-throughput and detector accuracy. Accuracy is improved by using the Harris algorithm: compared to the arc-based methods, it uses more information to give a consistent result. The consistent corner trails, even in cluttered conditions, indicate that the luvHarris algorithm will also produce consistent motion estimation when tracking corners over time.

Compared to previous event-driven Harris implementations, we use the proposed threshold-ordinal surface, which eliminates the need for a temporal parameter and has a simple update methodology. The contribution of the TOS also extends beyond the corner detection algorithm, and the surface could be used in other applications. The full luvHarris algorithm is simple, can be implemented in very few lines of code, and builds on open-source libraries.

Event-throughput of multiple millions of events per second, as required for high-resolution cameras, was achieved in real-time by decoupling the heavy Harris calculations from the event-stream. The only calculations performed event-by-event were the update of the TOS and a simple look-up of the best-effort Harris score. The concept of decoupling event streams from the complex algorithm component is valuable to apply to other event-driven vision algorithms, especially as camera resolutions continue to increase.

References

  • [1] I. Alzugaray and M. Chli (2018) Asynchronous corner detection and tracking for event cameras in real time. IEEE Robotics and Automation Letters 3 (4), pp. 3177–3184.
  • [2] G. Bradski (2000) The OpenCV library. Dr. Dobb's Journal of Software Tools.
  • [3] C. Brandli, R. Berner, M. Yang, S. Liu, and T. Delbruck (2014) A 240 x 180 130 dB 3 us latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits 49 (10), pp. 2333–2341.
  • [4] C. G. Harris, M. Stephens, et al. (1988) A combined corner and edge detector. In Alvey Vision Conference, Vol. 15.
  • [5] R. Li, D. Shi, Y. Zhang, K. Li, and R. Li (2019) FA-Harris: a fast and asynchronous corner detector for event cameras. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6223–6229.
  • [6] P. Lichtsteiner, C. Posch, and T. Delbruck (2008) A 128x128 120 dB 15 us latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits 43 (2), pp. 566–576.
  • [7] Z. Luo (2013) Survey of corner detection techniques in image processing. International Journal of Recent Technology and Engineering (IJRTE) 2 (2), pp. 184–185.
  • [8] J. Manderscheid, A. Sironi, N. Bourdis, D. Migliore, and V. Lepetit (2019) Speed invariant time surface for learning to detect corner points with event-based cameras. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10237–10246.
  • [9] E. Mueggler, C. Bartolozzi, and D. Scaramuzza (2017) Fast event-based corner detection. In British Machine Vision Conference (BMVC), pp. 1–11.
  • [10] E. Mueggler, H. Rebecq, G. Gallego, T. Delbruck, and D. Scaramuzza (2017) The event-camera dataset and simulator: event-based data for pose estimation, visual odometry, and SLAM. The International Journal of Robotics Research 36 (2), pp. 142–149.
  • [11] C. Posch, D. Matolin, and R. Wohlgenannt (2008) An asynchronous time-based image sensor. In 2008 IEEE International Symposium on Circuits and Systems, pp. 2130–2133.
  • [12] H. Rebecq, R. Ranftl, V. Koltun, and D. Scaramuzza (2019) Events-to-video: bringing modern computer vision to event cameras. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3852–3861.
  • [13] C. Scheerlinck, N. Barnes, and R. Mahony (2019) Asynchronous spatial image convolutions for event cameras. IEEE Robotics and Automation Letters 4 (2), pp. 816–822.
  • [14] V. Vasco, A. Glover, E. Mueggler, D. Scaramuzza, L. Natale, and C. Bartolozzi (2017) Independent motion detection with event-driven cameras. In 2017 18th International Conference on Advanced Robotics (ICAR), pp. 530–536.
  • [15] V. Vasco, A. Glover, and C. Bartolozzi (2016) Fast event-based Harris corner detection exploiting the advantages of event-driven cameras. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 4144–4149.
  • [16] A. R. Vidal, H. Rebecq, T. Horstschaefer, and D. Scaramuzza (2018) Ultimate SLAM? Combining events, images, and IMU for robust visual SLAM in HDR and high-speed scenarios. IEEE Robotics and Automation Letters.
  • [17] Ö. Yılmaz, C. Simon-Chane, and A. Histace (2021) Evaluation of event-based corner detectors. Journal of Imaging 7 (2), pp. 25.