Anisotropic-Scale Junction Detection and Matching for Indoor Images

03/16/2017 ∙ by Nan Xue, et al. ∙ 0

Junctions play an important role in the characterization of local geometric structures in images, the detection of which is a longstanding and challenging task. Existing junction detectors usually focus on identifying the junction locations and the orientations of the junction branches while ignoring their scales; however, these scales also contain rich geometric information. This paper presents a novel approach to junction detection and characterization that exploits the locally anisotropic geometries of a junction and estimates the scales of these geometries using an a contrario model. The output junctions have anisotropic scales --- i.e., each branch of a junction is associated with an independent scale parameter --- and are thus termed anisotropic-scale junctions (ASJs). We then apply the newly detected ASJs for the matching of indoor images, in which there may be dramatic changes in viewpoint and the detected local visual features, e.g., key-points, are usually insufficiently distinctive. We propose to use the anisotropic geometries of our junctions to improve the matching precision for indoor images. Matching results obtained on sets of indoor images demonstrate that our approach achieves state-of-the-art performance in indoor image matching.




1 Introduction

Image correspondence is a key problem for many computer vision tasks, such as structure-from-motion [1, 2, 3, 4], object recognition [5, 6] and many others [7, 8]. The past decades have witnessed great success on this problem, achieved by detecting and matching local visual features [9, 10, 11, 12, 13]. Although most existing image matching algorithms that rely on such local visual features perform well on images containing rich photometric information, e.g. outdoor images, they usually lose effectiveness on images that carry less photometric information and are dominated by geometrical structures, such as the indoor images displayed in Fig. 1. In the indoor scenario, images are often dominated by low-texture regions and exhibit severe viewpoint changes, in which case it is reported to be more effective to establish correspondences between geometrical structures [14, 15] such as line segments [16, 17] and junctions [18].

The line segment matching problem has been studied in recent years, since line segments can represent more structural information than key-points. Many algorithms match line segments by using either photometric descriptors of individual line segments [19] or an initial geometric relation [14, 15] to assist the matching. The approaches that use pre-estimated epipolar geometry usually perform better than those that use photometric descriptors [19], but the estimation of epipolar geometry still needs key-point correspondences in many situations. In indoor scenes, descriptors of low-textured regions are not distinctive enough, so the estimated epipolar geometry used to infer the line segment matching is likely to be unstable. It is thus of great interest to develop ways of establishing correspondences between geometrical structures of images that avoid the errors introduced by key-point correspondences, and thereby achieve better matching of indoor images.

Alternatively, junctions, as a basic kind of structural visual feature, have been studied in recent years for their primary importance in perception and scene understanding [20, 21, 22]. Being a combination of points and ray segments, junctions contain richer information than line segments: a junction includes a location and at least two ray segments (known as branches). Ideally, the information contained in a pair of junctions enables us to recover the correspondence between images up to an affine transformation. However, because the endpoints of junction branches are difficult to estimate, most junction detection algorithms [23, 24, 25, 26, 27, 28] concentrate on identifying the locations and orientations of branches while ignoring their lengths. This effectively reduces junctions to key-points and does not fully exploit their capabilities for image correspondence. To better characterize the structure of junctions, the ACJ detector [28] estimates scale-invariant junctions, each of which is represented isotropically as a circular region with two or more dominant orientations. Every orientation represents a branch of the junction, and the radius of the circle is equal to the shortest length among these branches. Although the orientation of branches is invariant with respect to viewpoint, it is not enough for estimating the affine transformation. Fortunately, if we can estimate the length of every branch, the affine transformation is determined by a single pair of corresponding junctions.

Figure 1: A pair of indoor images. These images are dominated by geometrical structures, e.g. the edges of the door, and by low-textured walls.

Motivated by this, we study how to exploit the invariance of junctions by estimating the scale (length) of their branches. For indoor images, the inherent scale of a junction is usually the length of some straight boundary of a salient object in the image, which carries rich structural information beyond local features. More precisely, we propose an a-contrario approach that models the endpoint of a ray segment starting at a given location with an initial orientation; it checks, according to the number of false alarms (NFA), whether a candidate point should be part of the ray segment. The points belonging to the ray segment occur continuously along it, and the inherent scale of the ray segment is determined at the point where this continuity breaks. In practice the initial orientations are noisy, so we also refine them with the junction-ness based on the a-contrario theory. Once the anisotropic scale is estimated for each branch (ray segment), a local homography can be estimated from any pair of junctions extracted from two different images. In theory, a correct correspondence produces a reasonable local affine homography, while incorrect correspondences produce arbitrary ones. Since the junction locations are reliable, the regions around them can be mapped by the induced affine homography, whether it is correct or not; a correct homography maps one image onto the other with minimal patch distortion. Comparing the regions under the induced affine homography therefore lets us check whether a pair of junctions is a true correspondence. Once corresponding junctions are identified in an image pair, the matches provide additional structural information. Our contributions in this paper are:

  • We extend the junction detector in [28] to anisotropic-scale geometrical structures, which better depict the geometric aspects of indoor images.

  • We develop an efficient scheme for establishing correspondences between anisotropic-scale junctions. More precisely, since a detected anisotropic-scale junction provides at least three points, each pair of junctions across two images induces an affine homography. We then present a strategy based on the induced homographies that generates accurate and reliable correspondences for the locations and the anisotropic branches of junctions simultaneously.

  • We evaluate our method on challenging indoor image pairs, some of which are taken from the indoor image datasets used in [29, 30], and our results demonstrate that it achieves state-of-the-art performance on matching indoor images.

The rest of this paper is organized as follows. First, the existing research related to our work is reviewed in Sec. 2. In Sec. 3, the problem of detecting and matching junctions for indoor scenes is discussed. In Sec. 4, an a-contrario approach for detecting anisotropic-scale junctions is described. For junction matching, we design a dissimilarity measure in Sec. 5 to find the correspondences. The experimental results and analysis of our approach are given in Sec. 6. Finally, we conclude the paper in Sec. 7.

2 Related works

In this section, we briefly review the existing approaches for junction detection and matching as well as geometrical structure matching for indoor images.

2.1 Junction detection

Detecting junction structures in images has been studied for years [31, 32, 33, 24, 34, 26, 27]. In the early stage, junctions were studied as corner points [31, 32]. For the sake of recognition, the scale of junctions and other key-points has also been studied [10, 33, 28]. These approaches estimate the scale around junction locations by using scale space theories [35, 10, 9, 33] to handle viewpoint changes across different images. Since these approaches determine the scale of interest points within a very local area, their precision and discriminability degrade quickly. Besides, these methods mainly focus on the localization and scale of corner points while ignoring the differences between different types of junctions.

To overcome these shortcomings, the ACJ detector [28] was proposed to detect and characterize junctions with non-linear scale space. In this work, an a-contrario approach is proposed for determining the location and branches of junctions with interpretable isotropic scales, which characterizes the ray segments as junction branches and locations explicitly. The scales for detected junctions correspond to the optimal size at which one can observe the junction in the image.

Similar to junction detection, there is an elegant detector named edge based region (EBR) detector [12] for detecting affine invariant regions by estimating relative speed for two points that move away from a corner in both directions along the curve edges. This work can be regarded as a kind of junction detector in curve dominated images. The straight edges which are common in indoor scenario cannot be tackled in this way.

Although the above-mentioned approaches can extract junctions, they do not sufficiently exploit the geometric representation of junctions. The scales estimated by these methods are local and insufficient for characterizing indoor scenes.

2.2 Junction matching

Junction matching has received attention since the early years and has shown promising matching accuracy [18, 36].

In [18], a model for estimating the endpoints of junction branches is proposed, which is very close to our work on estimating anisotropic scales for each branch. The difference is that the approach in [18] requires a roughly estimated fundamental matrix, whereas our method estimates the anisotropic scale of each branch directly, without a fundamental matrix. When the fundamental matrix is known, the local homography between a pair of junctions can be estimated to produce more accurate results while refining the epipolar geometry at the same time [36]. These results are closely related to the recent hierarchical line segment matching approach LJL [15]. In that work, detected line segments are first used to generate junctions as virtual intersections. These junctions are then treated as key-points for an initial matching. Finally, the epipolar geometry induced from the initial matching is used to estimate line segment correspondences with local junctions. The matching accuracy therefore relies on the descriptors of the virtual intersections. Although the matching results are promising, the epipolar geometry still has to be estimated by other means.

2.3 Indoor image matching with geometric structure

Most indoor scenes can be described by simple geometric elements such as points and line segments. As a combination of points and lines, the junction is also a useful geometrical structure for indoor scenes. There have been many approaches, such as the Canny edge detector [37] and the line segment detector (LSD) [16, 17], for extracting line segments. LSD, which produces more complete line segments than Canny edges without any parameter tuning, has been applied in many tasks such as line-segment matching [15] and 3D reconstruction [38]. Compared with key-points, line segments produce more complete results that contain the primary sketch of the scene.

Most algorithms for line-segment matching rely on key-point correspondences. More precisely, key-points in an input image pair are first detected by SIFT [10] or other detectors, and the epipolar geometry between the image pair is then estimated by RANSAC [39] or its variants. Based on the fundamental matrix induced by key-point matching, approaches such as line-point-invariant (LPI) [14] and line-junction-line (LJL) [15] can match line segments correctly. LPI can handle the relation between line segments and matched key-points under viewpoint changes. The LJL method [15] matches image pairs in multiple stages. First, detected line segments are intersected, subject to an appropriate threshold, to produce junctions, and these intersections are matched in the same way as key-points. Then, local homographies are estimated for these junctions using the fundamental matrices estimated from the key-point matching results. Although these approaches perform well in many cases, their results favor matching lines rather than line segments: lines are matched, while the endpoints of line segments are not matched very well. Apart from the fact that the estimated epipolar geometry is sometimes erroneous, there is another important reason for the failure of line-segment matching: line segment detectors cannot guarantee that the detected line segments are consistent across varying imaging conditions. In many situations, a line segment detected in one image is decomposed into two or more collinear line segments in the other image. In this case, the result can be regarded as correct for line matching if the supporting lines correspond. For line-segment matching, however, there exists no correct corresponding line segment in the other image. Moreover, the existing line segment matchers rely on the results of key-point matching. Once key-point matching fails or is inaccurate, the induced line segment matching is affected as well.

3 Problem Statement

3.1 Junction Model

Figure 2: Template of isotropic-scale junction (left) defined in ACJ [28] and anisotropic-scale junction (ASJ) (right) proposed in our work.

Early research on junction detection usually focuses on the orientations of branches and the locations of junctions while ignoring the length or scale of each branch. Even though the junction locations and branch orientations are important for depicting the geometric structure of images, the lack of branch scales limits their performance for image matching. Motivated by this, we propose a new junction model that characterizes junctions better. We define our junction model by considering the endpoint of each branch. Since the length of each branch may be different, we call our model the anisotropic-scale junction. As a special case, a junction model whose branches all have the same length is called an isotropic-scale junction.

Definition 1 (Anisotropic-scale junction)

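An anisotropic-scale junction with $M$ branches starting at the same location $\mathbf{p}$ is denoted as

$$\mathcal{J} = \left\{\mathbf{p}, \{r_i, \theta_i\}_{i=1}^{M}\right\} \qquad (1)$$

where $r_i$ and $\theta_i$ are the scale and orientation of the $i$-th branch, and $M$ is the number of branches.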

Fig. 2 illustrates the difference between isotropic-scale (left) and anisotropic-scale (right) junctions. The isotropic-scale junction is a special case of the anisotropic model in which the lengths of all branches are identical.

3.2 Detecting Junction Locations and Isotropic Branches

Since a junction is formed by several intersecting line segments, it is easier to first focus on localizing the intersection and identifying the normal angles of these line segments. Once the isotropic junction model is defined, this becomes a template matching problem. Based on this idea, Xia et al. defined the junction-ness of a branch with given scale and orientation and then derived an a-contrario approach to determine meaningful junctions in input images [28]. The junction-ness for a given scale and orientation aggregates the normalized gradient information in a neighborhood of the branch. Unlike for points, the neighborhood for a given scale $r$ and orientation $\theta$ is a sector. As shown on the left of Fig. 2, the dark area represents the sector neighborhood of the branch. The sector neighborhood for a given location $\mathbf{p}$, scale $r$ and orientation $\theta$ can be written as

$$S_{\mathbf{p}}(r, \theta) = \left\{\mathbf{q} \in \Omega;\ \mathbf{q} \neq \mathbf{p},\ \|\overrightarrow{\mathbf{pq}}\| \le r,\ d_{2\pi}\big(\alpha(\overrightarrow{\mathbf{pq}}), \theta\big) \le \Delta(r)\right\} \qquad (2)$$

where $\Delta(r)$ is defined as $\Delta(r) = \tau / r$ with some predefined parameter $\tau$, $\Omega$ is the domain of the input image, $d_{2\pi}$ is the distance along the unit circle, defined as $d_{2\pi}(\alpha, \beta) = \min(|\alpha - \beta|, 2\pi - |\alpha - \beta|)$, and $\alpha(\overrightarrow{\mathbf{pq}})$ is the angle of the vector $\overrightarrow{\mathbf{pq}}$ in $[0, 2\pi]$.
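As a rough illustration of the sector neighborhood described above, the following Python sketch builds a boolean mask of the pixels that fall inside a sector. It assumes that the angular half-width shrinks as `tau / r`; the function names and the default value of `tau` are hypothetical and are not taken from [28].

```python
import numpy as np

def angular_distance(a, b):
    """Distance along the unit circle between angles a and b (radians)."""
    d = np.abs(a - b) % (2.0 * np.pi)
    return np.minimum(d, 2.0 * np.pi - d)

def sector_mask(shape, p, r, theta, tau=3.0):
    """Boolean mask of the pixels in the sector around p.

    shape : (rows, cols) of the image domain
    p     : (x, y) location of the junction, in pixel coordinates
    r     : radius (scale) of the sector
    theta : orientation of the branch, in radians
    tau   : assumed width parameter; the half-width is tau / r
    """
    rows, cols = shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    dx = xs - p[0]
    dy = ys - p[1]
    dist = np.hypot(dx, dy)
    alpha = np.arctan2(dy, dx) % (2.0 * np.pi)  # angle of the vector p -> q
    inside_disc = (dist > 0) & (dist <= r)      # exclude the center itself
    return inside_disc & (angular_distance(alpha, theta) <= tau / r)
```

For instance, `sector_mask((21, 21), (10, 10), 5, 0.0, tau=1.0)` selects the pixels within 5 pixels of the center whose direction deviates from the positive x axis by at most 0.2 rad.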

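Since a junction is formed by edges and corner points, the normal angle of the gradient should be consistent with the orientation of the branches. Following this idea, if most points $\mathbf{q}$ in the sector have normal angles close to the orientation $\theta$, the corresponding scale $r$ and orientation $\theta$ are meaningful as a branch of the junction. For a given sector $S_{\mathbf{p}}(r, \theta)$, the junction-ness can be measured by

$$\omega_{\mathbf{p}}(r, \theta) = \sum_{\mathbf{q} \in S_{\mathbf{p}}(r, \theta)} \gamma_{\mathbf{p}}(\mathbf{q}) \qquad (3)$$

and $\gamma_{\mathbf{p}}(\mathbf{q})$ is the pairwise junction-ness with

$$\gamma_{\mathbf{p}}(\mathbf{q}) = \|\nabla \tilde{I}(\mathbf{q})\| \cdot \max\big(|\cos(\phi(\mathbf{q}) - \alpha(\overrightarrow{\mathbf{pq}}))| - |\sin(\phi(\mathbf{q}) - \alpha(\overrightarrow{\mathbf{pq}}))|,\ 0\big) \qquad (4)$$

where $\|\nabla \tilde{I}(\mathbf{q})\|$ is the norm of the normalized gradient at point $\mathbf{q}$, $\phi(\mathbf{q})$ for pixel $\mathbf{q}$ is defined as $\phi(\mathbf{q}) = (\arctan(I_y(\mathbf{q}) / I_x(\mathbf{q})) + \pi/2) \bmod 2\pi$, and $I_x$, $I_y$ are the partial derivatives of the input image in the $x$ and $y$ directions.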

For an isotropic-scale junction with two or more branches, the minimal junction-ness over its branches is used to describe the junction-ness of the entire junction:

$$t(\mathcal{J}) = \min_{m = 1, \dots, M} \omega_{\mathbf{p}}(r, \theta_m) \qquad (5)$$

where $M$ and $m$ represent the total number of branches and the branch index for the junction $\mathcal{J}$.
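The branch and junction strengths discussed in this subsection can be sketched in Python as follows. This is only an illustration under stated assumptions: the gradient is normalized by its global mean (a simplification chosen here, not necessarily the normalization used in [28]), the sector half-width is taken as `tau / r`, and all function names are hypothetical.

```python
import numpy as np

def normalized_gradient(image, eps=1e-8):
    """Gradient magnitude divided by its global mean, and the gradient angle.

    Global-mean normalization is an assumption made for this sketch.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.mean() + eps), np.arctan2(gy, gx)

def branch_strength(image, p, r, theta, tau=3.0):
    """Sum of pairwise junction-ness over the sector of one branch."""
    rows, cols = image.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    dx, dy = xs - p[0], ys - p[1]
    dist = np.hypot(dx, dy)
    alpha = np.arctan2(dy, dx)                    # direction from p to each pixel
    diff = np.abs(alpha - theta) % (2.0 * np.pi)
    diff = np.minimum(diff, 2.0 * np.pi - diff)   # distance on the unit circle
    sector = (dist > 0) & (dist <= r) & (diff <= tau / r)

    mag, grad_angle = normalized_gradient(image)
    phi = grad_angle + np.pi / 2.0                # direction orthogonal to the gradient
    gamma = mag * np.maximum(
        np.abs(np.cos(phi - alpha)) - np.abs(np.sin(phi - alpha)), 0.0
    )
    return float(gamma[sector].sum())

def junction_strength(image, p, r, thetas, tau=3.0):
    """Strength of an isotropic junction: the weakest of its branches."""
    return min(branch_strength(image, p, r, t, tau) for t in thetas)
```

On an image containing a single vertical step edge, a branch aligned with the edge accumulates a large strength, while a branch perpendicular to it accumulates none, so a junction made of these two branches is rejected by the minimum.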

3.3 Analysis for Estimating Anisotropic-scale Branches

Although Eq. (5) measures the junction-ness of a given junction, it does not carry any anisotropic scale for the branches. Such a definition of junction-ness only retains the information that the scale of each branch is at least $r$, and it cannot be used to handle more sophisticated transformations such as affine and projective transforms. To overcome this problem, we define anisotropic-scale junctions with independent scales in Def. 1. The difference between the isotropic-scale junction and the anisotropic-scale one can be observed in Fig. 2. It is easy to see that the junction-ness of the entire junction defined in Eq. (5) cannot be used to recover the independent scales $r_i$. Fortunately, the isotropic-scale junctions detected by ACJ [28] are meaningful, so the problem of estimating both scale and orientation can be simplified to estimating only the scale for a given location $\mathbf{p}$ and orientation $\theta$. In other words, for the detected isotropic junctions, we need a robust method to estimate the length of the ray segment with a specific orientation $\theta$.

One plausible way to model the unknown scale with respect to given location and orientation is that simply modify the junction-ness defined in Eq (3) to with specific . Then, the a-contrario approach in [28] seems to be feasible to check whether the scale is

-meaningful. The corresponding cumulative distribution function (CDF) used to get

-meaningful scale can be formulate to


where the

represents the distribution of random variable



is the number of pixels in corresponding sector neighbor and the operator

produces the convolutional probability density function (PDF) with

times, which actually represents the random variable of . The -meaningful scale for given orientation and location can be determined by the inequality


where is the number of test for junctions with branch.

However, the NFA defined in Eq. (8) has to face the fact that there exist several junctions in indoor images which have extremely large scale branches. This fact would lead to the above inequality disabled. To illustrate this problem, we studied the relationship between convolution times with the minimal junction-ness that can make the probability . As shown in Fig. 3, if the value of is greater than , the probability of will be equal to constantly, which may cause the inequality degenerated to . In fact, the pairwise junction-ness defined in Eq (4) can reach to and then the will be equal to . Therefore, the junction-ness in [28] is infeasible to model the unknown scale.

Figure 3: The relationship between the convolution times and corresponding minimal value with

4 An a-contrario model for anisotropic-scale junction detection

To solve the problems raised in Sec. 3, we derive a differential junction-ness model for describing the scale at a given location and orientation. Since the scales of the branches of a junction are independent of each other, we model the endpoint of each branch independently.

4.1 Differential Junction-ness Model

Suppose the isotropic junctions have been detected in a small scale , the inherent scales of branches will be greater than . If we increase the scale to larger , though the junction-ness is still larger, the error will not be increased significant. A reasonable way to recognize the un-significant variation is to study the variation of with respect to increased. Here, we first reformulate the junction-ness for a branch (3) in continuous form. The junction-ness for position , scale and orientation is


where the is the angle width for given scale, here, we select . The descrete partial derivative is given by


where is the -th sample angle in the range and is the -th sample point in the range .

4.2 Null Hypothesis and Distribution

Once the differential junction-ness model is built, we need a robust way to check whether its value for a specific scale is significant enough. One way to achieve this is to develop an a-contrario approach that controls the threshold automatically. Since our work is an extension of ACJ [28], the null hypothesis here is the same. We say that the variables $\|\nabla \tilde{I}(\mathbf{q})\|$ and $\phi(\mathbf{q})$ follow the null hypothesis $\mathcal{H}_0$ if

  1. $\forall \mathbf{q}$, $\|\nabla \tilde{I}(\mathbf{q})\|$ follows a Rayleigh distribution with parameter 1;

  2. $\forall \mathbf{q}$, $\phi(\mathbf{q})$ follows a uniform distribution over $[0, 2\pi]$;

  3. All of the random variables are independent of each other.

According to the dicussion in [28], every follows the distribution (7) independently. The random variable follows the distribution of the random variable


where the random variable follow the distribution in equation (7) , is the number of sampling points for and is the number of sampling points for . The function will be very small for reasonable (for example, induced ) since the parameter should have small values. Hence, the random variable could be approximated with for computational simplicity. In practice, is larger than 10 and therefore the PDF of

can be apprixmated accurately by using the Central Limit Theorem as


where and

are the expectation and variance of (

7). The PDF of is


which is the Gaussian distribution with mean

and variance . Meanwhile, the random variable follows . Therefore, the random variable follows the distribution approximately.

The probability for given and follows the distribution


describes the fact that scale cannot be increased with a sufficient small incremental at along orientation under the hypothesis . The smaller probability is, the more confident that scale is a reasonable scale. The small probability means that the point belongs to the branch with high possibility. Ideally, the existed branch should produce a series small probability in a interval . Then, the (maximum) scale of the branch should be defined as . We use the probability to check if the point belongs to the branch.

4.3 Number of Test and Number of False Alarms

In the last subsection, we concluded that a sufficiently small probability indicates that the point at a given direction and radius most likely belongs to the branch. What counts as sufficiently small needs to be made precise. According to the Helmholtz principle, we bound this probability by requiring that the expected number of occurrences of this event be less than $\varepsilon$ under the a-contrario random assumption [40] with

where the denotes the number of occurrences of the point occurs along the given location and orientation. Since the location and orientation of the branch are known, expected number of false alarms should be smaller than where and are the number of rows and columns of the corresponding image. When the point rejects the hypothesis , the scale of the branch should be . The scale is called as the maximum (meaningful) scale of the branch if the scale is the maximum scale that satisfies inequality

Usually, $\varepsilon$ is set to 1, which means that the expected number of false alarms is not larger than 1.
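The search for the maximum meaningful scale can be sketched as a simple loop. The sketch below is not the exact test of this paper: it assumes that the evidence at each radius has already been standardized into a z-score, that the probability under the null hypothesis is a Gaussian upper tail (in the spirit of the Central Limit Theorem approximation of Sec. 4.2), and that the number of tests `n_tests` is supplied by the caller. All names are hypothetical.

```python
import math

def gaussian_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def max_meaningful_scale(z_scores, r0, step, n_tests, eps=1.0):
    """Grow a branch from radius r0 while each new sample stays meaningful.

    z_scores[k] is an assumed standardized evidence value for the sample at
    radius r0 + (k + 1) * step. A sample is kept while
    NFA = n_tests * P(Z >= z) <= eps; the search stops at the first failure.
    """
    r = r0
    for z in z_scores:
        nfa = n_tests * gaussian_tail(z)
        if nfa > eps:
            break  # continuity is broken: the branch ends here
        r += step
    return r
```

With `eps = 1`, a sample is accepted only if fewer than one such event is expected by chance over all tests, which mirrors the usual a-contrario convention.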

4.4 Scale Ambiguity for Branches

Junctions are located at the intersections of line segments. Suppose there exist two junctions

where the two-tuples denotes the scale and orientation for the -th branch of the -th junction and is location of the -th junction. In the case that the junction is located at and , the scale ambiguity occurs since the line segment and the branch are co-linear. The scale of the first branch of can be regarded as either or . For example, there are two junctions and located at and respectively in the Fig. 4. The branch along the direction of for and are co-linear with the line segment marked as red. For the branch of , its scales are , and while the scales of the branch of are or . To eliminate the ambiguity, we define the scale for a branch as follow

Definition 2 (Scale of a branch)

Suppose there exist a branch starting at point in the direction , the possible salient scales are , we define the scale of this branch as

Figure 4: Scale ambiguity for branches. The junction and located at and have more than one scales respectively.

The branch with such scale is more stable and more global than other features. However, there exist some challenges to estimate such scales from images. Most existing approaches and the model proposed in Sec. 4.3 estimate the line segment or branches based on orientations of level-lines extracted from the gradient of image [16]. The line segment detected from the image in Fig. 4 could be either or since the level-line around the points have probability to aligned with the orientation of vector , which will lead to the line segment that are co-linear with the branch of across the point to or . When the viewpoint changed, illumination varied or noise increased, the orientations of level-lines around , and will be changed with uncertainty. Then, the scale cannot be estimated robust for different imaging conditions.

Fortunately, the inherent property for location of junctions is stable whatever the imaging condition is. Although the orientations of level-lines around the locations of junctions will change with uncertainty, most of them are still aligned to one of the lines that intersects the junction. Motivated by this, we use the very local isotropic-scale junctions in a small neighbor(e.g. or window size) instead of gradient field and level-lines. For a pixel in an image, we calculate the junction-ness for different orientations in a small neighbor according to ACJ [28] algorithm as


where is defined in (2) with fixed radius (eg. ), is the cardinal number of set . and are the mean and variance defined in (13). Then, we leverage the non-maximal-suppression (NMS) [41] to obtain the very local junctions and filter out branches for these junctions with non-meaningful NFA values according to (8). These very local junctions are denoted as , where the and the is the strength and corresponding NFA value for branch with orientation. In the case that pixel is on (around) an edge, there will be two that align to the orientation of this edge up to . If the pixel is around another junction, there will be multiple orientations aligned with different branches of this junction. Meanwhile, we incorporate the strength instead of the norm of (normalized) gradient with into the a-contrario model proposed in Sec. 4.3 with modified probabilistic distribution.

4.5 Modified Probabilistic Distribution

For the sake of estimating scale for a branch with definition 2, the functions and measuring the junction-ness should be changed to




where the index in Eq. (16) is the orientation that is most close to .

According to the Central Limit Theorem(CLT), the random variable follows the Gaussian distribution with mean and variance , the distribution for is


Then, the null Hypothesis discussed in Sec. 4.2 is updated to


4.6 Junction Detection

So far, the a-contrario approach for anisotropic scale estimation is derived. For an input image, isotropic junctions and local junctions for each pixel are firstly detected by ACJ [28] for initialization. The results for junctions are denoted as and local junctions at fixed small scale (usually ) for every pixels are where is the coordinate of a pixel.

We estimate the scale for branch according to the Number of False Alarm

where the probability is the updated version in Eq. (19). The scale is searched starting at until the NFA is larger than .

The accuracy of the branch orientations detected by ACJ [28] depends on the scale, which is bounded by a predefined parameter, and the orientations are hence noisy. The scales of ASJs are more sensitive to this noise, so the orientations should be refined. Since a branch with the most accurate orientation should have the maximum junction-ness at its scale, we optimize the objective function


to refine the orientation for and check the branch with orientation and scale is -meaningful branch.

5 ASJ Matching for Indoor Images

Since ASJs contain rich geometric information represented by their anisotropic scales, we study a matching method that takes full advantage of them. For a pair of junctions detected in two images, a homography can be estimated from the point sets that contain their locations and the endpoints of their branches, and this homography can be used to compare junctions and find correct correspondences. Since an image contains L-junctions, Y-junctions and X-junctions, and the type of a junction might differ across images because of occlusion, the homography estimated from a pair of junctions might be invalid. Fortunately, whatever the type of the junction, its location is the intersection of any two of its branches that are not parallel to each other, which means that a junction with more than two branches can be decomposed into several L-junctions. Naturally, an L-junction whose two branch orientations are equal up to $\pi$ should be filtered out. After decomposition and filtering, the junctions detected in an image are all L-junctions.
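The decomposition and filtering step described above can be sketched as follows. The representation of a branch as a `(scale, orientation)` tuple and the angular tolerance `min_angle` are assumptions made for this illustration; the text only states that branch pairs whose orientations coincide up to $\pi$ are discarded.

```python
import itertools
import math

def decompose_into_two_branch_junctions(location, branches,
                                        min_angle=math.radians(10.0)):
    """Split a junction into all two-branch junctions with non-parallel branches.

    location : (x, y) of the junction
    branches : list of (scale, orientation) tuples, orientation in radians
    min_angle: assumed tolerance below which two branches count as parallel
    """
    result = []
    for b1, b2 in itertools.combinations(branches, 2):
        d = abs(b1[1] - b2[1]) % math.pi   # orientation difference modulo pi
        d = min(d, math.pi - d)
        if d >= min_angle:                 # keep only non-parallel pairs
            result.append((location, b1, b2))
    return result
```

For a T-shaped junction with branches at 0, $\pi$ and $\pi/2$, the pair of opposite branches is discarded and two two-branch junctions remain.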

The perspective effects are typically small on a local patch [42], which can be approximated by affine homography. We use a pair of -junctions to estimate such homographies. Suppose there are and decomposed -junctions in image and , denoted as and respectively. If a pair of junctions are matched, an affine homography would be induced once the orientations are determined. In order to derive a unique affine homography, we define the partial order for two branches and of a -junction as


Every junction need to be sorted by the order defined above. The affine homography for a pair of junction and

are estimated by using DLT (Direct Linear Transform) with their locations and endpoints for the branches. More precisely, we solve the equations


where and are the homogeneous representation of locations and two branches for and respectively. The matrix is

represents the affine transform induced by and for .
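To make the estimation step concrete, the sketch below recovers an affine map from the three point correspondences provided by a pair of two-branch junctions (the location and the two branch endpoints). With exactly three non-collinear points the affine map is determined uniquely, so this illustration solves the linear system directly with NumPy rather than running a general DLT; the function names and the endpoint convention are assumptions.

```python
import numpy as np

def branch_endpoint(location, scale, orientation):
    """Endpoint of a branch: location + scale * (cos(theta), sin(theta))."""
    x, y = location
    return (x + scale * np.cos(orientation), y + scale * np.sin(orientation))

def affine_from_junction_pair(src_pts, dst_pts):
    """Affine matrix H (3x3, last row 0 0 1) mapping three src points to dst.

    src_pts, dst_pts : arrays of shape (3, 2) holding the junction location
    and the two branch endpoints, in the same branch order for both images.
    The source points must not be collinear.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((3, 1))])  # rows are [x, y, 1]
    X = np.linalg.solve(A, dst)            # A @ X = dst, X has shape (3, 2)
    H = np.eye(3)
    H[:2, :] = X.T
    return H
```

The branch order matters: swapping the two branches in only one of the images yields a different (reflected) affine map, which is why a consistent ordering of branches is needed before estimation.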

From the image pair , there can be affine homographies, denoted by , which maps the -th junction in to -th junction in . For correct correspondence , the matrix will map the image to accurate around the location of junctions while the mismatch will map the image only correct at the endpoints and locations but erroneous at other positions. For the sake of saving computational resource, we just map a patch around to in and map to in by using matrix and its inverse . Then, the distance between two features and are measured by


where the distance are the distance between two patches calculated by raw patches, SIFT descriptor or other descriptors.

Thanks to the homographies induced by ASJs, the distance between original and mapped patches is usually very small for a correct correspondence and larger for an incorrect one, so we can use the ratio test proposed in [10] to filter out incorrect correspondences.
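The filtering step can be sketched as follows, assuming the pairwise junction dissimilarities have already been collected in a matrix. The function name `ratio_test_matches` and the value of `ratio` are hypothetical; the test is written in the "second-best distance must be at least `ratio` times the best distance" form, which is one common way to state the ratio test.

```python
def ratio_test_matches(dist, ratio=1.5):
    """Keep a match only when its best distance is clearly the smallest.

    dist[i][j] : dissimilarity between junction i of the first image and
                 junction j of the second image (assumed precomputed)
    ratio      : hypothetical threshold, not a value taken from the paper
    """
    matches = []
    for i, row in enumerate(dist):
        if len(row) < 2:
            continue  # the test needs a second-best candidate
        order = sorted(range(len(row)), key=row.__getitem__)
        best, second = order[0], order[1]
        # Accept only if the runner-up is at least `ratio` times worse.
        if row[second] >= ratio * row[best]:
            matches.append((i, best))
    return matches

toy_dist = [
    [0.10, 0.90, 0.80],  # distinctive: one candidate is far better
    [0.50, 0.55, 0.60],  # ambiguous: all candidates look alike
]
matches = ratio_test_matches(toy_dist)
```

In the toy matrix, the first junction is kept because its best candidate is far better than the runner-up, while the second is rejected as ambiguous.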

6 Experimental Analysis

This section presents results and analysis for the ASJ detection and matching routines, in comparison with existing approaches for junction detection, junction matching, key-point matching and line segment matching. In our experiments, we first detect anisotropic-scale junctions with the procedure presented in Section 4, and then establish junction correspondences with the affine homographies induced by these semi-local geometric structures.

6.1 Stability and Control of the Number of False Detections

A-contrario approaches detect meaningful events under the control of an NFA threshold, which bounds the average number of false detections in an image that follows the null hypothesis. In this subsection, we measure the average number of false detections in Gaussian noise images and illustrate the ASJs detected with a fixed threshold.

Experimentally, we generate random images whose pixels independently follow the standard Gaussian distribution. For each pixel, we draw an orientation uniformly at random and estimate the scale at this pixel along that orientation. Ideally, no meaningful line-segment structure appears in random images, but some may be detected by mistake; these are counted in the average number of false detections. If this number is controlled by the NFA of the proposed a-contrario approach, the approach is validated as a correct a-contrario approach.

Figure 5: Repeatability rate with respect to scale change. (a) Original images used to generate the image sequences (first row). (b) Repeatability as a function of the scale factor (second row).
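| NFA threshold | 0.01 | 0.1 | 1 | 10 | 100 | 200 |
| --- | --- | --- | --- | --- | --- | --- |
| Avg. false detections | 0.002 | 0.006 | 0.198 | 5.923 | 66.472 | 132.676 |

Table 1: Average number of false detections in images generated by Gaussian white noise.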

The average numbers of false detections in Gaussian noise images are reported in Tab. 1. The NFA threshold is varied in our experiments from 0.01 to 200, and the corresponding average numbers of false detections are upper bounded by the threshold.
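The bound can be checked directly against the values reported in Tab. 1. The short sketch below only re-reads the table; the variable names are illustrative.

```python
# Values copied from Tab. 1: NFA thresholds and measured average false detections.
thresholds = [0.01, 0.1, 1, 10, 100, 200]
avg_false = [0.002, 0.006, 0.198, 5.923, 66.472, 132.676]

# The a-contrario guarantee: on noise, the average number of detections
# should not exceed the threshold.
bounded = all(f <= t for t, f in zip(thresholds, avg_false))

# How much of the allowed budget is actually used at each threshold.
usage = [f / t for t, f in zip(thresholds, avg_false)]

print(bounded)
print([round(u, 3) for u in usage])
```

In every column the measured average stays below its threshold, using at most about two thirds of the allowed number of false detections.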

6.2 Comparison with ACJ

Since ASJ extends the a-contrario model of ACJ with scale estimation, it is necessary to compare the repeatability of the two detectors and discuss their differences. Following the baseline experiments proposed in [28], the images are first zoomed with different factors to form image sequences with scale changes. Then, ASJ and ACJ are run on these image sequences to detect junctions. The repeatability of ACJ is discussed in [28]; however, the definition of corresponding junctions used there only considers the locations and branches of the junctions and ignores scale coherence. Therefore, we define corresponding junctions for ACJ and ASJ with scale information here. For the original image and the scaled image, corresponding ACJ junctions should have close locations, branch orientations and scales. Meanwhile, two junctions with different numbers of branches cannot be identified as a correspondence. More precisely, two ACJ junctions detected in the original image and in the scaled image are defined as corresponding if they satisfy


where the angular distance . Similarly, two junctions detected by ASJ correspond if they satisfy the inequalities (24) and (26) as well as


The results of this experiment over the set of scale factors are shown in Fig. 5. The repeatability curves show that the proposed ASJ performs better than ACJ. The repeatability rate reported in [28] is higher; however, it only reflects the accuracy of the locations and branch orientations, whereas our experiment also takes the scale difference into account.
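The structure of such a correspondence test can be sketched in Python. This is a sketch under stated assumptions rather than the criteria of inequalities (24) to (26): the three tolerances, the dictionary-based junction representation, the wrap-around form of the angular distance and the assumption that zooming maps a point p to factor·p are all hypothetical.

```python
import math

def angular_distance(a, b):
    """Smallest absolute difference between two angles (assumed definition)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def corresponds(j_orig, j_scaled, factor,
                loc_tol=3.0, ang_tol=math.pi / 20, scale_tol=0.3):
    """Check location, orientation and per-branch scale consistency.

    A junction is a dict {"loc": (x, y), "branches": [(angle, scale), ...]}.
    All three tolerances are hypothetical values chosen for illustration.
    """
    # Junctions with different numbers of branches never correspond.
    if len(j_orig["branches"]) != len(j_scaled["branches"]):
        return False
    # Location: the zoom is assumed to map p to factor * p.
    ex, ey = factor * j_orig["loc"][0], factor * j_orig["loc"][1]
    if math.hypot(ex - j_scaled["loc"][0], ey - j_scaled["loc"][1]) > loc_tol:
        return False
    for angle, scale in j_orig["branches"]:
        # Partner branch: the one with the closest orientation.
        partner = min(j_scaled["branches"],
                      key=lambda b: angular_distance(angle, b[0]))
        if angular_distance(angle, partner[0]) > ang_tol:
            return False
        # Scale: the branch length should shrink or grow with the zoom factor.
        expected = factor * scale
        if abs(partner[1] - expected) > scale_tol * expected:
            return False
    return True

orig = {"loc": (100.0, 50.0), "branches": [(0.0, 40.0), (1.57, 80.0)]}
good = {"loc": (50.5, 25.0), "branches": [(0.02, 21.0), (1.55, 39.0)]}
bad = {"loc": (50.5, 25.0), "branches": [(0.02, 21.0), (1.55, 80.0)]}
print(corresponds(orig, good, 0.5), corresponds(orig, bad, 0.5))
```

The second candidate has the right location and orientations but one branch whose length did not follow the zoom, so it fails only once scale consistency is required.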

As reported in [28], the scale of an ACJ represents the length of its shortest branch and is roughly linear in the scale factor. Theoretically, if a detected ACJ has a given scale in the original image, the scale of its correspondence in the scaled image should be close to that scale multiplied by the scale factor. However, the ACJ algorithm requires an upper bound on the scale as input, which [28] recommends setting to a constant within a fixed range for the sake of computational speed. In practice, the junctions in indoor images usually have long branches, which cannot be bounded by a relatively small constant.

To demonstrate this fact, we compare the detected junctions in Fig. 6. In this experiment, junctions are first detected by ACJ in the original image and in the image scaled by a given factor. We then find the corresponding ACJs in the image pair using the inequalities (24) and (26) while ignoring the inequality (25). To compare the scales of the junctions with respect to the scale factor, all correspondences are drawn as colored circles. In Fig. 6, the scale estimated for a junction in the scaled image is shown as a yellow circle, while the red circle and green line segments depict the corresponding junction in the original image. Several correspondences clearly do not have consistent scales. If a junction is formed by line segments whose lengths exceed the maximal radius threshold of ACJ by a large enough factor, its scale equals the threshold in the original image. When the image is zoomed, the scale does not decrease because the branches are still longer than the threshold. This is why the repeatability is lower when the inequality (25) is used to compute it.
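A small numeric example makes the saturation effect concrete. The cap `r_max`, the branch lengths and the zoom factor below are hypothetical numbers chosen only for illustration, and `capped_scale` is a simplified stand-in for a detector whose scale is the shortest branch length limited by a maximal radius.

```python
def capped_scale(branch_lengths, r_max):
    """Scale reported by a detector whose radius is capped at r_max (assumed model)."""
    return min(min(branch_lengths), r_max)

r_max = 30.0              # hypothetical maximal radius
lengths = [100.0, 120.0]  # hypothetical long indoor branches, in pixels
factor = 0.5              # hypothetical zoom factor

scale_orig = capped_scale(lengths, r_max)
scale_zoom = capped_scale([factor * l for l in lengths], r_max)
expected_zoom = factor * scale_orig

print(scale_orig, scale_zoom, expected_zoom)
```

Both images report the capped value, although the scale in the zoomed image should have been half of the original one, so a scale-consistency check rejects this otherwise correct correspondence.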

Finally, some example results of the ASJ detector on indoor images are shown in Fig. 7. The anisotropic-scale junctions are shown in the middle column and the ACJ results in the right column. The results show that ASJ is able to detect more geometric structure than ACJ: the anisotropic-scale branches of a junction can depict the layout of indoor scenes, whereas the ACJ results only represent very local information. For example, for the rectangles in Fig. 7, ASJ recovers the boundaries of the rectangles, while ACJ only detects the corner points and the orientations around them.

Figure 6: Scale consistency between the original image and the scaled image. The yellow circles represent the scales estimated in the scaled image, while the red circles represent the scales detected in the original image. Different scale factors are used for the top and bottom rows.
Figure 7: Some results of ASJ for the input images in the first column are shown in the middle column. The junctions detected by ACJ are shown in the right column for comparison.

6.3 ASJ Matching

Figure 8: Some examples of the collected indoor images used to compare different matching approaches.

In order to evaluate our approach, we collected more than 100 images on which to run the proposed ASJ approach. Some of the collected images come from indoor 3D reconstruction datasets [30, 29], while the others were taken by ourselves. As shown in Fig. 8, the collected images are less textured than natural images. Some of them contain large viewpoint changes and indistinct, repetitively textured regions, e.g. Fig. 8(b), Fig. 8(i) and Fig. 8(l).

We consider two junctions as matched only if their centers and branch orientations correspond. In this sense, our matching result goes somewhat beyond local features and can be compared with existing approaches in different settings:

  • It is comparable to key-point matching methods, if we regard junctions as specific corner points with two orientations;

  • It is also comparable to line segment matching ones, if we take junctions as several intersecting line segments.

For key-points matching, we compare the results of matched junctions with SIFT [10], Affine-SIFT [43, 13], Hessian-Affine [44], EBR and IBR in [12].

Meanwhile, we compare the matching accuracy for matched line segments with the existing approaches LPI [14] and LJL [15], where accuracy measures the proportion of matches whose endpoints correspond. This rule is stricter for assessing line segment matching results. Interestingly, although LPI [14] and LJL [15] use outlier-free epipolar geometry to assist their line segment matchers, our proposed method achieves better accuracy without epipolar geometry.

The implementations of Affine-SIFT [43, 13], Hessian-Affine [44], LPI [14] and LJL [15] were obtained from the authors' homepages, EBR and IBR [12] from VGG's website, and the SIFT detector from VLFeat. The descriptor used in our experiments is SIFT. For ASJ, SIFT [10], Hessian-Affine [44], EBR [12] and IBR [12], mismatches are filtered by the ratio test on descriptor distances, with the threshold set to the VLFeat default for computing matches from descriptors. Notably, the implementation of Affine-SIFT [43, 13] provided by its authors uses a different threshold, since it computes the distance with a different norm, and we keep it unchanged. Since the released Affine-SIFT code [43, 13] applies outlier filtering to its matches, we remove this step for fairness, which makes the results in our experiment differ from those of the released executable. All parameters of the compared approaches are set to the default values provided by their authors.

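| Method | Measure | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i) | (j) | (k) | (l) | Avg. accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ASJ | #correct | 12 | 26 | 12 | 16 | 50 | 14 | 197 | 65 | 119 | 37 | 17 | 33 | 85.17% |
| | #total | 12 | 29 | 13 | 20 | 60 | 15 | 214 | 69 | 121 | 45 | 19 | 40 | |
| | accuracy (%) | 100.00 | 89.66 | 92.31 | 80.00 | 83.33 | 93.33 | 92.06 | 94.20 | 98.35 | 82.22 | 89.47 | 82.50 | |
| SIFT [10] | #correct | 128 | 476 | 559 | 115 | 435 | 74 | 708 | 199 | 147 | 200 | 103 | 65 | 62.69% |
| | #total | 206 | 700 | 839 | 222 | 652 | 287 | 770 | 261 | 330 | 299 | 191 | 161 | |
| | accuracy (%) | 62.14 | 68.00 | 66.63 | 51.80 | 66.72 | 25.78 | 91.95 | 76.25 | 44.55 | 66.89 | 53.93 | 40.37 | |
| Affine-SIFT [13] | #correct | 135 | 183 | 433 | 364 | 430 | 119 | 4141 | 1271 | 136 | 172 | 196 | 163 | 82.85% |
| | #total | 141 | 240 | 519 | 480 | 549 | 133 | 4205 | 1326 | 240 | 263 | 224 | 264 | |
| | accuracy (%) | 95.74 | 76.25 | 83.43 | 75.83 | 78.32 | 89.47 | 98.48 | 95.85 | 56.67 | 65.40 | 87.50 | 61.74 | |
| Hessian-Affine [44] | #correct | 24 | 13 | 96 | 82 | 66 | 17 | 640 | 226 | 32 | 114 | 38 | 29 | 79.68% |
| | #total | 26 | 34 | 132 | 105 | 109 | 18 | 671 | 248 | 63 | 144 | 41 | 49 | |
| | accuracy (%) | 92.31 | 38.24 | 72.73 | 78.10 | 60.55 | 94.44 | 95.38 | 91.13 | 50.79 | 79.17 | 92.68 | 59.18 | |
| EBR [12] | #correct | 0 | 0 | 10 | 0 | 0 | 0 | 64 | 20 | 28 | 14 | 0 | 0 | 32.56% |
| | #total | 1 | 1 | 16 | 15 | 10 | 2 | 75 | 21 | 46 | 34 | 2 | 4 | |
| | accuracy (%) | 0.00 | 0.00 | 62.50 | 0.00 | 0.00 | 0.00 | 85.33 | 95.24 | 60.87 | 41.18 | 0.00 | 0.00 | |
| IBR [12] | #correct | 0 | 0 | 28 | 11 | 14 | 0 | 46 | 0 | 0 | 10 | 0 | 0 | 31.84% |
| | #total | 4 | 9 | 39 | 16 | 25 | 9 | 63 | 12 | 10 | 21 | 8 | 5 | |
| | accuracy (%) | 0.00 | 0.00 | 71.79 | 68.75 | 56.00 | 0.00 | 73.02 | 0.00 | 0.00 | 47.62 | 0.00 | 0.00 | |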
Table 2: Comparison of different matching methods. The number of correct matches, the number of total matches and the matching accuracy of ASJ, for comparison with key-point matching results, are reported in the first group of rows. The results of the key-point matching approaches SIFT [10], Affine-SIFT [43, 13], Hessian-Affine [44], EBR [12] and IBR [12] are listed in the following groups. The average matching accuracy over all collected images is reported in the last column.

6.3.1 Matching results for key-points matching

As shown in Tab. 2, our proposed feature ASJ is compared with the most widely used feature detectors. For key-point matching, we regard an ASJ as a key-point with two specific orientations. The matching accuracy of ASJ is better than that of the other key-point matchers in most cases. For instance, in Fig. 8(i), the indistinct repeated regions of the chessboard are matched very well, with an accuracy of 98.35%, because ASJs make corner points carry more global information than other approaches, namely their relative positions together with meaningful orientations in the image.
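The per-pair accuracies can be recomputed from the counts in Tab. 2, which also makes "in most cases" concrete. The snippet below only re-reads the table; the dictionary layout and variable names are illustrative. The averages in the last column of Tab. 2 are taken over all collected images and therefore cannot be reproduced from these twelve pairs.

```python
# (#correct, #total) for image pairs (a)-(l), copied from Tab. 2.
counts = {
    "ASJ": ([12, 26, 12, 16, 50, 14, 197, 65, 119, 37, 17, 33],
            [12, 29, 13, 20, 60, 15, 214, 69, 121, 45, 19, 40]),
    "SIFT": ([128, 476, 559, 115, 435, 74, 708, 199, 147, 200, 103, 65],
             [206, 700, 839, 222, 652, 287, 770, 261, 330, 299, 191, 161]),
    "Affine-SIFT": ([135, 183, 433, 364, 430, 119, 4141, 1271, 136, 172, 196, 163],
                    [141, 240, 519, 480, 549, 133, 4205, 1326, 240, 263, 224, 264]),
    "Hessian-Affine": ([24, 13, 96, 82, 66, 17, 640, 226, 32, 114, 38, 29],
                       [26, 34, 132, 105, 109, 18, 671, 248, 63, 144, 41, 49]),
    "EBR": ([0, 0, 10, 0, 0, 0, 64, 20, 28, 14, 0, 0],
            [1, 1, 16, 15, 10, 2, 75, 21, 46, 34, 2, 4]),
    "IBR": ([0, 0, 28, 11, 14, 0, 46, 0, 0, 10, 0, 0],
            [4, 9, 39, 16, 25, 9, 63, 12, 10, 21, 8, 5]),
}

# Matching accuracy = #correct / #total, per method and image pair.
accuracy = {
    method: [c / t for c, t in zip(correct, total)]
    for method, (correct, total) in counts.items()
}

# Image pairs on which ASJ is strictly more accurate than every other method.
pairs = "abcdefghijkl"
asj_best = [
    p for k, p in enumerate(pairs)
    if all(accuracy["ASJ"][k] > accuracy[m][k] for m in accuracy if m != "ASJ")
]
print(asj_best, len(asj_best))
```

On these twelve pairs ASJ has the highest accuracy on eight; the exceptions are (f) and (k), where Hessian-Affine is higher, and (g) and (h), where Affine-SIFT is higher.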

Compared with the most closely related approaches, EBR and IBR [12], ASJ handles straight edges better, which yields more key-points and more correct correspondences. In many cases in Tab. 2, the results of EBR and IBR illustrate their limitations on indoor images, which are dominated by straight edges.

Figure 9: Top row: correctly matched ASJs plotted for the image pairs Fig. 8(a), Fig. 8(e) and Fig. 8(h). Bottom row: correctly matched key-points obtained with Affine-SIFT [43, 13]. Although Affine-SIFT yields more correct matches than ASJ, the ASJs convey the structure of the input images, while the plotted key-points are hard to interpret without the input images for reference.

In terms of the absolute number of correct matches, ASJ yields significantly fewer than the other approaches; Affine-SIFT and SIFT produce the most. Since the junctions detected in indoor images correspond to meaningful junctions in the scene, it is not surprising that their number is smaller than that of SIFT key-points. Nevertheless, ASJs represent the structure of a scene more compactly than key-points. To illustrate this, we plot the correctly matched key-points and ASJs on a clean background: the structure of the scene can be read from the ASJs and their branches, while the plotted key-points are hard to interpret without the input images. As shown in Fig. 9, the matched ASJs convey the geometric information with a small number of junctions (12 for Fig. 9(a), 50 for Fig. 9(b) and 65 for Fig. 9(c)), while the matched ASIFT key-points are confusing even though they are far more numerous. Some examples of matching results are shown in Fig. 10.

(a) (#correct matches, #total matches) = (16, 20)
(b) (#correct matches, #total matches) = (119, 121)
Figure 10: Matched ASJs for the image pairs Fig. 8(d) and Fig. 8(i) are shown in sub-figures (a) and (b), respectively. False matches are connected by yellow lines and correct matches by cyan lines.

6.3.2 Matching results for line-segments matching

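| Method | Measure | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i) | (j) | (k) | (l) | Avg. accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ASJ (line segments) | #correct | 15 | 30 | 14 | 26 | 85 | 21 | 349 | 121 | 232 | 49 | 27 | 48 | 71.55% |
| | #total | 24 | 58 | 26 | 40 | 120 | 30 | 428 | 138 | 242 | 90 | 38 | 80 | |
| | accuracy (%) | 62.50 | 51.72 | 53.85 | 65.00 | 70.83 | 70.00 | 81.54 | 87.68 | 95.87 | 54.44 | 71.05 | 60.00 | |
| LPI [14] | #correct | 5 | 0 | 15 | 19 | 53 | 3 | 123 | 60 | 33 | 17 | 11 | 16 | 48.83% |
| | #total | 9 | 0 | 18 | 29 | 90 | 9 | 193 | 102 | 59 | 38 | 15 | 40 | |
| | accuracy (%) | 55.56 | 0.00 | 83.33 | 65.52 | 58.89 | 33.33 | 63.73 | 58.82 | 55.93 | 44.74 | 73.33 | 40.00 | |
| LJL [15] | #correct | 8 | 24 | 26 | 37 | 148 | 4 | 221 | 113 | 129 | 50 | 26 | 22 | 52.95% |
| | #total | 30 | 79 | 32 | 64 | 251 | 17 | 376 | 186 | 138 | 131 | 50 | 102 | |
| | accuracy (%) | 26.67 | 30.38 | 81.25 | 57.81 | 58.96 | 23.53 | 58.78 | 60.75 | 93.48 | 38.17 | 52.00 | 21.57 | |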
Table 3: Comparison of different methods for line segment matching. A match is counted as correct only if the endpoints of the corresponding line segments are correct. We compare ASJ with the state-of-the-art approaches LPI [14] and LJL [15] and report the number of correct matches, the number of total matches and the matching accuracy. The average matching accuracy is compared in the last column.

We compare our matched line segments with those of the state-of-the-art approaches LPI [14] and LJL [15] under a stricter rule that compares the endpoints of corresponding line segments instead of their line equations. For the example image pairs shown in Fig. 8, our method outperforms the existing methods by a considerable margin in most cases. Some line segment matching results are shown in Fig. 11 and Fig. 12. The number of correctly matched line segments is also comparable with that of the other approaches. Besides the matching accuracy, the result of our method shown in Fig. 12 covers the scene more completely.

Figure 11: Matched line segments for the image pair Fig. 8(f). (#correct matches, #total matches) = (21, 30). Midpoints of matched line segments are connected by cyan lines (correct matches) or yellow lines (mismatches).
Figure 12: Matched line segments for the image pair Fig. 8(b). Left and mid-left: correctly matched line segments for ASJ; right and mid-right: correctly matched line segments for LJL [15]. The result of ASJ covers the scene more completely thanks to the anisotropic scales of the junction branches.

Unlike LPI [14] and LJL [15], our approach performs better without using any pre-estimated geometric information. As Tab. 3 shows, key-point-driven approaches to line segment matching can fail because of an erroneously estimated geometric relationship. Among the failure cases reported in Tab. 3, the image pair in Fig. 12 is dominated by repeated texture and a severe viewpoint change, both of which are challenging for key-point matching. In such a scenario, the induced epipolar geometry may be unreliable and therefore produce poor line segment matching results. On the other hand, because our approach performs well in junction matching, the junction correspondences can also be used to refine the line segment matching result.

7 Conclusion

In this paper, we proposed a novel junction detector, ASJ, for indoor images, which are dominated by junctions. It exploits the anisotropy of junctions in a more global manner by estimating the endpoints (lengths) of the branches of isotropic-scale junctions. We then devised an affine-invariant dissimilarity measure to match these anisotropic-scale junctions across different images. We tested our method on a collected set of indoor images and compared its performance with several current state-of-the-art methods. The results demonstrate that our approach establishes a new state of the art on this indoor image dataset.


  • [1] C. Wu, “Towards linear-time incremental structure from motion,” in International Conference on 3D Vision, 2013, pp. 127–134.
  • [2] D. J. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher, “Sfm with mrfs: Discrete-continuous optimization for large-scale structure from motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2841–2853, 2013.
  • [3] S. Fuhrmann, F. Langguth, and M. Goesele, “MVE - A multi-view reconstruction environment,” in Eurographics Workshop on Graphics and Cultural Heritage, Darmstadt, Germany, 2014, pp. 11–18.
  • [4] P. Moulon, P. Monasse, R. Marlet, and Others, “Openmvg. an open multiple view geometry library.”
  • [5] B. Wang, X. Bai, X. Wang, W. Liu, and Z. Tu, “Object recognition using junctions,” in European Conference on Computer Vision, 2010, pp. 15–28.
  • [6] A. Y. S. Chia, D. Rajan, M. K. Leung, and S. Rahardja, “Object recognition by discriminative combinations of line segments, ellipses, and appearance features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1758–1772, 2012.
  • [7] J. Yan, J. Wang, H. Zha, X. Yang, and S. M. Chu, “Multi-view point registration via alternating optimization,” in AAAI Conference on Artificial Intelligence, 2015, pp. 3834–3840.
  • [8] Y. Shen, W. Lin, J. Yan, M. Xu, J. Wu, and J. Wang, “Person re-identification with correspondence structure learning,” in IEEE International Conference on Computer Vision, 2015, pp. 3200–3208.
  • [9] K. Mikolajczyk and C. Schmid, “An affine invariant interest point detector,” in European Conference on Computer Vision, 2002, pp. 128–142.
  • [10] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
  • [11] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” in British Machine Vision Conference, 2002, pp. 1–10.
  • [12] T. Tuytelaars and L. J. V. Gool, “Matching widely separated views based on affine invariant regions,” International Journal of Computer Vision, vol. 59, no. 1, pp. 61–85, 2004.
  • [13] G. Yu and J. Morel, “ASIFT: an algorithm for fully affine invariant comparison,” IPOL Journal, vol. 1, pp. 11–38, 2011.
  • [14] B. Fan, F. Wu, and Z. Hu, “Robust line matching through line-point invariants,” Pattern Recognition, vol. 45, no. 2, pp. 794–805, 2012.
  • [15] K. Li, J. Yao, X. Lu, L. Li, and Z. Zhang, “Hierarchical line matching based on line-junction-line structure descriptor and local homography estimation,” Neurocomputing, vol. 184, pp. 207–220, 2016.
  • [16] R. G. von Gioi, J. Jakubowicz, J. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 4, pp. 722–732, 2010.
  • [17] ——, “LSD: a line segment detector,” IPOL Journal, vol. 2, pp. 35–55, 2012.
  • [18] X. Shen and P. Palmer, “Uncertainty propagation and the matching of junctions as feature groupings,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1381–1395, 2000.
  • [19] Z. Wang, F. Wu, and Z. Hu, “MSLD: A robust descriptor for line matching,” Pattern Recognition, vol. 42, no. 5, pp. 941–953, 2009.
  • [20] D. Marr, “A computational investigation into the human representation and processing of visual information,” Vision, pp. 125–126, 1982.
  • [21] E. H. Adelson, “Lightness perception and lightness illusions,” New Cogn. Neurosci, vol. 339, 2000.
  • [22] C. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating structure and texture,” Computer Vision and Image Understanding, vol. 106, no. 1, pp. 5–19, 2007.
  • [23] T. Wu, G. Xia, and S. C. Zhu, “Compositional boosting for computing hierarchical image structures,” in CVPR, 18-23 June 2007.
  • [24] M. Maire, P. Arbelaez, C. C. Fowlkes, and J. Malik, “Using contours to detect and localize junctions in natural images,” in CVPR, June 24-26 2008.
  • [25] E. D. Sinzinger, “A model-based approach to junction detection using radial energy,” Pattern Recognition, vol. 41, no. 2, pp. 494–505, 2008.
  • [26] Z. Püspöki and M. Unser, “Template-free wavelet-based detection of local symmetries,” IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3009–3018, 2015.
  • [27] Z. Püspöki, V. Uhlmann, C. Vonesch, and M. Unser, “Design of steerable wavelets to detect multifold junctions,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 643–657, 2016.
  • [28] G. Xia, J. Delon, and Y. Gousseau, “Accurate junction detection and characterization in natural images,” International Journal of Computer Vision, vol. 106, no. 1, pp. 31–56, 2014.
  • [29] F. Srajer, A. G. Schwing, M. Pollefeys, and T. Pajdla, “Match box: Indoor image matching via box-like scene estimation,” in International Conference on 3D Vision, 2014, pp. 705–712.
  • [30] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Reconstructing building interiors from images,” in IEEE Conference on Computer Vision and Pattern Recognition, September 27 - October 4 2009, pp. 80–87.
  • [31] W. Förstner, “A feature based correspondence algorithm for image matching,” International Archives of Photogrammetry and Remote Sensing, vol. 26, no. 3, pp. 150–166, 1986.
  • [32] C. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, 1988, pp. 147–151.
  • [33] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004.
  • [34] W. Förstner, T. Dickscheid, and F. Schindler, “Detecting interpretable and accurate scale-invariant keypoints,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2256–2263.
  • [35] L. Alvarez and F. Morales, “Affine morphological multiscale analysis of corners and multiple junctions,” International Journal of Computer Vision, vol. 25, no. 2, pp. 95–107, 1997.
  • [36] É. Vincent and R. Laganière, “Junction matching and fundamental matrix recovery in widely separated views,” in British Machine Vision Conference, 2004, pp. 1–10.
  • [37] B. P. D. Ruff, “A pipelined architecture for the canny edge detector,” in Alvey Vision Conference, Cambridge, UK, 1987, pp. 1–4.
  • [38] S. Ramalingam, M. Antunes, D. Snow, G. H. Lee, and S. Pillai, “Line-sweep: Cross-ratio for wide-baseline matching and 3d reconstruction,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1238–1246.
  • [39] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
  • [40] A. Desolneux, L. Moisan, and J.-M. Morel, From Gestalt Theory to Image Analysis: A Probabilistic Approach, 1st ed.   Springer Publishing Company, Incorporated, 2007.
  • [41] A. Neubeck and L. J. V. Gool, “Efficient non-maximum suppression,” in IEEE International Conference on Pattern Recognition, 2006, pp. 850–855.
  • [42] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.
  • [43] J. Morel and G. Yu, “ASIFT: A new framework for fully affine invariant image comparison,” SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 438–469, 2009.
  • [44] M. Perdoch, O. Chum, and J. Matas, “Efficient representation of local geometry for large scale object retrieval,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 9–16.