1 Introduction
Image correspondence is a key problem for many computer vision tasks, such as structure-from-motion [1, 2, 3, 4], object recognition [5, 6] and many others [7, 8]. The past decades have witnessed great success on this problem, achieved by detecting and matching local visual features [9, 10, 11, 12, 13]. Although most existing image matching algorithms that rely on such local visual features perform well on images containing rich photometric information, e.g. outdoor images, they usually lose their effectiveness on images that carry less photometric information and are dominated by geometrical structures, such as the indoor images displayed in Fig. 1. In the indoor scenario, images are often dominated by low-texture parts and exhibit severe viewpoint changes, in which case it is reported to be more effective to establish correspondences between geometrical structures [14, 15] such as line segments [16, 17] and junctions [18].
The line segment matching problem has been studied in recent years since line segments can represent more structural information than keypoints. Many algorithms match line segments by using either photometric descriptors of individual line segments [19] or an initial geometric relation [14, 15] to assist the matching. The approaches using pre-estimated epipolar geometry usually perform better than those using photometric descriptors [19], but the epipolar geometry estimation still needs keypoint correspondences in many situations. In indoor scenes, because descriptors for low-textured regions are not distinctive enough, the estimated epipolar geometry is likely to be too unstable for inferring line segment matches. It is thus of great interest to develop ways to establish correspondences between the geometrical structures of images while getting rid of the errors raised by keypoint correspondences, so as to achieve better matching of indoor images.
Alternatively, junctions, as a kind of basic structural visual feature, have been studied in recent years for their primary importance in perception and scene understanding [20, 21, 22]. Being a combination of points and ray segments, junctions contain richer information than line segments, i.e. a location and at least two ray segments (known as branches). Ideally, the information contained in a pair of junctions enables us to recover the correspondence between images up to an affine transformation. However, due to the difficulty of estimating the endpoints of junction branches, most junction detection algorithms [23, 24, 25, 26, 27, 28] concentrate on identifying the locations and orientations of branches while ignoring their lengths. This effectively reduces junctions to keypoints and does not fully exploit their capabilities for image correspondence. To better characterize the structure of junctions, the ACJ detector [28] estimates scale-invariant junctions, each of which can be represented isotropically as a circular region with two or more dominant orientations. Every orientation represents a branch of the junction, and the radius of the circle is equal to the shortest length among these branches. Although the orientation of branches is invariant with respect to viewpoint, it is not enough for estimating the affine transformation. Fortunately, if we can estimate the length of every branch, the affine transformation is determined by a single pair of corresponding junctions.
Motivated by this, we study how to exploit the invariance of junctions by estimating the scale (length) of their branches. For indoor images, the inherent scales of junctions are usually the lengths of some (straight) boundaries of salient objects in the images, which contain rich structural information beyond local features. More precisely, we propose an a-contrario approach that models the endpoint of a ray segment starting at a given location with an initial orientation, and checks whether a candidate point should be part of the ray segment according to its number of false alarms (NFA). The points that belong to the ray segment occur continuously; once the continuity is broken, the inherent scale of the ray segment is determined. In practice the initial orientations are noisy, so we also optimize them with the junctionness based on the a-contrario theory. Once the anisotropic scale is estimated for each branch (ray segment), a local homography can be estimated from any pair of junctions extracted from two different images. Theoretically, a correct correspondence produces a reasonable local affine homography, while incorrect correspondences generate arbitrary local homographies. Since the junction locations are reliable, the regions around them can be mapped by either a correct or an incorrect affine homography. A correct homography maps one image to the other with minimal patch distortion. Comparing the regions under the induced affine homography for a pair of junctions therefore checks whether the pair is a correspondence. Once the corresponding junctions are identified in an image pair, the results provide more structural information. Our contributions in this paper are:

We extend the junction detector in [28] to anisotropic-scale geometrical structures, which better depict the geometric aspect of indoor images.

We develop an efficient scheme for establishing correspondences between anisotropic-scale junctions. More precisely, as a detected anisotropic-scale junction provides at least three points, each pair of junctions in the images can induce an affine homography. We finally present a strategy based on the induced homographies to generate accurate and reliable correspondences for the locations and anisotropic branches of junctions simultaneously.
The rest of this paper is organized as follows. Existing research related to our work is reviewed in Sec. 2. In Sec. 3, the problem of detecting and matching junctions for indoor scenes is discussed. An a-contrario approach for detecting anisotropic-scale junctions is then described in Sec. 4. For junction matching, we design a dissimilarity measure in Sec. 5 to find the correspondences. The experimental results and analysis of our approach are given in Sec. 6. Finally, we conclude the paper in Sec. 7.
2 Related works
In this section, we briefly review the existing approaches for junction detection and matching as well as geometrical structure matching for indoor images.
2.1 Junction detection
Detecting junction structures in images has been studied for years [31, 32, 33, 24, 34, 26, 27]. In the early stage, junctions were studied as corner points [31, 32]. For the sake of recognition, the scale of junctions and other keypoints has also been studied [10, 33, 28]. These approaches estimate the scale around junction locations by using scale space theories [35, 10, 9, 33] to handle viewpoint changes across different images. Since these approaches determine the scale of interest points in a very local area, their precision and discriminability degrade quickly. Besides, these methods mainly focus on the localization and scale of corner points while ignoring the differences between different types of junctions.
To overcome these shortcomings, the ACJ detector [28] was proposed to detect and characterize junctions with a nonlinear scale space. In that work, an a-contrario approach is proposed for determining the location and branches of junctions with interpretable isotropic scales, which explicitly characterizes the ray segments as junction branches together with their locations. The scale of a detected junction corresponds to the optimal size at which one can observe the junction in the image.
Similar to junction detection, there is an elegant detector named the edge-based region (EBR) detector [12], which detects affine invariant regions by estimating the relative speed of two points that move away from a corner in both directions along the curved edges. This work can be regarded as a kind of junction detector for curve-dominated images. The straight edges that are common in the indoor scenario cannot be handled in this way.
Although the above-mentioned approaches can extract junctions, their geometric representation is not sufficiently exploited. The scales estimated by these methods are local and insufficient for characterizing indoor scenes.
2.2 Junction matching
Junction matching has received attention since early years and has shown promising matching accuracy [18, 36].
In [18], a model for estimating the endpoints of junction branches is proposed, which is very close to our work on estimating anisotropic scales for each branch. In contrast, the approach in [18] requires a roughly estimated fundamental matrix, while our method estimates the anisotropic scale of each branch directly without a fundamental matrix. For a known fundamental matrix, the local homography between a pair of junctions can be estimated to produce more accurate results while refining the epipolar geometry [36]. These results are closely related to the recent hierarchical line segment matching approach LJL [15]. In that work, detected line segments are first used to generate junctions as virtual intersections. After that, the junctions are regarded as keypoints for initial matching. Finally, the epipolar geometry induced from the initial matching is used to estimate line segment correspondences with local junctions. The matching accuracy in fact relies on the descriptors of the virtual intersections. Although their matching results are promising, the epipolar geometry still has to be estimated by other means.
2.3 Indoor image matching with geometric structure
Most indoor scenes can be described by simple geometric elements such as points and line segments. As a combination of points and lines, the junction is also a useful geometrical structure for indoor scenes. There have been many approaches, such as the Canny edge detector [37] and the line segment detector (LSD) [16, 17], to extract line segments. LSD, which can produce more complete line segments than the Canny edge detector without any parameter tuning, has been applied in many tasks such as line segment matching [15] and 3D reconstruction [38]. Compared with keypoints, line segments can produce more complete results that contain the primary sketch of the scene.
Most algorithms for line segment matching rely on keypoint correspondences. More precisely, keypoints for an input image pair are first detected by using SIFT [10] or other detectors, and the epipolar geometry between the image pair is then estimated by using RANSAC [39] and its variants. Based on the fundamental matrix induced by keypoint matching, many approaches such as line-point-invariant (LPI) [14] and line-junction-line (LJL) [15] can match line segments correctly. LPI is able to handle the relation between line segments and matched keypoints under viewpoint changes. The LJL method [15] matches image pairs in multiple stages. In the first stage, detected line segments are intersected with an appropriate threshold to produce junctions, and these intersections are matched in the same way as keypoints. Then, local homographies are estimated for these junctions with the fundamental matrices estimated from the keypoint matching results. Although these approaches perform well in many cases, their matching results favor lines rather than line segments: lines are matched while the endpoints of line segments are not matched very well. Besides the fact that the estimated epipolar geometry is sometimes erroneous, an important reason for the failure of line segment matching is that line segment detectors cannot guarantee that the detected segments are consistent across varying imaging conditions. In many situations, a line segment detected in one image is split into two or more collinear line segments in the other image. In this case, the result of line matching can be regarded as correct if the supporting lines correspond. However, in terms of line segment matching, there exists no correctly corresponding line segment in the other image. On the other hand, the existing line segment matchers rely on the results of keypoint matching. Once the keypoint matching fails or is inaccurate, the induced line segment matching results are affected to some extent.
3 Problem Statement
3.1 Junction Model
Early research on junction detection usually focuses on the orientations of branches and the locations while ignoring the length or scale of each branch. Even though the junction locations and the orientations of branches are important for depicting the geometric structure of images, the lack of branch scales limits their performance for image matching. Motivated by this, we propose a new junction model that characterizes junctions better. We define our junction model by considering the endpoint of each branch. Since the lengths of the branches may differ, we call our model the anisotropic-scale junction. As a special case, a junction model with branches of equal length is called an isotropic-scale junction.
Definition 1 (Anisotropic-scale junction)
An anisotropic-scale junction with branches starting at the same location is denoted as
(1) 
where and are the scale and orientation for th branch, is the number of branches.
Fig. 2 shows an example of the difference between anisotropic-scale (left) and isotropic-scale (right) junctions. The isotropic-scale junction is actually a special case of the anisotropic model in which the lengths of all branches are identical.
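As an illustration of Definition 1, the following is a minimal Python sketch of a junction as a location plus a list of (scale, orientation) branches. The class and method names are hypothetical and introduced here only for illustration; the angle convention (radians, measured from the x-axis) is also an assumption.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Branch:
    scale: float        # length of the branch (its anisotropic scale)
    orientation: float  # direction of the branch in radians (assumed convention)


@dataclass
class Junction:
    location: Tuple[float, float]  # (x, y) position shared by all branches
    branches: List[Branch]

    def endpoints(self) -> List[Tuple[float, float]]:
        """Endpoint of each branch: location + scale * (cos, sin)."""
        x, y = self.location
        return [(x + b.scale * math.cos(b.orientation),
                 y + b.scale * math.sin(b.orientation))
                for b in self.branches]

    def to_isotropic(self) -> "Junction":
        """Isotropic special case: every branch takes the shortest branch length."""
        r = min(b.scale for b in self.branches)
        return Junction(self.location,
                        [Branch(r, b.orientation) for b in self.branches])


# A two-branch junction whose branches have different lengths.
j = Junction((10.0, 20.0), [Branch(30.0, 0.0), Branch(12.0, math.pi / 2)])
```

The location together with the two endpoints of a two-branch junction gives the three points that are later needed to fix an affine transformation.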
3.2 Detecting Junction Locations and Isotropic Branches
Since a junction is formed by several intersecting line segments, it is easier to focus first on localizing the intersection and identifying the normal angles of these line segments. Once the isotropic junction model is defined, this problem becomes a template matching problem. Based on this idea, Xia et al. exploited the junctionness of branches with given scale and orientations and then derived an a-contrario approach to determine meaningful junctions in input images [28]. The junctionness for a given scale and orientation actually contains the neighborhood information of the normalized gradient. Different from points, the neighborhood for a given scale and orientation is a sector. As shown on the left of Fig. 2, the dark area represents the sector neighborhood of the branch. The sector neighborhood for a given location , scale and orientation can be denoted mathematically as
(2) 
where the is defined as with some predefined parameter , is the domain of input image, is the distance along the unit circle, defined as and is the angle of the vector in .
Since a junction is formed by edges and corner points, the normal angle of the gradient should be consistent with the orientation of the branches. Following this idea, if most of the points have normal angles close to the orientation , the corresponding scale and orientation are meaningful as a branch of the junction. For a given sector , the junctionness can be measured by
(3) 
and is the pairwise junctionness with
(4) 
where the is the norm of normalized gradient at point , for pixel is defined as , are the partial derivative of input image in and direction.
For an isotropic-scale junction with two or more branches, the minimal junctionness among its branches is used as the junctionness of the entire junction, as in equation (5)
(5) 
where the number and represent the total number of branches and branch index for the junction .
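To make the sector neighborhood of Eq. (2) more concrete, the following Python sketch tests whether a pixel falls inside a sector around a branch direction. It is only a sketch under stated assumptions: the function names are hypothetical, and the angular tolerance `half_width` is passed in directly, whereas the text ties it to the scale through a predefined parameter that is not reproduced here.

```python
import math


def circular_distance(a: float, b: float) -> float:
    """Distance between two angles measured along the unit circle, in [0, pi]."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)


def in_sector(p, q, r, theta, half_width):
    """True if pixel q lies in the sector of radius r around direction theta at p.

    half_width is a hypothetical angular tolerance in radians (an assumption).
    """
    dx, dy = q[0] - p[0], q[1] - p[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > r:
        return False
    alpha = math.atan2(dy, dx)  # angle of the vector from p to q
    return circular_distance(alpha, theta) <= half_width


def sector_pixels(p, r, theta, half_width, width, height):
    """Enumerate the integer pixels of a width x height image inside the sector."""
    x0, y0 = int(round(p[0])), int(round(p[1]))
    rr = int(math.ceil(r))
    return [(x, y)
            for y in range(max(0, y0 - rr), min(height, y0 + rr + 1))
            for x in range(max(0, x0 - rr), min(width, x0 + rr + 1))
            if in_sector(p, (x, y), r, theta, half_width)]
```

A branch junctionness in the spirit of Eq. (3) would then accumulate a per-pixel strength over `sector_pixels(...)`; the strength itself is left out because its exact form is given by Eq. (4).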
3.3 Analysis for Estimating Anisotropic-scale Branches
Although equation (5) measures the junctionness of a given junction, it does not contain any anisotropic scale for the branches. Such a definition of junctionness only keeps the information that each branch's scale is larger than and it cannot be used for handling more sophisticated transformations such as affine and projective transforms. To overcome this problem, we define the anisotropic-scale junctions with independent scales in Def. 1. The difference between the isotropic-scale junction and the anisotropic-scale one can be observed in Fig. 2. It is easy to see that the junctionness of the entire junction defined in Eq. (5) cannot be used to exploit independent scales . Fortunately, the isotropic-scale junctions detected by ACJ [28] are meaningful, so the problem of estimating scale and orientation can be simplified to estimating only the scale with given location and orientation . In other words, for the detected isotropic junctions, we need a robust method to estimate the length of the corresponding ray segment with a specific orientation .
One plausible way to model the unknown scale with respect to a given location and orientation is simply to modify the junctionness defined in Eq. (3) to with specific . Then, the a-contrario approach in [28] seems feasible for checking whether the scale is meaningful. The corresponding cumulative distribution function (CDF) used to obtain a meaningful scale can be formulated as
(6)
where the represents the distribution of random variable with
(7)
is the number of pixels in the corresponding sector neighborhood, and the operator produces the convolution of the probability density function (PDF) with itself times, which actually represents the random variable of . The meaningful scale for a given orientation and location can be determined by the inequality
(8)
where is the number of tests for junctions with branch.
However, the NFA defined in Eq. (8) has to face the fact that indoor images contain several junctions whose branches have extremely large scales. This makes the above inequality ineffective. To illustrate this problem, we studied the relationship between the number of convolutions and the minimal junctionness that can make the probability . As shown in Fig. 3, if the value of is greater than , the probability of will be equal to constantly, which may cause the inequality to degenerate to . In fact, the pairwise junctionness defined in Eq. (4) can reach and then the will be equal to . Therefore, the junctionness in [28] cannot be used to model the unknown scale.
4 An a-contrario model for anisotropic-scale junction detection
To solve the problems raised in Sec. 3, we derive a differential junctionness model for depicting the scale with given location and orientation. Since the scales of the branches of a junction are independent of each other, we model the endpoint of each branch independently.
4.1 Differential Junctionness Model
Suppose the isotropic junctions have been detected at a small scale , so that the inherent scales of branches are greater than . If we increase the scale to a larger , the junctionness becomes larger but the error does not increase significantly. A reasonable way to recognize an insignificant variation is to study the variation of with respect to the increase of . Here, we first reformulate the junctionness of a branch (3) in continuous form. The junctionness for position , scale and orientation is
(9) 
where the is the angle width for given scale, here, we select . The discrete partial derivative is given by
(10) 
where is the th sample angle in the range and is the th sample point in the range .
4.2 Null Hypothesis and Distribution
After the differential junctionness model is built, we need a robust way to check whether the value of for a specific is significant enough. One way to achieve this is to develop an a-contrario approach that controls the threshold automatically. Since our work is an extension of ACJ [28], the null hypothesis here is the same: we say the variables and follow the null hypothesis if
, follows a Rayleigh distribution with parameter 1;

all of the random variables are independent of each other.
According to the discussion in [28], every follows the distribution (7) independently. The random variable follows the distribution of the random variable
(11) 
where the random variable follow the distribution in equation (7) , is the number of sampling points for and is the number of sampling points for . The function will be very small for reasonable (for example, induced ) since the parameter should have small values. Hence, the random variable could be approximated with for computational simplicity. In practice, is larger than 10 and therefore the PDF of can be approximated accurately by using the Central Limit Theorem as
(12)
where and are the expectation and variance of (7). The PDF of is
(13)
which is the Gaussian distribution with mean and variance . Meanwhile, the random variable follows . Therefore, the random variable follows the distribution approximately.
The probability for given and follows the distribution
(14) 
describes the fact that scale cannot be increased with a sufficiently small increment at along orientation under the hypothesis . The smaller the probability is, the more confident we are that scale is a reasonable scale. A small probability means that the point belongs to the branch with high likelihood. Ideally, an existing branch should produce a series of small probabilities in an interval . Then, the (maximum) scale of the branch should be defined as . We use the probability to check whether the point belongs to the branch.
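The Central Limit Theorem step used above can be sketched generically in Python: a sum of n independent, identically distributed terms with mean mu and variance sigma2 is approximated by a Gaussian with mean n*mu and variance n*sigma2, and its tail probability is evaluated with the complementary error function. The function names and the numeric values in the usage line are hypothetical placeholders; the actual mean and variance are those of the distribution in Eq. (7), and the statistic tested in this section is the differential junctionness rather than a plain sum.

```python
import math


def gaussian_tail(x: float, mean: float, var: float) -> float:
    """P[X >= x] for X ~ N(mean, var), via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / math.sqrt(2.0 * var))


def clt_tail_of_sum(observed_sum: float, n: int, mu: float, sigma2: float) -> float:
    """Approximate tail probability of a sum of n i.i.d. terms.

    By the Central Limit Theorem the sum is approximately N(n * mu, n * sigma2).
    mu and sigma2 stand for the expectation and variance of one term.
    """
    return gaussian_tail(observed_sum, n * mu, n * sigma2)


# Illustrative values only: 20 samples, per-sample mean 0.3 and variance 0.1.
p_value = clt_tail_of_sum(12.0, 20, 0.3, 0.1)
```

The larger the observed statistic is relative to its expectation under the null hypothesis, the smaller the returned probability, which is the quantity compared against the threshold in the next subsection.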
4.3 Number of Test and Number of False Alarms
In the last subsection, we concluded that a sufficiently small probability of indicates that the point with a certain direction and radius is more likely to belong to the branch. What counts as sufficiently small needs to be made precise. According to the Helmholtz principle, we bound the probability by requiring that the expected number of occurrences of this event is less than under the a-contrario random assumption [40] with
where the denotes the number of occurrences of the point along the given location and orientation. Since the location and orientation of the branch are known, the expected number of false alarms should be smaller than where and are the number of rows and columns of the corresponding image. When the point rejects the hypothesis , the scale of the branch should be . The scale is called the maximum (meaningful) scale of the branch if it is the maximum scale that satisfies the inequality
Usually, the threshold is set to 1, which means that the expected number of false alarms is not larger than 1.
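The meaningfulness test described in this subsection can be sketched as follows: the number of false alarms is the number of tests multiplied by the probability of the event under the null hypothesis, and an event is kept when this product does not exceed the threshold. The function names are hypothetical, and the number of tests is left as a caller-supplied argument because its exact expression in terms of the image size is not reproduced here.

```python
def nfa(num_tests: float, probability: float) -> float:
    """Expected number of false alarms: number of tests times the probability."""
    return num_tests * probability


def is_meaningful(num_tests: float, probability: float, epsilon: float = 1.0) -> bool:
    """An event is epsilon-meaningful when its NFA does not exceed epsilon."""
    return nfa(num_tests, probability) <= epsilon
```

With a threshold of 1 and, say, 1000 tests (an illustrative count), a probability of 1e-4 is accepted while a probability of 1e-2 is rejected.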
4.4 Scale Ambiguity for Branches
Junctions are located at the intersections of line segments. Suppose there exist two junctions
where the two-tuple denotes the scale and orientation for the th branch of the th junction and is the location of the th junction. In the case that the junction is located at and , scale ambiguity occurs since the line segment and the branch are collinear. The scale of the first branch of can be regarded as either or . For example, there are two junctions and located at and respectively in Fig. 4. The branches along the direction of for and are collinear with the line segment marked in red. For the branch of , its scales are , and while the scales of the branch of are or . To eliminate the ambiguity, we define the scale of a branch as follows
Definition 2 (Scale of a branch)
Suppose there exists a branch starting at point in the direction , and the possible salient scales are . We define the scale of this branch as
A branch with such a scale is more stable and more global than other features. However, estimating such scales from images is challenging. Most existing approaches, as well as the model proposed in Sec. 4.3, estimate line segments or branches based on the orientations of level-lines extracted from the image gradient [16]. The line segment detected from the image in Fig. 4 could be either or since the level-lines around the points may or may not be aligned with the orientation of vector , which will lead to the line segment that is collinear with the branch of across the point to or . When the viewpoint changes, the illumination varies or the noise increases, the orientations of level-lines around , and will change unpredictably. The scale then cannot be estimated robustly across different imaging conditions.
Fortunately, the location of a junction is inherently stable whatever the imaging condition is. Although the orientations of level-lines around junction locations change unpredictably, most of them are still aligned with one of the lines that intersect at the junction. Motivated by this, we use the very local isotropic-scale junctions in a small neighborhood (e.g. or window size) instead of the gradient field and level-lines. For a pixel in an image, we calculate the junctionness for different orientations in a small neighborhood according to the ACJ [28] algorithm as
(15) 
where is defined in (2) with fixed radius (e.g. ), is the cardinality of the set . and are the mean and variance defined in (13). Then, we apply non-maximal suppression (NMS) [41] to obtain the very local junctions and filter out the branches of these junctions with non-meaningful NFA values according to (8). These very local junctions are denoted as , where the and the are the strength and corresponding NFA value for the branch with orientation. In the case that a pixel is on (or around) an edge, there will be two that align with the orientation of this edge up to . If the pixel is around another junction, there will be multiple orientations aligned with different branches of this junction. Meanwhile, we incorporate the strength instead of the norm of the (normalized) gradient with into the a-contrario model proposed in Sec. 4.3 with a modified probabilistic distribution.
4.5 Modified Probabilistic Distribution
For the sake of estimating scale for a branch with definition 2, the functions and measuring the junctionness should be changed to
(16) 
and
(17) 
where the index in Eq. (16) is the orientation that is closest to .
According to the Central Limit Theorem (CLT), the random variable follows the Gaussian distribution with mean and variance , the distribution for is
(18) 
Then, the null Hypothesis discussed in Sec. 4.2 is updated to
(19) 
4.6 Junction Detection
So far, the a-contrario approach for anisotropic scale estimation has been derived. For an input image, isotropic junctions and local junctions for each pixel are first detected by ACJ [28] for initialization. The resulting junctions are denoted as and the local junctions at a fixed small scale (usually ) for every pixel are where is the coordinate of a pixel.
We estimate the scale for branch according to the number of false alarms
where the probability is the updated version in Eq. (19). The scale is searched starting at until the NFA is larger than .
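The scale search just described can be sketched as a simple loop that keeps extending the branch while the NFA of the next sample stays below the threshold, and stops as soon as the continuity is broken. This is only a sketch: the function `nfa_at`, the step size and the upper bound `r_max` are hypothetical and supplied by the caller; in the approach above, the NFA would be computed from the probability of Eq. (19).

```python
from typing import Callable


def search_scale(nfa_at: Callable[[float], float],
                 r_start: float,
                 r_max: float,
                 step: float = 1.0,
                 epsilon: float = 1.0) -> float:
    """Grow the scale from r_start while the NFA at the next sample is <= epsilon.

    nfa_at(r) returns the NFA of the point at radius r along the branch.
    The last accepted radius is returned as the scale of the branch.
    """
    r = r_start
    while r + step <= r_max and nfa_at(r + step) <= epsilon:
        r += step
    return r


# Toy NFA profile: points up to radius 25 are meaningful, later ones are not.
toy_nfa = lambda r: 0.01 if r <= 25 else 50.0
scale = search_scale(toy_nfa, r_start=5.0, r_max=100.0)
```

Starting the search at the scale of the detected isotropic junction avoids re-testing the part of the branch that is already known to be meaningful.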
The accuracy of the branch orientations detected by ACJ [28] depends on the scale, which is bounded by a predefined parameter, and the orientations are hence noisy. The scales of ASJ are more sensitive to this noise, so the orientations should be refined. Since a branch with the most accurate orientation should have the maximum junctionness with the scale , we optimize the objective function
(20) 
to refine the orientation for and check whether the branch with orientation and scale is a meaningful branch.
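One simple way to carry out such a refinement is a one-dimensional grid search around the initial orientation, sketched below. This is an illustrative stand-in rather than the optimizer used for Eq. (20): the search range, the number of samples and the `score` callable (which would return the junctionness of the branch at a candidate orientation) are all assumptions.

```python
import math
from typing import Callable


def refine_orientation(score: Callable[[float], float],
                       theta0: float,
                       half_range: float = math.radians(5.0),
                       num_samples: int = 41) -> float:
    """Return the sampled orientation near theta0 that maximises score(theta)."""
    best_theta, best_score = theta0, score(theta0)
    for k in range(num_samples):
        # Evenly spaced candidates in [theta0 - half_range, theta0 + half_range].
        theta = theta0 - half_range + 2.0 * half_range * k / (num_samples - 1)
        s = score(theta)
        if s > best_score:
            best_theta, best_score = theta, s
    return best_theta


# Toy score peaking at 0.52 rad, with a noisy initial orientation of 0.50 rad.
refined = refine_orientation(lambda t: -(t - 0.52) ** 2, theta0=0.50)
```

After refinement, the branch would be re-tested with the NFA criterion at the refined orientation and estimated scale.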
5 ASJ Matching for Indoor Images
Since ASJs contain rich geometric structure information represented by the anisotropic scales, we study a matching method that takes full advantage of them. For a pair of junctions and detected from images and , the homography can be estimated from the point sets that contain their locations and branch endpoints, and it can be used to compare junctions for correct correspondences. Since junctions of several types exist in an image and the type of a junction might differ across images because of occlusion, the homography estimated from a pair of junctions might be invalid. Fortunately, whatever the type of a junction is, its location can be obtained as the intersection of any two of its branches that are not parallel to each other, which means that a junction with more than two branches can be decomposed into several two-branch junctions. Needless to say, a two-branch junction whose orientations and are equal up to should be filtered out. After decomposing and filtering, all the junctions detected in an image have exactly two branches.
Perspective effects are typically small on a local patch [42] and can be approximated by an affine homography. We use a pair of junctions to estimate such homographies. Suppose there are and decomposed junctions in image and , denoted as and respectively. If a pair of junctions is matched, an affine homography is induced once the orientations are determined. In order to derive a unique affine homography, we define the partial order for two branches and of a junction as
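The decomposition and filtering step can be sketched in Python as follows. Every pair of branches of a junction yields one two-branch junction, and pairs whose orientations coincide modulo pi are discarded because they lie on a single line. The function name and the angular tolerance `tol` are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def decompose(location, branches, tol=math.radians(5.0)):
    """Split a junction into two-branch junctions, dropping collinear pairs.

    branches: list of (scale, orientation) tuples, orientations in radians.
    A pair is dropped when its orientations are equal up to pi, within tol.
    """
    out = []
    for b1, b2 in combinations(branches, 2):
        d = abs(b1[1] - b2[1]) % math.pi  # orientation difference modulo pi
        d = min(d, math.pi - d)
        if d > tol:
            out.append((location, b1, b2))
    return out


# A T-shaped junction: two opposite branches plus one perpendicular branch.
t_junction = [(30.0, 0.0), (25.0, math.pi), (12.0, math.pi / 2)]
pairs = decompose((10.0, 20.0), t_junction)
```

In this toy case the two opposite branches are filtered out as a pair, and each of them is combined with the perpendicular branch, giving two two-branch junctions.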
(21) 
Every junction needs to be sorted by the order defined above. The affine homography for a pair of junctions and is estimated by using the DLT (Direct Linear Transform) with their locations and branch endpoints. More precisely, we solve the equations
(22) 
where and are the homogeneous representation of locations and two branches for and respectively. The matrix is
represents the affine transform induced by and for .
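Since a two-branch junction provides exactly three points (its location and two branch endpoints), the affine transform between two such junctions is determined by a small linear system. The sketch below solves that system directly with Cramer's rule instead of a general DLT solver; the function names are hypothetical, and the points are assumed to be listed in the same order (location, first branch endpoint, second branch endpoint) in both images.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m @ x = rhs by Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate configuration: the three points are collinear")
    sol = []
    for col in range(3):
        mc = [[rhs[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]
        sol.append(det3(mc) / d)
    return sol


def affine_from_three_points(src, dst):
    """Return the 3x3 affine matrix mapping the three src points onto dst.

    src, dst: three (x, y) points each (junction location and two endpoints).
    """
    m = [[x, y, 1.0] for x, y in src]
    row_x = solve3(m, [p[0] for p in dst])
    row_y = solve3(m, [p[1] for p in dst])
    return [row_x, row_y, [0.0, 0.0, 1.0]]


def apply_affine(a, p):
    """Map a point p = (x, y) with the affine matrix a."""
    return (a[0][0] * p[0] + a[0][1] * p[1] + a[0][2],
            a[1][0] * p[0] + a[1][1] * p[1] + a[1][2])


A = affine_from_three_points([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 6)])
```

The collinearity check corresponds to the filtering of junctions whose two branches are parallel, for which no unique affine transform exists.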
From the image pair , there can be affine homographies, denoted by , which map the th junction in to the th junction in . For a correct correspondence , the matrix maps the image to accurately around the junction location, while a mismatch maps the image correctly only at the endpoints and locations but erroneously at other positions. To save computational resources, we only map a patch around to in and map to in by using the matrix and its inverse . Then, the distance between two features and is measured by
(23) 
where the distance is the distance between two patches, calculated from raw patches, SIFT descriptors or other descriptors.
Thanks to the homographies induced by ASJ, the distance between original patches and mapped patches is usually very small for a correct correspondence and larger for an incorrect one, so we can use the ratio test proposed in [10] to filter out incorrect correspondences.
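The ratio test can be sketched on a precomputed dissimilarity matrix as follows. The function name is hypothetical, and the ratio of 0.8 is an illustrative value rather than one reported in this paper; the entries of `dist` would be the junction distances of Eq. (23).

```python
def ratio_test_matches(dist, ratio=0.8):
    """Select matches from a dissimilarity matrix with a nearest-neighbour ratio test.

    dist[i][j]: dissimilarity between junction i in the first image and
    junction j in the second image. A pair (i, j) is kept when the best
    distance in row i is below ratio times the second-best distance.
    """
    matches = []
    for i, row in enumerate(dist):
        if len(row) < 2:
            continue  # the test needs at least two candidates
        order = sorted(range(len(row)), key=row.__getitem__)
        best, second = order[0], order[1]
        if row[best] < ratio * row[second]:
            matches.append((i, best))
    return matches


toy_dist = [[0.10, 0.90, 0.80],
            [0.50, 0.55, 0.90],
            [0.70, 0.20, 0.60]]
toy_matches = ratio_test_matches(toy_dist)
```

In the toy matrix the second row is rejected because its two best candidates are nearly equally close, which is the ambiguous situation the ratio test is designed to remove.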
6 Experimental Analysis
This section presents the results and analysis of the ASJ detection and matching routines, with comparison to existing approaches for junction detection, junction matching, keypoint matching and line segment matching. In our experiments, we first detect anisotropic-scale junctions with the procedures presented in Section 4, and then establish junction correspondences with the affine homographies induced by these semi-local geometrical structures.
6.1 Stability and Control of the Number of False Detection
The a-contrario approaches detect meaningful events controlled by the threshold : it bounds the average number of false detections in an image following the null hypothesis. In this subsection, we check the average number of false detections in Gaussian noise images and illustrate the results of detected ASJs with fixed threshold .
Experimentally, we generate random images with pixels that independently follow the standard Gaussian distribution. For each pixel, we draw an orientation from the uniform distribution in the interval and estimate the scale at this pixel with that orientation. Ideally, no meaningful line segment structure appears in random images, but some might be detected by mistake; these are counted and averaged as the number of false detections. If the number of false detections is controlled by the NFA of the proposed approach, the approach is validated as a correct a-contrario approach.


NFA threshold   0.01    0.1     1       10      100      200
Avg. False      0.002   0.006   0.198   5.923   66.472   132.676
Tab. 1: Average number of false detections in images generated by Gaussian white noise.
The average numbers of false detections in Gaussian noise images are reported in Tab. 1. The value of the NFA threshold is varied in our experiments from to and the corresponding average numbers of false detections are upper bounded by the threshold.
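The bound can be checked directly against the values of Tab. 1 with a few lines of Python. The snippet only copies the numbers reported in the table; the variable names are hypothetical.

```python
# Values copied from Tab. 1: NFA threshold -> average number of false detections.
table1 = {0.01: 0.002, 0.1: 0.006, 1: 0.198, 10: 5.923, 100: 66.472, 200: 132.676}

# The a-contrario guarantee requires every average to stay below its threshold.
bounded = all(avg <= eps for eps, avg in table1.items())

# Ratio of observed false detections to the allowed number, per threshold.
ratios = {eps: avg / eps for eps, avg in table1.items()}
```

Every ratio stays below 1, which is consistent with the statement that the number of false detections is controlled by the threshold.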
6.2 Comparison with ACJ
Since ASJ extends the a-contrario model of ACJ with scale estimation, it is necessary to compare the repeatability of the two detectors and discuss their difference. Following the baseline experiments proposed in [28], the images are first zoomed with different factors to form image sequences with scale change. Then, ASJ and ACJ are run on these image sequences to detect the junctions. The repeatability of ACJ is discussed in [28]; however, the definition of corresponding junctions used there only considers the locations and branches of junctions while ignoring scale coherence. Therefore, we define corresponding ACJ and ASJ junctions with scale information here. For the original image and the scaled image , corresponding ACJ junctions should have close locations, branch orientations as well as scales. Meanwhile, two junctions with different numbers of branches cannot be identified as a correspondence. More precisely, we say that two ACJ junctions and detected in and correspond if they satisfy
(24) 
(25) 
(26) 
where the angular distance . Similarly, two junctions and detected by ASJ correspond if they satisfy inequalities (24) and (26) as well as
(27) 
In this experiment, the set of scale factors is and the results are shown in Fig. 5. The repeatability curves show that the proposed ASJ performs better than ACJ. The repeatability rate reported in [28] is higher; however, it only reflects the accuracy of the locations and branch orientations, whereas our experiment also takes the scale difference into account.
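The correspondence test described above combines four conditions: the same number of branches, close locations, close branch orientations and consistent scales. The sketch below illustrates one way to code such a test. The exact form and thresholds of inequalities (24)-(26) are not reproduced in this text, so the tolerances `loc_tol`, `ang_tol` and `scale_tol`, the `Junction` class and the assumption that coordinates in the scaled image equal `zoom` times the original coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Junction:
    x: float
    y: float
    branch_angles: Sequence[float]  # branch orientations in radians
    scale: float                    # isotropic scale (radius) of the junction


def angular_distance(a: float, b: float) -> float:
    """Circular distance between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def corresponds(j1: Junction, j2: Junction, zoom: float,
                loc_tol: float, ang_tol: float, scale_tol: float) -> bool:
    """Check whether j1 (original image) and j2 (zoomed image) correspond.

    All tolerances are hypothetical parameters, not values from the paper.
    """
    # Junctions with different numbers of branches never correspond.
    if len(j1.branch_angles) != len(j2.branch_angles):
        return False
    # Location: map j1 into the zoomed image and compare positions.
    if math.hypot(zoom * j1.x - j2.x, zoom * j1.y - j2.y) > loc_tol:
        return False
    # Orientation: every branch of j1 needs a nearby branch in j2.
    for a in j1.branch_angles:
        if min(angular_distance(a, b) for b in j2.branch_angles) > ang_tol:
            return False
    # Scale: the scale of j1 is expected to change by the zoom factor.
    return abs(zoom * j1.scale - j2.scale) <= scale_tol
```

Dropping the final scale check gives a location-and-orientation-only criterion, which is the kind of test under which a higher repeatability would be expected.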
As reported in [28], the scale of an ACJ represents the length of its shortest branch and is roughly linear in the scale factor. Theoretically, if a detected ACJ has scale in the original image, the scale of its correspondence in the scaled image should be close to . However, the ACJ algorithm requires an upper bound on the scale as input, which [28] recommends setting to a constant in the range of for the sake of computational speed. In practice, junctions in indoor images usually have long branches, which cannot be bounded by a relatively small constant.
To demonstrate this, we compare the detected junctions in Fig. 6. In this experiment, junctions are first detected by ACJ in the original image and in the image scaled with the factor . We then find the corresponding ACJs in the image pair using inequalities (24) and (26) while ignoring inequality (25). To compare the scales of the junctions with respect to the factor , all correspondences are drawn as colored circles. In Fig. 6, a correspondence of in image which has scale is shown as a yellow circle with radius . The red circle and green line segments represent the junction . Several correspondences clearly do not have consistent scales. If a junction is formed by line segments whose lengths are more than times the maximal radius threshold of ACJ, its scale equals the threshold in the original image. When the image is zoomed with factor , the scale does not decrease, since the branches are still longer than the threshold. This is why the repeatability is lower when inequality (25) is included in its computation.
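The saturation effect described above can be reproduced with a toy model in which the reported scale is the shortest branch length capped at a maximal radius. This is only an illustration of the argument, not the ACJ implementation; the function names and the numeric values (`r_max = 30`, the branch lengths and the zoom factor) are assumptions chosen for the example.

```python
def acj_scale(branch_lengths, r_max):
    """Toy model: scale is the shortest branch length, capped at r_max."""
    return min(min(branch_lengths), r_max)


def scale_ratio_after_zoom(branch_lengths, zoom, r_max):
    """Ratio between the scale in the zoomed image and in the original image."""
    before = acj_scale(branch_lengths, r_max)
    after = acj_scale([zoom * length for length in branch_lengths], r_max)
    return after / before


# Hypothetical numbers: a short junction follows the zoom factor,
# a long indoor junction stays saturated at r_max.
short_ratio = scale_ratio_after_zoom([20.0, 25.0], zoom=0.5, r_max=30.0)
long_ratio = scale_ratio_after_zoom([200.0, 150.0], zoom=0.5, r_max=30.0)
print(short_ratio, long_ratio)
```

For the short junction the ratio equals the zoom factor, while for the long junction it stays at 1, so a scale-consistency test rejects the second correspondence even though location and orientation agree.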
Finally, some example results of the ASJ detector on indoor images are shown in Fig. 7. The anisotropic-scale junctions are shown in the middle column and the results of ACJ in the right column. The results show that ASJ captures more geometric structure than ACJ: the anisotropic-scale branches of a junction can depict the layout of indoor scenes, whereas the results of ACJ represent only very local information. For example, for the rectangles in Fig. 7, ASJ produces the boundaries of the rectangles, while ACJ only detects the corner points and the branch orientations around them.
6.3 ASJ Matching
To evaluate our approach, we collected more than 100 images on which to test the proposed ASJ. Some of the collected images come from indoor 3D reconstruction datasets [30, 29], while the others were taken by ourselves. As shown in Fig. 8, the collected images are less textured than natural images. Some of them contain large viewpoint changes and indistinct regions of repeated texture, such as Fig. 8(b), Fig. 8(i) and Fig. 8(l).
We define two junctions as matched only if both their centers and the orientations of their branches correspond. In this sense, our matching result goes somewhat beyond local features and can be compared with existing approaches in different settings:

It is comparable to keypoint matching methods, if we regard junctions as specific corner points with two orientations;

It is also comparable to line segment matching methods, if we regard junctions as several intersecting line segments.
For keypoint matching, we compare the matched junctions with SIFT [10], Affine-SIFT [43, 13], Hessian-Affine [44], and EBR and IBR [12].
Meanwhile, for matched line segments we compare the matching accuracy with the existing approaches LPI [14] and LJL [15], where accuracy is the proportion of matches whose endpoints correspond. This rule is stricter for assessing line segment matching results. Interestingly, although LPI [14] and LJL [15] use outlier-free epipolar geometry to assist their line segment matchers, our proposed method achieves better accuracy without epipolar geometry.
The implementations of Affine-SIFT [43, 13], Hessian-Affine [44], LPI [14] and LJL [15] were obtained from the authors' homepages. EBR and IBR [12] were obtained from VGG's website (http://www.robots.ox.ac.uk/~vgg/research/affine/descriptors.html#binaries). The SIFT detector is the version provided by VLFeat (http://www.vlfeat.org). The descriptor used in our experiments is SIFT, and mismatches are filtered by the ratio test with threshold for ASJ, SIFT [10], Hessian-Affine [44], EBR [12] and IBR [12] by comparing the distance, which is the default threshold for computing matches from descriptors in VLFeat. Notably, the implementation of Affine-SIFT [43, 13] provided by its authors uses threshold since it calculates the distance with norm, and we keep it unchanged. Since the released code for Affine-SIFT [43, 13] filters outliers from its matching result, we remove this procedure for fairness, which makes the results in our experiments differ from those of the released executable. All parameters of the compared approaches are set to the default values provided by their authors.
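For readers unfamiliar with the ratio test mentioned above, the following NumPy sketch shows the general idea: a descriptor is matched to its nearest neighbour only if that neighbour is sufficiently closer than the second nearest one. It is not the VLFeat or Affine-SIFT code; the function name, the use of squared Euclidean distances and the `ratio` parameter are assumptions made for illustration, and the threshold values used in the experiments are not restated here.

```python
import numpy as np


def ratio_test_matches(desc_a, desc_b, ratio):
    """Match rows of desc_a to rows of desc_b with a nearest-neighbour ratio test.

    A pair (i, j) is kept when `ratio` times the squared distance to the
    nearest neighbour j is still smaller than the squared distance to the
    second nearest neighbour. desc_b must contain at least two descriptors.
    """
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise squared Euclidean distances, shape (len(desc_a), len(desc_b)).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    d2 = np.sum(diff * diff, axis=2)
    # Indices of the nearest and second nearest neighbour for each row.
    order = np.argsort(d2, axis=1)
    rows = np.arange(len(desc_a))
    best, second = order[:, 0], order[:, 1]
    keep = ratio * d2[rows, best] < d2[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]
```

A larger `ratio` rejects more ambiguous matches, which matters in regions of repeated texture where the two nearest descriptors are almost equally close.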
Methods \ Image pairs  (a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)  (i)  (j)  (k)  (l)



#correct  12  26  12  16  50  14  197  65  119  37  17  33  85.17%  
#total  12  29  13  20  60  15  214  69  121  45  19  40  
accuracy (%)  100.00  89.66  92.31  80.00  83.33  93.33  92.06  94.20  98.35  82.22  89.47  82.50  

#correct  128  476  559  115  435  74  708  199  147  200  103  65  62.69%  
#total  206  700  839  222  652  287  770  261  330  299  191  161  
accuracy (%)  62.14  68.00  66.63  51.80  66.72  25.78  91.95  76.25  44.55  66.89  53.93  40.37  

#correct  135  183  433  364  430  119  4141  1271  136  172  196  163  82.85%  
#total  141  240  519  480  549  133  4205  1326  240  263  224  264  
accuracy (%)  95.74  76.25  83.43  75.83  78.32  89.47  98.48  95.85  56.67  65.40  87.50  61.74  

#correct  24  13  96  82  66  17  640  226  32  114  38  29  79.68%  
#total  26  34  132  105  109  18  671  248  63  144  41  49  
accuracy (%)  92.31  38.24  72.73  78.10  60.55  94.44  95.38  91.13  50.79  79.17  92.68  59.18  

#correct  0  0  10  0  0  0  64  20  28  14  0  0  32.56%  
#total  1  1  16  15  10  2  75  21  46  34  2  4  
accuracy (%)  0.00  0.00  62.50  0.00  0.00  0.00  85.33  95.24  60.87  41.18  0.00  0.00  

#correct  0  0  28  11  14  0  46  0  0  10  0  0  31.84%  
#total  4  9  39  16  25  9  63  12  10  21  8  5  
accuracy (%)  0.00  0.00  71.79  68.75  56.00  0.00  73.02  0.00  0.00  47.62  0.00  0.00 
6.3.1 Matching results for keypoints matching
As shown in Tab. 2, our proposed feature ASJ is compared with the most widely used feature detectors. For keypoint matching, we regard an ASJ as a keypoint with two specific orientations. The matching accuracy of ASJ is better than that of the other keypoint matchers in most cases. For instance, in Fig. 8(i), the indistinct repeated regions of the chessboard are matched very well with the accuracy since ASJ makes corner points carry more global information than other approaches, namely their relative position with meaningful orientations in the image.
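The accuracy rows in Tab. 2 are the ratio of correct matches to total matches for each image pair. As a small worked check, the sketch below recomputes them from the counts in the first row group of the table. The function name is illustrative, and scoring an empty match set as 0 is an assumption that follows the convention visible in the tables.

```python
def accuracy_percent(num_correct, num_total):
    """Matching accuracy in percent; an empty match set is scored as 0."""
    if num_total == 0:
        return 0.0
    return 100.0 * num_correct / num_total


# Counts transcribed from the first row group of Tab. 2, image pairs (a)-(l).
correct = [12, 26, 12, 16, 50, 14, 197, 65, 119, 37, 17, 33]
total = [12, 29, 13, 20, 60, 15, 214, 69, 121, 45, 19, 40]

per_pair = [round(accuracy_percent(c, t), 2) for c, t in zip(correct, total)]
print(per_pair)
```

The recomputed values agree with the accuracy row printed under those counts, for example 26 / 29 gives 89.66% for pair (b).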
Compared with the most related approaches EBR and IBR [12], the proposed ASJ handles straight edges better, producing more keypoints and more correct correspondences. In many cases in Tab. 2, the results of EBR and IBR illustrate their limitations on indoor images, which are dominated by straight edges.
In terms of the absolute number of correct matches, ASJ yields significantly fewer than other approaches. The approaches that produce the most correct matches are Affine-SIFT and SIFT. Since the junctions detected in indoor images correspond to meaningful junctions in the scene, it is not surprising that their absolute number is smaller than that of SIFT keypoints. Nevertheless, ASJ represents the structure of a scene more compactly than keypoints. To illustrate this, we plot the correctly matched keypoints and ASJs on a clean background: the structure of the scene can be recognized from the ASJs and their branches, whereas the plotted keypoints are hard to interpret without the input images. As shown in Fig. 9, the matched ASJs represent the geometric information with a small number of junctions (12 for Fig. 9(a), 50 for Fig. 9(b) and 65 for Fig. 9(c)), while the matched ASIFT keypoints are hard to interpret even though there are many more of them. Some example matching results are shown in Fig. 10.
6.3.2 Matching results for linesegments matching
Methods \ Image pairs  (a)  (b)  (c)  (d)  (e)  (f)  (g)  (h)  (i)  (j)  (k)  (l)



#correct  15  30  14  26  85  21  349  121  232  49  27  48  71.55%  
#total  24  58  26  40  120  30  428  138  242  90  38  80  
accuracy (%)  62.50  51.72  53.85  65.00  70.83  70.00  81.54  87.68  95.87  54.44  71.05  60.00  

#correct  5  0  15  19  53  3  123  60  33  17  11  16  48.83%  
#total  9  0  18  29  90  9  193  102  59  38  15  40  
accuracy (%)  55.56  0.00  83.33  65.52  58.89  33.33  63.73  58.82  55.93  44.74  73.33  40.00  

#correct  8  24  26  37  148  4  221  113  129  50  26  22  52.95%  
#total  30  79  32  64  251  17  376  186  138  131  50  102  
accuracy (%)  26.67  30.38  81.25  57.81  58.96  23.53  58.78  60.75  93.48  38.17  52.00  21.57 
We compare the matched line segments with the state-of-the-art approaches LPI [14] and LJL [15] under a stricter rule that compares the endpoints of corresponding line segments instead of their line equations. For the example image pairs shown in Fig. 8, our proposed method outperforms the existing methods by a considerable margin in most cases. Some matching results for line segments are shown in Fig. 11 and Fig. 12. The number of correctly matched line segments is also comparable with that of the other approaches. Besides the matching accuracy, the result of our method shown in Fig. 12 covers the scene more completely.
Unlike LPI [14] and LJL [15], our approach performs better without using any pre-estimated geometric information. As shown in Tab. 3, keypoint-driven line segment matching can fail because of an erroneously estimated geometric relationship. In the failure case reported in Tab. 3, the image pair in Fig. 12 is dominated by repeated texture and a severe viewpoint change, both of which are challenging for keypoint matching. In such a scenario, the induced epipolar geometry may be unreliable and therefore produce poor line segment matching results. On the other hand, because our approach performs well in junction matching, the junction correspondences can also be used to refine the line segment matching result.
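Section 6.3 treats a junction as a set of intersecting line segments and scores a line segment match by its endpoints rather than by its line equation. The sketch below illustrates both ideas under stated assumptions: each branch is turned into one segment from the junction center to the branch endpoint, and two segments agree when both endpoints lie within a pixel tolerance `tol`. The function names, the tolerance and the way reference endpoints are obtained are hypothetical. The totals in the first row group of Tab. 3 are exactly twice those in the first row group of Tab. 2, which is consistent with each matched two-branch junction contributing two segments.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def junction_to_segments(center: Point, branch_endpoints: List[Point]) -> List[Segment]:
    """One segment per branch, from the junction center to the branch endpoint."""
    return [(center, end) for end in branch_endpoints]


def endpoints_agree(seg: Segment, ref: Segment, tol: float) -> bool:
    """True if both endpoints of seg lie within tol of those of ref, in either order."""
    def close(p: Point, q: Point) -> bool:
        return math.hypot(p[0] - q[0], p[1] - q[1]) <= tol

    (a, b), (c, d) = seg, ref
    return (close(a, c) and close(b, d)) or (close(a, d) and close(b, c))


# A two-branch junction yields two line segments.
segments = junction_to_segments((0.0, 0.0), [(10.0, 0.0), (0.0, 5.0)])
print(len(segments))
```

An endpoint-based test of this kind is stricter than comparing line equations, because two collinear segments of different lengths share a line equation but fail the endpoint check.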
7 Conclusion
In this paper, we proposed a novel junction detector, ASJ, for indoor images, which are dominated by junctions. ASJ exploits the anisotropy of junctions by estimating the endpoints (lengths) of the branches of isotropic-scale junctions, and thus characterizes junctions in a more global manner. We then devised an affine-invariant dissimilarity measure to match these anisotropic-scale junctions across different images. We tested our method on a collection of indoor images and compared its performance with several current state-of-the-art methods. The results demonstrated that our approach establishes new state-of-the-art performance on the indoor image dataset.
References
 [1] C. Wu, “Towards linear-time incremental structure from motion,” in International Conference on 3D Vision, 2013, pp. 127–134.
 [2] D. J. Crandall, A. Owens, N. Snavely, and D. P. Huttenlocher, “SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 12, pp. 2841–2853, 2013.
 [3] S. Fuhrmann, F. Langguth, and M. Goesele, “MVE - A multi-view reconstruction environment,” in Eurographics Workshop on Graphics and Cultural Heritage, Darmstadt, Germany, 2014, pp. 11–18.
 [4] P. Moulon, P. Monasse, R. Marlet, and Others, “OpenMVG. An open multiple view geometry library.” https://github.com/openMVG/openMVG.
 [5] B. Wang, X. Bai, X. Wang, W. Liu, and Z. Tu, “Object recognition using junctions,” in European Conference on Computer Vision, 2010, pp. 15–28.
 [6] A. Y. S. Chia, D. Rajan, M. K. Leung, and S. Rahardja, “Object recognition by discriminative combinations of line segments, ellipses, and appearance features,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1758–1772, 2012.

 [7] J. Yan, J. Wang, H. Zha, X. Yang, and S. M. Chu, “Multi-view point registration via alternating optimization,” in AAAI Conference on Artificial Intelligence, 2015, pp. 3834–3840.
 [8] Y. Shen, W. Lin, J. Yan, M. Xu, J. Wu, and J. Wang, “Person re-identification with correspondence structure learning,” in IEEE International Conference on Computer Vision, 2015, pp. 3200–3208.
 [9] K. Mikolajczyk and C. Schmid, “An affine invariant interest point detector,” in European Conference on Computer Vision, 2002, pp. 128–142.
 [10] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
 [11] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide baseline stereo from maximally stable extremal regions,” in British Machine Vision Conference, 2002, pp. 1–10.
 [12] T. Tuytelaars and L. J. V. Gool, “Matching widely separated views based on affine invariant regions,” International Journal of Computer Vision, vol. 59, no. 1, pp. 61–85, 2004.
 [13] G. Yu and J. Morel, “ASIFT: an algorithm for fully affine invariant comparison,” IPOL Journal, vol. 1, pp. 11–38, 2011.
 [14] B. Fan, F. Wu, and Z. Hu, “Robust line matching through line-point invariants,” Pattern Recognition, vol. 45, no. 2, pp. 794–805, 2012.
 [15] K. Li, J. Yao, X. Lu, L. Li, and Z. Zhang, “Hierarchical line matching based on line-junction-line structure descriptor and local homography estimation,” Neurocomputing, vol. 184, pp. 207–220, 2016.
 [16] R. G. von Gioi, J. Jakubowicz, J. Morel, and G. Randall, “LSD: A fast line segment detector with a false detection control,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 4, pp. 722–732, 2010.
 [17] ——, “LSD: a line segment detector,” IPOL Journal, vol. 2, pp. 35–55, 2012.
 [18] X. Shen and P. Palmer, “Uncertainty propagation and the matching of junctions as feature groupings,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1381–1395, 2000.
 [19] Z. Wang, F. Wu, and Z. Hu, “MSLD: A robust descriptor for line matching,” Pattern Recognition, vol. 42, no. 5, pp. 941–953, 2009.
 [20] D. Marr, “A computational investigation into the human representation and processing of visual information,” Vision, pp. 125–126, 1982.
 [21] E. H. Adelson, “Lightness perception and lightness illusions,” New Cogn. Neurosci, vol. 339, 2000.
 [22] C. Guo, S. C. Zhu, and Y. N. Wu, “Primal sketch: Integrating structure and texture,” Computer Vision and Image Understanding, vol. 106, no. 1, pp. 5–19, 2007.
 [23] T. Wu, G. Xia, and S. C. Zhu, “Compositional boosting for computing hierarchical image structures,” in CVPR, 18–23 June 2007.
 [24] M. Maire, P. Arbelaez, C. C. Fowlkes, and J. Malik, “Using contours to detect and localize junctions in natural images,” in CVPR, 24–26 June 2008.
 [25] E. D. Sinzinger, “A model-based approach to junction detection using radial energy,” Pattern Recognition, vol. 41, no. 2, pp. 494–505, 2008.
 [26] Z. Püspöki and M. Unser, “Template-free wavelet-based detection of local symmetries,” IEEE Transactions on Image Processing, vol. 24, no. 10, pp. 3009–3018, 2015.
 [27] Z. Püspöki, V. Uhlmann, C. Vonesch, and M. Unser, “Design of steerable wavelets to detect multifold junctions,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 643–657, 2016.
 [28] G. Xia, J. Delon, and Y. Gousseau, “Accurate junction detection and characterization in natural images,” International Journal of Computer Vision, vol. 106, no. 1, pp. 31–56, 2014.
 [29] F. Srajer, A. G. Schwing, M. Pollefeys, and T. Pajdla, “Match box: Indoor image matching via box-like scene estimation,” in International Conference on 3D Vision, 2014, pp. 705–712.
 [30] Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Reconstructing building interiors from images,” in IEEE International Conference on Computer Vision, September 27 - October 4, 2009, pp. 80–87.
 [31] W. Förstner, “A feature based correspondence algorithm for image matching,” International Archives of Photogrammetry and Remote Sensing, vol. 26, no. 3, pp. 150–166, 1986.
 [32] C. Harris and M. Stephens, “A combined corner and edge detector,” in Alvey Vision Conference, 1988, pp. 147–151.
 [33] K. Mikolajczyk and C. Schmid, “Scale & affine invariant interest point detectors,” International Journal of Computer Vision, vol. 60, no. 1, pp. 63–86, 2004.
 [34] W. Förstner, T. Dickscheid, and F. Schindler, “Detecting interpretable and accurate scale-invariant keypoints,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 2256–2263.
 [35] L. Alvarez and F. Morales, “Affine morphological multiscale analysis of corners and multiple junctions,” International Journal of Computer Vision, vol. 25, no. 2, pp. 95–107, 1997.
 [36] É. Vincent and R. Laganière, “Junction matching and fundamental matrix recovery in widely separated views,” in British Machine Vision Conference, 2004, pp. 1–10.
 [37] B. P. D. Ruff, “A pipelined architecture for the canny edge detector,” in Alvey Vision Conference, Cambridge, UK, 1987, pp. 1–4.
 [38] S. Ramalingam, M. Antunes, D. Snow, G. H. Lee, and S. Pillai, “Line-sweep: Cross-ratio for wide-baseline matching and 3D reconstruction,” in IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1238–1246.
 [39] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
 [40] A. Desolneux, L. Moisan, and J.-M. Morel, From Gestalt Theory to Image Analysis: A Probabilistic Approach, 1st ed. Springer Publishing Company, Incorporated, 2007.
 [41] A. Neubeck and L. J. V. Gool, “Efficient non-maximum suppression,” in IEEE International Conference on Pattern Recognition, 2006, pp. 850–855.
 [42] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615–1630, 2005.

 [43] J. Morel and G. Yu, “ASIFT: A new framework for fully affine invariant image comparison,” SIAM J. Imaging Sciences, vol. 2, no. 2, pp. 438–469, 2009.
 [44] M. Perdoch, O. Chum, and J. Matas, “Efficient representation of local geometry for large scale object retrieval,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 9–16.