Log In Sign Up

Feature Extraction in Augmented Reality

by   Jekishan K. Parmar, et al.

Augmented Reality (AR) is used for various applications associated with the real world. In this paper, first, describe characteristics and essential services of AR. Brief history on Virtual Reality (VR) and AR is also mentioned in the introductory section. Then, AR Technologies along with its workflow is depicted, which includes the complete AR Process consisting of the stages of Image Acquisition, Feature Extraction, Feature Matching, Geometric Verification, and Associated Information Retrieval. Feature extraction is the essence of AR hence its details are furnished in the paper.


Projection Mapping Technologies for AR

This invited talk will present recent projection mapping technologies fo...

A Review of Augmented Reality Applications for Building Evacuation

Evacuation is one of the main disaster management solutions to reduce th...

A Novel Edge Detection Operator for Identifying Buildings in Augmented Reality Applications

Augmented Reality is an environment-enhancing technology, widely applied...

Design, Assembly, Calibration, and Measurement of an Augmented Reality Haploscope

A haploscope is an optical system which produces a carefully controlled ...

Exploring Extended Reality with ILLIXR: A New Playground for Architecture Research

As the need for specialization increases and architectures become increa...

cMinMax: A Fast Algorithm to Find the Corners of an N-dimensional Convex Polytope

During the last years, the emerging field of Augmented Virtual Reali...

An examination of skill requirements for Augmented Reality and Virtual Reality job advertisements

The field of Augmented Reality (AR) and Virtual Reality (VR) has seen ma...

I Introduction

The term Augmented Reality holds multiple definitions but one of the most popular and the one that still holds is Augmented reality is a field in which 3D virtual objects are integrated into a 3-D real environment in real time [1]. Though, AR can be looked from the point of view of enhancement to reality, it can be defined as AR is use of computers to enhance the richness of the real world [2]. It can also be simply called user interaction in 3D environment [3]. When it is perceived as an interaction between humans and virtual objects, it can also define it as combining the real and the virtual in order to assist the user in performing a task in a physical setting is called Augmented Reality [4]. According to [5]

, Augmented Reality supports context-aware computing. For an AR user, the real world and virtual objects coexist on the same view. For example, in a museum there is Mona Lisa, when this picture of Mona Lisa is looked through an augmented reality smartphone, the information of Mona Lisa will be imposed on top. In addition, it may include information like the artist, da Vinci, as well as the year that it was estimated that this picture was drawn. Figure 1, shows one such example of augmented reality camera with the Taj Mahal. It can be seen in the figure

1 that as the camera is acquiring the Taj in its lenses and displaying to its user, information related to the Taj is also being displayed to the user.

Next section compares AR and VR with deep insights into current advancements and research challenges, section III depicts various technologies involved in AR, then, details about Feature Extraction is furnished in section IV and finally, section V concludes the paper.

Ii Augmented Reality And Virtual Reality

Jaron Lanier coined the phrase “Virtual Reality”, in 1989 [6] and Thomas Caudell defined “Augmented Reality” in 1990 [7]. Hence, these concepts are quite old. Though, its complete implementation has become possible only recently and that too, with full reliability.

Ii-a Reasons for Recent advancements in Augmented Reality

  • High resolution cameras, which enables accurate image and object identification, are only available since recent times. They are also available in smart devices.

  • Recent availability of very high performing Central Processing Units and Graphical Process Units due to this fast and reliable image processing along with feature extraction can be done, which is necessary for Augmented Reality.

  • Cheap availability of large amount of memory and faster input/output access which is useful to store object information and quickly access it, which is highly essential for AR.

  • Sharp virtual text, and sharp images, superimposed on smart devices in a very elegant, and yet, easy-on-the-eye fashion has become possible only due to availability of High Definition displays on these devices.

  • Information needs to be retrieved speedily and brought to the device from AR servers and databases so that it can be displayed on this device for the user to be able to see quickly, which is recently possible due to advent of high-speed broadband, wireless and wired networking technologies.

Due to all these advancements, AR is going to become better and quicker. In spite of all these technological advancements there are some challenges in AR that needs to be worked on and requires proper study as well as research in order to get more from AR.

Fig. 1: Augmented Reality camera example222

Ii-B Research Challenges in Augmented Reality

Despite the growing interests and development in AR, following are the challenges that exists in the field which has very wide scope of research.

  • A precise, fast, and robust registration between synthetic augmentations and the real environment is one of the most important challenge [8].

  • AR system needs to deal with vast amount of information that exists in the real world which requires very quick and portable hardware [9].

  • Highly efficient energy consumption from AR devices is desired due to their limited battery life [9].

  • High network dependency is also a challenge as there may not be network coverage in some areas [10].

  • AR assumes everything to be static while real world is very dynamic, as there are many things that keep on changing like people passing by, weather conditions, color of buildings as they may be repainted over a gap of few years. Thus, inability of AR to lack dynamism is also a challenge [10].

Ii-C Comparison of Augmented Reality and Virtual Reality

A virtual reality user will be fully immersed in animated environment. This is like the game playing space that is commonly used. A user or a game player will commonly use an avatar to exist and interact inside the virtual space. The view of the user in virtual reality is different from the real environment and fantasies and illusions surrounding the avatar are easy to create in the virtual world. Whereas, augmented reality is a mixture of real life and virtual reality [11]. This is how, it differs from virtual reality. So, augmented reality combines information and does various things, but its basis is real life, what the user actually sees [12].

Fig. 2: Augmented Reality (AR) versus Virtual Reality (VR)

An augmented reality user is able to obtain useful information about location or objects and can interact with virtual contents in the real world. Therefore the virtual contents are superimposed onto the real world image that is seen by the user. The augmented reality user can distinguish the superimposed virtual objects and hence able to turn on or turn off selected AR functions, which may be related to certain objects. In other words, if a user is only interested in objects, and not in locations. In that case, user will go and adjust augmented reality functions such that location information does not pop up, on their user screen and only gets object information. Things like that are controllable in futuristic systems. In comparison to virtual reality, augmented reality users commonly feel less separated from the real world, because basically the foundation of their view is exactly the real world. Things are just superimposed on the real world view of that user. So fantasies and illusions can be created and superimposed on the real world view.

Some of the definitions of AR has already been described, but its definition in context of virtual reality can be formed and compared with the definition of virtual reality itself. The comparison is shown in figure 2.

It can be seen in figure 2, that on one side there is the real environment and on the other side is the virtual environment, then these two combine into a mixed reality, where the augmented reality is virtually based on reality that is based on virtuality and augmented virtuality is based on virtuality which in turn is based on reality. Hence, augmented reality is based on reality where virtual information and images are overlapped and superimposed in the right position.

Iii Augmented Reality: Technology

Figure 3 shows technological components present in augmented reality. On the left, there is content provider server which contains contents viz. 3D geographical assets, geographical information, text, images, movies and point of interest information. On the other side there are three major units to make augmented reality possible and this includes detection and tracking engines, a rendering system, and also interaction devices.

Detection and Tracking uses computer vision, tracking sensors, camera API and markers and features in order to carry out its task of Recognition. Necessary visualization is provided to the user by using computer graphics API, video composition and depth composition in order to render images.

Fig. 3: Technological Components in Augmented Reality

Then, interaction is provided to the user by picking up gestures, voice, haptics and providing a UI. Thus, these three are the combination of software framework and application programming, including browser access, as shown in figure 3.

Iii-a Workflow of AR

Fig 4 shows, the user sends an input, and it is detected by the interaction unit, then the interaction unit sends information to the rendering device. Now, the rendering device will take combined input from the detection and tracking unit. In addition, the contents needs to be streamed together and this, combined together by the rendering unit which will send back visual feedback to the user.

Fig. 4: AR Workflow
Fig. 5: Augmented Reality Process Example

Iii-B AR Process

AR Process is divided into five steps, which can be described by an example.

  • Image Acquisition: In this step, the image is retrieved from the augmented reality camera. In our example i.e. the picture which is shown in figure 5, is at the image acquisition stage and it is the Leaning Tower of Pisa in the middle of the device display.

  • Feature Extraction: It is based on an initial set of measured data. The extraction process generates informative non redundant values to cilitate the subsequent feature learning and generalization steps. As depicted in figure 5, part 2), there are redundant red circles on top of the image, which denotes extraction of features. Once the feature is extracted, these feature extracted points will be used to identify and learn what this object is.

  • Feature Matching: This is a process of computing abstractions of image information and to make a local decision if there is an image feature or not and this is conducted for all image points. In order to identify extracted points, other features are needed to be matched, which is performed in this step and the matched objects are sent for verification in next step.

  • Geometric Verification: This is an identification process of finding geometrically related images in the image data set. The image data set is a subset of the overall augmented reality image database. Matched objects in previous step are verified in this step by comparing it with objects in the database and are further sent for retrieval in next step.

  • Associated Information Retrieval: This process is to search and retrieve metadata, text, and content-based indexing information of the identified image or object. Associated information is used for display on the augmented reality screen near the corresponding image or object. As shown in the figure 5, in the middle lower part, there is Leaning Tower of Pisa and its information is superimposed on top of the image that is used on the Smartphone.

Fig. 6: Augmented Reality Process Example

Iv Feature Extraction Process In Augmented Reality

Feature Extraction is the second step in AR Process, though it composes of six stages as shown in figure 6. This section will discuss it in detail.

  • Grayscale Image Generation (GIG): Here, the original image captured by the augmented reality device is changed into a grayscale value image in order to make it robust to colour modifications.

  • Integral Image Generation (IIG): This is a process of building an integral image from the grayscale image. This procedure enables fast calculation of summations over image sub-regions.

  • Response Map Generation (RMG): In order to detect Interest Points (IPs)using the determinant of the Hessian Matrix of image, the RMG Process constructs the scale-space of the image. Only after having IPs, various operations can be done which leads close to actual feature extraction.

  • Interest Point Detection (IPD): Based on the generated scaled response maps, the maxima and minima are detected and used as the Interest Points.

  • Orientation Assignment (OA): Here, each detected Interest Point, is assigned a reproducible orientation to provide rotation invariance. Rotation invariance means invariance to image rotation.

  • Descriptor Extraction (DE): This is a process of uniquely identifying an interest point such that it is distinguished from other Interest Points.

Iv-a Feature Extraction with an Example

The feature extraction process is about finding the Interest Points from the image or the video, detecting the descriptors, from the interest point and then compare the descriptors with the data in the database. As shown in figure 7. Here, first is the original image, then gray scale image, which is processed into the interest points, and as shown, there are certain locations, where some specific characteristics are identified and marked. Then in the next processing stage and based on some color coordination, the descriptors are found. With these descriptors database can be queried and find matching information of what this object is. As shown here in figure 7, in order to have accurate descriptors, keen and sharp image is needed. Therefore, now with new enhanced cameras on smart devices, more accurate descriptors are found and therefore, the augmented reality information is more reliable, and there are less errors, and also it is processed much more quickly.

Iv-B Blob Detection

Feature extraction is a combination where qualification for descriptors is needed in a very accurate way. Therefore, invariability from noise, scale, and rotation needs to be kept. In addition, there are various kinds of descriptors, e.g. corner descriptors, blob descriptors and region descriptors. In this section, describes blob detection with the help of an example.

Fig. 7: Feature Extraction Example
Fig. 8: Blob Detection Example

A blob is a region of an image that has constant or approximately constant image properties. All the points in a blob are considered to be similar to each other. These image properties, which include brightness and color, are used in the comparison process to surrounding regions [13].

Iv-C Algorithm for Blob Detection

Following are the steps are needed to be follow for Blob Detection [14]. Figure 9, shows an image with its blobs detected.

  • Start from the first line of the image and find groups of one or more white (or black) pixels, known as lineblobs.

  • Find X, Y co-ordinates of each those blob and number each of these groups.

  • Repeat this sequence on next line.

  • While collecting the lineblobs, check whether all lineblobs are collected and whether they overlap or not.

  • If lineblobs are overlapping then merge these lineblobs by using there X and Y co-ordinates to one blob and treat it as a whole blob. Repeat this for every line and you have a collection of blobs.

Iv-D Other Feature Extraction Techniques

Following are some feature extraction techniques:

  • Haar [15]

  • Scalable Invariant Feature Transform (SIFT) [16]

  • Histogram of Oriented Gradient (HOG) [17]

  • Speeded Up Robust Features (SURF) [18]

  • Oriented FAST and rotated BRIEF (ORB) [19]

SIFT and SURF will be described in detail in this section. SIFT is the most widely used feature extraction algorithm. It extracts features from images accurately and efficiently. It overcomes the various adverse effects of extraction, such as transformation, noise, and lightness. Following is the four step SIFT algorithm.

  • Step 1. Scale space extreme detection

  • Step 2. Key point localization and filtering,

  • Step 3. Orientation assignment,

  • Step 4. Descriptor construction

SURF is an improvement over SIFT from the aspect of speed. Its algorithm is based on the same algorithmic principle as SIFT, but uses procedures that require less computation to enhance the processing speed. Therefore, SURF made it possible to carry out feature extraction in a near real-time or a real-time manner. Following is the three step algorithm of SURF:

  • Step 1. Interest Point Detection, which is about high-speed detection of interest points.

  • Step 2. Local Neighbourhood Description, which is about descriptors using response of the Haar-wavelet.

  • Step 3. Matching Process, where faster matching algorithm is applied by using a Laplacian operator.

V Conclusion

Looking at the recent advancements in AR Technologies viz. accurate cameras and smart devices, it is bound that more and more AR based devices and applications are going to capture its market share. As a pillar to that, the core component of AR i.e. Feature Extraction is going to be technologically challenging. For the same, SIFT and SURF, presented in this paper, are ready to be used as feature extraction algorithms. Apart from that, some more insight can be given into other feature extraction techniques that are described in the previous section. Though, emphasis on accuracy of the feature extraction and matching process is still a field of research which can be worked on by using new machine learning techniques and algorithms.


  • [1] Ronald Azuma, ”A survey of augmented reality.” MIT Press, Presence 6.4 (1997): 355-385.
  • [2] B.B. Bederson, “Audio Augmented Reality: A Prototype Automated Tour Guide.” In Conference Companion on Human Factors in Computing Systems, 210–11. CHI ?95. New York, NY, USA: ACM, 1995.
  • [3] Carey, Rick, Tony Fields, Andries van Dam, and Dan Venolia. “Why Is 3-D Interaction So Hard and What Can We Really Do About It?” In Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques, 492–93. SIGGRAPH ?94. New York, NY, USA: ACM, 1994.
  • [4] Dünser, Andreas, and Eva Hornecker. “Lessons from an AR Book Study.” In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, 179–82. TEI ?07. New York, NY, USA: ACM, 2007.
  • [5] J. Grubert, T. Langlotz, S. Zollmann and H. Regenbrecht, ”Towards Pervasive Augmented Reality: Context-Awareness in Augmented Reality.” IEEE Trans Vis Comput Graph. March, 2016.
  • [6] C. Conn, J. Lanier, M. Minsky, S. Fisher, and A. Druin. 1989. Virtual environments and interactivity: windows to the future. In ACM SIGGRAPH 89 Panel Proceedings (SIGGRAPH ’89). ACM, New York, NY, USA, 7-18. DOI=
  • [7] Digital Trends. ¡¿
  • [8] M. Mallem, ”Augmented Reality: Issues, Trends and Challenges” In Proceedings of 2nd IEEE Internationcal Conference on Image Processing Theory Tools and Applications (IPTA), 2010.
  • [9] M. Mekni and A. Lemieux, ”Augmented Reality: Applications, Challenges and Future Trends”, In Proc. of the 13th International Conference on Applied Computer and Applied Computational Science (ACACOS), Kuala Lumpur, Malaysia, April 23-25, 2014.
  • [10] C. Arth and D. Schmalstieg, ”Challenges of Large-Scale Augmented Reality on Smartphones”, Graz University of Technology, Graz, 2011.
  • [11] P. Milgram and F. Kishino, ”A Taxonomy of Mixed Reality Visual Displays” IEICE Trans. on Information Systems, Vol. E77-D, No. 12, December 1994.
  • [12] J.M. Harley, E.G. Poitras and A. Jarrell,.”Comparing virtual and location-based augmented reality mobile learning: emotions and learning outcomes” Education Tech Research Dev (2016) 64: 359. doi:10.1007/s11423-015-9420-7
  • [13] Chao Du, Yen-Lin Chen, Mao Ye and Liu Ren, ”Edge Snapping-Based Depth Enhancement for Dynamic Occlusion Handling in Augmented Reality”, In Proc. of 15th IEEE International Symposium on Mixed and Augmented Reality (ISMAR) , Mexico, 2016.
  • [14] S.S.Vanjire, E. Khan, P. Mahamulkar, S. Kohli and P. Kedar, ”Implementation of Augmented Reality using Hand Gesture Recognition”, International Journal of Technical Research and Applications, Vol 3. Issue 2, pp. 253-256, April 2015.
  • [15] Papageorgiou, Oren and Poggio, ”A general framework for object detection”, In Proc. of 6th IEEE International Conference on Computer Vision, India, 1998.
  • [16] D. G. Lowe, ”Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image”, U.S. Patent 6 711 293, March 23, 2004.
  • [17]

    N. Dalal and B. Triggs, ”Histograms of oriented gradients for human detection”, In Proc. of 2nd IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, 2005.

  • [18] H. Bay, A. Ess, T. Tuytelaars and L. V. Gool, ”Speeded-Up Robust Features (SURF)”, Computer Vision and Understanding, Vol 110, Issue 3, pp. 346-359, June 2008.
  • [19] E. Rublee, V. Rabaud, K. Konolige and G. Bradski, ”ORB: An efficient alternative to SIFT or SURF”, In Proc. of 13th IEEE Int. Conf. on Computer Vision, Spain 2011.