Soft-PHOC Descriptor for End-to-End Word Spotting in Egocentric Scene Images

Word spotting in natural scene images has many applications in scene understanding and visual assistance. We propose Soft-PHOC, an intermediate representation of images based on character probability maps. Our representation extends the concept of the Pyramidal Histogram Of Characters (PHOC) by exploiting Fully Convolutional Networks to derive a pixel-wise mapping of the character distribution within candidate word regions. We show how to use our descriptors for word spotting in egocentric camera streams through an efficient text-line proposal algorithm, based on a Hough Transform over character attribute maps followed by candidate scoring with Dynamic Time Warping (DTW). We evaluate our results on the ICDAR 2015 Challenge 4 dataset of incidental scene text captured by an egocentric camera.
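The DTW scoring step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; it is a standard dynamic-time-warping distance between two sequences of per-position feature vectors (e.g. character-probability columns sampled along a text-line proposal versus a query word's descriptor), shown here only to make the scoring idea concrete:

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic-time-warping distance between two sequences of
    feature vectors (e.g. per-column character-probability vectors).
    A lower distance means the proposal better matches the query."""
    n, m = len(seq_a), len(seq_b)
    # cost[i, j] = minimal accumulated cost of aligning the first
    # i elements of seq_a with the first j elements of seq_b
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # local cost
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

In a word-spotting pipeline, each text-line proposal would be ranked by this distance against the query descriptor, with the lowest-distance proposals retrieved first.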


Scene Text Detection via Holistic, Multi-Channel Prediction

Recently, scene text detection has become an active research topic in co...

Visual attention models for scene text recognition

In this paper we propose an approach to lexicon-free recognition of text...

WordSup: Exploiting Word Annotations for Character based Text Detection

Imagery texts are usually organized as a hierarchy of several visual ele...

I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition

Leveraging the advances of natural language processing, most recent scen...

STEFANN: Scene Text Editor using Font Adaptive Neural Network

Textual information in a captured scene plays an important role in scene int...

PopEval: A Character-Level Approach to End-To-End Evaluation Compatible with Word-Level Benchmark Dataset

The most prevalent scope of interest for OCR applications used to be sca...

Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding

Retrieval of text information from natural scene images and video frames...

