1 Introduction
Extracting shapes from images is a key issue in vision and image processing. Object detection, especially line detection, is a fundamental operation used in a wide range of applications.
The Hough transform (HT) [1] and the Generalized Hough Transform (GHT) [2] are tools based on a voting scheme in which image elements vote for the parameters of a geometric object. Unfortunately, these methods have large memory and computation time requirements, because the parameter space grows exponentially with the dimension of the problem (the number of parameters). On the other hand, reducing the image or the parameter space by quantization significantly lowers accuracy.
Sketches, as methods to approximate frequencies, have been successfully used in big data and streaming, where massive data needs to be processed in a memory- and time-efficient manner [3, 4, 5]. Sketch algorithms are a class of streaming algorithms that represent a large dataset with a compact summary, typically much smaller than the full size of the input.
One of the problems solved using sketches is the ’frequent items’ problem: given a sequence of items, find all items whose frequency (’vote value’) exceeds a specified fraction of the total number of items. A wide variety of algorithms and heuristics have been proposed for this problem, based on sampling, hashing, and counting (see [6, 7] for surveys).
1.1 Our Contribution
[Figure 1. Panels: Image, Sketch HT, Classic HT.]
[Figure 2. Panels: Classic HT, CU, CMCU, COUNT, COUNTCU, COUNTMU.]
Hough transform algorithms detect objects by searching for local peaks in the object parameter space, in two major steps:
1. Each image element votes for the cells of the parameter space that agree with it.
2. Local maxima (peaks) in the accumulator are reported as the detected shapes.
The votes are stored in an accumulator array with the dimensions of the parameter space. We propose replacing the accumulator array with a much smaller sketch.
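For reference, the two voting steps for line detection can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `classic_hough_lines`, the one-degree angle quantization, and the use of a `Counter` in place of a dense array are all assumptions, and step 2 is simplified to "take the k highest cells" rather than a true local-maximum search.

```python
import math
from collections import Counter


def classic_hough_lines(points, k=1, n_theta=180):
    """Vote in a full (theta, rho) accumulator and return the k strongest cells."""
    accumulator = Counter()  # stands in for the dense accumulator array
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            # Normal form of a line: rho = x*cos(theta) + y*sin(theta)
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(t, rho)] += 1  # step 1: the point votes for this cell
    # Step 2 (simplified): the k highest cells instead of true local maxima.
    return accumulator.most_common(k)


# 60 collinear points on the horizontal line y = 5 (theta = 90 degrees, rho = 5).
points = [(x, 5) for x in range(60)]
peaks = classic_hough_lines(points, k=1)
```

The accumulator here grows with the number of distinct (theta, rho) cells that receive votes, which is the memory cost that the sketch is meant to reduce.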
The ’frequent items’ (or ’heavy hitters’) problem is not exactly what is needed for the ’peak detection’ in Hough transforms, for several reasons:

Due to geometric quantization and noise there are several points related to an object and we only want one.

We want to recognize the objects in the image and are not usually interested in their exact number of votes.

There is often a significant amount of noise in the image, which should be ignored.
In this paper we show how sketch algorithms for the ’frequent items’ problem can improve Hough transform algorithms, using much less memory and offering better robustness to noise. The main idea is that the votes are only approximated and the peak detection is carried out only around the ’frequent votes’. Figure 1 shows the difference between the full object parameter space accumulator in Classic HT and the top frequent votes of the Sketch HT.
We also propose a new sketch algorithm, Count Median Update (COUNTMU), which improves the estimation accuracy compared to known methods.
1.2 Previous Work
1.2.1 Hough Transform Algorithms
The main disadvantages of HT are long computation time and large data storage. Many implementations have been proposed to alleviate these issues [8].
There are probabilistic methods to speed up the HT such as Probabilistic HT [9], PHT, and Randomized HT [10, 11, 12], RHT. Although RHT and PHT are computationally fast, they are sensitive to noise and occlusions, since the noise pixels have extra impact on these randomized algorithms [13, 14]. These algorithms use randomness for choosing points in the image space, while we suggest using randomness in parameter space. Our algorithms can also be combined with the previous random methods.
1.2.2 Data Streaming Sketch Algorithms
Sketches are concise data summaries of a high-dimensional vector which can be used to estimate queries on it. The sketch is a linear projection of the input vector with random vectors defined by hash functions. Increasing the size of the hash functions' codomain increases the accuracy of the estimation, and increasing the number of hash functions decreases the probability of a bad estimate.
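The sketch is a two-dimensional array of counters, with one row per hash function, that supports update and estimate operations, which can be used for solving the ’frequent items’ problem. We outline two sketching approaches:

Count sketch [3] (COUNT): each hash function is paired with a sign hash function, and the estimate of an item is the median over the rows of its signed counters.

Count Min sketch [4] (CM): the sketch is similar to COUNT but without the sign hash function. In contrast to COUNT, this algorithm returns a biased estimator, overestimating the count.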
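The two approaches can be illustrated with a short Python sketch. It follows the standard textbook formulations of Count Min [4] and Count sketch [3], not any code from this paper; the helper `_hash`, the class names, and the use of salted BLAKE2 digests as hash functions are assumptions made for the example.

```python
import hashlib
from statistics import median


def _hash(item, row):
    """64-bit hash of `item` for one row (hypothetical helper)."""
    digest = hashlib.blake2b(repr(item).encode(), digest_size=8,
                             salt=row.to_bytes(2, "little")).digest()
    return int.from_bytes(digest, "little")


class CountMinSketch:
    def __init__(self, depth, width):
        self.depth, self.width = depth, width  # hash functions, codomain size
        self.table = [[0] * width for _ in range(depth)]

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][_hash(item, row) % self.width] += count

    def estimate(self, item):
        # Minimum over rows: never below the true count (biased upwards).
        return min(self.table[row][_hash(item, row) % self.width]
                   for row in range(self.depth))


class CountSketch(CountMinSketch):
    """Same table layout, plus a sign hash per row."""

    def _bucket_sign(self, item, row):
        h = _hash(item, row)
        return (h >> 1) % self.width, 1 - 2 * (h & 1)  # column, sign in {+1, -1}

    def update(self, item, count=1):
        for row in range(self.depth):
            col, sign = self._bucket_sign(item, row)
            self.table[row][col] += sign * count

    def estimate(self, item):
        values = []
        for row in range(self.depth):
            col, sign = self._bucket_sign(item, row)
            values.append(sign * self.table[row][col])
        return median(values)  # median of the signed counters
```

With the sign hash, colliding items push a counter up or down with equal probability, so the COUNT estimate can err in both directions, whereas the CM estimate can only be too large.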
Given the number of items inserted in the sketch, the number of times an element was inserted in the sketch, and the number of hash functions (which determines the memory), each sketch type guarantees an estimation accuracy with a given success probability.
Estimation accuracy is a bound on the distance between the sketch estimate and the real vote value of an element, and success probability is the probability that this bound is fulfilled.
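For orientation, the guarantees usually quoted for these two sketches in [3, 4] can be written as follows. The notation is an assumption introduced here, since the original symbols are not available: N is the total number of inserted items, f_x the true count of element x, \hat{f}_x its sketch estimate, f the vector of all counts, w the size of the hash codomain, and d the number of hash functions.

```latex
% Standard bounds from the literature [3, 4]; notation assumed, see lead-in.
\begin{align*}
\text{CM:}\quad
  & f_x \le \hat{f}_x \le f_x + \varepsilon N
  && \text{(upper bound with probability} \ge 1-\delta\text{)},
  && w = \lceil e/\varepsilon \rceil,\ d = \lceil \ln(1/\delta) \rceil \\
\text{COUNT:}\quad
  & \lvert \hat{f}_x - f_x \rvert \le \varepsilon \lVert f \rVert_2
  && \text{(with probability} \ge 1-\delta\text{)},
  && w = O(1/\varepsilon^{2}),\ d = O(\log(1/\delta))
\end{align*}
```

In both cases a larger w tightens the accuracy bound and a larger d raises the success probability, which matches the roles of the codomain size and the number of hash functions described above.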
Methods used to improve sketches include:

Conservative Update (CU): conservative updates [15, 16] were extended to sketches [6, 17] to avoid unnecessary updates and reduce overestimation. A counter is incremented only if it equals the current estimate of the item (footnote 1 gives the corresponding condition for COUNT). CU depends on the order of the incremented items, and although it does not guarantee an improvement, it often provides one.

Lossy counting (L): lossy counting [18] was extended to sketches [19, 17] by removing small votes. In this approach, the input is divided into parts; after each part is processed, cells holding small values are reduced. In our experiments on images, lossy counting did not improve the results, so we do not report its results.
Sketches solve the top-k frequent items problem either by maintaining a top-k list which is updated as items are inserted [20], or by comparing the estimates for all the elements in parameter space.
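A conservative update and a running top-k list can be combined as in the Python sketch below. It is an illustration under stated assumptions, not the implementation used in the experiments: the class name `ConservativeCountMin`, the dictionary-based candidate list, and the eviction rule (drop the candidate with the smallest recorded estimate) are choices made for the example.

```python
import hashlib


def bucket(item, row, width):
    """Column of `item` in one row (hypothetical helper using salted BLAKE2)."""
    digest = hashlib.blake2b(repr(item).encode(), digest_size=8,
                             salt=row.to_bytes(2, "little")).digest()
    return int.from_bytes(digest, "little") % width


class ConservativeCountMin:
    def __init__(self, depth, width, k):
        self.depth, self.width, self.k = depth, width, k
        self.table = [[0] * width for _ in range(depth)]
        self.top = {}  # candidate heavy items -> estimate at their last update

    def estimate(self, item):
        return min(self.table[r][bucket(item, r, self.width)]
                   for r in range(self.depth))

    def update(self, item):
        cols = [bucket(item, r, self.width) for r in range(self.depth)]
        current = min(self.table[r][c] for r, c in enumerate(cols))
        for r, c in enumerate(cols):
            # Conservative update: only counters equal to the estimate grow.
            if self.table[r][c] == current:
                self.table[r][c] += 1
        self.top[item] = current + 1
        if len(self.top) > self.k:
            # Evict the candidate with the smallest recorded estimate.
            del self.top[min(self.top, key=self.top.get)]

    def top_k(self):
        return sorted(self.top.items(), key=lambda kv: -kv[1])
```

The candidate list keeps memory bounded by k entries on top of the sketch itself; the alternative mentioned in the text, querying every element of the parameter space at the end, needs no extra list but costs one estimate per cell.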
2 New Count Sketch
While CM with CU shows a significant improvement over CM in many cases ([19] shows that CMCU can reduce the overestimation error by a factor of at least 1.5), it appears that COUNTCU [17] gives little improvement. The reason is that CU reduces only overestimation errors, whereas COUNT also suffers from underestimation errors.
We propose a new variant of conservative update, Count Median Update (COUNTMU), that reduces both overestimation and underestimation errors.
The motivation for this method is similar to that of CMCU: an update should only affect the cells equal to the current median estimate. (Footnote 2: updating the cells in a range around the median, defined separately for an even and for an odd number of hash functions, instead of only the cells equal to the median, slightly improves results.) The other counters are ’wrong’ since they were notably influenced by other elements, so incrementing them would increase noise and inaccuracy.
Our new sketch algorithm is significantly better than COUNTCM and usually better than CMCU (see Section 4) and could possibly be of use in other streaming data/NLP queries.
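The idea can be sketched in Python as follows. This is one possible reading of the description above and not the authors' exact rule: the update range used here (between the lower and upper middle values of the signed counters, which collapses to "equal to the median" for an odd number of hash functions) is an assumption, as are the class name `CountMedianUpdate` and the helper `bucket_and_sign`.

```python
import hashlib


def bucket_and_sign(item, row, width):
    """Column and sign of `item` in one row (hypothetical helper)."""
    digest = hashlib.blake2b(repr(item).encode(), digest_size=8,
                             salt=row.to_bytes(2, "little")).digest()
    h = int.from_bytes(digest, "little")
    return (h >> 1) % width, 1 - 2 * (h & 1)


class CountMedianUpdate:
    def __init__(self, depth, width):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        """Per-row (column, sign, signed counter value) for `item`."""
        cells = []
        for row in range(self.depth):
            col, sign = bucket_and_sign(item, row, self.width)
            cells.append((col, sign, sign * self.table[row][col]))
        return cells

    def _median_range(self, cells):
        values = sorted(value for _, _, value in cells)
        # Lower and upper middle values; identical when depth is odd.
        return values[(self.depth - 1) // 2], values[self.depth // 2]

    def estimate(self, item):
        lo, hi = self._median_range(self._cells(item))
        return (lo + hi) / 2

    def update(self, item):
        cells = self._cells(item)
        lo, hi = self._median_range(cells)
        for row, (col, sign, value) in enumerate(cells):
            # ASSUMPTION: only rows whose signed value lies in [lo, hi]
            # are updated; the remaining rows are treated as 'wrong'.
            if lo <= value <= hi:
                self.table[row][col] += sign
```

Rows whose signed counter sits far above or below the median are left untouched, so a counter that is already polluted by a colliding element does not accumulate further votes from this item.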
3 Sketch Hough Transform (SHT)
We claim that any algorithm which estimates the ’frequent items’ can be used to improve Hough transform algorithms.
The Hough Transform's parameters are the polar coordinates of the line, θ and ρ, which are the angle of the normal to the line and the distance from the origin to the line. The algorithm is described in terms of the sketch used, the number of hash functions used in the sketch, the memory used by the sketch (the size of the hash codomain), the maximum number of lines we expect to find in the image, the maximum distance of a line in the image from the origin, and the number of edge points in the image.
The Classic Hough Transform, CHT, stores the line votes in an accumulator array which ranges over the whole object parameter space. Algorithm 1, SHT, replaces this accumulator with a smaller sketch and returns, with a probability that depends on the sketch configuration, a superset of the CHT result.
The sketch constructor returns a sketch which stores votes for elements of the parameter space, using the chosen number of hash functions into a codomain of the chosen size. The top-k query returns the top ’frequent items’ from the sketch and is calculated at line 10 of Algorithm 1.
The size of the result can be limited by searching locally for peaks within windows of a fixed number of angles.
Although the number of votes for a line in SHT differs from the number of votes in CHT (depending on the sketch configuration), using the right configuration (see Section 4.1) results in a superset of lines in almost the same order (sorted by votes) as CHT. Additionally, a simple check can remove false lines which do not exist in the image.
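The overall procedure can be illustrated with the Python sketch below. It is a simplified stand-in for Algorithm 1 rather than a reproduction of it: it uses a plain Count Min table, finds the top lines by querying every cell of the parameter space, and skips the local peak search and the false-line check. The names `sketch_hough_lines` and `bucket`, the coarse 10-degree angle step, and the default sketch size are assumptions.

```python
import hashlib
import heapq
import math


def bucket(cell, row, width):
    """Column of a (theta index, rho) cell in one row (hypothetical helper)."""
    digest = hashlib.blake2b(repr(cell).encode(), digest_size=8,
                             salt=row.to_bytes(2, "little")).digest()
    return int.from_bytes(digest, "little") % width


def sketch_hough_lines(points, k, depth=4, width=256, n_theta=18):
    """Return the k cells with the highest estimated votes as (votes, cell)."""
    table = [[0] * width for _ in range(depth)]  # replaces the accumulator
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    for x, y in points:
        for t, theta in enumerate(thetas):
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            for row in range(depth):
                table[row][bucket((t, rho), row, width)] += 1

    def estimate(cell):
        return min(table[row][bucket(cell, row, width)] for row in range(depth))

    # Query every cell of the parameter space; only the k best are kept.
    max_rho = math.ceil(max(math.hypot(x, y) for x, y in points))
    candidates = ((t, rho) for t in range(n_theta)
                  for rho in range(-max_rho, max_rho + 1))
    return heapq.nlargest(k, ((estimate(cell), cell) for cell in candidates))


# 60 points on the line y = 5: theta index 9 (90 degrees), rho = 5.
lines = sketch_hough_lines([(x, 5) for x in range(60)], k=3)
```

In this toy setting the sketch holds 4 × 256 counters, while the full parameter space has 18 × 121 cells; the estimated vote counts are upper bounds on the true counts, so the ordering of weak lines may differ from CHT even when the strongest line is the same.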
[Figure 3. (a) Quality as a function of noise. (b) Quality as a function of memory. (c) Quality as a function of number of hash functions.]
4 Experiments
To show the effectiveness of the proposed Sketch Hough Transform (SHT), we ran it with several sketching methods: Count Min (CM), Count Min with conservative update (CMCU), Count (COUNT), Count with conservative update (COUNTCU), and Count Median Update (COUNTMU), our method. Each method was run with and without lossy counting.
The accuracy of the SHT results was calculated by comparing them to the result of CHT using an accumulator of size 184320. Each algorithm was run 10 times on each image, and the quality of the results (mean accuracy) is reported.
4.1 Synthetic Line Images
We created 204 synthetic images containing 15 random lines and added uniform noise. We ran the SHT algorithms 10 times on each image, and the quality of the results was compared to the result of the classic Hough Transform. As lossy counting did not have a significant effect on the sketches in these cases, we do not show lossy counting results.
Plot 3(a) shows the dependence of SHT quality on the amount of noise for a fixed amount of memory. It can be clearly seen that the results of COUNTMU are superior to those of all the other sketches, and that the advantage of our method increases with the noise.
Plot 3(b) shows the dependence of SHT quality on sketch memory size (hash codomain size × number of hash functions) for images with 19k noise points. CM has better results than CMCU for sketches with small memory, while CMCU is better with memory size above 420. Our method, COUNTMU, is superior to all other methods.
Plot 3(c) shows the dependence of SHT quality on the number of hash functions for a fixed amount of memory. The seesaw pattern in COUNTMU is a result of the difference in the definition of the median for an even or odd number of elements (here, the number of hash functions).
4.2 Real Images
We ran the SHT on 15 random real images of various sizes containing roads, train tracks, skylines and landscapes. Figure 2 shows the detected lines on an image for Classic HT and SHT with various sketch types.
Sketch | Quality
CM | 76%
CMCU | 90%
COUNT | 56%
COUNTCU | 20%
COUNTMU | 96%
It can be clearly seen that the results of COUNTMU, our method, are superior to all other sketches on real images too.
5 Conclusion
We introduced the Sketch Hough Transform (SHT) algorithm, which reduces the amount of memory and increases the robustness to noise compared to the Classic Hough Transform. We showed that the results of SHT, using a small amount of memory, are almost the same as those of the classic Hough Transform.
We also proposed a new sketch, Count Median Update, and showed that this new sketch is significantly superior to other sketching methods, especially as the noise in the image increases.
References
[1] P. V. C. Hough, “A method and means for recognizing complex patterns,” U.S. Patent 3,069,654, December 1962.
[2] D. H. Ballard, “Generalizing the Hough transform to detect arbitrary shapes,” Pattern Recognition, vol. 13, no. 2, pp. 111–122, Jan. 1981.
[3] Moses Charikar, Kevin Chen, and Martin Farach-Colton, “Finding frequent items in data streams,” Theoretical Computer Science, vol. 312, no. 1, pp. 3–15, Jan. 2004.
[4] Graham Cormode and S. Muthukrishnan, “An improved data stream summary: the count-min sketch and its applications,” Journal of Algorithms, vol. 55, no. 1, pp. 58–75, Apr. 2005.
[5] Noga Alon, Yossi Matias, and Mario Szegedy, “The Space Complexity of Approximating the Frequency Moments,” Journal of Computer and System Sciences, vol. 58, no. 1, pp. 137–147, Feb. 1999.
[6] Graham Cormode and Marios Hadjieleftheriou, “Finding the frequent items in streams of data,” Communications of the ACM, vol. 52, no. 10, pp. 97, Oct. 2009.
[7] Hongyan Liu, Yuan Lin, and Jiawei Han, “Methods for mining frequent items in data streams: an overview,” Knowledge and Information Systems, vol. 26, no. 1, pp. 1–30, Jan. 2011.
[8] Priyanka Mukhopadhyay and Bidyut B. Chaudhuri, “A survey of Hough Transform,” Pattern Recognition, vol. 48, no. 3, pp. 993–1010, Mar. 2015.
[9] N. Kiryati, Y. Eldar, and A. M. Bruckstein, “A probabilistic Hough transform,” Pattern Recognition, vol. 24, no. 4, pp. 303–316, Jan. 1991.
[10] L. Xu and E. Oja, “Randomized Hough Transform (RHT): Basic Mechanisms, Algorithms, and Computational Complexities,” CVGIP: Image Understanding, vol. 57, no. 2, pp. 131–154, Mar. 1993.
[11] S.-Y. Guo, X.-F. Zhang, and F. Zhang, “Adaptive Randomized Hough Transform for Circle Detection using Moving Window,” in 2006 International Conference on Machine Learning and Cybernetics, Aug. 2006, pp. 3880–3885.
[12] J. R. Bergen and H. Shvaytser (Schweitzer), “A probabilistic algorithm for computing Hough transforms,” Journal of Algorithms, vol. 12, no. 4, pp. 639–656, 1991.
[13] Wei Lu and Jinglu Tan, “Detection of incomplete ellipse in images with strong noise by iterative randomized Hough transform (IRHT),” Pattern Recognition, vol. 41, no. 4, pp. 1268–1279, Apr. 2008.
[14] Si-Yu Guo, Ya-Guang Kong, Qiu Tang, and Fan Zhang, “Probabilistic Hough transform for line detection utilizing surround suppression,” in 2008 International Conference on Machine Learning and Cybernetics, July 2008, vol. 5, pp. 2993–2998.
[15] Cristian Estan and George Varghese, “New Directions in Traffic Measurement and Accounting,” New York, NY, USA, 2002, SIGCOMM ’02, pp. 323–336, ACM.
[16] Saar Cohen and Yossi Matias, “Spectral Bloom Filters,” New York, NY, USA, 2003, SIGMOD ’03, pp. 241–252, ACM.
[17] Amit Goyal, Hal Daumé III, and Graham Cormode, “Sketch Algorithms for Estimating Point Queries in NLP,” Stroudsburg, PA, USA, 2012, EMNLP-CoNLL ’12, pp. 1093–1103, Association for Computational Linguistics.
[18] Gurmeet Singh Manku and Rajeev Motwani, “Approximate Frequency Counts over Data Streams,” Hong Kong, China, 2002, VLDB ’02, pp. 346–357, VLDB Endowment.
[19] Amit Goyal and Hal Daumé III, “Approximate Scalable Bounded Space Sketch for Large Data NLP,” 2011.
[20] Moses Charikar, Kevin Chen, and Martin Farach-Colton, “Finding Frequent Items in Data Streams,” July 2002, Lecture Notes in Computer Science, pp. 693–703, Springer, Berlin, Heidelberg.