Streaming Algorithm for Euler Characteristic Curves of Multidimensional Images

05/04/2017 ∙ by Teresa Heiss, et al. ∙ Institute of Science and Technology Austria 0

We present an efficient algorithm to compute Euler characteristic curves of gray scale images of arbitrary dimension. In various applications the Euler characteristic curve is used as a descriptor of an image. Our algorithm is the first streaming algorithm for Euler characteristic curves. The usage of streaming removes the necessity to store the entire image in RAM. Experiments show that our implementation handles terabyte scale images on commodity hardware. Due to lock-free parallelism, it scales well with the number of processor cores. Additionally, we put the concept of the Euler characteristic curve in the wider context of computational topology. In particular, we explain the connection with persistence diagrams.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 2

page 7

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The Euler characteristic curve is a powerful tool in image processing [7]. It has been used in a variety of fields including astrophysics111The astrophysics community refers to the Euler characteristic curve as the genus. [2, 13], medical image analysis [17, 11], and image processing in general [14, 18]. Its wide applicability stems from simplicity and efficient computability.

However, with the advances in image acquisition technology, there is need to handle very large images. For example the state-of-the-art micro-CT scanner Skyscan 1272 creates images of size with 14-bit precision. Therefore, a single scan yields more than half a trillion voxels. It is also possible to combine multiple scans of the same object which further multiplies the size of data. As loading the resulting multi-terabyte image into RAM of a commodity computer is infeasible, a streaming approach is needed.

We present the first streaming algorithm for computing Euler characteristic curves.222We review related work at the end of the paper. Our algorithm divides a multidimensional image into chunks that fit into RAM, calculates the Euler characteristic curves for each chunk separately and merges them in the end. Since these chunks can be made arbitrarily small, commodity hardware can be used to compute Euler characteristic curves of arbitrarily large images. The fact that the chunks are not dependent on each other makes lock-free parallelism possible.

For defining the Euler characteristic curve we first need to explain what the Euler characteristic is. There are two ways to define the Euler characteristic and the Euler-Poincaré formula states that they are both equivalent. For discrete two-dimensional surfaces, like a triangulation of a sphere or a torus, the definitions are: first, the number of vertices minus the number of edges plus the number of faces; second, the number of connected components minus the number of tunnels plus the number of voids. Originally the Euler characteristic was defined for the surface of a convex polyhedron where it always equals two. To see this, consider that such a surface consists of one connected component, no tunnels and one void.

The equivalence between these two definitions seems to be the reason for the usefulness of the Euler characteristic: It captures global topological structures—like holes—although it can be computed locally—by adding up vertices, edges and faces.

The Euler characteristic curve of an image is the vector of Euler characteristics of consecutive thresholded images. We illustrate this for the example image of a bone

333We thank Reinhold Erben and Stephan Handschuh from Vetmeduni Vienna for providing micro-CT scans of rat vertebrae. in Fig. (a)a with values ranging from 0 (black) to 255 (white). The Euler characteristic curve of this image maps each to the Euler characteristic of the set of pixels with gray value smaller or equal to . Figure 1 illustrates this process. This concept can be extended to more general settings, e.g., images with floating point gray values (see Sec. 3).

(a) Example image
(b)
(c)
(d)
(e)
(f)
(g) Euler characteristic curve of the example image
Figure 1: Definition of the Euler characteristic curve illustrated with an image showing a 2D slice of the distance transform of a segmented bone. Subfigures LABEL:sub@fig:t0LABEL:sub@fig:t4 show thresholded images for different thresholds . For , there is 1 component and 4 holes, hence . Starting from , there are no holes, so stabilizes at 1.

2 Theoretical Background

We give basic definitions needed in Sec. 3 using the language borrowed from computational topology [10]. With this we can provide precise definitions in arbitrary dimension and explain the connection between the Euler characteristic curve and other topological descriptors.

Cubical Cell.

A -dimensional cubical cell (short: cell) of embedding dimension is defined as the Cartesian product of intervals and singletons:

where exactly of the sets are intervals of the form with integers and the remaining sets are singletons with integers . A zero-dimensional cell is called a vertex, a one-dimensional cell an edge, a two-dimensional cell a square, a three-dimensional cell a cube.

Face.

A cell of embedding dimension is called a face of a cell of embedding dimension if is a subset of .

Cubical Complex.

A -dimensional cubical complex (short: complex) of embedding dimension is a finite set of cubical cells of embedding dimension such that

  1. The faces of each cell are also elements of the complex

  2. The intersection of any two cells is also an element of the complex444The second condition is implied by the first condition since we allow only consecutive integers as interval endpoints in the definition of cells.

where is the highest dimension of all cells in the complex. The complexes that appear in our algorithm always fulfill . Figure 2 shows an example of a cubical complex.

Figure 2: Example of a two-dimensional cubical complex of embedding dimension two with one square, eight edges and eight vertices.

Filtration of Cubical Complexes.

A sequence of complexes is called a filtration if the complexes are monotonically increasing: .

Sublevel Set Filtration.

Let be a cubical complex. A cell that is not a face of any other cell than itself is called a maximal cell. Let be a function, where is the set of maximal cells. This function can be extended to a function defined on all cells:

For each the sublevel set of this extended function is the set of cells that are a face of at least one maximal cell with -value smaller or equal to . As consists of only a finite number of cells, can only have a finite number of different function values . The sublevel sets form a filtration of cubical complexes—the sublevel set filtration induced by the function .

To see this, notice that the definition of implies that for each the sublevel set is a cubical complex: all faces of a cell belong to the same sublevel set as . Furthermore the sublevel sets are monotonically increasing . Therefore, the sublevel sets form a filtration.

Consecutive Thresholded Images as Sublevel Set Filtrations.

A -dimensional gray scale image with voxels555Throughout this paper we use “voxel” as multidimensional generalization of “pixel”. can be interpreted as a -dimensional cubical complex of embedding dimension with a function on its maximal cells: for each voxel index , the corresponding voxel position is represented by the -dimensional cell of embedding dimension :

The cubical complex is defined as the set of all these cells along with all their faces. The function maps each maximal cell to the gray value of the voxel with index . The sublevel set filtration666Another interpretation of voxel data is via the dual complex (voxels become vertices) using the lower star filtration. The way we use appears more natural in image processing context. The two approaches yield similar but not necessarily identical Euler characteristic curves. induced by the function is formed by consecutive thresholdings of the image.777Defining cells as products of closed intervals implies -connectivity for the voxels of the thresholded images. This corresponds to 8-connectivity for 2D images.

Euler Characteristic.

The Euler characteristic of a -dimensional complex of embedding dimension is defined as

where is the number of -dimensional cells in .

The Euler-Poincaré formula states

where is the th Betti number (the number of -dimensional holes). For a formal definition of cubical homology and the involved Betti numbers, see [10]. In three-dimensional space, is the number of connected components, is the number of tunnels and is the number of voids.

Euler Characteristic Curve.

The Euler characteristic curve of a filtration of cubical complexes is the vector

This vector can also be interpreted as a function

which is used to visualize the Euler characteristic curve as in Fig. (g)g.

Euler Characteristic Curve of an Image.

We already saw that for an arbitrary gray scale image the sequence of consecutive thresholded images is a filtration—the sublevel set filtration. The Euler characteristic curve of this filtration is the Euler characteristic curve of an image.

Connection to Other Topological Descriptors.

We want to put the above considerations in the wider context of computational topology. Two popular topological descriptors of a filtration are Betti curves and persistence diagrams [6], which both capture information about holes at different thresholds.

For each hole in the image the persistence diagram tracks the first and last threshold at which the hole occurs. The th Betti curve, which counts the -dimensional holes at each threshold, is easily computable from the persistence diagram. Furthermore, the alternating sum of the Betti curves yields the Euler characteristic curve. Therefore, the Euler characteristic curve summarizes a persistence diagram, but it can be computed locally.

The usefulness of the Euler characteristic curve suggests that the two richer descriptors may also be useful in image processing. However, large images are out of reach of the currently available persistence diagram software. For now, the Euler characteristic curve remains the only feasible option.

3 Algorithm

The input for our algorithm is a gray scale image of arbitrary dimension with voxels. The output is the Euler characteristic curve of this image, as defined in Sec. 2.

3.0.1 Range of Values.

In Sec. 2 the function maps to . However, in practice the gray values of an image are in a predefined range, usually or . If the range contains negative numbers, it can be shifted so that it starts from zero. For this reason we focus on ranges of the form with a positive integer . A version of our algorithm that can handle floating point values will be discussed at the end of this section.

3.0.2 Tracking the Changes.

It is suboptimal to compute the Euler characteristic for each threshold separately, as already noted in [18]. To avoid redundant computations we track the changes between consecutive thresholds. More precisely, we determine how each voxel contributes to the change in Euler characteristic. Therefore we first compute a vector of changes in Euler characteristic (VCEC) whose entries are the difference between the Euler characteristics of two consecutive thresholded images.888where . The Euler characteristic curve is then:

When changing from one thresholded image to the next , all voxels with gray value are included, along with all their faces that have not already been included at a previous threshold. We say that these new faces are introduced by these new voxels. More precisely, a face of a voxel is introduced by if all other voxels that have as a face fulfill one of the following two conditions:

  1. and  ,

where is any total order of the voxel positions, e.g., the lexicographical order999In lexicographical order a voxel at position succeeds a voxel at position if for the first where and differ.. The output of our algorithm is independent of the chosen total order. However, it is necessary to require in condition 2 to ensure that each face is introduced by a unique voxel.

Now we can decompose the cubical complex of the input image into blocks such that each block contributes to exactly one change of threshold. A block consists of a voxel together with all the faces it introduces. Which faces are introduced by a certain voxel is determined only by the gray values of the voxel’s neighbors. We exploit this locality in the design of our parallel streaming algorithm. Figure 3 shows the decomposition of a two-dimensional example image into blocks.

(a) The cubical complex consists not only of the voxels but also of their faces (red).
(b) Block decomposition shows which voxels introduce which faces, e.g., the upper left voxel introduces 3 edges and 2 vertices.
Figure 3: This is an illustration for the block decomposition.

3.0.3 Storage.

For the computations we store only the gray values of the voxels. The geometric information and the adjacency relations between cells are implicit in the voxel grid and calculated locally whenever needed. Similarly the function and the block decomposition it induces are never explicitly stored. Apart from storing the result vector, the memory overhead is essentially zero.

3.0.4 Streaming.

If the entire image does not fit into RAM, we divide it into chunks that fit into RAM. In our implementation we use a simple strategy: an image of size is divided into chunks of size (see Fig. 4, left). As these correspond to contiguous memory regions, streaming the chunks from a single input file is easy. We then separately compute the VCEC for each chunk either sequentially or in parallel.

Figure 4: Whenever a worker thread is free it loads one chunk along with a one voxel thick collar into its space in RAM and computes the VCEC of this chunk.

3.0.5 Parallel Computations.

For parallelism we use a thread pool. Each worker thread is assigned memory for a single chunk and one initially empty VCEC vector. One task is to read a chunk from disk, update the worker thread’s VCEC vector by the VCEC of this chunk and discard the chunk. At any given time at most chunks reside in RAM, where is the number of worker threads. Because different worker threads work with disjoint memory regions we achieve lock-free parallelism. The underlying data structure for the collection of VCECs is a vector of vectors, called euler_changes.

3.0.6 Processing one Chunk.

Along with the chunk we read a one voxel thick collar surrounding it. This way we have access to all neighbors of the voxels in the chunk (see Fig. 4). With this information we compute the VCEC of this chunk as specified in Algorithm 1.

0:  One chunk of an image along with a one voxel thick collar surrounding it and current_thread, the index of the worker thread processing this chunk.
0:  An updated version of the vector euler_changes[current_thread] which is the VCEC of all chunks this thread has processed.
1:  for all voxels in the chunk do
2:      = gray value of
3:     change
4:     for all faces introduced by  do
5:        if dim() is even then
6:           change
7:        else
8:           change
9:     euler_changes[current_thread][] += change
10:  remove chunk and collar from RAM
Algorithm 1 Computing the VCEC

3.0.7 Post-Processing.

In the end, when all chunks have been processed, a single thread sums up the VCECs yielding the VCEC of the whole image, which is a vector . The Euler characteristic curve is then computed as .

3.0.8 Analysis.

To analyze the complexity we remind that is the dimension of the image, is the number of voxels, is the number of gray values101010The input size is . and is the number of worker threads. We introduce a new variable , the number of voxels per chunk including the collar.

Assuming perfect parallelization, the worst case running time of our algorithm is because for each voxel we visit all its neighbors and sequentially post-process the VCECs. We analyze the practical scaling behavior in Sec. 4. As the dimension is usually small (mostly 2 or 3), the exponential term is usually not a problem in practice.

For each worker thread, we need integers of bits to store the gray values of a chunk. Additionally, the euler_changes data structure consists of 64-bit integers. Therefore, the total storage is bits in RAM. By decreasing the chunk size , the dominant part, , can be made arbitrarily small. Because of this, our algorithm works for arbitrarily large images on commodity hardware.

3.0.9 Other Ranges of Gray Values.

When the range of gray values is not of the form —for example for floating point values—one option is to use a hash map to store the euler_changes. If the number of different input values approaches , the output size dominates the overall storage and the advantage of a streaming approach disappears. In this situation, it is preferable to transform (i.e., round, scale, shift) the input values to obtain a range of the form . Running our standard algorithm on the transformed input yields the same result as transforming the domain of the Euler characteristic curve computed for the original data.

4 Experiments

We implemented the above algorithmic scheme in C++14. We made experiments on two different machines: a laptop with Intel core i5-5200U CPU with two physical cores clocked at 2.2 GHz with 8 GB of RAM and a workstation with Intel Xeon E5645 CPU with 12 physical cores clocked at 2.4 GHz and 72 GB of RAM. Table 1 shows the running time and memory usage for different 3D input images ran on the laptop. We use images from a standard data set111111Most of the images are available at www.byclb.com/TR/Muhendislik/Dataset.aspx, see [21, 4]. The names’ suffixes distinguish between 8- and 16-bit precision images. The last column shows that the running time is linear in and does not depend on the content of the image.

name size million voxels memory[MB] time[s] time[s]/million voxels
prone16 512512463 121.4 70.4 24.7 0.20
xmastree16 512499512 130.8 72.9 26.7 0.20
vertebra16 512512512 134.2 74 28.4 0.21
random8 512512512 134.2 51 28.9 0.22
random8 102410241024 1073.7 93.1 236.7 0.22
random8 204820482048 8589.9 261.6 1767.1 0.21
Table 1: Running time and memory usage.

Due to the above, we show the scaling behavior using a single image. The computations were performed on the workstation for a

image with 16-bit precision. We used from 1 to 12 threads taking the mean running time and standard deviation across 20 runs. Figure 

5 shows the speed-up gained using threads instead of one. In particular using 10 threads is 7.1 times faster than using a single thread.

Figure 5: Scaling with number of threads

Table 2 shows that the Euler characteristic curve of terabyte scale images can be computed on a single computer with limited memory. Memory usage could be further decreased by changing the chunking scheme. However, the experiments demonstrate that—for the foreseeable future—our implementation is a reasonable trade-off between performance and simplicity.

size threads time memory
409640964096 12 1.8 hours 1.93 GB
14 50014 5002 600 24 9 hours 4.5 GB
14 50014 5002 600 12 13 hours 2.27 GB
10 00010 00010 000 8 32 hours 3.98 GB
Table 2: Running time and memory usage for computations on large 3D images, performed on the 12-core workstation. The voxel values are generated independently from a uniform random distribution in the range . As we already showed, using other images of the same size will exhibit almost identical performance.

5 Related Work and Discussion

We review the work related to computing the Euler characteristic (curve) of images. We embed this in the context of computing other topological descriptors of images, particularly persistence diagrams.

Algorithms related to the Euler characteristic received a lot of attention in image processing [5, 16, 19, 22], starting from the seminal work by Gray [8]. Many modern implementations aim at real-time processing of small 2D images [18]. Our goal is different, namely handling large multidimensional images.

Computing other topological descriptors of images is a more recent advancement [20, 12, 10, 21, 15, 9, 4], which has however not entered mainstream image processing. Specialized methods for computing persistence diagrams handle 3D images up to voxels [4, 9] on what we consider commodity hardware. The main limitation is the storage of the entire image in memory. There exist distributed implementations [1], which alleviate the storage problem per machine, but are not specialized to image data, resulting in large overall memory overhead. The largest reported computed instances are in the range of on 32 server nodes.

Overall it is clear that a specialized, streaming approach is necessary for handling large images. We offer a robust implementation for the Euler characteristic curve, with possible future extensions to other topological descriptors.

We expect that these more complex topological descriptors will be computable for this terabyte scale data in the future but currently we are limited to using Euler characteristic curves. Let us discuss the properties of our algorithmic scheme and mention limitations of our current implementation.

The advantages of our algorithm are:

  • It can handle arbitrarily large images on commodity hardware.

  • It can handle images of arbitrary dimension.

  • Linear running time.

  • Predictable running time and memory usage.

  • Due to lock-free parallelism, running time scales well with increasing number of threads.

  • Our algorithm can be easily adapted to a massively-distributed setting using a map-reduce framework [3].

Some limitations of the current implementation:

  • It uses -connectivity. For other types (e.g., 6-connectivity for 2D images), modifications on the algorithm can be made.

  • For simplicity we use slices of the image as chunks. For very large images even a one voxel thick slice may not fit into memory.

  • For technical reasons we surround each chunk with a second collar of voxels with value . Effectively a five voxel thick slice has to fit into memory, which may become a problem for very large images.

  • To include the value we may need a larger data type. For example if the input contains all values from 0 to 255 we use a 16-bit data type to store the original values along with an extra value for infinity.

Despite the limitations, our implementation is robust and can handle even the largest data produced by state-of-the-art image acquisition technology. We released this software under the name CHUNKYEuler as open source: https://bitbucket.org/hubwag/chunkyeuler/src.

References

  • [1] Bauer, U., Kerber, M., Reininghaus, J.: Distributed computation of persistent homology. In: 2014 Proceedings of the Sixteenth Workshop on Algorithm Engineering and Experiments (ALENEX). pp. 31–38. SIAM (2014)
  • [2] Colley, W.N., Richard Gott, III, J.: Genus topology of the cosmic microwave background from wmap. Monthly Notices of the Royal Astronomical Society 344(3), 686–695 (2003)
  • [3] Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. Communications of the ACM 51(1), 107–113 (2008)
  • [4] Delgado-Friedrichs, O., Robins, V., Sheppard, A.: Skeletonization and partitioning of digital images using discrete morse theory. IEEE transactions on pattern analysis and machine intelligence 37(3), 654–666 (2015)
  • [5] Dyer, C.R.: Computing the euler number of an image from its quadtree. Computer graphics and image processing 13(3), 270–276 (1980)
  • [6] Edelsbrunner, H., Harer, J.: Computational topology: an introduction. American Mathematical Soc. (2010)
  • [7] Gonzalez, R., Wintz, P.: Digital image processing. Addison-Wesley Publishing Co., Inc.,Reading, MA (Jan 1977)
  • [8] Gray, S.B.: Local properties of binary images in two dimensions. IEEE Transactions on Computers 100(5), 551–561 (1971)
  • [9] Günther, D., Reininghaus, J., Wagner, H., Hotz, I.: Efficient computation of 3d morse–smale complexes and persistent homology using discrete morse theory. The Visual Computer 28(10), 959–969 (2012)
  • [10] Kaczynski, T., Mischaikow, K., Mrozek, M.: Computational Homology. Springer-Verlag, New York (2004)
  • [11] Odgaard, A., Gundersen, H.: Quantification of connectivity in cancellous bone, with special emphasis on 3-d reconstructions. Bone 14(2), 173–182 (1993)
  • [12] Pikaz, A., Averbuch, A.: An efficient topological characterization of gray-levels textures, using a multiresolution representation. Graphical Models and Image Processing 59(1), 1–17 (1997)
  • [13] Rhoads, J.E., Gott III, J.R., Postman, M.: The genus curve of the abell clusters. The Astrophysical Journal 421, 1–8 (1994)
  • [14]

    Richardson, E., Werman, M.: Efficient classification using the euler characteristic. Pattern Recognition Letters 49, 99 – 106 (2014)

  • [15] Robins, V., Wood, P.J., Sheppard, A.P.: Theory and algorithms for constructing discrete morse complexes from grayscale digital images. IEEE Transactions on pattern analysis and machine intelligence 33(8), 1646–1658 (2011)
  • [16] Saha, P.K., Chaudhuri, B.B.: A new approach to computing the euler characteristic. Pattern recognition 28(12), 1955–1963 (1995)
  • [17] Ségonne, F., Pacheco, J., Fischl, B.: Geometrically accurate topology-correction of cortical surfaces using nonseparating loops. IEEE transactions on medical imaging 26(4), 518–529 (2007)
  • [18] Snidaro, L., Foresti, G.: Real-time thresholding with euler numbers. Pattern Recognition Letters 24(9–10), 1533 – 1544 (2003)
  • [19] Sossa-Azuela, J.H., et al.: On the computation of the euler number of a binary object. Pattern recognition 29(3), 471–476 (1996)
  • [20] Verri, A., Uras, C., Frosini, P., Ferri, M.: On the use of size functions for shape analysis. Biological cybernetics 70(2), 99–107 (1993)
  • [21] Wagner, H., Chen, C., Vuçini, E.: Efficient computation of persistent homology for cubical data. In: Workshop on Topology-based Methods in Data Analysis and Visualization (2011)
  • [22] Ziou, D., Allili, M.: Generating cubical complexes from image data and computation of the euler number. Pattern Recognition 35(12), 2833–2839 (2002)