 # Fast Implementation of a Bayesian Unsupervised Algorithm

In a recent paper, we have proposed an unsupervised algorithm for audio signal segmentation entirely based on Bayesian methods. In its first implementation, however, the method showed poor computational performance. In this paper we address this question by describing a fast parallel implementation using the Cython library for Python; we use open GSL methods for standard mathematical functions, and the OpenMP framework for parallelization. We also offer a detailed analysis on the sensibility of the algorithm to its different parameters, and show its application to real-life subacquatic signals obtained off the brazilian South coast. Our code and data are available freely on github.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The problem of signal segmentation arises in different contexts (Makowsky and Hossa (2014) Ukil and Zivanovic (2006) Schwartzman et al. (2011) Kuntamalla and Reddy (2014) Theodorou et al. (2014)). The problem is broadly defined as follows: given a discretely sampled signal , divide it in contiguous sections that are internally homogeneous with respect to some characteristic. The segmentation is thus based on the premise that the signal structure changes one or many times during the entire sampled period (where is the sampling rate in ), and one is looking for the change points.

The signal structure, in this context, refers to a parametric model describing the signal evolution; this can be a tonal model, for instance, where one is interested in detecting changes on the fundamental frequency or other features of the spectrum.

In a previous paper Hubert, Padovese and Stern (2018) we introduced a probabilistic model where the changing points refer to changes in the signal’s average power; our main goal was to segmentate acoustic subacquatic signals, in order to separate sections that are likely to contain significant events (boat engines, fish choruses, whales, etc.). We compared our algorithm to a traditional peak detection strategy, and found promising results.

Our algorithm, however, is very computationally intensive, and in our first tests, using a pilot version written in MATLAB ®, the algorithm took on average seconds to completely segmentate a minutes audio signal with many power changes. To improve this situation, we present here a Python version of the algorithm, using the Cython package Seljebotn (2009) to generate optimized C code.

With this new, optimized version of the algorithm, we offer a detailed analysis of the algorithm’s sensibility to the different parameters and in different situations. Finally, we illustrate the algorithm’s performance in samples of subacquatic recordings in the region of Santos, SP, Brazil. Both the algorithm and the samples are openly available at https://github.com/paulohubert/bayeseg.

The paper is organized as follows: the following section reviews the motivation, the model and the sequential segmentation algorithm. Section 3 discusses in detail the fast implementation using Cython. In section 4 we study the algorithm’s sensibility to changes in the parameters, in different situations and using simulated signals. Section 5 presents an analysis of real subacquatic signals. Section 6 concludes the paper.

## 2 The sequential segmentation algorithm

### 2.1 Motivation

Our main goal when developing this algorithm was to analyze signals coming from the OceanPod: an acoustic underwater recording system developed by the engineering school at University of S o Paulo (EP - USP) Sanchez-Gendriz and Padovese (2016).

One OceanPod was installed in January 2015 at the Parque Estadual Marinho da Laje de Santos (Laje de Santos Marine State Park), a marine preservation park in the region of Santos, SP, Brazil. The prime motivation for installing the hydrophone in this location was to study the patterns of behavior of fishing vessels that navigate through the park. Since fishing in the park is prohibited, understanding of these patterns can be a valuable asset in the design of fiscalization policies.

As a secondary objective, the data coming from the hydrophone also allows the study of other noise irradiating events, either caused by man or nature. Obtaining samples of different kinds of acoustic events from a region with high diversity of marine life can be potentially very useful for the investigation of animal behavior off the brazilian coast, in the least by providing annotated samples to be fed to classification algorithms that can later be developed to search for specific events.

The OceanPod is installed at a depth of m, and delivers minutes long audio files sampled at . With this data, we are interested in indicating times and duration of boats passing by, but also in detecting other possibly interesting events. Given this broad objective of separating segments regardless of their specific nature, we developed a methodology to divide the signal into segments based on differences of the average power only. This leads us to the sequential segmentation algorithm, which we present next.

### 2.2 Model and algorithm

To define the sequential segmentation algorithm (SeqSeg), we start by assuming that the (discretely sampled) signal has mean amplitude, and finite power . Given these two assumptions, by the maximum entropy principle Jaynes (1982) Jaynes (1987) we adopt a Gaussian probabilistic model for the signal, .

Now suppose that from sample on, the signal’s power shifts from to , with . This leads to the following probabilistic model:

 yt∼{N(0,σ20)if t≤¯tN(0,δσ20)if t>¯t (1)

The likelihood function associated with this model is thus

 L(¯t,σ20,δ|y) =(2πσ20)−N2δ−N−¯t2× (2) exp⎡⎣−∑¯tt=1y2t2σ20−∑Nt=¯t+1y2t2δσ20⎤⎦ (3)

By adopting a uniform, uninformative prior for , and Jeffreys’s prior Jaynes (1968) Jeffreys (1946) for , we can integrate equation 2 analitically and obtain the posterior

 P(¯t|y)∝ π(¯t)⋅(¯t∑t=1y2t)−(¯t+6)2(N∑t=¯t+1y2t)−(N−¯t−6)2× (4) Γ(¯t+62)Γ(N−¯t−22) (5)

This is a discrete distribution with support on ; to obtain the MAP (Maximum a posteriori

) estimate for

, we can simply evaluate the above function on its entire domain.

Our real signals, however, will usually have more than one change point. To model this situation, one approach would be to define the model for change points, given knowledge of . This, however, would insert many complications in the algorithm design. For instance, there would be the problem of estimating when it is unknown (the most frequent situation), and the support of the posterior distribution for the change points would be many-dimensional, making it much more difficul to find the MAP estimates.

To avoid these problems, we propose to apply the algorithm in a sequential manner: we start by dividing the original signal in two sections, using model 4 above. Then, if there is sufficient evidence of a difference in power between the two segments, we apply again the same method to each section, and so on until no more significantly different segments are found. As we will see on the following sections, this sequential approach is efficient and captures precisely the changepoints even when there are many of them in the signal.

With this approach, one full step of the algorithm will consist of two substeps: first, to estimate the segmentation point ; second, to compare the power of the segments, calculating a measure of evidence for the hypothesis . A diagram illustrating the algorithm’s structure is included in figure 1.

The evidence measure we use is the Full bayesian significance test (FBST) of Pereira and Stern Pereira and Stern (1999). This is an inferential procedure aimed at the calculation of evidence for sharp hypothesis (hypothesis that induce a reduction on the dimension of the parametric space). The hypothesis is sharp, since under we have the parametric space , and the unrestricted (full) parametric space is .

To compare the segments with respect to their variances, we apply again the Gaussian model, now conditionally on

. This means that the likelihood on equation 2 is the same, but now is considered fixed (known). Given the prior distributions and , and defining the segments and , the full posterior is given by

 P(d,s|y1,y2)∝ πδ(d)πσ(s)(2πs2)−n1+n22d−n22× (6) exp[−∑n1t=1y21,t2s2−∑n2t=1y22,t2ds2] (7)

where and are the segments’ sizes.

We adopt again a Jeffreys’ prior for . For , however, in order to be able to control the sensibility of the algorithm, we propose a Laplace prior distribution centered around :

 πδ(d)=1βe−|d−1|β (8)

The main reason for using the Laplace distribution is that it is a) symmetric, b) high-peaked around its mean, c) has higher tails than the gaussian. Also, by carefully choosing the hyperparameter

, we hope to be able do adapt our algorithm to different situations. In section 4 below we investigate the algorithm results under different choices for .

Calculation of evidence in the FBST framework now involves two steps: first, we obtain the maximum posterior under :

 p0=maxsP(1,s|y1,y2) (9)

In our model, this can be done analytically; by differentiating equation 6 we find

 s0=argmaxsP(1,s|y1,y2)=∑Ni=1y2iN+1 (10)

After obtaining , we now define the evidence against as

 T(p0)= {(d,s)∈R+×R+:P(d,s|y1,y2)>p0 (11) ev(H0)= ∫T(p0)P(d,s|y1,y2)dsdd (12)

The evidence against the hypothesis is the posterior measure of the surprise set of points (in the full parametric space) having larger posterior values than the maximum posterior under . If is high (i.e., the surprise set has high measure), the manifold defined by

traverses regions of low posterior probability, leading us to assign large evidence against the hypothesis.

To obtain the evidence against , thus, we must estimate the integral above. This cannot be done analytically, and simulation methods must be adopted.

We choose to apply the Adaptive Metropolis algorithm of Haario et al Haario et al. (2001). This is a random walk, Metropolis algorithm with Gaussian candidate distributions for the increment in the parameters. To improve the mixing properties of the chain, the covariance matrix of the candidate distribution is first estimated from a few runs of the usual Metropolis algorithm, and then adaptively updated during the burn-in period of the chain. In the next section, we describe the details of the algorithm implementation.

## 3 Fast implementation using Cython

To implement the SeqSeg algorithm we choose the python language, for a few reasons. First, because python is now widely used in the scientific community (specially after the release of NumPy and SciPy), and is thus a convenient programming language to use for open source scientific software. Second, because it is a simple to read language, that will make it easier for users to understand and modify the code. And third, because of the Cython package.

Cython (http://cython.org) is a package that originated from the Pyrex project Seljebotn (2009). Its objective is to automatically generate optimized C code from a python source. To accomplish this, the package demands only that the programmer explicitly declares and types each variable in the Python code; by running the Cython interpreter, and compiling the resulting C code, one is able to improve numerical performance a hundred (sometimes a thousand) fold. Also, Cython supports the OpenMP (http://www.openmp.org) library for multiprocessing programming, with allows the programmer to boost even further the numerical performance of parallelizable methods.

Besides using Cython and parallelizing our algorithm, we also chose to adopt the GSL library (http://www.gnu.org/software/gsl) for random number generation, and the standard C implementations of math.h for the mathematical functions (such as the gamma and log-gamma functions). All of these components together allowed us to improve our average computing time from to less than second.

In the remainder of this section, we introduce our bayeseg python module, which implements the SeqSeg class. SeqSeg is the interface that allows easy application of our algorithm. We also describe the details involved in the implementation of the brute-force optimization step, and the calculation of the evidence value.

### 3.1 The bayeseg module

We implement our algorithm in a Python module which we call bayeseg (from Bayesian segmentation). The code is avaliable on the git repository http:/github.com/xxxx, along with the real data samples used in this paper.

The module is composed of two classes: the OceanPod class is a simple interface that allows the user to easily read the audio files from the hydrophone. It also can convert segments’ indexes to datetime objects, and vice-versa, based on the file name (assuming a specific pattern of file name that contains the date and time of the recording).

The second class is the SeqSeg class, that implements the algorithm. In the next section, we describe this class in more detail.

### 3.2 The SeqSeg class

Our interface class, SeqSeg, provides methods that allow the user to load data, initialize the parameters, and obtain the segmentation of a signal.

The first method, feed_data, takes a NumPy array as argument, stores it internally, and preprocess the signal calculating and storing all the cumulative sums in the form

. This calculation speeds up the segmentation process by eliminating the need to iterate through the vector calculating the sum of squares at each step.

The initialize method of SeqSeg takes the following arguments:

• double : the hyperparameter of the prior for ;

• double : the threshold for the decision functin; when comparing two segments, the algorithm will accept that they have significantly different powers whenever the evidence for is higher than . In other words, the higher the value of , the greater the number of segments produced by the algorithm;

• int mciter: the number of iterations for the MCMC algorithm;

• int mcburn: the number of iterations for the burn-in period of the MCMC algorithm;

• int nchains: the number of parallel chains to run.

There are two more quantities to fully define the SeqSeg algorithm: the minimum segment length and the time resolution.

The minimum segment length ensures that the algorithm stops; it imposes a minimum length for a valid segment, and whenever the change point estimation step finds a cutting point that creates segments with a lower length than this minimum, it skips the evidence calculation step, stopping the segmentation.

The time resolution affects the change point estimation step only, in the following way: with a time resolution of , the brute-force MAP estimation step will evaluate the posterior for on the points . A time resolution greater than 1 speeds up the algorithm (since the brute-force MAP estimation of will evaluate the objetive function only on points separated by ) at the cost of possibly missing a change point.

Both of these parameters can be set on the run, when calling the segments method. This method takes as parameters the minimum segment length and the time resolution, and returns a list with the index of all change points as estimated by the SeqSeq algorithm.

### 3.3 Estimation of the segmentation point

The first step of the SeqSeg algorithm is the MAP estimation of the change point. This is accomplished by simply evaluating the posterior 4 over the set of points , where is the time resolution.

This step can be very demanding, specially with large signals. It involves the calculation of the log-gamma function twice, calculation of two natural logarithms, plus 21 floating point operations.

To achieve maximum performance, and also to make this step parallel, we implement the posterior 4 as a cdef function (named cposterior_t in our code). In Cython, this means that this function will not be available from the Python module, but will be restricted to the C compiled library. In our python implementation, this function is also defined as nogil, which means that it does not make explicit use of the Python interpreter. This is necessary to allow for parallelization (Python is defined to not allow two or more threads of the interpreter to run simultaneously, a feature known as global interpreter lock, gil; to write parallel code, then, one has to circumvent this by explicitly declaring the function to be interpreter-independent), and the consequence is that the code for the function can not use any Python object; it is restricted to basic operations, and to calls to other nogil functions.

The parallelization is implemented using the prange function from Cython; prange, according to the Cython online documentation, causes OpenMP to automatically start a thread pool and distribute the work according to a defined schedule (which can be set to static, dynamic, guided and runtime; for details, see the Cython online documentation at http://cython.readthedocs.io).

### 3.4 Calculation of the evidence value

After obtaining the MAP estimate for the segmentation point, the algorithm divides the signal in two segments, and calculates the evidence for the hypothesis that the segments have equal variance.

This calculation is performed by the function tester of the SeqSeg class. It starts by calculating , the maximum posterior under , which is done analitically. Then it runs the MCMC algorithm to calculate the posterior integral over the surprise set .

In our implementation, the method that runs the MCMC is defined as both cdef and nogil, to achieve maximum performance and also to be possibly executed in parallel. This is the cmcmc method in our code.

Our goal with the MCMC is to sample from the full posterior 6 with support over . To obtain the samples, we adopt the Adaptive Metropolis algorithm of Haario Haario et al. (2001).

The simulation occurs in three phases: first, we generate candidate points from candidate distributions and where and are the current points. Each point is accepted or rejected following the usual Metropolis-Hastings random walk acceptance probability. To start this phase, we initalize the state of the chain using Gaussians with the following averages

 ~s= ∑N1i=1y21,iN1−1 (13) ~d= ∑N2i=1y22,iN2−1⋅N1−1∑N1i=1y21,i (14)

where and ( and ) are the points and the size of the first (second) segment. The dispersion of these initial Gaussians are set to and .

The goal of this phase is to obtain the first estimates for the candidate covariance matrix; after rounds, we estimate , and from the data as follows:

 σ2d= ∑t0i=1(di−¯d)2t0−1 (15) σ2s= ∑t0i=1(si−¯s)2t0−1 (16) cov(d,s)= ∑t0i=1(di−¯d)(si−¯s)t0−1 (17)

Here and are the samples of the two parameters of the model, and , respectively.

After the estimation, the second phase of the simulation starts. This phase is a burn-in period (meaning that the samples generated during this phase will not be used in the estimation of the integral), during which the candidate points are generated simultaneously for both parameters, .

During this phase, the covariance matrix is updated recursively, following the method from Haario et al. (2001):

 Σt+1=t−1tΣt+sdt(t¯Xt−1¯XTt−1−(t+1)¯Xt¯XTt+ϵI2) (18)

Here represents the column vector of the sample points average up to point , and

is the 2-dimensional identity matrix.

This method needs two parameters, and . The first is a scaling parameter which must be set according to the dimension of the parametric space we are sampling from. Following the recomendation from Haario et al. (2001) (which in turn is following an analysis by Gelman et al. (1996)), we define it to be . The second parameter is present only to avoid singularity of the covariance matrix. Our analysis indicate that it must be set to a very low value; in our code, we define (it must be kept strictly positive to guarantee ergodicity of the adaptive chain).

After the burn-in, the actual sampling starts. In this phase, we generate candidates from a Gaussian , where the covariance matrix is now fixed. At each step, we obtain by applying the usual Metropolis-Hasting acceptance probabilities on the candidate points. After that, we obtain the posterior value , compare it to the maximum posterior under , and keep track of the number of points with posterior greater than . This number, divided by the chain’s total length (considering only the last phase), is the evidence value against .

The tester method starts nchains parallel chains; this feature is useful if one is interested in keeping track of the chain’s convergence, by using for instance Gelman-Rubin-Brooks statistics Brooks and Gelman (1998). For each chain, the cmcmc method returns the evidence value for (); tester then averages this evidence, and compares it to the parameter . If the (average) evidence for is greater than , the algorithm stops, declaring that the two segments are equivalent; otherwise, it inserts the current change point in the list, and tries to segment again each of the new segments.

The algorithm is summarized in the pseudocode 1.

## 4 Sensibility to parameters

### 4.1 Time resolution for cutpoint estimation

To appropriately apply the SeqSeg algorithm, a few choices must be made. First, one has to pick the minimum segment length, and the time resolution. The minimum segment length can be set arbitrarily; in most cases the algorithm will stop before reaching small segments, and this parameter has little or no effect. We recommend setting it to a size representing what one expects from the data; in subacquatic audio signals, for instance, we usually avoid segments of less than half a second, since these would hardly contain a significant or interesting event.

The time resolution, on the other hand, have a direct impact on the performance of the algorithm, since it dictates the number of function evaluations during the change point estimation step. However, since the bayeseg module implements a parallel version of this step, and all the calculations involved are fully optimized, it is possible to achieve good performance with a resolution close or equal to 1. Of course, the impact of the parallelization is strongly dependent on the configuration of the system that will run the algorithm.

In table 1, we report computing time for the segmentation step only, on a system with 4 1.6 GHz i5 processors and 8 Gb RAM memory, running Ubuntu 16.04. We simulate signals with lengths of , and points, and use time resolutions of , , and

. The signals are simulated from a Gaussian distribution with

mean and standard deviation of

.

As we see, the total elapsed time necessary to evaluate the posterior over all points in a long signal is in the order of seconds; this means that, when running on a system with many cores, one can safely set the time resolution to 1 and guarantee maximum precision of the algorithm, without incurring in large processing costs.

### 4.2 Convergence of MCMC step

After defining the minimum segment length and the time resolution, the next step should be to define the sample sizes for the MCMC. This can be done by using simulated or real signals, by starting the algorithm with small values for mcburn and mciter, running the segmentation multiple times and observing the variance of the number of segments found. If this number variates between multiple runs of the segmentation, it is an indication that the chain must be simulated for longer times. This is the case because our algorithm is completely deterministic, except for the calculation of the evidence value. This calculation, however, should converge whenever the MCMC method converge sufficiently itself.

It is also possible (and advisable) to run multiple parallel chains and monitor convergence measures such as the Gelman-Rubin statistic Brooks and Gelman (1998). This statistic is obtained in the following way: suppose we run independent chains, starting from different points (drawn from an overdispersed distribution). From each chain we obtain samples of the parameter of interest, and define

 ^θm= 1nn∑i=1θm,i (19) ^σ2m= 1n−1n∑i=1(θm,i−^θm)2 (20) ^θ= 1MM∑i=1^θm (21) B= nM−1M∑i=1(^θm−^θ)2 (22) W= 1MM∑i=1^σ2m (23)

Finally, we define the pooled variance as

 ^V=n−1MW+M+1MnB (24)

and the Gelman-Rubin statistic as

 ^R=^VW (25)

The value of should approach 1 for appropriately convergent chains. If it is higher than 1, longer samples are needed to approach the posterior distribution.

In our algorithm, the MCMC chains will be used to test equality of variances between two segments. To analyze the test’s behavior we simulate gaussian signals with points, with standard deviations , and . When , the cutting point is at .

We use chains of length , and run parallel chains in each situation. For all simulations, we adopt . In table 2 we report the statistic, and other summary measures for the chains.

We see from table 2 that a sample size of around shows good convergence measures already. The minimum and maximum evidence are very tight even for small sample sizes, which is an effect of the small posterior variance. The stopping criteria based on the FBST would be effective in all of these situations, segmentating the signals when , and keeping the original signal when .

It is important to note a few things about our algorihtm and the characteristics of the signals we want to segmentate. These are audio files with a duration of 15 minutes, at a sampling rate of . This means that our original signal will be of size . Supposing for simplicity that the segmentation point is exactly at the middle of the signal, this gives two segments of roughly points, whose variances we need to compare.

With a sample size of this order, and assuming our model is correct, the posterior will be extremely concentrated around its mode; so care must be taken when running the chains, to avoid them to have too small an acceptance ratio. This is accomplished by the above mentioned adaptive algorithm, that keeps track of the covariance matrix of the Gaussian candidate.

### 4.3 Prior for δ and decision function threshold

Now there remains the and parameters. These two parameters control the algorithm’s sensibility: lower values of concentrate the prior distribution of around , thus demanding more evidence from the data to accept different values of this parameter. Lower values of make the algorithm accept equality of variances for lower values for the evidence, thus making the algorithm less prone to accept segments.

To evaluate the performance of the algorithm for different values of and , we simulate a signal with a total length of points; we use a Gaussian distribution with 0 mean and standard deviation of 1. We then change the signal’s variance in different segments, by multiplying the simulated values by ; we simulate a total of 6 segments, with boundaries given by ; ; ; ; ; ; . We create alternated segments, the first always having standard deviation of .

For each test is fixed; we run three batches of tests with in the first, in the second, and finally .

Finally, for each simulated signal with a given value of , we run the segmentation algorithm 30 times for each combination of , , , , , and . We compute the average elapsed time for each combination of values, and also the minimum and maximum of the number of segments returned by the algorithm. Tables 3 through 6 present the results.

For all these tests we used MCMC samples of size after burning another points. Whenever the MCMC step is sufficiently long, we do not expect variations between runs of the algorithm over the same signal. This is indeed what we see in the results, except for , when there is a small variation in the number of segments for and .

The algorithm results were the same for all different values of . This means that, in these tests, the evidence value was very close to the extremes of and when the segmentation point correctly identified a segment (in the first case) or when the segmentation did not capture a change in variance (in the second case).

These tests indicate that the algorithm is very robust, specially to values of , but also for values of in a wide range. Values of that are too small lead to undersegmentation of the signal (i.e., the algorithm ignores some existent segmentation points), as we expected; but when , the algorithm’s output was the same for ; when , i.e., when the difference in variance is higher, it identified the correct number of segments for .

Now there remains the test for different values in each segment. To simulate this situation, we keep the same structure as above ( points with segment boundaries at ; ; ; ; ; ; ) but now we use for the second segment, for the fourth segment, and for the sixth. We fix and run the algorithm for in an evenly spaced grid of points between and , then again on another grid of points between and .

The results show that the number of segments is 1 when is very low (less than ). After that, the number of segments grows quickly, reaching segments when . After this threshold, the algorithm becomes more stable; finally, it finds the correct number of segments when , and then the behavior completely stabilizes apart from a few values of for which there is one more segment. This behavior is illustrated in figures 2 and 3. Figure 2: Number of segments for different values of β (α=0.1) Figure 3: Number of segments for different values of β (α=0.1)

Recalling the ground truth segments

1. From to , with variance ;

2. From to , with variance ;

3. From to , with variance ;

4. From to , with variance ;

5. From to , with variance ;

6. From to , with variance

we see that the algorithm first divided the signal in order to completely separate the higher variance segment (). This also separates the segment with the second highest variance (). After that, the change points at and are found; the last segment to be identified is the smallest one, at .

Also we note that, when there is any oversegmentation (for some higher values of ), it always occurs inside the segment from to .

Now, if we fix and use a points uniform grid for between , we find that the number of segments is the same for all values of in this situation, indicating again that the algorithm is very robust to choices of this parameter whenever is well-calibrated. However, if we fix and vary in the same grid, we see oversegmentation occuring for higher values of , see figure 4. Figure 4: Number of segments for different values of α (β=0.1)

From the results of this section, then, we conclude that the critical parameter to be calibrated is ; when it is well-calibrated, the algorithm becomes unsensitive to changes in . When is too high, the algorithm will tend to oversegmentate the signal; this oversegmentation will grow with .

These observations indicate that, when calibrating the algorithm on real signals, the best procedure is to use low values for (), and find the value that stabilizes the number of segments as equal to the ground truth. In the next section, we apply this procedure in the calibration of the parameters using real samples from the OceanPod.

## 5 Application: event detection on subacquatic signals

In this section, we apply the sequential segmentation algorithm to its original motivating task: the segmentation of subacquatic acoustic signals.

The team that operates the OceanPod, in cooperation with the administration of the Parque da Laje administration, collected 2 years of audio recordings (2015 and 2016). These recordings are organized as 15 minutes long .wav files with a single channel.

TO illustrate the behavior of the algorithm, we select a few files from the collection that have distinct characteristics: one when visual inspection of the signal’s spectrogram doesn’t reveal any significative events; one in which we can identify one long duration event, and a third where we can identify many short duration events. Also, these samples are the same that we analyzed in Hubert, Padovese and Stern (2018) using the pilot version of the algorithm; we also repeat the same parameters used in this previous work, in order to be able to compare our computing times in the two versions of the algorithm.

The first file we use is a recording from 30 January 2015, Saturday, from 02:02:56 to 02:17:56. During this period, there is no perceivable activity beyond background noise (concentrated around kHz). When applying the segmentation to this sample, we would then expect no segments to be found.

The second is a recording from 2 February 2015, Monday, from 07:50:49 to 08:04:49. In this sample, we find a long duration event, starting at time and lasting for approximately min. By listening to the sample, we identify the sound of a large sized vessel, passing by at a long distance and with low speed. The segmentation algorithm applied to this sample should detect one or two change points for the signal’s power, ideally forming a segment starting around the beginning of the signal and ending around , i.e., min into the signal (recall that the sampling rate is Hz). Figure 7: Waveform of the second sample: one long duration event Figure 8: Spectrogram of the second sample: one long duration event

The third sample is from 8 February 2015, Sunday, from 11:26:39 to 11:41:39. During this 15 minutes, there are many events taking place; listening to this sample, we identify the engine of smaller vessels, being turned on and off and near the hydrophone spot. In this sample, we expect the segmentation algorithm to detect many events.

As a first test, we set the mininum segment length and the time resolution to , and obtain samples from the MCMC step. Following the observations from last section, we fix and run the algorithm on a grid for . We set and use points on the grid. A summary of the results appear in table 7. The plots of number of segments as functions of appear in figure 11 below.

The first thing we note is that indeed the execution times dropped sensibly: all segmentation tasks shows in table 7 took less than 2 seconds. This is a major accomplishment for our algorithm, one that allows us to experiment with it and to apply it to the segmentation of very long signals, in the order of months or even years.

However, we also noticed a high sensibility for on the number of segments found, specially in the third sample that contains many events. The number of segments grows with , and shows little stabilization. Also, if we pick a value for with high accuracy on the first two samples, we will ignore many meaningful segments that could be otherwise found in the third sample. In other words, there might not exist a single optimal value for that will work for all kinds of signals.

One posible way to deal with that is to choose by balancing the analysis’s goals: one might consider what is preferable, to have many segments of a uniform signal, or to have longer segments with possibly other change points going undetected inside them. It is really a matter of the signal’s diversity and of what the subsequent step of the analysis is going to be. Consider, for instance, that after segmentating the signal one applies a clustering procedure to group together signals with similar features; in this case, it might be best to choose a higher value of

, allowing some oversegmentation and expecting that these uniform segments would be again classified together after clusterization.

Another strategy might be to start the segmentation procedure with a low , increase it by a small factor and run the segmentation again. One can then accept the segmentation when there is some degree of stabilization (for instance if the number of segments is the same after steps with increasing values). Of course, this kind of procedure would add two new parameteres to the main algorithm: the increase factor for , and the stabilization parameter.

In the samples we analyzed in this section, we can see what would be the outcome of this method: if we started at , as we did, increased it by an additive factor of , and waited for consecutive runs of the algorithm with the same number of segments, we would end with segments for the first sample, for the second and for the third. This numbers seem quite reasonable, as we can see from figures 12 and 13.

A third possibility would be to propose an adaptive scheme for , and increase it as the segment sizes become smaller. Again, this would include new parameters in the algorithm, and also would demand an analysis of the precise form that should take as function of segment sizes.

## 6 Conclusions

The sequential segmentation algorithm is a Bayesian unsupervised methodology aimed at segmentating audio signals. It is based on the basic assumption that the occurrence of events in an audio signal induces a change in the signal’s total power.

In a previous work we presented the algorithm and compared it to a peak detection strategy, with promising results. However, the computing time involved in the sequential segmentation algorithm was almost prohibitive; also, because of this high computing cost, it was not possible to provide a detailed study of the algorithm’s sensibility to parameters.

In this paper we presented a Python module that implements the sequential segmentation algorithm using Cython. With this new implementation, we were able to reduce the computing time from to in the same conditions as before. This represents a performance gain in the order of times.

We have also provided details on the MCMC step involved in the calculation of evidence values for the hypothesis of equality of variances, which is part of the algorithm’s stopping criteria, and finally we analyzed the algorithm’s sensibility to its parameters using both simulated and real signals.

What we found is that the algorithm is robust to the acceptance threshold, in most situations; we recommend setting as a default value. Once fixed the value of , there remains only the need for calibrating for .

The calibration procedure for this parameter, however, depends on the characteristics of the data and the goals of the analysis.

The code and data files used in this paper are available at https://github.com/paulohubert/bayeseg.

## References

• Brooks and Gelman (1998) Brooks, S.P., Gelman, A. General Methods for Monitoring Convergence of Iterative Simulations, Journal of Computation and Graphical Statistics
• Gelman et al. (1996) Gelman, A.G., Roberts, G.O., Gilks, W.R. Efficient Metropolis Jumping Rules, In: J.M. Bernardo, J.O. Berger, A.F. David, A.F.M. Smith (eds), , Oxford University Press, Oxford, USA (1996)
• Haario et al. (2001) Haario, H., Saksman, E., Tamminen, J. An Adaptive Metropolis Algorithm, Bernoulli
• Hubert, Padovese and Stern (2018) Hubert, P., Padovese, L., Stern, J.M. A Sequential Algorithm for Signal Segmentation, Entropy
• Jaynes (1968)

Jaynes, E.T. Prior Probabilities,

IEEE Transactions On Systems Science and Cybernetics
• Jaynes (1982) Jaynes, E.T. On the Rationale of Maximum-Entropy Methods, Proceedings of the IEEE
• Jaynes (1987) Jaynes, E. Bayesian Spectrum and Chirp Analysis, In Maximum-Entropy and Bayesian Spectral Analysis and Estimation Problems, C.R. Smith, G.J. Erickson, Eds.; D. Reidel Publishing Co., Dordrecht, Holland (1987)
• Jeffreys (1946) Jeffreys, H. An Invariant Form for the Prior Probability in Estimation Problems, Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences
• Kuntamalla and Reddy (2014) Kuntamalla, S., Ram Gopal Reddy, L. An Efficient and Automatic Systolic Peak Detection Algorithm for Photoplethysmographic Signals, International Journal of Computer Applications
• Makowsky and Hossa (2014) Makowsky, R., Hossa, R. Automatic Speech Signal Segmentation Based on the Innovation Adaptive Filter, Int. J. Appl. Math. Comput. Sci
• Pereira and Stern (1999) Pereira, C.A.B., Stern, J.M. Evidence and credibility: full Bayesian significance test for precise hypotheses, Entropy
• Sanchez-Gendriz and Padovese (2016) Sanchez-Gendriz, I., Padovese, L. Underwater Soundscape of Marine Protected Areas in the South Brazilian Coast Marine Pollution Bulletin
• Schwartzman et al. (2011) Schwartzman, A., Gavrilov, Y., Adler, R.J. Multiple Testing of Local Maxima for Detection of Peaks in 1D, The Annals of Statistics
• Seljebotn (2009) Seljebotn, D.S. Fast Numerical Computations with Cython, Proceedings of the 8th Python in Science Conference (2009) pp. 15-22
• Theodorou et al. (2014) Thedorou, T., Mporas, I., Fakotakis, N. An Overview of Automatic Audio Segmentation, I.J. Information Technology and Computer Science
• Ukil and Zivanovic (2006) Ukil, A., Zivanovic, R. Automatic Signal Segmentation based on Abrupt Change Detection for Power Systems Applications, Power India Conference