1 Introduction
The problem of signal segmentation arises in different contexts [19, 28, 22, 16, 27]. The problem is broadly defined as follows: given a discretely sampled signal $\mathbf{x} = (x_1, \ldots, x_N)$, divide it into contiguous sections that are internally homogeneous with respect to some characteristic. The segmentation is thus based on the premise that the signal structure changes one or more times during the sampled period, and the goal is to find the times at which the changes occur, i.e., the changepoints.
In this work we are interested in segmenting acoustic signals, more specifically underwater acoustic signals acquired off the Brazilian coast. Since 2010, the Acoustics and Environment Laboratory (LACMAM) at the University of São Paulo has been designing equipment for underwater acoustic monitoring [1]; over the past few years, we have acquired and stored more than 2 years of acoustic recordings from different locations, amounting to more than 15 TB of data.
The main challenge in exploring these data lies in the variety of interesting events and, at the same time, in the sparsity of such events. The sparsity of events makes the direct inspection of long-duration signals a very demanding task, while the variety of potentially interesting events discourages the design and application of detection algorithms aimed at specific events, since such algorithms would likely miss many unexpected (and, for this exact reason, interesting) events.
We are currently developing an unsupervised learning approach, based on the tripod segmentation, characterization and categorization, to deal with this situation. The idea is to first divide the long-duration signal into sections that are likely to contain different sets of events; then to characterize each section using a sparse representation approach; and finally to cluster the segments or categorize them sequentially. This paper deals with the first task of the tripod: the segmentation of the signal. Our approach is based on the hypothesis that the occurrence of an event induces an immediate change in the total sound pressure level, and that this change can be detected in the variance of the signal’s amplitude. What we seek, then, is a variance changepoint detection algorithm.
A few algorithms to detect changes in a signal’s variance are available; in the next subsection we give a brief review of both the signal segmentation and the changepoint analysis literatures. After that, Section 2 defines the algorithm to be used for the segmentation, Section 3 presents our results in the segmentation of both simulated and real acoustic signals, and Section 4 concludes the paper.
1.1 Changepoint analysis and signal segmentation
Even though the problems of changepoint analysis and signal segmentation are very closely related, the literatures adopting each nomenclature are somewhat independent.
As for the signal segmentation literature, both probabilistic and non-probabilistic methods can be found; see [27] for an interesting review. These algorithms have a few features in common:

The use of a more or less detailed parametric model to describe the signal;

The definition of frames, or windows, to characterize local behavior;

A peak detection or thresholding procedure applied to the collection of frames to obtain segments’ boundaries.
These methods are well suited for the analysis of short- to medium-length signals (up to a few thousand data points), because the estimation step for the parametric models, be it a discrete Fourier or wavelet transform and/or a filtering procedure, is usually computationally intensive. Also, the use of a detailed parametric model is adequate only when the additional structure imposed by the model on the original signal is well justified, i.e., when the phenomenon causing the change in the signal’s characteristics is reasonably well known.
The changepoint literature, on the other hand, is more prolific and has more of a statistical flavor; see [24] for a review of changepoint research up to the 1970s.
In the changepoint literature, the problem is modelled over a one-dimensional signal (a real or complex vector) obtained from noisy measurements of some system. The properties of the system change over time, altering the signal in a measurable way. There are two main cases of this problem: 1) the goal is to detect a change and act immediately; this is usually called real-time segmentation; and 2) the goal is to analyze a long prerecorded signal and find all the changepoints in it, along with estimates of the system’s state inside each block; this is usually called retrospective segmentation.
The recent literature proposes a few solutions to the problem. [9], for instance, provides a general method based on dynamic programming that finds, in $O(N^2)$ time, the global optimum of a fitness function $F = \sum_k f(B_k)$, where the sum is taken over the blocks $B_k$ and $f$ is the fitness function of a single block (usually a likelihood based on a probabilistic model).
In the same spirit, [15] improves on the work of Jackson et al. [9] by proposing a Pruned Exact Linear Time (PELT) algorithm that, under mild conditions, optimizes the global fitness function with complexity $O(N)$. Killick’s method is general and can be applied to any fitness function that satisfies a mild condition relating the fitness of an entire segment to the fitness of the same segment divided by one changepoint (for details, see the original paper [15]).
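To make the dynamic programming idea of [9] concrete, here is a minimal Python sketch of optimal partitioning for zero-mean variance changes. It minimizes a penalized sum of block costs, which is equivalent to maximizing a sum of block fitnesses minus a penalty per changepoint. The block cost (twice the Gaussian negative log-likelihood at the maximum likelihood variance, without the constant $\log 2\pi$ term), the penalty value and the minimum block length are illustrative assumptions of ours, and the function name `optimal_partitioning` is hypothetical. The double loop over block end `t` and block start `s` is what makes the method $O(N^2)$; PELT keeps the same recursion but prunes start positions that can no longer be optimal.

```python
import math
import random


def optimal_partitioning(x, pen, min_len=2):
    """O(N^2) optimal partitioning: minimize sum of block costs + pen per changepoint.

    Returns the sorted changepoints (indices where a new block starts).
    """
    n = len(x)
    cs = [0.0]                       # cs[t] = sum of x[:t]**2
    for v in x:
        cs.append(cs[-1] + v * v)

    def cost(s, t):
        # zero-mean Gaussian cost of block x[s:t]: 2 * neg. log-likelihood
        # at the ML variance, without the constant log(2*pi) term
        m = t - s
        return m * math.log(max((cs[t] - cs[s]) / m, 1e-12)) + m

    best = [math.inf] * (n + 1)
    best[0] = -pen                   # the first block does not pay a penalty
    last = [0] * (n + 1)             # start of the final block in the best split of x[:t]
    for t in range(min_len, n + 1):
        for s in range(0, t - min_len + 1):
            if best[s] == math.inf:  # x[:s] cannot be split into valid blocks
                continue
            val = best[s] + cost(s, t) + pen
            if val < best[t]:
                best[t], last[t] = val, s
    cps, t = [], n
    while t > 0:                     # backtrack through the stored block starts
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0.0, s) for s in [1.0] * 200 + [3.0] * 200 + [1.0] * 200]
    print(optimal_partitioning(x, pen=3 * math.log(len(x))))
```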
Many other papers are available on the subject, under both the segmentation and the changepoint analysis names. We intend to write a second paper offering a comparative review of the two approaches, but for now our main goal is to present a new Bayesian binary algorithm that is closer in spirit to the methods found in the changepoint literature. Our algorithm approaches segmentation as a sequential hypothesis testing problem. We adopt a binary strategy: we first find the best changepoint for the entire signal and, if this changepoint is accepted, apply the procedure recursively to each of the segments obtained. In the next section, we define our model and the Bayesian binary algorithm.
2 A Bayesian algorithm for variance changepoint detection
We start by assuming that the (discretely sampled) signal at time $t$, $x_t$, has mean amplitude $\mathbb{E}[x_t] = 0$ for all $t$, and finite power $\sigma_t^2 = \mathbb{E}[x_t^2] < \infty$. We adopt a Gaussian probabilistic model for the signal, $x_t \sim N(0, \sigma_t^2)$. The choice of the Gaussian model can be justified by the maximum entropy principle [11, 12], which states that the most conservative probabilistic model to adopt in any situation is the one that maximizes Shannon’s entropy $\mathcal{H}(p) = -\mathbb{E}_p[\log p(x)]$ (where $p$ is the model’s density, and the expectation is taken with respect to $p$) subject to what we already know about the data (in this case, zero mean amplitude and finite variance). Maximizing entropy guarantees that no hidden assumptions are allowed into the model, and this kind of reasoning can make the algorithm more robust to deviations from the model’s assumptions, as we will see later on.
We assume that $\sigma_t^2$ is a piecewise constant function of $t$, and we are interested in estimating the locations of the discontinuities, or jumps, in this function.
2.1 Binary algorithms
One of the simplest ways to tackle the changepoint location task is by using a binary algorithm. Given the entire signal, the first part of the algorithm looks for the single changepoint that is most likely, or best in some sense. After obtaining this changepoint, the traditional binary approach applies the same procedure recursively to the newly obtained segments. The stopping condition is usually based on a model selection criterion.
Our algorithm differs from the traditional binary strategy in that it applies a statistical hypothesis testing procedure at each step to decide whether a given changepoint is valid (i.e., whether there is enough evidence in the data that there is indeed a change at this point). If the changepoint is considered valid, the algorithm goes on to estimate new changepoints in the two segments obtained in the last iteration. If not, the segment is not split any further.
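The control flow of this binary strategy can be sketched in a few lines of Python. The split search below maximizes the marginal posterior of the changepoint (eq. 3, assuming a uniform prior on $t_0$ and Jeffreys priors on the variances), but the acceptance step is a deliberately simple, hypothetical stand-in (a sample-variance ratio threshold), not the e-value test of Section 2.2. The names `best_split`, `accept_split`, `binary_segmentation`, `min_len` and `ratio` are our own illustrative choices. An explicit stack replaces recursion, so long signals do not run into recursion-depth limits.

```python
import math
import random


def best_split(seg, min_len):
    """Mode of the marginal posterior of t0 (eq. 3), with t0 in [min_len, n - min_len]."""
    n = len(seg)
    cs = [0.0]                       # cs[t] = sum of squares of seg[:t]
    for v in seg:
        cs.append(cs[-1] + v * v)
    best_t, best_lp = None, -math.inf
    for t0 in range(min_len, n - min_len + 1):
        s1, s2 = cs[t0], cs[n] - cs[t0]
        lp = (math.lgamma(t0 / 2) + math.lgamma((n - t0) / 2)
              - (t0 / 2) * math.log(s1) - ((n - t0) / 2) * math.log(s2))
        if lp > best_lp:
            best_t, best_lp = t0, lp
    return best_t


def accept_split(left, right, ratio=2.0):
    """HYPOTHETICAL placeholder for the e-value test: variance ratio above `ratio`."""
    v1 = sum(v * v for v in left) / len(left)
    v2 = sum(v * v for v in right) / len(right)
    return max(v1, v2) / min(v1, v2) > ratio


def binary_segmentation(x, min_len=100):
    changepoints = []
    stack = [(0, len(x))]            # segments still to be examined
    while stack:
        start, end = stack.pop()
        if end - start < 2 * min_len:
            continue                 # too short to be split
        seg = x[start:end]
        t0 = best_split(seg, min_len)
        if accept_split(seg[:t0], seg[t0:]):
            changepoints.append(start + t0)
            stack.append((start, start + t0))
            stack.append((start + t0, end))
        # otherwise this segment is not split any further
    return sorted(changepoints)


if __name__ == "__main__":
    rng = random.Random(7)
    x = [rng.gauss(0.0, s) for s in [1.0] * 400 + [3.0] * 400 + [1.0] * 400]
    print(binary_segmentation(x))
```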
The binary segmentation algorithm is then based on a single changepoint model defined as follows:
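$$x_t \sim \begin{cases} N(0, \sigma_1^2), & 1 \le t \le t_0, \\ N(0, \sigma_2^2), & t_0 < t \le N. \end{cases} \qquad (1)$$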
The likelihood function associated with this model is thus
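$$L(t_0, \sigma_1^2, \sigma_2^2 \mid \mathbf{x}) = (2\pi\sigma_1^2)^{-t_0/2} \exp\left(-\frac{S_1}{2\sigma_1^2}\right) (2\pi\sigma_2^2)^{-(N-t_0)/2} \exp\left(-\frac{S_2}{2\sigma_2^2}\right), \qquad (2)$$
where $S_1 = \sum_{t=1}^{t_0} x_t^2$ and $S_2 = \sum_{t=t_0+1}^{N} x_t^2$.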
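As a quick numerical check of the likelihood (2), the Python function below evaluates its logarithm for a given split. It is a minimal sketch in which `t0` counts the points of the first segment (so `x[:t0]` plays the role of $x_1, \ldots, x_{t_0}$); the function name is hypothetical.

```python
import math


def log_likelihood(x, t0, var1, var2):
    """log L(t0, var1, var2 | x): x[:t0] ~ N(0, var1), x[t0:] ~ N(0, var2), independent."""
    n1, n2 = t0, len(x) - t0
    s1 = sum(v * v for v in x[:t0])   # S_1
    s2 = sum(v * v for v in x[t0:])   # S_2
    return (-0.5 * n1 * math.log(2 * math.pi * var1) - s1 / (2 * var1)
            - 0.5 * n2 * math.log(2 * math.pi * var2) - s2 / (2 * var2))
```

Because the likelihood depends on the data only through $S_1$, $S_2$ and the segment lengths, prefix sums of $x_t^2$ allow it to be evaluated for every candidate $t_0$ in linear total time.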
The first part of the algorithm involves picking the best value for $t_0$; in so doing, the values of $\sigma_1^2$ and $\sigma_2^2$ are not important, i.e., they are nuisance parameters. To eliminate these parameters and obtain the marginal posterior of $t_0$, we choose priors for each parameter and integrate them out.
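For variance parameters like $\sigma_1^2$ and $\sigma_2^2$, it is well known in the Bayesian inference literature that, to obtain an uninformative prior, one should not adopt the usual uniform distribution for $\sigma^2$, but rather a uniform distribution for $\log \sigma^2$, i.e., $p(\sigma^2) \propto 1/\sigma^2$, the so-called Jeffreys’ prior [10, 13] for $\sigma^2$. These priors, besides being uninformative and invariant to different parameterizations of the model (over variances or precisions, for instance), allow the variances to be integrated out of the likelihood (2) analytically. With a uniform prior on $t_0$, this yields the marginal posterior
$$p(t_0 \mid \mathbf{x}) \propto \Gamma\left(\frac{t_0}{2}\right) \Gamma\left(\frac{N - t_0}{2}\right) S_1^{-t_0/2} S_2^{-(N-t_0)/2}. \qquad (3)$$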
With this posterior, the algorithm must now estimate the single best changepoint for the current segment. This is a standard statistical estimation problem and, as is well known, different cost functions for the estimation error yield different estimators: if the cost function is quadratic, the best changepoint is the posterior mean; if it is the absolute error, the best changepoint is the posterior median; and if it is a 0-1 cost function, the best changepoint is the posterior mode.
In this algorithm, neither the median nor the mean would be ideal, especially because the assumption of a single changepoint is most likely false. Consider, for example, Figure 1, which shows the single changepoint posterior calculated on a signal with two changepoints.
Both the mean and the median of this distribution are located near the center, which is close to neither changepoint. The posterior mode, however, is robust to the number of changepoints being greater than 1, and it will be our estimator of choice.
This choice defines the first part of the algorithm: obtain the marginal posterior $p(t_0 \mid \mathbf{x})$ and its mode $\hat{t}_0$. The discrete optimization involved in finding the posterior mode can be carried out by direct inspection of all candidate values of $t_0$, which can be parallelized.
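The first part of the algorithm can be sketched in Python as follows. This is a minimal sketch, not the authors’ Cython implementation: it evaluates the logarithm of (3) for every candidate $t_0$ using prefix sums of $x_t^2$ and returns the maximizer by direct inspection. It assumes a uniform prior on $t_0$ and Jeffreys priors on the variances, and adds a hypothetical `min_len` guard that keeps very short end segments (whose sum of squares can be close to zero) out of the search. On a signal with two changepoints, as in Figure 1, the mode lands next to one of them, leaving the other for the next iteration of the binary procedure.

```python
import math
import random


def log_posterior_t0(x, min_len=2):
    """Unnormalized log p(t0 | x) from eq. (3) for min_len <= t0 <= N - min_len:
    log Gamma(t0/2) + log Gamma((N-t0)/2) - (t0/2) log S1 - ((N-t0)/2) log S2."""
    n = len(x)
    cs = [0.0]                        # cs[t] = sum of x[:t]**2
    for v in x:
        cs.append(cs[-1] + v * v)
    return {
        t0: (math.lgamma(t0 / 2) + math.lgamma((n - t0) / 2)
             - (t0 / 2) * math.log(cs[t0])
             - ((n - t0) / 2) * math.log(cs[n] - cs[t0]))
        for t0 in range(min_len, n - min_len + 1)
    }


def posterior_mode(x, min_len=2):
    lp = log_posterior_t0(x, min_len)
    return max(lp, key=lp.get)        # direct inspection of every candidate


if __name__ == "__main__":
    rng = random.Random(1)
    # two changepoints (at 500 and 1000), as in the Figure 1 scenario
    x = [rng.gauss(0.0, s) for s in [1.0] * 500 + [3.0] * 500 + [1.0] * 500]
    print(posterior_mode(x))
```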
In the next step of the algorithm, the goal is to determine the validity of the changepoint based on the evidence that the data gives about this changepoint being a true one.
2.2 Full Bayesian evidence measure
To be a valid changepoint, in the present context, means that the signal variances of the two segments are different. So this step requires an equality of variances test.
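From the full model’s likelihood (2), conditioning on $t_0 = \hat{t}_0$ and multiplying by the joint prior $p(\sigma_1^2, \sigma_2^2)$ yields the posterior
$$p(\sigma_1^2, \sigma_2^2 \mid \mathbf{x}, \hat{t}_0) \propto (\sigma_1^2)^{-\hat{t}_0/2} \exp\left(-\frac{S_1}{2\sigma_1^2}\right) (\sigma_2^2)^{-(N-\hat{t}_0)/2} \exp\left(-\frac{S_2}{2\sigma_2^2}\right) p(\sigma_1^2, \sigma_2^2), \qquad (4)$$
with $S_1$ and $S_2$ computed at $t_0 = \hat{t}_0$.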
This time, however, it is clearly not desirable to marginalize out $\sigma_1^2$ and $\sigma_2^2$, since these parameters are no longer nuisance parameters. They are, in fact, the very parameters that must be tested for equality: $H: \sigma_1^2 = \sigma_2^2$ is the hypothesis of interest.
It is important to note that the full model (4) is defined over a 2-dimensional parameter space, and that $H$ describes a lower (1-)dimensional manifold in this space. Hypotheses that define lower-dimensional manifolds in the parameter space are called sharp or precise hypotheses in the Bayesian literature [3].
Such hypotheses are challenging to test in the usual Bayesian hypothesis testing frameworks, because the posterior measure of $H$ is, by definition, 0. However, [21] presents an evidence measure for sharp hypotheses; this measure is shown to be fully Bayesian (in the sense that it arises directly from a particular cost function [18]) and to possess many desirable properties. The literature already presents many situations in which this measure has been successfully applied [6, 4, 2, 7] to sharp hypotheses in different problems.
Following the original authors, we call this measure the e-value, the evidence value in favor of $H$.
The full definition and analysis of the e-value is beyond the scope of this paper; the interested reader is directed to the previously cited references, especially [21]. However, to keep this work reasonably self-contained, we now define the e-value in broad terms.
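Given a full posterior model $p(\theta \mid \mathbf{x})$ with $\theta \in \Theta$, and given a sharp hypothesis $H: \theta \in \Theta_H$ with $\Theta_H \subset \Theta$, obtain the maximum value of the full posterior restricted to $\Theta_H$:
$$p^* = \max_{\theta \in \Theta_H} p(\theta \mid \mathbf{x}).$$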
Now define the tangent set, or surprise set, as
$$T = \{\theta \in \Theta : p(\theta \mid \mathbf{x}) > p^*\}. \qquad (5)$$
The tangent set is the set of all parameter values with higher posterior density than the maximum posterior density under $H$. If this set has high posterior measure, then $\Theta_H$ does not traverse regions of high posterior density, and the evidence in favor of $H$ must be low. Accordingly, define
$$\mathrm{ev}(H) = 1 - \int_T p(\theta \mid \mathbf{x})\, d\theta \qquad (6)$$
to be the evidence in favor of $H$. The evidence takes the value 0 if the measure of the surprise set is 1 (i.e., if the maximum posterior value under $H$ is almost surely the minimum unrestricted posterior value) and, conversely, the evidence in favor of $H$ is 1 if the measure of the surprise set is 0 (i.e., if the maximum posterior under $H$ is almost surely the unrestricted maximum).
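Since the integral in (6) is the posterior probability of the tangent set, it can be estimated by the fraction of posterior draws whose density exceeds $p^*$; normalizing constants cancel, because the same unnormalized log density is used on both sides of the comparison. The Python sketch below assumes the draws are already available (from MCMC or, where possible, direct sampling); `evalue_mc` is a hypothetical name. As a toy check of our own, not taken from the paper, take a bivariate normal posterior $N(\mu, I_2)$ and $H: \theta_1 = \theta_2$: the tangent set is then a disk of squared radius $d^2 = (\mu_1 - \mu_2)^2/2$ around $\mu$, so $\mathrm{ev}(H) = e^{-d^2/2}$ exactly, which equals $e^{-1}$ for $\mu = (1, -1)$.

```python
import math
import random


def evalue_mc(samples, log_density, log_p_star):
    """Monte Carlo estimate of ev(H) = 1 - P(theta in T | x), eq. (6).

    samples     -- draws from the posterior p(theta | x)
    log_density -- log posterior density, up to an additive constant
    log_p_star  -- its maximum over Theta_H (same additive constant)
    """
    in_tangent_set = sum(1 for th in samples if log_density(th) > log_p_star)
    return 1.0 - in_tangent_set / len(samples)


if __name__ == "__main__":
    # Toy posterior N(mu, I_2) and sharp hypothesis H: theta_1 = theta_2.
    mu = (1.0, -1.0)
    rng = random.Random(0)
    draws = [(rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0)) for _ in range(100_000)]

    def log_dens(th):
        return -0.5 * ((th[0] - mu[0]) ** 2 + (th[1] - mu[1]) ** 2)

    log_p_star = -((mu[0] - mu[1]) ** 2) / 4   # density at the point of H closest to mu
    print(evalue_mc(draws, log_dens, log_p_star), math.exp(-1.0))  # estimate vs. exact
```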
As the definition above shows, the calculation of the e-value involves two steps: an optimization step and an integration step. The optimization is constrained to $\Theta_H$ and depends on the choice of priors; sufficiently simple priors lead to analytical solutions for this step.
The integration step can be carried out by Markov Chain Monte Carlo methods, as is usual in Bayesian inference procedures.
This completes the definition of the binary algorithm. One full step of the algorithm consists of two substeps: first, estimate the segmentation point $\hat{t}_0$; second, compare the variances of the two segments by calculating a measure of evidence for the hypothesis $H: \sigma_1^2 = \sigma_2^2$. A diagram illustrating the algorithm’s flow can be seen in Figure 2.
2.3 Priors and the power of the e-value
To calculate the e-value in the segmentation model (4), all that is left to do is to pick a joint prior $p(\sigma_1^2, \sigma_2^2)$ and then follow the procedure outlined above.
One obvious choice is the product of Jeffreys’ priors, $p(\sigma_1^2, \sigma_2^2) \propto 1/(\sigma_1^2 \sigma_2^2)$; by doing so, the model treats both parameters as completely unknown in advance, i.e., the algorithm acts as if it knows nothing about the segments’ variances or the relation between them.
This choice gives the optimal value
$$\hat{\sigma}^2 = \frac{S_1 + S_2}{N + 4} \qquad (7)$$
for the signal’s variance under $H$ (no changepoint). To calculate the evidence in favor of $H$, we estimate the integral of the posterior over the surprise set using the adaptive MCMC method of [5].
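With the product of Jeffreys’ priors, the posterior (4) factorizes into two independent inverse-gamma distributions, $\sigma_i^2 \mid \mathbf{x} \sim \mathrm{InvGamma}(n_i/2,\, S_i/2)$ with $n_1 = \hat{t}_0$ and $n_2 = N - \hat{t}_0$, so in this particular case the integration step can use exact draws instead of the adaptive MCMC of [5]. The Python sketch below takes that shortcut (a swapped-in technique, not the authors’ method), uses the constrained optimum $(S_1 + S_2)/(N + 4)$ obtained by maximizing (4) on the line $\sigma_1^2 = \sigma_2^2$, and evaluates the tangent set with the density in the $(\sigma_1^2, \sigma_2^2)$ parameterization. The function names are hypothetical. In the usage lines, a signal with exactly balanced powers should give an e-value close to 1, and a fourfold power change an e-value close to 0.

```python
import math
import random


def log_post_jeffreys(s1, s2, n1, S1, n2, S2):
    """Unnormalized log of posterior (4) with prior 1/(sigma1^2 * sigma2^2)."""
    return (-(n1 / 2 + 1) * math.log(s1) - S1 / (2 * s1)
            - (n2 / 2 + 1) * math.log(s2) - S2 / (2 * s2))


def evalue_jeffreys(x, t0, n_samples=20000, seed=0):
    rng = random.Random(seed)
    n1, n2 = t0, len(x) - t0
    S1 = sum(v * v for v in x[:t0])
    S2 = sum(v * v for v in x[t0:])
    # Optimization step: maximum of (4) on sigma1^2 = sigma2^2 = s
    s_star = (S1 + S2) / (n1 + n2 + 4)
    log_p_star = log_post_jeffreys(s_star, s_star, n1, S1, n2, S2)
    # Integration step with exact draws: sigma_i^2 = (S_i / 2) / Gamma(n_i / 2, 1)
    in_tangent_set = 0
    for _ in range(n_samples):
        s1 = (S1 / 2) / rng.gammavariate(n1 / 2, 1.0)
        s2 = (S2 / 2) / rng.gammavariate(n2 / 2, 1.0)
        if log_post_jeffreys(s1, s2, n1, S1, n2, S2) > log_p_star:
            in_tangent_set += 1
    return 1.0 - in_tangent_set / n_samples


if __name__ == "__main__":
    same = [1.0 if i % 2 == 0 else -1.0 for i in range(1000)]           # equal powers
    diff = [v * (1.0 if i < 500 else 2.0) for i, v in enumerate(same)]  # power x4
    print(evalue_jeffreys(same, 500), evalue_jeffreys(diff, 500))
```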
To verify the behavior of the e-value with this choice of priors, we simulate Gaussian signals of various sample sizes, divided into two segments, with the variance $\sigma_1^2$ of the first segment held fixed and the variance $\sigma_2^2$ of the second segment varying over a range of values. Figure 3 shows the evidence in favor of $H$ for several values of $\sigma_2^2$ and several sample sizes.
It is very important to note that the e-value is not a significance measure, i.e., it does not result from a procedure that controls the type I error. This implies that the sampling distribution of the e-value under $H$ is not uniform; however, there is a transformation that turns the e-value into a significance measure [25]. Using this transformation, it is possible to fix the type I error probability at a level $\alpha$ and evaluate the power of the test. The result for different sample sizes and values of $\sigma_2^2$ is shown in Figure 4.
2.4 Using informative priors
The test based on the (transformed) e-value is quite powerful, as the simulations indicate. The power, as expected, increases with sample size; this means that the test detects smaller deviations from $H$ as the sample size grows, while keeping the type I error probability fixed.
This is an important issue, especially in the segmentation algorithm, where the test is applied sequentially to compare segments of different sample sizes. If we keep $\alpha$ (the probability of type I error) fixed, the power of the test will change as the sample size changes. In a signal detection setup, however, one usually wants to balance the type I and type II error probabilities regardless of the size of the incoming signal.
The relation between significance levels, test power and sample size is a deep and often discussed question in hypothesis testing [17, 20]. Recent literature proposes to change the significance level as the sample size changes, so as to keep some relation between the probabilities of the two error types constant. This can be done by using adaptive significance levels (given by a function of the sample size $N$, see [17]) or by imposing an ordering on the parameter space based on Bayes factors [20]. Usually, the procedure starts by asking the researcher to choose a sensitivity and the type I error probability of the test for a given value of $N$. The statistician then calculates the corresponding power of the test and derives a rule that defines the new significance level for a new value of $N$, so as to keep the relation between the type I and type II error probabilities constant.
For the segmentation task, however, and in our particular application (segmentation of large samples), the algorithm has to work with segments of very different sizes (from short segments to segments with more than a million points), and the adaptive significance level would also vary wildly. As a consequence, the algorithm would require very small significance levels for the larger segments; and in an MCMC setting, higher precision in the probability estimates means longer chains, which in turn mean longer execution times.
So, instead of using an adaptive significance level, we propose to use a strongly informative prior and to use its hyperparameters to calibrate the power of the procedure.
This idea was first introduced in a previous paper [8], which analyzes the binary algorithm for signal segmentation but uses a different parameterization, $(\sigma_1^2, r)$ with $r = \sigma_2^2 / \sigma_1^2$. Independent priors are proposed for these two parameters: one that is uninformative about the value of $\sigma_1^2$, and one that is strongly informative about $r$. The advantage of working with $r$ instead of $\sigma_2^2$ is that $r$ is a pure number, i.e., it does not depend on scale: it can be interpreted as the ratio between the powers of two contiguous segments.
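There are, however, some difficulties in working with $r$, one of them being that $r$ must be positive. For the current version of the algorithm, we parameterize the problem using $\delta = \log r = \log(\sigma_2^2 / \sigma_1^2)$, and propose a Laplace prior of the form
$$p(\delta) = \frac{1}{2\beta} \exp\left(-\frac{|\delta|}{\beta}\right). \qquad (8)$$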
The above Laplace distribution has its peak at $\delta = 0$, and the peak is sharper as the value of $\beta$ decreases. The Laplace distribution is a maximum entropy prior, i.e., it is the probability distribution with the highest entropy subject to the constraint $\mathbb{E}|\delta| = \beta$.
The segmentation algorithm works as above, except that the e-value calculation now uses the Laplace prior for $\delta$. This prior, when $\beta$ is close enough to 0, significantly changes the power of the test, and thus allows the algorithm’s behavior to be tuned.
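To see how a small $\beta$ reduces the power of the test, the Python sketch below estimates the e-value under the Laplace prior. It rests on assumptions of ours: the posterior is written in the $(\sigma_1^2, \delta)$ parameterization with a Jeffreys prior $1/\sigma_1^2$ on $\sigma_1^2$, so the maximum under $H: \delta = 0$ is at $\sigma^2 = (S_1 + S_2)/(N + 2)$; and the integration uses a plain random-walk Metropolis sampler rather than the adaptive algorithm of [5]. Function names, step sizes and chain length are hypothetical. With a nearly flat prior ($\beta = 10$) a 50% power change yields an e-value near 0, while with $\beta = 0.001$ the posterior mode is pulled onto $H$ and the e-value approaches 1.

```python
import math
import random


def log_post_laplace(s1, delta, n1, S1, n2, S2, beta):
    """Unnormalized log posterior of (sigma1^2, delta), delta = log(sigma2^2 / sigma1^2).

    Gaussian likelihood (2), Jeffreys prior 1/sigma1^2, Laplace(0, beta) prior on delta.
    """
    s2 = s1 * math.exp(delta)
    return (-(n1 / 2 + 1) * math.log(s1) - S1 / (2 * s1)
            - (n2 / 2) * math.log(s2) - S2 / (2 * s2)
            - abs(delta) / beta)


def evalue_laplace(x, t0, beta, n_iter=20000, step=0.03, seed=0):
    rng = random.Random(seed)
    n1, n2 = t0, len(x) - t0
    S1 = sum(v * v for v in x[:t0])
    S2 = sum(v * v for v in x[t0:])
    # Maximum under H (delta = 0): sigma^2 = (S1 + S2) / (N + 2)
    s_star = (S1 + S2) / (n1 + n2 + 2)
    log_p_star = log_post_laplace(s_star, 0.0, n1, S1, n2, S2, beta)
    # Random-walk Metropolis on (u, delta) with u = log sigma1^2; the extra "+ u"
    # in the acceptance ratio is the Jacobian of sigma1^2 = exp(u).
    step_d = min(step, beta)                 # finer moves when the prior is very sharp
    u = math.log(S1 / n1)
    d = math.log((S2 / n2) / (S1 / n1))
    lp = log_post_laplace(math.exp(u), d, n1, S1, n2, S2, beta)
    in_tangent_set = 0
    for _ in range(n_iter):
        u_new = u + rng.gauss(0.0, step)
        d_new = d + rng.gauss(0.0, step_d)
        lp_new = log_post_laplace(math.exp(u_new), d_new, n1, S1, n2, S2, beta)
        log_ratio = (lp_new + u_new) - (lp + u)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            u, d, lp = u_new, d_new, lp_new
        # tangent set (5) is evaluated with the density in (sigma1^2, delta)
        if lp > log_p_star:
            in_tangent_set += 1
    return 1.0 - in_tangent_set / n_iter


if __name__ == "__main__":
    rng = random.Random(3)
    x = ([rng.gauss(0.0, 1.0) for _ in range(2000)]
         + [rng.gauss(0.0, math.sqrt(1.5)) for _ in range(2000)])
    print(evalue_laplace(x, 2000, beta=10.0), evalue_laplace(x, 2000, beta=0.001))
```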
Figure 5 shows the same power estimates as Figure 4, but this time using the Laplace prior, with a value of $\beta$ set for each sample size.
Being able to control the power of the test proves useful when segmenting underwater acoustic signals; in this setting, long segments with truly stationary power are not to be expected, even when a segment captures a single event. This is because both the background noise and the event’s physical source may be changing due to many factors (including the weather and the movement of the event’s source relative to the hydrophone, among others). With a high sampling rate, such as that of the data used in this paper, the e-value would give strong evidence against $H$ even inside a segment containing a uniform event, and this would lead to oversegmentation. Controlling the power of the test with an informative prior allows the algorithm’s sensitivity to be tuned to the goals of the analysis: if one is interested in capturing larger sections, which may undergo an internal power change that is small compared to the difference between the segment’s overall power and the background noise power, one only needs to adjust the hyperparameter $\beta$ accordingly.
2.5 The resolution parameter
The most demanding step in our binary algorithm is the optimization procedure that looks for the most likely changepoint at each step. This is done by brute force, which can be parallelized but is nevertheless costly, especially for long signals.
One way to increase the speed of our algorithm is to limit the search for the optimal changepoint: instead of calculating the objective function for all $t_0 \in \{1, \ldots, N\}$, we can calculate it only for $t_0 \in \{d, 2d, 3d, \ldots\}$, where the integer $d \ge 1$ is the resolution parameter.
If the (discrete) posterior for the changepoint parameter $t_0$ is not very sharp around its maximum, and if the minimum expected segment length is not too small, $d$ can be set to a high value, increasing the speed of the algorithm while still identifying the most probable changepoints at each step.
However, since the optimization step is applied many times, to segments of different lengths, it is not advisable to fix $d$ at a single integer value. Imagine, for instance, that $d$ is fixed at a large value: in a long signal, this value will not stop the algorithm from finding the optimal changepoint (or a good approximation to it), but in a short signal it may well cause the algorithm to miss the optimal point. For this reason, we adopt an adaptive resolution strategy: we pick a starting value for the resolution and, as the algorithm obtains new segments, it keeps the ratio $N/d$ between segment length and resolution fixed at each step.
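A minimal Python sketch of the adaptive resolution idea, under our own assumptions: the step is $d = \max(1, \lfloor N/\rho \rfloor)$ for a fixed, hypothetical ratio $\rho$ (`ratio` in the code), so longer segments are searched on a coarser grid, and the objective is the log of the marginal posterior (3). Only the posterior evaluations are thinned; the prefix sums are still computed in one linear pass. The function name is hypothetical.

```python
import math
import random


def posterior_mode_coarse(x, ratio=200, min_len=2):
    """Mode of the marginal posterior (3) of t0, searched on a grid of step d.

    d = max(1, N // ratio), so N / d stays (roughly) fixed as segments shrink.
    Returns (t0_hat, d).
    """
    n = len(x)
    d = max(1, n // ratio)
    cs = [0.0]                           # cs[t] = sum of x[:t]**2 (one linear pass)
    for v in x:
        cs.append(cs[-1] + v * v)
    best_t, best_lp = None, -math.inf
    for t0 in range(max(d, min_len), n - min_len + 1, d):   # t0 in {d, 2d, 3d, ...}
        s1, s2 = cs[t0], cs[n] - cs[t0]
        lp = (math.lgamma(t0 / 2) + math.lgamma((n - t0) / 2)
              - (t0 / 2) * math.log(s1) - ((n - t0) / 2) * math.log(s2))
        if lp > best_lp:
            best_t, best_lp = t0, lp
    return best_t, d


if __name__ == "__main__":
    rng = random.Random(5)
    x = [rng.gauss(0.0, 1.0) for _ in range(13050)] + [rng.gauss(0.0, 2.0) for _ in range(6950)]
    print(posterior_mode_coarse(x))      # roughly 200 evaluations instead of roughly 20000
    print(posterior_mode_coarse(x[:2000])[1])   # a shorter segment gets a finer grid
```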
2.6 The PELT algorithm
As a basis for comparison with the Bayesian binary algorithm, we use the PELT algorithm of [15]; PELT (Pruned Exact Linear Time) solves the dynamic programming problem exactly, yielding the global optimum of the model. Its complexity is $O(N^2)$ in the worst case, but it can be shown to be $O(N)$ under mild conditions.
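The algorithm is defined in terms of an additive cost function
$$\sum_{i=1}^{m+1} C\left(x_{(\tau_{i-1}+1):\tau_i}\right) + \mathrm{pen}(m), \qquad (9)$$
where $\tau_1 < \cdots < \tau_m$ are the changepoints, $\tau_0 = 0$, $\tau_{m+1} = N$, and, in the case of detection of variance changepoints,
$$C\left(x_{a:b}\right) = n_{ab}\left[\log\left(2\pi\hat{\sigma}^2_{a:b}\right) + 1\right], \qquad \hat{\sigma}^2_{a:b} = \frac{1}{n_{ab}}\sum_{t=a}^{b} x_t^2, \quad n_{ab} = b - a + 1, \qquad (10)$$
and $\mathrm{pen}(m)$ is the penalty, or regularization, function for the number of segments.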
The penalty function is essential, since the direct optimization of the cost function will lead to overfitting (which, in this case, will mean oversegmentation). In our tests below, we adopt the MBIC penalty function [29], which is the penalty function used by default by the R package changepoint that implements the PELT algorithm [14].
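For reference, a compact Python sketch of the PELT recursion [15] with the variance cost (10) and a constant penalty per changepoint. It is a teaching sketch, not the R package changepoint: the constant $\log 2\pi$ term of (10) is dropped (it adds the same amount to every segmentation), the minimum segment length is an assumption, and `pen = 3 log N` in the usage line is only loosely inspired by MBIC, not its exact form. The pruning rule keeps a candidate $s$ only while $F(s) + C(x_{(s+1):t}) \le F(t)$, which is what keeps the candidate set small in practice. The function name `pelt_variance` is hypothetical.

```python
import math
import random


def pelt_variance(x, pen, min_len=2):
    """PELT sketch: minimize sum of costs (10) + pen * (number of changepoints)."""
    n = len(x)
    cs = [0.0]                       # cs[t] = sum of x[:t]**2
    for v in x:
        cs.append(cs[-1] + v * v)

    def cost(s, t):
        # cost (10) of x[s:t] without the constant log(2*pi) term
        m = t - s
        return m * math.log(max((cs[t] - cs[s]) / m, 1e-12)) + m

    F = [0.0] * (n + 1)
    F[0] = -pen                      # the first segment does not pay a penalty
    last = [0] * (n + 1)
    R = [0]                          # admissible positions of the last changepoint
    for t in range(min_len, n + 1):
        new = t - min_len            # newest candidate keeping the last segment >= min_len
        if new >= min_len:
            R.append(new)
        vals = [(F[s] + cost(s, t) + pen, s) for s in R]
        F[t], last[t] = min(vals)
        # PELT pruning rule (K = 0): drop s once F(s) + C(x[s:t]) > F(t)
        R = [s for (val, s) in vals if val - pen <= F[t]]
    cps, t = [], n
    while t > 0:                     # backtrack through the stored last changepoints
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)


if __name__ == "__main__":
    rng = random.Random(11)
    x = [rng.gauss(0.0, s) for s in [1.0] * 500 + [3.0] * 500 + [1.0] * 500]
    print(pelt_variance(x, pen=3 * math.log(len(x))))
```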
For further comparison of our algorithm with other alternatives, we also run the binary segmentation algorithm of [23], which is also implemented by the R package changepoint.
3 Results
3.1 Simulated data
To analyze the performance of the Bayesian binary algorithm, we start by simulating Gaussian signals with constant mean and variance. We then simulate the changepoint process by using a geometric distribution to model the times between changepoints, and multiply the signal between changepoints by a given factor in order to obtain different variances.
It is clear that the effectiveness of a changepoint detection algorithm depends directly on both the size of the segments and the magnitude of the jump in the process parameters. To observe the behavior of all algorithms with varying segment sizes, we keep the expected number of changepoints fixed regardless of the signal’s size $N$. When the signal’s size changes, the expected length of the segments changes accordingly (linearly with $N$).
To simulate the magnitude of the change in power between segments, we force the segments to alternate between two variance values.
The simulation of the changepoint process was repeated ten times for each value of $N$, and we report the average results for each of these values.
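The simulation design can be sketched in Python as follows. All numeric settings (expected number of changepoints, the two variance values) and the function name are hypothetical illustrations. Gaps between changepoints are drawn from a geometric distribution on $\{1, 2, \ldots\}$ by inversion, with success probability chosen so that the expected number of changepoints stays roughly constant as $N$ grows; unit-variance white noise is then scaled segment by segment so that the variances alternate.

```python
import math
import random


def simulate_signal(n, expected_k=50, var_a=1.0, var_b=4.0, seed=0):
    """Zero-mean Gaussian signal whose variance alternates between var_a and var_b.

    Returns (signal, changepoints); changepoint c means a new segment starts at x[c].
    """
    rng = random.Random(seed)
    p = expected_k / n                       # mean gap 1/p = n / expected_k
    changepoints, t = [], 0
    while True:
        # Geometric gap on {1, 2, ...} by inversion: 1 + floor(log(1-U) / log(1-p))
        gap = 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        t += gap
        if t >= n:
            break
        changepoints.append(t)
    bounds = [0] + changepoints + [n]
    x = []
    for i in range(len(bounds) - 1):
        scale = math.sqrt(var_a if i % 2 == 0 else var_b)   # alternate the variances
        x.extend(scale * rng.gauss(0.0, 1.0) for _ in range(bounds[i + 1] - bounds[i]))
    return x, changepoints


if __name__ == "__main__":
    x, cps = simulate_signal(100_000, seed=1)
    print(len(x), len(cps))
```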
The results appear in Table 1. The table reports the true number of changepoints in the simulated signal, the estimated total number of changepoints for each algorithm, and the F1 score. The F1 score is calculated as
$$F_1 = \frac{2PR}{P + R},$$
where $P$ (precision) is the number of true positives divided by the total number of changepoints identified, and $R$ (recall) is the number of true positives divided by the total number of true changepoints. To be accepted as a true positive, an estimated changepoint must lie within a fixed number of points of a true changepoint.
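The scoring rule can be written as a short Python function. The tolerance `tol` is left as a parameter, and the one-to-one greedy matching (so that two estimates near the same true changepoint are not both counted as true positives) is our assumption; `changepoint_scores` is a hypothetical name.

```python
def changepoint_scores(estimated, true, tol):
    """Precision, recall and F1 = 2PR / (P + R) with tolerance `tol` (in points).

    Greedy one-to-one matching: each true changepoint is matched at most once.
    """
    unmatched = sorted(true)
    tp = 0
    for e in sorted(estimated):
        for i, c in enumerate(unmatched):
            if abs(e - c) <= tol:
                tp += 1
                del unmatched[i]     # safe: we stop iterating right after deleting
                break
    precision = tp / len(estimated) if estimated else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(changepoint_scores([100, 205, 400], [100, 200, 300], tol=10))
```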
The value of $\alpha$ for the Jeffreys prior, and the values of both $\alpha$ and $\beta$ for the Laplace prior, were selected using the Bayesian Information Criterion (BIC); both the PELT and the BinSeg algorithms used the Modified BIC (MBIC) of Zhang [29].
N  Algorithm  Time (s)  True k  Estimated k  F1 score 

10,000  binseg  0.407200  34.3  2.4  0.085693 
10,000  jeffreys  0.210437  34.3  4.0  0.172064 
10,000  laplace  0.245093  34.3  5.9  0.236544 
10,000  pelt  0.037800  34.3  5.1  0.218018 
50,000  binseg  2.151700  46.1  15.9  0.489096 
50,000  jeffreys  1.628161  46.1  28.6  0.701796 
50,000  laplace  1.563547  46.1  34.1  0.761310 
50,000  pelt  0.177500  46.1  30.7  0.793996 
100,000  binseg  4.269800  45.9  29.5  0.772511 
100,000  jeffreys  2.624351  45.9  37.3  0.840989 
100,000  laplace  2.394387  45.9  41.7  0.872812 
100,000  pelt  0.333200  45.9  38.2  0.907438 
500,000  binseg  20.954300  50.8  42.6  0.870825 
500,000  jeffreys  4.558587  50.8  50.2  0.888668 
500,000  laplace  4.088778  50.8  49.9  0.828553 
500,000  pelt  1.997400  50.8  49.1  0.981732 
1,000,000  binseg  20.661000  51.8  40.0  0.372078 
1,000,000  jeffreys  6.243566  51.8  53.7  0.924682 
1,000,000  laplace  5.911876  51.8  56.5  0.921549 
1,000,000  pelt  3.909400  51.8  50.0  0.982603 
The PELT algorithm was the fastest and, on average, the most accurate algorithm for all signal sizes, except for $N = 10{,}000$, where the Bayesian binary algorithm with the Laplace prior achieved a higher F1 score. The binary segmentation algorithm of Scott [23] was always the slowest and, except for $N = 500{,}000$, the least precise; also, since it is implemented recursively, for longer signals an operating system error related to the stack size prevented it from running in many simulations.
The Bayesian binary segmentation algorithm can be seen to be competitive with PELT in both execution time and accuracy. The use of an informative (Laplace) prior improved the accuracy over the Jeffreys prior for the three smaller signal sizes, but not for the two larger ones.
In the next section, we apply the Bayesian binary algorithm and PELT to real underwater acoustic signals; the binary segmentation algorithm of Scott [23] is not tested, because it is impractical for signals of the size we will be using.
3.2 Underwater acoustic signals
We now apply the algorithms to the segmentation of real underwater acoustic signals. These signals were obtained by the LACMAM team in 2017 in the region of Alcatrazes, an archipelago off the Brazilian coast, in the municipality of São Sebastião, SP. More information about the data and the experiment can be found in [26].
One of the main goals in acquiring these samples is the study of the acoustic signatures of boats. Alcatrazes is a marine ecological reserve, the second largest in Brazil, and as such fishing is prohibited in the archipelago’s area. As passive acoustic monitoring is cheap, efficient algorithms for boat detection using hydrophone data are a valuable resource for the reserve’s enforcement authorities.
By January 2019, the laboratory had collected almost two years of acoustic signals from the reserve’s region. In these signals, many events can be found: the passage of boats, but also fish and whale vocalizations, and other events with both biological and anthropogenic sources. These events, however, are scarce, making the direct inspection and annotation of the signal a demanding task.
The segmentation algorithm will be used to aid in this inspection, by first separating sections of the signal that are likely to contain any significant event.
To test the segmentation algorithms, we chose two samples (A and B) in which visual inspection of the spectrogram shows many short-duration events. After examination of the spectrograms, the samples were listened to, and the start and end times of all events were annotated. A total of 32 changepoints were identified (12 in sample A and 20 in sample B), all of them caused by the passage of boats. We expect the segmentation algorithm to correctly identify the boundaries of these events.
One disclaimer is due at this point. The inspection of the samples was aimed at separating the portions of the acoustic signal generated by the passage of boats; the researcher responsible for the annotation was thus not looking to annotate changes in the signal power. For that reason, we do not expect any algorithm to achieve high precision or recall.
The sampling rate of these files results in signals with more than a million points each. To reduce the signal size, one could arbitrarily break each signal into smaller pieces, or downsample it. Arbitrarily separating smaller pieces seems the least desirable approach, since it introduces the problem of deciding where to separate the pieces.
For the following tests, however, no downsampling was adopted, and the reported results refer to the segmentation of the full-length signals.
For the Bayesian binary algorithm with the Laplace prior, the value of $\beta$ is selected from an elbow plot of the BIC, i.e., we select the smallest $\beta$ for which the plot shows a pronounced decrease. For the PELT algorithm, the MBIC criterion is applied. In Table 2, the execution time for the Bayesian binary algorithm with the Laplace prior includes all the runs necessary to obtain the best $\beta$. To assess the effect of using strongly informative priors in our algorithm, we also include the results for the Bayesian binary algorithm with the Jeffreys (non-informative) prior.
Sample  Method  Time (s)  Beta  True k  Estimated k  Precision  Recall  F1 

A  jeffreys  1239.59    12  42074  0.03%  100%  0.0003 
B  jeffreys  1329.73    20  45277  0.04%  100%  0.0004 
A  laplace  27.41  3.3e-5  12  28  17.9%  41.7%  0.1250 
B  laplace  30.89  1.6e-5  20  21  30.0%  30.0%  0.1500 
A  pelt  205.41    12  39170  0.03%  100%  0.0003 
B  pelt  205.38    20  38274  0.05%  100%  0.0005 
As seen in Table 2, the Bayesian binary algorithm with the Laplace prior outperformed PELT in the segmentation of the real samples. The first thing to notice is that PELT produced an excessive number of changepoints; this happens because PELT performs the exact optimization of a cost function based on a (Gaussian) likelihood, and even with the regularization induced by the MBIC criterion, a higher number of changepoints gives a better fit. The same happens with the Bayesian binary algorithm using non-informative priors, i.e., with uncontrolled power of the test based on the e-value.
With the Laplace prior, on the other hand, the value of $\beta$ helps to control the power of the test based on the e-value, avoiding oversegmentation.
In Figures 6 and 7, the changepoints estimated by the Bayesian binary algorithm are plotted over the spectrograms of the samples. The boundaries of the most prominent events are correctly captured by the algorithm, while sections with no important events (as can be seen by direct inspection of the spectrogram) are kept unsegmented.
4 Conclusion
The segmentation of acoustic signals is an important task, especially in the retrospective analysis of long-duration signals.
Among the many possible criteria for segmentation, RMS-based segmentation is particularly interesting when one is mainly interested in separating sections with background noise only from sections composed of background noise plus some (possibly) interesting event.
In this paper, we present a Bayesian binary algorithm for RMS-based acoustic signal segmentation. We show that this algorithm is precise and robust to violations of its basic assumptions: normality of the background noise, and a step function for the RMS in the different segments. We claim that this robustness is mainly due to two characteristics of our algorithm: first, the use of a marginal posterior for the selection of candidate changepoints; and second, the use of maximum entropy models (both the Gaussian for the background noise and the Laplace for the log-ratio of variances are maximum entropy models) with strongly informative priors.
By comparing our algorithm with other alternatives from the literature, we showed that it is competitive with the current state-of-the-art changepoint algorithm (PELT), and noticeably superior to previous binary algorithms on simulated data. When analyzing real data, we showed that our algorithm can outperform even PELT if we use the strongly informative (Laplace) prior on the log-ratio of variances between segments.
The hyperparameter of the Laplace prior can be efficiently selected using model selection criteria such as the Bayesian Information Criterion (BIC).
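Since BIC = k·ln(n) − 2·ln(L̂), one way to use it here is to compare the segmentations obtained for a grid of hyperparameter values and keep the one with the lowest BIC. The sketch below does this under stated assumptions: a zero-mean Gaussian model with piecewise-constant variance, and a parameter count of one variance per segment plus one location per changepoint. Both the counting convention and the candidate segmentations are hypothetical; the procedure actually used in this work may differ.

```python
import numpy as np

def piecewise_variance_bic(x, changepoints):
    """BIC of a zero-mean Gaussian model whose variance is constant within each
    segment delimited by `changepoints`. Parameter count (an assumption of
    this sketch): one variance per segment plus one location per changepoint."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    bounds = [0, *sorted(changepoints), n]
    neg2_loglik = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        seg = x[start:end]
        var = np.mean(seg ** 2)  # MLE of the segment variance (mean fixed at zero)
        neg2_loglik += len(seg) * (np.log(2 * np.pi * var) + 1.0)
    k = (len(bounds) - 1) + len(changepoints)
    return float(neg2_loglik + k * np.log(n))

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(0.0, 3.0, 500)])

# Hypothetical outputs of the segmentation for three different hyperparameter values
candidates = {
    "none": [],
    "one": [500],
    "over": [100, 200, 300, 400, 500, 600, 700, 800, 900],
}
scores = {name: piecewise_variance_bic(x, cps) for name, cps in candidates.items()}
best = min(scores, key=scores.get)
print({name: round(s, 1) for name, s in scores.items()}, "->", best)
```

The ln(n) penalty per parameter is what rules out the oversegmented candidate: its extra changepoints buy only a small likelihood gain on homogeneous sections.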
Further work will analyze other possibilities for the model selection problem in this setting. We are also working on a hybrid version of our algorithm and the PELT algorithm, by using a version of our marginal posterior as the cost function to be optimized with PELT.
Our algorithm is written in Cython, is open source, and can be downloaded at http://github.com/paulohubert/bayeseg, along with some sample acoustic data and some illustrative IPython notebooks. The signals used in this paper are available upon request.
References
 [1] M. Caldas-Morgan, A. Alvarez-Rosario, and L. R. Padovese. An autonomous underwater recorder based on a single board computer. PLoS One, 10, 2015.
 [2] D. Chakrabarty. A new Bayesian test to test for the intractability-countering hypothesis. Journal of the American Statistical Association, 112(518):561–577, 2017.
 [3] J. M. Dickey and B. P. Lientz. The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics, 41(1):214–226, 1970.
 [4] M. Diniz, C. A. B. Pereira, and J. M. Stern. Unit roots: Bayesian significance test. Communications in Statistics – Theory and Methods, 40(23):4200–4213, 2012.
 [5] H. Haario, E. Saksman, and J. Tamminen. An adaptive Metropolis algorithm. Bernoulli, 2001.
 [6] P. Hubert, M. Lauretto, and J. M. Stern. FBST for the generalized Poisson distribution. AIP Conference Proceedings, 1193(210), 2009.
 [7] P. Hubert, L. Padovese, and J. M. Stern. Full Bayesian approach for signal detection with an application to boat detection on underwater soundscape data, chapter 19, pages 199–209. Springer, New York, NY, 2017.
 [8] P. Hubert, L. Padovese, and J. M. Stern. A sequential algorithm for signal segmentation. Entropy, 20(1):44, 2018.
 [9] B. Jackson, J. D. Scargle, D. Barnes, S. Arabhi, A. Alt, P. Gioumousis, E. Gwin, P. Sangtrakulchareon, L. Tan, and T. T. Tsai. An algorithm for optimal partitioning of data on an interval. IEEE Signal Processing Letters, 12(2), 2005.
 [10] E. T. Jaynes. Prior probabilities. IEEE Transactions On Systems Science and Cybernetics, 4(3):227–241, 1968.
 [11] E. T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70(9):939–952, 1982.
 [12] E. T. Jaynes. Bayesian spectrum and chirp analysis. In C. R. Smith and G. J. Erickson, editors, Maximum-Entropy and Bayesian Spectral Analysis and Estimation Problems. D. Reidel Publishing Co, 1987.
 [13] H. Jeffreys. An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London Series A, Mathematical and Physical Sciences, 186(1007):453–461, 1946.
 [14] R. Killick and I. A. Eckley. changepoint: an R package for changepoint analysis. Journal of Statistical Software, 58(3), 2014.
 [15] R. Killick, P. Fearnhead, and I. A. Eckley. Optimal detection of changepoints with a linear computational cost. Journal of the American Statistical Association, 107(500):1590–1598, 2012.
 [16] S. Kuntamalla and L. Ram Gopal Reddy. An efficient and automatic systolic peak detection algorithm for photoplethysmographic signals. International Journal of Computer Applications, 97:19, 2014.
 [17] C. A. B. Pereira and L. Pericchi. Adaptative significance levels using optimal decision rules: Balancing the error probabilities. Brazilian Journal of Probability and Statistics, 30(1):70–90, 2016.
 [18] M. R. Madruga, S. Wechsler, and L. G. Esteves. On the Bayesianity of Pereira-Stern tests. Test, 10(2):291–299, 2001.
 [19] R. Makowsky and R. Hossa. Automatic speech signal segmentation based on the innovation adaptive filter. Int. J. Appl. Math. Comput. Sci., 24(2):259–270, 2014.
 [20] C. A. B. Pereira, E. Y. Nakano, V. Fossaluza, L. G. Esteves, M. A. Gannon, and A. Polpo. Hypothesis tests for binomial experiments: Ordering the sample space by Bayes factors and using adaptive significance levels for decisions. Entropy, 19(12), 2017.
 [21] C. A. B. Pereira and J. M. Stern. Evidence and credibility: full Bayesian significance test for precise hypotheses. Entropy, 1:99–110, 1999.
 [22] A. Schwartzman, Y. Gavrilov, and R. J. Adler. Multiple testing of local maxima for detection of peaks in 1d. The Annals of Statistics, 39(6):3290–3319, 2011.
 [23] A. J. Scott and M. Knott. A cluster analysis method for grouping means in the analysis of variance. Biometrics, 30(3):507–512, 1974.
 [24] S. A. Shaban. Change point problem and two-phase regression: An annotated bibliography. International Statistical Review, 48(1):83–93, 1980.
 [25] J. M. Stern. Cognitive Constructivism and the Epistemic Significance of Sharp Hypothesis. 2008.
 [26] I. Sánchez-Gendriz and L. Padovese. A methodology for analyzing biological choruses from long-term passive acoustic monitoring in natural areas. Ecological Informatics, (41):1–10, 2017.
 [27] T. Theodorou, I. Mporas, and N. Fakotakis. An overview of automatic audio segmentation. I.J. Information Technology and Computer Science, 11:1–9, 2014.
 [28] A. Ukil and R. Zivanovic. Automatic Signal Segmentation based on Abrupt Change Detection for Power Systems Applications. Power India Conference, 2006.
 [29] N. R. Zhang and D. O. Siegmund. A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics, 63:22–32, 2007.