Classification and Verification of Online Handwritten Signatures with Time Causal Information Theory Quantifiers

01/26/2016 ∙ by Osvaldo A. Rosso, et al.

We present a new approach for online handwritten signature classification and verification based on descriptors stemming from Information Theory. The proposal uses the Shannon Entropy, the Statistical Complexity, and the Fisher Information evaluated over the Bandt and Pompe symbolization of the horizontal and vertical coordinates of signatures. These six features are easy and fast to compute, and they are the input to a One-Class Support Vector Machine classifier. The results produced surpass state-of-the-art techniques that employ higher-dimensional feature spaces, which often require specialized software and hardware. We assess the consistency of our proposal with respect to the size of the training sample, and we also use it to classify the signatures into meaningful groups.





The word biometrics refers to human traits or behaviors that can be measured and used for individual recognition. Biometric recognition, as a form of personal authentication based on signal processing, can be used in applications where users need to be securely identified[1]. Such systems can perform either verification or identification.

Two types of biometrics can be defined according to the personal traits considered: physical/physiological or behavioral. Physical/physiological biometrics relies on the biological traits of users, such as fingerprints, iris, face, and hand. Behavioral biometrics takes into account dynamic traits of users, such as voice, handwriting, and signatures.

One of the main advantages of biometric systems is that users do not have to remember passwords or carry access keys. Another important advantage lies in the difficulty of stealing, imitating, or generating genuine biometric data, which leads to enhanced security[1].

As mentioned, behavioral biometrics is based on measurements extracted from an activity performed by the user, consciously or unconsciously, that are inherent to his/her own personality or learned behavior. In this respect, behavioral biometrics has interesting advantages, such as user acceptance and cancelability, but it still lacks some of the uniqueness that physiological biometrics offers.

Among purely behavioral biometric traits, the handwritten signature (and the way we sign) is the one with the widest social and legal acceptance[2, 3, 4, 5, 6]. Identity verification by signature analysis requires no invasive measurements, and people are familiar with the use of signatures in their daily life. It is also the modality that faces the highest level of attacks.

A signature is a handwritten depiction of someone’s name, or some other mark of identification, written on documents and devices as proof of identity. The formation of a signature varies from person to person, and even for the same person, depending on the psychophysical state of the signer and the conditions under which the signature is produced.

Hilton[7] studied how signatures are produced and found that a signature has at least three attributes: form, movement, and variation, movement being the most important because signatures are produced by moving a writing device. The study also noted that a person’s signature does evolve over time but, for the vast majority of users, once the signature style has been established, the modifications are usually slight. The movement is produced by the muscles of the fingers, hand, wrist, and, in some writers, the arm; these muscles are controlled by nerve impulses. When a person signs, these nerve impulses are controlled by the brain without any particular attention to detail. At a high level, the signing process can then be described as the central nervous system (the brain) recovering, from long-term memory, information in which parameters such as size, shape, and timing are specified. At the peripheral level, commands are generated for the muscles. Consequently, the signing process is believed to be a reflex action (a ballistic action; ballistic movements can be defined as muscle contractions that exhibit maximum velocities and accelerations over a very short period of time, with high firing rates, high force production, and very brief contraction times[8]) rather than a deliberate action. The production of genuine signatures is therefore associated with ballistic handwriting, which is characterized by a spurt of activity without positional feedback, whereas the production of forged signatures is associated with deliberate handwriting, which is characterized by a conscious attempt to produce a visual pattern with the aid of positional feedback[9, 10].

Handwritten signature verification is a problem in which the input signature (a test signature) is classified as genuine or forged. This process is usually performed in three main phases[2, 3, 4, 5, 6]:

  • Data acquisition and pre-processing. Two different categories of systems can be identified, depending on whether there is electronic access to the handwriting process or not. a) Online or dynamic recognition, in which the pen’s instantaneous trajectory, as well as information such as pressure, speed, or pen-up movements, can be captured. b) Offline or static recognition, in which signatures are recorded as images on paper that can later be digitized by means of a scanner and processed. In the latter case, the pre-processing phase involves filtering, noise reduction, and smoothing. Online signature verification offers reliable identity protection, as it employs dynamic information that is not available in the signature image itself but only in the process of signing. As a consequence, online signature verification systems usually achieve better accuracy than offline systems.

  • Feature extraction. Two types of features can be used: a) function features, i.e., time functions of the signature whose values constitute the feature set; b) parameter features, where the signature is characterized as a vector of elements, each one representing the value of a feature. Usually, the latter yields better performance, but it is also more time-consuming.

  • Classification. In the verification process, the authenticity of the test signature is evaluated by matching it against the signatures stored in the knowledge base built during the enrollment stage. This process produces a single response that attests to the authenticity of the test signature. When template matching techniques are considered, a questioned sample is matched against templates of authentic/forged signatures. Distance-based classifiers, mostly when parameters are used as features, are usually developed with statistical techniques, e.g., with Mahalanobis and Euclidean distances. The performance of a signature verification system is commonly assessed in terms of the Equal Error Rate (EER), the error rate at which the false acceptance and false rejection rates coincide, expressed as a percentage.
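As a rough illustration of the EER, the following Python/NumPy sketch sweeps a decision threshold over verification scores and returns the point where the false acceptance rate (FAR) and the false rejection rate (FRR) are closest. It assumes that higher scores mean "more likely genuine"; the function name `equal_error_rate` and the toy scores are hypothetical, and this simple sweep is not necessarily the evaluation protocol used in this work.

```python
import numpy as np

def equal_error_rate(genuine_scores, forgery_scores):
    """Approximate EER (in %) from verification scores.

    Assumes higher scores mean 'more likely genuine'. Every observed score is
    tried as a threshold; the EER is taken where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    forgery = np.asarray(forgery_scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([genuine, forgery])):
        frr = np.mean(genuine < t)    # genuine signatures wrongly rejected
        far = np.mean(forgery >= t)   # forgeries wrongly accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return 100.0 * float(eer)

print(equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.65]))  # 25.0
```

With finite samples FAR and FRR rarely cross exactly, so averaging them at the closest threshold is only one of several common approximations.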

On the one hand, template matching attempts to find similarities between the input signature and those in a database; most approaches use Dynamic Time Warping to perform this match[5, 6]. On the other hand, distance-based classifiers rely on features derived from the signatures.
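For readers unfamiliar with Dynamic Time Warping, the sketch below implements the textbook dynamic-programming recurrence for two 1-D sequences in Python/NumPy, using the absolute difference as the local cost. It is a generic illustration that assumes no band constraint or step weighting; the DTW-based systems cited above[5, 6] may use different local costs, constraints, and multivariate inputs.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j]: cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, step in a, step in b
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])

print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0
```

The example shows the point of warping: the repeated sample in the second sequence is absorbed by the alignment, so the two sequences are at zero distance even though they have different lengths.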

Two opposing views of the signing process can be found in the literature. The nonlinear character and chaotic behavior of several complex physiological processes are well established[11, 12]. In particular, Longstaff and Heath[13] found evidence of chaotic behavior in the underlying dynamics of time series related to velocity profiles of handwritten texts. Taking into account the inherently behavioral nature of the online signing process, the input information could be associated with deterministic (nonlinear, low-dimensional chaotic) signals, and handwritten signature variations could be seen as a consequence of chaos (sensitivity to initial conditions). In contrast, most of the research in the field of signature verification considers the input information to be well described by a random process[2, 3, 4, 5, 6]. Accordingly, the dynamic input information acquired through a time sampling procedure must be considered a discrete-time random sequence. In any case, signature analysis, taken as a time-based sequence characterization process, is strongly related to the way in which a reference model is established. From the stochastic point of view, Hidden Markov Models are among the most commonly used models in the literature, and those with the best performance in signature verification[2, 3, 4, 5, 6].

Our proposal relies on time causal quantifiers based on Information Theory for the characterization of online handwritten signatures: the normalized permutation Shannon entropy, the permutation statistical complexity, and the permutation Fisher information measure. These quantifiers have proved to be useful in identifying chaotic and stochastic dynamics from the associated time series[14, 15]. Their evaluation is simple and fast, making them well suited to the signature verification problem. We apply our proposal to the well-known MCYT online signature database[16].

The next section describes the database used in this study; it is followed by a section detailing the quantifiers employed and their application to the data. In addition to the usual data flow, we present an exploratory data analysis (EDA) of the features that supports their appropriateness for this problem. The expressiveness and usefulness of these descriptors for online signature classification and verification are then assessed by applying them to the test-bed.

Handwritten signatures database

The present study is carried out on MCYT-100, a subset of 100 persons from the freely available and widely used MCYT handwritten signature database[16]. Each online signature is acquired dynamically using a WACOM graphics tablet, model INTUOS A6 USB. The tablet resolution is 2540 lines/in (100 lines/mm), and the precision is ±0.25 mm. The maximum detection height is 10 mm (so pen-up movements are also captured), and the capture area is 127 mm (width) × 97 mm (height). This tablet provides the following discrete-time sequences: a) position along the $x$-axis, b) position along the $y$-axis, and c) the pressure applied by the pen, as well as the azimuth and altitude angles of the pen with respect to the tablet; the latter are not used in the present work. The sampling frequency is set to 100 Hz. Taking into account the Nyquist sampling criterion (a 100 Hz rate represents components up to 50 Hz) and the fact that the maximum frequencies of the related biomechanical sequences are always below 20-30 Hz[17], this sampling frequency leads to a precise discrete-time signature representation.

The signature corpus comprises genuine signatures and shape-based, highly skilled forgeries with natural dynamics[16, 18]. To obtain the forgeries, each contributor was asked to imitate other signers by writing naturally. For this task, they were given the printed signature to imitate and were asked not only to imitate the shape but also to produce the imitation without artifacts such as breaks or slow-downs (see [16, 18] for more details of the acquisition procedure). Each signer contributes 25 genuine signatures in five groups of five signatures each, and is forged 25 times by five different imitators. Figure 1 presents examples for six different subjects, with genuine signatures in the first two columns and forgeries in the third column.

Figure 1: Signatures of six different subjects from the MCYT database: two genuine signatures (left, blue) and a skilled forgery (right, red) per subject. The first two subjects’ signatures were classified as types H1A and H1B, the next two as types H2A and H2B, and the last two as types H3A and H3B; cf. Sec. Signatures classification.

Since signers concentrate on a different writing task between genuine signature sets, the variability between client signatures from different acquisition sets is expected to be higher than the variability of signatures within the same set. The total number of contributors in MCYT is 330, and the total number of signatures in the signature database is 16,500, half of them genuine signatures and the rest forgeries[16, 18].

As previously mentioned, we used a subset of the database, denominated MCYT-100, which includes 100 subjects and, for each one, 25 genuine and 25 skilled forged signatures; only the time series corresponding to the $x$- and $y$-coordinates of each signature are analyzed. Note that the lengths of the time series are quite variable. To facilitate our Information Theory analysis, we pre-processed each time series as follows: a) the coordinates were rescaled into the unit square $[0,1] \times [0,1]$; b) taking these scaled values as a base, each time series was expanded to a fixed, common number of points using cubic Hermite polynomial interpolation. In this way, for each subject ($i = 1, \dots, 100$) and associated signatures ($j = 1, \dots, 25$ of each type) we analyze two time series, denoted by $X^{(k)}_{i,j}$ and $Y^{(k)}_{i,j}$, in which the superscript $k \in \{G, F\}$ denotes genuine and forged signatures, and $X$ and $Y$ are the interpolated $x$- and $y$-coordinate values, respectively.
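The two pre-processing steps can be sketched in Python/NumPy as follows. Several details are assumptions, not taken from the text: each coordinate is min-max rescaled independently (one possible reading of "rescaled into the unit square"), the Hermite tangents are simple finite-difference slopes, and the sample index is used as the interpolation parameter. The target length used in this work is not reproduced here, so `n_points` is a free, hypothetical parameter, and all function names are illustrative.

```python
import numpy as np

def rescale_to_unit_interval(v):
    """Min-max rescale one coordinate into [0, 1] (a constant input maps to 0)."""
    v = np.asarray(v, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def hermite_resample(v, n_points):
    """Resample a 1-D series (length >= 2) to n_points with a cubic Hermite spline.

    Tangents come from np.gradient (central differences inside, one-sided at
    the ends); the parameter is the sample index, so knots are unit-spaced.
    """
    v = np.asarray(v, dtype=float)
    t = np.arange(len(v), dtype=float)
    m = np.gradient(v, t)
    t_new = np.linspace(0.0, t[-1], n_points)
    # index k of the interval [t[k], t[k+1]] that contains each new point
    k = np.clip(np.searchsorted(t, t_new, side="right") - 1, 0, len(v) - 2)
    s = t_new - t[k]  # local coordinate in [0, 1] because the spacing is 1
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * v[k] + h10 * m[k] + h01 * v[k + 1] + h11 * m[k + 1]

def preprocess_signature(x, y, n_points):
    """Rescale both coordinates into [0, 1], then resample them to n_points."""
    return (hermite_resample(rescale_to_unit_interval(x), n_points),
            hermite_resample(rescale_to_unit_interval(y), n_points))
```

Because a cubic Hermite spline with these tangents is not shape-preserving, resampled values may slightly overshoot $[0, 1]$ near sharp turns; a monotone variant such as PCHIP would avoid that.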

Information Theory quantifiers

The basic elements for the study of a system’s dynamics, either natural or man-made, are sequences of measurements or observations whose evolution can be tracked through time. Given an observable of such a system, a natural question arises: how much information does this observable encode about the dynamics of the underlying system? The information content of a system is typically evaluated via a probability distribution function (PDF) obtained from such an observable. We can define Information Theory quantifiers as measures able to characterize relevant properties of the PDF associated with these time series, and in this way we can judiciously extract information on the dynamical system under study.

Shannon entropy, Fisher Information Measure, and Statistical Complexity

Entropy is a basic quantity with multiple field-specific interpretations; for instance, it has been associated with disorder, state-space volume, and lack of information[19]. When dealing with information content, the Shannon entropy is often considered the foundational and most natural one[20, 21].

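Given a continuous probability distribution function (PDF) $f(x)$ with $x \in \Delta \subset \mathbb{R}$ and $\int_{\Delta} f(x)\,dx = 1$, its associated Shannon Entropy $S[f]$[20, 21] is

$$ S[f] = -\int_{\Delta} f(x) \ln f(x)\,dx . \tag{1} $$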
It is a global measure, that is, it is not very sensitive to strong changes in the distribution taking place on a small-sized region of $\Delta$. Such is not the case with Fisher’s Information Measure (FIM) $\mathcal{F}[f]$[22, 23], which constitutes a measure of the gradient content of the distribution $f(x)$, and is thus quite sensitive even to tiny localized perturbations. It reads

$$ \mathcal{F}[f] = \int_{\Delta} \frac{1}{f(x)} \left[ \frac{df(x)}{dx} \right]^2 dx . \tag{2} $$
The Fisher Information Measure can be variously interpreted as a measure of the ability to estimate a parameter, as the amount of information that can be extracted from a set of measurements, and also as a measure of the state of disorder of a system or phenomenon[23], its most important property being the so-called Cramér-Rao bound. It is important to remark that the gradient operator significantly influences the contribution of minute local $f$-variations to the Fisher information value; accordingly, this quantifier is called “local”[23]. Note that the Shannon entropy decreases with the distribution skewness, while the Fisher information increases.

Local sensitivity is useful in scenarios whose description requires an appeal to a notion of “order”. In the previous definition of the FIM (Eq. (2)), the division by $f(x)$ is not convenient if $f(x) \to 0$ at certain points of the support $\Delta$. We avoid this if we work with real probability amplitudes $\psi(x)$, with $f(x) = \psi(x)^2$, by means of the alternative expression[22, 23]

$$ \mathcal{F}[\psi] = 4 \int_{\Delta} \left[ \frac{d\psi(x)}{dx} \right]^2 dx . $$

This form requires no divisions, and shows that $\mathcal{F}$ simply measures the gradient content in $\psi(x)$.

Now let $P = \{p_i;\ i = 1, \dots, N\}$ be a discrete probability distribution, with $N$ the number of possible states of the system under study. Shannon’s logarithmic information measure reads

$$ S[P] = -\sum_{i=1}^{N} p_i \ln p_i . \tag{3} $$
This can be regarded as a measure of the uncertainty associated with the physical process described by $P$. For instance, if $P \equiv P_0 = \{p_k = 1 \text{ and } p_i = 0,\ \forall i \neq k\}$, we are in a position to predict with complete certainty which of the $N$ possible outcomes $i$, whose probabilities are given by $p_i$, will actually take place, and $S[P_0] = S_{min} = 0$. Our knowledge of the underlying process described by the probability distribution is maximal in this instance. In contrast, our knowledge is minimal for a uniform distribution $P_e = \{p_i = 1/N,\ \forall i = 1, \dots, N\}$, since every outcome exhibits the same probability of occurrence, and the uncertainty is maximal, i.e., $S[P_e] = S_{max} = \ln N$. In the discrete case, we define a “normalized” Shannon entropy, $0 \le H_S \le 1$, as

$$ H_S[P] = \frac{S[P]}{S_{max}} . \tag{4} $$
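A minimal Python/NumPy sketch of Eqs. (3) and (4), assuming the PDF is given as an array of $N$ probabilities (zero entries allowed, with the usual convention $0 \ln 0 = 0$); the function names are illustrative:

```python
import numpy as np

def shannon_entropy(p):
    """S[P] = -sum_i p_i ln p_i, skipping zero entries (0 ln 0 = 0)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def normalized_shannon_entropy(p):
    """H_S[P] = S[P] / ln N, where N counts every state, including empty ones."""
    return shannon_entropy(p) / np.log(len(p))

print(normalized_shannon_entropy([0.5, 0.5, 0.0, 0.0]))  # ln 2 / ln 4, i.e. about 0.5
```

Note that $N$ is the number of accessible states, not the number of observed ones, so empty states still enter the normalization.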


The concomitant problem of loss of information due to discretization has been thoroughly studied (see, for instance, [24, 25] and references therein); in particular, it entails the loss of Fisher’s shift-invariance, which is of no importance for our present purposes. For the FIM, we take the expression in terms of real probability amplitudes as a starting point; a discrete normalized FIM, $0 \le \mathcal{F} \le 1$, convenient for our present purposes, is then given by

$$ \mathcal{F}[P] = F_0 \sum_{i=1}^{N-1} \left[ (p_{i+1})^{1/2} - (p_i)^{1/2} \right]^2 . \tag{5} $$
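It has been extensively discussed that this discretization is the best behaved in a discrete environment[26]. Here the normalization constant $F_0$ reads

$$ F_0 = \begin{cases} 1 & \text{if } p_{i^*} = 1 \text{ for } i^* = 1 \text{ or } i^* = N, \text{ and } p_i = 0\ \forall i \neq i^*, \\ 1/2 & \text{otherwise.} \end{cases} $$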
The perfect crystal and the isolated ideal gas are two typical examples of systems with minimum and maximum entropy, respectively. However, they are also examples of simple models and therefore of systems with zero complexity, as the structure of the perfect crystal is completely described by minimal information (i.e., distances and symmetries that define the elementary cell) and the probability distribution for the accessible states is centered around a prevailing state of perfect symmetry. On the other hand, all the accessible states of the ideal gas occur with the same probability and can be described by a “simple” uniform distribution.

According to López-Ruiz et al.[27], and using an oxymoron, an object, a procedure, or a system is said to be complex when it does not exhibit patterns regarded as simple. It follows that a suitable complexity measure should vanish both for completely ordered and for completely random systems, and cannot rely only on the concept of information (which is maximal and minimal for the above-mentioned systems). A suitable measure of complexity can be defined as the product of a measure of information and a measure of disequilibrium, i.e., some kind of distance from the equiprobable distribution of the accessible states of a system. In this respect, Rosso and coworkers[28] introduced an effective Statistical Complexity Measure (SCM), $C$, which is able to detect essential details of the dynamical processes underlying the dataset.

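Based on the seminal notion advanced by López-Ruiz et al.[27], this statistical complexity measure[28] is defined through the product

$$ C_{JS}[P] = Q_J[P, P_e] \cdot H_S[P] $$

of the normalized Shannon entropy $H_S$, see Eq. (4), and the disequilibrium $Q_J$ defined in terms of the Jensen-Shannon divergence. That is,

$$ Q_J[P, P_e] = Q_0 \, J_S[P, P_e], $$

with

$$ J_S[P, P_e] = S\!\left[ \frac{P + P_e}{2} \right] - \frac{S[P]}{2} - \frac{S[P_e]}{2} $$

the above-mentioned Jensen-Shannon divergence, and $Q_0$ a normalization constant such that $0 \le Q_J \le 1$:

$$ Q_0 = -2 \left\{ \frac{N+1}{N} \ln(N+1) - 2 \ln(2N) + \ln N \right\}^{-1}, $$

which is equal to the inverse of the maximum possible value of $J_S[P, P_e]$. This value is obtained when one of the components of $P$, say $p_m$, is equal to one and the remaining $p_j$ are zero.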
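The statistical complexity can be sketched in Python/NumPy directly from the definitions above, with $P_e$ the uniform distribution over the same $N$ states. The function names are illustrative, and the code assumes the PDF is an array-like of length $N \ge 2$.

```python
import numpy as np

def shannon_entropy(p):
    """S[P] = -sum_i p_i ln p_i with 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def disequilibrium(p):
    """Q_J[P, P_e] = Q_0 * J_S[P, P_e], with P_e uniform over len(p) states."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    pe = np.full(n, 1.0 / n)
    js = shannon_entropy((p + pe) / 2) - shannon_entropy(p) / 2 - shannon_entropy(pe) / 2
    # Q_0 is the inverse of J_S for a fully concentrated P
    q0 = -2.0 / ((n + 1) / n * np.log(n + 1) - 2 * np.log(2 * n) + np.log(n))
    return q0 * js

def statistical_complexity(p):
    """C_JS[P] = Q_J[P, P_e] * H_S[P]."""
    p = np.asarray(p, dtype=float)
    h = shannon_entropy(p) / np.log(len(p))
    return float(disequilibrium(p) * h)
```

Both extremes give $C_{JS} = 0$: a delta distribution has $H_S = 0$ (while $Q_J = 1$), and the uniform distribution has $Q_J = 0$ (while $H_S = 1$).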

The Jensen-Shannon divergence, which quantifies the difference between probability distributions, is especially useful for comparing the symbolic composition of different sequences[29]. Note that the SCM introduced above depends on two different probability distributions: one associated with the system under analysis, $P$, and the uniform distribution, $P_e$. Furthermore, it was shown that for a given value of $H_S$, the range of possible $C_{JS}$ values varies between a minimum $C_{min}$ and a maximum $C_{max}$, restricting the possible values of the SCM[30].

Thus, evaluating the statistical complexity measure provides important additional information related to the correlational structure between the components of the physical system. In this way, the entropy-complexity plane, $H_S \times C_{JS}$, constitutes a useful tool to visualize and characterize different dynamical systems.

If our system lies in a very ordered state, which occurs when almost all the $p_i$ values are zero except for a particular state $k$ with $p_k \cong 1$, both the normalized Shannon entropy and the statistical complexity are close to zero ($H_S \cong 0$ and $C_{JS} \cong 0$), and the normalized Fisher information measure is close to one ($\mathcal{F} \cong 1$). On the other hand, when the system under study is in a very disordered state, that is, when all the $p_i$ values oscillate around the same value, we have $H_S \cong 1$ while $C_{JS} \cong 0$ and $\mathcal{F} \cong 0$. One can state that the general FIM behavior of the present discrete version (Eq. (5)) is opposite to that of the Shannon entropy, except for periodic motions. The local sensitivity of the FIM for discrete PDFs is reflected in the fact that the specific “ordering” of the discrete values $p_i$ must be seriously taken into account in evaluating the sum in Eq. (5). This point was extensively discussed by Rosso and co-workers[31, 32]. The summands can be regarded as a kind of “distance” between two contiguous probabilities. Thus, a different ordering of the summands would lead to a different FIM value, hence its local nature. In the present work, we follow the Lehmer lexicographic order[33] in the generation of the Bandt and Pompe PDF (see next section). Given its local character, the FIM combined with a global quantifier such as the normalized Shannon entropy forms the Shannon–Fisher plane, $H_S \times \mathcal{F}$, introduced by Vignat and Bercher[34]. These authors showed that this plane is able to characterize the non-stationary behavior of a complex signal.

The Bandt and Pompe approach to the PDF determination

The evaluation of Information Theory derived quantifiers, like those previously introduced (Shannon entropy, Fisher information, and statistical complexity), presupposes some prior knowledge about the system; specifically, a probability distribution associated with the time series under analysis should be provided beforehand. The determination of the most adequate PDF is a fundamental problem because the PDF and the sample space are inextricably linked.

Usual methodologies assign to each time point of the series a symbol from a finite alphabet $\mathcal{A}$, thus creating a symbolic sequence that can be regarded as a non-causal coarse-grained description of the time series under consideration. As a consequence, order relations and the time scales of the dynamics are lost. The usual histogram technique corresponds to this kind of assignment. Causal information may be duly incorporated if information about the past dynamics of the system is included in the symbolic sequence, i.e., if symbols of the alphabet $\mathcal{A}$ are assigned to a portion of the phase space or trajectory.

Many methods have been proposed for a proper selection of the probability space $\Omega$. Bandt and Pompe (BP)[35] introduced a simple and robust symbolic methodology that takes into account the time causality of the time series (causal coarse-grained methodology) by comparing neighboring values in a time series. The symbolic data are: (i) created by ranking the values of the series; and (ii) defined by reordering the embedded data in ascending order, which is tantamount to a phase space reconstruction with embedding dimension (pattern length) $D$ and time lag $\tau$. In this way, it is possible to quantify the diversity of the ordering symbols (patterns) derived from a scalar time series.

Note that the appropriate symbol sequence arises naturally from the time series, and no model-based assumptions are needed. In fact, the necessary “partitions” are devised by comparing the order of neighboring relative values rather than by apportioning amplitudes according to different levels. This technique, as opposed to most of those in current practice, takes into account the temporal structure of the time series generated by the physical process under study. As such, it allows us to uncover important details concerning the ordinal structure of the time series[14, 36] and can also yield information about temporal correlation[37, 38].

It is clear that this type of analysis of a time series entails losing details of the original series’ amplitude information. Nevertheless, by referring only to the series’ intrinsic structure, BP achieved a meaningful reduction in the difficulty of describing complex systems. The symbolic representation of time series by comparison of consecutive ($\tau = 1$) or nonconsecutive ($\tau > 1$) values allows for an accurate empirical reconstruction of the underlying phase space, even in the presence of weak (observational and dynamic) noise[35]. Furthermore, the ordinal patterns associated with the PDF are invariant with respect to nonlinear monotonic transformations. Accordingly, nonlinear drifts or scaling artificially introduced by a measurement device will not modify the estimation of the quantifiers, a nice property when dealing with experimental data (see, e.g.,[39]). These advantages make the BP methodology more convenient than conventional methods based on range partitioning, i.e., a PDF based on histograms.

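To use the BP methodology[35] for evaluating the PDF, $P$, associated with the time series (dynamical system) under study, one starts by considering partitions of the $D$-dimensional space that will hopefully “reveal” relevant details of the ordinal structure of a given one-dimensional time series $\mathcal{X} = \{x_t;\ t = 1, \dots, M\}$ with embedding dimension $D$ ($D \ge 2$, $D \in \mathbb{N}$) and time lag $\tau$ ($\tau \in \mathbb{N}$). We are interested in “ordinal patterns” of order (length) $D$ generated by

$$ (s) \mapsto \left( x_{s-(D-1)\tau},\, x_{s-(D-2)\tau},\, \dots,\, x_{s-\tau},\, x_s \right), $$

which assign to each time $s$ the $D$-dimensional vector of values at times $s, s-\tau, \dots, s-(D-1)\tau$. Clearly, the greater $D$, the more information on the past is incorporated into our vectors. By the “ordinal pattern” related to the time $s$, we mean the permutation $\pi = (r_0, r_1, \dots, r_{D-1})$ of $(0, 1, \dots, D-1)$ defined by

$$ x_{s-r_{D-1}\tau} \le x_{s-r_{D-2}\tau} \le \dots \le x_{s-r_1\tau} \le x_{s-r_0\tau} . $$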


To ensure uniqueness, we set $r_i < r_{i-1}$ if $x_{s-r_i\tau} = x_{s-r_{i-1}\tau}$, although ties in samples from continuous distributions have null probability.

For all the $D!$ possible orderings (permutations) $\pi_i$ of embedding dimension $D$ and time lag $\tau$, their relative frequencies can be naturally computed as the number of times each particular order sequence is found in the time series, divided by the total number of sequences,

$$ p(\pi_i) = \frac{\#\{ s \mid (D-1)\tau < s \le M;\ (s) \text{ has type } \pi_i \}}{M - (D-1)\tau}, $$

where $\#$ denotes cardinality. Thus, an ordinal pattern probability distribution $P = \{p(\pi_i);\ i = 1, \dots, D!\}$ is obtained from the time series.
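A minimal Python/NumPy sketch of this ordinal-pattern PDF is given below. One assumption to flag: each window is stored in forward time order, $(x_s, x_{s+\tau}, \dots, x_{s+(D-1)\tau})$, and labeled by the permutation returned by `np.argsort`, which is a common convention but not literally the labeling of the definition above (ties are broken by temporal order). The $D!$ states are indexed in lexicographic order of these permutations; the function name is illustrative.

```python
import itertools
import math
import numpy as np

def ordinal_pattern_distribution(x, D=3, tau=1):
    """Bandt-Pompe PDF of ordinal patterns of length D and time lag tau."""
    x = np.asarray(x, dtype=float)
    n_windows = len(x) - (D - 1) * tau
    if n_windows < 1:
        raise ValueError("series too short for this D and tau")
    # the D! permutations, enumerated in lexicographic order, define the state index
    state = {perm: k for k, perm in enumerate(itertools.permutations(range(D)))}
    counts = np.zeros(math.factorial(D))
    for s in range(n_windows):
        window = x[s : s + (D - 1) * tau + 1 : tau]  # D values spaced by tau
        # stable sort: equal values keep their temporal order
        perm = tuple(int(i) for i in np.argsort(window, kind="stable"))
        counts[state[perm]] += 1
    return counts / n_windows

print(ordinal_pattern_distribution([0, 1, 0, 1, 0], D=2))  # [0.5 0.5]
```

Relabeling the states does not change $H_S$ or $C_{JS}$, but it does change $\mathcal{F}$, so a different labeling convention can yield different Fisher values.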

Figure 2 illustrates the construction principle of the ordinal patterns of length $D = 2$, $3$, and $4$, with $\tau = 1$[40]. Consider the sequence of observations $\{x_0, x_1, x_2, x_3\}$. For $D = 2$, there are only two possible directions from $x_0$ to $x_1$: up and down. For $D = 3$, starting from $x_0 < x_1$ (up), the third point $x_2$ can be above $x_1$, below $x_0$, or between $x_0$ and $x_1$. A similar situation arises starting from $x_0 > x_1$ (down). For $D = 4$, for each one of the six possible positions of $x_2$, there are four possible locations for $x_3$, yielding $D! = 4! = 24$ different possible ordinal patterns. In Fig. 2, full circles and continuous lines represent one such sequence of values and the ordinal pattern it defines. A graphical representation of all possible patterns for small values of $D$ can be found in Fig. 2 of Parlitz et al.[40].

Figure 2: Illustration of the construction principle for ordinal patterns of length $D$[40]. For $D = 4$ and $\tau = 1$, full circles and continuous lines represent a sequence of values $\{x_0, x_1, x_2, x_3\}$ and the ordinal pattern it leads to.

The embedding dimension $D$ plays an important role in the evaluation of the appropriate probability distribution, because $D$ determines the number of accessible states, $D!$, and also conditions the minimum acceptable length $M \gg D!$ of the time series needed to work with reliable statistics[41]. Regarding the selection of the parameters, Bandt and Pompe suggested working with $3 \le D \le 7$, and specifically considered a time lag $\tau = 1$ in their cornerstone paper[35]. Nevertheless, other values of $\tau$ could clearly provide additional information. It has recently been shown that this parameter is strongly related, when relevant, to the intrinsic time scales of the system under analysis[42, 43, 44].

Additional advantages of the method reside in i) its simplicity (it requires few parameters: the pattern length/embedding dimension $D$ and the time lag $\tau$), and ii) the extremely fast nature of the calculation process. The BP methodology can be applied not only to time series representative of low-dimensional dynamical systems, but also to any type of time series (regular, chaotic, noisy, or reality based). In fact, the existence of an attractor in the $D$-dimensional phase space is not assumed. The only condition for the applicability of the BP method is a very weak stationarity assumption: for $k \le D$, the probability for $x_t < x_{t+k}$ should not depend on $t$. For a review of the BP methodology and its applications to physics, biomedical, and econophysics signals, see Zanin et al.[45]. Moreover, Rosso et al.[14] showed that the above-mentioned quantifiers produce better descriptions of the associated process dynamics when the PDF is computed using BP rather than the usual histogram methodology.

The BP proposal for associating probability distributions with time series (of an underlying symbolic nature) constitutes a significant advance in the study of nonlinear dynamical systems[35]. The method provides a univocal prescription for ordinary, global entropic quantifiers of the Shannon kind. However, as shown by Rosso and coworkers[31, 32], ambiguities arise in applying the BP technique with reference to the permutation of ordinal patterns. This happens if one wishes to employ the BP probability density to construct local entropic quantifiers, like the Fisher information measure, which would characterize time series generated by nonlinear dynamical systems.

The local sensitivity of the Fisher information measure for discrete PDFs is reflected in the fact that the specific “$i$-ordering” of the discrete values $p_i$ must be seriously taken into account in evaluating Eq. (5). Each summand can be regarded as a kind of “distance” between two contiguous probabilities. Thus, a different ordering of the summands will lead, in most cases, to a different Fisher information value. In fact, for a discrete PDF given by $P = \{p_i;\ i = 1, \dots, N\}$, there are $N!$ possibilities for the $i$-ordering.

The question is which arrangement one could regard as the “proper” ordering. The answer is straightforward in some cases, the histogram-based PDF constituting a conspicuous example. For such a procedure, one first divides the interval $[a, b]$ (with $a$ and $b$ the minimum and maximum amplitude values in the time series) into a finite number of non-overlapping sub-intervals (bins). Thus, the division procedure of the interval $[a, b]$ provides the natural order sequence for the evaluation of the PDF gradient involved in the Fisher information measure. In the present paper, among other possibilities, we chose the lexicographic ordering given by the algorithm of Lehmer[33], because it better distinguishes different dynamics in the Shannon–Fisher plane, $H_S \times \mathcal{F}$ (see [31, 32]).
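Since the Fisher measure depends on how the $D!$ patterns are ordered, it may help to see how a lexicographic index can be obtained from a permutation. The Python sketch below computes the 0-based lexicographic rank of a permutation of $0, \dots, D-1$ from its Lehmer code (the count of smaller entries to the right of each position). It is a standard construction offered for illustration; the exact way the cited works[31, 32, 33] map ordinal patterns to indices may differ in detail, and the function name is hypothetical.

```python
import itertools
import math

def lehmer_rank(perm):
    """0-based lexicographic rank of a permutation of 0..D-1 via its Lehmer code."""
    D = len(perm)
    rank = 0
    for i, value in enumerate(perm):
        # Lehmer digit: how many later entries are smaller than perm[i]
        digit = sum(1 for later in perm[i + 1:] if later < value)
        rank += digit * math.factorial(D - 1 - i)
    return rank

# The rank reproduces the order in which itertools.permutations enumerates 0..D-1.
for k, perm in enumerate(itertools.permutations(range(4))):
    assert lehmer_rank(perm) == k
```

Ranking patterns this way gives each of the $D!$ states a fixed position in the PDF vector, which is exactly what Eq. (5) needs to define "contiguous" probabilities.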

Signature features and exploratory data analysis

Online handwritten signature classification and verification is an interesting and challenging classification problem. On the one hand, intra-personal variation can be large: some people provide signatures with poor consistency. The speed, pressure, and inclination, for example, of signatures made by the same person can differ greatly in regularity, which makes it quite challenging to extract consistent features. On the other hand, in practice only a few samples per person, and no forgeries, can be obtained, which makes it very difficult to determine the reliability of the extracted features. The main idea is to construct an efficient classification scheme that reduces often unmanageably large datasets to a parsimonious form without losing important statistical information. We aim at discovering relevant characteristic statistical structures, which could be exploited if the key information can be efficiently condensed into a suitable low-dimensional object.

The features we employ in this work are the Information Theory quantifiers already presented. For each of the subjects ($i = 1, \dots, 100$) in the database and its associated signatures (25 genuine and 25 skilled forgery), two time series, $X$ (horizontal coordinate) and $Y$ (vertical coordinate), are extracted and transformed into BP PDFs with pattern length (embedding dimension) $D$ and time lag $\tau$. Note that the condition $M \gg D!$, with $M$ the length of each time series, is satisfied.
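A minimal Python sketch of the Bandt–Pompe symbolization follows. The defaults `D=3` and `tau=1` are placeholders, not the values used in this work, and the pattern convention (the index permutation that sorts each window, with ties kept in time order) is one common choice among several. Patterns are indexed lexicographically, so the resulting PDF is already ordered for the Fisher information.

```python
import numpy as np
from itertools import permutations

def bandt_pompe_pdf(series, D=3, tau=1):
    """Relative frequencies of the D! ordinal patterns of `series`, in lexicographic order."""
    x = np.asarray(series, dtype=float)
    n_windows = len(x) - (D - 1) * tau
    if n_windows < 1:
        raise ValueError("series too short for the chosen D and tau")
    patterns = list(permutations(range(D)))              # lexicographic order
    index = {pattern: k for k, pattern in enumerate(patterns)}
    counts = np.zeros(len(patterns))
    for t in range(n_windows):
        window = x[t : t + D * tau : tau]                 # D values spaced tau samples apart
        # Ordinal pattern: indices that sort the window (stable sort keeps ties in time order)
        pattern = tuple(int(k) for k in np.argsort(window, kind="stable"))
        counts[index[pattern]] += 1
    return counts / counts.sum()

# Usage on a synthetic trajectory (a random walk, not a real signature)
rng = np.random.default_rng(3)
x_coord = np.cumsum(rng.normal(size=500))
pdf_x = bandt_pompe_pdf(x_coord, D=4, tau=1)
print(pdf_x.round(3))
```

Applied to the horizontal and vertical coordinate sequences of a signature, the function yields the two PDFs associated with $X$ and $Y$.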

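We denote these PDFs by $P^{(s)}_{ij}(X)$ and $P^{(s)}_{ij}(Y)$, where $i$ indexes the subject, $j$ the signature, and $s = G$ and $s = F$ identify genuine and skilled forgery signatures, respectively.

From these PDFs we computed the normalized permutation Shannon entropy $\mathcal{H}_S$, the permutation statistical complexity $\mathcal{C}_{JS}$, and the permutation Fisher information measure $\mathcal{F}$; the obtained values are denoted by $\mathcal{H}^{(s)}_{ij}(Z)$, $\mathcal{C}^{(s)}_{ij}(Z)$ and $\mathcal{F}^{(s)}_{ij}(Z)$, with $Z \in \{X, Y\}$.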
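The normalized permutation entropy and the statistical complexity can be sketched in Python as follows. The sketch assumes the standard definitions from [27–30]: $\mathcal{H}_S[P] = S[P]/\ln N$ and $\mathcal{C}_{JS}[P] = Q_J[P, P_e]\,\mathcal{H}_S[P]$, where $Q_J$ is the Jensen–Shannon divergence to the uniform PDF $P_e$ normalized by $Q_0 = -2\{\frac{N+1}{N}\ln(N+1) - 2\ln(2N) + \ln N\}^{-1}$. Function names and the example PDF are illustrative.

```python
import numpy as np

def shannon(p):
    """Shannon entropy with natural logarithm, using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log(nz)))

def normalized_entropy(p):
    """H_S[P] = S[P] / ln N, which lies in [0, 1]."""
    return shannon(p) / np.log(len(p))

def statistical_complexity(p):
    """Jensen-Shannon statistical complexity C_JS[P] = Q_J[P, P_e] * H_S[P]."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    pe = np.full(n, 1.0 / n)                                  # uniform reference PDF
    js = shannon((p + pe) / 2) - shannon(p) / 2 - shannon(pe) / 2
    # Q_0 rescales the Jensen-Shannon divergence so that its maximum equals 1
    q0 = -2.0 / (((n + 1) / n) * np.log(n + 1) - 2 * np.log(2 * n) + np.log(n))
    return float(q0 * js * normalized_entropy(p))

pdf = np.array([0.7, 0.1, 0.1, 0.1, 0.0, 0.0])  # hypothetical ordinal-pattern PDF for D = 3
print(normalized_entropy(pdf), statistical_complexity(pdf))
```

Both quantifiers vanish for a PDF concentrated on a single pattern, and the complexity also vanishes for the uniform PDF; it is positive only for intermediate, structured distributions.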

We perform Exploratory Data Analysis (EDA) on the Information Theory quantifiers looking for simple descriptions of the data. Apart from simple descriptive univariate measures, we use the Pearson correlation to measure the association between features. This analysis was performed using the R language and platform version 3.2.1 (http:
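The Pearson correlation used in the exploratory analysis is the normalized covariance of two feature vectors, for instance the normalized entropies of the $X$ and $Y$ coordinates across signatures. The analysis in this work was done in R; the Python sketch below is only an illustrative equivalent of the textbook formula, and the input values are made up.

```python
import numpy as np

def pearson(a, b):
    """Sample Pearson correlation: covariance divided by the product of standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float(np.sum(da * db) / np.sqrt(np.sum(da ** 2) * np.sum(db ** 2)))

# Made-up entropy values for the X and Y coordinates of five signatures
h_x = [0.81, 0.84, 0.79, 0.88, 0.83]
h_y = [0.77, 0.80, 0.76, 0.85, 0.78]
print(pearson(h_x, h_y), np.corrcoef(h_x, h_y)[0, 1])
```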

Figure 3 shows a scatterplot of the entropies for both the genuine and the skilled forgery signatures. The points correspond to genuine signatures (blue) and forgery signatures (red) for each of the subjects. Both types of signatures show a similar association (Pearson correlation) between the entropies of the two coordinates. The entropies of both types of signatures overlap and are scattered elliptically; however, their bivariate means and dispersions differ.

Entropies are less dispersed in the genuine than in the skilled forgery signatures, an indication that the two classes can be separated. Marginal density plots show the distribution of entropy for each coordinate of both types of signatures. These plots, although limited by their marginal nature, reveal several modes and suggest both wide and narrow structures in the data.

Figure 3: Scatter plot with marginal kernel density estimates of the entropy quantifiers of both trajectory coordinate time series, $X$ and $Y$. Genuine (blue) and skilled forgery (red) signatures, 100 subjects. Marginal kernel densities depict the distribution of the entropy quantifiers along both axes.

Figure 4 shows the contour plots of bivariate kernel density estimates for the entropy in genuine and forgery signatures. A number of features are immediately noticeable. The dispersion in the former group is much smaller than in the latter. The kernel density estimates reveal skewness and a mild multimodality in the joint distribution of the data. There are also many points far from the contours and cluster centers. These points correspond to abnormal local estimates obtained in heterogeneous blocks, possibly induced by the presence of clusters. The modes in genuine signatures are smaller than in forgery signatures, and this may be used as a discriminatory measure. Similar results are obtained for the Complexity and the Fisher information; these are reported in the Supplementary Information (Figs. S1 to S4).
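Bivariate kernel density estimates such as those in Figure 4 can be computed, for example, with `scipy.stats.gaussian_kde` (Scott's rule bandwidth by default). The sketch below uses synthetic stand-in points, not the MCYT data; the cluster locations and spreads are assumptions chosen only to mimic a concentrated genuine cloud and a more dispersed forgery cloud.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
# Synthetic stand-ins for (entropy of X, entropy of Y) pairs; NOT the MCYT measurements
genuine = rng.normal([0.80, 0.75], [0.01, 0.01], size=(200, 2))
forgery = rng.normal([0.83, 0.78], [0.03, 0.03], size=(200, 2))

def kde_on_grid(points, grid_x, grid_y):
    """Evaluate a bivariate Gaussian KDE of `points` (shape n x 2) on a regular grid."""
    kde = gaussian_kde(points.T)                     # SciPy expects shape (d, n)
    gx, gy = np.meshgrid(grid_x, grid_y)
    return kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

grid = np.linspace(0.70, 0.95, 101)
dens_genuine = kde_on_grid(genuine, grid, grid)
dens_forgery = kde_on_grid(forgery, grid, grid)
# A more concentrated cloud has a taller density peak and a smaller generalized variance
print(dens_genuine.max(), dens_forgery.max())
print(np.linalg.det(np.cov(genuine.T)), np.linalg.det(np.cov(forgery.T)))
```

The gridded densities can be passed directly to a contour plotting routine; the determinant of the covariance matrix (generalized variance) gives a single-number summary of the dispersion difference.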

Figure 4: Contour plot superimposed on the scatterplot of entropy quantifiers for genuine (right panel) and skilled forgery signatures (left panel)

Signatures classification

As pointed out by Boulétreau et al.[46], a signature is characterized by two aspects: a) a conscious one, associated with the signature pattern; and b) an unconscious one, which drives the spontaneous movements that constitute the drawing. These two factors produce high variability, and the amount of signature variability is strongly writer-dependent. In fact, signature variability or, conversely, signature stability can be considered an important indicator for writer characterization[47]. Houmani and Garcia-Salicetti[47] argue that signature stability is required in genuine signatures in order to characterize a writer: the less stable a signature is, the more likely it is that forgeries will be dangerously close to genuine signatures for any classifier. Also, sufficiently complex signatures are required to guarantee a certain level of security, in the sense that the more complex a signature is, the more difficult it is to forge[47].

Boulétreau and collaborators[46, 48] propose a signature complexity measure related to signature legibility and based on fractal dimension. Using only genuine signatures, they classify writer styles into highly cursive, very legible, separated, badly formed and small writings. Unfortunately, the resulting categories were not assessed with classifiers for performance analysis.

We classify the genuine signatures based on causal Information Theory quantifiers: the Normalized Permutation Shannon Entropy, the Permutation Statistical Complexity and the Permutation Fisher Information Measure of both the $X$ and $Y$ trajectories, for each of the one hundred writers in the MCYT database and their 25 original signatures. The mean and standard deviation values were clustered using the neighbor-joining method and automatic hierarchical clustering with a Euclidean distance-based dissimilarity matrix. Each feature was treated independently, and the results are shown as circular dendrograms. Figure 5 shows the results of clustering the Entropy; we distinguish three classes of genuine signatures, denoted by H1, H2, and H3.
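The clustering step can be sketched in Python with SciPy. SciPy does not provide neighbor-joining, so this sketch substitutes average-linkage (UPGMA) agglomerative clustering on a Euclidean dissimilarity matrix; the `metric` argument also allows repeating the analysis with the Manhattan (`cityblock`) and maximum (`chebyshev`) distances. The feature rows are hypothetical (mean, S.D.) pairs, not values from this work.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_writers(features, n_groups=3, metric="euclidean"):
    """Agglomerative clustering of writers described by (mean, S.D.) feature rows."""
    distances = pdist(features, metric=metric)   # condensed dissimilarity matrix
    tree = linkage(distances, method="average")  # UPGMA, a stand-in for neighbor-joining
    return fcluster(tree, t=n_groups, criterion="maxclust")

rng = np.random.default_rng(2)
# Hypothetical (mean entropy, S.D. entropy) rows for 30 writers in three well-separated groups
centers = np.array([[0.60, 0.02], [0.75, 0.03], [0.90, 0.05]])
features = np.vstack([c + rng.normal(0, 0.005, size=(10, 2)) for c in centers])

for metric in ("euclidean", "cityblock", "chebyshev"):  # L2, L1 (Manhattan), L-infinity
    print(metric, cluster_writers(features, metric=metric))
```

Obtaining the same partition under all three metrics, as reported for the Entropy, is a simple stability check on the groups.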

Figure 5: Neighbor-joining, rooted, circular dendrogram clustering of genuine signatures by Entropy: H1, H2, and H3, in red, blue, and green, respectively.

The H1 group is the first group to form, i.e., the one comprising the most similar individuals. It is composed of two subgroups: H1A and H1B. The H1A group is formed exclusively by oversimplified signatures made of simple loops without identifiable letters. It encompasses the following subjects: 1, 16, 17, 22, 23, 27, 29, 37, 83. The same group is formed when the other features are used. The H1B group comprises the following subjects: 2, 5, 8, 10, 19, 21, 24, 28, 32, 35, 36, 39, 43, 48, 49, 51, 55, 58, 59, 64, 69, 70, 74, 77, 89. Although these are simplified signatures, traces of letters and/or more complex curves appear and differentiate them from the members of the H1A group.

The H2 group is, again, composed of two distinct subgroups: H2A and H2B. The subjects in H2A are: 4, 7, 12, 15, 18, 20, 30, 31, 34, 38, 40, 41, 42, 52, 57, 60, 62, 66, 67, 68, 71, 73, 75, 79, 80, 81, 86, 87, 91, 96, 100. This subgroup is composed of signatures with traces that resemble letters but are not perfectly identifiable, and that include circling traces of large or moderate size; signatures in this subgroup are typically framed by large loops. The H2B subgroup is similar to the previous one, i.e., it is formed by signatures with large and medium-size circling traces, but with more identifiable letters than in the previous groups: names and surnames are more readable. It is formed by the following signatures: 6, 9, 13, 25, 33, 45, 50, 63, 65, 76, 78, 82, 84, 85, 88, 92, 94, 95, 97, 99.

The H3 group is formed by the fusion of two highly unbalanced subgroups: H3A, with only two subjects (44, 46), and H3B, with thirteen subjects (3, 11, 14, 26, 47, 53, 54, 56, 61, 72, 90, 93, 98). These two clusters form at approximately the same level. The former is composed of calligraphic signatures where vertical traces predominate over horizontal ones. The latter is composed of highly cursive signatures, where the separation between the name and the surname predominates.

The same clusters were obtained with the Manhattan ($L_1$ norm) and maximum ($L_\infty$ norm) distances, showing that Entropy is an expressive and stable quantifier. Similar analyses were carried out with the Permutation Statistical Complexity and the Permutation Fisher Information (Figs. S5 and S6 in the Supplementary Information). Complexity produces the same clusters identified by Entropy, so it adds no new information. The Fisher information measure forms the same H1A group identified by the Entropy, but with less cohesion; in other words, these nine subjects are more similar locally than globally. As with Entropy, three main groups form at similar levels. The members of these clusters are slight variations of those identified using Entropy, with very similar structure.

Table 1 presents the mean and standard deviation of the three quantifiers over the genuine and skilled forgery signatures ($X$ and $Y$ time series) for each of the typical subjects, split into the three aforementioned types H1, H2 and H3. There are interesting tendencies in these data. Genuine signatures present lower quantifier values than forgery signatures, and the latter also exhibit larger standard deviations. This could be explained by the imitative character of forgeries, although it deserves closer study.

Entropy Complexity Fisher Information
Subject Coordinate Class Mean S.D. Mean S.D. Mean S.D.
H1 H1A 22 F
H1B 39 F
H2 H2A 60 F
H2B 6 F
H3 H3A 98 F
H3B 46 F
Table 1: Sample mean and standard deviation (S.D.) of the time series quantifiers for the 25 genuine (G) and 25 skilled forgery (F) signatures, for each of the typical subjects: H1A, H1B, H2A, H2B, H3A, and H3B (same order as in Fig. 1).

The classification of genuine signatures into subclasses was also carried out with the parallelepiped algorithm[49], arguably the simplest model-free classification procedure. Entropy leads to clusters that are easy to interpret. Figure 6 shows the regions that define the three classes identified by the dendrogram based on Entropy presented in Fig. 5. All subclasses are well separated by disjoint boxes, with the only exception of H1B and H2A, which overlap slightly without compromising the discrimination. The classes are preserved when this classification is applied to the Complexity and Fisher Information features; see Figs. S7 and S8 in the Supplementary Information.
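A minimal Python sketch of a parallelepiped classifier follows. It assumes boxes spanned by the per-class minimum and maximum of each feature; other variants use mean ± k·S.D. thresholds, and the exact rule from [49] used here is not specified in the text. Function names and the toy points are hypothetical. A point may fall in no box, in one box, or, where boxes overlap as H1B and H2A do, in several.

```python
import numpy as np

def fit_parallelepipeds(X, labels):
    """One axis-aligned box per class, spanned by the per-feature min and max of its members."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return {c: (X[labels == c].min(axis=0), X[labels == c].max(axis=0))
            for c in np.unique(labels)}

def classes_containing(x, boxes):
    """All classes whose box contains x: none (unclassified), one, or several (overlap)."""
    x = np.asarray(x, dtype=float)
    return [str(c) for c, (low, high) in boxes.items() if np.all((low <= x) & (x <= high))]

# Toy (mean, S.D.)-style points for two hypothetical classes
boxes = fit_parallelepipeds([[0.60, 0.62], [0.63, 0.65], [0.80, 0.79], [0.82, 0.84]],
                            ["H1", "H1", "H2", "H2"])
print(classes_containing([0.61, 0.63], boxes), classes_containing([0.70, 0.70], boxes))
```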

Figure 6: Classification by the rule of the parallelepiped of genuine signatures using Entropy (one signature example from each of the three groups is shown). Each subject is identified by its ID.

Online signature verification

The problem at hand consists in identifying suspicious signatures given that we only have examples of genuine signatures. In practice, it is too expensive, too hard or even impossible to obtain a significant number of good-quality forgeries for every individual in the database. This is, therefore, a one-class classification problem. Among the many ways of tackling such problems, Support Vector Machines (SVMs) are suitable for solving machine learning problems even in high-dimensional feature spaces[50, 51, 52].

SVMs were introduced by Vapnik and co-workers[51, 53], and extended by a number of other researchers. Their remarkably robust performance with respect to sparse and noisy data makes them a popular choice in many applications. An SVM is primarily a method that performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different class labels. SVMs perform both regression and classification tasks and can handle multiple continuous and categorical variables. To construct an optimal hyperplane, an SVM employs an iterative training algorithm that minimizes an error function.

One-Class Support Vector Machines (OC-SVMs) are a natural extension of SVMs[54, 55]. The solution consists in estimating a distribution that encompasses most of the observations, and then labeling as “suspicious” those that lie far from it with respect to a suitable metric. An OC-SVM solution is built by estimating a probability distribution function that makes most of the observed data more likely than the rest, together with a decision rule that separates these observations by the largest possible margin. The learning phase is computationally intensive because training an OC-SVM involves solving a quadratic programming problem[51], but once the decision function is determined, it can be used to predict the class label of new test data effortlessly.

In our case, the observations are six-dimensional vectors: Entropy, Complexity and Fisher Information in each of the two directions, horizontal and vertical, and we train the OC-SVM with genuine signatures. Let $x_1, \dots, x_\ell \in \mathbb{R}^6$ be the six-dimensional training examples of genuine signatures, and let $\Phi$ be a kernel map that transforms the training examples into another (feature) space. Then, to separate the data set from the origin, one needs to solve the following quadratic programming problem:

$$\min_{w,\,\boldsymbol{\xi},\,\rho}\ \frac{1}{2}\lVert w \rVert^2 + \frac{1}{\nu \ell}\sum_{i=1}^{\ell} \xi_i - \rho$$

subject to

$$\langle w, \Phi(x_i) \rangle \ge \rho - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, \ell,$$

where $\xi_i$ are nonnegative slack variables that allow the procedure to incur errors. The parameter $\nu \in (0, 1]$ characterizes the solution: a) it sets an upper bound on the fraction of outliers (training examples regarded as out-of-class), and b) it is a lower bound on the fraction of training examples used as Support Vectors. We used a fixed value of $\nu$ in our proposal.

Using Lagrange techniques and a kernel function $K(x, x') = \langle \Phi(x), \Phi(x') \rangle$ for the dot-product calculations, the decision function becomes:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{\ell} \alpha_i K(x_i, x) - \rho \right).$$

This method thus creates a hyperplane, characterized by $w = \sum_{i} \alpha_i \Phi(x_i)$ and $\rho$, which has maximal distance from the origin in the feature space and separates the data points from the origin, up to the errors allowed by the slack variables. Here $\alpha_i$ are the Lagrange multipliers; every $x_i$ with $\alpha_i > 0$ is weighted in the decision function and thus “supports” the machine, hence the name Support Vector Machine. Since SVM solutions are sparse, relatively few Lagrange multipliers have a nonzero value.

Our choice for the kernel is the Gaussian Radial Basis Function:

$$K(x, x') = \exp\left( -\gamma\, d(x, x')^2 \right),$$

where $\gamma > 0$ is a kernel parameter and $d$ is the dissimilarity measure; we used the Euclidean distance.
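To make the decision function concrete, the following Python sketch evaluates $\sum_i \alpha_i K(x_i, x) - \rho$ with the Gaussian RBF kernel and Euclidean distance, labeling points with a nonnegative value as typical (+1); treating a zero value as typical is our convention. The support vectors, multipliers $\alpha_i$, offset $\rho$ and $\gamma$ in the usage lines are hypothetical numbers rather than the output of a trained model; in practice they come from solving the quadratic program.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian RBF kernel exp(-gamma * ||a - b||^2) between the rows of a and the rows of b."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    sq_dist = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative round-off

def decision_function(x, support_vectors, alphas, rho, gamma):
    """sum_i alpha_i K(x_i, x) - rho for each row of x."""
    return rbf_kernel(x, support_vectors, gamma) @ alphas - rho

def predict(x, support_vectors, alphas, rho, gamma):
    """+1 (typical) where the decision value is >= 0, -1 (suspicious) otherwise."""
    return np.where(decision_function(x, support_vectors, alphas, rho, gamma) >= 0, 1, -1)

# Hypothetical trained quantities: two support vectors in a 2-D toy space
support_vectors = np.array([[0.0, 0.0], [1.0, 0.0]])
alphas = np.array([0.5, 0.5])
print(predict(np.array([[0.5, 0.0], [5.0, 5.0]]), support_vectors, alphas, rho=0.3, gamma=1.0))
```

Only the support vectors (points with $\alpha_i > 0$) enter the sum, which is why prediction is cheap once training is done.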

The parameter was selected by 5-fold cross-validation; that is, the dataset is divided into five disjoint subsets, and the method is repeated five times. Each time, one of the subsets is used as the test set and the other four are put together to form the training set; then the average error across all trials is computed. Every observation belongs to a test set exactly once and to a training set four times. Accuracy (ACC), Area Under the ROC Curve (AUC) and Equal Error Rate (EER) are used as performance measures [56].
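AUC and EER can be computed from verification scores. In this Python sketch we assume that higher scores mean “more genuine” (as with an OC-SVM decision value) and that a signature is accepted when its score reaches the threshold. AUC is obtained as the Mann–Whitney probability that a genuine score exceeds a forgery score, and the EER by sweeping thresholds over the observed scores, which is an approximation on finite samples; [56] does not prescribe this particular procedure, and the function names are illustrative.

```python
import numpy as np

def auc(genuine_scores, forgery_scores):
    """AUC as the probability that a genuine score exceeds a forgery score (ties count 1/2)."""
    g = np.asarray(genuine_scores, dtype=float)[:, None]
    f = np.asarray(forgery_scores, dtype=float)[None, :]
    return float(np.mean((g > f) + 0.5 * (g == f)))

def eer(genuine_scores, forgery_scores):
    """Equal Error Rate: mean of FAR and FRR at the threshold where they are closest."""
    g = np.asarray(genuine_scores, dtype=float)
    f = np.asarray(forgery_scores, dtype=float)
    best_gap, best_rate = np.inf, None
    for t in np.unique(np.concatenate([g, f])):
        frr = np.mean(g < t)   # genuine signatures rejected at threshold t
        far = np.mean(f >= t)  # forgeries accepted at threshold t
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return float(best_rate)

print(auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]), eer([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
```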

In the context of one-class signature verification, a false positive occurs when a genuine signature is erroneously classified as atypical. The probability of this misclassification is the false positive rate, which is controlled by the parameters of the aforementioned OC-SVM formulation. The parameter $\nu$ can be fixed a priori, and it corresponds to the fraction of observations of the typical data that will be assigned to the Type I error.

We used the LIBSVM (version 2.0) tool[57], linked with the R software, which implements support vector classification and regression, including OC-SVMs. We used the default parameters of the algorithm.
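The verification step can be sketched in Python with scikit-learn's `OneClassSVM`, which is built on LIBSVM; this is a swapped-in interface, not the R setup used in this work. The training vectors are synthetic stand-ins for one writer's six features, and `nu=0.1` and `gamma="scale"` are assumed settings, not the values used here.

```python
import numpy as np
from sklearn.svm import OneClassSVM  # scikit-learn's one-class SVM, implemented on top of LIBSVM

rng = np.random.default_rng(0)
# Hypothetical six-dimensional feature vectors (H, C, F for X and for Y) of one writer;
# synthetic numbers for illustration only, not MCYT measurements
genuine_train = rng.normal(loc=0.5, scale=0.02, size=(10, 6))

# nu bounds the fraction of training signatures treated as outliers; 0.1 is an assumed value
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(genuine_train)

def verify(feature_vector):
    """+1: consistent with the writer's genuine signatures; -1: flagged as suspicious."""
    return int(model.predict(np.atleast_2d(feature_vector))[0])

print(verify(genuine_train.mean(axis=0)))  # centre of the genuine cloud
print(verify(np.full(6, 0.9)))             # far from every genuine example
```

`predict` returns +1 for vectors inside the learned support and -1 for vectors flagged as suspicious; `decision_function` gives the underlying signed score, which can feed AUC and EER computations.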

In order to assess the consistency of our procedure, and to facilitate comparison with other methods reported in the literature, we evaluate the performance of the proposed verification system for different training samples: random samples of size $n \in \{5, 10, 14, 18, 22\}$ of genuine signatures were selected for each user. Table 2 presents the average values of all performance metrics. ACC suggests that the larger the training sample, the better the performance. AUC is stable across training sizes, and its average is above 0.88 in every case, indicating that our verification system produces an excellent classification.

n     ACC (↑)   AUC (↑)   EER (%) (↓)
5     0.6940    0.8816    0.1890
10    0.7678    0.8940    0.1711
14    0.8144    0.8975    0.1634
18    0.8250    0.8866    0.1731
22    0.8389    0.8909    0.1632
Table 2: Performance of the system trained with a varying number n of genuine signatures. ↑ and ↓ denote measures of quality (the higher the better) and of error (the smaller the better), respectively.

As mentioned in the introduction, the two methodologies with the best results are those based on Dynamic Time Warping (DTW) and Hidden Markov Models (HMM). In the following, we compare our proposal with recent state-of-the-art methods of these two types, using the EER (%) on the same database:

  • Fierrez-Aguilar et al.[58], EER (%) = 2.12 (five training signatures; fusion of global (Parzen WC) and local (HMM) experts);

  • Fierrez-Aguilar et al.[59], EER (%) = 0.74 (ten training signatures; HMM-based algorithm);

  • Pascual-Gaspar et al.[60], EER (%) = 1.23 (five training signatures; DTW-based algorithm, result with scenario-dependent optimal features).

The results of our proposal using five (ten, respectively) training samples are EER (%) = 0.19 (0.17, respectively). Clearly, our system provides better performance using a similar number of training signatures (see Table 2 for more details).

In the following, we analyze the performance of the proposed procedure applied selectively to the pre-classified samples. Table 3 presents the performance of the system when applied to pre-classified genuine signatures. For all classes we observe that the larger the training sample, the larger the average ACC. The best average AUC values are observed for class H2, followed by H1 and H3, which indicates that H2 signatures are easily identifiable. Note that the mean EER (%) values for H2 are smaller than those for H1 and H3. The EER (%) values in H3 indicate that identifying forgeries in this class is hard.

Class  n     ACC (↑)   AUC (↑)   EER (%) (↓)
H1     5     0.6758    0.8692    0.1976
H1     10    0.7566    0.8828    0.1812
H1     14    0.8039    0.8857    0.1717
H1     18    0.8217    0.8894    0.1662
H1     22    0.8277    0.8788    0.1631
H2     5     0.7059    0.8945    0.1784
H2     10    0.7819    0.9079    0.1548
H2     14    0.8284    0.9096    0.1509
H2     18    0.8327    0.8900    0.1734
H2     22    0.8515    0.8996    0.1608
H3     5     0.6948    0.8653    0.2053
H3     10    0.7450    0.8720    0.2036
H3     14    0.7907    0.8832    0.1874
H3     18    0.8062    0.8686    0.1874
H3     22    0.8214    0.8889    0.1716
Table 3: Performance of the classification of pre-classified samples, varying the number n of genuine signatures used for training; same coding as in Table 2.


We proposed a procedure for identifying skilled forgery online handwritten signatures using time causal Information Theory quantifiers and One-Class Support Vector Machines. This is a competitive proposal from the computational viewpoint, as it uses only the signature coordinates, and it produces better results than state-of-the-art techniques. The technique also produces a meaningful classification of the input data, as it is able to separate different types of signatures. To the best of our knowledge, this is the first time Information Theory quantifiers have been used for this problem.

The central contribution is the use of the Bandt and Pompe (BP) PDF symbolization which is invariant to a number of transformations of the input data. In fact, the original time series are pre-processed only to facilitate the signal sampling, and this scaling has no effect on the BP PDFs. This representation, which is sensitive to the time causality, is able to capture essential dynamical characteristics of the signatures that lead to excellent discrimination between skilled forgery and genuine online handwritten signatures, despite the high variability the data possess. Additionally, obtaining the BP PDFs is computationally simple and efficient.

Only six Information Theory features are required for the classification, three from each of the horizontal and vertical directions: Shannon Entropy, Statistical Complexity and Fisher Information. This contrasts with many state-of-the-art works that require features in high-dimensional spaces, e.g., forty dimensions or even more. As noted, our proposal does not require highly specialized hardware able to capture signature speed, pressure, orientation, etc.

The classification was performed by a One-Class Support Vector Machine trained with genuine signatures. The learned rule is consistent with respect to the number of training samples, and with as few as five examples it surpasses the performance of recent successful techniques. We assessed the performance of our proposal using the same database employed in the current literature and the same measures of quality and error.


The authors are grateful to CONICET, CNPq and FACEPE for partial funding of this research. The Biometrics Research Lab (ATVS), Universidad Autónoma de Madrid, provided the MCYT-100 signature corpus employed in this work.

Authors Contributions

OAR, RO and ACF conceived and designed the research. OAR performed the numerical data analysis. RO and ACF performed the statistical analysis. OAR and RO prepared figures. OAR and ACF wrote the manuscript. All authors reviewed and approved the manuscript.

Competing interests

The authors declare no competing financial interests.


  • [1] Ortega-Garcia J, Bigun J, Reynolds D, Gonzalez-Rodriguez J. Authentication gets personal with biometrics. IEEE Signal Processing Magazine. 2004 Mar;21(2):50–62.
  • [2] Plamondon R, Lorette G. Automatic signature verification and writer identification: the state of the art. Pattern Recognition. 1989;22(2):107–131.
  • [3] Leclerc F, Plamondon R. Automatic signature verification: the state of the art: 1989–1993. International Journal of Pattern Recognition and Artificial Intelligence. 1994;8(3):643–660.
  • [4] Gupta G, McCabe A. A review of dynamic handwritten signature verification. Department of Computer Science, James Cook University, Australia; 1997.
  • [5] Impedovo D, Pirlo G. Automatic Signature Verification: The State of the Art. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 2008 Sept;38(5):609–635.
  • [6] El-Henawy IM, Rashad MZ, Nomir O, Ahmed K. Online signature verification: state of the art. International Journal of Computers and Technology. 2013;4:664–678.
  • [7] Hilton O. Signatures, review and a new view. Journal of Forensic Sciences. 1992;37:125–129.
  • [8] Wikipedia. Ballistic movement, Wikipedia, The Free Encyclopedia; 2015. [Online; accessed 18-January-2016].
  • [9] Denier van der Gon JJ, Thuring JP. The guiding of human writing movement. Kybernetik. 1965;2:145–148.
  • [10] Nalwa VS. Automatic on-line signature verification. Proceedings of the IEEE. 1997 Feb;85(2):215–239.
  • [11] Longstaff M, Heath R. Chaos and fractals in human physiology. Scientific American. 1990;262:42–49.
  • [12] West BJ. Fractal Physiology and Chaos in Medicine. 2nd ed. No. 16 in Studies of Nonlinear Phenomena in Life Science. Singapore: World Scientific Publishing; 2013.
  • [13] Longstaff M, Heath R. A nonlinear analysis of temporal characteristic of handwriting. Human Movement Science. 1999;18:485–524.
  • [14] Rosso OA, Larrondo HA, Martín MT, Plastino A, Fuentes MA. Distinguishing noise from chaos. Physical Review Letters. 2007;99:154102.
  • [15] Rosso OA, Olivares F, Plastino A. Noise versus chaos in a causal Fisher-Shannon plane. Papers in Physics. 2015 Apr;7:070006.
  • [16] Ortega-Garcia J, Fierrez-Aguilar J, Simon D, Gonzalez J, Faundez-Zanuy M, Espinosa V, et al. MCYT baseline corpus: a bimodal biometric database. IEE Proceedings Vision, Image and Signal Processing. 2003 Dec;150(6):395–401.
  • [17] Baron R, Plamondon R. Acceleration measurement with an instrumented pen for signature verification and handwriting analysis. IEEE Transactions on Instrumentation and Measurement. 1989 Dec;38(6):1132–1138.
  • [18] Garcia-Salicetti S, Houmani N, Ly-Van B, Dorizzi B, Alonso-Fernandez F, Fierrez J, Ortega-Garcia J, et al. Online handwritten signature verification. In: Petrovska-Delacrétaz D, Chollet G, Dorizzi B, editors. Guide to Biometric Reference Systems and Performance Evaluation. London: Springer-Verlag; 2009. p. 125–165.
  • [19] Brissaud JB. The meanings of entropy. Entropy. 2005;7(1):68–96.
  • [20] Shannon CE. A Mathematical Theory of Communication. The Bell System Technical Journal. 1948;27:379–423.
  • [21] Shannon C, Weaver W. The Mathematical Theory of Communication. University of Illinois Press; 1949.
  • [22] Fisher RA. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A. 1922;222:309–368.
  • [23] Frieden BR. Science from Fisher information: A Unification. Cambridge, UK: Cambridge University Press; 2004.
  • [24] Zografos K, Ferentinos K, Papaioannou T. Discrete approximations to the Csiszár, Renyi, and Fisher measures of information. Canadian Journal of Statistics. 1986;14(4):355–366.
  • [25] Pardo L, Morales D, Ferentinos K, Zografos K. Discretization problems on generalized entropies and R-divergences. Kybernetika. 1994;30:445–460.
  • [26] Sanchez-Moreno P, Dehesa JS, Yáñez RJ. Discrete densities and Fisher Information. In: Proceedings of the 14th International Conference on Difference Equations and Applications. Difference Equations and Applications. Istanbul, Turkey: Uğur-Bahçeşehir University Publishing Company; 2009.
  • [27] López-Ruiz R, Mancini HL, Calbet X. A statistical measure of complexity. Physics Letters A. 1995;209(5–6):321–326.
  • [28] Lamberti PW, Martín MT, Plastino A, Rosso OA. Intensive entropic non-triviality measure. Physica A: Statistical Mechanics and its Applications. 2004;334(1–2):119–131.
  • [29] Grosse I, Bernaola-Galván P, Carpena P, Román-Roldán R, Oliver J, Stanley HE. Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys Rev E. 2002 Mar;65:041905.
  • [30] Martin MT, Plastino A, Rosso OA. Generalized statistical complexity measures: Geometrical and analytical properties. Physica A. 2006;369:439–462.
  • [31] Olivares F, Plastino A, Rosso OA. Ambiguities in Bandt-Pompe's methodology for local entropic quantifiers. Physica A: Statistical Mechanics and its Applications. 2012;391(8):2518–2526.
  • [32] Olivares F, Plastino A, Rosso OA. Contrasting chaos with noise via local versus global information quantifiers. Physics Letters A. 2012;376(19):1577–1583.
  • [33] Schwarz K. The Archive of Interesting Code; 2011.
  • [34] Vignat C, Bercher JF. Analysis of signals in the Fisher-Shannon information plane. Physics Letters A. 2003;312(1–2):27–33.
  • [35] Bandt C, Pompe B. Permutation Entropy: A Natural Complexity Measure for Time Series. Physical Review Letters. 2002 Apr;88:174102.
  • [36] Rosso OA, Olivares F, Zunino L, De Micco L, Aquino ALL, Plastino A, et al. Characterization of chaotic maps using the permutation Bandt-Pompe probability distribution. European Physical Journal B. 2013 Mar;86(4):116–129.
  • [37] Rosso OA, Masoller C. Detecting and quantifying stochastic and coherence resonances via information-theory complexity measurements. Physical Review E. 2009 Apr;79:040106.
  • [38] Rosso OA, Masoller C. Detecting and quantifying temporal correlations in stochastic resonance via information theory measures. European Physical Journal B. 2009 Apr;69(1):37–43.
  • [39] Saco PM, Carpi LC, Figliola A, Serrano E, Rosso OA. Entropy analysis of the dynamics of El Niño/Southern Oscillation during the Holocene. Physica A: Statistical Mechanics and its Applications. 2010;389(21):5022–5027.
  • [40] Parlitz U, Berg S, Luther S, Schirdewan A, Kurths J, Wessel N. Classifying cardiac biosignals using ordinal pattern statistics and symbolic dynamics. Computers in Biology and Medicine;42(3):319–327.
  • [41] Kowalski AM, Martín MT, Plastino A, Rosso OA. Bandt-Pompe approach to the classical-quantum transition. Physica D: Nonlinear Phenomena. 2007;233(1):21–31.
  • [42] Zunino L, Soriano MC, Fischer I, Rosso OA, Mirasso CR. Permutation-information-theory approach to unveil delay dynamics from time-series analysis. Physical Review E. 2010 Oct;82:046212.
  • [43] Soriano MC, Zunino L, Rosso OA, Fischer I, Mirasso CR. Time Scales of a Chaotic Semiconductor Laser With Optical Feedback Under the Lens of a Permutation Information Analysis. IEEE Journal of Quantum Electronics. 2011 Feb;47(2):252–261.
  • [44] Zunino L, Soriano MC, Rosso OA. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Physical Review E. 2012 Oct;86:046210.
  • [45] Zanin M, Zunino L, Rosso OA, Papo D. Permutation Entropy and Its Main Biomedical and Econophysics Applications: A Review. Entropy. 2012;14(8):1553.
  • [46] Bouletreau V, Vincent N, Sabourin R, Emptoz H. Handwriting and signature: one or two personality identifiers? In: Proceedings. Fourteenth International Conference on Pattern Recognition. vol. 2; 1998. p. 1758–1760.
  • [47] Houmani N, Garcia-Salicetti S. Quality measures for online handwritten signatures. In: Scharcanski J, Proença H, Du E, editors. Signal and Image Processing for Biometrics. No. 292 in Lecture Notes in Electrical Engineering. Springer; 2014. p. 255–283.
  • [48] Vincent N, Boulétreau V, Emptoz H, Sabourin R. How to use fractal dimensions to qualify writings and writers. Fractals. 2000;8:85–97.
  • [49] Richards JA, Jia X. Remote Sensing Digital Image Analysis. 4th ed. Berlin: Springer; 2006.
  • [50] Campbell C, Ying Y. Learning with Support Vector Machines. In: Brachman RJ, Dietterich T, editors. Synthesis Lectures on Artificial Intelligence and Machine Learning. No. 5 in Synthesis Lectures on Artificial Intelligence and Machine Learning. Santa Fe, CA: Morgan and Claypool; 2011. p. 1–95.
  • [51] Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. Pittsburgh: ACM Press; 1992. p. 144–152.
  • [52] Vapnik VN. The Nature of Statistical Learning Theory. Springer; 1995.
  • [53] Vapnik VN. Statistical Learning Theory. Wiley; 1998.
  • [54] Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC. Estimating the Support of a High-Dimensional Distribution. Neural Computation. 2001 Jul;13(7):1443–1471.
  • [55] Schölkopf B, Smola AJ. Learning with Kernels. Cambridge: MIT Press; 2002.
  • [56] Provost F, Kohavi R. On Applied Research in Machine Learning. Machine Learning. 1998;30(2/3):127–132.
  • [57] Chang CC, Lin CJ. LIBSVM: a library for support vector machines; 2001.
  • [58] Fierrez-Aguilar J, Nanni L, Lopez-Peñalba J, Ortega-Garcia J, Maltoni D. An On-Line Signature Verification System Based on Fusion of Local and Global Information. In: Kanade T, Jain A, Ratha N, editors. 5th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA). vol. 3546 of Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2005. p. 523–532.
  • [59] Fierrez J, Ortega-Garcia J, Ramos D, Gonzalez-Rodriguez J. HMM-based on-line signature verification: Feature extraction and signature modeling. Pattern Recognition Letters. 2007;28(16):2325–2334.
  • [60] Pascual-Gaspar JM, Cardeñoso Payo V, Vivaracho-Pascual CE. Practical On-Line Signature Verification. In: Tistarelli M, Nixon MS, editors. Proceedings Third International Conference Advances in Biometrics ICB. vol. 5558 of Lecture Notes in Computer Science. Springer Berlin Heidelberg; 2009. p. 1180–1189.