Highly Efficient Human Action Recognition with Quantum Genetic Algorithm Optimized Support Vector Machine

by   Yafeng Liu, et al.

In this paper we propose using a quantum genetic algorithm to optimize the support vector machine (SVM) for human action recognition. The Microsoft Kinect sensor can be used for skeleton tracking, which provides the joints' position data. However, extracting motion features that represent the dynamics of a human skeleton remains a challenge because of the complexity of human motion. We present a highly efficient feature extraction method for action classification: the joint angles represent the human skeleton, and the variance of each angle is calculated over an action time window. Using the proposed representation, we compare the human action classification accuracy of two approaches: the SVM optimized by the quantum genetic algorithm and the conventional SVM with cross validation. Experimental results on the MSRC-12 dataset show a higher accuracy for the quantum genetic algorithm optimized support vector machine.




I Introduction

Human action recognition (HAR) plays an important role in video surveillance, health care and human-computer interaction (HCI) (Chernbumroong et al., 2014). One goal of HAR is to provide information about a user's actions with the help of a computer. This information can be widely applied in artificial intelligence. For example, recognizing and predicting the actions of elderly people helps with their health care (Abowd et al., 1998). HAR also plays a key role in natural interaction within HCI. Moreover, HAR brings a new perspective to some traditional areas, e.g. sports motion analysis, virtual reality (VR), augmented reality (AR) and other human-computer interaction applications.

According to classical Newtonian mechanics, the motion of an object is completely determined once its initial state and the driving forces are known. However, this approach does not work here, because the actions of the human body are complex and involve a large number of interactions. Action patterns are therefore not easy to describe (Aggarwal and Ryoo, 2011), and additional tools are needed. The common research tools fall into two categories: video-based methods and sensor-based methods (Woznowski et al., 2016). We use the sensor-based approach here. The development of new sensing devices, e.g. the Microsoft Kinect and other RGB-D devices, brings new opportunities for HAR researchers (Presti and La Cascia, 2016). These devices are inexpensive and portable, and they can be used for skeleton tracking, which provides the joints' information. One question is how to represent the input samples, since a good sample representation makes the problem simpler and the classification more accurate. Joint positions, key poses and joint angles are common sample representations. This paper presents a feature extraction approach that considers only the information obtained from the 3-dimensional skeletal joints. We extract the skeletal features by computing the angles formed at each joint by its adjacent limbs and then calculating the variance of each angle over the time period in which an action is performed.

Another question is how to choose an effective pattern recognition algorithm. Many methods have been tried, such as Decision Tree (DT), Bayes methods, k-Nearest Neighbour (kNN), Neural Network (NN), Support Vector Machine (SVM) and Hidden Markov Model (HMM) (Seddik et al., 2017; Gaglio et al., 2015). Among them, support vector machines are widely used because of their simplicity and efficiency. The support vector machine (Cortes and Vapnik, 1995) classifies data by constructing a hyperplane that separates the different categories from each other. Nevertheless, finding appropriate parameters for the SVM is not easy because the grid search method has a limited searching capability, so the best classification results may not be achieved. An inappropriate parameter decreases the performance of the SVM classifier, so several methods have been tried for parameter optimization. Grid search, the particle swarm algorithm and the genetic algorithm are commonly used.

Here we use the quantum genetic algorithm to improve the efficiency of SVM parameter optimization. The quantum algorithm is based on the correlation of quantum bits (Mosca, 2008; Nielsen and Chuang, 2011; Jones, 2013), which gives the algorithm its parallelism. Compared with the classical algorithm, the computational efficiency is greatly improved (Lenstra, 2000; Jones, 2013). Because the search is more efficient, the population search range of the SVM parameters can be enlarged. In recent years, quantum genetic algorithms have been widely used in machine fault diagnosis, geology research and environmental analysis (Zhang and Jiang, 2017; Wei et al., 2016; Chen et al., 2016; Zhou and An, 2010; Xie et al., 2015). In this paper, we use the quantum genetic algorithm to optimize the SVM and thereby build a better SVM model for classifying human actions.

The rest of the paper is organized as follows. Section II presents the Kinect-based feature extraction and the classification algorithm. We describe the experimental results in Section III and conclude the paper in Section IV.

II Methods

II.1 Feature Extraction of Human Action

(a) The skeleton joints
(b) Labels of the angle samples at the hip center and the shoulder center.
Figure 1: The schematic diagram of the skeleton joints

As shown in Figure 1(a), we get the 3D position of each skeleton joint with the Kinect. The human action sample is defined as

$$X = (P_1, P_2, \dots, P_{20}), \tag{1}$$

where $P_i = (x_i, y_i, z_i)$ is the coordinates of the $i$-th joint. As there are 20 joints, we get a 60-dimensional vector. In practice, we do not need accurate joint positions for human activity recognition, since the relative positions meet our requirements. We calculate the relative position of two adjacent joints. This vector gives the direction of the limb between these two joints. At a joint connected with multiple limbs, angles are formed, as shown in Figure 1(b).

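Consider a skeleton joint $P_i$ and its two adjacent joints $P_j$ and $P_k$, with coordinates

$$P_i = (x_i, y_i, z_i), \quad P_j = (x_j, y_j, z_j), \quad P_k = (x_k, y_k, z_k). \tag{2}$$

Each limb vector is defined by two adjacent joints:

$$\vec{v}_{ij} = P_j - P_i, \quad \vec{v}_{ik} = P_k - P_i. \tag{3}$$

The angle formed by these two limbs is

$$\theta = \arccos\frac{\vec{v}_{ij} \cdot \vec{v}_{ik}}{\lVert\vec{v}_{ij}\rVert\,\lVert\vec{v}_{ik}\rVert}. \tag{4}$$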


As Fig 1(a) shows, five joints have only one junction (red points in Fig 1(a)), and no angles exist at these joints. Thirteen joints have two junctions (blue points in Fig 1(a)), and each of these joints has one angle. Only one joint has three junctions (yellow point in Fig 1(a)); choosing two of its three limbs gives $C_3^2 = 3$ angles at this joint. Finally, the shoulder center joint has four junctions (black point in Fig 1(a)), so there are $C_4^2 = 6$ angles at it. The total number of angles over the body joints is $13 + 3 + 6 = 22$.
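The angle computation and the angle count can be illustrated with a minimal Python sketch. The function names `joint_angle` and `angles_at_joint` and the sample coordinates are illustrative assumptions rather than part of the original method description; the formula is the standard arccosine of the normalized dot product of the two limb vectors.

```python
import math
from itertools import combinations


def joint_angle(center, a, b):
    """Angle in radians at `center` between the limbs center->a and center->b."""
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    # Assumes both limbs have non-zero length.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def angles_at_joint(center, neighbours):
    """One angle for every pair of limbs that meet at `center`."""
    return [joint_angle(center, a, b) for a, b in combinations(neighbours, 2)]


# Hypothetical coordinates: a joint at the origin with two perpendicular limbs.
right_angle = joint_angle((0, 0, 0), (1, 0, 0), (0, 1, 0))

# Angle count for the 20-joint skeleton: 13 joints with two limbs,
# one joint with three limbs and one joint with four limbs.
total_angles = 13 * math.comb(2, 2) + math.comb(3, 2) + math.comb(4, 2)
```

A joint with four neighbours yields six angles from `angles_at_joint`, which matches the count given for the shoulder center.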

The relationship between the skeleton joints and the angles is shown in Table 1. The left column of the table shows the spatial position labels of the joints, which are all three-dimensional vectors. The right column shows the labels of the intersection angles between limbs, which are all scalars. The angle labels are arranged in the order of the position samples. We can see that the dimension of a sample is reduced from 60 to 22 with the angle strategy. However, there are two special joints, joint 1 and joint 3 (see Fig 1(b)), at which more than one angle exists. We define an additional ranking rule here: the angles at the same joint are arranged according to the order of the labels of the joints next to it. For example, the first joint, the hip center, has three connecting joints, namely the 2nd, the 13th and the 17th joint, as in the upper part of Fig 1(b). Its angles are ordered as follows: first the angle formed by joint 2 and joint 13, then the angle formed by joint 2 and joint 17, and finally the angle formed by joint 13 and joint 17. Joint 3 is handled in the same way (see the lower part of Fig 1(b)).

The index of skeleton joints The index of joint angles
Table 1: The correspondence between the skeleton joints and the angles. Starting from the fourth joint, every fourth joint is connected to only one other joint and cannot form any angle. Apart from the special joints 1 and 3, all the other joints connect to two joints and form one angle. The table does not show all mapping relations; the omitted part is represented by an ellipsis.

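We use the angle representation to reduce the 60-dimensional vector to a 22-dimensional vector. For a continuous action, we need to add timing information and process multiple frames together for action recognition. Here we set a time window for action segmentation. Assume the action lasts for $t$ s, during which $n$ frames are acquired from the Kinect. We can get the variance of each angle in this time window:

$$\sigma_j^2 = \frac{1}{n}\sum_{f=1}^{n}\left(\theta_j^{(f)} - \bar{\theta}_j\right)^2, \quad \bar{\theta}_j = \frac{1}{n}\sum_{f=1}^{n}\theta_j^{(f)}, \quad j = 1, \dots, 22, \tag{5}$$

where $\theta_j^{(f)}$ is the $j$-th angle in frame $f$.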
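As a rough illustration of the windowed feature, the sketch below turns the per-frame angle vectors of one time window into one variance per angle. The function name `angle_variances`, the toy window and the choice of the population variance (division by the number of frames) are assumptions made for this example; the text does not state which normalization is used.

```python
def angle_variances(frames):
    """Variance of each angle over a time window.

    `frames` is a list of per-frame angle lists (n frames, m angles each).
    Returns a list of m variances, one per angle.
    """
    n = len(frames)
    m = len(frames[0])
    means = [sum(frame[j] for frame in frames) / n for j in range(m)]
    # Population variance: divide by n (an assumption, see the lead-in).
    return [sum((frame[j] - means[j]) ** 2 for frame in frames) / n
            for j in range(m)]


# Hypothetical window of three frames with two angles each (radians).
window = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]]
features = angle_variances(window)
```

An angle that never changes within the window, like the second one above, contributes a zero feature, so the variance vector emphasizes the limbs that actually move.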


II.2 Quantum Genetic Algorithm (QGA)

The quantum genetic algorithm is an optimization algorithm based on quantum computing theory. The basic representation in quantum theory is a coherent state, which is very different from the classical one. Here, we use the state vector to describe the genetic coding and the quantum logic gate to realize the evolution of the population. Because of this representation, the quantum algorithm has the characteristic of parallelism, and for this reason it searches faster than the traditional algorithm.

The coding of the quantum genetic algorithm. Binary and decimal codes are used in the classical genetic algorithm. When quantum bits are used, the encoding is different. Quantum states exhibit superposition and coherence, so unlike classical bits, quantum bits can be entangled. A quantum bit cannot simply be written as the state 0 or 1, but as an arbitrary superposition of the two, so it can be written as:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \tag{6}$$

where $|0\rangle$ and $|1\rangle$ are both vectors, representing the system states. $\alpha$ and $\beta$ are a pair of parameters, and their squared moduli correspond to the probabilities of measuring these two states. These two parameters satisfy the normalization rule:

$$|\alpha|^2 + |\beta|^2 = 1. \tag{7}$$

A chromosome with $m$ bits can be expressed as Eq. 8,

$$q = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_m \\ \beta_1 & \beta_2 & \cdots & \beta_m \end{pmatrix}, \tag{8}$$

and for each element of the chromosome, $|\alpha_i|^2 + |\beta_i|^2 = 1$ for $i = 1, \dots, m$.
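The quantum coding can be sketched in a few lines of Python. This is a classical simulation with real amplitudes, offered as an assumption-laden illustration: the names `init_chromosome` and `measure` are hypothetical, and the equal-superposition start value of 1/sqrt(2) is the usual choice in which both outcomes are equally likely.

```python
import math
import random


def init_chromosome(n_bits):
    """Chromosome of n_bits qubits, each stored as an (alpha, beta) pair."""
    amp = 1 / math.sqrt(2)  # equal superposition: P(0) = P(1) = 1/2
    return [(amp, amp) for _ in range(n_bits)]


def measure(chromosome, rng):
    """Collapse every qubit to a classical bit; 1 occurs with probability beta^2."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in chromosome]


rng = random.Random(0)  # fixed seed so the example is reproducible
chromosome = init_chromosome(8)
bits = measure(chromosome, rng)
```

Each call to `measure` yields one concrete binary string, so a single quantum chromosome stands for a whole distribution over classical chromosomes.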


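The quantum logic gate. In the quantum genetic algorithm, quantum bits are operated on through the quantum logic gate, which realizes the evolution of the population. The optimal gene is produced under the guidance of the rotation strategy (see Table 2), which speeds up the entire algorithm. The operation of a quantum logic gate can be expressed in the form of a matrix:

$$\begin{pmatrix} \alpha_i^{t+1} \\ \beta_i^{t+1} \end{pmatrix} = U(\Delta\theta_i) \begin{pmatrix} \alpha_i^{t} \\ \beta_i^{t} \end{pmatrix}, \tag{9}$$

where $(\alpha_i^{t}, \beta_i^{t})^T$ and $(\alpha_i^{t+1}, \beta_i^{t+1})^T$ represent the $i$-th quantum bit of the chromosome in generations $t$ and $t+1$, respectively. $U(\Delta\theta_i)$ represents the quantum logic gate:

$$U(\Delta\theta_i) = \begin{pmatrix} \cos\Delta\theta_i & -\sin\Delta\theta_i \\ \sin\Delta\theta_i & \cos\Delta\theta_i \end{pmatrix}. \tag{10}$$

$\Delta\theta_i$ is the rotation angle. The selection of its direction and magnitude is shown in Table 2.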
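A rotation gate acting on a single qubit can be written as a plain 2x2 rotation. The sketch below assumes real amplitudes stored as an `(alpha, beta)` pair; the function name `rotate` and the step size of 0.05π are illustrative choices, not values taken from the text.

```python
import math


def rotate(qubit, delta_theta):
    """Apply the rotation gate to an (alpha, beta) pair and return the new pair."""
    alpha, beta = qubit
    c, s = math.cos(delta_theta), math.sin(delta_theta)
    return (c * alpha - s * beta, s * alpha + c * beta)


amp = 1 / math.sqrt(2)
# A positive rotation of the equal superposition raises the probability of measuring 1.
rotated = rotate((amp, amp), 0.05 * math.pi)
```

Because the gate is a rotation, the normalization of the amplitudes is preserved automatically, so no renormalization step is needed after an update.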

0 0 FALSE 0 0 0 0 0
0 0 TRUE 0 0 0 0 0
0 1 FALSE +1 -1 0
0 1 TRUE -1 +1 0
1 0 FALSE -1 +1 0
1 0 TRUE +1 -1 0
1 1 FALSE 0 0 0 0 0
1 1 TRUE 0 0 0 0 0
Table 2: The rotation strategy of the quantum logic gate.

In Table 2, $x_i$ and $best_i$ represent the $i$-th bit of the current chromosome and of the current optimal chromosome, respectively. $f(\cdot)$ is the fitness function and $\Delta\theta_i$ is the rotation angle. By selecting different rotation angles, we can control the convergence speed and accuracy.

II.3 Support Vector Machine

Figure 2: The diagram of the classification surface

Support Vector Machine (SVM) methods can mainly be divided into two types: classification and regression. The SVM is widely used for its high efficiency and simplicity. For a classification problem, it is a supervised learning model that classifies the representation $x$ of an object in a high-dimensional space according to a label $y$. The representations and labels constitute the sample space $\{(x_k, y_k)\}_{k=1}^{N}$ with $y_k \in \{+1, -1\}$, where $N$ is the number of samples. The support vector machine finds a pair of parallel hyperplanes, each passing through the points of one class that are nearest to the other class. In order to achieve the best classification results, we need to make the distance between the hyperplanes as large as possible. As shown in Figure 2, the hyperplanes are represented by the solid lines. Therefore, the task of the SVM boils down to finding the maximum value of the margin $2/\lVert w\rVert$ in this figure. With a feature mapping $\phi$ of the samples (discussed at the end of this subsection), this optimization problem can be written as:

$$\min_{w, b}\ \frac{1}{2}\lVert w\rVert^2 \quad \text{s.t.} \quad y_k\left(w \cdot \phi(x_k) + b\right) \ge 1, \quad k = 1, \dots, N, \tag{11}$$

where $w$ and $b$ represent the slope and intercept of the hyperplanes. Considering the constraint $y_k(w \cdot \phi(x_k) + b) \ge 1$, under which the hyperplanes pass through the closest points, the Lagrange function can be obtained:

$$L(w, b, \alpha) = \frac{1}{2}\lVert w\rVert^2 - \sum_{k=1}^{N}\alpha_k\left[y_k\left(w \cdot \phi(x_k) + b\right) - 1\right], \tag{12}$$

where the $\alpha_k \ge 0$ are Lagrange multipliers. The parameters of the SVM can be obtained by finding the extreme values of this function. However, for practical problems, the distance between two classes may not be so large. Thus, it is necessary to introduce the concept of soft margin classification, that is, to allow some points to fall between the two hyperplanes, but not across the middle dotted line. In this case, a slack term is added to the objective function and the constraint is modified to $y_k(w \cdot \phi(x_k) + b) \ge 1 - \xi_k$, $\xi_k \ge 0$, where $\xi_k$ is the $k$-th slack variable. This condition is less strict than the previous constraint, so a sample point may appear between the two hyperplanes. The corresponding Lagrange function becomes:

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\lVert w\rVert^2 + C\sum_{k=1}^{N}\xi_k^{p} - \sum_{k=1}^{N}\alpha_k\left[y_k\left(w \cdot \phi(x_k) + b\right) - 1 + \xi_k\right] - \sum_{k=1}^{N}\mu_k\xi_k. \tag{13}$$

In Eq. 13, $C$ is the penalty factor and $p$ is a natural number, corresponding to the $p$-order soft margin classification. $\alpha_k$ and $\mu_k$ are Lagrange multipliers. We set $p = 1$ here, which is the linear soft margin classification. Setting the derivatives of Eq. 13 with respect to $w$, $b$ and $\xi_k$ to zero and substituting the results back, we get a simplified dual problem:

$$\max_{\alpha}\ \sum_{k=1}^{N}\alpha_k - \frac{1}{2}\sum_{k=1}^{N}\sum_{l=1}^{N}\alpha_k\alpha_l y_k y_l\,\phi(x_k) \cdot \phi(x_l). \tag{14}$$

In comparison with the conventional SVM, the constraint conditions are changed:

$$\sum_{k=1}^{N}\alpha_k y_k = 0, \quad 0 \le \alpha_k \le C, \quad k = 1, \dots, N. \tag{15}$$

When describing the sample points, we do not use $x$ directly; instead we use a mapping $\phi(x)$. This is because a linear classification plane does not give good classification results in many practical problems, and a more complex surface is needed to improve the classification. The mapping plays this role. In Eq. 14, we define $K(x_k, x_l) = \phi(x_k) \cdot \phi(x_l)$, where $K$ is called the kernel function. The commonly used kernel functions are listed below:

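Radial Basis Function

$$K(x_k, x_l) = \exp\left(-\gamma\lVert x_k - x_l\rVert^2\right) \tag{16}$$

Polynomial Kernel Function

$$K(x_k, x_l) = \left(\gamma\, x_k \cdot x_l + r\right)^{d} \tag{17}$$

Sigmoid Kernel Function

$$K(x_k, x_l) = \tanh\left(\gamma\, x_k \cdot x_l + r\right) \tag{18}$$

Linear Kernel Function

$$K(x_k, x_l) = x_k \cdot x_l \tag{19}$$

Here $\gamma$, $r$ and $d$ are kernel function parameters.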
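The four kernels can be written out directly for plain Python sequences. This sketch assumes the common parameterization with a scale `gamma`, an offset `coef0` and a polynomial `degree`; these parameter names and forms are assumptions for illustration and may differ from the exact definitions used in the experiments.

```python
import math


def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma):
    """Radial basis function: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def polynomial_kernel(x, y, gamma, coef0, degree):
    """Polynomial kernel: (gamma * x.y + coef0)^degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma, coef0):
    """Sigmoid kernel: tanh(gamma * x.y + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)


def linear_kernel(x, y):
    """Linear kernel: plain inner product."""
    return dot(x, y)


# Hypothetical two-dimensional samples.
u, v = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=2)
k_rbf_same = rbf_kernel(u, u, gamma=0.5)
```

The radial basis function equals 1 for identical samples and decays with distance, which is why its single scale parameter has such a direct effect on the classifier.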


II.4 The Flowchart of the Algorithm

We use the radial basis function in the following. For the support vector machine to work, the penalty factor $C$ and the kernel function parameter $\gamma$ are the two variables that must be determined. These two variables directly affect the accuracy of the classification. Determining them quickly and accurately is the key to a successful SVM model. Therefore, we use the more efficient quantum genetic algorithm to find these two parameters. The flowchart is shown in Figure 3. The two parameters are first quantum encoded. Then the optimal solution is adjusted repeatedly through the quantum logic gate. Given a set of system parameters, we can calculate the classification accuracy, which serves as the fitness function. We aim to find a pair $(C, \gamma)$ according to this fitness function.

Figure 3: Flowchart
Figure 4: Action segmentation. We use the MSRC-12 dataset collected by the Cambridge Microsoft Lab and segment each complete action from the whole sequence. The upper panel shows a segmentation of the "throw" action; the lower panel shows the "raising both arms" action.

The SVM optimized by QGA proceeds in eight steps:

Step 1 Initialize the algorithm parameters, including the maximum number of iterations, the population size and the binary length of each variable. Load the training set data and the test set data, together with the corresponding labels.

Step 2 Initialize the population for the penalty factor and the kernel function parameter. All genes are treated equally, that is, all genes are initialized to $(1/\sqrt{2}, 1/\sqrt{2})$, so that every chromosome state appears with equal probability in the initial search.

Step 3 Measure the population to get a specific solution, which is a series of binary codes of the initialized length. Convert them into decimal numbers and feed them into the SVM model with the training samples. Each individual is evaluated and the optimal individual is retained.

Step 4 Determine whether the precision has converged or the maximum number of iterations has been reached. If yes, the search terminates (go to step 8); otherwise go to step 5.

Step 5 Update the population using the rotation angle strategy in Table 2.

Step 6 Check whether the catastrophe conditions are met. If yes, keep the optimal value and re-initialize the population. If not, go to step 7.

Step 7 Increase the number of iterations by one and return to step 3.

Step 8 Output the optimized parameters and evaluate the test samples with these parameters.
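The steps above can be sketched as a compact loop. This is a simplified illustration under several assumptions, not the implementation used in the experiments: the fitness is an arbitrary callable (here a hypothetical `toy_fitness` standing in for the SVM classification accuracy), the rotation rule simply turns each qubit toward the best individual found so far instead of following Table 2 row by row, the catastrophe operation of step 6 is omitted, and the loop runs for a fixed number of generations. All function and parameter names are illustrative.

```python
import math
import random


def decode(bits, low, high):
    """Map a list of 0/1 bits to a real value in [low, high]."""
    value = int("".join(str(bit) for bit in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)


def qga_maximize(fitness, bounds, n_bits=10, pop_size=20, generations=30,
                 delta=0.05 * math.pi, seed=0):
    """Return (best_params, best_fitness) found by a simplified QGA."""
    rng = random.Random(seed)
    amp = 1 / math.sqrt(2)
    length = n_bits * len(bounds)
    # Steps 1-2: every qubit starts in the equal superposition.
    population = [[(amp, amp)] * length for _ in range(pop_size)]
    best_bits, best_params, best_fit = None, None, -math.inf

    for _ in range(generations):  # Steps 4 and 7: fixed iteration budget
        # Step 3: measure each chromosome, decode it and evaluate the fitness.
        measured = []
        for chromosome in population:
            bits = [1 if rng.random() < b * b else 0 for _, b in chromosome]
            params = [decode(bits[v * n_bits:(v + 1) * n_bits], low, high)
                      for v, (low, high) in enumerate(bounds)]
            fit = fitness(*params)
            measured.append(bits)
            if fit > best_fit:
                best_bits, best_params, best_fit = bits, params, fit
        # Step 5: rotate every qubit whose measured bit differs from the best.
        for index, chromosome in enumerate(population):
            updated = []
            for (a, b), bit, target in zip(chromosome, measured[index], best_bits):
                if bit != target:
                    # A positive rotation raises |b| when a and b share a sign.
                    sign = 1 if a * b >= 0 else -1
                    if target == 0:
                        sign = -sign
                    c, s = math.cos(sign * delta), math.sin(sign * delta)
                    a, b = c * a - s * b, s * a + c * b
                updated.append((a, b))
            population[index] = updated
    # Step 8: report the best parameters found.
    return best_params, best_fit


def toy_fitness(c, g):
    """Hypothetical stand-in for SVM accuracy, peaking at c=3, g=1."""
    return -((c - 3.0) ** 2) - (g - 1.0) ** 2


best_params, best_fit = qga_maximize(toy_fitness, bounds=[(0.0, 10.0), (0.0, 5.0)])
```

To use the loop for parameter search, `toy_fitness` would be replaced by a function that trains an SVM with the given penalty factor and kernel parameter and returns its classification accuracy.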

III The Classification Process and Experiment Results

III.1 Problem Statement

Figure 5: The change of the elbow angle, that is, the angle between the forearm and the upper arm at the elbow. The upper panel shows the "throw" action and the lower panel shows the "raising both arms" action. The motion amplitude of the left arm in the throwing action is much smaller than that of the other limbs.

We used the MSRC-12 gesture dataset (Fothergill et al., 2012), which consists of sequences of 12 groups of actions collected by the Cambridge Microsoft Laboratory through the Kinect system. We selected two similar movements for our study: the eighth group, holding the hand, and the ninth group, protest the music. The segmentation of both actions is shown in Figure 4. As mentioned above, the Kinect records the real-time three-dimensional coordinates of the human joints shown in Figure 1(a). From these data we calculated the angles between limbs at each joint. The represents the relative angle between the limbs of torso, and indicates the relative angle between the limbs of the upper body, and the relative angle between the limbs of the lower body is .

Figure 5 shows how the angle between the limbs at the elbow changes for both arms. As can be seen from the figure, each curve shows 10 periodic repetitions of the action. The amplitude of the left arm movement during the throwing motion is much smaller than that of the other three curves. Therefore, the amplitude of the angle change in the two arms can be used as a basis for distinguishing the two kinds of motion patterns.

III.2 Results

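| Method | Penalty factor C | Kernel function parameter | Generation | Accuracy (%) | Time |
|---|---|---|---|---|---|
| cross validation | 0.25 | 0.0625 | | 93.85 | 4.38 |
| quantum genetic algorithm | 15.839 | 0.155 | 1 | 93.85 | 6.83 |
| quantum genetic algorithm | 10.831 | 0.080 | 2 | 96.15 | 12.29 |
| quantum genetic algorithm | 10.831 | 0.080 | 3 | 96.15 | 17.62 |
| quantum genetic algorithm | 10.831 | 0.080 | 5 | 96.15 | 28.17 |

Table 3: The parameters obtained through the cross validation (CV) and the quantum genetic algorithm (QGA), respectively.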

We processed 13 sets of holding-the-hand actions and 16 sets of protest-the-music actions, and correspondingly obtained 130 groups of holding-the-hand samples and 160 groups of protest-the-music samples. From these two kinds of samples we selected 70 groups and 90 groups, respectively, as training sets, and the remaining 60 groups and 70 groups as test sets. The penalty factor and kernel function parameter of the model are determined by grid search and by the quantum genetic algorithm, respectively. In this paper, we do not use any dimensionality reduction algorithm, for two reasons. First, this gives better extensibility: the method can be extended from upper limb action classification to whole body action classification. Second, in our classification algorithm, the time and computational complexity are acceptable for 22-dimensional data. Thus, we feed the samples directly into the SVM for training. In the following, the holding-the-hand action is labelled as , and the protest-the-music action is labelled as .

During the calculation, we set the population size to 80 and the quantum bit length to 60 for QGA. The search range of the penalty factor is set to , and the range of the kernel function parameter is set to . We also provide the classification accuracy for different generations of the quantum genetic algorithm. The results are shown in Table 3. It can be seen that the quantum genetic algorithm almost converges after two generations. For this reason, we do not use the catastrophe operation here. Owing to the fast convergence, the difference in running time between grid search and QGA is not very large. Moreover, the quantum genetic algorithm increased the classification accuracy by about 2.3 percentage points (from 93.85% to 96.15%) at the expense of a longer running time. Because the quantum algorithm is parallel in nature, it can search a much larger parameter space in the same time, so a better penalty factor and kernel function parameter can be found. For the grid search approach, a very large amount of computation would be needed to achieve such an accuracy. To make the results more intuitive, we break them down into the confusion matrices in Table 4(a) and Table 4(b).

| | Throw | Raise both arms | Accuracy |
|---|---|---|---|
| Throw | 60 | 0 | |
| Raise both arms | 5 | 65 | |

(a) QGA

| | Throw | Raise both arms | Accuracy |
|---|---|---|---|
| Throw | 60 | 0 | |
| Raise both arms | 8 | 62 | |

(b) CV

Table 4: Confusion Matrix

The confusion matrices are shown in Table 4. The solution space of grid search is limited, so its result is farther from the optimal solution. The quantum genetic algorithm exploits quantum parallelism, extends the solution space at the cost of a slightly higher time complexity and brings better results. In addition, the angles of the limbs attached to the joints can be used to represent and identify human behavior patterns, and this method reaches an accuracy of up to 96.15% (Table 3). Under some conditions, this can be considered a successful classification result.
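The overall accuracies in Table 3 can be cross-checked against the confusion matrices in Table 4 with a short calculation. The helper name `accuracy` is an illustrative assumption; the matrix entries are copied from Table 4.

```python
def accuracy(matrix):
    """Overall accuracy: correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


qga_matrix = [[60, 0], [5, 65]]  # Table 4(a)
cv_matrix = [[60, 0], [8, 62]]   # Table 4(b)

qga_accuracy = round(100 * accuracy(qga_matrix), 2)
cv_accuracy = round(100 * accuracy(cv_matrix), 2)
```

With 125 and 122 of the 130 test samples classified correctly, the two values come out as 96.15 and 93.85, in agreement with Table 3.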

IV Conclusions

With the help of the parallel characteristics of the quantum algorithm, we succeeded in improving the accuracy of SVM classification at the cost of a slightly higher time complexity. This paper can be considered an example of combining QGA with a classification algorithm. The quantum-inspired algorithm can also be used in combination with other algorithms. Next, we will work on more complex actions and search for new features to further improve the accuracy of SVM classification.

This paper presents a new method of representing and classifying human actions that uses the quantum genetic algorithm to optimize the parameters of the SVM. We extracted the joint angles from the skeleton joint positions to represent the human stick figure in Kinect. In this way, the dimensionality was reduced from 60 to 22. By reducing the dimensionality of the samples and increasing the efficiency of computation, we achieved a higher classification accuracy than the conventional pattern recognition method.

V Acknowledgement

This research has been funded by the National Key Research and Development Plan under Grant 2017YFC0804401 and the Natural Science Funds of Jiangsu Province of China under Grant BK20140216.


  • Chernbumroong et al. (2014) S. Chernbumroong, S. Cang, and H. Yu, Decision Support Systems 66, 61 (2014).
  • Abowd et al. (1998) D. Abowd, A. K. Dey, R. Orr,  and J. Brotherton, Virtual Reality 3, 200 (1998).
  • Aggarwal and Ryoo (2011) J. K. Aggarwal and M. S. Ryoo, ACM Computing Surveys (CSUR) 43, 16 (2011).
  • Woznowski et al. (2016) P. Woznowski, D. Kaleshi, G. Oikonomou,  and I. Craddock, Computer Communications 89, 34 (2016).
  • Presti and La Cascia (2016) L. L. Presti and M. La Cascia, Pattern Recognition 53, 130 (2016).
  • Seddik et al. (2017) B. Seddik, S. Gazzah, and N. E. B. Amara, IET Computer Vision 11, 530 (2017).
  • Gaglio et al. (2015) S. Gaglio, G. L. Re,  and M. Morana, IEEE Transactions on Human-Machine Systems 45, 586 (2015).
  • Cortes and Vapnik (1995) C. Cortes and V. Vapnik, Machine Learning 20, 273 (1995).
  • Mosca (2008) M. Mosca, Physics 43, 309 (2008).
  • Nielsen and Chuang (2011) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, 2011) pp. 1–59.
  • Jones (2013) N. Jones, Nature 498, 286 (2013).
  • Lenstra (2000) A. K. Lenstra, Designs Codes Cryptography 19, 101 (2000).
  • Zhang and Jiang (2017) X. Zhang and D. Jiang, Shock and Vibration 2017, 1 (2017).
  • Wei et al. (2016) F. Wei, L. Min, W. Gang, J. Xu, B. Ren,  and G. Wang, in International Conference on Ubiquitous Robots and Ambient Intelligence (2016) pp. 997–1002.
  • Chen et al. (2016) P. Chen, L. Yuan, Y. He,  and S. Luo, Neurocomputing 211, 202 (2016).
  • Zhou and An (2010) J. G. Zhou and Y. Y. An, in International Conference on Advanced Computer Theory and Engineering (2010) pp. V4–553 – V4–556.
  • Xie et al. (2015) F. D. Xie, W. M. Yang, D. H. Qiu,  and Y. Li, Advanced Materials Research 1065-1069, 199 (2015).
  • Fothergill et al. (2012) S. Fothergill, H. Mentis, P. Kohli,  and S. Nowozin, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (ACM, 2012) pp. 1737–1746.