Understanding animal behavior is central to answering the fundamental question of why animals (including humans) do what they do. Recently, biologists have started to use wearable technologies, such as GPS, accelerometers, and radio sensors, to track animals and their activities. However, the collected raw data are not human-interpretable and need to be processed to extract behavioral patterns. The raw data are usually represented as time series that contain timestamped sensor readings. Biologists also collect behavioral annotations that describe the behavior of an individual (walking, grooming) or a group of individuals (grazing, coordinated movement) during a predefined temporal interval, as well as the context of that behavior (such as the habitat or weather). In the wild, where the environment cannot be well instrumented, biologists cannot continuously observe the behaviors of wild animals; instead, they collect observations over small intervals that are typically insufficient to describe the animals' complex behavioral dynamics. Machine learning can help biologists infer these behaviors from the sensor data.
The rise of human wearable technology has led to the development of new solutions to the problem of behavior inference from various sensors. Activity recognition models can be used to learn the relations between the raw time series and the behavioral annotations collected through observations or other modalities. The resulting models can then automatically classify the intervals of the collected data for which behaviors were not observed.
In this work, we propose a new framework for inferring the group behavior of wild animals from sensor time series.
2. Related Work
The field of activity recognition is directly related to the field of time series classification. There are two main approaches to this classification problem: temporal sequence analysis and deep learning.
Temporal analysis methods are based on an explicit description of the raw signals (Banos et al., 2014; Lara and Labrador, 2012). The temporal stream can be represented as segments (Changpinyo et al., 2018) or as a whole (Xing et al., 2010). The former requires defining the length of the segments, while the latter automatically handles the interdependencies between subsequent observations. Among sequence analysis methods, Conditional Random Fields (CRF) are considered the gold standard (Lafferty et al., 2001), including for the specific case of wild animal activity recognition (Li et al., 2016).
Deep learning methods do not require an explicit description of the raw signals. These methods automatically infer the feature set using the hidden layers. For time series classification, deep learning methods that exploit the power of Long Short-Term Memory (LSTM) components are considered the state of the art (Ordóñez and Roggen, 2016).
3. Problem Definition
Time series classification problems are characterized by two main components: the selection of the temporal resolution and the selection of the classification model.
To define these two problems, we first present the basic notation for time series classification in the case of multiple entities.
Let denote the set of time series of entities. Let be a time series of timestamps: . Let denote the set of labels for each entity: , with denoting the set of labels , where is an entity and is the timestamps for which a label is provided.
Definition 3.1 (Temporal segmentation function).
A temporal segmentation function is a mapping from each time series to a set of time windows , where each contains one or more timestamps of .
Given and a set of time series , the classification task is the process to find a model such that . The result is a complete label set for each .
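As a minimal sketch of a temporal segmentation function in the sense of Definition 3.1, the following Python code splits a single time series into consecutive, non-overlapping windows of fixed length. The fixed-length strategy, the `segment` function, and the `(timestamp, reading)` representation are illustrative assumptions, not the notation or implementation used in this work.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed representation: (timestamp in seconds, sensor reading).
Observation = Tuple[float, float]


def segment(series: Sequence[Observation], window_sec: float) -> List[List[Observation]]:
    """Split a time-ordered series into consecutive, non-overlapping windows.

    Each window spans `window_sec` seconds, counted from the first timestamp.
    Windows without any observation are skipped, so every returned window
    contains at least one timestamp.
    """
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    if not series:
        return []
    start = series[0][0]
    windows: Dict[int, List[Observation]] = {}
    for t, value in series:
        index = int((t - start) // window_sec)  # which window this timestamp falls in
        windows.setdefault(index, []).append((t, value))
    return [windows[i] for i in sorted(windows)]


# Hypothetical input: one reading every 20 seconds for two minutes.
readings = [(float(t), 0.1 * t) for t in range(0, 120, 20)]
windows = segment(readings, window_sec=60)
print([[t for t, _ in w] for w in windows])
```

With a 60-second window, the six readings fall into two windows of three timestamps each. A classifier would then assign one label per window rather than per raw observation.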
4. The Framework
We propose a framework based on sequence analysis, which is composed of two main steps. First, we select the best global temporal resolution value. Then, we propose a new way to encode the social relations among a group of entities, which in our case are wild animals.
Finding the right temporal resolution is critical for time series classification. We base our approach on an optimization task over each time series in . Given a set of candidate global temporal lengths, we infer, for each value, a set of consecutive time windows. We validate each inferred set using one or more metrics, which can be defined independently for each classification task. Given the combined scores of the metrics, we select the temporal resolution value that maximizes the total score and use it to segment the time series.
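The selection step can be sketched as a search over candidate window lengths. In the Python code below, the `purity` metric, the function names, and the toy label sequences are all hypothetical placeholders; the metrics actually used are task-dependent and are not specified here. The sketch only illustrates the mechanics of scoring each candidate and keeping the one with the highest combined score.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Assumed input: one label per base time step (for example, per 60 seconds).
Labels = Sequence[str]


def purity(labels: Labels, steps_per_window: int) -> float:
    """Placeholder metric: mean share of the majority label within each window."""
    shares = []
    for i in range(0, len(labels), steps_per_window):
        window = labels[i:i + steps_per_window]
        shares.append(Counter(window).most_common(1)[0][1] / len(window))
    return sum(shares) / len(shares)


def score_resolutions(
    labelled_series: List[Labels],
    candidates: Sequence[int],
    metrics: Sequence[Callable[[Labels, int], float]],
) -> Dict[int, float]:
    """Combined score (sum of metrics, averaged over all series) per candidate."""
    scores = {}
    for steps in candidates:
        total = 0.0
        for labels in labelled_series:
            total += sum(metric(labels, steps) for metric in metrics)
        scores[steps] = total / len(labelled_series)
    return scores


# Toy label sequences for two entities (illustrative only).
series = [
    ["walk", "walk", "graze", "graze", "graze", "rest"],
    ["rest", "rest", "rest", "walk", "walk", "walk"],
]
scores = score_resolutions(series, candidates=[1, 2, 3], metrics=[purity])
best = max(scores, key=scores.get)  # candidate with the highest combined score
print(scores, best)
```

Note that this placeholder metric trivially favors the finest resolution, since a window of one step is always pure. Any metric with the same signature can be passed in `metrics`, and several metrics are combined here by simple summation, which is itself an assumption.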
We directly encode the intrinsic social features of group behavior by representing them as social graphs (networks). Given the optimally segmented time series, we infer a network over the set of entities for each segment. Then, we extract relevant topological and relational features for a general-purpose classification model. The network definition is domain-dependent and is assumed to be given (by domain experts) for each dataset (Brugere et al., 2017).
Dataset. We use a publicly available dataset of group activities of baboons (Crofoot et al., 2015), which has previously been used for the activity recognition task (Li et al., 2016). The dataset contains 26 individuals tracked for 35 days. The label set contains 8 activities, and labels are available for only 2 days, at a 1-minute (60-second) resolution. The temporal resolution selection is bounded from below by the label resolution. For this dataset, the social network was defined based on pairwise proximity: an edge between two baboons exists if they are within 2 meters of each other. This network definition has been shown to be meaningful in biological research (Strandburg-Peshkin et al., 2015). For each inferred network, we extract topological and relational features, such as the degree of a node, the PageRank score, and the average of the features of the neighbors.
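The proximity network and the node features described above can be sketched in a few lines of Python. The planar coordinates, the function names, and the PageRank parameters (damping factor 0.85, 50 power iterations) are assumptions made for illustration; this is not the implementation used in our experiments.

```python
import math
from typing import Dict, List, Set, Tuple

Position = Tuple[float, float]  # assumed planar coordinates in meters
Graph = Dict[str, Set[str]]


def proximity_network(positions: Dict[str, Position], radius_m: float = 2.0) -> Graph:
    """Undirected graph linking two individuals that are within `radius_m` meters."""
    graph: Graph = {name: set() for name in positions}
    names = sorted(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius_m:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; isolated nodes spread their mass uniformly."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in graph if not graph[v])
        new_rank = {}
        for v in graph:
            # The graph is undirected, so the neighbors of v are its in-links.
            incoming = sum(rank[u] / len(graph[u]) for u in graph[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def node_features(graph: Graph) -> Dict[str, List[float]]:
    """Per node: degree, PageRank score, and mean degree of its neighbors."""
    ranks = pagerank(graph)
    features = {}
    for v, nbrs in graph.items():
        degree = len(nbrs)
        mean_nbr_degree = sum(len(graph[u]) for u in nbrs) / degree if degree else 0.0
        features[v] = [float(degree), ranks[v], mean_nbr_degree]
    return features


# Hypothetical positions of four individuals within one time window.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.5, 0.0), "d": (10.0, 10.0)}
graph = proximity_network(positions)
features = node_features(graph)
print(graph, features)
```

In this toy example, `b` is linked to both `a` and `c`, while `d` is isolated. One such feature vector per individual and per time window would then be passed to the classifier.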
Experimental Setup. We validate our approach using 10-fold cross-validation and report accuracy and the weighted F1 score (Sokolova et al., 2006), ensuring that training and testing instances have the same size. As a simple baseline, we consider a majority classifier. As state-of-the-art baselines, we consider a CRF, Adversarial Sequence Tagging (AST), and a deep learning approach, DeepConvLSTM. For our implementation, we use XGBoost (Chen and Guestrin, 2016) as the classifier.
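To make the evaluation protocol concrete, the following Python sketch computes accuracy and the weighted F1 score for the majority baseline under k-fold cross-validation. The interleaved fold assignment, the function names, and the toy label distribution are assumptions made for illustration; a trained classifier would replace the majority prediction inside the loop.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = count - tp
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        total += count * f1
    return total / len(y_true)


def majority_baseline_cv(labels: List[str], k: int = 10) -> Tuple[float, float]:
    """k-fold CV of a classifier that always predicts the training majority label."""
    folds = [list(range(i, len(labels), k)) for i in range(k)]  # interleaved folds
    accs, f1s = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        y_true = [labels[i] for i in test_idx]
        y_pred = [majority] * len(y_true)
        accs.append(accuracy(y_true, y_pred))
        f1s.append(weighted_f1(y_true, y_pred))
    return sum(accs) / k, sum(f1s) / k


# Toy label distribution (illustrative only, not taken from the dataset).
labels = ["graze"] * 60 + ["walk"] * 30 + ["rest"] * 10
mean_acc, mean_f1 = majority_baseline_cv(labels, k=10)
print(round(mean_acc, 3), round(mean_f1, 3))
```

On this toy distribution, the baseline reaches an accuracy of 0.6 but a weighted F1 of only 0.45, because the classes it never predicts contribute an F1 of zero. This gap is one reason to report both metrics when the activity classes are imbalanced.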
Figure 1 shows the results of the temporal resolution step. The best selected value is 60 seconds (the minimum possible) over a range from 60 to 180 seconds. The results shown do not use network information. Table 1 compares our complete approach, including the encoding of the social relations as network information, with the selected baselines. The metrics show that our framework achieves better performance than the baselines. The lift for the accuracy is about and for the F1 score is . Adding social information provides a lift of for accuracy and for the F1 over the initial results shown in Figure 1.
We presented preliminary results for a new framework for inferring the group behavior of wild animals. Our approach is based on finding the best possible temporal resolution for classifying the behaviors and on an explicit encoding of the social structure of a group of individuals as network information. Our evaluation on a real-world dataset shows that the proposed framework better identifies the complex behavioral dynamics of groups of wild animals.
We are currently working on extending the temporal resolution step to a more dynamic approach that allows varying temporal steps, which will allow us to better identify the critical components of each behavior. We are also planning to include other datasets.
-  O. Banos, J.-M. Galvez, M. Damas, H. Pomares, and I. Rojas. Window size impact in human activity recognition. Sensors, 14(4):6474–6499, 2014.
-  I. Brugere, C. Kanich, and T. Y. Berger-Wolf. A general framework for task-oriented network inference. CoRR, abs/1705.00645, 2017.
-  S. Changpinyo, H. Hu, and F. Sha. Multi-task learning for sequence tagging: An empirical study. arXiv preprint arXiv:1808.04151, 2018.
-  T. Chen and C. Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794. ACM, 2016.
-  M. C. Crofoot, R. Kays, and M. Wikelski. Data from: Shared decision-making drives collective movement in wild baboons, 2015.
-  J. Lafferty, A. McCallum, and F. C. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. 2001.
-  O. D. Lara and M. A. Labrador. A survey on human activity recognition using wearable sensors. IEEE Communications Surveys & Tutorials, 15(3):1192–1209, 2012.
-  J. Li, K. Asif, H. Wang, B. D. Ziebart, and T. Y. Berger-Wolf. Adversarial sequence tagging. In IJCAI, pages 1690–1696, 2016.
-  F. Ordóñez and D. Roggen. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors, 16(1):115, 2016.
-  M. Sokolova, N. Japkowicz, and S. Szpakowicz. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence, pages 1015–1021. Springer, 2006.
-  A. Strandburg-Peshkin, D. R. Farine, I. D. Couzin, and M. C. Crofoot. Shared decision-making drives collective movement in wild baboons. Science, 348(6241):1358–1361, 2015.
-  Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. ACM SIGKDD Explorations Newsletter, 12(1):40–48, 2010.