Understanding how animals, including humans, move is a grand challenge for modern science that has direct impact on our health and wellbeing. It is a useful instrument with which to explain the biological world, and to treat human and animal disease. It most directly impacts the treatment of musculoskeletal injuries Arnold92 and neurological disorders Cirak11, can improve prosthetic limb design Herr12, and aids in the construction of legged robots Alphadog12.
One of the main features of locomotion is the gait (the relative timing of leg recirculation: e.g., walk, run, trot, or gallop). How gait is chosen and regulated can provide detailed information about the condition of a subject Clarke99. Although significant insight into the neuromechanical basis of movement has been gained Orlovsky99, many questions remain open in this area, such as: how does gait control reflect the morphology and dynamics of the fast-moving body? How is sensory feedback used during fast legged locomotion?
New genetic tools such as optogenetics and chemogenetics are making possible unprecedented manipulations of the nervous system in intact, freely behaving mice and rats. These include temporally fast manipulations, and therefore high frame rate kinematic data from these animals are increasingly important courtine2008recovery.
Segmentation of body parts, including the ear, nose, tail, and skin, can provide valuable information for studying biomechanics or the progress of diseases affecting motor control or the nervous system. These points can be used, for example, to estimate the global position and orientation of the body, as well as the posture, of rodents migliaccio11; baker05.
Rodents, especially mice and rats, are premier models of human disease and increasingly the model system of choice for basic neuroscience. High frame rates (on the order of 150 Hz) are needed to quantify the kinematics of running rodents, due to their high stride frequency (up to 10 Hz). Achieving an adequate number of strides to capture inter-stride variability may require 3 seconds or more of video, so at least 450 frames need to be captured. This number increases rapidly with frame rate, which may be increased to capture sudden movements or reactions to impulsive perturbations, or with duration, which may be increased to yield large data sets for more sophisticated analyses of locomotor dynamics Revzen12. Larger datasets are increasingly yielding insight Wiltschko15, but they create difficulties: they require bandwidth and space to store the data, algorithms to automatically track the desired animal body regions, and processing power and time to analyze them. The usual methods to track markers are manual clicking, simple thresholding, cross correlation, or template matching hedrick2008software, which can be prohibitively time consuming for high frame rates and multiple views. Thresholding has been a popular method for segmentation and tracking of insects noldus2002computerised; however, it cannot be used to track an object whose intensity level varies between frames.
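The frame-count arithmetic above can be checked with a short sketch. The function names `frames_required` and `frames_per_stride` are illustrative and not part of any software described in this paper; the numeric inputs are the values quoted in the text.

```python
def frames_required(frame_rate_hz, duration_s):
    """Number of frames in a recording of the given frame rate and duration."""
    return round(frame_rate_hz * duration_s)


def frames_per_stride(frame_rate_hz, stride_frequency_hz):
    """Average number of frames that sample one stride cycle."""
    return frame_rate_hz / stride_frequency_hz


# 3 seconds of video at 150 Hz gives the 450 frames quoted in the text.
print(frames_required(150, 3))
# At a 10 Hz stride frequency, each stride is sampled by 15 frames at 150 Hz.
print(frames_per_stride(150, 10))
```

Doubling either the frame rate or the duration doubles the number of frames, which is why storage and processing costs grow quickly.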
Tracking of the tip of the paw is useful for many studies in biology, biomechanics, and robotics wenger2016spatiotemporal. Several methods have been proposed to automatically or semi-automatically track rodent paws and so provide the information required for gait analysis, including commercially available systems (Digigait Dorman14; Gadalla14, Motorater, Noldus Catwalk Huehnchen13; Hamers04; Parvathy13). These systems can be prohibitively expensive, and may only provide information about paws during the stance phase. In both research and commercial systems, tracking rodents has frequently relied on shaving fur and then drawing markers on the skin for subsequent tracking from raw video Maghsoudi15, or on the attachment of retroreflective markers and the use of optical motion capture systems. These methods have the drawbacks of requiring anesthesia, multiple handlings, and repeated application of markers, and the animal may remove the attached markers.
In addition to tracking of paws, tracking of body parts such as the nose, tail, ear, and all other parts of the rodent body (referred to as skin here) provides information about the kinematics of the animal running on a treadmill, including pitch, roll, and yaw. Tracking of the whole body has been presented using thresholding and active contours Spence13, simple k-means and particle filters gonccalves07, and an auto-adjustable observation model enhanced particle filter pistori10; however, no method has been proposed that specifically aims to track paws, nose, ear, tail, or skin (which with future work could be correlated to joint locations and potentially angles) from side-view cameras.
There is a large amount of literature on automatic superpixel algorithms, for example, normalized cuts Ren03, mean shift algorithm Comaniciu02, graph-based method Felzenszwalb04, Turbopixels Levinshtein09, SLIC superpixels Achanta12, and optimization-based superpixels Veksler10.
Although superpixel methods have not been used for animal tracking and segmentation of different landmarks on the bodies of animals, there have been some studies on hand segmentation, tracking, and gesture recognition shu2013improving; serra2013hand; li2013model; baraldi2014gesture; li2013pixel.
A superpixel based bag-of-words (BoW) approach was used to segment people walking in a parking lot shu2013improving. A conditional random field (CRF) along with the BoW model was used to differentiate the object from the background. That method generated the exact object regions as output, instead of the bounding boxes generated by previous methods.
Methods to detect the global behavioral state using thresholding noldus2001ethovision have been used widely for behavioral experiments, but these methods cannot provide the information required for biomechanics and related neuroscience applications, where tracking of specific parts of the body is needed. The main contribution of this study is to investigate the efficiency of a superpixel based method for segmenting these parts of the body. We study the performance of simple linear iterative clustering (SLIC), graph based (Gb) Felzenszwalb04, and quick shift (QS) vedaldi2008quick superpixel methods on RGB images, the hue channel from the HSV color space Maghsoudi16_2, and gray scale images. To determine the separability of our segmented regions, we extracted 28 features and applied t-SNE. In addition, we propose a tracking system to show the abilities of the discussed methods for biomechanics applications.
II The Treadmill System
We analyzed data from female C57BL/6 mice and Sprague-Dawley rats, because they are the most widely used strains in basic research and biomedicine. Animals were housed under a 12:12-hour light-dark cycle in a temperature-controlled environment with food and water available ad libitum. Animal procedures were approved by the Temple University Institutional Animal Care and Use Committee.
Ximea USB3 (Serial number: MQ022CG-CM) cameras were used to capture frames using a 250 Hz external synchronization signal. The trigger signal was generated and synchronized with a host PC using the triggerbox tools generously made available by the Straw Laboratory Straw11; StrawGit. Briefly, the trigger pulses were generated by an Arduino Uno, running the triggerbox firmware. The Arduino was controlled via serial over USB by a standard desktop PC. The camera resolution was set to pixels at 8 bits depth, using a Bayer filter pattern to recover color.
II.3 Treadmill and Tracking System
We used a closed-loop treadmill system described in Spence13 to control and adjust the speed of the treadmill while the mouse was running. The feedback loop helped us to keep the animal in a specific place on the treadmill (for example, in the middle), or to hold the treadmill at a specific speed while the animal was running in that region. We captured 1000 frames for each trial (providing four seconds of running).
Five cameras (one at top and four side-views) were used to capture the locomotion of the animal on the belt. An additional camera located at top of belt tracked the animal to provide the real-time feed for the visual servo-ing of the treadmill belt. Here, we analysed the frames captured just by one of the side views, the front left view of the mouse Maghsoudi16.
We applied a control law to the treadmill belt speed that sought to keep the mouse at the mid-point of the belt. We further used the real-time feed of mouse position on the belt to apply a mechanical perturbation (a sudden vertical displacement of the belt surface, caused by an actuated camera under the belt) and captured two seconds before and after the perturbation was applied.
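The form of the control law is not given here. Purely as an illustration of the idea of servoing the belt speed to hold the animal at a target position, a proportional controller could look like the sketch below. The function name, the gain, the speed limits, and the sign convention (a positive error means the animal is ahead of the target, so the belt speeds up) are all assumptions, not the controller of Spence13.

```python
def belt_speed_command(position, target, base_speed, gain,
                       min_speed=0.0, max_speed=1.0):
    """Proportional belt-speed command (illustrative sketch only).

    position, target: animal and target positions along the belt (same units).
    base_speed: nominal belt speed when the animal sits at the target.
    gain: hypothetical proportional gain (speed change per unit position error).
    """
    error = position - target            # > 0 when the animal is ahead of the target
    speed = base_speed + gain * error    # speed up to carry the animal back
    return max(min_speed, min(max_speed, speed))  # keep the command within limits


# At the target position the command equals the nominal speed.
print(belt_speed_command(position=0.5, target=0.5, base_speed=0.3, gain=1.0))
```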
A superpixel algorithm groups perceptually uniform pixels in an image into larger regions. Superpixels have been widely used in many computer vision applications such as image segmentation and object recognition Mori04; Li12. The superpixel concept was originally presented by Ren and Malik Ren03, who defined perceptually uniform regions using the normalized cuts algorithm. The main merit of superpixels is that they provide a more natural and perceptually meaningful representation of the input image. Compared to the traditional pixel representation of the image, the superpixel representation greatly reduces the number of image primitives and improves the representative efficiency. Furthermore, it is more convenient and effective to compute region based visual features on superpixels, which has been shown to provide important benefits for vision tasks such as object recognition Mori04 or hand gesture recognition shu2013improving; serra2013hand; li2013model; baraldi2014gesture; li2013pixel.
Here, we use SLIC superpixel segmentation on different color images. SLIC is a form of k-means clustering for superpixel generation with two main advantages: the number of distance calculations is reduced by limiting the search to a region proportional to the superpixel size, and a weighted distance measure combines color and spatial proximity, which controls the size and compactness of the superpixels.
The key parameter for SLIC is the size of the superpixels. First, initial cluster centers are defined. Then, to avoid having centers that lie on the edge of an object, each center is moved to the lowest gradient position in a neighborhood. The next step is clustering, in which each pixel is associated with the nearest cluster center based on color and position. This means that two coordinate components (x and y) describe the location of the pixel and three components (for example, R, G, and B in the RGB color space) are derived from the color channels. SLIC minimizes a distance function (a Euclidean norm on this 5D space) which, following Achanta12, can be written as:
$$D = \sqrt{\left(\frac{d_c}{N_c}\right)^2 + \left(\frac{d_s}{N_s}\right)^2}$$
where $d_c$ and $d_s$ are the Euclidean color and spatial distances between a pixel and a cluster center, and $N_c$ and $N_s$ are respectively the maximum distances within a cluster used to normalize the color and spatial proximity. Then, SLIC merges the pixels based on the calculated distance to create superpixels. SLIC is also constrained so that a region does not grow to more than twice the cluster radius; therefore, the SLIC size plays an important role in how the segmentation is performed.
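As a minimal sketch of the clustering step, the normalized 5D distance of the SLIC formulation in Achanta12 can be written in a few lines. The tuple layout `(r, g, b, x, y)`, the function names, and the normalization arguments `n_c` and `n_s` are choices made for this illustration, not the implementation used in this study.

```python
import math


def slic_distance(pixel, center, n_c, n_s):
    """Normalized color-plus-position distance between a pixel and a cluster center.

    pixel, center: tuples (r, g, b, x, y).
    n_c, n_s: normalization constants for the color and spatial distances.
    """
    d_c = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_s = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    return math.sqrt((d_c / n_c) ** 2 + (d_s / n_s) ** 2)


def nearest_center(pixel, centers, n_c, n_s):
    """Index of the cluster center with the smallest normalized distance."""
    return min(range(len(centers)),
               key=lambda k: slic_distance(pixel, centers[k], n_c, n_s))


centers = [(200, 180, 170, 10, 10), (30, 30, 30, 40, 12)]
# A bright pixel near (12, 11) is assigned to the first (bright, nearby) center.
print(nearest_center((190, 175, 160, 12, 11), centers, n_c=10.0, n_s=20.0))
```

A larger `n_c` relative to `n_s` weights spatial proximity more strongly and yields more compact superpixels; a smaller one lets superpixels follow color boundaries more closely.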
IV Color Spaces
The frames captured by the cameras were stored as raw Bayer-pattern images. We first converted them to the RGB color space using a debayering process Maghsoudi16. The color information in the RGB color space is shared between the three channels of red, green, and blue.
The rodents’ bodies carry different color information compared to the belt and the background in the frames. Therefore, we tested different color spaces to find the best one for superpixel segmentation. We tried the A and B channels from the LAB color space, which carry the chroma information of the image Phung05, but their intensity values were too close to each other and did not provide enough distinctive information for SLIC segmentation. We were, however, able to use this segmentation approach on RGB, HSV, and gray scale images. Sample images and segmentations can be seen in Figures 1 and 2 for mice and rats, respectively.
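To illustrate the image types compared here, the sketch below derives a hue value and a gray scale value from one RGB pixel using Python's standard `colorsys` module. The scaling of hue to a maximum of 180 follows the hue range stated later in the text; the luminance weights used for the gray scale value are a common convention and an assumption here, since the conversion used in this study is not specified.

```python
import colorsys


def rgb_to_hue_and_gray(r, g, b):
    """Convert one 8-bit RGB pixel to (hue in [0, 180), gray in [0, 255])."""
    h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = h * 180.0                            # colorsys returns hue in [0, 1)
    gray = 0.299 * r + 0.587 * g + 0.114 * b   # assumed luminance weights
    return hue, gray


hue, gray = rgb_to_hue_and_gray(0, 255, 0)
print(hue, gray)  # pure green: hue close to 60 on the 0 to 180 scale
```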
Here, we investigate the SLIC superpixel segmentation method Achanta12 for different parts of the body. We divide the segmentation into three classes: paw segmentation; ear, nose, and tail segmentation; and skin segmentation (considered as the body segmentation class in this paper). After applying SLIC segmentation, a merging function was used to connect neighboring superpixels. First, the center of each superpixel was found by the following equation:
where X, Y, i, and C respectively denote the horizontal coordinate, the vertical coordinate, the superpixel number after segmentation (between 1 and N, where N is the number of superpixels), and the center of a superpixel. Then, we calculated the average image intensity from the channel (hue or gray scale) or channels (average of the three channels R, G, and B) by the following equation:
where I denotes the average intensity of superpixel number i. Each superpixel was connected to the neighboring superpixel with the closest average intensity, unless the difference between the two averages was too large: the difference should not be more than ten percent of the maximum intensity of the image and ten percent of the median intensity in the connected region. This is achieved by finding all superpixels that share a border with a given superpixel, as follows:
where A is the set of superpixels considered as the neighbors of superpixel number j. The intensity difference is then found using:
where D denotes the intensity difference, and M is five percent of the maximum intensity value (180 for the hue channel and 255 for the red, green, blue, and gray scale channels). Finally, the superpixels were connected to each other by the following equation:
L and G respectively represent the indexes of all linked superpixels and the grouped superpixels. The segmentation algorithm is simplified and illustrated in Figure 3.
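The merging step can be sketched as follows, under stated assumptions: the superpixel center is taken as the mean pixel coordinate, neighbors are superpixels sharing a 4-connected border, and each superpixel is linked to its closest-intensity neighbor when the difference does not exceed a threshold `max_diff`. The function names, the union-find grouping, and the toy data are illustrative; this is not the authors' implementation.

```python
def superpixel_stats(labels, image):
    """Return {label: (center_x, center_y, mean_intensity)} for a label map."""
    sums = {}
    for y, (label_row, image_row) in enumerate(zip(labels, image)):
        for x, (lab, val) in enumerate(zip(label_row, image_row)):
            sx, sy, si, n = sums.get(lab, (0, 0, 0.0, 0))
            sums[lab] = (sx + x, sy + y, si + val, n + 1)
    return {lab: (sx / n, sy / n, si / n) for lab, (sx, sy, si, n) in sums.items()}


def neighbor_sets(labels):
    """Return {label: set of labels sharing a 4-connected border with it}."""
    adjacency = {}
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            adjacency.setdefault(a, set())
            for yy, xx in ((y, x + 1), (y + 1, x)):
                if yy < height and xx < width and labels[yy][xx] != a:
                    b = labels[yy][xx]
                    adjacency[a].add(b)
                    adjacency.setdefault(b, set()).add(a)
    return adjacency


def merge_superpixels(labels, image, max_diff):
    """Map each superpixel label to a group label after intensity-based merging."""
    stats = superpixel_stats(labels, image)
    adjacency = neighbor_sets(labels)
    parent = {lab: lab for lab in stats}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a in sorted(adjacency):
        if not adjacency[a]:
            continue
        # Neighbor whose mean intensity is closest to that of superpixel a.
        b = min(sorted(adjacency[a]), key=lambda n: abs(stats[n][2] - stats[a][2]))
        if abs(stats[b][2] - stats[a][2]) <= max_diff:
            parent[find(a)] = find(b)
    return {lab: find(lab) for lab in stats}


labels = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [2, 2, 3, 3]]
gray = [[10, 10, 12, 12],
        [10, 10, 12, 12],
        [200, 200, 205, 205]]
# Threshold of five percent of the maximum gray scale value, as in the definition of M.
groups = merge_superpixels(labels, gray, max_diff=0.05 * 255)
print(groups)
```

In this toy example the two dark superpixels end up in one group and the two bright superpixels in another, because only neighbors with similar mean intensity are linked.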
Given the importance of the paws for biomechanics studies and the size of the body compared with the other landmarks, we report the results for three manually classified merged regions: paw; skin (also referred to as body); and ear, nose, and tail.
V.1 Paw Segmentation
The location of a foot is frequently one of the most interesting regions of the body for biology, biomechanics, and robotics; in our images, it can consist of 100 to 3500 pixels depending on whether it is a front or hind limb, the camera positioning, the phase of the stride cycle, and the direction of mouse movement on the treadmill. The shape changes considerably, especially during the swing phase of the stride cycle. This variability in shape, size, and position makes paw segmentation difficult. There are, however, two kinds of features that can be used to segment the paws: first, features derived from color and gray scale images, and second, texture features which are unique to paws. Here, we use superpixels for segmentation, which mainly relies on the first kind of feature. The segmentation using SLIC is shown in Figure 1 for mice and in Figure 2 for rats.
V.2 Ear, Nose, and Tail Segmentation
Ear, nose, and tail (considered as the tail segmentation class) are three parts of the body that carry different color information than the skin. In contrast to the paws, which vary greatly in shape, size, and position, the ear, nose, and base of the tail are most closely coupled to movements of the center of the body (center of mass). Although the tail moves with more variation (especially in terms of position), the base of the tail can be considered to move with the center of the body, especially at high speeds.
V.3 Skin Segmentation for 3D Modeling
Subtracting the paws, nose, ear, and tail leaves the body in the frames. The idea behind superpixels is to create meaningful “superpixels” that are collections of pixels with similar color information. The segmentation of skin as some meaningful pixels (superpixels) is an important step towards creating a 3D model of a mouse body using four views Maghsoudi15.
Two sets of features were extracted from the superpixels: texture and color features. The color features were the average intensity of each superpixel in four channels: gray scale, green, saturation, and hue. This provided four features. The texture features were extracted by cropping the superpixel regions and calculating the co-occurrence matrix albregtsen2008statistical at four different angles (0, 45, 90, and 135 degrees). Then, the following six features were extracted for each of the angles: contrast, dissimilarity, homogeneity, angular second moment, energy, and correlation albregtsen2008statistical.
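The texture features can be sketched with a small, dependency-free gray-level co-occurrence computation. The assumptions here are a pixel distance of one, a symmetric and normalized matrix, the usual diagonal direction of 135 degrees, and an already quantized rectangular patch; the function names are illustrative, and the feature formulas are the standard textbook definitions rather than code from this study. Six features at four angles give 24 texture features, which together with the four color features matches the 28 features mentioned earlier.

```python
import math

# (row step, column step) for each co-occurrence direction, at distance 1.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(patch, angle, levels):
    """Symmetric, normalized co-occurrence matrix of an integer patch.

    patch: 2D list of gray levels in range(levels).
    """
    d_row, d_col = OFFSETS[angle]
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0.0
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < len(patch) and 0 <= cc < len(row):
                other = patch[rr][cc]
                counts[value][other] += 1.0
                counts[other][value] += 1.0
                total += 2.0
    if total == 0.0:
        return counts
    return [[v / total for v in line] for line in counts]


def texture_features(p):
    """Contrast, dissimilarity, homogeneity, ASM, energy, and correlation."""
    cells = [(i, j, v) for i, line in enumerate(p) for j, v in enumerate(line)]
    asm = sum(v * v for _, _, v in cells)
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mean_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mean_j) ** 2 for _, j, v in cells)
    cov = sum(v * (i - mean_i) * (j - mean_j) for i, j, v in cells)
    # Convention (assumption): a constant patch is treated as perfectly correlated.
    correlation = cov / math.sqrt(var_i * var_j) if var_i > 0 and var_j > 0 else 1.0
    return [
        sum(v * (i - j) ** 2 for i, j, v in cells),        # contrast
        sum(v * abs(i - j) for i, j, v in cells),          # dissimilarity
        sum(v / (1 + (i - j) ** 2) for i, j, v in cells),  # homogeneity
        asm,                                               # angular second moment
        math.sqrt(asm),                                    # energy
        correlation,
    ]


def texture_vector(patch, levels):
    """Concatenate the six features over the four angles (24 values)."""
    features = []
    for angle in (0, 45, 90, 135):
        features.extend(texture_features(cooccurrence(patch, angle, levels)))
    return features


patch = [[0, 0], [1, 1]]   # toy patch quantized to two gray levels
print(len(texture_vector(patch, levels=2)))
```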
After segmentation and merging of the superpixels using one of the alternative methods (SLIC, Gb, or QS), we use a tracker algorithm that is based on the position, speed, size, and color information of the tracked region in the previous frame. A user was asked to click on the correct landmark in the first frame. We subsequently focused on a pixel region of interest (ROI) around the user initialization in the first frame, because frame-to-frame landmark movement was always within this ROI, and considering only this ROI drastically reduces computation time. The size of the ROI was selected based on the maximum displacement of the center of the body in rats (30 pixels). Then, we designed a function, referred to as the “tracker function”, to assign a weight to each of the objects remaining after segmentation. This function favored the object closest to the previously tracked marker in position, average hue, and size, and whose movement followed the same speed and direction. The object with the maximum value of this function was chosen as the tracked object in the current frame.
The tracker function can be simplified as follows:
where W, T, and G are respectively the weighting function chosen based on experiments, the tracked marker for the current frame, and the average of the gray scale image; k and f are respectively the superpixel number and the frame number.
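One way to read the tracker description is as a weighted score over the candidate regions in the current frame, with the highest-scoring region selected. The sketch below is such a reading and not the authors' function: the `Region` fields, the four cost terms, and the weight names and values are hypothetical, whereas the weights in this study were chosen experimentally.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    x: float      # region center, horizontal (pixels)
    y: float      # region center, vertical (pixels)
    size: float   # area in pixels
    hue: float    # average hue of the region


def tracker_score(candidate, previous, velocity, weights):
    """Higher is better: negative weighted cost relative to the previous frame."""
    predicted_x = previous.x + velocity[0]
    predicted_y = previous.y + velocity[1]
    cost = (
        weights["position"] * math.hypot(candidate.x - previous.x, candidate.y - previous.y)
        + weights["motion"] * math.hypot(candidate.x - predicted_x, candidate.y - predicted_y)
        + weights["size"] * abs(candidate.size - previous.size) / max(previous.size, 1.0)
        + weights["hue"] * abs(candidate.hue - previous.hue)
    )
    return -cost


def track(candidates, previous, velocity, weights):
    """Pick the candidate region with the maximum tracker score."""
    return max(candidates, key=lambda c: tracker_score(c, previous, velocity, weights))


weights = {"position": 1.0, "motion": 1.0, "size": 10.0, "hue": 0.5}  # hypothetical values
previous = Region(x=100.0, y=50.0, size=400.0, hue=20.0)
candidates = [Region(102.0, 50.0, 410.0, 21.0), Region(140.0, 80.0, 390.0, 20.0)]
best = track(candidates, previous, velocity=(2.0, 0.0), weights=weights)
print(best)
```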
To evaluate the segmentation, we used the frames captured from five mice and five rats. Two trials from each animal were selected just from the front right camera. Each trial created 1000 frames, but to test the method for different animals and reduce the manual burden of segmentation, we randomly selected 25 frames from each trial. Therefore, 250 frames from five mice and 250 frames from five rats were established as the database for this study.
The SLIC superpixel method was applied to three image types (RGB, hue channel, and gray scale) with three different SLIC sizes (the requested number of superpixels): 500, 1500, and 4500. These numbers were selected based on the size of a paw in the frames, which can vary between 100 and 3500 pixels. Each image contains 1,433,600 pixels. SLIC can generate superpixels that are between half and twice the initially specified size. This means that by specifying 4500 superpixels, we can have roughly 150 to 600 pixels in each superpixel. Figure 1 and Figure 2 illustrate respectively how SLIC is applied to a mouse sample frame and a rat sample frame.
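The relation between the requested number of superpixels and the expected pixel count per superpixel can be worked through directly. The helper name `superpixel_pixel_range` is illustrative; the half-to-twice bound is the rule of thumb stated above.

```python
def superpixel_pixel_range(image_pixels, n_superpixels):
    """Return (lower bound, nominal, upper bound) pixels per superpixel."""
    nominal = image_pixels / n_superpixels
    return nominal / 2, nominal, nominal * 2


for n in (500, 1500, 4500):
    low, nominal, high = superpixel_pixel_range(1_433_600, n)
    print(n, round(low), round(nominal), round(high))
```

For 4500 superpixels the nominal size is about 319 pixels, so the half-to-twice range is about 159 to 637 pixels, in line with the rounded range quoted in the text.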
Then, the process described in Figure 3 was applied to the segmented regions to connect them to each other and create the paws, nose, ear, tail, and skin. Figure 4 and Figure 5 show the areas segmented using this method. To quantify the segmentation method, we needed to compare it with a ground truth segmentation. The ground truth segmentation was done by manual supervision using a graphical interface designed in Matlab. This was then compared to the regions segmented using SLIC and our merging function. For this comparison, we used sensitivity, specificity, accuracy, and precision, computed from the pixel counts defined below.
TP is the number of pixels that were segmented by the method and match the ground truth segmented region. FP is the number of pixels that were segmented by the method but do not match the ground truth segmented region. TN is the number of pixels that were not segmented by the method and should not be part of the segmentation. FN is the number of pixels that were not segmented by the method but should be part of the segmentation. The results of the SLIC superpixel method followed by the merging function are illustrated in Figures 6 and 7. Figure 8 shows the temporal segmentation accuracy over 50 consecutive frames for the SLIC method with 1500 superpixels.
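Given the pixel counts defined above, the four measures follow from their standard definitions. The sketch below counts TP, FP, TN, and FN from a predicted mask and a ground truth mask and then computes the measures; the function names and the toy masks are illustrative.

```python
def confusion_counts(predicted, truth):
    """Count (TP, FP, TN, FN) pixels from two binary masks of equal shape."""
    tp = fp = tn = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for pred, true in zip(pred_row, true_row):
            if pred and true:
                tp += 1
            elif pred and not true:
                fp += 1
            elif not pred and true:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Division that returns NaN instead of failing on an empty class."""
    return numerator / denominator if denominator else float("nan")


def segmentation_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
    }


predicted = [[1, 1, 0], [0, 0, 0]]
truth = [[1, 0, 0], [1, 0, 0]]
print(segmentation_metrics(*confusion_counts(predicted, truth)))
```

Because the background dominates each frame, TN is usually much larger than the other three counts, so specificity and accuracy change less than sensitivity and precision.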
The QS vedaldi2008quick and Gb Felzenszwalb04 methods were selected to compare SLIC with other common superpixel methods. The methods were examined using Python on a MacBook Pro with a 2.7 GHz Intel Core i5 and 8 GB of 1867 MHz DDR3 memory. Figure 9 shows the results for sensitivity and the average speed of these three methods when segmenting the superpixels in a frame.
In addition, we used t-SNE to visualize a 2D representation of the extracted features mentioned in section VI. The results are illustrated in Figure 10. Finding the best features can help to design better trackers, which serves the goals of biomechanics and neuroscience studies. The t-SNE results show that automatic classification of the three groups may be easier for mice than for rats, especially in differentiating between the body and the other regions.
We presented a simple tracker in section VII to show how the segmented regions can be used to design a tracker. We have found that this tracker can be used to track any of the objects except paws. We evaluated the performance of this tracker on 5 trials from mice, each having 1000 frames, by tracking the lowest part of the ear. Out of 5000 frames, there were only 43 mistakes, in consecutive frames, which happened when the mouse was turning its head in one of the trials.
We presented a method for segmentation of different body parts of rodents running on a treadmill. We categorized the body parts into three classes: paw; ear, nose, and tail; and skin. The SLIC superpixel method was used for the segmentation and was applied to three different image types (RGB, hue, and gray scale) from three different color spaces (RGB, HSV, and gray scale) with three SLIC sizes (500, 1500, and 4500). After segmentation, we calculated the average intensity for each of the segments in the three images, and then connected superpixel segments to each other if they were neighbors and had less than ten percent difference in average intensity. This process is illustrated in Figure 3.
Among the three color spaces selected, RGB showed the best accuracy of segmentation, although hue had almost the same results. This was more distinctive especially for lower SLIC sizes, as can be seen in Figures 1 and 2. Therefore, the best image format for using SLIC in our context is RGB.
As mentioned above, we used the function illustrated in Figure 3 to connect the segments to each other. This function made it possible to join segments with a similar range of average intensity values. We divided the animal's body parts into three classes to differentiate between them. Using this function, we created larger segments, and finally, the segments belonging to the three classes were automatically selected. The results are shown in Figures 4 and 5.
Having these larger segments allowed us to evaluate the segmentation sensitivity, specificity, accuracy, and precision against the manually outlined regions for each frame. The results are illustrated in Figure 6. The results indicated that the sensitivity and precision of segmentation increased with a larger number of superpixels. This trend was also seen for the specificity and accuracy; however, they showed smaller changes than the other two measures because the number of pixels counted as TN was larger than the other three variables (TP, FP, and FN), especially for mice. The changes in specificity and accuracy were more pronounced for rats because of the animal size, as seen in Figure 7.
As shown in Figure 6, the best image for segmenting the body in mice was the RGB image, while the best image for the segmentation of paw and tail was the hue channel. This pattern was not seen for rats, for which the best image was always the hue channel from the HSV color space, based on the results reported in Figure 7. In addition, for segmenting the body of rats, the gray scale image showed higher measures than the RGB image, indicating that the white body of rats was easier to distinguish from the background.
In conclusion, the SLIC superpixel method gave reliable results for the segmentation of landmarks on the bodies of rodents running on the treadmill. The RGB and HSV color spaces achieved almost similar segmented regions, although RGB was slightly better in terms of segmentation, especially for lower SLIC size numbers. This means that when we had bigger superpixels, creating more meaningful superpixels, the RGB images showed higher measures, as can be seen in Figures 6 and 7. The opposite was true when using color channel information to classify the segmented regions by average intensity. Hue carried more information by itself compared to the average of R, G, and the gray scale. This suggests using RGB for segmentation and the hue channel information for classification in future work.
The results of tail segmentation (Figures 6 and 7) showed zig-zag behavior in the ROC plots (especially for the frames in the RGB color space) for both rats and mice (with more significant changes for mice). This might be because the tail was small and narrow in some parts, and differentiating these small parts from the background was harder using the average of the RGB channels or the gray scale intensity. In addition, the lateral part of the tail showed different color information compared with other parts (as shown in Figure 1).
As mentioned, there have been methods proposed to segment the animal using simple thresholding, cross correlation, or template matching hedrick2008software; noldus2002computerised; noldus2001ethovision. These methods can provide information for behavioral experiments while tracking of specific landmarks on body is needed for biomechanics. The proposed method using SLIC provides remarkably fast and accurate segmentation leading to a promising tracking system as an example presented here.
On the other hand, superpixel based methods have been used frequently for detection of the human hand and gestures. t-SNE was used to evaluate the importance of superpixel features for hand detection li2013pixel. This inspired us to extract features and evaluate how much information they provide to distinguish the regions from each other and from the background. The results are illustrated in Figure 10.
Last but not least, SLIC was equally good for segmentation and much faster than the other algorithms. Gb, however, can be used by itself to segment the ear, paws, and body in a merged form, as seen in Figure 11, although it takes more time to segment the regions and its accuracy is lower compared to SLIC.
For future directions, we will extract more texture, color, and kinematic features, and then classify and track these regions using NN maghsoudi2014informative, SVM Maghsoudi16, or neuro-fuzzy logic. Achieving this goal will help us to track each of the objects in the video, which subsequently will lead to an accurate 3D reconstruction of these objects. 3D data on animal movement will likely provide a wealth of information not just for biomechanics but also for neuroscience and broader biological investigations. Finally, we will try to predict and edit the gait transitions wilshin2017morphology.
This material is based upon work supported by, or in part by, the U. S. Army Research Laboratory and the U. S. Army Research Office under contract/grant number W911NF1410141, proposal 64929EG, to A. Spence.
Omid Haji Maghsoudi is a Ph.D. student in the Department of Bioengineering at Temple University. He has been a research assistant in the Spence lab for three years. He received his MS degree in medical radiation engineering from Shahid Beheshti University and his BS degree in biomedical engineering (with an electrical engineering minor) from Isfahan University. His research interests include image and signal processing, computer vision, neuroscience, biomechanics, and medical imaging devices. His current research is focused on developing software to track landmarks on the body of running rodents and to build 3D models of those markers. He is the author or coauthor of more than 13 papers.
Annie Vahedipour received her BS and MS degrees in Mechanical Engineering from North Carolina State University and Southern Methodist University, respectively. She is currently a Ph.D. student at Temple University and works as a research assistant in the Spence lab. She is the author or coauthor of two papers. Her work focuses on developing and applying new technologies for external and internal manipulation of the nervous system in so-called “neuromechanical” perturbations.
Benjamin D. Robertson is a postdoctoral researcher in the Spence lab in the Temple University Department of Bioengineering. He completed his BS degree in Applied Physics at Emory University, and his PhD in the Joint Department of Biomedical Engineering at UNC-Chapel Hill and NC State University. He is the author or coauthor of more than 11 papers. His current research is focused on the role of sensory systems in modulating recovery from spinal cord injury.
Andrew J. Spence is an Associate Professor in the Department of Bioengineering at Temple University. He received his PhD from Cornell University. His research is focused on understanding the control and biomechanics of movement, through an integrative and multidisciplinary approach that combines biology, engineering, mathematics, and molecular genetic tools. Applications of this work are found in spinal cord injury, rehabilitation, neuromuscular disease, and prosthetics, as well as in bio-inspired robotics. He is the author or coauthor of more than 75 papers.