3D Based Landmark Tracker Using Superpixels Based Segmentation for Neuroscience and Biomechanics Studies

11/23/2017 ∙ by Omid Haji Maghsoudi, et al. ∙ Temple University

Examining locomotion has improved our basic understanding of motor control and aided in treating motor impairment. Mice and rats are premier models of human disease and increasingly the model systems of choice for basic neuroscience. High frame rates (250 Hz) are needed to quantify the kinematics of these running rodents. Manual tracking, especially of multiple markers, is time-consuming and becomes impractical for large sample sizes. Therefore, the need for automatic segmentation of these markers has grown in recent years. Here, we address this need by presenting a method that segments the markers using the SLIC superpixel method. The 2D coordinates on the image plane are projected to a 3D domain using the direct linear transform (DLT), and a 3D Kalman filter predicts the position of each marker based on its speed and position in the previous frames. Finally, a probabilistic function is used to find the best match among the superpixels. The method is evaluated under several conditions that make marker tracking difficult, and it correctly tracks 95% of the markers.




1 Introduction

Studying the locomotion of animals, including humans, has been one of the challenging areas of modern science. Our health and well-being are directly linked with movement. Animal movement can explain some phenomena of the biological world. In addition, its study can inform the treatment of musculoskeletal injuries and neurological disorders, improve prosthetic limb design, and aid in the construction of legged robots [maghsoudi2015novel].

Intentional changes in an animal's gait, that is, the timing of the paws' motion relative to each other [deumens2007catwalk], can be observed during movement. The movement can be perturbed internally or externally. Mechanical perturbations applied while the animal is running (for example, an earthquake-like deflection of the surface), electrical stimulation of the nervous system, and newer genetically targeted techniques such as optogenetics [deisseroth2011optogenetics] or designer receptors exclusively activated by designer drugs [roth2016dreadds] are among the increasingly sophisticated perturbation methods used to dissect movement control.

To study gait and kinematics, specific landmarks on the body of an animal must be tracked. Tracking these landmarks relies on shaving fur, drawing markers on the skin, attaching retroreflective markers, or manual clicking by a user on consecutive frames [maghsoudi2016rodent]. Attaching retroreflective markers can be impossible in many cases because animals such as rats and mice start grooming and chewing the markers. Therefore, shaving fur and drawing markers can be the most reliable method for tracking specific landmarks [schubert2015automatic].

Commercially available systems (Digigait [dorman2014comparison, gadalla2014gait, nori2015long], Motorater [preisig2016high], Noldus Catwalk [deumens2007catwalk, hamers2001automated, huehnchen2013assessment, parvathy2013gait]) are prohibitively expensive and may only provide information about the paws during the stance phase, which limits them for some studies. In addition, some computerized methods (simple thresholding, cross-correlation, or template matching) have been proposed to answer this need. However, manual clicking remains the usual method for tracking the markers [hedrick2008software]. Therefore, neuroscientists and biologists need a robust tracking method.

Tracking has recently been a popular topic in image processing. Many methods have been developed for different applications: cell migration tracking [penjweini2017investigating], human tracking [ma2016counting], and tracking of disease frames across consecutive frames [haji2012automatic, mahdi2017detection]. Tracking methods should be developed based on the conditions governing a specific problem, which makes each of them unique [Maghsoudi17IET].

As discussed, locomotion analysis requires tracking landmarks on the body of an animal. 2D tracking from the frames can provide the knowledge required to examine the animal's gait. However, access to 3D information can improve our understanding of locomotion, including roll, pitch, and yaw [migliaccio2011characterization].

We previously presented a superpixel-based segmentation method to find the markers, followed by a weighted 2D tracker [maghsoudi2017superpixels]. The results were promising and inspired us to use the SLIC method for segmentation here. That tracker used 2D information from a single image plane to find the position of the landmarks in consecutive frames. However, this reliance on 2D information caused problems when landmarks were occluded by the body or another limb, came too close to another landmark, or were obscured by dirt on the Plexiglas.

Here we take advantage of superpixels for segmentation of the landmarks, with a small difference compared to [maghsoudi2017superpixels]. However, the main contribution of this study is a robust tracker designed to resolve the limitations of our previously presented method. The solution lies in using 3D information and processing the information from two cameras at the same time. A 3D Kalman filter and the direct linear transform were used to achieve this goal.

Figure 1: A sample video frame of rat locomotion with five markers drawn on the right side of the animal. (a) shows the original frame; the red rectangle marks the region that is zoomed in for better visualization. (b), (c), (d), and (e) show the zoomed-in area from image (a) with 1250, 2500, 5000, and 10000 superpixels, respectively.

2 Methods

2.1 Camera and Treadmill Setup

Four side-view cameras were used to capture video from a treadmill located in the middle of the capture area. The cameras were synchronized using an external pulse generator producing a 250 Hz pulse to ensure that their frames were captured at the same time. In addition, each frame was labeled with a UTC time provided by the pulse generator so that no frame was missed. The capture time for each trial was 4 seconds, providing 1000 frames. The frames were Bayer encoded, and we used a debayering function to convert them to RGB color space frames [maghsoudi2016rodent].

We converted the frames from the RGB color space to the HSV color space because it places all color information in a single channel, unlike the RGB or LAB color spaces [hajimaghsoudi2012automatic, Maghsoudi16_2].
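The point about the hue channel can be illustrated with a minimal sketch using Python's standard `colorsys` module on single pixels. The helper name `rgb_to_hsv_pixel` and the two sample colors are illustrative assumptions, not part of the original pipeline, which converted whole frames.

```python
import colorsys


def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Two reds of different brightness (hypothetical marker colors).
bright_red = rgb_to_hsv_pixel(255, 0, 0)
dark_red = rgb_to_hsv_pixel(128, 0, 0)

# Both pixels share the same hue; only the value channel differs.
print(bright_red, dark_red)
```

In RGB the two pixels differ in the red channel, whereas in HSV the difference is confined to the value channel and the hue stays the same, which is why a single channel can carry the color of a marker.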

2.2 Superpixel Segmentation

Superpixels contract and group uniform pixels in an image and have been widely used in many computer vision applications such as image segmentation [Li12, Mori04]. The outcome is a more natural and perceptually meaningful representation of the input image compared to pixels. Different approaches have been developed to generate superpixels: normalized cuts [Ren03], the mean shift algorithm [Comaniciu02], the graph-based method [Felzenszwalb04], Turbopixels [Levinshtein09], SLIC superpixels [Achanta12], and optimization-based superpixels [Veksler10]. Simple linear iterative clustering (SLIC) [Achanta12] generates superpixels relatively faster than other methods.

The speed of SLIC depends on the number of superpixels and the size of the image. With the image size held constant, the number of superpixels is the key parameter. Given the number of superpixels, SLIC divides the image into that many initial squares and takes the center of each square as a cluster center. A center should not lie on the edge of an object; therefore, it is moved to the lowest-gradient position in its neighborhood. Each pixel is then associated with one of its nearest cluster centers based on color and position: two coordinate components ($x$ and $y$) give the location, and three components (for example, $R$, $G$, and $B$ in the RGB color space) are derived from the color channels. SLIC computes a distance function (a Euclidean norm in this 5D space), defined as follows, and assigns each pixel to the cluster center that minimizes it:

$$d_c = \sqrt{(R_j - R_i)^2 + (G_j - G_i)^2 + (B_j - B_i)^2}$$

$$d_s = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2}$$

$$D = \sqrt{\left(\frac{d_c}{N_c}\right)^2 + \left(\frac{d_s}{N_s}\right)^2}$$

where $N_c$ and $N_s$ are, respectively, the maximum color and spatial distances within a cluster, used to normalize the color and spatial proximity. To limit the computation, SLIC evaluates this function only for cluster centers located within twice the width of the initial square.
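As a minimal sketch of this assignment step, the code below evaluates the normalized 5D distance from [Achanta12], $D = \sqrt{(d_c/N_c)^2 + (d_s/N_s)^2}$, for one pixel against a list of candidate centers. The function names, the tuple layout `(R, G, B, x, y)`, and the numeric values are assumptions made for illustration.

```python
import math


def slic_distance(pixel, center, n_c, n_s):
    """Normalized 5D distance between a pixel and a cluster center.

    pixel and center are (R, G, B, x, y) tuples; n_c and n_s are the
    maximum color and spatial distances used for normalization.
    """
    d_c = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_s = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    return math.sqrt((d_c / n_c) ** 2 + (d_s / n_s) ** 2)


def assign_pixel(pixel, centers, n_c, n_s):
    """Return the index of the candidate center with the smallest distance."""
    distances = [slic_distance(pixel, c, n_c, n_s) for c in centers]
    return distances.index(min(distances))


# Hypothetical example: a reddish pixel and two candidate centers.
pixel = (200, 10, 10, 5, 5)
centers = [(190, 12, 8, 20, 20), (30, 30, 200, 6, 6)]
chosen = assign_pixel(pixel, centers, n_c=10.0, n_s=12.0)
```

Here the first center wins even though the second is spatially closer, because the color term dominates after normalization. Raising `n_c` relative to `n_s` would shift the balance toward spatial compactness.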

Figure 1 illustrates the results of superpixel segmentation with four different SLIC superpixel numbers on a sample frame captured from a rat on which five markers were supposed to be drawn. However, six markers were drawn because of a human error. This frame was intentionally selected to show that a high number of superpixels helps to segment the small markers and to cope with such a drawing error. An error like this does not usually affect the segmentation and tracking; however, it can affect the tracking if the mistaken marker and the real marker are too close to each other (less than 10 pixels apart), which makes it impossible to differentiate them in some cases. Fortunately, a mistake like this is rare.

2.3 Direct Linear Transform

Direct linear transform (DLT) has been proposed to calibrate cameras for 3D reconstruction from captured frames [hatze1988high, pvribyl2017absolute]. It has been used to create 3D models of objects in different applications, especially in biology and biomechanics [choo2003improved, hedrick2008software, hedrick2012morphological, theriault2014protocol, song2014three].

Figure 2 shows how an object is projected onto the camera image planes. The object is a point in 3D object space with coordinates $[x, y, z]$. Each camera has a projection point through which the object is projected: onto a point with coordinates $[u_1, v_1]$ in the image plane of camera 1, and onto a point with coordinates $[u_2, v_2]$ in the image plane of camera 2. DLT gives the following relation between the object coordinates and the projected coordinates in the image plane of camera 1:

$$u_1 = \frac{L_1 x + L_2 y + L_3 z + L_4}{L_9 x + L_{10} y + L_{11} z + 1}, \qquad v_1 = \frac{L_5 x + L_6 y + L_7 z + L_8}{L_9 x + L_{10} y + L_{11} z + 1}$$

To find $L_1$ to $L_{11}$, the cameras must be calibrated. Camera calibration should be done using a calibration object that carries markers with known coordinates. We used a custom-made Lego object with balls attached on top. This Lego object can be seen in Figure 3.
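The calibration and reconstruction steps can be sketched with the standard 11-parameter DLT, solved by linear least squares in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the two synthetic cameras, and the 25 random calibration points (standing in for the balls of the calibration object) are all hypothetical.

```python
import numpy as np


def dlt_calibrate(object_points, image_points):
    """Estimate the 11 DLT coefficients of one camera by least squares."""
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(object_points, image_points):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.extend([u, v])
    coeffs, _, _, _ = np.linalg.lstsq(
        np.array(rows, dtype=float), np.array(rhs, dtype=float), rcond=None)
    return coeffs


def dlt_project(coeffs, point):
    """Project a 3D point to image coordinates (u, v) with DLT coefficients."""
    x, y, z = point
    denom = coeffs[8] * x + coeffs[9] * y + coeffs[10] * z + 1.0
    u = (coeffs[0] * x + coeffs[1] * y + coeffs[2] * z + coeffs[3]) / denom
    v = (coeffs[4] * x + coeffs[5] * y + coeffs[6] * z + coeffs[7]) / denom
    return np.array([u, v])


def dlt_reconstruct(coeffs_per_camera, uv_per_camera):
    """Recover a 3D point from its projections in two or more cameras."""
    rows, rhs = [], []
    for c, (u, v) in zip(coeffs_per_camera, uv_per_camera):
        rows.append([c[0] - u * c[8], c[1] - u * c[9], c[2] - u * c[10]])
        rows.append([c[4] - v * c[8], c[5] - v * c[9], c[6] - v * c[10]])
        rhs.extend([u - c[3], v - c[7]])
    point, _, _, _ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return point


# Synthetic check: two made-up cameras and 25 made-up calibration points.
rng = np.random.default_rng(0)
true_cams = [
    np.array([80, 5, 10, 500, -4, 75, 12, 300, 0.01, 0.005, 0.02]),
    np.array([10, 6, -80, 600, 5, 78, -9, 250, -0.02, 0.004, 0.01]),
]
calib_xyz = rng.uniform(0.0, 10.0, size=(25, 3))
est_cams = [
    dlt_calibrate(calib_xyz, [dlt_project(c, p) for p in calib_xyz])
    for c in true_cams
]
marker = np.array([4.0, 2.5, 6.0])
observed = [dlt_project(c, marker) for c in true_cams]
recovered = dlt_reconstruct(est_cams, observed)
```

Each calibration point contributes two linear equations in the 11 unknowns, so at least six non-coplanar points are needed per camera. Each camera view of a marker in turn contributes two linear equations in the three unknown object coordinates, which is why two synchronized cameras suffice for reconstruction.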

2.4 3D Kalman Filter

A Kalman filter for motion analysis uses measurements observed over time to estimate variables related to the motion. Kalman filters have been used frequently to predict the position of objects in different fields, such as human tracking [ligorio2015novel], mice tracking [Spence13], and cardiovascular disease detection [bersvendsen2016automated]. The Kalman filter model assumes that the state of a system at frame n evolves from the prior state at frame n-1 [kalman1960new].

The model involves the position of the object, the external force causing changes in position, and the frame number, together with four coefficients for each frame. To simplify the system, we assumed that no external force or acceleration causes changes in it. The model was written for each of the three dimensions, that is, for the three coordinates of the object space shown in Figure 2. Therefore, our system had three states and three measurements to update the coefficients.
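One way to read this description is a filter whose three states are the 3D position, with the per-frame displacement taken from the two most recent estimates and no acceleration or external force. The sketch below follows that reading. It is an assumption-laden illustration rather than the authors' filter: the class name and the noise variances `process_var` and `measurement_var` are hypothetical.

```python
import numpy as np


class PositionKalman3D:
    """Kalman filter with three position states and three measurements.

    The prediction adds the displacement between the last two estimates
    (the speed per frame), assuming no acceleration and no external force.
    """

    def __init__(self, first_pos, second_pos,
                 process_var=1.0, measurement_var=4.0):
        self.prev = np.asarray(first_pos, dtype=float)
        self.pos = np.asarray(second_pos, dtype=float)
        self.P = np.eye(3) * measurement_var  # estimate covariance
        self.Q = np.eye(3) * process_var      # process noise (assumed)
        self.R = np.eye(3) * measurement_var  # measurement noise (assumed)

    def predict(self):
        """Predict the position for the next frame without changing state."""
        velocity = self.pos - self.prev       # displacement per frame
        return self.pos + velocity, self.P + self.Q

    def update(self, measurement):
        """Fuse a measured 3D position and advance the filter by one frame."""
        predicted, p_pred = self.predict()
        gain = p_pred @ np.linalg.inv(p_pred + self.R)
        residual = np.asarray(measurement, dtype=float) - predicted
        new_pos = predicted + gain @ residual
        self.P = (np.eye(3) - gain) @ p_pred
        self.prev, self.pos = self.pos, new_pos
        return new_pos
```

Two initial positions are required because the speed is a difference between consecutive frames. For a marker that moves by a constant step per frame, `predict()` returns the next position exactly, and a noisy measurement pulls the estimate only part of the way toward it.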

2.5 Features

Seven features were extracted from each superpixel to find the object that best matches the previously detected landmarks. These seven features can be formalized as follows:


Here, the features are indexed by the superpixel number (over all superpixels in a sub-image), the marker number (five markers), and the frame number (1000 frames per trial in our studies). They relate each superpixel in a frame to the landmark detected for that marker and to the position of the landmark predicted for that frame. The seven features are built from five quantities: the saturation channel of the HSV color space [hajimaghsoudi2012automatic], the hue channel of the HSV color space [Maghsoudi16_2], the gray-scale intensity, the horizontal coordinate in the image plane (Figure 2), and the vertical coordinate in the image plane (Figure 2).
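The five quantities named above can be collected per superpixel from a label map, as in the following sketch. It stops at these raw descriptors and makes no assumption about how they are combined into the seven features. The function name and argument layout are hypothetical, and the plain mean of the hue ignores its wrap-around.

```python
import numpy as np


def superpixel_descriptors(labels, hue, saturation, gray):
    """Mean saturation, hue, gray level, and centroid of every superpixel.

    labels is an integer map of superpixel ids; the other arrays have the
    same shape. Returns {label: [S, H, gray, horizontal, vertical]}.
    """
    rows, cols = np.indices(labels.shape)
    descriptors = {}
    for label in np.unique(labels):
        mask = labels == label
        descriptors[int(label)] = np.array([
            saturation[mask].mean(),
            hue[mask].mean(),      # plain mean; ignores hue wrap-around
            gray[mask].mean(),
            cols[mask].mean(),     # horizontal coordinate of the centroid
            rows[mask].mean(),     # vertical coordinate of the centroid
        ])
    return descriptors
```

A label map of this kind is what a SLIC implementation typically returns for a sub-image, so the function can be applied directly to its output.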

Figure 2: How two cameras can be used to extract the DLT coefficients from their image planes. The object is located in a 3D object space with an x, y, z coordinate system. Each of cameras 1 and 2 has its own projection point, through which the object is projected to a point in image plane 1 and image plane 2, respectively.

2.6 Initial Tracker

The initial tracker was a simple but necessary step to initialize the marker positions for two consecutive frames. Two frames from each camera were needed to update the Kalman filter coefficients because the speed had to be calculated. The frames captured by the two cameras located on the same side were processed at the same time to obtain a DLT-based 3D reconstruction.

The initial tracker generated superpixels, using an initial value for the number of superpixels, and asked a user to zoom in for better resolution and click on the five landmarks in the first frame from each camera. Then, the maximum and minimum of the coordinates of all five landmarks on the image plane were calculated. In each direction (horizontal and vertical), these maximum and minimum values were extended by 100 pixels to make sure the markers would not be missed in the next frame. The resulting smaller sub-image was extracted to reduce the time required by the SLIC superpixel method. The features described in section 2.5 were extracted to create the initial values needed in equation 8. Finally, the user was asked to click the markers a second time, for the second frame. The rest of the frames were processed using the method described in section 2.7.
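The sub-image extraction of the initial tracker can be sketched as a bounding box around the clicked landmarks, padded by 100 pixels. The function name is hypothetical, and clipping the box to the frame borders is an added assumption that the text does not state.

```python
def landmark_sub_image_bounds(clicked_points, frame_width, frame_height,
                              margin=100):
    """Bounding box (left, top, right, bottom) around clicked landmarks.

    clicked_points is a list of (horizontal, vertical) pixel coordinates.
    The box is padded by margin pixels and clipped to the frame (assumed).
    """
    xs = [p[0] for p in clicked_points]
    ys = [p[1] for p in clicked_points]
    left = max(min(xs) - margin, 0)
    top = max(min(ys) - margin, 0)
    right = min(max(xs) + margin, frame_width)
    bottom = min(max(ys) + margin, frame_height)
    return left, top, right, bottom


# Hypothetical clicks on five markers in a 2048 x 700 frame.
clicks = [(900, 300), (950, 350), (1000, 420), (1050, 380), (1100, 330)]
bounds = landmark_sub_image_bounds(clicks, 2048, 700)
```

For these clicks the sub-image is 400 by 320 pixels instead of 2048 by 700, which is where the reduction in SLIC processing time comes from.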

Figure 3: The calibration objects used for extracting the DLT coefficients. The calibration object had 25 balls located at different heights and locations with known coordinates relative to one of the balls, the one located on a corner with the lowest height.
Figure 4: The process for tracking the markers, as described in section 2.7.

2.7 General Tracker

We subsequently focused on a region of interest (ROI) around the 2D projection of the 3D coordinate predicted by the 3D Kalman filter described in section 2.4. This projected point gives the 2D image-plane coordinates for marker number m and frame number n, which were used in the calculations of equation 8. This zoomed-in region is referred to as the sub-image. The size of the sub-image was selected based on the maximum displacement of the center of the body in rats (50 pixels); the same number can be applied to mice too.
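Cropping the sub-image around the predicted point can be sketched as follows, assuming a window that extends 50 pixels on each side of the projected prediction. The function name, the rounding of the predicted coordinates, and the clipping at the frame borders are illustrative assumptions.

```python
import numpy as np


def crop_roi(frame, center_uv, half_size=50):
    """Crop a window around a predicted (horizontal, vertical) position.

    Returns the sub-image and the (left, top) offset needed to map
    sub-image coordinates back to full-frame coordinates.
    """
    height, width = frame.shape[:2]
    u = int(round(center_uv[0]))
    v = int(round(center_uv[1]))
    left, right = max(u - half_size, 0), min(u + half_size, width)
    top, bottom = max(v - half_size, 0), min(v + half_size, height)
    return frame[top:bottom, left:right], (left, top)


# Hypothetical 2048 x 700 color frame and predicted marker position.
frame = np.zeros((700, 2048, 3), dtype=np.uint8)
sub_image, offset = crop_roi(frame, (1000.4, 350.6))
```

Keeping the offset matters because the superpixel centroids found inside the sub-image must be shifted back to full-frame coordinates before they are passed to the 3D reconstruction.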

By applying the SLIC method to each of these sub-images, the features described in section 2.5 were extracted for each superpixel. One option for using these features was a classifier such as a support vector machine or a neural network; we presented a method for mice paw tracking using thresholding segmentation and both classifiers [Maghsoudi17IET]. However, the markers might have different intensities and even colors, which limits the use of a classifier as a tracker.

Therefore, we developed a probabilistic function, inspired by the one we presented in [Maghsoudi17IET], to help with tracking. First, to normalize the features for use in the probabilistic function, we subtracted the minimum from the data and divided by the range (maximum minus minimum). Writing $F_i$ for feature number $i$, and $\max$ and $\min$ for the maximum and minimum functions, the normalized feature $\hat{F}_i$ can be written as follows:

$$\hat{F}_i = \frac{F_i - \min(F_i)}{\max(F_i) - \min(F_i)}$$
The normalized features are weighted based on the importance of features using the following array:


Then, each normalized feature $\hat{F}_i$ is multiplied by its corresponding weight $W_i$ for $i = 1, \dots, 7$. The sum of these products is calculated, and the superpixel with the maximum sum is taken as the marker for that frame. This can be formalized as follows:

$$S_{m,n}(k) = \sum_{i=1}^{7} W_i \, \hat{F}_i(k), \qquad k^{*} = \arg\max_{k} S_{m,n}(k)$$

where $S_{m,n}(k)$ is the sum of weighted features of superpixel $k$ for marker $m$ in frame $n$, and $\arg\max$ finds the index $k$ at which $S_{m,n}(k)$ equals its maximum. The process for tracking is illustrated in Figure 4.
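The normalization, weighting, and selection steps can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the features are taken to be oriented so that a larger value means a better match, the minimum and maximum are taken over the superpixels of one sub-image, and the equal weights in the example are placeholders rather than the weight array used in the study.

```python
import numpy as np


def best_superpixel(features, weights):
    """Pick the superpixel with the largest weighted sum of features.

    features has shape (n_superpixels, n_features); weights has one
    entry per feature. Returns (index of best superpixel, all scores).
    """
    features = np.asarray(features, dtype=float)
    f_min = features.min(axis=0)
    f_max = features.max(axis=0)
    # Guard against a zero range when a feature is constant.
    span = np.where(f_max > f_min, f_max - f_min, 1.0)
    normalized = (features - f_min) / span
    scores = normalized @ np.asarray(weights, dtype=float)
    return int(np.argmax(scores)), scores


# Hypothetical example: three superpixels, seven features, equal weights.
example_features = [[0.1] * 7, [0.9] * 7, [0.5] * 7]
example_weights = [1.0] * 7
best, scores = best_superpixel(example_features, example_weights)
```

Because every feature is rescaled to the range 0 to 1 before weighting, the weights alone decide how much each feature contributes, independent of its original units.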

3 Results

The method was examined using Python 2.7.12 with OpenCV 3.1.0-dev installed, on a MacBook Pro (2.7 GHz Intel Core i5, 8 GB 1867 MHz DDR3).

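| Condition | Measure | Manual Tracking | Thre + 2D Tracking | SLIC + 2D Tracking | SLIC + 3D Tracking |
|---|---|---|---|---|---|
| Database | Frames | 1,000 | 4,000 | 4,000 | 24,000 |
| Database | Markers | 5,000 | 20,000 | 20,000 | 120,000 |
| Average Time per Trial | | | | | |
| Bad Marker | Frames | - | 800 | 800 | 13,500 |
| Bad Marker | Total Bad Markers | - | 800 | 800 | 13,500 |
| Bad Marker | Correct Tracked | - | 23 | 127 | 11,582 |
| Bad Marker | Percentage | 100 | 2.88 | 15.87 | 85.79 |
| Missing Start | Frames | 100 | 400 | 400 | 2,200 |
| Missing Start | Total Markers | 500 | 2,000 | 2,000 | 11,000 |
| Missing Start | Correct Tracked | 500 | 35 | 59 | 1,319 |
| Missing Start | Percentage | 100 | 1.75 | 2.95 | 11.99 |
| Partially Occluded | Frames | 30 | 250 | 250 | 4,500 |
| Partially Occluded | Total Markers | 150 | 1,250 | 1,250 | 22,500 |
| Partially Occluded | Correct Tracked | 150 | 10 | 217 | 21,212 |
| Partially Occluded | Percentage | 100 | 0.8 | 17.36 | 94.28 |
| Occluded | Frames | 50 | 300 | 300 | 5,100 |
| Occluded | Total Markers | 250 | 1,500 | 1,500 | 25,500 |
| Occluded | Correct Tracked | 250 | 10 | 217 | 22,788 |
| Occluded | Percentage | 100 | 0.67 | 14.47 | 89.36 |
| Perfect Consecutive | Frames | 450 | 1,300 | 1,300 | 8,100 |
| Perfect Consecutive | Total Markers | 2,250 | 6,500 | 6,500 | 40,500 |
| Perfect Consecutive | Correct Tracked | 2,250 | 6,377 | 6,494 | 40,496 |
| Perfect Consecutive | Percentage | 100 | 98.11 | 99.90 | 99.99 |
| Total | Frames | 900 | 3,600 | 3,600 | 21,800 |
| Total | Total Markers | 4,500 | 18,000 | 18,000 | 109,000 |
| Total | Correct Tracked | 4,500 | 10,891 | 14,237 | 103,562 |
| Total | Percentage | 100 | 60.50 | 79.09 | 95.01 |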
Table 1: Tracking results. "SLIC + 3D Tracking" is the method presented here, which is compared with three methods: "SLIC + 2D Tracking" [maghsoudi2017superpixels], "Thre + 2D Tracking" [maghsoudi2017superpixels], and "Manual Tracking". "SLIC" and "Thre" denote the superpixel method presented in [Achanta12] and thresholding on the hue channel, respectively. There are eight small tables showing different conditions for the evaluation of the method. From top to bottom, the small tables show: the number of frames and markers in the whole database used here; the average time each method required to process one trial (1000 frames); the tracking results for markers that were drawn badly or unclearly; the tracking results when the markers were unclear or hidden from the beginning; the tracking results while a marker was partially occluded; the results when the markers were completely occluded; the results for a perfect sequence of consecutive frames; and the comprehensive results considering all mistakes and conditions, excluding "Missing Start".

The segmentation process, as illustrated in Figure 4, applied the SLIC superpixel method to the frame or to the generated sub-images, the latter to reduce the time required for the superpixel process [Achanta12, maghsoudi2017superpixels]. As shown in Figure 1, the number of superpixels plays an important role in how the SLIC method performs. We previously gave a comprehensive discussion of how to select the superpixel number based on the size of the markers [maghsoudi2017superpixels]. If the size of a marker is known, the following equation gives the best superpixel number (NSLIC):


where the marker size is the number of pixels in that marker, and 2048, 700, and 100 are the image width, image height, and sub-image window size, respectively. Equation 13 can provide an ideal number of superpixels for SLIC; however, only an estimate of the marker size was needed, because SLIC can segment objects from half to twice the initial superpixel size. Therefore, we used 10,000, 10,000, 7,000, 3,000, and 3,000 as the number of superpixels for a frame, NSLIC, for the toe, ankle, knee, hip, and anterior superior iliac spine markers, respectively.
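To give a feel for these numbers, the sketch below relates a frame-level superpixel count to the average superpixel area, assuming superpixels tile the image uniformly. It is not a reproduction of Equation 13: the function names, the uniform-tiling assumption, and the square 100 by 100 pixel sub-image are all assumptions made for illustration.

```python
def pixels_per_superpixel(n_superpixels, width=2048, height=700):
    """Average superpixel area in pixels under uniform tiling (assumed)."""
    return (width * height) / float(n_superpixels)


def superpixels_for_window(n_frame, window=100, width=2048, height=700):
    """Superpixel count that keeps the same density in a square sub-image."""
    scaled = n_frame * (window * window) / float(width * height)
    return max(1, int(round(scaled)))


# Superpixel counts listed above for the toe/ankle, knee, and hip/iliac markers.
for n_frame in (10000, 7000, 3000):
    print(n_frame, pixels_per_superpixel(n_frame),
          superpixels_for_window(n_frame))
```

Under these assumptions, 10,000 superpixels per frame correspond to about 143 pixels per superpixel and 3,000 to about 478 pixels, which is consistent with smaller markers being assigned larger superpixel counts.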

The segmentation process using the SLIC superpixel method was examined in [maghsoudi2017superpixels].

As discussed in the introduction, manual tracking can be considered the common method for tracking markers in many biomechanics applications. To assess how helpful the proposed method can be for biomechanics and neuroscience applications, we compared four approaches: manual tracking, thresholding for segmentation with 2D tracking, the SLIC method for segmentation with 2D tracking [maghsoudi2017superpixels], and the SLIC method for segmentation with 3D tracking.

Figure 5: A comparison between the methods. Manual tracking, thresholding followed by 2D tracking, SLIC followed by 2D tracking, and the method presented here are compared with each other. The average time to process a trial (1,000 frames) is graphed in red. The average time is on the left axis and the rest of the plots are on the right axis.

The method was examined on six Sprague-Dawley rats. Each rat had five markers: toe, ankle, knee, hip, and anterior superior iliac spine. We randomly selected two trials from each rat, and each trial contained 1,000 frames. This yielded 12,000 frames from each of the two cameras capturing the right side of the animal, where the five markers were drawn. Therefore, a total of 24,000 frames containing 120,000 markers make up the database of this study.

We evaluated the method under different conditions: bad marker frames, in which a marker was painted poorly, making it difficult to find; missing start, in which a trial starts with markers that are barely visible but the user still initialized the marker locations by guessing their positions; partially occluded, in which the markers were partially occluded by the body or by dirt on the Plexiglas; occluded, in which the markers were completely occluded; perfect consecutive, a sequence of frames in which the markers were clear the whole time; and total frames, for which the overall results are reported.

The results for all these conditions are separately illustrated in Table 1 and Figure 5. In addition, Figure 6 shows a 3D reconstructed video from a rat while running on the treadmill.

It should be noted that we did not add the required time for finding the DLT coefficients in the results presented here.

4 Conclusion

We presented a method to segment the markers drawn on the body of rats using the SLIC superpixel method, followed by a 3D Kalman-based tracker that predicts the position of the markers in a 3D domain and projects them to the 2D image plane. Having the coordinates on the 2D image plane, we assigned a score to each superpixel based on the predicted coordinates and on the color and texture information of the marker in the previous frame, which allowed us to use a probabilistic function.

The method was evaluated on 24 trials with 5 markers drawn on the body of each animal. We compared the method with available methods [maghsoudi2017superpixels] that use simple thresholding or the superpixel method followed by a 2D tracker.

The results showed that, as expected, the most accurate method was manual tracking; however, it takes a long time to process one trial. This shows the importance of an accurate automatic method for marker tracking. In addition, manual tracking involves intraobserver and interobserver tracking error, which was not studied here.

The 3D tracking was superior to the 2D tracking methods in all conditions. However, the results show that for a perfect sequence of frames, the superpixel method with 2D tracking performs the same as the 3D-based tracking method while being slightly faster. Note that the time required to calculate the DLT coefficients was not included in the times reported in Table 1. However, a perfect sequence of frames is hard to obtain when recording from an animal.

Figure 6: A video of tracked markers for 1,000 consecutive frames using the method presented here (MP4, 14.6 MB). The left frames, from top to bottom, show a captured frame from camera 3, the tracked markers for the corresponding frame from camera 3, a captured frame from camera 4, and the tracked markers for the corresponding frame from camera 4. The right image shows the 3D reconstruction of the markers using the DLT coefficients.

Our future work will be to develop a 3D-based tracker for markerless animals to avoid handling the animal and painting markers on its body. Painting a marker requires long anesthesia for mice, followed by bleaching and drawing the markers. In addition, we will try to use a 3D model of the markers to reduce mistracking. A 3D model can provide a good setup for tracking each marker based on the other markers, which means that missing or wrongly labeled markers can be found using the other markers.