A Gun Detection Dataset and Searching for Embedded Device Solutions

05/03/2021
by Delong Qi, Weijun Tan, Zhifu Liu, Qi Yao, Jingfeng Liu

Gun violence is a severe problem worldwide, particularly in the United States. Computer vision methods have been studied to detect guns in surveillance video cameras or smart IP cameras and to send real-time alerts to safety personnel. However, because no public datasets exist, it is hard to benchmark how well such methods work in real applications. In this paper we publish a dataset with 51K annotated gun images for gun detection and another 51K cropped gun chip images for gun classification, collected from a few different sources. To our knowledge, this is the largest dataset for the study of gun detection. The dataset can be downloaded at www.linksprite.com/gun-detection-datasets. We also search for solutions for gun detection on an embedded edge device (camera) and for gun/non-gun classification on a cloud server. This edge/cloud framework makes the deployment of gun detection in the real world possible.


I Introduction

Gun violence has been a big problem around the world, especially in countries where it is legal to carry a gun, such as the United States. Every year, many innocent people are killed or harmed by gun violence in public areas [8].

Video surveillance systems are deployed in many public places, but they still require human supervision and intervention. As a result, these systems cannot detect gun-related crimes fast enough to prevent them. In recent years, automatic gun detection systems have been studied and deployed in real applications [7, 6, 20, 21, 14, 11, 12]. Most of them still use surveillance video cameras: the video streams are sent to an edge device or server, which decodes the video, detects guns in the video frames, and sends an alert (email, text message, or phone call) to personnel on duty. Others use a smart IP camera as the edge device: the camera performs gun detection, sends the cropped gun image to a server, and triggers the server to send the alert. The former approach can reuse existing surveillance video cameras, but the video streaming consumes a lot of WiFi or Ethernet bandwidth, and the video processing and gun detection require expensive edge devices or servers. The latter approach requires deploying new smart IP cameras, but the cameras perform all the video processing and gun detection, and only very little data is sent to the server for further processing. As a result, it is much less expensive than the former.

There are a few challenges in building a gun detection system. First, it has to achieve a very high true positive (correct detection) rate (TPR) and a very low false positive (false alarm) rate (FPR). In our study, we find that one major problem is how to control false alarms, i.e., the FPR.

The second challenge is that there are no large public gun detection datasets. Researchers have collected their own datasets and evaluated their algorithms on these private datasets. To our knowledge, these datasets typically contain only a few thousand images [20, 14], which is far too few. As a result, researchers cannot compare the performance of their algorithms with others, and it is impossible to tell which algorithm is better. Our first contribution is a dataset with 51K gun images. To address the false alarm problem mentioned above, we also collect a non-gun image dataset for the study of gun/non-gun classification.

Our second contribution is a search for a gun detection solution for low-cost embedded edge devices (e.g., smart IP cameras). Our goal is not to propose a new detection algorithm, but to find and modify existing algorithms for practical use on embedded devices with limited hardware resources. We also evaluate gun/non-gun classification algorithms on a cloud server, where the low-complexity restriction can be lifted. There, the goal is to achieve the best possible accuracy.

II Related Work

Most of the existing work on gun detection [7, 6, 20, 21, 14, 11, 12] focuses on applying existing CNNs to the object detection of guns. These works use very complex CNNs without considering deployment in practical systems. In addition, they use proprietary datasets that are usually very small.

We summarize the CNNs and datasets used in these works in Table I. Since our goal is to find a solution for embedded devices, we comment on whether these networks are applicable in practical systems. From the summary, we observe that Faster-RCNN [23] is the most popular choice, yet it is too complex and slow for use in a practical system. The one exception is [7], which uses traditional image processing and a 3-layer MLP. However, it uses a very small dataset of only 3559 frame images. This summary shows the need for a large dataset and a detection network suitable for practical real-time applications.

Ref | Method | Dataset size | Comments
[7] | MLP | 3559 | Edge + 3-layer MLP
[6] | Faster-RCNN | - | Very complex and slow
[20, 21] | Faster-RCNN | 8996 | Very complex and slow
[14] | M2Det | 5000 | Very complex and slow
[11] | Faster-RCNN | - | Very complex and slow
[12] | VGG16 etc. | 3500 | Very complex and slow
TABLE I: CNN networks and datasets used in existing work

III Gun Detection Dataset

III-A Image Sources

We collect our gun image dataset from the following sources:

  • We first collect a large number of gun images from the IMFDB website [10], an internet movie firearms database. We then use a CNN-based gun detector to roughly label the data. Finally, we manually check the labels and correct the inaccurate ones.

  • We collect some images from the publicly available websites of some of the papers [20, 14, 26]. Many of these images overlap with the first source, so we manually clean them. The images from the video footage in [26] are used in [14]. We do not use most of them because their resolution is very low and differs from that of real cameras.

  • We deploy a few cameras in a private office and capture gun detection data in a real scene. We collect a couple of thousand images this way.

  • Many negative samples for gun detection and classification come from the failure cases in our study.

To address the FPR concern, we also collect a large non-gun image dataset for detection and classification. Since these images contain no guns, no annotation is needed. Non-gun images are abundant and easy to collect. We collect 94K non-gun images from the popular ImageNet [4] and COCO [16] datasets.

In Fig. 1, we show 40 sample gun detection images in which the annotated gun bounding boxes are drawn as red rectangles. In Fig. 2, we show 40 sample gun classifier images, which are cropped from the ground-truth locations in some of the gun detection images.

Fig. 1: Sample gun detection images
Fig. 2: Sample gun classification images

III-B Dataset Folder Structure

The dataset folder structure is shown in Fig. 3. The top dataset folder contains two subfolders, "detector" and "classifier", which hold all images for gun detection and classification, respectively. Any folder named "gun" contains gun images, and any folder named "other" contains non-gun images. In the "detector" folder, we split the images into a training set and a test set. Users are free to re-split them. Please note that this folder structure is the one we had when we decided to publish the dataset. The split into training and test sets is not necessarily the one we use in our performance benchmark.
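As a rough illustration of this layout, the sketch below walks a dataset root and counts image files by model ("detector" or "classifier") and class ("gun" or "other"). The function name, the set of image extensions, and the assumption that every image path contains both a model folder and a class folder are ours; the actual tree in Fig. 3 may contain additional levels, such as training and test subfolders.

```python
import os
from collections import Counter

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed extensions, not stated in the paper


def count_images(root):
    """Count image files per (model, class) pair found in each directory path."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        parts = os.path.relpath(dirpath, root).split(os.sep)
        model = next((p for p in parts if p in ("detector", "classifier")), None)
        label = next((p for p in parts if p in ("gun", "other")), None)
        if model is None or label is None:
            continue  # not inside a recognizable model/class folder
        n = sum(1 for f in filenames
                if os.path.splitext(f)[1].lower() in IMAGE_EXTS)
        counts[(model, label)] += n
    return counts
```

Calling `count_images` on the top dataset folder gives a quick sanity check against the totals reported for the dataset.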

Fig. 3: Dataset folder structure

The numbers of images are summarized in Table II. Please note that many of the negative samples for both the detector and the classifier come from failure cases in our study. The majority of the gun images show handguns, while a good number show rifles, machine guns, and other types.

Model | Gun/Non-Gun (Other) | Number
Detector | Gun | 51,889
Detector | Other | 154,798
Classifier | Gun | 51,398
Classifier | Other | 371,078
TABLE II: Number of images

III-C Annotation File Format

In the gun folders for detection, there are two subfolders: JpegImages and Annotations. All gun images, in JPG format, are saved in the JpegImages folder, and their corresponding annotation files are saved in the Annotations folder. The file names of the JPG images and the annotation files map one-to-one across the two folders.

The annotation file is in XML format. In the annotation file, the filename, dimension of the whole image, and the upper-left and lower-right coordinates of guns are given. An example of such a file is shown in Fig.4.
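The sketch below shows one way such a file could be read. The exact tag names are not given in the text, so the code assumes a Pascal VOC style layout (`filename`, `size/width`, `size/height`, and `object/bndbox` with `xmin`, `ymin`, `xmax`, `ymax`); the sample XML and its values are hypothetical and should be checked against the real files.

```python
import xml.etree.ElementTree as ET


def parse_annotation(xml_text):
    """Return (filename, (width, height), [(xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        # Upper-left (xmin, ymin) and lower-right (xmax, ymax) corners.
        boxes.append(tuple(int(float(bb.findtext(k)))
                           for k in ("xmin", "ymin", "xmax", "ymax")))
    return filename, (width, height), boxes


# Hypothetical annotation, assuming VOC-style tags.
EXAMPLE = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>gun</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>180</xmax><ymax>170</ymax></bndbox>
  </object>
</annotation>"""

print(parse_annotation(EXAMPLE))
```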

Fig. 4: XML annotation format
Fig. 5: Gun detection image statistics

III-D Statistics of Gun Images

We examine the statistics of the gun images in the detector training set. We look at the whole image size (width x height) and the gun bounding box size (width x height). These areas indicate the resolution of the whole image and of the gun object. We also look at the number of gun objects in every image. The statistics are shown in Fig. 5.

From the distributions, we notice that: 1) there are two peaks in the whole image size, which correspond to the image sizes widely used in surveillance video cameras and IP cameras. There are also a few very large images that are not shown in the distribution; 2) the sizes of the gun objects are roughly uniformly distributed, except for the very small sizes. This poses a challenge for gun detection because many images contain very small gun objects; 3) most images contain one gun object. Some images have two gun objects, and very few images have up to 10 gun objects.
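The three quantities discussed here (whole image area, bounding box area, and number of guns per image) can be gathered with a few lines of code. This is a minimal sketch under our own assumptions: the function name and the record format (an image size plus a list of corner-coordinate boxes) are hypothetical, and the paper does not describe how its histograms were actually produced.

```python
from collections import Counter


def image_statistics(records):
    """records: iterable of ((width, height), [(xmin, ymin, xmax, ymax), ...])."""
    image_areas, box_areas = [], []
    guns_per_image = Counter()
    for (width, height), boxes in records:
        image_areas.append(width * height)       # whole image size in pixels
        guns_per_image[len(boxes)] += 1          # histogram of object counts
        for xmin, ymin, xmax, ymax in boxes:
            box_areas.append((xmax - xmin) * (ymax - ymin))
    return image_areas, box_areas, guns_per_image
```

The returned lists can then be binned into histograms with any plotting tool.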

In the gun classification training dataset, there is always one gun object in every image. These images are all resized to 112x112.

IV Searching for Gun Detector Solution

In this section we present a framework with an edge and a cloud, where the edge device is typically a smart camera and the cloud is a remote server. The camera performs all the video decoding and gun detection, crops the gun image, and sends it to the server. On the server, the received small gun image is further processed by a more powerful classifier. If a gun is confirmed, an alert is sent to safety personnel.
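The control flow of this framework can be sketched as follows. Everything here is illustrative: the function names are hypothetical, `detect`, `classify`, and `send_alert` are placeholders for the real detector, classifier, and alerting service, a frame is modeled as a plain list of pixel rows, and the 0.5 decision threshold is an assumption borrowed from the classifier evaluation.

```python
def crop(frame, box):
    """Crop a frame stored as a list of pixel rows to (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in frame[ymin:ymax]]


def edge_step(frame, detect):
    """Runs on the camera: detect guns and return the cropped chips to upload."""
    return [crop(frame, box) for box in detect(frame)]


def cloud_step(chips, classify, threshold=0.5):
    """Runs on the server: confirm a gun if any chip scores at or above threshold."""
    return any(classify(chip) >= threshold for chip in chips)


def process_frame(frame, detect, classify, send_alert):
    """One pass of the edge/cloud pipeline; returns True if an alert was sent."""
    chips = edge_step(frame, detect)
    if chips and cloud_step(chips, classify):
        send_alert()
        return True
    return False
```

Only the small cropped chips cross the network in this arrangement, which is what keeps the bandwidth and server cost low.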

IV-A Gun Detection Network

There has been great progress in object detection, including SSD [19], the YOLO family [22], the Faster-RCNN family [23], the feature pyramid network [15], MMDetection [3], EfficientDet [27], the transformer-based DETR [2], Centernet [5, 28], and so on. Most of them have outstanding performance but run very slowly. For real-time object detection, YOLOv3, SSD, and Centernet are a few good choices. After extensive experiments, we find that Centernet is our best choice, offering a good trade-off between performance and speed.

The original Centernet used a few backbone networks, including Hourglass-104, DLA-34, ResNet-101, and ResNet-18 [5, 28]. We evaluate these choices on a HiSilicon-Hi3516A embedded platform and find that their frame rates (FPS) are all too low for real-time applications. We then test different lightweight backbone networks, including MobileNet [25] and VoVNet [13]. The results are shown in Table III. Most of these backbones meet the real-time or near real-time requirement.

Backbone | AP (%) | FPS
MobileNetv2 | 32.2 | 12
VoVNetv2-19-slim | 33.2 | 25
VoVNetv2-19 | 35.1 | 10
VoVNetv2-39-slim | 36.1 | 20
VoVNetv2-39 | 40.0 | 8
TABLE III: Different backbone networks in the Centernet. The AP(0.5:0.05:0.95) performance is on the COCO-2017 val set

Next, we use VoVNetv2-39 [13] as an example and study different data augmentation techniques, loss functions, learning rate schedules, etc. to improve its detection performance. In these experiments, we use the VOC2007+VOC2012 training sets for training and the VOC2007 test set for testing. The experiment results are listed in Table IV. The best techniques are used in our gun detection performance evaluation.

Technique | mAP (%) at IOU=0.5
baseline | 74.1
+multi-scale | 75.1
+GIoU loss [24] | 75.8
+Cosine LR | 76.1
+Mosaic [1] | 76.9
+FPN [15], PAN [18], RFB [17] | 78.1
TABLE IV: Techniques to improve the Centernet performance on VOC dataset. Training dataset = VOC2007+VOC2012 training sets, test set = VOC2007 test set.
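One of the techniques in Table IV is a cosine learning rate schedule. The text does not give its exact form, so the sketch below uses the common cosine annealing formula, which decays the rate from `lr_max` to `lr_min` over `total_steps`. The function name, parameters, and values are illustrative assumptions, not the settings used in the experiments.

```python
import math


def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate at a given step in [0, total_steps]."""
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

The rate starts at `lr_max`, passes through the midpoint of the range halfway through training, and reaches `lr_min` at the final step.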

Finally, we test the performance of VoVNetv2-19-slim on our gun detection dataset. For comparison, a couple of other networks are also evaluated. The results are presented in Table V. The IOU threshold used is 0.3. We choose this threshold because the gun/non-gun classifier on the cloud server provides a second stage. A small IOU threshold improves the recall of gun detection, while the classifier on the server filters out non-gun objects.

Backbone | Acc (%) | Rec (%) | Pre (%)
VoVNet-v2-19_slim_light | 81.06 | 90.19 | 85.05
VoVNet19_slim | 84.28 | 90.33 | 88.60
ResNet18 | 84.28 | 89.88 | 88.86
TABLE V: The performance of the Centernet gun detection at IOU threshold = 0.3. Acc=accuracy, Rec=Recall, Pre=precision
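To make the role of the IOU threshold concrete, the sketch below computes the intersection over union of two boxes and greedily matches predictions to ground-truth boxes. The greedy matching rule and the function names are our assumptions; the paper does not describe its exact evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(predicted, ground_truth, threshold=0.3):
    """Greedily match predictions to unused ground-truth boxes.

    Returns (true positives, false positives, false negatives).
    """
    unused = list(ground_truth)
    tp = 0
    for box in predicted:
        best = max(unused, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            unused.remove(best)
            tp += 1
    return tp, len(predicted) - tp, len(unused)
```

Lowering the threshold lets loosely localized boxes count as hits, which raises recall at the edge and leaves the rejection of non-gun crops to the server-side classifier.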

IV-B Gun/Non-Gun Classifier

The gun/non-gun classifier runs on the cloud server, so it is not restricted to lightweight networks. We therefore use a large classification network to ensure the best accuracy in filtering out false positive detections.

We choose ResNet [9], with different depths and with compression of the number of channels. We train the classification network with 51K gun images and 94K negative sample images. The test set includes 998 positive samples and 980 negative samples. The final test results are shown in Table VI. We notice that even with ResNet50, the accuracy is only 97.83%. This is because there are some very hard cases in which non-gun objects are classified as guns, so this problem remains to be solved. The ROC curves of the classifiers are shown in Fig. 6.

Please note that, in practice we can use the detection and classification results in a few consecutive frames to improve the recall and precision.
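The paper does not specify how results from consecutive frames would be combined. One simple possibility, shown here purely as an assumption, is a k-of-n vote: an alert is raised only when at least `k` of the last `n` frames were positive. The class name and default values are hypothetical.

```python
from collections import deque


class FrameVoter:
    """Report a gun only when at least k of the last n frames were positive."""

    def __init__(self, n=5, k=3):
        self.k = k
        self.history = deque(maxlen=n)  # oldest frame drops out automatically

    def update(self, frame_is_positive):
        """Record the latest per-frame decision and return the voted decision."""
        self.history.append(bool(frame_is_positive))
        return sum(self.history) >= self.k
```

A vote like this suppresses isolated single-frame false alarms, and it also tolerates an occasional missed frame once a gun has been seen several times.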

Model | Acc (%) | Rec (%) | Pre (%)
ResNet18 | 96.97 | 98.55 | 95.39
ResNet34 | 97.57 | 99.27 | 95.89
ResNet50 | 97.83 | 99.69 | 95.99
TABLE VI: The performance of gun classifier at threshold = 0.5. Acc=accuracy, Rec=Recall, Pre=precision
Fig. 6: ROC of gun/non-gun classifiers
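For reference, the sketch below shows how accuracy, recall, and precision at a score threshold, and the (FPR, TPR) points of an ROC curve, follow from classifier scores. It uses the standard definitions; the function names and the convention that a score at or above the threshold means "gun" are our assumptions.

```python
def confusion(scores, labels, threshold=0.5):
    """Count (tp, fp, tn, fn) for scores in [0, 1]; label 1 = gun, 0 = non-gun."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Return (accuracy, recall, precision); recall is the same as the TPR."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, recall, precision


def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) pair per threshold."""
    points = []
    for t in thresholds:
        tp, fp, tn, fn = confusion(scores, labels, t)
        fpr = fp / (fp + tn) if fp + tn else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        points.append((fpr, tpr))
    return points
```

Sweeping the threshold from 0 to 1 traces the curve from the upper-right corner (everything flagged as a gun) to the lower-left corner (nothing flagged).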

V Conclusion

We collect and publish a gun detection and gun classification dataset. To our knowledge, this is the largest gun detection and classification dataset available for research and development purposes. We evaluate the gun detection performance on an edge device and the gun/non-gun classification on a cloud server. We hope the publication of this dataset will foster more studies on gun detection and its applications in practical systems.

Another direction for future work is to detect gun shooting as an action instead of detecting guns in static images, which perhaps makes more sense for surveillance applications. One such work is the anomaly detection of [26], where gun shooting is one of the anomalous activities.

References

  • [1] Ultralytics YOLOv5. Note: https://github.com/ultralytics/yolov5 Cited by: TABLE IV.
  • [2] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko (2020) End-to-end object detection with transformers. ECCV. Cited by: §IV-A.
  • [3] K. Chen et al. (2020) MMDetection: open mmlab detection toolbox and benchmark. ECCV. Cited by: §IV-A.
  • [4] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and F. Li (2009) ImageNet: a large-scale hierarchical image database. CVPR. Cited by: §III-A.
  • [5] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, and Q. Tian (2019) CenterNet: keypoint triplets for object detection. ICCV. Cited by: §IV-A, §IV-A.
  • [6] V. G. and D. A. (2017) A handheld gun detection using faster r-cnn deep learning. Proceedings of the 7th International Conference on Computer and Communication Technology. Cited by: §I, TABLE I, §II.
  • [7] M. Grega, A. Matiolanski, P. Guzik, and M. Leszczuk (2015) Automated detection of firearms and knives in a cctv image. Sensors 16. Cited by: §I, TABLE I, §II, §II.
  • [8] Gun violence in the united states. Note: https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States Cited by: §I.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. CVPR. Cited by: §IV-B.
  • [10] Internet movie firearms database. Note: http://www.imfdb.org/ Cited by: 1st item.
  • [11] G. J., Z. C., Á. J., M. L., and C. F. (2020) Real-time gun detection in cctv: an open problem. Neural Networks 132, pp. 297–308. Cited by: §I, TABLE I, §II.
  • [12] J. Lai and S. Maples Developing a real-time gun detection classifier. http://cs231n.stanford.edu/reports/2017/pdfs/716.pdf. Cited by: §I, TABLE I, §II.
  • [13] Y. Lee and J. Park (2020) CenterMask: real-time anchor-free instance segmentation. CVPR. Cited by: §IV-A, §IV-A.
  • [14] J. Lim, M. Jobayer, V. Baskaran, J. Lim, K. Wong, and J. See (2019) Gun detection in surveillance videos using deep neural networks. Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. Cited by: §I, §I, TABLE I, §II, 2nd item.
  • [15] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie (2017) Feature pyramid networks for object detection. CVPR. Cited by: §IV-A, TABLE IV.
  • [16] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick (2014) Microsoft coco: common objects in context. ECCV. Cited by: §III-A.
  • [17] S. Liu, D. Huang, and Y. Wang (2018) Receptive field block net for accurate and fast object detection. ECCV. Cited by: TABLE IV.
  • [18] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia (2018) Path aggregation network for instance segmentation. CVPR. Cited by: TABLE IV.
  • [19] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, and A. Berg (2016) SSD: single shot multibox detector. ECCV. Cited by: §IV-A.
  • [20] R. Olmos, S. Tabik, and F. Herrera (2018) Automatic handgun detection alarm in videos using deep learning. Neurocomputing 275, pp. 66–72. Cited by: §I, §I, TABLE I, §II, 2nd item.
  • [21] R. Olmos, S. Tabik, A. Lamas, F. Pérez-Hernández, and F. Herrera (2019) A binocular image fusion approach for minimizing false positives in handgun detection with deep learning information fusion. Information Fusion 49, pp. 271–280. Cited by: §I, TABLE I, §II.
  • [22] J. Redmon and A. Farhadi (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767. Cited by: §IV-A.
  • [23] S. Ren, K. He, R. Girshick, and J. Sun (2016) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. Cited by: §II, §IV-A.
  • [24] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese (2019) Generalized intersection over union: a metric and a loss for bounding box regression. Cited by: TABLE IV.
  • [25] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) MobileNetV2: inverted residuals and linear bottlenecks. CVPR. Cited by: §IV-A.
  • [26] W. Sultani, C. Chen, and M. Shah (2018) Real-world anomaly detection in surveillance videos. arXiv preprint arXiv:1801.04264. Cited by: 2nd item, §V.
  • [27] M. Tan, R. Pang, and Q. Le (2020) EfficientDet: scalable and efficient object detection. CVPR. Cited by: §IV-A.
  • [28] X. Zhou, D. Wang, and P. Krähenbühl (2019) Objects as points. arXiv preprint arXiv:1904.07850. Cited by: §IV-A, §IV-A.