Smart Inference for Multidigit Convolutional Neural Network based Barcode Decoding

04/14/2020 · by Thao Do, et al. · KAIST Department of Mathematical Sciences

Barcodes are ubiquitous and have been used in most critical daily activities for decades. However, most traditional decoders require a well-framed barcode captured under relatively standard conditions. While barcodes captured in wilder conditions, such as underexposed, occluded, blurry, wrinkled, and rotated ones, are common in reality, those traditional decoders struggle to recognize them. Several works have attempted to solve such challenging barcodes, but many limitations remain. This work aims to solve the decoding problem using deep convolutional neural networks, with the possibility of running on portable devices. First, we propose a special modification of the prediction phase of a trained model, named Smart Inference (SI), which leverages the checksum property of barcodes and test-time augmentation. SI considerably boosts accuracy and reduces false predictions for trained models. Second, we have created a large practical evaluation dataset of real captured 1D barcodes under various challenging conditions to test our methods rigorously; the dataset is publicly available for other researchers. The experimental results demonstrate the effectiveness of SI, with a best accuracy of 95.85%, outperforming many existing decoders on the evaluation set. Finally, we successfully minimized the best model by knowledge distillation into a shallow model that retains high accuracy (90.85%) at 34.2 ms per image on a real edge device.


I Introduction

Linear 1D barcodes appeared in the 1970s and have since become ubiquitous on almost all consumer products and in logistics due to their ease of identification. Some newer tagging technologies that emerged over the last decades (e.g. RFID, NFC) allow more information to be stored. However, none of them has fully replaced the barcode's role in industry because of its legacy and economy. The low cost of printing barcodes and the durability of the tag under minor damage keep it an industry standard (standardized by GS1) for the coming decades.

One essential property of a tagging technology is that it must be read quickly, robustly, and accurately by its readers. Barcode readers (or scanners) fall into 3 types: laser-based, LED-based, and camera-based. With the first 2 types, the laser/LED ray must be close to the barcode, no stripe may be obscured along the line of the ray, and the emitter suffers from overheating. Camera-based readers have several advantages over laser/LED-based solutions. The first is built on the fact that numerous smartphones with high-quality integrated cameras are already in use. With an Internet connection, useful mobile applications arise from online retrieval of product information: listing ingredients, alerting to allergies, tracking calorie intake, and comparing prices between sellers; for retailers, they reveal eye-catching products, gather consumer feedback, and so on (e.g. in [6]). Another advantage of the camera-based solution is the possibility of multiple and long-range recognition with the support of computer vision algorithms.

However, most current techniques (static image processing and pattern matching) used in camera-based readers have flaws that limit their usability. The main problem is the need for well-framed, flatbed-scanned-style input rather than normal captures. Wilder but commonly captured conditions, such as underexposed, occluded, blurry, curved, or non-horizontal barcodes (as in Figure 1), become unrecognizable. This requires user correction, which is unhandy and slows down the scanning process. Scanning a barcode involves 2 separate tasks: detecting (i.e. locating) the barcode region in the image and decoding the detected region into a barcode sequence. Recent works have shown that the first task is nearly solved, even under challenging conditions.

Fig. 1: Challenging conditions

On the other hand, the task of decoding those challenging barcodes still needs improvement, since existing works have many limitations. Traditional methods presented in [13, 18, 23] apply techniques like Hough transformations and scanline-based approaches with thresholds for binarization, based on certain assumptions about barcode characteristics that do not always hold. Many evaluated their tools on unpublished sets; some published their sets, but these are small and do not cover enough challenging conditions. Given those limitations and the successes of convolutional neural networks (CNNs) in many applications, [7] was the first proposed work using a CNN to decode these difficult codes. However, their work has some weak points that keep the performance well below the CNN's potential. Not only are their CNN feature extractors simple, but their input assumption is also oversimplified. They assume only horizontal barcodes as input; their test set is made of printed, rectangle-shaped, generated barcodes on plain paper, while real-life barcodes are printed in customized shapes (e.g. a Coca-Cola icon shape) on various materials, with many kinds of distortion, and are sometimes covered by film. They also did not consider running the task on an edge device, as their models are unoptimized.

Therefore, in this study, we propose a CNN-based method to solve the decoding task with the following contributions: (i) we propose Smart Inference, 3 algorithms leveraging the checksum property and test-time augmentation, built on top of trained deep CNN models, which considerably boost model accuracy and reduce false predictions; (ii) we built a challenging 2500-sample cropped EAN13 (UPC-A and ISBN13 are its subsets) barcode dataset from real captured images under various (including harsh) conditions on numerous products; this dataset is published so that other researchers can evaluate their models, encouraging more contributions to this task; (iii) lastly, we applied knowledge distillation to obtain a lightweight model from the best model, suitable for handheld devices; the experimental results confirm its feasibility with a good inference speed on a real edge board.

II Related work

Regarding the task of barcode locating, several methods with improving performance have been presented over the past decade. In 2011, Lin et al. [17] presented the first multiple and rotation-tolerant barcode recognition method. This work focused more on the detection problem, using several image processing schemes such as Gaussian smoothing filtering to segment out barcode regions, enhancing the stripes, rotating the regions to a horizontal angle, and feeding them into a decoder with voting. Although the method did well on lottery barcodes (printed on plain paper), it was still slow and did not achieve high accuracy on a dataset of merchandise products whose challenge level was unclear. Katona et al. [14] in 2012 proposed a method using morphological operations to segment out 1D and 2D barcodes under blurry, noisy, sheared, and variously rotated conditions with good performance. Soros et al. [22] continued dealing with blur in 2013, using the structure matrix and saturation from the HSV color system to detect blurry barcodes better, at the expense of speed. Recently, Creusot et al. [4] proposed a faster method for blurry barcodes based on the Line Segment Detector, after their previous work [3] using Maximal Stable Extremal Regions was shown to be sensitive to blur. In another direction, Hansen [9] first applied an object detection deep learning model (YOLO) to both 1D and 2D codes with the best bounding box detection rate.

Fig. 2: Overall Architecture

While the task of barcode detection has nearly reached saturation, works on decoding have been proposed only sparsely since the 1990s. Early works [13, 18] achieved their goal with techniques such as the Hough transformation and wavelet-based peak location on simple (scanned-style) inputs. Wachenfeld et al. [23] proposed a scanline-based approach accompanied by an EAN13 dataset (the so-called MuensterDB). However, since their method was based on the scanline approach, it only worked well on slightly rotated (±15 degrees) barcodes; strongly rotated or distorted ones were problematic. Similar to [14], Zamberletti et al. [25] also tackled out-of-focus (blurry) barcodes, using a multilayer perceptron to find the parameters of adaptive thresholding (instead of standard binarization) to restore a blurry image to a clearer one before feeding it into Zxing [19] for decoding. Nonetheless, this approach is simple, with low recall, and time-consuming (2 steps). Recently, Yang et al. [24] addressed both tasks on 5 rigorous datasets. Their work outperformed all other methods on EAN13 barcodes, but since it relies heavily on scanline-based, hand-crafted featuring and per-condition analysis, it is a less scalable solution for extending to all other 1D barcode types (even though all 1D codes use stripes, they may differ in guard bar layouts and in how black and white stripes mix), and the double-obscured condition (Figure 1) would be inapplicable. Lastly, Fridborn [7] in 2017 first leveraged the power of CNNs to directly extract features and predict 13 outputs (corresponding to 13 digits) simultaneously (similar to [8] for the Street View House Numbers problem). Compared with traditional and hand-crafted featuring methods, the CNN-based approach is more straightforward and data-driven rather than case-by-case. One obvious example is the double-obscured condition, which is problematic for scanline-based approaches but can easily be learnt and overcome by a CNN classifier. Thus, our work is also CNN-based but differs from [7] in the following points: (i) we use more advanced CNN feature-extracting models; (ii) our input assumption is more practical, and our training and evaluation sets cover more cases; (iii) we propose Smart Inference, exploiting the checksum attribute of barcode sequences to enhance model accuracy; (iv) we minimize the model and verify the feasibility of the CNN-based approach on a real edge device.

Regarding the test-time augmentation we use in this work to enhance the inference accuracy of a model, the technique is commonly used in deep learning, as surveyed in [21], and can be found in the AlexNet paper [15] and the ResNet paper [10]. It is one of the data augmentation techniques researchers use to deal with the problem of limited data. While train-time augmentation gives more variants of the dataset so that the model also learns all possible variants, test-time augmentation applies suitable modifications to the original samples, lets the model make multiple predictions on those modified versions, and picks the most suitable one by a voting or ensemble mechanism. How to augment data for better performance is also a trending topic in deep learning, with papers such as AutoAugment [5] and Smart Augmentation [16]. In our work, we integrate test-time augmentation into Smart Inference quite effectively.

On the topic of model compression and deep learning applicability on mobile, Cheng et al. [2] categorized methods into 4 types: parameter pruning and sharing, low-rank factorization, compact convolutional filters, and knowledge distillation. The first reduces redundant parameters that are not sensitive to performance, while the second uses matrix decomposition to estimate the informative parameters. The third builds special filters to save parameters in convolutional layers, whereas the last trains a compact neural network with knowledge distilled from a large model, the so-called teacher model. For simplicity, in this paper we use the original knowledge distillation (KD) method proposed by Hinton et al. [11].

III Methodology

Our approach basically trains a probabilistic model of decoded barcode sequences given barcode images. Let $S$ represent the output sequence and $X$ represent the input barcode image. Our goal is to learn a model of $P(S \mid X)$ by maximizing $\log P(S \mid X)$ on the training set. $S$ is modelled as a collection of random variables $S_1, \ldots, S_{13}$ representing the 13 digits of the decoded sequence. We assume that the identities of the separate digits are independent of each other, so that the probability of a sequence $s = s_1 s_2 \ldots s_{13}$ is given by $P(S = s \mid X) = \prod_{i=1}^{13} P(S_i = s_i \mid X)$. Each variable $S_i$ is discrete with 10 possible values (0 to 9), so each digit can be represented by a softmax classifier that receives as input the features extracted from $X$ by a CNN. This type of model is based on [8], so we call it a Multidigit CNN. We use a simple deep CNN (no residual/skip connections), ResNet, MobileNetV2 [20], and DenseNet [12] in our experiments. During the training phase, the loss is the sum of the cross-entropy losses of the individual digits, as usual. In the test (inference) phase, however, we propose a modification named Smart Inference. The overall model is shown in Figure 2.
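To make the model concrete, below is a minimal PyTorch sketch of a Multidigit CNN with a ResNet34 backbone and 13 independent softmax heads, together with the summed cross-entropy training loss; the backbone wiring and layer sizes are illustrative assumptions, not the exact configuration used in our experiments.

import torch
import torch.nn as nn
import torchvision.models as models

class MultidigitCNN(nn.Module):
    def __init__(self, num_digits=13, num_classes=10):
        super().__init__()
        backbone = models.resnet34(weights=None)  # trained from scratch, as in our setup
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # keep only the feature extractor
        self.backbone = backbone
        # One independent softmax classifier per digit position.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_digits)
        )

    def forward(self, x):                         # x: (batch, 3, H, W)
        feats = self.backbone(x)
        # Logits of shape (batch, 13, 10); softmax is applied per digit at inference.
        return torch.stack([head(feats) for head in self.heads], dim=1)

def multidigit_loss(logits, targets):             # targets: (batch, 13) integer digits
    # Training loss: sum of the per-digit cross-entropy losses.
    return sum(nn.functional.cross_entropy(logits[:, i, :], targets[:, i])
               for i in range(logits.size(1)))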

Input : Trained Multidigit CNN (MDCNN), Barcode Image (BI), Maximum Iteration (Max), Voting status (Voting)
Output : Barcode Digit Combination/s (BDC)

Compute logit[i][v] using MDCNN given BI, for digit i = 1..13 and value v = 0..9;
for i ← 1 to 13 do
       prob[i][·] ← softmax(logit[i][·]);
       sort prob[i][·] in descending order;
       best[i] ← value with the highest probability;
       gap[i] ← probability of best[i] − probability of the second-best value;
       append (gap[i], i) to gap_list and best[i] to base;
end for
Sort gap_list ascending by gap value;
candidate ← new combination with base;
append candidate to candidate_list;
initialize iter to zero;
for each gap in gap_list do
       increment iter by one;
       if iter is greater than Max then
              if Voting then
                     return BDC;
              else
                     return null;
              end if
       end if
       pos ← gap.position;
       new_list ← copies of candidate_list;
       for each candidate in new_list do
              modify candidate at position pos with the second-best value;
              append candidate to candidate_list;
       end for
       for each candidate in candidate_list do
              Compute checksum test for candidate;
              if status is satisfied then
                     if Voting then
                            append candidate to BDC;
                     else
                            return candidate;
                     end if
              end if
       end for
end for
return BDC
Algorithm 1 Modified Prediction Algorithm (MPA)
Input : Trained Multidigit CNN (MDCNN), Barcode Image (BI), Maximum Iteration (Max)
Output : Barcode Digit Combination (BDC)
image_list ← [BI];
for each θ in [90, 180, 270] degrees do
       append rotate(BI, θ) to image_list;
end for
for each img in image_list do
       BDC ← prediction using Algorithm 1 given MDCNN, img, Max, and Voting = False;
       if BDC is not null then
              return BDC;
       end if
end for
return null
Algorithm 2 MPA with Augmentation
Input : Trained Multidigit CNN (MDCNN), Barcode Image (BI), Maximum Iteration (Max)
Output : Barcode Digit Combination (BDC)
image_list ← [BI];
for each θ in [90, 180, 270] degrees do
       append rotate(BI, θ) to image_list;
end for
for each img in image_list do
       BDC ← prediction using Algorithm 1 given MDCNN, img, Max, and Voting = True;
       if BDC is not null then
              append BDC to BDC_list;
       end if
end for
if BDC_list is not empty then
       group combinations which are identical;
       select the combination with the highest count;
       return it;
else
       return null;
end if
Algorithm 3 MPA with Augmentation and Voting

III-A Smart Inference

Normally, after getting the logits from the model given a barcode image, we apply the softmax function to get the probabilities of each value (0 to 9) for all 13 digits and then pick the value with the highest probability for each digit. In this way, we obtain a 13-digit sequence from the values with the highest probabilities; however, the value with the highest probability is not always the correct one. Sometimes the correct value is actually the one with the second or third highest probability. In addition, most 1D barcodes satisfy a checksum characteristic [1]. Let $d$ denote the barcode sequence, $d_i$ the $i$-th digit of the sequence from left to right, and $n$ the length of the sequence (e.g. the length of EAN13 is 13, so the first digit is $d_1$); the checksum attribute can then be summarized as:

$$\sum_{i=1}^{n} 3^{(n-i) \bmod 2} \, d_i \equiv 0 \pmod{10} \qquad (1)$$
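For concreteness, a minimal Python sketch of this check: the weight $3^{(n-i) \bmod 2}$ simply assigns weight 1 to the rightmost (check) digit and alternates 3, 1 moving left. The example value is a standard valid EAN13 code.

def checksum_ok(digits):
    # digits: list of ints, left to right; implements Equation (1).
    n = len(digits)
    total = sum(d * 3 ** ((n - i) % 2) for i, d in enumerate(digits, start=1))
    return total % 10 == 0

assert checksum_ok([4, 0, 0, 6, 3, 8, 1, 3, 3, 3, 9, 3, 1])  # a valid EAN13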

Leveraging this characteristic, our initial idea is to build more than one predicted sequence, taking for each of the 13 digits not only the value with the highest probability but also the value with the 2nd (or 3rd) highest probability, and then verifying those combinations with Equation (1). Intuitively, if the gap between the value with the highest probability and the value with the 2nd (or 3rd) highest probability is bigger, the model is more confident in the top value, and vice versa. Therefore, priority goes to the digits with the smallest gaps, where the model is more confused and less certain about the top value alone. Let $k$ be the number of highest-probability values considered per digit, and let Maximum Iteration (Max) be the number of digits for which more than one value is considered (as in Algorithm 1). Because larger $k$ and Max create more combinations and slow down inference, in this work we pick $k = 2$ (i.e. we only consider the 2 values with the 2 highest probabilities) and conduct experiments with Max from 1 to 4 (for each unchosen digit, only the value with the highest probability is picked, so we have up to $2^{Max}$ combinations in total). Lastly, we sort the candidate combinations from larger to smaller probability and test them against Equation (1) one by one, stopping at the first satisfied combination for fast inference. This process is described in Algorithm 1.
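The following is a condensed Python sketch of Algorithm 1 with $k = 2$, following the description above and reusing checksum_ok from the previous sketch; the names and the exact candidate bookkeeping are ours, not a verbatim transcription of the pseudocode.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mpa(logits, max_iter=2, voting=False):
    # logits: (13, 10) array from the Multidigit CNN for one image.
    probs = softmax(logits)
    order = np.argsort(-probs, axis=1)        # values sorted by probability
    best, second = order[:, 0], order[:, 1]
    idx = np.arange(13)
    gaps = probs[idx, best] - probs[idx, second]
    uncertain = np.argsort(gaps)[:max_iter]   # least confident digits first

    candidates = [best.copy()]
    for pos in uncertain:                     # each position doubles the candidate set
        flipped = [c.copy() for c in candidates]
        for c in flipped:
            c[pos] = second[pos]
        candidates += flipped

    # Test candidates from most to least probable; stop at the first valid one.
    candidates.sort(key=lambda c: -np.prod(probs[idx, c]))
    satisfied = []
    for c in candidates:
        if checksum_ok(list(c)):
            if not voting:
                return tuple(c)
            satisfied.append(tuple(c))
    return satisfied if voting else None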

Algorithm 1 is enhanced by applying test-time augmentation in 2 ways: fast-track, as in Algorithm 2, and voting, as in Algorithm 3. For simplicity and fast inference, which is important in this application, we use only 3 rotation operations as augmentation for each input image. Algorithm 2 iterates through the original input and its 3 variants step by step, calling Algorithm 1 and stopping as soon as Algorithm 1 finds the first checksum-satisfying combination; otherwise, no decoded sequence is returned. On the other hand, Algorithm 3 collects all satisfied combinations from all iterations (original input and variants) and picks the most frequent combination.
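Sketches of the two wrappers, Algorithm 2 (fast-track) and Algorithm 3 (voting), are given below; model_logits stands in for a forward pass of the trained model and is an assumed helper, not part of the paper's interface.

from collections import Counter

def rotations(image):
    # Original image plus its 90-, 180- and 270-degree rotations.
    return [np.rot90(image, k) for k in range(4)]

def mpa_fast_track(model_logits, image, max_iter=2):
    for variant in rotations(image):
        result = mpa(model_logits(variant), max_iter, voting=False)
        if result is not None:
            return result                 # stop at the first checksum-valid decode
    return None

def mpa_voting(model_logits, image, max_iter=2):
    collected = []
    for variant in rotations(image):
        collected += mpa(model_logits(variant), max_iter, voting=True)
    if not collected:
        return None
    # Group identical combinations and return the most frequent one.
    return Counter(collected).most_common(1)[0][0]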

One thing to emphasize about this idea, compared to [24], is that our proposed techniques are more scalable to other 1D barcode types (the EAN, UPC, and ITF barcode families) with few changes. The technique is even applicable to multiple barcode types in one model: we just need to add a few more output nodes, some to categorize the barcode type and some to pad up to the length of the longest barcode type (each digit then having 11 values: 0-9 and NA), and Equation (1) still applies to all other EAN and UPC codes, as illustrated below.
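Because the weights in the checksum_ok sketch above are anchored at the right end, the same function already covers other fixed-length GS1 codes without modification; the example values below are standard valid codes.

assert checksum_ok([0, 3, 6, 0, 0, 0, 2, 9, 1, 4, 5, 2])  # a valid UPC-A
assert checksum_ok([9, 6, 3, 8, 5, 0, 7, 4])              # a valid EAN-8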

III-B Minimizing the Deep Model

In an effort to minimize the deep models into something more suitable for edge devices, we use the original knowledge distillation technique of [11] to distill knowledge from the best (deep) model into small, shallow models by replacing the original loss function with the combined loss:

$$\mathcal{L} = \alpha \, \mathcal{L}_{CE} + (1 - \alpha) \, \mathcal{L}_{KL}$$

where $\mathcal{L}_{CE}$ is the cross-entropy loss from the hard labels, $\mathcal{L}_{KL}$ is the Kullback–Leibler divergence loss from the teacher labels (soft labels), and $\alpha$ is a hyperparameter. A visual diagram of this process is shown in Figure 3. The detailed KL loss is presented in [11] and has a temperature hyperparameter $T$.
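A PyTorch sketch of this combined loss for the Multidigit CNN, assuming the usual temperature-scaled form of the KL term from [11]; the alpha and T values shown are illustrative defaults, not our tuned settings.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, T=4.0):
    # student/teacher logits: (batch, 13, 10); targets: (batch, 13) hard labels.
    num_digits = student_logits.size(1)
    ce = sum(F.cross_entropy(student_logits[:, i, :], targets[:, i])
             for i in range(num_digits))
    kl = sum(F.kl_div(F.log_softmax(student_logits[:, i, :] / T, dim=-1),
                      F.softmax(teacher_logits[:, i, :] / T, dim=-1),
                      reduction="batchmean") * (T * T)   # T^2 keeps gradient scale, per [11]
             for i in range(num_digits))
    return alpha * ce + (1 - alpha) * kl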

Fig. 3: Knowledge distillation process

IV Experiments

IV-A Datasets

Our real collected set comprises 1055 samples from the extended MuensterDB, 408 samples from [25], and 1037 self-collected samples. In total we have 2500 samples after drawing bounding boxes and labeling the decoded sequences ([23] and [25] had not finished both tasks in their datasets). Our self-collected samples were captured in 5 supermarkets (1 in France, 2 in South Korea, 2 in Vietnam), both indoors and outdoors, over 2 weeks. They cover a wide range of products (food and edible product packages, books, kitchenware, office stationery, clothing tags) on various materials such as metal cans, wine bottles, plastic food bags, and cardboard boxes, under various light sources (fluorescent light, incandescent bulbs, morning and afternoon sunlight) and conditions (auto-focus off, hand shaking, long distance, obscured by fingers, wrinkled, distorted, cornered); there are also 195 barcodes printed on plain paper under occluded and wrinkled conditions.

Condition(s) Number of samples
norm 30000
dark 30000
occluded 20000
occluded+dark 20000
rotated & perspective transformed (RPT) 20000
RPT + dark 20000
cylindered & curvy warped (CCW) 20000
CCW + dark 20000
occluded + RPT 5000
blur 5000
RPT + blur 5000
CCW + blur 5000
upside down 6000
upside down + dark 6000
upside down + blur 6000
upside down + CCW 6000
upside down + occluded 6000
heavy noise + rotated 2000
overexposed + occluded + RPT + CCW 6000
dark + occluded + RPT + CCW 6000
occluded + RPT + CCW 6000
TABLE I: Synthesized conditions

Our training set consists of 250000 synthesized samples (without the decoded text under the stripes) with the conditions described in Table I (note that 40000 samples have randomly added noise), plus 20000 samples augmented from 500 real samples chosen randomly from the real collected set. Some synthesized samples are shown in Figure 4, and a sketch of the kind of generation involved is given below. The remaining 2000 samples of the real collected set are used as the test set.
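As an illustration of the synthesis (not our exact generation code), an OpenCV sketch of two conditions from Table I, darkening and rotation with perspective transform; the parameter ranges are our assumptions.

import cv2
import numpy as np

def darken(img, factor=0.35):
    # "dark" condition: scale intensities down.
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def rotate_perspective(img, angle=25, jitter=0.08):
    # "RPT" condition: rotation followed by a random perspective warp.
    h, w = img.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(img, M, (w, h), borderValue=(255, 255, 255))
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    dst = (src + np.random.uniform(-jitter, jitter, (4, 2)) * [w, h]).astype(np.float32)
    P = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(out, P, (w, h), borderValue=(255, 255, 255))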

Fig. 4: Some Synthesized Samples

IV-B Experimental Setups

To demonstrate the performance improvement of our proposed model, we ran test experiments on both enterprise solutions and deep learning base models. Zxing [19] is an open-source tool used by most developers, while the Google Barcode API is a commercialized version of Zxing. On the other hand, Cognex and Dynamsoft are two large corporations with long histories of developing machine vision products for industrial use. For a fair evaluation of these 4 tools, we applied test-time augmentation in the same way as for our deep learning models in Algorithm 2 and set them to work specifically on EAN13. Note that we used the latest versions of the Zxing source and the Google API; the Dynamsoft and Cognex demo web-based APIs were used for evaluation, so we deducted the round-trip message duration when measuring inference time. We considered comparing our results with the methods of [23, 17, 25, 24], but since we could not obtain any source or binary, we settled for the listed public tools. We also could not compare with the result claimed in [23], since the MuensterDB we downloaded is modified (it actually unzips to 1055 samples instead of the 1000 samples mentioned in [23]).

Regarding the deep neural network models, the Fridborn-similar model is similar to the one described in [7]; since their input was 196x100x1 while ours is 285x285x3 (so their 4 convolutional blocks would exceed our GPU resources), we had to use 32 kernels instead of 256 in the last convolutional layer and 2048 nodes instead of 4096 in each of the 2 top FC layers. Next, we modelled a Non-residual model with 8 convolutional blocks and 2 FC layers, with many fewer parameters than the Fridborn-similar one. The other models, using SOTA CNN feature extractors such as ResNet50, ResNet34, MobileNetV2, and DenseNet169, keep the original feature extractor parts, placed directly before the 13 output nodes as in Figure 2. Various batch sizes were tried; in our empirical observation, 32 was the best. All models were trained from scratch without pretrained knowledge. Note that we had to train the models on the synthesized set alone until the loss fell to around 1 (i.e. the models converged to a certain level) before training on the full training set (including the 20000 real-collected augmented samples), because training directly on the full training set results in very high loss (even NaN). Training was done on an NVIDIA Titan RTX with 24 GB of VRAM. Our CPU evaluation experiments were conducted on a desktop with an Intel Core i9 9900KF processor and 32 GB of RAM, while the low-computational experiments were run on an NVIDIA Jetson Nano board with CUDA enabled, using NVIDIA TensorRT models (converted from PyTorch, as sketched below).
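A hedged sketch of the PyTorch-to-TensorRT path for the Jetson Nano: export the model to ONNX, then build an engine with the standard trtexec tool; the file names and export settings here are our assumptions, not our exact conversion script.

import torch

model = MultidigitCNN().eval()                  # the sketch from Section III
dummy = torch.randn(1, 3, 285, 285)             # input resolution from this section
torch.onnx.export(model, dummy, "mdcnn.onnx", opset_version=11,
                  input_names=["image"], output_names=["logits"])
# Then, on the Jetson board (shell):
#   trtexec --onnx=mdcnn.onnx --saveEngine=mdcnn.trt --fp16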

IV-C Evaluation

We have 2 metrics to clarify here: accuracy and errors. A tool has 3 possible outcomes for a barcode image: a correct decoded sequence (i.e. matching the ground truth), an incorrect decoded sequence, or no barcode found (no checksum-satisfied sequence, for our proposed models). Accuracy in this work is the number of correctly decoded sequences divided by the total number of test samples, while the number of errors is the number of incorrectly decoded sequences. A good model therefore achieves higher accuracy and fewer errors. Another figure reported in this section is the inference time, averaged per image, since each image takes a different amount of time to decode.
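In code, the two metrics amount to the following (a sketch; a prediction of None stands for the no-result outcome):

def evaluate(predictions, ground_truths):
    decoded = [(p, g) for p, g in zip(predictions, ground_truths) if p is not None]
    correct = sum(p == g for p, g in decoded)
    errors = sum(p != g for p, g in decoded)        # incorrect decoded sequences
    accuracy = correct / len(ground_truths)         # no-result counts against accuracy
    return accuracy, errors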

Model Accuracy CPU (ms) # of params (M)
Zxing 58.25% 7.65 NA
Dynamsoft 93.10% 978.8 NA
Google API 82.45% 211.9 NA
Cognex 84.60% 111.9 NA
ResNet50 93.35% 66.5 99.5
MobileNetV2 72.25% 32.4 15.7
MobileNetV2_kd 83.45% 32.4 15.7
DenseNet169 84.90% 75.65 30
ResNet34 88.70% 38.3 40.7
ResNet34_kd 89.20% 38.3 40.7
Fridborn-similar 31.85% 104.9 403.3
Non-residual 80.80% 103.2 78.5
TABLE II: Tool & models without MPA performances
Model nonMPA max=1 max=2 max=3 max=4

Accuracy

ResNet50 0.9335 0.942 0.9435 0.9445 0.9445
MobileNetV2 0.8345 0.8595 0.869 0.868 0.8645
DenseNet169 0.849 0.8645 0.876 0.8775 0.874
ResNet34 0.892 0.9075 0.911 0.912 0.911
Non-residual 0.808 0.8295 0.844 0.8455 0.841

# of errors

ResNet50 133 31 48 77 106
MobileNetV2 331 59 106 164 241
DenseNet169 302 54 105 176 230
ResNet34 216 41 73 116 157
Non-residual 384 55 104 197 285
TABLE III: Using MPA performances
Model nonMPA max=1 max=2 max=3 max=4

Accuracy

ResNet50 0.9335 0.958 0.956 0.951 0.946
MobileNetV2 0.8345 0.906 0.8975 0.8855 0.8705
DenseNet169 0.849 0.9155 0.9075 0.8915 0.877
ResNet34 0.892 0.9375 0.9315 0.9235 0.916
Non-residual 0.808 0.89 0.879 0.861 0.8445

# of errors

ResNet50 133 56 73 95 108
MobileNetV2 331 121 182 225 259
DenseNet169 302 113 172 216 246
ResNet34 216 95 129 152 168
Non-residual 384 145 212 274 311
TABLE IV: Using MPA & Augmentation performances
Model nonMPA max=1 max=2 max=3 max=4

Accuracy

ResNet50 0.9335 0.9585 0.9585 0.9525 0.95
MobileNetV2 0.8345 0.9085 0.906 0.899 0.8945
DenseNet169 0.849 0.9125 0.8985 0.8785 0.8545
ResNet34 0.892 0.933 0.93 0.9215 0.911
Non-residual 0.808 0.8855 0.8735 0.853 0.8355

# of errors

ResNet50 133 55 68 92 100
MobileNetV2 331 116 165 198 211
DenseNet169 302 119 190 242 291
ResNet34 216 104 132 156 178
Non-residual 384 154 223 290 329
TABLE V: Using MPA & Augmentation with Voting performances

Regarding inference time, we should note that a perfectly fair evaluation is difficult, since Dynamsoft, the Google API, and Cognex were tested via APIs that actually run on their own servers, which do not match our desktop configuration. Since some of the tools do not use machine learning (except perhaps Dynamsoft and Cognex, which use DNNs in many of their other products, so theirs might be DNN models as well), their processing time is relatively shorter than that of deep learning based techniques, but with lower prediction accuracy. The basic evaluation is presented in Table II. Our best basic model outperforms the other tools with reasonable prediction time and an accuracy above 0.93.

To show the performance gained by applying Smart Inference at test time, we performed three different experiments. The first used Algorithm 1. As depicted in Table III, the results show improved performance compared to the models without it. They also show that performance improves as the number of considered gaps increases up to some level, after which it degrades. The second experiment demonstrates how fast-track augmentation (Algorithm 2) can improve MPA performance. It clearly shows an improvement over Algorithm 1, as depicted in Table IV, and significant improvements over the basic approach. This time, however, the best results are already reached after considering just the one pair with the smallest gap. As with MPA without augmentation (Algorithm 1), the number of errors in Algorithm 2 increases as the number of considered gaps increases. Nevertheless, Algorithm 1 still shows a smaller number of errors than Algorithm 2. The third experiment corresponds to Algorithm 3, which comes with more cost, as shown in Table V. It sometimes slightly outperforms the experiments based on Algorithm 2, with similar behaviour when the parameter Max changes. This suggests that the voting scenario is not always a good choice for our models; the models might already be relatively robust to the original input image and only need help after failing in the first place.

As mentioned in the last section, to demonstrate the feasibility of the solution on portable devices, we distilled knowledge from the best model (ResNet50) into 2 small candidate models: ResNet34 and MobileNetV2. The results in Table II clearly show that knowledge distillation helps gain higher performance compared to training with the normal loss function. MobileNetV2 jumps considerably, from 72.25% to 83.45%, while the improvement for ResNet34 is small. This could be because ResNet34 is still deep (40.7 million parameters, compared to 15.7 million for MobileNetV2), so its own learning ability is robust enough to reach high performance without guidance from the teacher model. Finally, our experimental tests on the NVIDIA Jetson Nano board show that MobileNetV2 and ResNet34 achieve average speeds of 34.2 and 45.6 milliseconds per image, respectively. This speed corresponds to a smooth frames-per-second experience, which, combined with the robustness of the model, we expect to be comfortable for users.

V Conclusion

In this work, we have proposed Smart Inference for Multidigit CNN based models to improve the performance of 1D barcode decoding. We collected multiple real labeled barcodes to train and test the proposed models and added well-synthesized data to strengthen training and testing. The algorithms applied at test time boost performance over the base models, not only outperforming them in accuracy but also keeping inference time small, which makes the approach efficient. The Multidigit CNN based approach with Smart Inference is also scalable, as it can extend to decode more than one barcode type. We have also shown that the distillation technique effectively transfers knowledge from the best model to a shallower model that runs on a low-computational edge device, performing clearly better than training with the normal loss function.

Even though the performance is better in terms of accuracy, the proposed model still has the limitation of making false predictions (Dynamsoft also produces 3 such errors). Another limitation is that the approach is not applicable to non-fixed-length barcode types such as Code39. In the future, the problem of false predictions could be mitigated by applying product recognition techniques.

References

  • [1] Anonymous (2014-12) How to calculate a check digit manually - services. External Links: Link Cited by: §III-A.
  • [2] Y. Cheng, D. Wang, P. Zhou, and T. Zhang (2017) A survey of model compression and acceleration for deep neural networks. arXiv preprint arXiv:1710.09282. Cited by: §II.
  • [3] C. Creusot and A. Munawar (2015) Real-time barcode detection in the wild. In 2015 IEEE winter conference on applications of computer vision, pp. 239–245. Cited by: §II.
  • [4] C. Creusot and A. Munawar (2016) Low-computation egocentric barcode detector for the blind. In 2016 IEEE International Conference on Image Processing (ICIP), pp. 2856–2860. Cited by: §II.
  • [5] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le (2019) Autoaugment: learning augmentation strategies from data. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 113–123. Cited by: §II.
  • [6] W. P. Fernandez, Y. Xian, and Y. Tian (2017) Image-based barcode detection and recognition to assist visually impaired persons. In 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pp. 1241–1245. Cited by: §I.
  • [7] F. Fridborn (2017) Reading barcodes with neural networks. External Links: Link Cited by: §I, §II, §IV-B.
  • [8] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet (2013) Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082. Cited by: §II, §III.
  • [9] D. K. Hansen, K. Nasrollahi, C. B. Rasmussen, and T. B. Moeslund (2017) Real-time barcode detection and classification using deep learning.. In IJCCI, pp. 321–327. Cited by: §II.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §II.
  • [11] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §II, §III-B, §III-B.
  • [12] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §III.
  • [13] E. Joseph and T. Pavlidis (1994) Bar code waveform recognition using peak locations. IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (6), pp. 630–640. Cited by: §I, §II.
  • [14] M. Katona and L. G. Nyúl (2012) A novel method for accurate and efficient barcode detection with morphological operations. In 2012 Eighth International Conference on Signal Image Technology and Internet Based Systems, pp. 307–314. Cited by: §II, §II.
  • [15] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §II.
  • [16] J. Lemley, S. Bazrafkan, and P. Corcoran (2017) Smart augmentation learning an optimal data augmentation strategy. IEEE Access 5, pp. 5858–5869. Cited by: §II.
  • [17] D. Lin, M. Lin, and K. Huang (2011) Real-time automatic recognition of omnidirectional multiple barcodes and dsp implementation. Machine Vision and Applications 22 (2), pp. 409–419. Cited by: §II, §IV-B.
  • [18] R. Muniz, L. Junco, and A. Otero (1999) A robust software barcode reader using the hough transform. In Proceedings 1999 International Conference on Information Intelligence and Systems (Cat. No. PR00446), pp. 313–319. Cited by: §I, §II.
  • [19] S. Owen et al. (2013) Zxing. Zebra Crossing. Cited by: §II, §IV-B.
  • [20] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520. Cited by: §III.
  • [21] C. Shorten and T. M. Khoshgoftaar (2019) A survey on image data augmentation for deep learning. Journal of Big Data 6 (1), pp. 60. Cited by: §II.
  • [22] G. Sörös and C. Flörkemeier (2013) Blur-resistant joint 1d and 2d barcode localization for smartphones. In Proceedings of the 12th International Conference on Mobile and Ubiquitous Multimedia, pp. 1–8. Cited by: §II.
  • [23] S. Wachenfeld, S. Terlunen, and X. Jiang (2008) Robust recognition of 1-d barcodes using camera phones. In 2008 19th International Conference on Pattern Recognition, pp. 1–4. Cited by: §I, §II, §IV-A, §IV-B.
  • [24] H. Yang, L. Chen, Y. Chen, Y. Lee, and Z. Yin (2016) Automatic barcode recognition method based on adaptive edge detection and a mapping model. Journal of Electronic Imaging 25 (5), pp. 053019. Cited by: §II, §III-A, §IV-B.
  • [25] A. Zamberletti, I. Gallo, M. Carullo, and E. Binaghi (2010) Decoding 1-d barcode from degraded images using a neural network. In International Conference on Computer Vision, Imaging and Computer Graphics, pp. 45–55. Cited by: §II, §IV-A, §IV-B.