Attacking Optical Character Recognition (OCR) Systems with Adversarial Watermarks

02/08/2020 ∙ by Lu Chen, et al. ∙ Tsinghua University

Optical character recognition (OCR) is widely used in real applications as a key preprocessing tool. The adoption of deep neural networks (DNNs) in OCR makes it vulnerable to adversarial examples, which are crafted to mislead the output of the threat model. Unlike ordinary color images, images of printed text usually have clean backgrounds. However, the adversarial examples generated by most existing attacks are unnatural and pollute the background severely. To address this issue, we propose a watermark attack method that produces natural distortion disguised as watermarks and evades detection by human eyes. Experimental results show that watermark attacks yield natural adversarial examples with watermarks attached and attain attack performance similar to state-of-the-art methods in different attack scenarios.





Optical Character Recognition (OCR) is a widely adopted technology for converting images of printed or handwritten text into machine-readable text, and it has become a critical preprocessing component in text analysis pipelines such as document retrieval and summarization. OCR has improved significantly in recent years thanks to the wide adoption of deep neural networks (DNNs), and it is now deployed in many critical applications where OCR quality is vital. For example, photo-based ID recognition depends on OCR quality to automatically structure information into databases, and automatic trading sometimes relies on OCR to read news articles and determine their sentiment.

Unfortunately, OCR also inherits the counter-intuitive security problems of DNNs. In particular, OCR models are vulnerable to adversarial examples, which are crafted by applying human-imperceptible perturbations to original images with the intent of misleading the model. The wide adoption of OCR in real pipelines gives adversaries more incentive to game it, for example to cause fake ID information or incorrect readings of metrics or instructions. Figures 2 and 3 in the evaluation section illustrate two real-world examples that attack an ID number and the numbers in a financial report. This paper provides a preliminary study on the possibility of OCR attacks.

Many prior works [Nguyen, Yosinski, and Clune2015, Goodfellow, Shlens, and Szegedy2014, Papernot et al.2016a, Szegedy et al.2013] have shown that changing the prediction of DNNs is practicable by applying carefully-designed perturbations (usually background noise) to original images, in traditional image classification tasks. Recent projects, Adversarial Patch [Brown et al.2017] and LaVAN [Karmon, Zoran, and Goldberg2018], introduced the adversarial patch attack, which puts visible perturbations confined to a small region or location in the image.

However, these methods are not directly applicable to OCR attacks for the following three reasons:

First, the input image to OCR is on a white paper with a spotless background. Thus any perturbation added by existing attacks will appear so obvious to human readers that it will cause suspicion.

Second, in complex languages like Chinese, there are many characters (e.g., the dataset we use contains 5,989 unique characters). If an adversary wants to perform a targeted attack, i.e., changing one character to another specific one (target) in a sentence and meanwhile resulting in semantically meaningful recognition results, it requires a large number of perturbations that are too obvious to hide.

Third, instead of classifying characters individually, the modern OCR model is an end-to-end neural network that takes a variable-sized image as input and outputs a sequence of labels. In other words, it is fed images line by line. This is usually called the sequential labeling task, which is relatively harder to attack than the image classification task. It is insufficient to add perturbations to a single character; instead, the perturbations are required to span multiple characters. Also, as the OCR model is end-to-end, the internal feature representations rely on nearby characters (contexts). Thus the perturbations for attacking a single character must be designed with its context in mind.

In this preliminary study, we propose a new attack method, the WATERMARK attack, against modern OCR models. Watermarks are images or patterns commonly used as document backgrounds to identify things, such as marking a document as proprietary, secret, or urgent, or simply as decoration. Similarly, in Asian countries, documents often contain stamps to verify their authority. Human eyes are so used to these watermarks that they ignore them. In this paper, we generate natural watermark-style perturbations. That is, we limit all perturbations to a small region of the image, i.e., a watermark, and minimize the perturbation level within that bound. In comparison, classic adversarial examples spread noise all over the image. Our approach is similar to the patch-based attacks [Brown et al.2017, Karmon, Zoran, and Goldberg2018]. Unlike patches, which completely cover part of the image, watermarks do not hinder the text's readability and thus look more natural. [Heng, Zhou, and Jiang2018] disguised perturbations as shadows, exposure problems, or color problems, and [Hanwei Zhang2019] generated smooth noise by Laplacian smoothing. However, none of these solves the clear-background challenge for OCR.

We focus on white-box, targeted attacks in this paper. That is, we assume adversaries have perfect knowledge of the DNN architecture and parameters (white-box model) and aim to generate specific recognition results (targeted attack). Given that many real OCR software products are based on similar open-source OCR DNN models, we believe the white-box model, in addition to being a starting point, also has real-world applicability.

As a consequence, the WATERMARK attack is an adversarial attack on the OCR model. The WATERMARK attack attaches natural watermark-style noise, tricks the OCR model into outputting specific recognition results, and preserves the readability of adversarial images at the same time. To some extent, the WATERMARK attack solves the clear background problem.

As an evaluation, we performed the WATERMARK attack on a state-of-the-art open-source OCR model for the Chinese language that uses the DenseNet + CTC neural network architecture. We used a dataset with 3.64 million images and 5,989 unique characters. With 158 original-target pairs, we show that the WATERMARK attack can generate adversarial samples that look natural to human eyes with a high probability of success. Some WATERMARK adversarial examples even work on Tesseract OCR [Google2019] in a black-box manner.

Furthermore, we applied our method to real-world scenarios. In Figure 3, we applied the WATERMARK method to a page of the annual report of a listed Chinese company and changed its semantics, while the image still looks natural to human readers.

The contributions of this paper include: 1) We propose the WATERMARK adversarial attack to generate natural-to-human watermark-style perturbations, targeting DNN-based OCR systems. We also demonstrate a method to hide perturbations in a watermark-bounded region that human eyes are accustomed to. 2) Using difficult OCR cases (Chinese), we demonstrate the success rate of WATERMARK attacks compared to existing ones.

Background and Related Work

Optical Character Recognition (OCR)

Generally, the OCR pipeline, as shown in Figure 1, begins with line segmentation, which includes page layout analysis for locating the position of each line, de-skewing the image, and segmenting the input image into line images. After preprocessing such as rescaling and normalizing, the line images are fed into the recognition model, which outputs the recognition results.

There are two types of OCR models. 1) Character-based models are the classic approach [Smith2007]. Such a recognition model segments the image into per-character sub-images and classifies each sub-image into the most likely recognition result. Its performance therefore relies heavily on the character segmentation. 2) End-to-end models are a segmentation-free approach that recognizes entire sequences of characters in a variable-sized image. [Bengio et al.1995, Espana-Boquera et al.2011] adopted sequential models, and [Breuel et al.2013, Wang et al.2012] use DNNs as the feature extractor for end-to-end sequential recognition. [Graves et al.2006] introduced connectionist temporal classification (CTC), a segmentation-free approach for sequential DNN models that allows variable-sized input images and output results.

In end-to-end models, sequence labeling is a task that assigns a sequence of discrete labels to variable-sized sequential input data. In our case, the input $x$ is a variable-size image and the output is a sequence of characters $t = [t_1, t_2, \dots, t_N]$, with each $t_i$ drawn from a predefined character set $\Gamma$.

Connectionist Temporal Classification (CTC).

CTC is an alignment-free method for training DNNs on the sequential labeling task. It provides a loss that lets the network learn to recognize sequences without explicit segmentation. Therefore, many state-of-the-art OCR models use CTC as the model's loss function. Given the input image $x$, let $y = f(x) = [y_1, \dots, y_M]$ be the sequence of model $f$'s outputs, where $M \ge N$ and $y_i$ is the probability distribution over the character set $\Gamma$ (extended with a blank symbol) for the label observed at step $i$.

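CTC requires calculating the likelihood $p(t \mid x)$, which can hardly be measured directly from the model's probability distribution $f(x)$ and the target sequence $t$. To settle this, CTC uses valid alignments $a = [a_1, \dots, a_M]$ of $t$, from which the target sequence can be obtained by first merging sequential duplicate characters and then removing all blanks (e.g. both [a, –, a, b, –] and [–, a, a, –, –, a, b, b] are valid alignments of [a, a, b]). The likelihood $p(t \mid x)$ sums the probabilities of all possible valid alignments, denoted $\mathcal{A}(t)$:

$$p(t \mid x) = \sum_{a \in \mathcal{A}(t)} \prod_{i=1}^{M} y_{i, a_i},$$

where $y_{i, a_i}$ is the probability that $y_i$ assigns to label $a_i$.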


The negative log-likelihood of $p(t \mid x)$ is the CTC loss function $\mathcal{L}_{CTC}(f(x), t) = -\log p(t \mid x)$.
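
The alignment rule and the likelihood sum can be checked on a tiny case. The sketch below is a brute-force illustration only: it enumerates every alignment, which is exponential in the number of steps, whereas practical CTC implementations use a dynamic-programming forward pass. The names `collapse`, `ctc_likelihood`, `ctc_loss`, and `toy_probs`, and the use of "-" as the blank symbol, are assumptions made for this example.

```python
import itertools
import math

BLANK = "-"  # stand-in for the CTC blank symbol (assumption for this sketch)

def collapse(alignment):
    """Merge sequential duplicates first, then drop blanks."""
    merged = [c for i, c in enumerate(alignment) if i == 0 or c != alignment[i - 1]]
    return [c for c in merged if c != BLANK]

def ctc_likelihood(probs, target):
    """Sum the probability of every alignment that collapses to `target`.

    probs: one dict per time step, mapping each label to its probability.
    """
    labels = list(probs[0])
    total = 0.0
    for alignment in itertools.product(labels, repeat=len(probs)):
        if collapse(alignment) == list(target):
            total += math.prod(step[c] for step, c in zip(probs, alignment))
    return total

def ctc_loss(probs, target):
    """Negative log-likelihood of the target sequence."""
    return -math.log(ctc_likelihood(probs, target))

# Two time steps, one real character "a" plus the blank.
toy_probs = [{"a": 0.4, BLANK: 0.6}, {"a": 0.4, BLANK: 0.6}]
```

With `toy_probs`, the alignments [a, -], [-, a] and [a, a] all collapse to "a", so `ctc_likelihood(toy_probs, "a")` is 0.24 + 0.24 + 0.16 = 0.64.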


To obtain the most probable output sequence, a greedy path decoding algorithm selects the most probable label at each step and collapses the resulting alignment. However, the greedy algorithm is not guaranteed to find the most probable labeling. A better method, beam search decoding, keeps a fixed number of the most probable alignments at each step and chooses the most probable output from this top-alignment list.
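
A two-step toy case shows why greedy decoding can miss the most probable labeling. The sketch below is a simplified, alignment-level beam search written for illustration: it keeps the top alignments and merges them by collapsed labeling only at the end, whereas production CTC decoders typically use prefix beam search. All names (`greedy_decode`, `beam_decode`, `probs`) and the "-" blank symbol are assumptions.

```python
from collections import defaultdict

BLANK = "-"  # stand-in for the CTC blank symbol (assumption for this sketch)

def collapse(alignment):
    """Merge sequential duplicates, then drop blanks; returns a string."""
    merged = [c for i, c in enumerate(alignment) if i == 0 or c != alignment[i - 1]]
    return "".join(c for c in merged if c != BLANK)

def greedy_decode(probs):
    """Pick the single most probable label at each step, then collapse."""
    best_path = [max(step, key=step.get) for step in probs]
    return collapse(best_path)

def beam_decode(probs, beam_width=3):
    """Keep the `beam_width` most probable alignments at each step."""
    beams = [((), 1.0)]
    for step in probs:
        candidates = [(path + (c,), p * q) for path, p in beams for c, q in step.items()]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    # Merge surviving alignments that collapse to the same labeling.
    scores = defaultdict(float)
    for path, p in beams:
        scores[collapse(path)] += p
    return max(scores, key=scores.get)

probs = [{"a": 0.4, BLANK: 0.6}, {"a": 0.4, BLANK: 0.6}]
```

Here the greedy path is [-, -] with probability 0.36, which collapses to the empty string, while the beam keeps [-, a] and [a, -] as well, so the labeling "a" accumulates 0.48 and wins.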

Attacking DNN-based computer vision tasks

Where to add perturbations?

Attacking DNN models is a popular topic in both computer vision and security fields. Many projects focus on finding small $L_p$-bounded perturbations, hoping that the bound will keep the perturbations visually imperceptible. FGSM [Goodfellow, Shlens, and Szegedy2014], L-BFGS [Szegedy et al.2013], DeepFool [Moosavi-Dezfooli, Fawzi, and Frossard2016], one variant of the Carlini and Wagner attack [Carlini and Wagner2017], PGD [Madry et al.2017] and EAD [Chen et al.2018] all perform modifications at the pixel level by a small amount bounded by $\epsilon$.

Other attacks, such as JSMA [Papernot et al.2016a], another variant of the Carlini and Wagner attack [Carlini and Wagner2017], Adversarial Patch [Brown et al.2017] and LaVAN [Karmon, Zoran, and Goldberg2018], perturb a small region of pixels in an image, but the pixel-level perturbations are not bounded by $\epsilon$.

As we have mentioned, neither approach can hide perturbations from the normal human vision in OCR tasks, as a document with enough readability usually has a spotless background and vivid text, which is greatly different from natural RGB photos.

How to generate perturbations?

There are two types of methods to generate perturbations.

1) Gradient-based attacks add perturbations generated from the gradient with respect to the input pixels. Formally, the general problem is: for an $L_\infty$-bounded adversary, we compute an adversarial example $x'$ given an original image $x$ and target labels $t'$, where the perturbation bound $\|x' - x\|_\infty \le \epsilon$ is tiny enough to be indistinguishable to human observers.

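Fast Gradient Sign Method (FGSM) [Goodfellow, Shlens, and Szegedy2014] is a one-step attack that obtains the adversarial image as $x' = x - \epsilon \cdot \mathrm{sign}(\nabla_x \mathcal{L}(x, t'))$, where $\mathcal{L}(x, t')$ is the loss with respect to the target label. The original image takes a gradient sign step with step size $\epsilon$ in the direction that decreases this loss and thus increases the probability of the target label $t'$. It is efficient, but it only provides a coarse approximation of the optimal perturbations.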

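Basic Iterative Method (BIM) [Kurakin, Goodfellow, and Bengio2016] takes multiple smaller steps of size $\alpha$, and the resulting image is clipped by the same bound $\epsilon$: $x'_{i+1} = \mathrm{clip}_{x,\epsilon}\big(x'_i - \alpha \cdot \mathrm{sign}(\nabla_x \mathcal{L}(x'_i, t'))\big)$, where $x'_i$ is the adversarial example yielded at step $i$. BIM produces superior results to FGSM.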

Momentum Iterative Method (MIM) [Dong et al.2018] extends BIM with a momentum term. MIM not only stabilizes update directions but also escapes from poor local optima during the iteration. Thus, it generates more transferable adversarial examples. Each iteration adjusts the update direction and generates a new adversarial image using the momentum term $g_i$, as follows: $g_{i+1} = \mu \cdot g_i + \frac{\nabla_x \mathcal{L}(x'_i, t')}{\|\nabla_x \mathcal{L}(x'_i, t')\|_1}$, where $\mu$ is the decay factor.
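
The difference between a single coarse step and several clipped small steps can be seen on a toy problem. The sketch below is an illustration under stated assumptions, not an attack on a real model: the "loss" is a squared distance to a hypothetical goal image `goal` (standing in for the loss toward a target label), so its gradient is available in closed form. The function names and the numbers are chosen for this example only.

```python
def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)

def clip(value, low, high):
    return max(low, min(high, value))

def loss(x, goal):
    # Toy stand-in for the target loss: squared distance to a goal image.
    return sum((a - b) ** 2 for a, b in zip(x, goal))

def grad(x, goal):
    return [2 * (a - b) for a, b in zip(x, goal)]

def fgsm(x, goal, eps):
    """One targeted sign step of size eps (descends the target loss)."""
    g = grad(x, goal)
    return [clip(a - eps * sign(gi), 0.0, 1.0) for a, gi in zip(x, g)]

def bim(x, goal, eps, alpha, steps):
    """Several sign steps of size alpha, clipped to the eps-ball around x."""
    adv = list(x)
    for _ in range(steps):
        g = grad(adv, goal)
        adv = [
            clip(a - alpha * sign(gi), max(0.0, x0 - eps), min(1.0, x0 + eps))
            for a, gi, x0 in zip(adv, g, x)
        ]
    return adv

x = [0.5, 0.5]
goal = [0.625, 1.0]
fgsm_adv = fgsm(x, goal, eps=0.25)
bim_adv = bim(x, goal, eps=0.25, alpha=0.0625, steps=8)
```

In this toy case FGSM overshoots the first pixel (it moves every pixel by the full 0.25), while BIM stops that pixel at its optimum and only pushes the second pixel to the edge of the $\epsilon$-ball, so BIM ends with the lower loss.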

2) Optimization-based attack directly solves the optimization problem of minimizing the distance between the original example and the adversarial example and yielding the incorrect classification.

Box-constrained L-BFGS [Szegedy et al.2013] finds adversarial examples by solving the box-constrained problem $\min_{x'} \; c \cdot \|x - x'\|_2^2 + \mathcal{L}(x', t')$ subject to $x' \in [0, 1]^n$, where $c > 0$ is a trade-off constant and $\mathcal{L}$ is the cross-entropy loss between the logit output and the target label $t'$. Although the perturbations generated by L-BFGS are much smaller than those of gradient-based attacks, L-BFGS is far less efficient.

C&W [Carlini and Wagner2017] is an $L_p$-norm-oriented attack that can successfully break undefended and defensively distilled DNNs. Given the logits $Z(\cdot)$ of the model $f$, rather than applying cross-entropy as the loss function, the C&W attack designs a new loss function, $\ell(x') = \max\big(\max_{i \ne t'} Z(x')_i - Z(x')_{t'}, \, -\kappa\big)$, which is solved by gradient descent, where $\kappa$ is the confidence of misclassification.

Defense methods against these attacks

People have proposed many practical defense methods against adversarial examples. Adversarial training [Tramèr et al.2017] improves the robustness of DNNs by injecting label-corrected adversarial examples into the training procedure and training a robust model that has a resistance to perturbations generated by gradient-based methods. Defensive distillation [Papernot et al.2016b] defends against adversarial perturbations using the distillation techniques [Hinton, Vinyals, and Dean2015] to retrain the same network with class probabilities predicted by the original network. There are also methods focusing on detecting adversarial samples [Xu, Evans, and Qi2017, Lu, Issaranon, and Forsyth2017, Grosse et al.2017, Feinman et al.2017].


Preliminaries. We assume that attackers have full knowledge of the threat model, such as the model architecture and parameter values. Given an input image $x$, an adversarial image $x' \in [0, 1]^n$, where $n$ is the number of pixels in the original image, and an OCR model $f$, the adversarial example's prediction result is $f(x')$. Given a target label $t'$, the loss function of the target model with respect to the input image is $\mathcal{L}(x, t')$. We also assume $\mathcal{L}$ to be differentiable almost everywhere (a sub-gradient may be used at discontinuities), because the gradient-descent approach is applicable to any DNN with a differentiable discriminant function.

Distance Metric. We define the distance metric $\|x - x'\|_p$ to quantify the similarity between the original image $x$ and the corresponding adversarial image $x'$. Such a distance metric reflects the cost of manipulations. The $L_p$-norm is a widely used distance metric defined as

$$\|v\|_p = \Big(\sum_{i=1}^{d} |v_i|^p\Big)^{1/p},$$

where $v = [v_1, \dots, v_d]$ is a $d$-dimensional vector, for any $p \ge 1$. $L_1$ accounts for the total variation in the perturbations and serves as a popular convex surrogate of $L_0$, which measures the number of modified pixels. $L_2$ measures the standard Euclidean distance, which is usually used to improve the visual quality. $L_\infty$ measures the maximum change of the perturbations between $x$ and $x'$.
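
As a quick illustration of how these distances differ on the same perturbation, the following sketch computes $L_0$, $L_1$, $L_2$ and $L_\infty$ for a four-pixel example. The helper `lp_distance` and the sample values are assumptions made for this example; note that $L_0$ is a count rather than a true norm.

```python
def lp_distance(x, x_adv, p):
    """Distance between two flattened images under the L_p measure.

    p = 0 counts modified pixels, p = float("inf") takes the largest change,
    and any p >= 1 uses the usual (sum |v_i|^p)^(1/p) formula.
    """
    diffs = [abs(a - b) for a, b in zip(x, x_adv)]
    if p == 0:
        return sum(1 for d in diffs if d > 0)
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x_clean = [1.0, 1.0, 0.0, 0.5]
x_pert = [1.0, 0.75, 0.0, 0.25]  # two pixels darkened by 0.25
```

For this pair, two pixels change by 0.25 each, so $L_0 = 2$, $L_1 = 0.5$, $L_2 = \sqrt{0.125} \approx 0.354$ and $L_\infty = 0.25$.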

Figure 1: The pipeline of the WATERMARK attack. We generate noise using MIM by back-propagating the CTC loss through the targeted DenseNet, and then mask out the noise outside the watermark region. We iterate this procedure until the iteration threshold is reached.

Watermark attack to CTC-based OCR

In this paper, we propose the MIM-based WATERMARK attack on the CTC-based OCR model to generate adversarial examples. In this section, we first introduce how to integrate watermarks into MIM [Dong et al.2018], which yields the MIM-based WATERMARK attack method (WM) for generating adversarial examples that satisfy the $L_\infty$-norm restriction in the targeted and non-targeted attack fashion. We then present several variants of WM under the $L_\infty$-norm bound. The generation pipeline of the WATERMARK adversarial attack is illustrated in Figure 1. Table 1 shows adversarial examples generated by each method.

MIM-based Watermark attack (WM).

Watermarks occur widely in documents and files. Exploiting this popularity, we disguise perturbations as a watermark; that is, we restrict the manipulation region to a specific predefined watermark-shaped region $\Omega_W$.

To generate a targeted $L_\infty$-bounded adversarial example $x'$, we start with an initial image $x'_0 = x$ given an original image $x$. WM seeks the adversarial example by solving the constrained optimization problem

$$\operatorname*{arg\,min}_{x'} \; \mathcal{L}_{CTC}(f(x'), t') \quad \text{s.t.} \quad \|x' - x\|_\infty \le \epsilon,$$

where $\epsilon$ is the size of adversarial perturbations and the perturbation $x' - x$ is nonzero only inside the watermark region $\Omega_W$. We summarize WM in Algorithm 1. At each attacking iteration $i$, the attacker first feeds the adversarial example $x'_i$ to the OCR model $f$ and obtains the gradient $\nabla_x \mathcal{L}_{CTC}(f(x'_i), t')$ through back-propagation. Then, to stabilize update directions and escape from poor local optima, the attacker updates the momentum term $g_{i+1}$ by accumulating the velocity vector in the gradient direction, as Equation 3 in Algorithm 1 shows. Last, the attacker updates the new adversarial example $x'_{i+1}$ by applying the $\Omega_W$-restricted sign gradient with a small step size $\alpha$, and clips the intermediate perturbations to keep them in the $\epsilon$-ball of $x$, as in Equation 4. The attacking iteration proceeds until the attack succeeds or the maximum number of iterations $I$ is reached.

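Input: A clean image $x$, OCR model $f$ with CTC loss $\mathcal{L}_{CTC}$, ground-truth text $t$, targeted text $t'$, watermark modification region $\Omega_W$, $\epsilon$-ball perturbation, number of iterations $I$, decay factor $\mu$
Output: An adversarial example $x'$ with $\|x' - x\|_\infty \le \epsilon$, or attack failure
1:  Initialization: $\alpha = \epsilon / I$; $g_0 = 0$; $x'_0 = x$
2:  for each iteration $i = 0$ to $I - 1$ do
3:      Input $x'_i$ to $f$ and obtain the gradient $\nabla_x \mathcal{L}_{CTC}(f(x'_i), t')$
4:      Update $g_{i+1}$ by accumulating the velocity vector in the gradient direction as
            $g_{i+1} = \mu \cdot g_i + \frac{\nabla_x \mathcal{L}_{CTC}(f(x'_i), t')}{\|\nabla_x \mathcal{L}_{CTC}(f(x'_i), t')\|_1}$    (3)
5:      Update $x'_{i+1}$ by applying the watermark-bound sign gradient as
            $x'_{i+1} = \mathrm{clip}_{x,\epsilon}\big(x'_i - \alpha \cdot \Omega_W \odot \mathrm{sign}(g_{i+1})\big)$    (4)
6:      if $f(x'_{i+1}) = t'$ then
7:          return $x'_{i+1}$
8:      end if
9:  end for
10:  return failure
Algorithm 1 MIM-based WATERMARK example generation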
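
The loop of Algorithm 1 can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the model is abstracted behind two hypothetical callables, `grad_fn` (gradient of the target loss with respect to the image) and `predict_fn` (the model's output), the image is a flat list of pixels in [0, 1], the step size is taken as `eps / steps`, and the update descends the target loss. The toy "model" at the bottom is invented purely so the sketch can run.

```python
def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)

def watermark_mim(x, mask, grad_fn, predict_fn, target, eps, steps, mu=1.0):
    """Momentum-iterative attack restricted to a watermark mask.

    x: flat list of pixels in [0, 1]; mask: 1 inside the watermark, else 0.
    Returns the adversarial image, or None if the attack fails.
    """
    alpha = eps / steps          # assumption: step size alpha = eps / steps
    g = [0.0] * len(x)           # momentum term g_0 = 0
    adv = list(x)                # x'_0 = x
    for _ in range(steps):
        grad = grad_fn(adv)
        norm = sum(abs(v) for v in grad) or 1.0   # L1 norm, guarded against 0
        g = [mu * gi + v / norm for gi, v in zip(g, grad)]
        # Masked sign step, then clip to the eps-ball around x and to [0, 1].
        adv = [
            min(max(a - alpha * m * sign(gi), max(0.0, x0 - eps)), min(1.0, x0 + eps))
            for a, m, gi, x0 in zip(adv, mask, g, x)
        ]
        if predict_fn(adv) == target:
            return adv
    return None

# Toy stand-in model (hypothetical): the "loss" toward label "B" is the sum of
# pixel values, and the model outputs "B" once the image is dark enough.
toy_grad = lambda img: [1.0] * len(img)
toy_predict = lambda img: "B" if sum(img) < 3.7 else "A"

clean = [1.0, 1.0, 1.0, 1.0]
wm_mask = [0, 1, 1, 0]
adv = watermark_mim(clean, wm_mask, toy_grad, toy_predict, "B", eps=0.2, steps=10)
```

In this toy run only the two masked pixels are darkened, and the loop stops early as soon as the prediction flips to the target, which mirrors the early-return in steps 6 to 8 of the algorithm.
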
| Original text (English translation) | Adversarial OCR output (English translation) |
|---|---|
| drive left | drive right |
| 1 Sep. | 9 Sep. |
| I am Xiao Fang | I am Sun Fang |
| class is over at 1 | class is over at 2 |
| class is over at 1 | once class is over |
| class is over at 1 | class is not over at 1 |

Table 1: Adversarial examples with different attacks. The image columns (original, MIM, WM, WMinit, WMneg, WMedge) and the Chinese OCR output are not reproduced here. The last two rows show text deletion and insertion. Other rows show text substitution.

Variants of WM attack.

To present a more natural appearance of watermark-like perturbations, we design three variants of WM attack.

WMinit. In WM, the watermark region $\Omega_W$ is multiplied element-wise with the sign gradient, so attackers only operate on pixels inside the watermark region. As shown in the WM column of Table 1, the perturbations of WM adversarial examples are not dense enough to form a complete, natural-looking watermark. Thus, to fill in the blanks of the watermark region, we start from an initial watermark-pasted image $x'_0$, obtained by attaching a watermark to the original image $x$: pixels of the watermark at positions not occupied by text are set to the grayscale value of the pasted watermark.
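
A minimal sketch of this initialization follows. It is an assumption-laden illustration: the helper `paste_watermark` is hypothetical, text pixels are identified by a simple darkness threshold (`text_threshold`, not taken from the paper), and the default gray value 0.3 is the watermark grayscale value reported in the attack setting.

```python
def paste_watermark(x, mask, gray=0.3, text_threshold=0.5):
    """Build the initial image for WMinit.

    x: flat list of pixels in [0, 1] (0 is black, 1 is white).
    mask: 1 inside the watermark region, 0 elsewhere.
    Pixels inside the mask that are not text (brighter than the assumed
    threshold) are set to the watermark gray value; text pixels are kept.
    """
    return [gray if m and px > text_threshold else px for px, m in zip(x, mask)]

line = [1.0, 0.0, 1.0, 1.0]      # white, text, white, white
wm = [1, 1, 0, 1]                # watermark covers pixels 0, 1 and 3
initial = paste_watermark(line, wm)
```

The text pixel under the watermark stays black, so the pasted watermark does not reduce the readability of the text.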

WMneg. With the update in Equation 4, $x'_{i+1} = \mathrm{clip}_{x,\epsilon}\big(x'_i - \alpha \cdot \Omega_W \odot \mathrm{sign}(g_{i+1})\big)$, the signed step $-\mathrm{sign}(g_{i+1})$ applied to each pixel can be $-1$ or $+1$. When the step is positive, the pixel value increases, that is, the pixel looks whiter (the maximum grayscale value is the whitest color and the minimum is the blackest). Otherwise, the pixel value decreases, and the pixel looks blacker. Whitening pixels in the text region makes the text fuzzy, and it is meaningless to whiten the already white background. Thus, we keep only the negative (darkening) steps and discard the positive ones during the attacking iterations. After adding this constraint, the update step of the new adversarial example, Equation 4, becomes

$$x'_{i+1} = \mathrm{clip}_{x,\epsilon}\big(x'_i + \alpha \cdot \Omega_W \odot \min(0, -\mathrm{sign}(g_{i+1}))\big)$$
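
The effect of discarding whitening steps can be shown for a single update. The sketch below assumes the descent convention in which the raw per-pixel step is `-sign(g)`; the helper `wmneg_step` is hypothetical, and the clipping to the $\epsilon$-ball is omitted to keep the example short.

```python
def sign(v):
    """Return -1, 0, or +1."""
    return (v > 0) - (v < 0)

def wmneg_step(adv, mask, g, alpha):
    """One WMneg update without clipping: keep only darkening steps.

    The raw step is -sign(g); min(0, .) zeroes out every step that would
    whiten a pixel, and the mask zeroes out pixels outside the watermark.
    """
    return [a + alpha * m * min(0, -sign(gi)) for a, m, gi in zip(adv, mask, g)]

pixels = [0.5, 0.5, 0.5]
region = [1, 1, 0]
momentum = [2.0, -3.0, 1.0]
updated = wmneg_step(pixels, region, momentum, alpha=0.125)
```

Only the first pixel changes: its step darkens it, the second pixel's step would whiten it and is dropped, and the third pixel lies outside the watermark.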


WMedge. A different way to add perturbations is to confine the watermark region to the area around the text edges, so that the perturbations pass for printing defects.

WMedge is similar to WM. We define the watermark as the region of the text edge, which can be obtained by image erosion in morphological image processing. The erosion operation erodes the original image using a specified structuring element that determines the shape of the pixel neighborhood over which the minimum is taken. In our experiments, we use a rectangular structuring element as the kernel. We take the bolder text region after erosion as the watermark. Thus, the text-edge-shaped watermark $\Omega_W$ is defined as the bolder text region that results from eroding the original image $x$ with this kernel.
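
The erosion step can be sketched without any image library. This is an illustration only: the 3x3 kernel size and the darkness threshold are assumptions (the kernel size used in the experiments is not preserved in this text), the helpers `erode` and `edge_watermark` are hypothetical, and borders are handled by truncating the window.

```python
def erode(image, k=3):
    """Grayscale erosion: each pixel becomes the minimum of its k x k window.

    With dark text on a white background, taking the minimum thickens strokes.
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                image[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(min(window))
        out.append(row)
    return out

def edge_watermark(image, k=3, text_threshold=0.5):
    """Mask of the bolder text region: 1 where the eroded image is dark."""
    eroded = erode(image, k)
    return [[1 if px <= text_threshold else 0 for px in row] for row in eroded]

page = [[1.0] * 5 for _ in range(5)]
page[2][2] = 0.0                     # a single black "text" pixel
mask = edge_watermark(page)
```

For the single black pixel in `page`, the mask covers the 3x3 block around it, i.e. the original stroke plus a one-pixel rim in which WMedge may place perturbations.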


In this section, we generate adversarial examples on the CTC-based OCR model. We compared the performance of the basic MIM, WM, and its variants.


Threat model. We performed the WM attack on a DenseNet + CTC neural network trained on a Chinese text image dataset. DenseNet [Huang et al.2017] is one of the powerful DNNs for visual recognition and can capture complex features. Thus, we use DenseNet as the feature extractor and CTC [Graves et al.2006] as the loss function. In the test phase, the DenseNet + CTC recognition model achieved 98.3% accuracy on a validation dataset of 36,400 images. The Chinese text image dataset has 3.64 million images, generated from Chinese news articles by varying the font, scale, grayscale, blur, and sketch. The character set has 5,989 unique characters, including Chinese and English characters, digits, and punctuation.

Attack setting. The same attack setting is applied across all experiments. Our experiment setup is based on MIM's framework, and we use the implementation of MIM in the CleverHans package. The attack runs for at most a fixed number of iterations, and we apply an early stopping criterion based on the attacking result at each iteration. The $L_\infty$-norm of the perturbations is bounded by 0.2. Pixel values range from 0 (black) to 1 (white). For the initial watermark in WMinit, we set the grayscale value of the watermark to 0.3 and put the watermark in the center of the image by default. The watermarks are set in font size 30.

Choose attacked candidates. To preserve semantic fluency in our OCR attack, we choose 691 pairs of antonym characters with high similarity of character shape, so that the adversarial attack requires as few adversarial perturbations as possible to fool the OCR system. Given two characters $c_1$ and $c_2$, character similarity is defined as a weighted combination of three terms: the absolute difference in the number of strokes between $c_1$ and $c_2$; the Levenshtein distance between the sijiao codes of $c_1$ and $c_2$, where sijiao is an encoding approach for fast retrieval of Chinese characters; and the edit distance between the character images of $c_1$ and $c_2$. The three weights are chosen as 0.33, 0.33 and 0.34, respectively. We then match the selected antonym character pairs against the corpus of People's Daily in 1988 and choose 158 sentences that contain selected characters and do not produce syntactic errors after the corresponding antonym character is substituted. Last, we generate line images containing the chosen sentences.

Evaluation metrics. To quantify the perturbations of adversarial images $x'$ compared with benign images $x$, we measure MSE, PSNR and SSIM. Mean-squared error (MSE) denotes the difference between adversarial images and original images, calculated by $\mathrm{MSE} = \frac{1}{n}\|x - x'\|_2^2$, with $n$ the number of pixels. Peak signal-to-noise ratio (PSNR) is the ratio of the maximum possible power of a signal to the power of the distortion, calculated by $\mathrm{PSNR} = 10 \log_{10}(D^2 / \mathrm{MSE})$, where $D$ denotes the dynamic range of pixel intensities, e.g., for an 8 bits/pixel image we have $D = 255$. Structural similarity index (SSIM) attempts to model the structural information of an image from luminance, contrast and structure respectively.
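
The first two metrics are simple enough to compute by hand. The sketch below is a minimal illustration; the helper names `mse` and `psnr` and the sample pixels are assumptions, and the default dynamic range of 1.0 matches images whose pixels lie in [0, 1] (use 255 for 8-bit images).

```python
import math

def mse(x, x_adv):
    """Mean-squared error between two flattened images."""
    return sum((a - b) ** 2 for a, b in zip(x, x_adv)) / len(x)

def psnr(x, x_adv, dynamic_range=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(x, x_adv)
    if err == 0:
        return math.inf
    return 10 * math.log10(dynamic_range ** 2 / err)

benign = [1.0, 1.0, 1.0, 1.0]
perturbed = [1.0, 0.8, 0.8, 1.0]   # two pixels darkened by 0.2
```

Here the MSE is (0.04 + 0.04) / 4 = 0.02, and the PSNR is $10 \log_{10}(1 / 0.02) \approx 16.99$ dB; a smaller perturbation gives a lower MSE and a higher PSNR.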

To evaluate the effectiveness and efficiency of adversarial attacks, we calculate three quantities: the attack success rate (ASR), $\mathrm{ASR} = \#(f(x') \ne t) / \#(x')$, which is the fraction of adversarial images that succeed in fooling the DNN model; the targeted attack success rate (ASR*), $\mathrm{ASR^*} = \#(f(x') = t') / \#(x')$; and the average time to generate adversarial perturbations from the clean images.
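
The two success rates differ only in what counts as a success, which the following sketch makes explicit. The function name and the sample strings are assumptions made for this example (the strings reuse the English translations from Table 1 and are not experimental data).

```python
def attack_success_rates(predictions, ground_truths, targets):
    """Return (ASR, ASR*) as fractions in [0, 1].

    ASR counts outputs that differ from the ground truth;
    ASR* counts outputs that exactly match the attacker's target.
    """
    n = len(predictions)
    asr = sum(p != t for p, t in zip(predictions, ground_truths)) / n
    asr_targeted = sum(p == t for p, t in zip(predictions, targets)) / n
    return asr, asr_targeted

predictions = ["drive right", "drive lift", "drive left", "9 Sep."]
ground_truths = ["drive left", "drive left", "drive left", "1 Sep."]
targets = ["drive right", "drive right", "drive right", "9 Sep."]
asr, asr_targeted = attack_success_rates(predictions, ground_truths, targets)
```

In this example three of four outputs are wrong (ASR = 0.75) but only two hit the intended target (ASR* = 0.5), which is why ASR* can never exceed ASR when targets differ from the ground truth.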

Comparison of attacks on single character altering

We compare different methods of altering a single character. Table 1 shows some successful adversarial examples generated by the different attack methods. Our intuitions are: 1) MIM generates human-perceptible and unnatural noise that spreads all over the image, dirtying the background and harming the structural similarity and quality of the image. 2) WM and its variants confine the noise to the watermark region, yielding a cleaner background and more reasonable perturbations. 3) The watermark-style perturbations of WM are relatively light and do not look like a real watermark. 4) WMinit and WMneg look more real, with a darker and more complete watermark shape. 5) The perturbations of WMedge lie around the edges of the text, which makes the text look bolder and resemble printing or scanning defects.

Intuitively, the WM family of attacks generates images of better visual quality (in terms of looking natural) when the attack is successful.

We evaluate the attack performance of altering a single character using the corpus discussed above. In Table 2, we report the metrics above (MSE/PSNR/SSIM, ASR*/ASR), as well as the average time required to generate an attack sample. Our observations are: 1) Compared to MIM, WM, WMneg and WMedge obtain lower MSE, higher PSNR and higher SSIM, indicating that the noise level is indeed lower on a successful attack. 2) Due to the lower noise level, the attack success rates (ASR* and ASR) of WM and its variants are also lower than MIM's. We believe the main reason is that, in this preliminary study, we always choose a fixed shape and location for the watermark, at the center of the original image. The fixed location severely limits what the adversary can do. In future work, we will allow multiple watermark shapes (e.g., different texts or logos) and different locations. 3) WMneg casts away the positive gradient noise, which possesses a certain attack ability. Hence, WMneg performs worse than WM. However, WMneg does generate visually more natural examples. 4) WMedge is a special case of WM that restricts the watermark region to the shape of the text edge, so it does not have the watermark location problem of the other WM-series methods. It achieves a good ASR and preserves the naturalness of the perturbations. Retaining all gradient noise is clearly better than keeping only the negative gradient noise. 5) WM0 only attaches the initial watermark to the original image, which lets us evaluate the impact of the watermark by itself. After attaching an initial watermark, 37.9% and 17.1% of the images are misclassified by the DenseNet+CTC model and Tesseract OCR, respectively. Thus the watermark intrinsically has attacking properties. 6) The time for producing each adversarial example is similar across methods and within a reasonable range. Thus, a practical strategy is to combine different attack methods to improve ASR.

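| Method | MSE | PSNR | SSIM | DenseNet+CTC ASR* (%) | DenseNet+CTC ASR (%) | Avg. time | Tesseract OCR ASR* (%) | Tesseract OCR ASR (%) |
|---|---|---|---|---|---|---|---|---|
| MIM | 0.0102 | 32.27 | 84.92 | 92.4 | 93.7 | 20.8 | 19.6 | 84.2 |
| WM | 0.0020 | 34.70 | 96.03 | 60.8 | 61.4 | 17.5 | 19.0 | 88.0 |
| WMneg | 0.0023 | 34.26 | 94.11 | 52.5 | 53.2 | 20.7 | 19.6 | 88.6 |
| WMinit | 0.0094 | 34.64 | 89.02 | 55.1 | 71.5 | 19.9 | 0.0 | 100.0 |
| WMedge | 0.0058 | 34.43 | 93.87 | 87.3 | 88.0 | 13.5 | 19.6 | 83.5 |
| WM0 | 0.0034 | 30.92 | 93.37 | - | 37.9 | - | - | 17.1 |

Table 2: Comparison of various adversarial attacks on DenseNet+CTC and Tesseract OCR. See text for the metric definitions.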
| ASR*/ASR (%) | MIM | WM | WMneg | WMinit | WMedge |
|---|---|---|---|---|---|
| No deformation | 92.4/93.7 | 60.8/61.4 | 52.5/53.2 | 55.1/71.5 | 87.3/88.0 |
| AvgBlur@2x2 | 44.30/62.7 | 47.47/51.27 | 46.84/54.43 | 18.99/79.11 | 48.73/55.06 |
| MedianBlur@3x3 | 2.53/99.37 | 1.27/98.73 | 1.90/95.57 | 0.00/100.0 | 2.53/96.84 |
| GaussianBlur@3x3,0 | 51.27/58.86 | 48.10/52.53 | 46.20/55.70 | 27.85/72.78 | 52.53/55.06 |
| Salt&Pepper@2% | 4.43/99.37 | 3.16/98.10 | 2.53/97.47 | 0.00/100.0 | 3.80/98.73 |
| Compress@20 | 38.61/68.99 | 46.20/55.06 | 47.47/58.86 | 22.78/76.58 | 56.33/63.92 |
| inpainting@2 | - | 0.00/100.0 | 0.00/100.0 | 0.00/100.0 | 0.00/100.0 |

Table 3: Comparison of ASR*/ASR (%) of adversarial examples under different preprocessing / defense methods.

Attack transferability to blackbox OCR

We want to see whether adversarial examples can mislead other (black-box) models, a property commonly called transferability [Liu et al.2016, Papernot, McDaniel, and Goodfellow2016, Sharma and Chen2017, Papernot et al.2017].

We adopt the latest version of the widely used Tesseract OCR [Google2019] as a black-box model for the adversarial attacks. We fed the adversarial samples, generated by the attack methods above on the DenseNet+CTC model, into the off-the-shelf Tesseract OCR and evaluated the recognition results (ASR*/ASR), shown in the last two columns of Table 2.

We find that all attacks produce transferable adversarial examples in terms of ASR. This may be because the noise indeed perturbs the intrinsic features of a character sequence for different models, or because Tesseract OCR cannot handle noise. However, ASR* drops significantly because the perturbations are still trained on a different model.

Real-world examples

Figures 2 and 3 show two real-world examples of WM. In Figure 2, using watermarks, we successfully altered the license number recognition results. Figure 3 shows an example of a paragraph from the annual financial report of a public company. By adding the AICS watermark, we altered all the revenue numbers in the recognition results.

Figure 2: An adversarial attack example in driver license recognition. The OCR outputs a license number of NAL12505717 while the actual number is NHL12506717.
Figure 3: Attack on a listed Chinese company’s annual report. All the revenue numbers are altered in the OCR result.

Defense against these attacks

We evaluate the robustness of these attacks against common defense methods that preprocess the input images, and Table 3 summarizes the results.

Noise removing methods with local smoothing. Local smoothing makes use of nearby pixels to smooth each pixel. We use three local smoothing methods from OpenCV [OpenCVb]: average blur, median blur and Gaussian blur. We observe that 1) median blur is particularly effective in removing sparsely occurring black and white pixels in an image while preserving the edges of the text well. 2) Although median blur with kernel size 3 reduces ASR*, it blurs the text so much that OCR algorithms no longer work. 3) Average smoothing with kernel size 2 and Gaussian smoothing with kernel size 3 have similar performance. Although MIM has a high ASR, it seems more sensitive to various deformations than WM and WMedge.
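
Both effects of median smoothing, removing sparse noise and damaging thin strokes, can be reproduced on a tiny grid. The sketch below is a plain-Python stand-in written for illustration and is not OpenCV's implementation: the helper `median_blur` is hypothetical and truncates the window at the image border instead of replicating border pixels.

```python
from statistics import median

def median_blur(image, k=3):
    """Replace each pixel by the median of its k x k neighborhood."""
    h, w = len(image), len(image[0])
    r = k // 2
    return [
        [
            median(
                image[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]

# An isolated black pixel (sparse noise) on a white background.
noisy = [[1.0] * 5 for _ in range(5)]
noisy[2][2] = 0.0

# A one-pixel-wide vertical stroke, standing in for thin text.
stroke = [[1.0, 1.0, 0.0, 1.0, 1.0] for _ in range(5)]
```

With a 3x3 window the isolated pixel disappears, but so does the one-pixel-wide stroke, because black pixels never form a majority in any window. This is consistent with the observation that median blur suppresses targeted perturbations while also hurting recognition of the text itself.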

Adding salt-and-pepper noise is a common defense against adversarial examples. We find that it is particularly effective in decreasing ASR* (to under 5%), but the result is an increase of ASR to almost 100%. This indicates that salt-and-pepper noise harms the general image quality too much in exchange for reducing adversarial perturbations.

Image compression. We show that adversarial examples can survive lossy image compression and decompression by applying the standard JPEG compression method with the quality parameter set to 20. The WATERMARK attack has a better chance of surviving the compression process.

Watermark removing techniques. Inpainting is a commonly used method to remove (real) watermarks. We use the inpainting method [Telea2004] implemented in OpenCV [OpenCVa], which requires the mask of the watermark as a prior and tries to recover the watermark region from the surrounding pixels. While inpainting eliminated the watermark, the text region overlapped with the watermark region, so the inpainting method also removed too many useful pixels, namely text pixels, causing OCR to fail completely. We observe that the text even lost readability to human eyes.

Conclusion and Future Work

Generating adversarial examples for OCR systems is different from normal CV tasks. We propose a method that successfully hides perturbations from human eyes while keeping them effective against modern sequence-based OCR, by disguising the perturbations as a watermark or as printing defects. We show that even with preliminary implementations, our perturbations can still be effective, transferable, and deceiving to human eyes.

There are many future directions, for example, allowing different watermark shapes and locations, and attacking longer sequences. It would also be interesting to add semantic-based (language model) attacks to further improve attack effectiveness. Finally, this adversarial attack calls for better defense methods than traditional image transformations.

Acknowledgements. This work is supported in part by the National Natural Science Foundation of China (NSFC) Grant 61532001 and the Zhongguancun Haihua Institute for Frontier Information Technology.