Robust DNN Watermarking via Fixed Embedding Weights with Optimized Distribution
Watermarking has been proposed as a way to protect the Intellectual Property Rights (IPR) of Deep Neural Networks (DNNs) and track their use. Several methods have been proposed that embed the watermark into the trainable parameters of the network (white box watermarking) or into the input-output mappping implemented by the network in correspondence to specific inputs (black box watermarking). In both cases, achieving robustness against fine tuning, model compression and, even more, transfer learning, is one of the most difficult challenges researchers are trying to face with. In this paper, we propose a new white-box, multi-bit watermarking algorithm with strong robustness properties, including retraining for transfer learning. Robustness is achieved thanks to a new information coding strategy according to which the watermark message is spread across a number of fixed weights, whose position depends on a secret key. The weights hosting the watermark are set prior to training, and are left unchanged throughout the entire training procedure. The distribution of the weights carrying out the message is theoretically optimised to make sure that the watermarked weights are indistinguishable from the other weights, while at the same time keeping their amplitude as large as possible to improve robustness against retraining. We carried out several experiments demonstrating the capability of the proposed scheme to provide high payloads with practically no impact on the network accuracy, at the same time retaining excellent robustness against network modifications an re-use, including retraining for transfer learning.
READ FULL TEXT