Using Set Covering to Generate Databases for Holistic Steganalysis

11/07/2022
by   Rony Abecidan, et al.
0

Within an operational framework, covers used by a steganographer are likely to come from different sensors and different processing pipelines than the ones used by researchers for training their steganalysis models. Thus, a performance gap is unavoidable when it comes to out-of-distributions covers, an extremely frequent scenario called Cover Source Mismatch (CSM). Here, we explore a grid of processing pipelines to study the origins of CSM, to better understand it, and to better tackle it. A set-covering greedy algorithm is used to select representative pipelines minimizing the maximum regret between the representative and the pipelines within the set. Our main contribution is a methodology for generating relevant bases able to tackle operational CSM. Experimental validation highlights that, for a given number of training samples, our set covering selection is a better strategy than selecting random pipelines or using all the available pipelines. Our analysis also shows that parameters as denoising, sharpening, and downsampling are very important to foster diversity. Finally, different benchmarks for classical and wild databases show the good generalization property of the extracted databases. Additional resources are available at github.com/RonyAbecidan/HolisticSteganalysisWithSetCovering.

READ FULL TEXT
research
11/09/2020

Characterizing Transactional Databases for Frequent Itemset Mining

This paper presents a study of the characteristics of transactional data...
research
11/10/2021

Understanding the Generalization Benefit of Model Invariance from a Data Perspective

Machine learning models that are developed to be invariant under certain...
research
02/25/2019

Quantifying error contributions of computational steps, algorithms and hyperparameter choices in image classification pipelines

Data science relies on pipelines that are organized in the form of inter...
research
09/12/2021

Team NeuroPoly: Description of the Pipelines for the MICCAI 2021 MS New Lesions Segmentation Challenge

This paper gives a detailed description of the pipelines used for the 2n...
research
08/11/2022

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines

Programming Language Processing (PLP) using machine learning has made va...
research
08/24/2018

ParaNet - Using Dense Blocks for Early Inference

DenseNets have been shown to be a competitive model among recent convolu...

Please sign up or login with your details

Forgot password? Click here to reset