SecretBench: A Dataset of Software Secrets

03/12/2023
by Setu Kumar Basak, et al.

According to GitGuardian's monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) doubled in 2021 compared to 2020, totaling more than six million secrets. Secret detection tools produce many false positive warnings, yet no benchmark dataset is publicly available for researchers and tool developers to evaluate them. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source code containing 97,479 secrets (of which 15,084 are true secrets) of various secret types extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.
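As a rough illustration of how a labeled benchmark like this could be used to evaluate a detection tool, the sketch below computes precision and recall for a set of reported findings against a set of labeled true secrets. The `Finding` key (file path plus line number) and the function name `precision_recall` are assumptions made for this example; they are not the dataset's actual schema or the authors' evaluation method. The final lines only restate the counts given in the abstract.

```python
from typing import Iterable, Set, Tuple

# Hypothetical identifier for a finding: (file path, line number).
# This key is an assumption for illustration, not the dataset's real schema.
Finding = Tuple[str, int]


def precision_recall(
    reported: Iterable[Finding], true_secrets: Set[Finding]
) -> Tuple[float, float]:
    """Return (precision, recall) of a tool's reported findings
    measured against the labeled true secrets."""
    reported_set = set(reported)
    true_positives = len(reported_set & true_secrets)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(reported_set) if reported_set else 0.0
    recall = true_positives / len(true_secrets) if true_secrets else 0.0
    return precision, recall


# Counts stated in the abstract: labeled candidates and true secrets.
TOTAL_CANDIDATES = 97_479
TRUE_SECRETS = 15_084

# Share of labeled candidates that are true secrets (roughly 15.5%).
true_share = TRUE_SECRETS / TOTAL_CANDIDATES
```

For example, a tool that reports two findings, one of which matches one of two labeled true secrets, would score a precision of 0.5 and a recall of 0.5 under this sketch.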

research
07/03/2023

A Comparative Study of Software Secrets Reporting by Secret Detection Tools

Background: According to GitGuardian's monitoring of public GitHub repos...
research
11/11/2022

Committed by Accident: Studying Prevention and Remediation Strategies Against Secret Leakage in Source Code Repositories

Version control systems for source code, such as Git, are key tools in m...
research
08/24/2022

What are the Practices for Secret Management in Software Artifacts?

Throughout 2021, GitGuardian's monitoring of public GitHub repositories ...
research
01/29/2023

What Challenges Do Developers Face About Checked-in Secrets in Software Artifacts?

Throughout 2021, GitGuardian's monitoring of public GitHub repositories ...
research
12/07/2020

A Tool to Extract Structured Data from GitHub

GitHub repositories consist of various detailed information about the pr...
research
08/13/2020

Sniffing for Codebase Secret Leaks with Known Production Secrets in Industry

Leaked secrets, such as passwords and API keys, in codebases were respon...
research
02/28/2023

Benchmarking Deepart Detection

Deepfake technologies have been blurring the boundaries between the real...