Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions

by Wesley Tann, et al.

Cybersecurity Capture-The-Flag (CTF) exercises assess participants by having them find text strings, or “flags”, by exploiting system vulnerabilities. Large Language Models (LLMs) are natural-language models trained on vast amounts of text to understand and generate language; they can perform well on many CTF challenges. Such LLMs are freely available to students, which raises concerns about academic integrity when CTF exercises are used in the classroom. Educators must understand LLMs' capabilities in order to adapt their teaching to generative AI assistance. This research investigates the effectiveness of LLMs on CTF challenges and certification questions. We evaluate three popular LLMs: OpenAI ChatGPT, Google Bard, and Microsoft Bing. First, we assess the LLMs' question-answering performance on five Cisco certifications of varying difficulty. Next, we qualitatively study the LLMs' abilities in solving CTF challenges to understand their limitations. We report on the experience of using the LLMs for seven test cases spanning all five types of CTF challenges. In addition, we demonstrate how jailbreak prompts can bypass and break LLMs' ethical safeguards. The paper concludes by discussing LLMs' impact on CTF exercises and the implications of that impact.




Ethical ChatGPT: Concerns, Challenges, and Commandments

Large language models, e.g. ChatGPT are currently contributing enormousl...

On the application of Large Language Models for language teaching and assessment technology

The recent release of very large language models such as PaLM and GPT-4 ...

ChatGPT: The End of Online Exam Integrity?

This study evaluated the ability of ChatGPT, a recently developed artifi...

Creating Large Language Model Resistant Exams: Guidelines and Strategies

The proliferation of Large Language Models (LLMs), such as ChatGPT, has ...

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit...

Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education

Artificial intelligence is gaining traction in more ways than ever befor...

Towards Mitigating ChatGPT's Negative Impact on Education: Optimizing Question Design through Bloom's Taxonomy

The popularity of generative text AI tools in answering questions has le...
