Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models

by David Noever, et al.

The research creates a professional certification survey to test large language models and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70%) on a portion of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the Financial Industry Regulatory Authority (FINRA) Series 6 exam with a score of 70% without preparation. Interestingly, Turbo-GPT3.5 performed well on customer service tasks, suggesting potential applications in human augmentation for chatbots in call centers and routine advice services. The models also scored well on sensory and experience-based tests such as wine sommelier, beer taster, emotional quotient, and body language reader. The OpenAI model improvement from Babbage to Turbo resulted in a median improvement of 60% within a few years. This progress suggests that focusing on the latest model's shortcomings could lead to a highly performant AI capable of mastering the most demanding professional certifications. We open-source the benchmark to expand the range of testable professional skills as the models improve or gain emergent capabilities.
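
The scoring arithmetic summarized in the abstract (per-exam accuracy, a 70% pass mark, the share of exams passed, and a median gain between model generations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy scores, the choice to count exactly 70% as a pass, and the reading of "median improvement" as an absolute per-exam accuracy difference. It is not the authors' evaluation code, and the numbers are not results from the paper.

```python
from statistics import median

# Pass mark quoted in the abstract. Treating exactly 70% as a pass is an
# assumption, made because a 70% Series 6 score is described as passing.
PASS_THRESHOLD = 0.70


def accuracy(answers, key):
    """Fraction of questions answered correctly on one exam."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(a == k for a, k in zip(answers, key)) / len(key)


def summarize(scores):
    """Return (pass_rate, sorted names of passed exams) for exam -> accuracy."""
    passed = sorted(name for name, acc in scores.items() if acc >= PASS_THRESHOLD)
    return len(passed) / len(scores), passed


def median_gain(old_scores, new_scores):
    """Median per-exam accuracy difference (new - old) over shared exams.

    Assumption: the gain is an absolute difference; a relative gain would
    divide each difference by the older score.
    """
    shared = old_scores.keys() & new_scores.keys()
    return median(new_scores[e] - old_scores[e] for e in shared)


# Hypothetical toy data, NOT results from the paper.
older_model = {"exam_a": 0.40, "exam_b": 0.55, "exam_c": 0.75}
newer_model = {"exam_a": 0.80, "exam_b": 0.65, "exam_c": 0.95}

pass_rate, passed = summarize(newer_model)
gain = median_gain(older_model, newer_model)
print(f"passed {len(passed)} of {len(newer_model)} exams ({pass_rate:.0%})")
print(f"median gain: {gain:.0%}")
```

With real benchmark data, the two dictionaries would hold one accuracy per certification exam for each model, and the same two summaries would give the pass rate and the generation-over-generation gain.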



