GPT-4 to GPT-3.5: 'Hold My Scalpel' – A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam

04/04/2023
by Jonathan D. Freedman, et al.

The Plastic Surgery In-Service Training Exam (PSITE) is an important indicator of resident proficiency and serves as a useful benchmark for evaluating OpenAI's GPT. Unlike many of the simulated tests or practice questions shown in the GPT-4 Technical Report, the multiple-choice questions evaluated here are authentic PSITE questions. These questions present realistic clinical vignettes that a plastic surgeon commonly encounters in practice, and PSITE scores correlate highly with passing the written boards required to become a Board Certified Plastic Surgeon. Our evaluation shows a dramatic improvement of GPT-4 (without vision) over GPT-3.5: on the 2022 exam the score rose from the 8th to the 88th percentile, and on the 2021 exam from the 3rd to the 99th percentile. The final results of the 2023 PSITE are set to be released on April 11, 2023, which makes this an exciting moment to continue our research with a fresh exam. Our evaluation pipeline is ready to run as soon as the exam is released, provided we have access to the GPT-4 API via OpenAI. With multimodal input, we may achieve superhuman performance on the 2023 exam.

research
03/14/2018

What Should You Know Before Developing a Service Identification Approach

In this paper, we answer a set of research questions that are required t...
research
07/03/2023

Analyzing Multiple-Choice Reading and Listening Comprehension Tests

Multiple-choice reading and listening comprehension tests are an importa...
research
04/10/2023

DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

Multiple choice questions (MCQs) are an efficient and common way to asse...
research
05/07/2018

O.D.E.S. : An Online Dynamic Examination System based on a CMS Wordpress plugin

This paper describes the online dynamic examination application plugin n...
research
09/05/2019

An Empirical Study on the Characteristics of Question-Answering Process on Developer Forums

Developer forums are one of the most popular and useful Q&A websites on ...
research
01/24/2019

A BERT Baseline for the Natural Questions

This technical note describes a new baseline for the Natural Questions. ...
research
08/30/2019

CodeSwitch-Reddit: Exploration of Written Multilingual Discourse in Online Discussion Forums

In contrast to many decades of research on oral code-switching, the stud...
