News Verifiers Showdown: A Comparative Performance Evaluation of ChatGPT 3.5, ChatGPT 4.0, Bing AI, and Bard in News Fact-Checking

by Kevin Matthe Caramancion, et al.

This study aimed to evaluate the proficiency of prominent Large Language Models (LLMs), namely OpenAI's ChatGPT 3.5 and 4.0, Google's Bard (LaMDA), and Microsoft's Bing AI, in discerning the truthfulness of news items using black-box testing. A total of 100 fact-checked news items, all sourced from independent fact-checking agencies, were presented to each of these LLMs under controlled conditions. Their responses were classified into one of three categories: True, False, and Partially True/False. The effectiveness of the LLMs was gauged by the accuracy of their classifications against the verified facts provided by the independent agencies. The results showed moderate proficiency across all models, with an average score of 65.25 out of 100. Among the models, OpenAI's ChatGPT 4.0 stood out with a score of 71, suggesting that newer LLMs have an edge in differentiating fact from deception. However, when compared with human fact-checkers, the AI models, despite showing promise, lag in comprehending the subtleties and contexts inherent in news information. The findings highlight the potential of AI in the domain of fact-checking while underscoring the continued importance of human cognitive skills and the need for persistent advancements in AI capabilities. Finally, the experimental data produced in this work are openly available on Kaggle.
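The scoring procedure described in the abstract can be illustrated with a minimal sketch. It assumes that a model earns one point for each item whose label exactly matches the fact-checkers' verdict, which would make the score out of 100 equal to the count of correct classifications; the abstract does not spell out the rule, so this is an assumption. The function names, model names, and toy labels below are hypothetical and are not taken from the study's Kaggle data.

```python
# Hypothetical sketch of exact-match scoring over three label categories.
# The label set comes from the abstract; everything else is illustrative.
LABELS = ("True", "False", "Partially True/False")


def score_model(predictions, ground_truth):
    """Count items whose predicted label equals the verified label."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    for label in list(predictions) + list(ground_truth):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
    return sum(p == g for p, g in zip(predictions, ground_truth))


def average_score(scores):
    """Mean score across models, given a mapping of model name to score."""
    return sum(scores.values()) / len(scores)


# Toy data only: four made-up items and two made-up models.
truth = ["True", "False", "Partially True/False", "False"]
toy_predictions = {
    "model_a": ["True", "False", "False", "False"],
    "model_b": ["False", "True", "Partially True/False", "False"],
}

toy_scores = {
    name: score_model(preds, truth) for name, preds in toy_predictions.items()
}
print(toy_scores)                 # {'model_a': 3, 'model_b': 2}
print(average_score(toy_scores))  # 2.5
```

Under this assumption, a reported average of 65.25 across four models corresponds to 261 correct classifications in total, of which 71 belong to ChatGPT 4.0.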


Artificial intelligence is ineffective and potentially harmful for fact checking

Fact checking can be an effective strategy against misinformation, but i...

Bridging History with AI: A Comparative Evaluation of GPT 3.5, GPT4, and GoogleBARD in Predictive Accuracy and Fact Checking

The rapid proliferation of information in the digital era underscores th...

Large language models can rate news outlet credibility

Although large language models (LLMs) have shown exceptional performance...

Deceptive AI Systems That Give Explanations Are Just as Convincing as Honest AI Systems in Human-Machine Decision Making

The ability to discern between true and false information is essential t...

Extractive and Abstractive Explanations for Fact-Checking and Evaluation of News

In this paper, we explore the construction of natural language explanati...

True or false? Cognitive load when reading COVID-19 news headlines: an eye-tracking study

Misinformation is an important topic in the Information Retrieval (IR) c...

Automated, not Automatic: Needs and Practices in European Fact-checking Organizations as a basis for Designing Human-centered AI Systems

To mitigate the negative effects of false information more effectively, ...
