RoCar: A Relationship Network-based Evaluation Method to Large Language Models

07/29/2023
by Ming Wang, et al.

Large language models (LLMs) have received increasing attention. However, because their capabilities are complex, how to evaluate LLMs rationally remains an open problem. We propose RoCar, a method that uses a set of defined basic schemas to randomly construct a task graph and then generates natural-language evaluation tasks from that graph, evaluating the reasoning and memory abilities of LLMs separately. Because the task construction process is highly random, it can be ensured that none of the LLMs under test has directly learned the evaluation tasks, which guarantees the fairness of the evaluation method.
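The pipeline described above (random graph construction from basic schemas, then natural-language task generation) can be illustrated with a minimal sketch. This is not the authors' implementation: the relation types in `SCHEMAS`, the sentence templates, the random-spanning-tree construction, and the two question templates are all hypothetical choices made only for illustration. The sketch builds a random connected relationship graph, renders each edge as a sentence, and poses one memory question (recall a stated relation) and one reasoning question (count the links between two people via breadth-first search).

```python
import random
from collections import deque

# Hypothetical basic schemas: each relation type maps to a sentence template.
SCHEMAS = {
    "friend": "{a} is a friend of {b}.",
    "colleague": "{a} is a colleague of {b}.",
    "parent": "{a} is the parent of {b}.",
}

def build_task_graph(names, rng):
    """Build a random connected graph: each person links to a random earlier person."""
    people = list(names)
    rng.shuffle(people)
    edges = []
    for i in range(1, len(people)):
        a = people[i]
        b = rng.choice(people[:i])              # guarantees connectivity (spanning tree)
        relation = rng.choice(sorted(SCHEMAS))  # pick a hypothetical schema at random
        edges.append((a, b, relation))
    return edges

def hop_distance(edges, start, goal):
    """Breadth-first search over the undirected graph; returns hop count or None."""
    adj = {}
    for a, b, _ in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def generate_tasks(edges, rng):
    """Render edges as a story, then pose one memory and one reasoning question."""
    story = " ".join(SCHEMAS[r].format(a=a, b=b) for a, b, r in edges)
    a, b, r = rng.choice(edges)
    memory = {
        "prompt": f"{story}\nQuestion: What is the relationship of {a} to {b}?",
        "answer": r,
    }
    people = sorted({p for e in edges for p in e[:2]})
    x, y = rng.sample(people, 2)
    reasoning = {
        "prompt": f"{story}\nQuestion: How many relationship links separate {x} and {y}?",
        "answer": hop_distance(edges, x, y),
    }
    return memory, reasoning

if __name__ == "__main__":
    rng = random.Random(0)
    graph = build_task_graph(["Alice", "Bob", "Carol", "Dave", "Eve"], rng)
    memory_task, reasoning_task = generate_tasks(graph, rng)
    print(memory_task["prompt"], "->", memory_task["answer"])
    print(reasoning_task["prompt"], "->", reasoning_task["answer"])
```

Because the graph, the relations, and the question targets are all drawn from a seeded random generator, each seed yields a fresh task whose ground-truth answer is computed from the graph itself, which mirrors the fairness argument in the abstract: a model cannot have memorized a task that did not exist before it was sampled.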
