Safurai 001: New Qualitative Approach for Code LLM Evaluation

09/20/2023
by Davide Cifarelli, et al.

This paper presents Safurai-001, a new Large Language Model (LLM) with significant potential in the domain of coding assistance. Driven by recent advancements in coding LLMs, Safurai-001 is competitive in performance with recent models such as WizardCoder [Xu et al., 2023], PanguCoder [Shen et al., 2023] and Phi-1 [Gunasekar et al., 2023], but aims to deliver a more conversational interaction. By capitalizing on progress in data engineering (including the latest techniques of data transformation and prompt engineering) and instruction tuning, the model is intended to stand toe-to-toe with recent closed and open-source developments. Recognizing the need for an effective evaluation metric for coding LLMs, this paper also introduces GPT4-based MultiParameters, an evaluation benchmark that uses several parameters to give a comprehensive view of a model's functioning and performance. Our assessment shows that Safurai-001 can outperform GPT-3.5 by 1.58% and WizardCoder by 18.78%
