Comparing PCG metrics with Human Evaluation in Minecraft Settlement Generation

07/06/2021
by Jean-Baptiste Hervé, et al.

A range of metrics can be applied to the artifacts produced by procedural content generation (PCG), and several of them come with qualitative claims. In this paper, we adapt a range of existing PCG metrics to generated Minecraft settlements, develop a few new metrics inspired by the PCG literature, and compare the resulting measurements to existing human evaluations. The aim is to analyze how well those metrics capture human evaluation scores in different categories, how the metrics generalize to another game domain, and how they deal with more complex artifacts. We provide an exploratory look at a variety of metrics, together with an information gain analysis and several correlation analyses. We found some relationships between human scores and metrics that count specific elements, measure the diversity of blocks, and measure the presence of crafting materials for the complex blocks present in a settlement.


research
05/29/2018

Human vs Automatic Metrics: on the Importance of Correlation Design

This paper discusses two existing approaches to the correlation analysis...
research
08/31/2022

The Glass Ceiling of Automatic Evaluation in Natural Language Generation

Automatic evaluation metrics capable of replacing human judgments are cr...
research
05/08/2022

RoViST:Learning Robust Metrics for Visual Storytelling

Visual storytelling (VST) is the task of generating a story paragraph th...
research
04/07/2022

Automated Isovist Computation for Minecraft

Procedural content generation for games is a growing trend in both resea...
research
01/29/2023

EMP-EVAL: A Framework for Measuring Empathy in Open Domain Dialogues

Measuring empathy in conversation can be challenging, as empathy is a co...
research
01/13/2022

Beyond chord vocabularies: Exploiting pitch-relationships in a chord estimation metric

Chord estimation metrics treat chord labels as independent of one anothe...
research
05/17/2023

FACE: Evaluating Natural Language Generation with Fourier Analysis of Cross-Entropy

Measuring the distance between machine-produced and human language is a ...

Please sign up or login with your details

Forgot password? Click here to reset