DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
Published in EMNLP 2025 Workshop (Spotlight), 2025
Authors: Yunxiang Mo, Tianshi Zheng, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song.
An extended version was later accepted to ACL 2026 Main Conference.
Links: [arXiv]
Abstract
We introduce DixitWorld, an evaluation framework for assessing multimodal abductive reasoning in vision-language models (VLMs). DixitWorld has two components:
- DixitArena — a dynamic multi-agent setting in which models alternate between generating cryptic clues (storyteller) and selecting the target image from alternatives (listener).
- DixitBench — a static benchmark (168 questions, Easy / Hard) that isolates the listener task for controlled assessment.
We find that smaller open-source models often excel as creative storytellers, producing imaginative but less discriminative clues, while larger proprietary models achieve stronger overall gameplay performance. These results expose a fundamental tradeoff between generative creativity and discriminative understanding in multimodal reasoning.
Recommended citation: Yunxiang Mo, Tianshi Zheng, Qing Zong, Jiayu Liu, Baixuan Xu, Yauwai Yim, Chunkit Chan, Jiaxin Bai, Yangqiu Song. (2025). "DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay." EMNLP 2025 Workshop (Spotlight).
Download Paper
