Yunxiang Mo 莫云翔
I am an undergraduate at the Hong Kong University of Science and Technology (HKUST), pursuing a double major in Computer Science and Mathematics with an Extended Major in Artificial Intelligence (CGA: 4.1 / 4.3). I am fortunate to be advised by Prof. Yangqiu Song and Dr. Tianshi Zheng at the HKUST KnowComp Group.
My research interests center on natural language processing, with a focus on the reasoning and evaluation of large language models and vision-language models. I am especially interested in abductive and multimodal reasoning — how models form, defend, and revise hypotheses under ambiguity.
I am currently looking for a research-exchange position in the U.S. for the upcoming term. If you are a faculty member working on related topics and have an opening, I would be glad to chat — please feel free to reach out via email.
🔥 News
- 2026.03: 🎉 An extended version of DixitWorld was accepted to ACL 2026 (AC meta-review 9/10). [OpenReview]
- 2026.01: 🎉 ScaleCUA was accepted to ICLR 2026 as an Oral. [arXiv]
- 2025.10: 🎉 DixitWorld received a Spotlight at the BlackboxNLP Workshop at EMNLP 2025. [arXiv]
💼 Experience

Mentored by Dr. Fang Wu in the groups of Prof. Yejin Choi and Prof. Jure Leskovec.

Mentored by Dr. Tianshi Zheng in the group of Prof. Yangqiu Song.

Developed and optimized ML models for embedded and on-chip AI scenarios; built training, evaluation, and inference pipelines in PyTorch; deployed models to edge devices under tight latency and memory constraints.

Developed front-end modules with the MFC framework for an internal mini-program project; handled UI design, event handling, and system debugging in a small team.
📚 Publications
Robust Decision-Making for LLM Agents in Multi-Turn Reasoning
LLM agents in multi-turn reasoning frequently collapse into self-locking loops, where approximate belief tracking causes them to revisit the same hypotheses without making epistemic progress. We formalize the structural conditions under which such loops arise and show that the failure mode persists across frontier models even when standard information-seeking objectives are applied. To address it, we propose a training-free, distributionally robust information-gain objective that explicitly hedges against belief-tracking error and restores exploratory progress without any fine-tuning. The method is evaluated on multi-turn reasoning, planning, and decision-making benchmarks across both open- and closed-source LLM agents.
A Multi-Domain LLM Benchmark for Scientific Hypothesis Generation
Scientific hypothesis generation is an open-ended, multi-step task that current LLM benchmarks evaluate poorly: free-text outputs are scored inconsistently, and most setups exclude the literature-grounded reasoning that real scientists rely on. We construct a multi-domain benchmark spanning multiple scientific disciplines, paired with an anchored 5-dimensional rubric whose dimensions include coherence, factual consistency, and the presence of boilerplate or hedging language. The benchmark supports two evaluation modes, direct prompting and an agentic mode that allows tool-augmented literature search, making it possible to separate performance gains due to the underlying model from those due to the surrounding agent scaffold.
Extended version of the workshop paper below — adds a Medium difficulty tier (252 vs. 168 QA items), a 72B-parameter scaling ablation, and calibration/sensitivity analyses.
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data
My contribution: data pipeline and cross-platform workflow components in the open-source codebase.
Original workshop version; extended version accepted to ACL 2026 Main (above).
🛠️ Projects
Facere — AI-Native Hardware Design Agent
A terminal-native agent that drafts and edits schematics and PCB layouts alongside KiCad 9. We built a hardware-aware MCP server backed by a curated 153-motif schematic corpus, paired with a sister PCB physical-simulation package, so the agent can plan, edit, and verify designs end-to-end. Distributed as a single-command bootstrap installer.
PastPaper Master — AI Past-Exam Tutor
A full-stack AI tutoring tool for HKUST students preparing for exams with past papers. GPT-4o auto-segments and tags every question in an uploaded PDF; Qwen-plus generates a per-question knowledge primer, scaffolded hint, and step-by-step solution. The workbench features side-by-side PDF↔question navigation, photo-OCR handwriting grading, automatic variant-problem generation, and an error book with spaced review.
🏆 Honors and Awards
- University’s Scholarship Scheme for Continuing Undergraduate Students, HKUST, 2024. Top 1% of continuing undergraduates.
- S.S. Chern Class, HKUST. Honor for top academic performance across all mathematics coursework.
- Dean’s List Honor, HKUST, 2024 & 2025. GPA above 3.7.
🤝 Academic Services
(Coming soon.)
🎓 Teaching
- Teaching Assistant, Discrete Mathematics — HKUST.
- Teaching Assistant, Exploring Artificial Intelligence — HKUST.
