DMRAG

Dungeon Master Retrieval-Augmented Generation system that automates the Dungeon Master role in Dungeons & Dragons.

Overview

Tabletop games like D&D rely on a human Dungeon Master to run the story and adjudicate the rules; solo players and small groups often lack one. DMRAG automates the DM role: it produces immersive, context-aware adventure narration and handles mechanics (dice rolls, hit points, etc.) in a single modular pipeline. The main technical challenge we addressed was maintaining narrative coherence over multi-turn dialogue without sacrificing rule fidelity.
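The mechanics side is conceptually simple. As an illustration only (this is not DMRAG's actual code), standard dice notation such as `2d6+3` can be resolved with a small parser:

```python
import random
import re

def roll(notation, rng=None):
    """Resolve standard dice notation like '2d6+3' or 'd20'.

    Illustrative sketch only -- not DMRAG's actual dice handler.
    """
    rng = rng or random.Random()
    # <count>d<sides> with an optional +/- modifier, e.g. "2d6+3", "d20"
    m = re.fullmatch(r"(\d*)d(\d+)([+-]\d+)?", notation.strip().lower())
    if not m:
        raise ValueError("bad dice notation: %r" % notation)
    count = int(m.group(1) or 1)      # "d20" means one die
    sides = int(m.group(2))
    modifier = int(m.group(3) or 0)
    return sum(rng.randint(1, sides) for _ in range(count)) + modifier
```

A parsed roll like this can then feed directly into hit-point bookkeeping.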

The system combines dense semantic search, a fine-tuned generative model, and symbolic validation: each response is grounded in retrieved rule and lore passages, then checked for mechanical correctness. Training drew on three data sources: the SRD v5.2 rules, four campaign modules, and the CRD3 Critical Role dialogue dataset (~87k training / ~22k validation pairs, ~5M tokens). A five-stage pipeline runs from ingestion and chunking through embedding, hybrid retrieval, generation, and game-state parsing.
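The ingestion stage can be sketched as fixed-size chunking with overlap, a common RAG default; the window and overlap sizes below are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-window chunks for embedding.

    chunk_size and overlap are illustrative defaults, not DMRAG's settings.
    Overlap keeps rule statements that straddle a boundary retrievable.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    # Stride by (chunk_size - overlap) so consecutive chunks share words.
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```

Each chunk is then embedded once at index time; only the query is embedded at inference.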

Key contributions

  • Fine-tuned sentence transformer (BAAI/bge-base-en-v1.5) with Multiple Negatives Ranking Loss for D&D retrieval
  • GPT2-medium (355M parameters) fine-tuned on chunked SRD + campaign text for domain style, then used for response generation
  • Hybrid retrieval: BM25 narrows the corpus to the top 1k chunks, FAISS cosine-similarity search reranks to the top 50, and combined scoring selects the final top-k
  • OpenAI function calling (GPT-4.1-nano) for parsing LLM output and user input to track game state (HP, dice, etc.)
  • Metadata filtering so users can choose a storyline (e.g. Curse of Strahd); retrieval stays lore-focused and efficient
  • Neural-symbolic pipeline: retrieval-grounded generation plus symbolic rule enforcement
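The hybrid scoring idea can be sketched on a toy corpus. This is a self-contained illustration: real BM25 and FAISS library details are elided, the corpus and vectors are made up, and the fusion weight `alpha` is an assumed choice, not the project's:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def hybrid_top_k(query_terms, query_vec, docs, doc_vecs,
                 k=3, narrow=1000, alpha=0.5):
    """BM25 narrows the candidate set, dense cosine scores the survivors,
    and a weighted sum (alpha is an illustrative weight) picks the top-k."""
    bm25 = bm25_scores(query_terms, docs)
    candidates = sorted(range(len(docs)),
                        key=lambda i: bm25[i], reverse=True)[:narrow]
    combined = {i: alpha * bm25[i] + (1 - alpha) * cosine(query_vec, doc_vecs[i])
                for i in candidates}
    return sorted(combined, key=combined.get, reverse=True)[:k]
```

The two-stage design matters for cost: cheap lexical BM25 prunes the index so the more expensive dense comparison only touches a small candidate set.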

Results

We evaluated retrieval grounding, language quality, and rule compliance. On representative queries, the retriever achieved 100% recall@3 (the ground-truth chunk always appeared in the top three results), with semantic-similarity scores above 0.5 for contextually relevant passages. Perplexity on user-style inputs (~14) fell in the expected range for a RAG system, indicating the model adapted well to D&D-style dialogue. The full pipeline delivers rule-grounded, context-aware DM responses and game-state updates in a single system.
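For context, perplexity is just the exponentiated mean negative log-likelihood per token, so a score near 14 means the model is, on average, about as uncertain as a uniform choice among ~14 tokens. A toy computation (the per-token log-probabilities are made up for illustration):

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Made-up natural-log token probabilities; mean NLL = 2.625,
# so perplexity = exp(2.625), roughly 13.8.
score = perplexity([-2.1, -3.0, -2.5, -2.9])
```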

Tech Stack

  • Python
  • PyTorch
  • Transformers
  • GPT2-medium
  • OpenAI API
  • Sentence Transformers (BGE)
  • FAISS
  • BM25
  • RAG
  • NLP