# DMRAG

A Dungeon Master Retrieval-Augmented Generation system that automates the Dungeon Master role in Dungeons & Dragons.
## Overview
Tabletop games like D&D rely on a human Dungeon Master to run the story and enforce the rules, and solo players or small groups often lack one. DMRAG automates the DM role: it produces immersive, context-aware adventure narration and handles game mechanics (dice rolls, hit points, etc.) in a single modular pipeline. The main technical challenge was maintaining narrative coherence across multi-turn dialogue without sacrificing rule fidelity.
The system combines dense semantic search, a fine-tuned generative model, and symbolic validation: each response is grounded in retrieved rule and lore passages, then checked for mechanical correctness. It draws on three data sources, the SRD 5.2 rules, four campaign modules, and the CRD3 Critical Role dialogue dataset (~87k training / ~22k validation pairs, ~5M tokens), and runs a five-stage pipeline from ingestion and chunking through embedding, hybrid retrieval, generation, and game-state parsing.
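The ingestion and chunking stage can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the function name `chunk_text`, the word-window sizes, and the metadata fields are all assumptions made for the example.

```python
# Hypothetical sketch of the ingestion/chunking stage: split source text into
# overlapping word windows and attach source metadata, which the later
# metadata-filtering step (e.g. restricting retrieval to one campaign) can use.
# Chunk size and overlap below are illustrative, not the project's settings.

def chunk_text(text: str, source: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split `text` into overlapping chunks of `chunk_size` words."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        window = words[start:start + chunk_size]
        chunks.append({
            "text": " ".join(window),
            "source": source,       # e.g. "SRD" or "Curse of Strahd"
            "start_word": start,    # position, useful for de-duplication
        })
    return chunks
```

Each chunk would then be embedded (here, with the fine-tuned BGE model) and indexed for retrieval; the overlap keeps rule statements that straddle a chunk boundary recoverable from at least one chunk.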
## Key contributions
- Fine-tuned sentence transformer (BAAI/bge-base-en-v1.5) with Multiple Negatives Ranking Loss for D&D retrieval
- GPT-2 medium (355M parameters) fine-tuned on chunked SRD and campaign text for domain style, then used for response generation
- Hybrid retrieval: BM25 narrows to top 1k chunks, then FAISS + cosine similarity for top-50; combined scoring for final top-k
- OpenAI function calling (GPT-4.1-nano) to parse model output and user input, tracking game state (HP, dice rolls, etc.)
- Metadata filtering so users can choose a storyline (e.g. Curse of Strahd); retrieval stays lore-focused and efficient
- Neural-symbolic pipeline: retrieval-grounded generation plus symbolic rule enforcement
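The hybrid retrieval step above can be sketched in a few lines. This is a toy illustration under stated assumptions: BM25 scores and dense embeddings are taken as precomputed inputs, cosine similarity stands in for the FAISS search, and the weighting `alpha` and min-max normalisation are assumptions, not the project's actual scoring formula.

```python
# Sketch of hybrid retrieval: BM25 narrows the corpus, dense cosine similarity
# re-ranks the survivors, and a weighted combination of the two (min-max
# normalised) selects the final top-k chunks. All data here is illustrative.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def normalise(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}

def hybrid_retrieve(query_vec, bm25_scores, embeddings,
                    narrow=1000, rerank=50, k=3, alpha=0.5):
    # Stage 1: keep the `narrow` best chunks by BM25 score.
    stage1 = sorted(bm25_scores, key=bm25_scores.get, reverse=True)[:narrow]
    # Stage 2: dense similarity on the survivors (FAISS in the real system).
    dense = {cid: cosine(query_vec, embeddings[cid]) for cid in stage1}
    stage2 = sorted(dense, key=dense.get, reverse=True)[:rerank]
    # Stage 3: combined score over the re-ranked set.
    nb = normalise({c: bm25_scores[c] for c in stage2})
    nd = normalise({c: dense[c] for c in stage2})
    combined = {c: alpha * nb[c] + (1 - alpha) * nd[c] for c in stage2}
    return sorted(combined, key=combined.get, reverse=True)[:k]
```

The two-stage narrowing is what keeps this cheap: exact lexical matching prunes the corpus before any dense scoring, so the (more expensive) embedding comparison only touches a small candidate set.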
## Results
We evaluated retrieval grounding, language quality, and rule compliance. On a set of representative queries, the retriever achieved 100% recall@3 (the ground-truth chunk always appeared in the top three results), with semantic similarity scores above 0.5 for contextually relevant passages. Perplexity on user-style inputs (~14) fell within the expected range for a RAG system, indicating the model adapted well to D&D-style dialogue. The full pipeline delivers rule-grounded, context-aware DM responses and game-state updates in a single system.
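The recall@3 metric above can be computed with a few lines of Python. The function name and the query/result data below are placeholders for illustration, not the project's evaluation harness.

```python
# Sketch of the recall@k evaluation: for each query, check whether the
# ground-truth chunk id appears among the top-k retrieved chunk ids.
# `results` maps query -> ranked chunk ids; `ground_truth` maps query -> truth.

def recall_at_k(results: dict, ground_truth: dict, k: int = 3) -> float:
    """Fraction of queries whose ground-truth chunk is in the top-k results."""
    hits = sum(1 for q, truth in ground_truth.items() if truth in results[q][:k])
    return hits / len(ground_truth)

results = {"q1": ["c7", "c2", "c9"], "q2": ["c4", "c1", "c3"]}
truth = {"q1": "c2", "q2": "c3"}
recall_at_k(results, truth)  # both ground-truth chunks are in the top 3
```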
## Tech Stack
- Python
- PyTorch
- Transformers
- GPT2-medium
- OpenAI API
- Sentence Transformers (BGE)
- FAISS
- BM25
- RAG
- NLP