DMRAG

Dungeon Master Retrieval-Augmented Generation system that automates the Dungeon Master role in Dungeons & Dragons.

Overview

Tabletop games like D&D rely on a human Dungeon Master to run the story and adjudicate the rules; solo players and small groups often lack one. DMRAG automates the DM role: it produces immersive, context-aware adventure narration and handles mechanics (dice rolls, hit points, etc.) in a single modular pipeline. The main technical challenge we addressed was maintaining narrative coherence over multi-turn dialogue without sacrificing rule fidelity.
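The mechanics side is conceptually simple. As an illustration only (this is not DMRAG's actual code), standard dice notation such as `2d6+3` can be resolved with a small parser:

```python
import random
import re

def roll(notation, rng=None):
    """Resolve standard dice notation like '2d6+3' or 'd20'.

    Illustrative sketch only -- not DMRAG's actual dice handler.
    """
    rng = rng or random.Random()
    # <count>d<sides> with an optional +/- modifier, e.g. "2d6+3", "d20"
    m = re.fullmatch(r"(\d*)d(\d+)([+-]\d+)?", notation.strip().lower())
    if not m:
        raise ValueError("bad dice notation: %r" % notation)
    count = int(m.group(1) or 1)      # "d20" means one die
    sides = int(m.group(2))
    modifier = int(m.group(3) or 0)
    return sum(rng.randint(1, sides) for _ in range(count)) + modifier
```

A parsed roll like this can then feed directly into hit-point bookkeeping.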

The system combines dense semantic search, a fine-tuned generative model, and symbolic validation: each response is grounded in retrieved rule and lore passages, then checked for mechanical correctness. Training drew on three data sources: the SRD v5.2 rules, four campaign modules, and the CRD3 Critical Role dialogue dataset (~87k training / ~22k validation pairs, ~5M tokens). A five-stage pipeline runs from ingestion and chunking through embedding, hybrid retrieval, generation, and game-state parsing.
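The ingestion stage can be sketched as fixed-size chunking with overlap, a common RAG default; the window and overlap sizes below are illustrative assumptions, not the project's actual settings:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-window chunks for embedding.

    chunk_size and overlap are illustrative defaults, not DMRAG's settings.
    Overlap keeps rule statements that straddle a boundary retrievable.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    # Stride by (chunk_size - overlap) so consecutive chunks share words.
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
    return chunks
```

Each chunk is then embedded once at index time; only the query is embedded at inference.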

Key contributions

  • Fine-tuned sentence transformer (BAAI/bge-base-en-v1.5) with Multiple Negatives Ranking Loss for D&D retrieval
  • GPT2-medium (355M parameters) fine-tuned on chunked SRD + campaign text for domain style, then used for response generation
  • Hybrid retrieval: BM25 narrows the corpus to the top 1k chunks, FAISS cosine-similarity search reranks to the top 50, and combined scoring selects the final top-k
  • OpenAI function calling (GPT-4.1-nano) for parsing LLM output and user input to track game state (HP, dice, etc.)
  • Metadata filtering so users can choose a storyline (e.g. Curse of Strahd); retrieval stays lore-focused and efficient
  • Neural-symbolic pipeline: retrieval-grounded generation plus symbolic rule enforcement
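The hybrid scoring idea can be sketched on a toy corpus. This is a self-contained illustration: real BM25 and FAISS library details are elided, the corpus and vectors are made up, and the fusion weight `alpha` is an assumed choice, not the project's:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def hybrid_top_k(query_terms, query_vec, docs, doc_vecs,
                 k=3, narrow=1000, alpha=0.5):
    """BM25 narrows the candidate set, dense cosine scores the survivors,
    and a weighted sum (alpha is an illustrative weight) picks the top-k."""
    bm25 = bm25_scores(query_terms, docs)
    candidates = sorted(range(len(docs)),
                        key=lambda i: bm25[i], reverse=True)[:narrow]
    combined = {i: alpha * bm25[i] + (1 - alpha) * cosine(query_vec, doc_vecs[i])
                for i in candidates}
    return sorted(combined, key=combined.get, reverse=True)[:k]
```

The two-stage design matters for cost: cheap lexical BM25 prunes the index so the more expensive dense comparison only touches a small candidate set.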

Results

We evaluated retrieval grounding, language quality, and rule compliance. On representative queries, the retriever achieved 100% recall@3 (the ground-truth chunk always appeared in the top three results), with semantic-similarity scores above 0.5 for contextually relevant passages. Perplexity on user-style inputs (~14) fell in the expected range for a RAG system, indicating the model adapted well to D&D-style dialogue. The full pipeline delivers rule-grounded, context-aware DM responses and game-state updates in a single system.
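For context, perplexity is just the exponentiated mean negative log-likelihood per token, so a score near 14 means the model is, on average, about as uncertain as a uniform choice among ~14 tokens. A toy computation (the per-token log-probabilities are made up for illustration):

```python
import math

def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Made-up natural-log token probabilities; mean NLL = 2.625,
# so perplexity = exp(2.625), roughly 13.8.
score = perplexity([-2.1, -3.0, -2.5, -2.9])
```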

Tech Stack

  • Python
  • PyTorch
  • Transformers
  • GPT2-medium
  • OpenAI API
  • Sentence Transformers (BGE)
  • FAISS
  • BM25
  • RAG
  • NLP