Multi-turn Multi-Agent System for Prompt Injection detection
MAPD is a production‑ready FastAPI service and research harness for detecting prompt injection/jailbreaks using a multi‑agent LLM pipeline: Agents work to normalizes obfuscated prompts and judge them with optional ProtectedContext signals and an incremental history “unsure” loop for multi‑turn cases. It supports Ollama or Gemini backends, detailed per‑conversation logging and audit trails, a Vite frontend for interaction, and experiment tooling to run sweeps/ablations and generate metrics and figures for evaluation.

Gallery

MAPD Experiment Control Page

MAPD Experiment Control Page

MAPD Experiment Results Page

MAPD Experiment Results Page


MAPD Experiment Figures Page

Single Chat Page

Single Chat Page

Single Chat Page
MAPD — Prompt Defense Evaluation Platform
MAPD is a research-focused platform for exploring prompt safety and jailbreak detection. It provides a production-style API and an interaction layer that lets users run controlled evaluations, monitor progress, and review run artifacts without exposing sensitive implementation details or results.
Research Problem
Modern LLM applications face adversarial prompts that attempt to bypass safety controls, extract protected information, or derail system behavior. The research goal is to design a repeatable, measurable evaluation environment that supports rigorous testing of detection strategies across diverse prompts, contexts, and operational constraints.
Research Goals
- Establish a consistent workflow to measure detection quality and reliability.
- Enable controlled experiments (e.g., sweeps, ablations) without manual setup.
- Surface operational signals (latency, usage, and stability) alongside accuracy.
- Provide an interface for iterating on defenses while preserving safety.
Highlights
- End-to-end workflow for single prompts and batch suites with repeatable runs.
- Interaction layer that connects a web UI to a REST API for execution and review.
- Experiment management with configuration sweeps and structured outputs.
- Operational visibility via structured logs and usage tracking.
Evaluation Scope
MAPD is designed to support structured prompt evaluation in a controlled environment. It emphasizes reproducibility and traceability over one-off demos, making it suitable for research-grade iterations and portfolio-ready writeups.
Interaction Layer
- Web interface for launching evaluations, browsing runs, and reviewing summaries.
- REST endpoints used by the UI for health checks, configuration, execution, and results retrieval.
- Local development setup for running backend and frontend together.
