MAPD is a production-ready FastAPI service and research harness for detecting prompt injection and jailbreaks with a multi-agent LLM pipeline: agents normalize obfuscated prompts and then judge them, optionally using ProtectedContext signals and an incremental-history "unsure" loop for multi-turn cases. It supports Ollama or Gemini backends, detailed per-conversation logging and audit trails, a Vite frontend for interaction, and experiment tooling that runs sweeps and ablations and generates metrics and figures for evaluation.
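The normalize-then-judge flow with an incremental-history "unsure" loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not MAPD's implementation: the function names, the leetspeak table, and the keyword lists are all hypothetical, and the keyword-based `judge` merely stands in for an LLM call to Ollama or Gemini. ProtectedContext signals are left out.

```python
import re

# Hypothetical stand-ins for illustration only; none of these names come from MAPD.
LEET = str.maketrans("013457@$", "oieastas")
STRONG = ("ignore previous instructions", "reveal the system prompt")
WEAK = ("ignore", "pretend", "system prompt")


def normalize(prompt: str) -> str:
    """Undo simple obfuscation: lowercase, map leetspeak, collapse whitespace."""
    text = prompt.lower().translate(LEET)
    return re.sub(r"\s+", " ", text).strip()


def judge(text: str) -> str:
    """Keyword heuristic standing in for an LLM judge (assumption)."""
    if any(phrase in text for phrase in STRONG):
        return "malicious"
    if any(word in text for word in WEAK):
        return "unsure"
    return "benign"


def classify(turns: list[str], max_history: int = 3) -> str:
    """Judge the latest turn; while the verdict is unsure, add one earlier turn."""
    latest = len(turns) - 1
    verdict = "unsure"
    for extra in range(max_history + 1):
        start = max(0, latest - extra)
        window = " ".join(normalize(t) for t in turns[start:])
        verdict = judge(window)
        # Stop on a decisive verdict or when no earlier history remains.
        if verdict != "unsure" or start == 0:
            break
    return verdict
```

In this sketch, an attack split across two turns is only caught once the earlier turn is pulled into the window, which is the kind of multi-turn case the "unsure" loop targets.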

MAPD — Prompt Defense Evaluation Platform

MAPD is a research-focused platform for exploring prompt safety and jailbreak detection. It provides a production-style API and an interaction layer that lets users run controlled evaluations, monitor progress, and review run artifacts without exposing sensitive implementation details or results.

Research Problem

Modern LLM applications face adversarial prompts that attempt to bypass safety controls, extract protected information, or derail system behavior. The research goal is to design a repeatable, measurable evaluation environment that supports rigorous testing of detection strategies across diverse prompts, contexts, and operational constraints.

Research Goals

Establish a consistent workflow to measure detection quality and reliability.
Enable controlled experiments (e.g., sweeps, ablations) without manual setup.
Surface operational signals (latency, usage, and stability) alongside accuracy.
Provide an interface for iterating on defenses while preserving safety.
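
As a rough illustration of measuring detection quality alongside operational signals, the sketch below computes precision, recall, F1, and latency statistics from a list of per-prompt outcomes. The `Outcome` record and `summarize` function are hypothetical names chosen for this example; the metrics MAPD actually reports are not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class Outcome:
    """One evaluated prompt (hypothetical record shape)."""
    predicted_attack: bool
    is_attack: bool
    latency_ms: float


def summarize(outcomes: list[Outcome]) -> dict[str, float]:
    """Return detection-quality and latency metrics for a run."""
    tp = sum(o.predicted_attack and o.is_attack for o in outcomes)
    fp = sum(o.predicted_attack and not o.is_attack for o in outcomes)
    fn = sum(not o.predicted_attack and o.is_attack for o in outcomes)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    latencies = sorted(o.latency_ms for o in outcomes)
    if latencies:
        mean_ms = sum(latencies) / len(latencies)
        # Nearest-rank 95th percentile: the value at rank ceil(0.95 * n).
        p95_ms = latencies[math.ceil(0.95 * len(latencies)) - 1]
    else:
        mean_ms = p95_ms = 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "latency_mean_ms": mean_ms,
        "latency_p95_ms": p95_ms,
    }
```

Reporting latency next to accuracy in one summary makes it easier to see when a more accurate configuration is also a slower one.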

Highlights

End-to-end workflow for single prompts and batch suites with repeatable runs.
Interaction layer that connects a web UI to a REST API for execution and review.
Experiment management with configuration sweeps and structured outputs.
Operational visibility via structured logs and usage tracking.
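
A configuration sweep with structured outputs might look something like the sketch below, which expands a parameter grid into one record per run and gives each a stable identifier. The grid keys (`use_protected_context`, `max_history`) and the `expand` helper are assumptions made for illustration; only the Ollama and Gemini backends are named by the project itself.

```python
import hashlib
import itertools
import json

# Hypothetical sweep grid; the key names are illustrative assumptions.
GRID = {
    "backend": ["ollama", "gemini"],
    "use_protected_context": [True, False],  # ablation toggle
    "max_history": [0, 2],
}


def expand(grid: dict[str, list]) -> list[dict]:
    """Expand a grid into run configs, each with a deterministic run_id."""
    keys = sorted(grid)
    runs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        # Hash the canonical JSON so the same config always gets the same id.
        canonical = json.dumps(config, sort_keys=True).encode()
        run_id = hashlib.sha256(canonical).hexdigest()[:8]
        runs.append({"run_id": run_id, **config})
    return runs


if __name__ == "__main__":
    # One JSON object per line keeps the output easy to log and diff.
    for run in expand(GRID):
        print(json.dumps(run))
```

Deriving the identifier from the configuration rather than from a counter or timestamp means a rerun of the same settings maps to the same `run_id`, which helps keep runs repeatable and traceable.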

Evaluation Scope

MAPD is designed to support structured prompt evaluation in a controlled environment. It emphasizes reproducibility and traceability over one-off demos, making it suitable for research-grade iterations and portfolio-ready writeups.

Interaction Layer

Web interface for launching evaluations, browsing runs, and reviewing summaries.
REST endpoints used by the UI for health checks, configuration, execution, and results retrieval.
Local development setup for running backend and frontend together.

Multi-turn Multi-Agent System for Prompt Injection detection

Related Projects

Research Assistant - LLM Research Pipeline
