Test your prompts like code.
PromptOps is an open-source Claude Code plugin that treats prompts like tested code. Evaluate, improve, regression-test, benchmark across models, and run — all without leaving your terminal.
Install
claude mcp add-json promptops -- '{"type":"url", "url":"https://promptops-mcp.harmansidhudev.workers.dev/sse"}'Prompts break silently.
Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.
PromptOps closes the gap. It gives you the same evaluate → improve → verify → run loop you'd expect from any serious engineering workflow — applied to prompts.
Five commands. Full coverage.
Each command is a step in the prompt engineering workflow. Use them individually or chain them together.
/evaluateStep 1
Score any prompt, instantly
Get a 1–5 score across five dimensions: structure, specificity, output control, error prevention, and testability. Know exactly where your prompt is weak.
/improveStep 2
Turn rough prompts into reliable ones
Automated rewrite with guardrails, output schemas, and edge-case handling. Paste anything — get back a production-ready version.
/regressionStep 3
Catch what broke between versions
Compare two prompt versions side-by-side. See what improved, what regressed, and whether it's safe to ship the new version.
/compareStep 4
Pick the right model for the job
Benchmark your prompt across Claude Opus, Sonnet, Haiku, and GPT-4o. Quality scores, per-run costs, and monthly projections in one table.
/runStep 5
Execute your best version
You evaluated it. You improved it. Now run it for real. Executes your best prompt version and delivers actual output.
Two skills that run themselves.
Auto-invoked skills fire in the background — no commands needed. They watch what you're doing and help only when it matters.
Prompt Quality Check
Fires when editing system prompts, agent definitions, or skill files. One-line fix when it matters — silent when solid.
Golden Dataset Builder
Captures approved outputs as test cases over time. When you say "LGTM" or "approved", offers to save it as a golden dataset entry.
Local-first. Zero infrastructure.
Everything persists to a local .promptops/ directory — evaluations, improved versions, regression reports, and cost comparisons. No auth, no backend, no cloud dependency.
The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them — the core trick that makes prompt-about-prompt tooling reliable.
Stop guessing if your prompts work.
Install PromptOps in 10 seconds. Start evaluating immediately.