Test your prompts like code.
PromptOps is an open-source Claude Code plugin that treats prompts like tested code. Evaluate, improve, regression-test, benchmark across models, and run — all without leaving your terminal.
Install
claude mcp add-json promptops -- '{"type":"url", "url":"https://promptops-mcp.harmansidhudev.workers.dev/sse"}'Prompts break silently.
Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.
PromptOps closes the gap. It gives you the same evaluate → improve → verify → run loop you'd expect from any serious engineering workflow — applied to prompts.
Five commands. Full coverage.
Each command is a step in the prompt engineering workflow. Use them individually or chain them together.
/evaluateStep 1
Score any prompt, instantly
Get a 1–5 score across five dimensions: structure, specificity, output control, error prevention, and testability. Know exactly where your prompt is weak.
/improveStep 2
Turn rough prompts into reliable ones
Automated rewrite with guardrails, output schemas, and edge-case handling. Paste anything — get back a production-ready version.
/regressionStep 3
Catch what broke between versions
Compare two prompt versions side-by-side. See what improved, what regressed, and whether it's safe to ship the new version.
/compareStep 4
Pick the right model for the job
Benchmark your prompt across Claude Opus, Sonnet, Haiku, and GPT-4o. Quality scores, per-run costs, and monthly projections in one table.
/runStep 5
Execute your best version
You evaluated it. You improved it. Now run it for real. Executes your best prompt version and delivers actual output.
Two skills that run themselves.
Auto-invoked skills fire in the background — no commands needed. They watch what you're doing and help only when it matters.
Prompt Quality Check
Fires when editing system prompts, agent definitions, or skill files. One-line fix when it matters — silent when solid.
Golden Dataset Builder
Captures approved outputs as test cases over time. When you say "LGTM" or "approved", offers to save it as a golden dataset entry.
Local-first. Zero infrastructure.
Everything persists to a local .promptops/ directory — evaluations, improved versions, regression reports, and cost comparisons. No auth, no backend, no cloud dependency.
The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them — the core trick that makes prompt-about-prompt tooling reliable.
Stop guessing if your prompts work.
Install PromptOps in 10 seconds. Start evaluating immediately.
What brings you here?
Professional Inquiry
Looking to hire, collaborate on platform engineering, or discuss how I approach migrations and infrastructure? I'd love to hear from you.
Follow My Work
I'll email you when I publish something new — engineering deep-dives, product updates, and building in public. One email per post, max.