Claude CodePluginOpen SourceLive

Test your prompts like code.

PromptOps is an open-source Claude Code plugin that treats prompts like tested code. Evaluate, improve, regression-test, benchmark across models, and run — all without leaving your terminal.

Install

claude mcp add-json promptops -- '{"type":"url", "url":"https://promptops-mcp.harmansidhudev.workers.dev/sse"}'

Get Started Star on GitHub Read Why I Built This

01 /The Problem

Prompts break silently.

Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.

PromptOps closes the gap. It gives you the same evaluate → improve → verify → run loop you'd expect from any serious engineering workflow — applied to prompts.

02 /What You Can Do

Five commands. Full coverage.

Each command is a step in the prompt engineering workflow. Use them individually or chain them together.

/evaluate

Step 1

Score any prompt, instantly

Get a 1–5 score across five dimensions: structure, specificity, output control, error prevention, and testability. Know exactly where your prompt is weak.

/improve

Step 2

Turn rough prompts into reliable ones

Automated rewrite with guardrails, output schemas, and edge-case handling. Paste anything — get back a production-ready version.

/regression

Step 3

Catch what broke between versions

Compare two prompt versions side-by-side. See what improved, what regressed, and whether it's safe to ship the new version.

/compare

Step 4

Pick the right model for the job

Benchmark your prompt across Claude Opus, Sonnet, Haiku, and GPT-4o. Quality scores, per-run costs, and monthly projections in one table.

/run

Step 5

Execute your best version

You evaluated it. You improved it. Now run it for real. Executes your best prompt version and delivers actual output.

03 /Works While You Work

Two skills that run themselves.

Auto-invoked skills fire in the background — no commands needed. They watch what you're doing and help only when it matters.

Prompt Quality Check

Fires when editing system prompts, agent definitions, or skill files. One-line fix when it matters — silent when solid.

Golden Dataset Builder

Captures approved outputs as test cases over time. When you say "LGTM" or "approved", offers to save it as a golden dataset entry.

04 /How It Works

Local-first. Zero infrastructure.

Everything persists to a local .promptops/ directory — evaluations, improved versions, regression reports, and cost comparisons. No auth, no backend, no cloud dependency.

The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them — the core trick that makes prompt-about-prompt tooling reliable.

Stop guessing if your prompts work.

Install PromptOps in 10 seconds. Start evaluating immediately.

Get Started Star on GitHub