← All Projects
Claude CodePluginOpen SourceLive

Test your prompts like code.

PromptOps is an open-source Claude Code plugin that treats prompts like tested code. Evaluate, improve, regression-test, benchmark across models, and run — all without leaving your terminal.

Install

claude mcp add-json promptops -- '{"type":"url", "url":"https://promptops-mcp.harmansidhudev.workers.dev/sse"}'
01 /The Problem

Prompts break silently.

Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.

PromptOps closes the gap. It gives you the same evaluate → improve → verify → run loop you'd expect from any serious engineering workflow — applied to prompts.

02 /What You Can Do

Five commands. Full coverage.

Each command is a step in the prompt engineering workflow. Use them individually or chain them together.

/evaluate

Step 1

Score any prompt, instantly

Get a 1–5 score across five dimensions: structure, specificity, output control, error prevention, and testability. Know exactly where your prompt is weak.

/improve

Step 2

Turn rough prompts into reliable ones

Automated rewrite with guardrails, output schemas, and edge-case handling. Paste anything — get back a production-ready version.

/regression

Step 3

Catch what broke between versions

Compare two prompt versions side-by-side. See what improved, what regressed, and whether it's safe to ship the new version.

/compare

Step 4

Pick the right model for the job

Benchmark your prompt across Claude Opus, Sonnet, Haiku, and GPT-4o. Quality scores, per-run costs, and monthly projections in one table.

/run

Step 5

Execute your best version

You evaluated it. You improved it. Now run it for real. Executes your best prompt version and delivers actual output.

03 /Works While You Work

Two skills that run themselves.

Auto-invoked skills fire in the background — no commands needed. They watch what you're doing and help only when it matters.

Prompt Quality Check

Fires when editing system prompts, agent definitions, or skill files. One-line fix when it matters — silent when solid.

Golden Dataset Builder

Captures approved outputs as test cases over time. When you say "LGTM" or "approved", offers to save it as a golden dataset entry.

04 /How It Works

Local-first. Zero infrastructure.

Everything persists to a local .promptops/ directory — evaluations, improved versions, regression reports, and cost comparisons. No auth, no backend, no cloud dependency.

The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them — the core trick that makes prompt-about-prompt tooling reliable.

Stop guessing if your prompts work.

Install PromptOps in 10 seconds. Start evaluating immediately.

Get in Touch

What brings you here?

Professional Inquiry

Looking to hire, collaborate on platform engineering, or discuss how I approach migrations and infrastructure? I'd love to hear from you.

LinkedIn