Side Project

PromptOps — Test Your Prompts Like Code

Claude CodePluginAI ToolingOpen Source

Problem

Engineers treat prompts like throwaway text — no version control, no test suite, no regression detection. A prompt that worked last week silently degrades after a model update, and there's no tooling to catch it.

Ownership

Sole creator. Designed the command architecture, scoring rubric, cost benchmarking model, and plugin distribution strategy. Open-sourced on GitHub.

Architecture

Five user-invoked commands (/evaluate, /improve, /regression, /compare, /run) plus two auto-invoked skills (prompt-quality-check, golden-dataset-builder). All output persists to .promptops/ locally — evaluations, improved versions, regression reports, and cost comparisons. The $ARGUMENTS pattern with behavioral overrides prevents Claude from executing prompts instead of analyzing them.

PromptOps workflow showing five commands: evaluate scores prompts 1 to 5, improve rewrites with schemas and guardrails, regression compares versions, compare benchmarks across four models with cost, and run executes for real. Auto-invoked skills and local persistence in .promptops directory.

Highlights

•1–5 scoring across five dimensions: structure, specificity, output control, error prevention, testability
•/compare benchmarks across Claude Opus, Sonnet, Haiku, and GPT-4o with per-run and monthly cost projections
•/regression detects improvements, risks, and regressions between prompt versions
•Auto-invoked skill nudges prompt quality while editing — silent when solid
•Golden dataset builder captures approved outputs as test cases over time

Deployment

Distributed as a Claude Code plugin — installable via terminal or Mac app upload. All state stored locally in .promptops/ per project. No auth, no backend, no infrastructure for v1.

Impact

Closes the gap between writing prompts and testing them. Engineers get a repeatable evaluate → improve → verify → run loop without leaving their terminal.

PromptOps — Test Your Prompts Like Code

Problem

Ownership

Architecture

Highlights

Deployment

Impact

Related Reading

PromptOps — Test Your Prompts Like Code

Problem

Ownership

Architecture

Highlights

Deployment

Impact

Related Reading