Glossary

Evaluation set

A fixed set of tasks that is rerun after each change to an AI system, so results can be compared across versions.

Why it matters

Business and delivery impact.

Rerunning the same tasks after every change catches regressions that would otherwise go unnoticed.

How to measure

Proof you can track.

Track the score on the evaluation set over time and watch for drops between runs.
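
Tracking scores between runs could look something like the sketch below. It is a minimal illustration, not a prescribed method: the `EvalRun` class, the `find_regressions` helper, the 5-point tolerance, and the run labels and counts are all hypothetical, and it assumes each task is simply scored pass or fail.

```python
from dataclasses import dataclass


@dataclass
class EvalRun:
    label: str   # hypothetical identifier, e.g. a prompt or model version
    passed: int  # number of tasks that met the pass criterion
    total: int   # size of the fixed evaluation set

    @property
    def score(self) -> float:
        # Fraction of tasks passed in this run.
        return self.passed / self.total


def find_regressions(runs: list[EvalRun], tolerance: float = 0.05) -> list[str]:
    """Return labels of runs that scored more than `tolerance` below the run before them."""
    flagged = []
    for previous, current in zip(runs, runs[1:]):
        if previous.score - current.score > tolerance:
            flagged.append(current.label)
    return flagged


# Illustrative history only; these numbers are made up.
history = [
    EvalRun("v1", passed=16, total=20),
    EvalRun("v2", passed=17, total=20),
    EvalRun("v3", passed=13, total=20),
]
print(find_regressions(history))  # ['v3']
```

Comparing each run only with the one before it keeps the check simple. A team might instead compare against a pinned baseline run, which avoids missing a slow decline spread over many small drops.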

Example

What it looks like in practice.

A set of twenty standard prompts used for design system QA.

Related

Keep the context connected.

evaluation rubric
prompt versioning