`langship/eval-action` is a GitHub Action. Add it to your workflow to automatically run eval suites on every PR and post the results as a check.
Setup
1. Add your Control Plane API key as a secret
In your GitHub repository: Settings → Secrets and variables → Actions → New repository secret
- Name: `LANGSHIP_API_KEY`
- Value: your Control Plane API key (from the Control Plane dashboard)

Also add `LANGSHIP_URL` if you’re self-hosting:
- Name: `LANGSHIP_URL`
- Value: `https://langship.yourcompany.com`
2. Add the workflow
Create `.github/workflows/eval.yml`:
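A minimal sketch of what such a workflow might look like. The input names (`api-key`, `url`, `config`) come from the Action inputs table and the secret names from step 1. The `@v1` version tag, the job name, the trigger, and the `permissions` block are assumptions made for illustration, not confirmed details of the action.

```yaml
# .github/workflows/eval.yml
# Sketch only: the @v1 tag, job name, and permissions block are assumptions.
name: Evals

on:
  pull_request:

permissions:
  contents: read
  checks: write          # assumed: needed to post the GitHub Check
  pull-requests: write   # assumed: needed to post the PR comment

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: langship/eval-action@v1   # version tag is an assumption
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          url: ${{ secrets.LANGSHIP_URL }}   # remove this line unless you self-host
          config: langship.yaml
```

Drop the `url` line if you are not self-hosting: an unset secret expands to an empty string, which would be passed to the action in place of its default.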
3. Configure evals in langship.yaml
What the action does
On every PR:
- Runs each eval defined in `langship.yaml` against the configured dataset
- Posts results as a GitHub Check on the PR; you will see pass/fail in the PR status bar
- Posts a PR comment with a results table (one row per evaluator)
- If `fail-on-regression: true` and any `blocking` eval drops below its threshold, the action exits with code 1, blocking merge
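To make the threshold and `blocking` behaviour above concrete, here is a hypothetical sketch of a `langship.yaml` with one blocking and one non-blocking eval. Every key name and value (`evals`, `name`, `dataset`, `evaluator`, `threshold`, `blocking`, and the example identifiers) is an illustrative assumption. Check the actual schema before copying it.

```yaml
# langship.yaml
# Hypothetical schema: every key and value below is an illustrative
# assumption, not the documented Control Plane format.
evals:
  - name: answer-correctness
    dataset: support-questions
    evaluator: exact-match
    threshold: 0.90
    blocking: true       # a score below 0.90 makes the action exit with code 1
  - name: tone
    dataset: support-questions
    evaluator: llm-judge
    threshold: 0.75
    blocking: false      # assumed: still reported, but does not fail the action
```

Under these assumptions, a PR that scores 0.85 on `answer-correctness` would fail the check when `fail-on-regression` is `true`, while a low `tone` score would only appear in the results table.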
Blocking vs non-blocking evals
Auto-deploy on merge
When a PR merges to `main` and all blocking evals pass, Control Plane can automatically deploy the new agent version:
With `require_approval: true`, the deploy step waits for a reviewer to approve in the GitHub Actions UI before proceeding.
Matrix evals across environments
Test your agent against multiple models or configurations in parallel:
Caching
Speed up eval runs by caching your Python dependencies:
Action inputs
| Input | Required | Default | Description |
|---|---|---|---|
| `api-key` | Yes | None | Control Plane API key |
| `url` | No | `http://localhost:3000` | Control Plane server URL |
| `config` | No | `langship.yaml` | Path to config file |
| `fail-on-regression` | No | `true` | Exit 1 if blocking eval fails |
| `post-comment` | No | `true` | Post results as PR comment |
| `dataset-version` | No | `latest` | Pin dataset to a specific version |
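These inputs can be combined with standard GitHub Actions features such as a job matrix and dependency caching. The sketch below assumes one config file per model, selected through the `config` input, and a `requirements.txt` or `pyproject.toml` in the repository for the pip cache. The `@v1` tag, the `evals/*.yaml` paths, the Python version, and the `"v3"` dataset version are hypothetical placeholders.

```yaml
# Sketch only: the @v1 tag, the evals/*.yaml paths, and the "v3" dataset
# version are hypothetical placeholders.
jobs:
  eval:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false            # let every matrix entry finish and report
      matrix:
        config: [evals/model-a.yaml, evals/model-b.yaml]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: pip              # expects a requirements.txt or pyproject.toml
      - uses: langship/eval-action@v1
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          config: ${{ matrix.config }}
          dataset-version: "v3"   # pin instead of the default "latest"
          post-comment: false     # assumed: avoids one PR comment per matrix entry
```

Pinning `dataset-version` keeps scores comparable across matrix entries and across PRs, because a dataset update cannot shift the results mid-review.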