GitHub Actions

Control Plane integrates with GitHub Actions through the official langship/eval-action. Add it to your workflow to automatically run eval suites on every PR and post results as a check.

Setup

1. Add your Control Plane API key as a secret

In your GitHub repository: Settings → Secrets and variables → Actions → New repository secret

Name: LANGSHIP_API_KEY
Value: your Control Plane API key (from the Control Plane dashboard)

Also add LANGSHIP_URL if you’re self-hosting:

Name: LANGSHIP_URL
Value: https://langship.yourcompany.com

2. Add the workflow

Create .github/workflows/eval.yml:

name: Agent Eval

on:
  pull_request:
    branches: [main, staging]
  push:
    branches: [main]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run Control Plane evals
        uses: langship/eval-action@v1
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          url: ${{ secrets.LANGSHIP_URL }}
          config: langship.yaml
          fail-on-regression: true

3. Configure evals in `langship.yaml`

project: my-agent

evals:
  - name: factual-accuracy
    type: llm-judge
    dataset: golden-set
    pass_threshold: 0.85
    blocking: true

  - name: response-length
    type: python
    function: evals.check_length
    dataset: golden-set
    blocking: false      # report-only, won't block merge

deployments:
  staging:
    target: lyzr
    agent_id: ${{ env.AGENT_ID }}
    on_pass: true
    branches: [main]
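
The `response-length` eval above points at `evals.check_length`, but this page does not show the function. Below is a minimal sketch of what `evals.py` could contain. It assumes (the source does not confirm this) that a `python` evaluator receives the agent's output as a string and returns a score between 0 and 1. The 50 to 400 word range is also a hypothetical choice.

```python
# evals.py (hypothetical; the evaluator signature is an assumption)

MIN_WORDS = 50    # assumed lower bound for an acceptable answer
MAX_WORDS = 400   # assumed upper bound for an acceptable answer


def check_length(output: str) -> float:
    """Return 1.0 inside the target word range, scaled down proportionally outside it."""
    words = len(output.split())
    if words == 0:
        return 0.0
    if words < MIN_WORDS:
        return words / MIN_WORDS   # too short: partial credit
    if words > MAX_WORDS:
        return MAX_WORDS / words   # too long: partial credit
    return 1.0
```

Returning a graded score rather than a plain pass/fail lets `pass_threshold` decide how strict the check is.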

What the action does

On every PR, the action:

1. Runs each eval defined in `langship.yaml` against its configured dataset.
2. Posts the results as a GitHub Check on the PR, so pass/fail shows up in the PR status bar.
3. Posts a PR comment with a results table (one row per eval), unless `post-comment` is set to `false`.
4. Exits with code 1 if `fail-on-regression: true` and any blocking eval scores below its threshold. The failed check blocks the merge when it is a required status check under the repository's branch protection rules.
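
The gating rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the action's actual source: the `EvalResult` type and `exit_code` function are hypothetical names, and treating a score equal to the threshold as a pass is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    score: float
    threshold: float
    blocking: bool

    @property
    def passed(self) -> bool:
        # Assumption: a score exactly at the threshold counts as a pass.
        return self.score >= self.threshold


def exit_code(results: list[EvalResult], fail_on_regression: bool = True) -> int:
    """Return 1 only when gating is enabled and at least one blocking eval failed."""
    blocking_failures = [r for r in results if r.blocking and not r.passed]
    return 1 if fail_on_regression and blocking_failures else 0


results = [
    EvalResult("factual-accuracy", 0.91, 0.85, blocking=True),
    EvalResult("response-length", 0.78, 0.80, blocking=False),
]
print(exit_code(results))  # 0: the only failing eval is non-blocking
```

Non-blocking failures never affect the exit code; they only show up in the reported results.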

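Example PR comment (the evals and thresholds shown are illustrative and do not match the `langship.yaml` above exactly):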

## Control Plane Eval Results

| Eval | Score | Threshold | Status |
|---|---|---|---|
| factual-accuracy | 0.91 | 0.85 | ✅ Pass |
| no-refusals | 1.00 | 0.95 | ✅ Pass |
| response-length | 0.78 | 0.80 | ❌ Fail |

**Overall: 2/3 passing**: response-length is non-blocking; merge is allowed.
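
For readers who want to reproduce this layout in their own tooling, here is a rough sketch of how such a comment body could be assembled. The `render_comment` helper is hypothetical and is not part of the action; it only mirrors the table format shown above.

```python
def render_comment(rows: list[tuple[str, float, float]]) -> str:
    """Build the Markdown comment from (name, score, threshold) rows."""
    lines = [
        "## Control Plane Eval Results",
        "",
        "| Eval | Score | Threshold | Status |",
        "|---|---|---|---|",
    ]
    passing = 0
    for name, score, threshold in rows:
        ok = score >= threshold  # assumption: meeting the threshold is a pass
        passing += int(ok)
        status = "✅ Pass" if ok else "❌ Fail"
        lines.append(f"| {name} | {score:.2f} | {threshold:.2f} | {status} |")
    lines += ["", f"**Overall: {passing}/{len(rows)} passing**"]
    return "\n".join(lines)


comment = render_comment([
    ("factual-accuracy", 0.91, 0.85),
    ("no-refusals", 1.00, 0.95),
    ("response-length", 0.78, 0.80),
])
print(comment)
```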

Blocking vs non-blocking evals

evals:
  - name: safety-check
    blocking: true     # PR cannot merge if this fails
    pass_threshold: 1.0

  - name: verbosity-score
    blocking: false    # Results shown, but merge not blocked
    pass_threshold: 0.7

Auto-deploy on merge

When a PR merges to main and all blocking evals pass, Control Plane can automatically deploy the new agent version:

deployments:
  production:
    target: lyzr
    agent_id: ${{ env.PROD_AGENT_ID }}
    on_pass: true
    branches: [main]
    require_approval: true    # opens a GitHub environment approval gate

With require_approval: true, the deploy step waits for a reviewer to approve in the GitHub Actions UI before proceeding.
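
The deploy decision for a single `deployments` entry can be sketched as follows. This is a simplified model of the behaviour described above, assuming `on_pass: true`; the `deploy_action` function and its return values are hypothetical names used for illustration.

```python
def deploy_action(
    branch: str,
    blocking_passed: bool,
    *,
    branches: list[str],
    require_approval: bool = False,
) -> str:
    """Return "skip", "await_approval", or "deploy" for one deployment entry."""
    if branch not in branches:
        return "skip"            # deployment is not configured for this branch
    if not blocking_passed:
        return "skip"            # a blocking eval failed, so nothing ships
    if require_approval:
        return "await_approval"  # pause until a reviewer approves
    return "deploy"


print(deploy_action("main", True, branches=["main"], require_approval=True))
```

With the `production` entry above, a merge to main with all blocking evals passing pauses at the approval step rather than deploying immediately.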

Matrix evals across environments

Test your agent against multiple models or configurations in parallel:

# .github/workflows/eval.yml
jobs:
  eval:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        model: [gpt-4o, gpt-4o-mini, claude-3-5-sonnet]
    steps:
      - uses: actions/checkout@v4

      - uses: langship/eval-action@v1
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          # one eval run per model in the matrix
          env-vars: |
            AGENT_MODEL=${{ matrix.model }}

Results for each matrix leg appear as separate checks on the PR.
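
On the agent side, the code under test has to pick up the model chosen for each matrix leg. The sketch below assumes that the variables listed in `env-vars` are exported into the environment of the eval process, which this page does not state explicitly. `resolve_model` and the `gpt-4o` fallback are hypothetical.

```python
import os

DEFAULT_MODEL = "gpt-4o"  # assumed fallback for local runs outside CI


def resolve_model(env=None) -> str:
    """Pick the model for this run from AGENT_MODEL, falling back to a default."""
    env = os.environ if env is None else env
    # Treat an unset or blank variable the same way.
    return env.get("AGENT_MODEL", "").strip() or DEFAULT_MODEL


print(resolve_model({"AGENT_MODEL": "gpt-4o-mini"}))  # gpt-4o-mini
```

Passing the environment in as an argument keeps the function easy to test without touching the real process environment.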

Caching

Speed up eval runs by caching your Python dependencies:

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

- run: pip install -r requirements.txt

Action inputs

| Input | Required | Default | Description |
|---|---|---|---|
| `api-key` | Yes | None | Control Plane API key |
| `url` | No | `http://localhost:3000` | Control Plane server URL |
| `config` | No | `langship.yaml` | Path to config file |
| `fail-on-regression` | No | `true` | Exit 1 if blocking eval fails |
| `post-comment` | No | `true` | Post results as PR comment |
| `dataset-version` | No | `latest` | Pin dataset to a specific version |
