> ## Documentation Index
> Fetch the complete documentation index at: https://docs.lyzr.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# GitHub Actions

> Run Control Plane evals on every pull request and block merges on quality regressions.

Control Plane integrates with GitHub Actions through the official `langship/eval-action`. Add it to your workflow to automatically run eval suites on every PR and post results as a check.

## Setup

### 1. Add your Control Plane API key as a secret

In your GitHub repository: **Settings → Secrets and variables → Actions → New repository secret**

* Name: `LANGSHIP_API_KEY`
* Value: your Control Plane API key (from the Control Plane dashboard)

Also add `LANGSHIP_URL` if you're self-hosting:

* Name: `LANGSHIP_URL`
* Value: `https://langship.yourcompany.com`

### 2. Add the workflow

Create `.github/workflows/eval.yml`:

```yaml theme={null}
name: Agent Eval

on:
  pull_request:
    branches: [main, staging]
  push:
    branches: [main]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run Control Plane evals
        uses: langship/eval-action@v1
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          url: ${{ secrets.LANGSHIP_URL }}
          config: langship.yaml
          fail-on-regression: true
```

### 3. Configure evals in `langship.yaml`

```yaml theme={null}
project: my-agent

evals:
  - name: factual-accuracy
    type: llm-judge
    dataset: golden-set
    pass_threshold: 0.85
    blocking: true

  - name: response-length
    type: python
    function: evals.check_length
    dataset: golden-set
    blocking: false      # report-only, won't block merge

deployments:
  staging:
    target: lyzr
    agent_id: ${{ env.AGENT_ID }}
    on_pass: true
    branches: [main]
```

## What the action does

On every PR:

1. Runs each eval defined in `langship.yaml` against the configured dataset
2. Posts results as a GitHub **Check** on the PR; you will see pass/fail in the PR status bar
3. Posts a **PR comment** with a results table (one row per evaluator)
4. If `fail-on-regression: true` and any `blocking` eval drops below its threshold, the action exits with code 1, blocking merge

Example PR comment:

```
## Control Plane Eval Results

| Eval | Score | Threshold | Status |
|---|---|---|---|
| factual-accuracy | 0.91 | 0.85 | ✅ Pass |
| no-refusals | 1.00 | 0.95 | ✅ Pass |
| response-length | 0.78 | 0.80 | ❌ Fail |

**Overall: 2/3 passing**: response-length is non-blocking; merge is allowed.
```

## Blocking vs non-blocking evals

```yaml theme={null}
evals:
  - name: safety-check
    blocking: true     # PR cannot merge if this fails
    pass_threshold: 1.0

  - name: verbosity-score
    blocking: false    # Results shown, but merge not blocked
    pass_threshold: 0.7
```

## Auto-deploy on merge

When a PR merges to `main` and all blocking evals pass, Control Plane can automatically deploy the new agent version:

```yaml theme={null}
deployments:
  production:
    target: lyzr
    agent_id: ${{ env.PROD_AGENT_ID }}
    on_pass: true
    branches: [main]
    require_approval: true    # opens a GitHub environment approval gate
```

With `require_approval: true`, the deploy step waits for a reviewer to approve in the GitHub Actions UI before proceeding.

## Matrix evals across environments

Test your agent against multiple models or configurations in parallel:

```yaml theme={null}
# .github/workflows/eval.yml
jobs:
  eval:
    strategy:
      matrix:
        model: [gpt-4o, gpt-4o-mini, claude-3-5-sonnet]
    steps:
      - uses: langship/eval-action@v1
        with:
          api-key: ${{ secrets.LANGSHIP_API_KEY }}
          env-vars: |
            AGENT_MODEL=${{ matrix.model }}
```

Results for each matrix leg appear as separate checks on the PR.

## Caching

Speed up eval runs by caching your Python dependencies:

```yaml theme={null}
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}

- run: pip install -r requirements.txt
```

## Action inputs

| Input                | Required | Default                 | Description                       |
| -------------------- | -------- | ----------------------- | --------------------------------- |
| `api-key`            | Yes      | None                    | Control Plane API key             |
| `url`                | No       | `http://localhost:3000` | Control Plane server URL          |
| `config`             | No       | `langship.yaml`         | Path to config file               |
| `fail-on-regression` | No       | `true`                  | Exit 1 if blocking eval fails     |
| `post-comment`       | No       | `true`                  | Post results as PR comment        |
| `dataset-version`    | No       | latest                  | Pin dataset to a specific version |
