
Prerequisites

  • Docker and Docker Compose
  • Python 3.9+
  • An agent project (LangChain, LlamaIndex, or raw OpenAI)

Step 1: Start Control Plane Server

curl -O https://raw.githubusercontent.com/LyzrCore/langship/main/docker-compose.yml
docker compose up -d
Control Plane Server starts at http://localhost:3000. Open it in a browser to confirm the dashboard loads.
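If you would rather confirm from a script than a browser, a plain HTTP check is enough. The sketch below is not part of the SDK: the helper name dashboard_is_up is hypothetical, and it assumes only that the server answers an HTTP GET on the root URL given above.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url: str = "http://localhost:3000", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` gets a response with status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404 or 503).
        return exc.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: nothing is listening yet.
        return False


if __name__ == "__main__":
    print("dashboard reachable:", dashboard_is_up())
```

If this prints False right after docker compose up -d, the containers may still be starting; docker compose ps shows their state.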

Step 2: Install the SDK

pip install langship

Step 3: Instrument your agent

Add an import and a single init call to your agent code:
import langship

langship.init(
    endpoint="http://localhost:3000",
    project="my-first-agent"
)
Control Plane auto-patches LangChain, LlamaIndex, and the OpenAI client, so every LLM call and tool invocation is traced automatically, with no manual spans required.
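For intuition about what "auto-patching" generally means, here is a generic sketch of the technique: replacing a method on a class with a wrapper that records each call. This is an illustration only, not the SDK's actual implementation; FakeClient, traced, and TRACES are hypothetical names invented for the example.

```python
import functools
import time

TRACES = []  # collected span records (illustrative only)


def traced(name, fn):
    """Wrap `fn` so that every call appends a span record to TRACES."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record the span whether the call succeeded or raised.
            TRACES.append({"name": name, "seconds": time.perf_counter() - start})
    return wrapper


class FakeClient:
    """Stand-in for an LLM client; not a real library class."""

    def complete(self, prompt):
        return prompt.upper()


# Patch the method on the class, so existing call sites are traced unchanged.
FakeClient.complete = traced("FakeClient.complete", FakeClient.complete)

result = FakeClient().complete("hello")
print(result, TRACES[0]["name"])
```

Because the patch is applied to the class rather than to individual call sites, the calling code does not change, which is why no manual spans are needed.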

Step 4: Run your agent

import langship
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

# Initialize tracing first, as in Step 3.
langship.init(endpoint="http://localhost:3000", project="my-first-agent")

# OpenAI reads OPENAI_API_KEY and the serpapi tool reads SERPAPI_API_KEY from the environment.
llm = OpenAI(temperature=0)
tools = load_tools(["serpapi"], llm=llm)
agent = initialize_agent(tools, llm, verbose=True)

agent.run("What is the capital of France?")

Step 5: View the trace

Open http://localhost:3000 → select your project → click the latest run. You’ll see the full trace tree:
  • Agent decision step
  • LLM call with token count, cost, and latency
  • Tool invocations with inputs and outputs
  • Final response
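One way to picture the trace tree is as nested spans, with the agent step at the root and the LLM call, tool invocations, and final response beneath it. The sketch below is a mental model only: the Span class and its field names are hypothetical and do not reflect the dashboard's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """Hypothetical trace node: a kind, a label, and nested child spans."""
    kind: str
    label: str
    children: List["Span"] = field(default_factory=list)


def render(span, depth=0):
    """Return the tree as indented text lines, one per span."""
    lines = ["  " * depth + f"{span.kind}: {span.label}"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


run = Span("agent", "decision step", [
    Span("llm", "model call"),
    Span("tool", "serpapi"),
    Span("response", "final answer"),
])

print("\n".join(render(run)))
```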

Step 6: Add your first eval

Create langship.yaml in your project root:
project: my-first-agent

evals:
  - name: factual-accuracy
    type: llm-judge
    prompt: "Is this response factually correct? Answer yes or no."
    dataset: ./evals/golden-set.jsonl
    pass_threshold: 0.8
Create evals/golden-set.jsonl:
{"input": "What is the capital of France?", "expected": "Paris"}
{"input": "Who wrote Hamlet?", "expected": "Shakespeare"}
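A malformed line in the dataset is easy to miss, so it can help to sanity-check the file before running the eval. The sketch below assumes that every record needs exactly the two keys shown above, input and expected; the helper validate_jsonl is hypothetical and not part of the SDK.

```python
import json

REQUIRED_KEYS = {"input", "expected"}  # assumption: the keys used in the example above


def validate_jsonl(text):
    """Return a list of (line_number, problem) tuples; an empty list means valid."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((lineno, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((lineno, f"missing keys: {sorted(missing)}"))
    return problems


sample = (
    '{"input": "What is the capital of France?", "expected": "Paris"}\n'
    '{"input": "Who wrote Hamlet?", "expected": "Shakespeare"}'
)
print(validate_jsonl(sample))
```

To check the real file, pass its contents instead of sample, for example open("evals/golden-set.jsonl", encoding="utf-8").read().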
Run the eval:
langship eval run
You’ll see per-example pass/fail and an overall score. If the score drops below pass_threshold, the command exits with code 1, blocking CI.
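The threshold rule can be sketched in a few lines. This assumes the overall score is the fraction of examples that pass, which the text above implies but does not state outright; eval_exit_code is a hypothetical helper, not the CLI's code.

```python
def eval_exit_code(results, pass_threshold):
    """Return 0 if the fraction of passing examples meets the threshold, else 1."""
    if not results:
        return 1  # assumption: an empty run is treated as a failure
    score = sum(results) / len(results)
    # Exit 1 only when the score is strictly below the threshold.
    return 0 if score >= pass_threshold else 1


# With the two-example golden set, one failure gives a score of 0.5,
# which is below the configured pass_threshold of 0.8.
print(eval_exit_code([True, False], 0.8))
```

With only two examples, a single failure is enough to fail the run at a 0.8 threshold, so a larger golden set gives the threshold finer resolution.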

What’s next