This cookbook walks you through building Nexus, a Level 1 IT support voice agent, end to end: pick the models, save the agent, run a live session in a browser, and fetch transcripts.
What you’ll build
Nexus greets the caller, asks whether they’re on Mac or Windows, and walks them through Level 1 troubleshooting steps. By the end of this guide you’ll have:
- a saved agent on the Lyzr Voice API (reusable across sessions),
- a Python script that starts and ends sessions,
- a React component that connects a browser microphone to a live session,
- a script that pulls the transcript and latency metrics for any past call.
Prerequisites
- A Lyzr API key. Set it as LYZR_API_KEY for the Python scripts.
- Python with requests (pip install requests).
- For the browser section: a React app that can install npm/pnpm packages.
Export your key before running the scripts (VOICE_API_BASE_URL is optional; the scripts default to the URL shown):
export LYZR_API_KEY="your-lyzr-api-key"
export VOICE_API_BASE_URL="https://voice-livekit.studio.lyzr.ai/v1"
1. Pick your models and a voice
Pipeline mode runs three separate models — STT, LLM, TTS — plus a TTS voice. Before creating the agent, ask the API which options are available so you can pick concrete values for each.
import os
from urllib.parse import urlencode
import requests
API_KEY = os.environ["LYZR_API_KEY"]
BASE_URL = os.getenv("VOICE_API_BASE_URL", "https://voice-livekit.studio.lyzr.ai/v1")
HEADERS = {"x-api-key": API_KEY, "accept": "application/json"}
def get_json(path: str, params=None) -> dict:
query = f"?{urlencode(params)}" if params else ""
response = requests.get(f"{BASE_URL}{path}{query}", headers=HEADERS, timeout=30)
response.raise_for_status()
return response.json()
def main() -> None:
# Pick the first TTS provider that's configured for your account.
providers = get_json("/config/tts-voice-providers")["providers"]
active_provider = next(p["providerId"] for p in providers if p.get("configured"))
# Pick a model for each pipeline stage.
options = get_json("/config/pipeline-options")
stt = options["stt"][0]["models"][0]["id"]
llm = options["llm"][0]["models"][0]["id"]
tts_provider = next(p for p in options["tts"] if p["providerId"] == active_provider)
tts_model = tts_provider["models"][0]["id"]
# Pick a voice for the chosen provider.
voices = get_json(
"/config/tts-voices",
{"providerId": active_provider, "limit": 5},
)["voices"]
voice = voices[0]
print(f"STT: {stt}")
print(f"LLM: {llm}")
print(f"TTS: {tts_model} ({active_provider})")
print(f"Voice: {voice['name']} ({voice['id']})")
if __name__ == "__main__":
main()
You should see something like:
STT: assemblyai/universal-streaming:en
LLM: openai/gpt-4o-mini
TTS: cartesia/sonic-3 (cartesia)
Voice: Friendly Reading Man (9626c31c-bec5-4cca-baa8-f8ba9e84c8bc)
Hold onto the voice ID — you’ll pass it as engine.voice_id in the next step.
This script takes the happy path: the first configured provider, the first model in each list, the first voice. In production code, handle empty results explicitly and let users pick a voice rather than auto-selecting one.
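If you want a sturdier version, the sketch below shows the idea (the pick_first helper is ours, not part of the API, and it reuses get_json from the script above). It fails with a readable error instead of a bare StopIteration or IndexError when a provider or model list comes back empty:

def pick_first(items: list, label: str) -> dict:
    # Raise a clear error instead of IndexError when the API returns an empty list.
    if not items:
        raise RuntimeError(f"No {label} available for this account; check your provider configuration.")
    return items[0]

# Same selections as in main() above, with explicit guards.
providers = get_json("/config/tts-voice-providers")["providers"]
configured = [p for p in providers if p.get("configured")]
active_provider = pick_first(configured, "configured TTS provider")["providerId"]

options = get_json("/config/pipeline-options")
stt = pick_first(pick_first(options["stt"], "STT provider")["models"], "STT model")["id"]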
2. Create the Nexus agent
POST /agents saves an agent definition so you don’t have to send the whole config on every session. The response includes agent.id — store it; the rest of this cookbook references it.
import json
import os
import requests
API_KEY = os.environ["LYZR_API_KEY"]
BASE_URL = os.getenv("VOICE_API_BASE_URL", "https://voice-livekit.studio.lyzr.ai/v1")
HEADERS = {
"x-api-key": API_KEY,
"accept": "application/json",
"Content-Type": "application/json",
}
def create_nexus_agent() -> str:
payload = {
"config": {
# Identity and persona.
"agent_name": "Nexus - IT Support",
"agent_description": "Level 1 technical support voice agent",
"prompt": (
"You are Nexus, a Level 1 IT support assistant. Keep answers concise "
"and easy to understand over an audio call. Always ask whether the "
"user is on Windows or Mac before giving troubleshooting steps."
),
"conversation_start": {
"who": "ai",
"greeting": (
"Say, \"Hi, I'm Nexus, your IT support assistant. "
"Are you calling about a Mac or Windows computer today?\""
),
},
# Models — fill in the IDs you picked in step 1.
"engine": {
"kind": "pipeline",
"stt": "assemblyai/universal-streaming:en",
"tts": "cartesia/sonic-3",
"llm": "openai/gpt-4o-mini",
"voice_id": "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
"language": "en",
},
# Optional polish — turn detection, VAD, recording, ambience.
"turn_detection": "english",
"vad_enabled": True,
"noise_cancellation": {"enabled": True, "type": "auto"},
"audio_recording_enabled": True,
"background_audio": {
"enabled": True,
"ambient": {
"enabled": True,
"source": "OFFICE_AMBIENCE",
"volume": 0.4,
},
"tool_call": {
"enabled": True,
"sources": [
{"source": "KEYBOARD_TYPING_TRUNC", "volume": 0.6, "probability": 1}
],
},
},
}
}
response = requests.post(
f"{BASE_URL}/agents", headers=HEADERS, json=payload, timeout=30
)
if response.status_code != 201:
print(json.dumps(response.json(), indent=2))
response.raise_for_status()
agent_id = response.json()["agent"]["id"]
print(f"Created agent: {agent_id}")
return agent_id
if __name__ == "__main__":
create_nexus_agent()
Export the printed ID so the next scripts can pick it up:
export LYZR_VOICE_AGENT_ID="agent_..."
Only agent_name, prompt, and engine are strictly required; agent_description, conversation_start, and everything under the "Optional polish" comment are extras. Drop them for a stripped-down agent and add them back as you need them.
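For reference, a stripped-down config looks like this (same endpoint and headers as above; the engine block reuses the IDs picked in step 1):

minimal_payload = {
    "config": {
        "agent_name": "Nexus - IT Support",
        "prompt": "You are Nexus, a Level 1 IT support assistant.",
        "engine": {
            "kind": "pipeline",
            "stt": "assemblyai/universal-streaming:en",
            "tts": "cartesia/sonic-3",
            "llm": "openai/gpt-4o-mini",
            "voice_id": "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
            "language": "en",
        },
    }
}
# requests.post(f"{BASE_URL}/agents", headers=HEADERS, json=minimal_payload, timeout=30)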
3. Start and end a session
Starting a session dispatches the Python worker, creates a LiveKit room, and returns the credentials your client needs to join.
Request to POST /sessions/start:
{
"userIdentity": "user_123",
"agentId": "your-agent-id"
}
Response:
{
"userToken": "livekit-client-token",
"roomName": "room-...",
"sessionId": "uuid",
"livekitUrl": "wss://...",
"agentDispatched": true,
"agentConfig": {
"engine": {
"kind": "pipeline",
"stt": "assemblyai/universal-streaming:en",
"tts": "cartesia/sonic-3",
"llm": "openai/gpt-4o-mini",
"voice_id": "9626c31c-bec5-4cca-baa8-f8ba9e84c8bc"
},
"tools": []
}
}
You’ll feed livekitUrl and userToken into the LiveKit client in the next step. When the call is done, call POST /sessions/end with either roomName or sessionId.
The Python script below handles only the lifecycle — it doesn’t publish a microphone or play audio. On its own it just opens an empty room and waits. You’ll connect a real microphone in step 4.
import os
import time
import requests
API_KEY = os.environ["LYZR_API_KEY"]
AGENT_ID = os.environ["LYZR_VOICE_AGENT_ID"]
BASE_URL = os.getenv("VOICE_API_BASE_URL", "https://voice-livekit.studio.lyzr.ai/v1")
HEADERS = {
"x-api-key": API_KEY,
"accept": "application/json",
"Content-Type": "application/json",
}
def start_session() -> dict:
payload = {
"userIdentity": f"user_{int(time.time())}",
"agentId": AGENT_ID,
}
response = requests.post(
f"{BASE_URL}/sessions/start", headers=HEADERS, json=payload, timeout=30
)
response.raise_for_status()
data = response.json()
print(f"Session ID: {data['sessionId']}")
print(f"Room name: {data['roomName']}")
print(f"LiveKit URL: {data['livekitUrl']}")
return data
def end_session(room_name: str) -> None:
response = requests.post(
f"{BASE_URL}/sessions/end",
headers=HEADERS,
json={"roomName": room_name},
timeout=30,
)
response.raise_for_status()
print("Session ended.")
if __name__ == "__main__":
session = start_session()
print("Connect a LiveKit client now. Press Enter when you're done.")
input()
end_session(session["roomName"])
POST /sessions/end returns immediately and marks the session as ended on the API side. The transcript becomes available once the worker flushes its buffers and observability data arrives — typically a few seconds after the LiveKit room closes.
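If a script needs the transcript right after ending a call, poll for it briefly instead of fetching immediately. The sketch below reuses BASE_URL and HEADERS from the script above and assumes GET /transcripts/{sessionId} (the endpoint used in step 5) returns 404 until the worker has flushed; adjust the check if your deployment signals "not ready" differently:

import time
import requests

def wait_for_transcript(session_id: str, attempts: int = 10, delay_s: float = 2.0) -> dict:
    # Retry until the transcript endpoint stops returning 404, or give up.
    for _ in range(attempts):
        response = requests.get(
            f"{BASE_URL}/transcripts/{session_id}", headers=HEADERS, timeout=30
        )
        if response.status_code == 404:
            time.sleep(delay_s)
            continue
        response.raise_for_status()
        return response.json()["transcript"]
    raise TimeoutError(f"Transcript for session {session_id} was not available in time.")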
4. Connect from the browser
The REST API has done its job; from here, all the audio plumbing belongs to the LiveKit SDK. We’ll build two files:
- voice-api.ts — a thin wrapper around /v1/sessions/start and /v1/sessions/end.
- NexusVoiceWidget.tsx — a React component that joins the room and renders the agent's audio.
Install the client packages:
pnpm add livekit-client @livekit/components-react @livekit/components-styles
voice-api.ts
const API_BASE_URL =
import.meta.env.VITE_LIVEKIT_BACKEND_URL ?? "https://voice-livekit.studio.lyzr.ai";
export interface SessionResponse {
userToken: string;
roomName: string;
sessionId: string;
livekitUrl: string;
agentDispatched: boolean;
}
export async function startVoiceSession(input: {
apiKey: string;
userIdentity: string;
agentId?: string;
agentConfig?: Record<string, unknown>;
}): Promise<SessionResponse> {
const response = await fetch(`${API_BASE_URL}/v1/sessions/start`, {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-api-key": input.apiKey,
},
body: JSON.stringify({
userIdentity: input.userIdentity,
agentId: input.agentId,
agentConfig: input.agentConfig,
}),
});
if (!response.ok) {
throw new Error(`Failed to start voice session: ${response.status}`);
}
return response.json();
}
export async function endVoiceSession(input: {
apiKey: string;
roomName: string;
}): Promise<void> {
const response = await fetch(`${API_BASE_URL}/v1/sessions/end`, {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-api-key": input.apiKey,
},
body: JSON.stringify({ roomName: input.roomName }),
});
if (!response.ok && response.status !== 404) {
throw new Error(`Failed to end voice session: ${response.status}`);
}
}
The widget renders two audio sinks inside <LiveKitRoom>:
- RoomAudioRenderer — plays the agent's voice. This is the one you actually hear Nexus through.
- BackgroundAudioRenderer — a small custom renderer that attaches the optional background_audio track (office ambience, keyboard sounds during tool calls). It's separate because the default renderer doesn't surface non-voice tracks.
NexusVoiceWidget.tsx
import { useEffect, useRef, useState } from "react";
import {
LiveKitRoom,
RoomAudioRenderer,
useTracks,
useVoiceAssistant,
} from "@livekit/components-react";
import { Track } from "livekit-client";
import "@livekit/components-styles";
import { endVoiceSession, startVoiceSession, type SessionResponse } from "./voice-api";
function BackgroundAudioRenderer() {
const audioRef = useRef<HTMLAudioElement | null>(null);
const tracks = useTracks([Track.Source.Unknown], { onlySubscribed: true });
const backgroundTrack = tracks.find(
(track) => track.publication?.trackName === "background_audio",
);
useEffect(() => {
const mediaTrack = backgroundTrack?.publication?.track;
if (!mediaTrack) return;
const audioElement = audioRef.current ?? document.createElement("audio");
audioElement.autoplay = true;
audioElement.setAttribute("playsinline", "true");
audioRef.current = audioElement;
mediaTrack.attach(audioElement);
return () => {
mediaTrack.detach(audioElement);
};
}, [backgroundTrack?.publication?.track]);
return null;
}
function AgentStatus() {
const { state } = useVoiceAssistant();
return <p>Agent state: {state}</p>;
}
export function NexusVoiceWidget(props: { apiKey: string; agentId: string }) {
const [session, setSession] = useState<SessionResponse | null>(null);
async function startCall() {
const data = await startVoiceSession({
apiKey: props.apiKey,
userIdentity: `user_${Date.now()}`,
agentId: props.agentId,
});
setSession(data);
}
async function endCall() {
if (session) {
await endVoiceSession({ apiKey: props.apiKey, roomName: session.roomName });
}
setSession(null);
}
if (!session) {
return <button onClick={startCall}>Start call</button>;
}
return (
<LiveKitRoom
serverUrl={session.livekitUrl}
token={session.userToken}
connect
audio
video={false}
onDisconnected={() => void endCall()}
>
<AgentStatus />
<button onClick={endCall}>End call</button>
<RoomAudioRenderer />
<BackgroundAudioRenderer />
</LiveKitRoom>
);
}
Render <NexusVoiceWidget apiKey={...} agentId={...} /> somewhere in your app, click Start call, grant microphone permission, and you should hear Nexus’s greeting within a second or two.
In a real app, never pass apiKey straight to the browser. Proxy /v1/sessions/start through your own backend so the key stays on the server.
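One way to do that is a small proxy on your own server. The sketch below uses Flask (not one of this cookbook's prerequisites; use whatever backend framework you already run) and arbitrary route names; it forwards the browser's request and keeps LYZR_API_KEY on the server. There is no auth or rate limiting here, so add both before exposing it. If you adopt it, point API_BASE_URL in voice-api.ts at your backend and drop the x-api-key header from the browser code.

import os
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
BASE_URL = os.getenv("VOICE_API_BASE_URL", "https://voice-livekit.studio.lyzr.ai/v1")
HEADERS = {"x-api-key": os.environ["LYZR_API_KEY"], "Content-Type": "application/json"}

@app.post("/api/voice/start")
def start_session():
    # Forward only the fields the browser is allowed to set; the API key stays server-side.
    body = request.get_json(force=True) or {}
    payload = {
        "userIdentity": body.get("userIdentity", "anonymous"),
        "agentId": body.get("agentId"),
    }
    upstream = requests.post(
        f"{BASE_URL}/sessions/start", headers=HEADERS, json=payload, timeout=30
    )
    return jsonify(upstream.json()), upstream.status_code

@app.post("/api/voice/end")
def end_session():
    body = request.get_json(force=True) or {}
    upstream = requests.post(
        f"{BASE_URL}/sessions/end",
        headers=HEADERS,
        json={"roomName": body.get("roomName")},
        timeout=30,
    )
    return jsonify(upstream.json()), upstream.status_code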
5. Read transcripts and metrics
Once a call has ended and the worker has flushed, you can pull aggregate stats for the agent and a per-session transcript with latency metrics on each turn.
import os
import requests
API_KEY = os.environ["LYZR_API_KEY"]
AGENT_ID = os.environ["LYZR_VOICE_AGENT_ID"]
BASE_URL = os.getenv("VOICE_API_BASE_URL", "https://voice-livekit.studio.lyzr.ai/v1")
HEADERS = {"x-api-key": API_KEY, "accept": "application/json"}
def get_json(path: str) -> dict:
response = requests.get(f"{BASE_URL}{path}", headers=HEADERS, timeout=30)
response.raise_for_status()
return response.json()
def text_from_content(content: object) -> str:
# Messages can be a plain string or a list of parts (multimodal turns);
# flatten both into a single line for printing.
if isinstance(content, list):
return " ".join(str(part) for part in content)
return str(content or "")
def fetch_analytics() -> None:
stats = get_json(f"/transcripts/agent/{AGENT_ID}/stats")
print("Aggregate stats")
print(f"Total calls: {stats.get('totalCalls')}")
print(f"Average messages: {stats.get('avgMessages')}")
recent = get_json(f"/transcripts/agent/{AGENT_ID}?sort=desc&limit=5")
items = recent.get("items", [])
if not items:
print("No transcripts found. Complete a voice session first.")
return
session_id = items[0]["sessionId"]
transcript = get_json(f"/transcripts/{session_id}")["transcript"]
print("\nSession overview")
print(f"Session ID: {transcript.get('sessionId')}")
print(f"Room: {transcript.get('roomName')}")
print(f"Duration (s): {(transcript.get('durationMs') or 0) / 1000:.2f}")
print(f"Message count: {transcript.get('messageCount')}")
print("\nConversation")
for item in transcript.get("chatHistory", []):
if item.get("type") != "message":
continue
role = item.get("role", "system").upper()
print(f"[{role}] {text_from_content(item.get('content'))}")
print("\nLatest assistant latency")
for item in reversed(transcript.get("chatHistory", [])):
if item.get("role") != "assistant" or "metrics" not in item:
continue
metrics = item["metrics"]
print(f"LLM TTFT: {metrics.get('llm_node_ttft')}")
print(f"TTS TTFB: {metrics.get('tts_node_ttfb')}")
break
if __name__ == "__main__":
fetch_analytics()
llm_node_ttft is the time from the user finishing their turn to the first LLM token; tts_node_ttfb is the time from that first token to the first byte of synthesized audio. Together they’re the dominant contributors to perceived latency.
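As a rough sanity check you can sum the two for every assistant turn. The helper below is ours; it takes the transcript dict fetched in the script above and prints values in whatever unit the API reports, without converting them:

def approximate_turn_latency(transcript: dict) -> None:
    # Sum the two dominant latency contributors for each assistant turn that has metrics.
    for item in transcript.get("chatHistory", []):
        if item.get("role") != "assistant" or "metrics" not in item:
            continue
        metrics = item["metrics"]
        ttft = metrics.get("llm_node_ttft") or 0
        ttfb = metrics.get("tts_node_ttfb") or 0
        print(f"Assistant turn latency (LLM TTFT + TTS TTFB): {ttft + ttfb}")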
Where to go next
- Swap engine.kind from "pipeline" to "realtime" to use a single multimodal model instead of three.
- Add tools to the agent payload so Nexus can look up tickets, reset passwords, or hand off to a human.
- Stream transcripts in real time by subscribing to LiveKit data tracks instead of polling /transcripts/{sessionId} afterward.