# Web & Voice

> Run GitAgent's full browser interface with real-time voice, camera input, and a text-only fallback.

GitAgent serves a full-featured browser interface at `http://localhost:3333`, with chat, skills, integrations, and voice all in one place.

## Starting the Server

```bash
# No auth — open to anyone on the network
gitagent --voice --dir ~/assistant
# opens http://localhost:3333

# With password protection
GITAGENT_PASSWORD=mysecret gitagent --voice --dir ~/assistant

# Custom username (defaults to "admin")
GITAGENT_USERNAME=alice GITAGENT_PASSWORD=mysecret gitagent --voice --dir ~/assistant
```
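Because the first form above starts with no authentication, it can help to wrap the launch in a guard. The following is a minimal sketch, not part of GitAgent: `start_protected` is a hypothetical helper name, and it assumes only the `--voice` and `--dir` flags and the `GITAGENT_PASSWORD` variable shown above.

```bash
# Hypothetical helper (not shipped with GitAgent): start the web UI only
# when a password is configured.
start_protected() {
  local dir="${1:-$HOME/assistant}"
  if [ -z "${GITAGENT_PASSWORD:-}" ]; then
    echo "GITAGENT_PASSWORD is empty; refusing to start without auth" >&2
    return 1
  fi
  # Pass the password explicitly so it reaches the gitagent process even
  # when the variable was set but not exported.
  GITAGENT_PASSWORD="$GITAGENT_PASSWORD" gitagent --voice --dir "$dir"
}

# Usage:
#   GITAGENT_PASSWORD=mysecret start_protected ~/assistant
```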

**Auth behaviour**

* The port is always 3333; no environment variable changes it
* When `GITAGENT_PASSWORD` is set, every HTTP route shows a login page
* WebSocket connections are rejected without a valid auth cookie
* `/health` always stays open (for load balancers)
* The auth cookie is `HttpOnly`, `SameSite=Strict`, expires after 24 hours, and carries a SHA-256 token
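Since `/health` stays open even with a password set, a script can poll it to find out when the server is ready. The sketch below is illustrative only: `wait_for_health` and its retry parameters are hypothetical, and because the response body of `/health` is not documented here, it checks only for an HTTP success status.

```bash
# Poll /health until the server answers with a success status or the
# attempts run out.
# Arguments (all optional): base URL, max attempts, seconds between attempts.
wait_for_health() {
  local base_url="${1:-http://localhost:3333}"
  local attempts="${2:-10}"
  local delay="${3:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; the body is discarded.
    if curl -fsS -o /dev/null --max-time 2 "$base_url/health" 2>/dev/null; then
      echo "healthy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "unhealthy" >&2
  return 1
}

# Usage:
#   wait_for_health http://localhost:3333 10 1 && echo "server is up"
```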

## Interface Tabs

| Tab           | Description                                                               |
| ------------- | ------------------------------------------------------------------------- |
| Chat          | Real-time conversation, voice controls, camera, file system viewer        |
| Skills        | Browse and install skills from the marketplace                            |
| Integrations  | Connect Composio services (Gmail, Calendar, Slack, GitHub)                |
| Communication | Telegram bot setup, WhatsApp connection, phone/SMS webhook                |
| SkillFlows    | Visual workflow builder — chain skills into multi-step flows              |
| Scheduler     | Create cron jobs — run prompts on a schedule                              |
| Settings      | Model selection, API keys, custom base URL; saves to `.env` and `agent.yaml` |

## Voice Mode

### OpenAI Realtime (default)

* Model: `gpt-realtime-2025-08-28`
* Real-time audio streaming over WebSocket
* Supports image input (camera frames)
* Requires: `OPENAI_API_KEY`

```bash
OPENAI_API_KEY=your_key gitagent --voice --dir ~/assistant
```

### Gemini Live (free tier)

* Model: `models/gemini-2.5-flash-native-audio-preview`
* Alternative voice provider, selected with `--voice gemini`
* Free tier available
* Requires: `GEMINI_API_KEY`

```bash
GEMINI_API_KEY=your_key gitagent --voice gemini --dir ~/assistant
```
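The two provider commands differ only in the argument after `--voice`, so a launcher script can pick one from whichever API key is present. This is a rough sketch under stated assumptions: `voice_flags` is a hypothetical helper, preferring OpenAI when both keys are set simply mirrors the documented default, and the no-key branch assumes the same `--voice` flag leads to the text-only fallback.

```bash
# Print the gitagent voice flags that match the API keys in the environment.
voice_flags() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "--voice"            # OpenAI Realtime, the default provider
  elif [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "--voice gemini"     # Gemini Live
  else
    # Assumption: with no key, the server still starts with voice disabled.
    echo "no voice API key set; expect the text-only fallback" >&2
    echo "--voice"
  fi
}

# Usage (the unquoted substitution is intentional so the flags split into words):
#   gitagent $(voice_flags) --dir ~/assistant
```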

## Camera Input

* Front/back camera toggle (mobile)
* Captures frames every 2 seconds as JPEG
* Frames injected into conversation as images
* Auto-captures on "memorable moments" (laughter, excitement)
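To get a feel for how many images the periodic capture adds to a conversation, the 2-second interval can be turned into a quick estimate. This is only a back-of-the-envelope sketch: `frames_for_session` is a hypothetical helper, and it ignores the extra auto-captured frames, so treat the result as a lower bound.

```bash
# Estimate how many periodic frames a session of the given length (in
# seconds) produces, assuming one frame every 2 seconds.
frames_for_session() {
  local seconds="$1"
  local interval=2
  echo $(( seconds / interval ))
}

frames_for_session 60   # prints 30
```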

## Text-Only Fallback

<Note>
  No voice API key? GitAgent still starts the web UI server but with voice disabled. Text input routes directly to the agent via `query()`. Web UI runs at `http://localhost:3333`.
</Note>

<CardGroup cols={2}>
  <Card title="Personal Assistant Quick Start" icon="rocket" href="/open-source/gitagent/personal-assistant/quickstart">
    Install GitAgent and run your first session
  </Card>

  <Card title="CLI Reference" icon="terminal" href="/open-source/gitagent/personal-assistant/cli">
    All flags, REPL commands, and the plugin CLI
  </Card>

  <Card title="Messaging" icon="plug" href="/open-source/gitagent/personal-assistant/messaging">
    Connect Telegram, WhatsApp, and phone
  </Card>

  <Card title="Integrations" icon="diagram-project" href="/open-source/gitagent/data-integrations/integrations">
    Connect Composio services like Gmail, Slack, and GitHub
  </Card>
</CardGroup>
