# Get started

This guide explains how to integrate **Oris Voice** (`ojin/oris-voice`) into your applications using WebSockets.

## Prerequisites

1. An Ojin account with an active API key. If you don't have one, [get your API key](/getting-started/authentication.md).
2. [Create a voice configuration](/models/oris-voice/creating-configuration.md) or use an existing one.
3. Save the **Model Config ID** from the dashboard. The quick example below reads it from the `OJIN_CONFIG_ID` environment variable.

{% hint style="info" %}
**Production deployments:** Connect to the WebSocket API from a backend server to keep your API key secure. For end-user delivery, stream the generated audio through your own transport layer.
{% endhint %}

## WebSocket Integration

### Quick Example (Python)

This minimal example connects to **Oris Voice**, sends text, and saves the resulting audio as a WAV file.

**Install dependencies:**

```bash
pip install websockets python-dotenv
```

**Create a `.env` file:**

```bash
OJIN_API_KEY=your-api-key-here
OJIN_CONFIG_ID=your-config-id-here
```

**Run the script:**

```python
import asyncio
import json
import struct
import time
import wave
import os

import websockets
from dotenv import load_dotenv

load_dotenv()

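API_KEY = os.getenv("OJIN_API_KEY", "")
CONFIG_ID = os.getenv("OJIN_CONFIG_ID", "")
if not API_KEY or not CONFIG_ID:
    # Fail fast with a clear message instead of a confusing connection error
    raise SystemExit("Set OJIN_API_KEY and OJIN_CONFIG_ID in your environment or .env file.")
WS_URL = f"wss://models.ojin.ai/realtime?config_id={CONFIG_ID}"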

# Oris Voice outputs 24 kHz, 16-bit PCM mono
SAMPLE_RATE = 24000
SAMPLE_WIDTH = 2
CHANNELS = 1

def build_text_message(text):
    """Build a binary InteractionInput for text."""
    payload = text.encode("utf-8")
    # First field: payload type (0 = TEXT); second field: timestamp in milliseconds
    header = struct.pack("!BQI", 0, int(time.time() * 1000), 0)
    return header + payload

def parse_response(data):
    """Parse a binary InteractionResponse, extract audio payloads."""
    fmt = "!B16sQIII"
    hdr_size = struct.calcsize(fmt)
    is_final, _, _, _, _, num_payloads = struct.unpack(fmt, data[:hdr_size])

    offset = hdr_size
    audio_chunks = []
    for _ in range(num_payloads):
        size, ptype = struct.unpack("!IB", data[offset:offset + 5])
        offset += 5
        if ptype == 1 and size > 0:  # 1 = audio
            audio_chunks.append(data[offset:offset + size])
        offset += size

    return bool(is_final), audio_chunks

async def synthesize(text, output_path="output.wav"):
    headers = websockets.Headers()
    headers["Authorization"] = API_KEY

    async with websockets.connect(
        WS_URL,
        additional_headers=headers,
        open_timeout=None,
        ping_timeout=None,
    ) as ws:
        # 1. Wait for sessionReady
        while True:
            msg = await ws.recv()
            if isinstance(msg, str):
                parsed = json.loads(msg)
                if parsed.get("type") == "sessionReady":
                    break
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])

        # 2. Send text input (binary) + endInteraction (JSON)
        await ws.send(build_text_message(text))
        await ws.send(json.dumps({
            "type": "endInteraction",
            "payload": {"timestamp": int(time.time() * 1000)},
        }))

        # 3. Collect audio chunks
        audio_data = []
        while True:
            msg = await ws.recv()
            if isinstance(msg, str):
                parsed = json.loads(msg)
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])
                continue

            is_final, chunks = parse_response(msg)
            audio_data.extend(chunks)
            if is_final:
                break

        # 4. Write WAV file
        pcm = b"".join(audio_data)
        with wave.open(output_path, "wb") as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm)

        duration = len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
        print(f"Saved {output_path} ({duration:.2f}s)")

asyncio.run(synthesize("Hello, welcome to Ojin text to speech!"))
```
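
To check the parsing logic without a network connection, you can feed the parser a synthetic frame. The sketch below repeats the `parse_response` logic from the script and adds a hypothetical `build_fake_response` helper that is not part of any Ojin SDK. It assumes only the byte layout that the parser itself uses; the header fields the script ignores are filled with zeros because this guide does not describe them.

```python
import struct

HEADER_FMT = "!B16sQIII"  # header layout taken from parse_response in this guide
PAYLOAD_FMT = "!IB"       # per payload: size (uint32) followed by type (uint8)


def parse_response(data):
    """Same logic as the quick example: return (is_final, audio_chunks)."""
    hdr_size = struct.calcsize(HEADER_FMT)
    is_final, _, _, _, _, num_payloads = struct.unpack(HEADER_FMT, data[:hdr_size])

    offset = hdr_size
    audio_chunks = []
    for _ in range(num_payloads):
        size, ptype = struct.unpack(PAYLOAD_FMT, data[offset:offset + 5])
        offset += 5
        if ptype == 1 and size > 0:  # 1 = audio
            audio_chunks.append(data[offset:offset + size])
        offset += size
    return bool(is_final), audio_chunks


def build_fake_response(payloads, is_final=False):
    """Hypothetical test helper: pack (ptype, data) pairs into one frame.

    Header fields that the parser ignores are zeroed (assumption).
    """
    header = struct.pack(
        HEADER_FMT, int(is_final), b"\x00" * 16, 0, 0, 0, len(payloads)
    )
    body = b"".join(
        struct.pack(PAYLOAD_FMT, len(data), ptype) + data for ptype, data in payloads
    )
    return header + body


# One audio payload, one non-audio payload, and one empty audio payload
frame = build_fake_response(
    [(1, b"\x01\x02\x03\x04"), (0, b"meta"), (1, b"")], is_final=True
)
final, chunks = parse_response(frame)
```

Only the non-empty audio payload ends up in `chunks`, which mirrors how the script skips other payload types and zero-length audio.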

### Integration Flow

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: Connection
    Client->>Server: WebSocket Connect (Authorization header + config_id)
    Server->>Client: SessionReady (JSON)

    Note over Client,Server: Text-to-Speech
    Client->>Server: InteractionInput (text, binary)
    Client->>Server: EndInteraction (JSON)

    Note over Client,Server: Audio Streaming
    Server->>Client: InteractionResponse (audio chunk 1, binary)
    Server->>Client: InteractionResponse (audio chunk 2, binary)
    Server->>Client: InteractionResponse (audio chunk N, binary)
    Server->>Client: InteractionResponse (final, is_final=true)
```
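
The quick example disables `open_timeout` and `ping_timeout`, so a client waiting for `sessionReady` or for the final response can block indefinitely if the server goes quiet. One way to bound each wait is to wrap `recv()` in `asyncio.wait_for`. This is a minimal sketch: `recv_with_timeout` and `SilentSocket` are hypothetical names, and the 10-second default is an arbitrary assumption, not a documented limit.

```python
import asyncio


async def recv_with_timeout(ws, seconds=10.0):
    """Return the next message, or raise TimeoutError if none arrives in time."""
    try:
        return await asyncio.wait_for(ws.recv(), timeout=seconds)
    except asyncio.TimeoutError:
        raise TimeoutError(f"no message from server within {seconds}s") from None


class SilentSocket:
    """Hypothetical stand-in for a connection that never sends anything."""

    async def recv(self):
        await asyncio.sleep(3600)


async def demo():
    try:
        await recv_with_timeout(SilentSocket(), seconds=0.05)
    except TimeoutError as exc:
        return str(exc)
    return "received"


result = asyncio.run(demo())
```

In the script, each `await ws.recv()` call could be replaced with `await recv_with_timeout(ws)`.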

### Streaming Playback

For real-time playback, process audio chunks as they arrive instead of waiting for the full response:

```python
# Inside the receive loop, play each chunk immediately:
while True:
    msg = await ws.recv()
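    if isinstance(msg, str):
        # JSON control message: surface server errors instead of dropping them
        parsed = json.loads(msg)
        if parsed.get("type") == "errorResponse":
            raise RuntimeError(parsed["payload"]["message"])
        continue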

    is_final, chunks = parse_response(msg)
    for chunk in chunks:
        play_audio(chunk)  # Feed to your audio output (e.g., pyaudio, sounddevice)

    if is_final:
        break
```
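
This guide does not specify how large each audio chunk is, while many audio outputs work best with fixed-size blocks. A small buffer that re-slices incoming PCM into equal frames can sit between the receive loop and `play_audio`. The sketch below is one possible approach: `FrameBuffer` and its 20 ms default are assumptions for illustration, and the format constants are the 24 kHz, 16-bit mono values stated above.

```python
SAMPLE_RATE = 24000  # 24 kHz, as stated in this guide
SAMPLE_WIDTH = 2     # 16-bit samples
CHANNELS = 1         # mono


class FrameBuffer:
    """Re-slice arbitrary-size PCM chunks into fixed-duration frames."""

    def __init__(self, frame_ms=20):
        bytes_per_ms = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS // 1000
        self.frame_bytes = bytes_per_ms * frame_ms
        self._pending = bytearray()

    def push(self, chunk):
        """Add a chunk and return the complete frames now available."""
        self._pending.extend(chunk)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[:self.frame_bytes]))
            del self._pending[:self.frame_bytes]
        return frames

    def flush(self):
        """Return the leftover tail padded with silence, or None if empty."""
        if not self._pending:
            return None
        tail = bytes(self._pending).ljust(self.frame_bytes, b"\x00")
        self._pending.clear()
        return tail
```

Inside the receive loop, each frame returned by `push(chunk)` would go to `play_audio`, and `flush()` would be called once after the final response.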

## Feeding TTS Output to a Persona Model

A common pattern is to generate speech with **Oris Voice**, then pipe that audio into the `ojin/oris-portrait` persona model for lip-synced video. In this setup:

1. Send text to Oris Voice and receive streaming audio chunks.
2. Forward each audio chunk to the persona model as an `InteractionInput` (audio, payload type `1`).
3. Buffer the same audio locally for playback.
4. Start audio playback when the persona model returns the first speech frame, so that sound and video stay in sync.
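
Steps 2 and 3 can be sketched as follows. This assumes the persona model accepts the same `!BQI` input header that the text message in this guide uses, with the payload type set to `1`; check the persona model's API reference before relying on that layout. `build_audio_message`, `forward_chunks`, and `RecordingSocket` are hypothetical names used only for illustration.

```python
import asyncio
import struct
import time

AUDIO_PAYLOAD_TYPE = 1  # from this guide: audio input uses payload type 1


def build_audio_message(pcm_chunk, timestamp_ms=None):
    """Wrap a PCM chunk in the header layout used for text input (assumed)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    header = struct.pack("!BQI", AUDIO_PAYLOAD_TYPE, timestamp_ms, 0)
    return header + pcm_chunk


async def forward_chunks(chunks, persona_ws, playback_buffer):
    """Send each chunk to the persona model and keep a local copy."""
    for chunk in chunks:
        await persona_ws.send(build_audio_message(chunk))  # step 2
        playback_buffer.append(chunk)                      # step 3


class RecordingSocket:
    """Hypothetical stand-in for the persona WebSocket, for offline testing."""

    def __init__(self):
        self.sent = []

    async def send(self, message):
        self.sent.append(message)


sock, local_audio = RecordingSocket(), []
asyncio.run(forward_chunks([b"\x00\x01", b"\x02\x03"], sock, local_audio))
```

In a real pipeline, `chunks` would be the list returned by `parse_response` for each Oris Voice message, and `persona_ws` would be a second open WebSocket connection.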

## Next Steps

### API Reference

Dive deeper into the binary protocol and message formats.

[API Reference →](/models/oris-voice/api.md)

