# API Reference

## Overview

Real-time text-to-speech synthesis API. Send text, receive streaming PCM audio chunks.

After connecting and receiving `SessionReady`, send your text as a binary `InteractionInput` message followed by a JSON `EndInteraction`. The server synthesizes speech and streams back audio chunks as binary `InteractionResponse` messages. The final chunk has `is_final` set to `true`.

{% hint style="info" %}
**Production deployments:** Connect to the real-time WebSocket API from a backend server to keep your API key secure.
{% endhint %}

***

## How It Works

1. **Connect** to the WebSocket endpoint with your API key and config ID
2. **Receive `SessionReady`** — the server has allocated inference resources for your session
3. **Send text** as a binary `InteractionInput` message (payload type `0` for text)
4. **Send `EndInteraction`** (JSON) to signal that input is complete and synthesis should begin
5. **Receive audio chunks** — binary `InteractionResponse` messages containing PCM int16 audio at 24 kHz
6. **Detect completion** — the last response has `is_final: true`

### Audio Output Format

| Property        | Value                                              |
| --------------- | -------------------------------------------------- |
| Format          | PCM signed 16-bit integers (little-endian samples) |
| Sample rate     | 24,000 Hz                                          |
| Channels        | 1 (mono)                                           |
| Bits per sample | 16                                                 |
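
As a rough illustration of the format in the table above, the following sketch converts a raw PCM byte string into a duration and into signed sample values. The helper names `pcm_duration_seconds` and `decode_samples` are hypothetical and not part of the API; the constants are taken from the table.

```python
import struct

SAMPLE_RATE = 24_000      # Hz, from the table above
CHANNELS = 1              # mono
BYTES_PER_SAMPLE = 2      # 16 bits per sample

def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration of a raw PCM buffer in seconds (hypothetical helper)."""
    return len(pcm) / (SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE)

def decode_samples(pcm: bytes) -> list:
    """Decode little-endian signed 16-bit samples into Python ints."""
    count = len(pcm) // BYTES_PER_SAMPLE
    # "<" = little-endian, "h" = signed 16-bit; a trailing odd byte is ignored.
    return list(struct.unpack(f"<{count}h", pcm[: count * BYTES_PER_SAMPLE]))
```

Note that the audio samples themselves are little-endian, unlike the integer header fields of the binary messages.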

***

## Connection Flow

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: Connection
    Client->>Server: WebSocket Connect
    Server->>Client: SessionReady (JSON)

    Note over Client,Server: Send Text
    Client->>Server: InteractionInput (text payload, binary)
    Client->>Server: EndInteraction (JSON)

    Note over Client,Server: Receive Audio Stream
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (final chunk, is_final=true)
```

***

## WebSocket Handshake

## Open WebSocket connection

> Connect to the WebSocket endpoint, providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket. After sending `SessionReady`, the server waits for text input.
>
> **Recommended WebSocket settings:**
>
> - `open_timeout`: None (model loading may take time on cold start)
> - `ping_timeout`: None

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"servers":[{"url":"wss://models.ojin.ai/realtime","description":"Production WebSocket endpoint"}],"security":[{"ApiKeyAuth":[]}],"components":{"securitySchemes":{"ApiKeyAuth":{"type":"apiKey","in":"header","name":"Authorization","description":"Raw API key (no `Bearer` prefix)."}},"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. The server then waits for text input.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Model-specific session parameters for Oris Voice, including `sample_rate`, `channels`, and `bits_per_sample`.","properties":{"sample_rate":{"type":"integer","description":"Audio sample rate in Hz."},"channels":{"type":"integer","description":"Number of audio channels."},"bits_per_sample":{"type":"integer","description":"Bits per audio sample."}}}}}}},"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions, the server may send a plain text message instead of structured JSON. 
Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}},"paths":{"/":{"get":{"summary":"Open WebSocket connection","description":"Connect to the WebSocket endpoint providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket. After sending `SessionReady`, the server waits for text input.\n\n**Recommended WebSocket settings:**\n- `open_timeout`: None (model loading may take time on cold start)\n- `ping_timeout`: None","operationId":"wsHandshake","parameters":[{"in":"query","name":"config_id","required":true,"schema":{"type":"string"},"description":"Configuration ID for the TTS voice, created via API or in the Oris Voice tab of the dashboard."},{"in":"header","name":"Authorization","required":true,"schema":{"type":"string"},"description":"Your raw API key. No `Bearer` prefix."}],"responses":{"101":{"description":"WebSocket upgrade successful. 
After the upgrade, the server sends a `SessionReady` JSON message and waits for text input.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/SessionReadyMessage"}}}},"401":{"description":"Unauthorized — invalid or missing API key.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponseMessage"}}}}}}}}}
```
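
The handshake inputs described above could be assembled as in the sketch below. `build_handshake` is a hypothetical helper, not part of any SDK; it only builds the URL and header dictionary, which can then be passed to whichever WebSocket client you use.

```python
from urllib.parse import urlencode

BASE_URL = "wss://models.ojin.ai/realtime"  # production endpoint from the spec above

def build_handshake(api_key: str, config_id: str):
    """Return (url, headers) for the WebSocket handshake (hypothetical helper)."""
    if not api_key or not config_id:
        raise ValueError("both api_key and config_id are required")
    # config_id travels as a query parameter and is URL-encoded here.
    url = f"{BASE_URL}?{urlencode({'config_id': config_id})}"
    # The Authorization header carries the raw API key, with no "Bearer" prefix.
    headers = {"Authorization": api_key}
    return url, headers
```

Failing fast on an empty key or config ID avoids a round trip that would otherwise end in `AUTH_FAILED` or `MISSING_CONFIG_ID`.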

***

## Message Format

{% hint style="info" %}
**Mixed message types:** JSON (text) and binary messages share the same WebSocket connection, in both directions. Your client must check the frame type of each incoming message to distinguish them, and must send each outgoing message in the frame type listed here:

* **Text frames (JSON):** `SessionReady`, `EndInteraction`, `CancelInteraction`, `ErrorResponse`
* **Binary frames:** `InteractionInput`, `InteractionResponse`
{% endhint %}

{% hint style="info" %}
**Byte order:** All multi-byte integer fields in binary messages use **network byte order (big-endian)**.
{% endhint %}
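
One possible way to route incoming frames is sketched below. It assumes a client library that delivers text frames as `str` and binary frames as `bytes` (as the Python `websockets` package does); `classify_frame` and its return labels are hypothetical names used only for illustration.

```python
import json

def classify_frame(message):
    """Return a (kind, value) pair for one incoming WebSocket message."""
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: an InteractionResponse to be parsed with struct.
        return ("binary", bytes(message))
    try:
        parsed = json.loads(message)
    except json.JSONDecodeError:
        # Text frame that is not JSON, e.g. a plain-text error from the server.
        return ("plain_text", message)
    if isinstance(parsed, dict):
        # Structured JSON message such as sessionReady or errorResponse.
        return ("json", parsed)
    return ("plain_text", message)
```

Keeping the classification separate from the handling makes it easy to unit-test the routing without a live connection.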

***

## Messages Reference

### Server -> Client Messages

## The SessionReadyMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. The server then waits for text input.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Model-specific session parameters for Oris Voice, including `sample_rate`, `channels`, and `bits_per_sample`.","properties":{"sample_rate":{"type":"integer","description":"Audio sample rate in Hz."},"channels":{"type":"integer","description":"Number of audio channels."},"bits_per_sample":{"type":"integer","description":"Bits per audio sample."}}}}}}}}}}
```

## The InteractionResponseMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionResponseMessage":{"type":"object","description":"Binary message containing a streaming audio chunk. The server sends these after receiving text input and `EndInteraction`.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte  ]  Is final flag   — uint8, 1 = last chunk, 0 = more coming\n[16 bytes]  Interaction ID  — UUID bytes\n[8 bytes ]  Timestamp       — uint64, milliseconds since Unix epoch\n[4 bytes ]  Usage           — uint32, usage metric (audio duration in microseconds)\n[4 bytes ]  Index           — uint32, chunk index\n[4 bytes ]  Num payloads    — uint32, number of payload entries\n\nFor each payload entry:\n  [4 bytes]  Data size       — uint32, byte length of payload data\n  [1 byte ]  Payload type    — uint8, 1 = audio\n  [N bytes]  Payload data    — raw PCM int16 audio bytes\n```\n\nPython unpack: `struct.unpack('!B16sQIII', header)` for the main header, `struct.unpack('!IB', entry)` for each payload entry.","required":["is_final","interaction_id","timestamp","usage","index","payloads"],"properties":{"is_final":{"type":"boolean","description":"`true` if this is the last audio chunk for the current interaction."},"interaction_id":{"type":"string","format":"uuid","description":"UUID identifying this interaction."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the chunk was sent."},"usage":{"type":"integer","format":"int32","description":"Audio duration in microseconds for this chunk. Used for billing."},"index":{"type":"integer","format":"int32","description":"Chunk index (0-based, incrementing)."},"payloads":{"type":"array","description":"List of payload entries. 
Each chunk typically contains one audio entry.","items":{"type":"object","required":["payload_type","data_size","data"],"properties":{"payload_type":{"type":"integer","enum":[1],"description":"`1` = audio (PCM int16, 24 kHz, mono)."},"data_size":{"type":"integer","format":"int32","description":"Byte length of the audio data."},"data":{"type":"string","format":"binary","description":"Raw PCM int16 audio bytes at 24 kHz mono."}}}}}}}}}
````

## The ErrorResponseMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions, the server may send a plain text message instead of structured JSON. Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}}}
```

### Client -> Server Messages

## The InteractionInputMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionInputMessage":{"type":"object","description":"Binary message for sending text to the server for synthesis.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte ]  Payload type   — uint8, 0 for text\n[8 bytes]  Timestamp      — uint64, milliseconds since Unix epoch\n[4 bytes]  Params size    — uint32, byte length of JSON params (0 if none)\n[N bytes]  Params JSON    — UTF-8 JSON (only present if params size > 0)\n[M bytes]  Text payload   — UTF-8 encoded text to synthesize\n```\n\nPython pack: `struct.pack('!BQI', 0, timestamp, params_size)`","required":["payload_type","timestamp","params_size","text_payload"],"properties":{"payload_type":{"type":"integer","enum":[0],"description":"Always `0` for text."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."},"params_size":{"type":"integer","format":"int32","minimum":0,"description":"Byte length of the JSON params block. `0` if no params."},"text_payload":{"type":"string","description":"UTF-8 encoded text to synthesize. The model handles sentence segmentation internally."}}}}}}
````

## The EndInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"EndInteractionMessage":{"type":"object","description":"Signal that all text has been sent and synthesis should begin. The server processes the queued text and streams audio chunks, with the last chunk marked `is_final: true`.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["endInteraction"]},"payload":{"type":"object","required":["timestamp"],"properties":{"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

## The CancelInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"CancelInteractionMessage":{"type":"object","description":"Immediately stop synthesis and discard remaining audio. No final chunk is sent.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["cancelInteraction"]},"payload":{"type":"object","properties":{"timestamp":{"type":"integer","format":"int64","nullable":true,"description":"Optional. Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

***

## Message Details

### InteractionInput (Client -> Server, Binary)

Binary message for sending text to the server.

**Binary structure:**

```
[1 byte ]  Payload type      — uint8, 0 for text
[8 bytes]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes]  Params size        — uint32, byte length of the JSON params block (0 if no params)
[N bytes]  Params JSON        — UTF-8 encoded JSON (only present if params size > 0)
[M bytes]  Text payload       — UTF-8 encoded text to synthesize
```

**Header fields** use **big-endian** byte order. In Python: `struct.pack('!BQI', payload_type, timestamp, params_size)`.

**Text requirements:**

| Property         | Value                                           |
| ---------------- | ----------------------------------------------- |
| Encoding         | UTF-8                                           |
| Payload type     | `0` (text)                                      |
| Max message size | 512 KB (entire binary message including header) |

**Example:**

```python
import json
import struct
import time

def build_text_message(text, params=None):
    """Build a binary InteractionInput message for text."""
    payload = text.encode("utf-8")
    # Optional JSON params block; its byte length goes in the header (0 if absent).
    params_bytes = json.dumps(params).encode("utf-8") if params else b""
    # Header: payload type 0 (text), timestamp in ms, params size (big-endian).
    header = struct.pack('!BQI', 0, int(time.time() * 1000), len(params_bytes))
    return header + params_bytes + payload
```
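
To check a builder against the layout above without a server, a round-trip decoder can help. The sketch below is an assumption-laden test aid: `parse_text_message` is a hypothetical inverse of the builder, and the `example_key` param is a placeholder, since this page does not list the supported params.

```python
import json
import struct
import time

HEADER_FMT = "!BQI"                        # payload type, timestamp, params size
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 13 bytes

def build_text_message(text, params=None, timestamp_ms=None):
    """Build an InteractionInput message; timestamp_ms is injectable for tests."""
    params_bytes = json.dumps(params).encode("utf-8") if params else b""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    header = struct.pack(HEADER_FMT, 0, timestamp_ms, len(params_bytes))
    return header + params_bytes + text.encode("utf-8")

def parse_text_message(data):
    """Hypothetical inverse of build_text_message, for local verification only."""
    payload_type, timestamp_ms, params_size = struct.unpack(
        HEADER_FMT, data[:HEADER_SIZE])
    params_end = HEADER_SIZE + params_size
    params = json.loads(data[HEADER_SIZE:params_end]) if params_size else None
    text = data[params_end:].decode("utf-8")
    return payload_type, timestamp_ms, params, text

# "example_key" is a placeholder, not a documented parameter.
message = build_text_message("héllo", {"example_key": "value"}, timestamp_ms=1)
```

Because the text payload has no length prefix, everything after the params block is treated as text.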

***

### InteractionResponse (Server -> Client, Binary)

Binary message containing an audio chunk. The server streams these after receiving text input and `EndInteraction`.

**Binary structure:**

```
[1 byte  ]  Is final flag     — uint8, 1 = last chunk for this interaction, 0 = more coming
[16 bytes]  Interaction ID     — UUID bytes (big-endian)
[8 bytes ]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes ]  Usage              — uint32, audio duration of this chunk in microseconds (used for billing)
[4 bytes ]  Index              — uint32, chunk index
[4 bytes ]  Num payloads       — uint32, number of payload entries that follow

For each payload entry:
  [4 bytes]  Data size          — uint32, byte length of the payload data
  [1 byte ]  Payload type       — uint8, 1 = audio
  [N bytes]  Payload data       — raw PCM int16 audio bytes
```

All multi-byte integers are **big-endian**. In Python: `struct.unpack('!B16sQIII', header_bytes)` for the main header, `struct.unpack('!IB', entry_bytes)` for each payload entry.

**Payload types:**

| Type      | Format                 | Description                           |
| --------- | ---------------------- | ------------------------------------- |
| 1 (audio) | PCM int16, 24 kHz mono | Streaming audio chunk (variable size) |

**Parsing example:**

```python
import struct, uuid

HEADER_FMT = '!B16sQIII'
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 37 bytes
ENTRY_FMT = '!IB'
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)     # 5 bytes

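def parse_response(data):
    """Parse a binary InteractionResponse into header fields and audio chunks."""
    if len(data) < HEADER_SIZE:
        raise ValueError("InteractionResponse is shorter than its 37-byte header")
    is_final, uuid_bytes, timestamp, usage, index, num_payloads = \
        struct.unpack(HEADER_FMT, data[:HEADER_SIZE])

    offset = HEADER_SIZE
    audio_chunks = []

    for _ in range(num_payloads):
        # Each entry: 4-byte data size, 1-byte payload type, then the data itself.
        size, ptype = struct.unpack(ENTRY_FMT, data[offset:offset + ENTRY_SIZE])
        offset += ENTRY_SIZE
        if ptype == 1 and size > 0:  # audio
            audio_chunks.append(data[offset:offset + size])
        offset += size  # skip past the data, including unknown payload types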

    return {
        'is_final': bool(is_final),
        'interaction_id': str(uuid.UUID(bytes=uuid_bytes)),
        'index': index,
        'audio_chunks': audio_chunks,
    }
```
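
For offline testing of a parser, it can be useful to fabricate frames in the same layout. The following sketch builds a synthetic `InteractionResponse` with a single audio entry. `build_response` is a hypothetical test helper, and deriving the usage value from the audio length is an assumption about how the server fills that field.

```python
import struct
import uuid

HEADER_FMT = "!B16sQIII"   # is_final, UUID, timestamp, usage, index, num payloads
ENTRY_FMT = "!IB"          # data size, payload type
SAMPLE_RATE = 24_000
BYTES_PER_SAMPLE = 2

def build_response(audio: bytes, index: int, is_final: bool,
                   interaction_id: uuid.UUID, timestamp_ms: int = 0) -> bytes:
    """Build a synthetic InteractionResponse frame (test helper, not server code)."""
    # Assumed usage: duration of the chunk in microseconds.
    usage_us = len(audio) // BYTES_PER_SAMPLE * 1_000_000 // SAMPLE_RATE
    header = struct.pack(HEADER_FMT, int(is_final), interaction_id.bytes,
                         timestamp_ms, usage_us, index, 1)
    entry = struct.pack(ENTRY_FMT, len(audio), 1)  # payload type 1 = audio
    return header + entry + audio

# 480 bytes = 240 samples = 10 ms of audio at 24 kHz.
frame = build_response(bytes(480), index=3, is_final=True,
                       interaction_id=uuid.UUID(int=1))
```

Feeding such frames to a parser lets you verify offsets and the `is_final` handling before connecting to the service.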

***

### EndInteraction vs CancelInteraction

| Message             | Purpose         | Server behavior                                                               | Use case                             |
| ------------------- | --------------- | ----------------------------------------------------------------------------- | ------------------------------------ |
| `EndInteraction`    | Graceful finish | Completes synthesis, sends remaining chunks with last marked `is_final: true` | Normal completion after sending text |
| `CancelInteraction` | Immediate stop  | Stops synthesis, discards remaining audio                                     | User interruption or abort           |
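
Both control messages are small JSON text frames. A minimal sketch of building them, following the schemas in the Messages Reference, might look like this; the function names are hypothetical.

```python
import json
import time

def _now_ms() -> int:
    """Current time in milliseconds since the Unix epoch."""
    return int(time.time() * 1000)

def build_end_interaction(timestamp_ms=None) -> str:
    """JSON text frame signalling that all text has been sent."""
    ts = _now_ms() if timestamp_ms is None else timestamp_ms
    return json.dumps({"type": "endInteraction", "payload": {"timestamp": ts}})

def build_cancel_interaction(timestamp_ms=None) -> str:
    """JSON text frame requesting an immediate stop; the timestamp is optional."""
    payload = {} if timestamp_ms is None else {"timestamp": timestamp_ms}
    return json.dumps({"type": "cancelInteraction", "payload": payload})
```

After sending `CancelInteraction`, a client should not wait for a chunk with `is_final` set, since no final chunk is sent in that case.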

***

### ErrorResponse (Server -> Client, JSON)

{% hint style="warning" %}
**Plain text errors:** In some error conditions (e.g., no backend servers available), the server may send a plain text message instead of a structured JSON `ErrorResponse`. Your client should handle non-JSON text messages gracefully.
{% endhint %}

**Error codes:**

| Code                  | Description                              |
| --------------------- | ---------------------------------------- |
| `AUTH_FAILED`         | Invalid API key                          |
| `UNAUTHORIZED`        | Caller lacks permission                  |
| `MISSING_CONFIG_ID`   | `config_id` query parameter not provided |
| `INVALID_MESSAGE`     | Malformed or unsupported message payload |
| `INVALID_HEADERS`     | Missing or invalid headers               |
| `MODEL_NOT_FOUND`     | Config ID not found or invalid           |
| `BACKEND_UNAVAILABLE` | No healthy inference backend available   |
| `RATE_LIMITED`        | Too many requests                        |
| `TIMEOUT`             | Operation exceeded processing time       |
| `CANCELLED`           | Interaction cancelled by client          |
| `INTERNAL_ERROR`      | Unexpected server error                  |
| `FRAME_SIZE_EXCEEDED` | Message exceeded 512 KB limit            |
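
One way to surface these errors in client code is to turn text frames into exceptions. In the sketch below, `OrisVoiceError` and `error_from_text_frame` are hypothetical names, and treating every non-JSON text frame as an error is an assumption based on the warning above.

```python
import json

class OrisVoiceError(Exception):
    """Hypothetical exception carrying the fields of an ErrorResponse."""
    def __init__(self, code, message, interaction_id=None):
        super().__init__(f"{code}: {message}" if code else message)
        self.code = code
        self.message = message
        self.interaction_id = interaction_id

def error_from_text_frame(text: str):
    """Return an OrisVoiceError for an error frame, or None for other messages."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: a non-JSON text frame is a plain-text error.
        return OrisVoiceError(None, text.strip())
    if not isinstance(parsed, dict) or parsed.get("type") != "errorResponse":
        return None
    payload = parsed.get("payload") or {}
    return OrisVoiceError(payload.get("code"), payload.get("message", ""),
                          payload.get("interaction_id"))
```

Callers can then branch on `err.code`, for example to reconnect on `BACKEND_UNAVAILABLE` but fail immediately on `AUTH_FAILED`.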

***

## Rate Limits & Constraints

| Constraint            | Value                                              |
| --------------------- | -------------------------------------------------- |
| Max message size      | 512 KB per message                                 |
| Max generation length | \~30 seconds per interaction (360 tokens at 12 Hz) |

Exceeding a limit results in an `ErrorResponse` with the appropriate code; for example, a message over 512 KB returns `FRAME_SIZE_EXCEEDED`.
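
The arithmetic behind these limits can be worked through as follows. This is a sketch: interpreting "512 KB" as 512 × 1024 bytes is an assumption, and `fits_in_one_message` is a hypothetical helper.

```python
SAMPLE_RATE = 24_000       # Hz
BYTES_PER_SAMPLE = 2       # 16-bit mono
MAX_TOKENS = 360           # from the table above
TOKEN_RATE_HZ = 12         # tokens per second of audio
MAX_MESSAGE_BYTES = 512 * 1024   # assumption: 1 KB = 1024 bytes

# 360 tokens at 12 Hz gives the ~30 second cap.
max_seconds = MAX_TOKENS / TOKEN_RATE_HZ

# Upper bound on PCM bytes a single interaction can produce at that cap.
max_pcm_bytes = int(max_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)

def fits_in_one_message(message: bytes) -> bool:
    """Check a complete binary message (header included) against the size limit."""
    return len(message) <= MAX_MESSAGE_BYTES
```

Under these assumptions, one interaction yields at most about 1.44 MB of raw audio.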

***

## Best Practices

### Text Input

* Send the full text in a single `InteractionInput` message, then immediately send `EndInteraction`
* The model handles sentence segmentation and streaming internally
* For very long texts, consider splitting into sentences and making separate requests
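
For the long-text case, a simple splitter along sentence boundaries might look like the sketch below. The regex is a naive heuristic that will mis-split abbreviations, and both `split_into_segments` and the `max_chars` default are illustrative choices rather than values from the API.

```python
import re

def split_into_segments(text: str, max_chars: int = 300) -> list:
    """Group sentences into segments of roughly max_chars characters each."""
    # Split after ., ! or ? when followed by whitespace (naive heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current segment.
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each segment would then be sent as its own `InteractionInput` followed by `EndInteraction`. A single sentence longer than `max_chars` is kept whole rather than cut mid-sentence.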

### Streaming Playback

* Process audio chunks as they arrive for lowest perceived latency
* Buffer a small amount (2-3 chunks) before starting playback to absorb network jitter
* The server generates audio faster than realtime, so chunks will arrive ahead of playback
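
The pre-buffering advice above can be sketched as a small queue that withholds audio until enough chunks have arrived. `PrebufferQueue` is a hypothetical class; a real player would hand the released chunks to an audio output device.

```python
from collections import deque

class PrebufferQueue:
    """Hold back audio until a few chunks have arrived, then release everything."""

    def __init__(self, prebuffer_chunks: int = 3):
        self._prebuffer = prebuffer_chunks
        self._chunks = deque()
        self._started = False

    def push(self, chunk: bytes, is_final: bool = False) -> None:
        self._chunks.append(chunk)
        # Start once enough chunks are buffered, or at once if the stream is
        # already complete (short utterances may never reach the threshold).
        if is_final or len(self._chunks) >= self._prebuffer:
            self._started = True

    def pop_ready(self) -> list:
        """Return chunks that may be played now (empty while still pre-buffering)."""
        if not self._started:
            return []
        ready = list(self._chunks)
        self._chunks.clear()
        return ready
```

Once playback has started, every new chunk is released immediately, which is appropriate when chunks arrive faster than they are played.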

### Error Handling

* Handle both JSON `ErrorResponse` messages and plain text error strings
* Implement exponential backoff for reconnection
* Check the `SessionReady` message before sending any input
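
A reconnection schedule with exponential backoff could be generated as in the sketch below. The defaults (`base`, `factor`, `max_delay`, `attempts`) are illustrative and not recommendations from the API, and `backoff_delays` is a hypothetical helper.

```python
import random

def backoff_delays(base=0.5, factor=2.0, max_delay=30.0, attempts=6,
                   jitter=0.0, rng=random.random):
    """Yield reconnect delays in seconds: base * factor**n, capped at max_delay.

    jitter is a fraction in [0, 1]; each delay is scaled by a random factor
    in [1 - jitter, 1 + jitter] so that clients do not reconnect in lockstep.
    """
    for attempt in range(attempts):
        delay = min(max_delay, base * factor ** attempt)
        yield delay * (1 + jitter * (2 * rng() - 1))
```

A reconnect loop would sleep for each yielded delay (for example with `await asyncio.sleep(delay)`) before retrying, and give up once the generator is exhausted.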

***

## Complete Example

```python
import asyncio
import json
import struct
import time
import wave
import os

import websockets
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("OJIN_API_KEY", "")
CONFIG_ID = os.getenv("OJIN_CONFIG_ID", "")
WS_URL = f"wss://models.ojin.ai/realtime?config_id={CONFIG_ID}"

SAMPLE_RATE = 24000

def build_text_message(text):
    """Build a binary InteractionInput for text payload."""
    header = struct.pack('!BQI', 0, int(time.time() * 1000), 0)
    return header + text.encode('utf-8')

def parse_response(data):
    """Parse a binary InteractionResponse."""
    fmt = '!B16sQIII'
    hdr_size = struct.calcsize(fmt)
    is_final, _, _, _, _, num_payloads = struct.unpack(fmt, data[:hdr_size])

    offset = hdr_size
    audio = []
    for _ in range(num_payloads):
        size, ptype = struct.unpack('!IB', data[offset:offset + 5])
        offset += 5
        if ptype == 1 and size > 0:
            audio.append(data[offset:offset + size])
        offset += size

    return bool(is_final), audio

async def main():
    headers = websockets.Headers()
    headers["Authorization"] = API_KEY

    async with websockets.connect(
        WS_URL,
        additional_headers=headers,
        open_timeout=None,
        ping_timeout=None,
    ) as ws:
        # 1. Wait for session ready
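        while True:
            msg = await ws.recv()
            if not isinstance(msg, str):
                continue  # ignore binary frames until the session is ready
            try:
                parsed = json.loads(msg)
            except json.JSONDecodeError:
                # The server may send a plain-text error instead of JSON
                raise RuntimeError(f"Server error: {msg}") from None
            if parsed.get("type") == "sessionReady":
                print(f"Session ready (trace_id: {parsed['payload'].get('trace_id')})")
                break
            if parsed.get("type") == "errorResponse":
                raise RuntimeError(parsed["payload"]["message"])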

        # 2. Send text + end interaction
        text = "Hello! This is a demonstration of Ojin Oris Voice text to speech."
        await ws.send(build_text_message(text))
        await ws.send(json.dumps({
            "type": "endInteraction",
            "payload": {"timestamp": int(time.time() * 1000)},
        }))
        print(f"Sent text ({len(text)} chars), waiting for audio...")

        # 3. Receive audio chunks
        audio_data = []
        chunk_count = 0
        t0 = time.monotonic()

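        while True:
            msg = await ws.recv()
            if isinstance(msg, str):
                try:
                    parsed = json.loads(msg)
                except json.JSONDecodeError:
                    # The server may send a plain-text error instead of JSON
                    raise RuntimeError(f"Server error: {msg}") from None
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])
                continue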

            is_final, chunks = parse_response(msg)
            audio_data.extend(chunks)
            chunk_count += len(chunks)

            if is_final:
                break

        elapsed = time.monotonic() - t0

        # 4. Write WAV file
        pcm = b"".join(audio_data)
        duration = len(pcm) / (SAMPLE_RATE * 2)
        output_path = f"tts-output-{int(time.time())}.wav"

        with wave.open(output_path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm)

        print(f"Saved {output_path}")
        print(f"  Chunks: {chunk_count}")
        print(f"  Duration: {duration:.2f}s")
        print(f"  Elapsed: {elapsed:.2f}s")
        if duration > 0:
            print(f"  RTF: {elapsed / duration:.2f}x")

asyncio.run(main())
```

***

## Troubleshooting

### Connection Issues

* Verify API key and config ID
* Check that the config exists in the dashboard
* Ensure network allows WebSocket connections (port 443)
* The `Authorization` header uses the raw API key (no `Bearer` prefix)

### No Audio Received

* Confirm you received `SessionReady` before sending text
* Make sure you send `EndInteraction` after the text input — synthesis does not start until the server receives it
* Check message size is under 512 KB

### Audio Quality Issues

* Verify the output is written as 24 kHz, 16-bit mono WAV
* Check the `language` parameter matches your input text, or use `"Auto"`
* For voice cloning, ensure the reference audio is clean and at least 5 seconds long

### Unexpected Silence or Truncation

* Check `max_new_tokens` — the default (360) caps output at \~30 seconds
* If the text is very long, consider splitting into smaller segments

***

