# Welcome

> **The Real-Time GenAI Platform.** Instant AI experiences that feel life-like, natural, and human.

***

## Get started in 2 minutes

The fastest way to deploy a conversational AI agent with a lifelike avatar:

**1. Get your API key** → [Create one here](/getting-started/authentication)

**2. Create an agent** in the [dashboard](https://ojin.ai/dashboard) — pick a face, voice, and personality

**3. Embed on your site:**

```html
<script src="https://widget.ojin.ai/ojin-agent.js"></script>
<ojin-agent agent-id="your-agent-id"></ojin-agent>
```

That's it. Your users get a live conversational agent with video, audio, and a real-time avatar. [Full Human Agent docs →](/apps/overview)

***

## What Ojin offers

### Apps — deploy in minutes

#### Human Agent

A complete conversational AI agent with a realistic visual avatar. Speech in, speech out — with synchronized lip movements and expressions. Two modes:

* **Ojin Agent** — Ojin handles everything (STT, LLM, TTS, avatar). You configure the personality.
* **Third-Party Agent** — bring your own speech-to-speech provider (Hume, ElevenLabs, Ultravox, etc.). Ojin adds the face.

[Get started →](/apps/overview)

### Models — build custom pipelines

#### ojin/oris-portrait

Oris Portrait is a real-time lip-sync model. It transforms a single reference image into a natural animated persona with audio-synchronized lip movements and expressions. Sub-200 ms latency, up to 720p.

[Learn more →](/models/oris-portrait)

#### ojin/oris-voice

Multilingual text-to-speech with voice cloning, preset speakers, and promptable voice design. 24 kHz streaming output, 10 languages.

[Learn more →](/models/oris-voice)

***

## Core features

* **Human Agent** — end-to-end conversational AI with a lifelike visual avatar. One widget embed, no pipeline assembly.
* **Real-time streaming** — ultra-low-latency WebSocket and WebRTC transport for production-grade experiences.
* **One-shot personas** — create a lifelike persona from a single reference image. No training, ready immediately.
* **API-first** — integrate via WebSocket, REST API, or drop-in widget. Works with Pipecat, LiveKit Agents, any stack.
* **Auto-scale** — serverless orchestration scales to zero when idle, scales up under load.
* **Cost-effective** — competitive per-minute pricing with no commitments. $10 free credits to start.

***

## Use cases

* **Customer Support** — deploy lifelike agents for personalized 24/7 support
* **Sales** — greet, qualify, and convert leads with conversational avatars
* **Education** — build interactive tutors with natural speech and expressions
* **Onboarding & Training** — conversational AI for employee learning
* **Brand Ambassador** — always-on, always-on-brand digital representative
* **Healthcare** — empathetic virtual health assistants

***

## LLM-Ready Docs

{% hint style="info" %}
This documentation is optimized for LLM access:

* **MCP Server:** [`docs.ojin.ai/~gitbook/mcp`](https://docs.ojin.ai/~gitbook/mcp)
* [**llms.txt**](https://docs.ojin.ai/llms.txt) — structured index
* [**llms-full.txt**](https://docs.ojin.ai/llms-full.txt) — full content
* Append `.md` to any page URL for raw markdown

**Note:** `llms.txt` and `llms-full.txt` are auto-generated from the published sitemap. Human Agent pages will appear once they are published in the navigation.
{% endhint %}


# Quickstart

> Get up and running with Ojin in minutes.

This guide walks you through both paths: using a real-time persona model (e.g. `ojin/oris-portrait`) in your own pipeline, or deploying a full conversational agent with the Human Agent app.

## Step 1: Create an API Key

[Get your API key from the Ojin dashboard](/getting-started/authentication). Ojin uses this key to authenticate requests from your applications.

{% hint style="warning" %}
Never hardcode your API key directly in your application code or commit it to version control.
{% endhint %}
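
One way to follow this advice is to read the key from the environment at startup. The following is a minimal sketch, assuming the key is stored in an environment variable named `OJIN_API_KEY`; the helper `load_api_key` is hypothetical and not part of any Ojin SDK.

```python
import os


def load_api_key(env_var: str = "OJIN_API_KEY") -> str:
    """Return the API key from the environment, failing fast if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it or load it from your secret manager"
        )
    return key
```

Failing at startup, rather than on the first request, makes a missing key easier to spot during deployment.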

## Step 2: Choose Your Path

Ojin offers two ways to build with real-time AI:

### Models — Build Your Own Pipeline

Use Ojin's real-time models (`ojin/oris-portrait`, `ojin/oris-voice`) for inference in your own conversational pipeline. You control the full stack — STT, LLM, TTS — and use Ojin for the visual avatar layer.

1. [Create a Persona](/models/oris-portrait/creating-persona) or use a [Persona Template](/models/oris-portrait/using-persona-template)
2. [Integrate with your application](/models/oris-portrait/integrations) using Pipecat or WebSocket

### Apps — Deploy a Full Agent

Deploy an end-to-end conversational AI agent with a visual avatar. Ojin handles the entire pipeline — or bring your own speech-to-speech provider. No pipeline assembly required.

1. [Create & configure your agent](/apps/overview/configure) in the dashboard (using your API key from Step 1)
2. [Embed the widget](/apps/overview/widget-integration) on your site — one HTML tag, done

## Troubleshooting

[See the troubleshooting guide for common issues](https://github.com/journee-live/ojin/blob/main/docs/public/troubleshooting.md).

## Next Steps

#### Models API Reference

Dive deeper into the real-time model API

[View API Reference →](/models/oris-portrait/api)

#### Human Agent

Learn more about the full agent product

[View Human Agent →](/apps/overview)


# Get your API key

All requests to the Ojin API require authentication using API keys. This guide explains how to create, manage, and securely use API keys in your applications.

## Creating an API Key

1. **Sign in** to your [Ojin Dashboard](https://ojin.ai)
2. Navigate to the **API Keys** section
3. Click **Create API Key**
4. Enter a descriptive name for your key (e.g., "Development", "Production")
5. Click **Create**
6. **Important**: Copy and store your API key securely. It will only be shown once.

{% hint style="warning" %}
API keys provide full access to your Ojin resources. Never expose them in client-side code or public repositories, and never share them with unauthorized individuals.
{% endhint %}

## API Key Best Practices

* **Separate keys** for development and production environments
* **Use environment variables** to store API keys without exposing them publicly
* **Restrict permissions** to only what's needed for each key
* **Rotate keys** periodically for enhanced security
* **Revoke compromised keys** immediately in your dashboard
* **Use secret management services** in production environments
* **Monitor usage** to detect unusual patterns that might indicate a leak


# REST API

Use the REST API to manage persistent Ojin resources such as model configurations and assets.

The raw OpenAPI specification for this API is also available at [openapi.gitbook.com/o/V9IIQ3Cw10PlDcbzN32h/spec/ojin-rest-api.json](https://openapi.gitbook.com/o/V9IIQ3Cw10PlDcbzN32h/spec/ojin-rest-api.json).

{% hint style="info" %}
This page covers the HTTP REST API. It is separate from Ojin's realtime model APIs, which use WebSockets for streaming media. For example, for `ojin/oris-portrait`, see [Realtime API Reference](/models/oris-portrait/api).
{% endhint %}

## Authentication

Authenticated REST endpoints require an API key in the `X-API-Key` header.

```bash
curl https://api.ojin.ai/v1/model-configs \
  -H "X-API-Key: $OJIN_API_KEY"
```

For API key setup guidance, see [Get your API key](/getting-started/authentication).
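
The curl call above might be mirrored in Python roughly as follows. This is a sketch using only the standard library; the helper name `build_request` is hypothetical, and the base URL is taken from the `servers` entry of the OpenAPI specification.

```python
import urllib.request

BASE_URL = "https://api.ojin.ai/v1"  # server URL listed in the OpenAPI spec


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an authenticated request to the Ojin REST API."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"X-API-Key": api_key, "Accept": "application/json"},
        method=method,
    )
```

To send the request, pass the result to `urllib.request.urlopen(req, timeout=10)` and decode the JSON body. Keeping request construction separate from sending makes the header logic easy to test without network access.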

## Model Configurations

Use model configurations to create and manage reusable settings for Ojin models.

`model_configurations` is variant-specific. To determine the correct object shape, first fetch the target model variant and read its `configuration_schema`, then build your `model_configurations` payload to match that JSON Schema.
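
The workflow above can be sketched as a pre-flight check before a payload is sent. This is not a full JSON Schema validator: it only inspects the top-level `required`, `properties`, and `additionalProperties` keywords. The example schema and its `persona_image` and `fps` fields are invented for illustration; real `configuration_schema` contents come from the model variant endpoint.

```python
def missing_required(schema: dict, payload: dict) -> list:
    """List top-level keys the schema marks as required but the payload lacks."""
    return [key for key in schema.get("required", []) if key not in payload]


def unknown_keys(schema: dict, payload: dict) -> list:
    """List payload keys the schema does not declare, when extras are forbidden."""
    if schema.get("additionalProperties", True) is not False:
        return []
    declared = schema.get("properties", {})
    return [key for key in payload if key not in declared]


# Hypothetical schema, for illustration only.
example_schema = {
    "type": "object",
    "properties": {"persona_image": {"type": "string"}, "fps": {"type": "integer"}},
    "required": ["persona_image"],
    "additionalProperties": False,
}
example_payload = {"fps": 25, "resolution": "720p"}

print(missing_required(example_schema, example_payload))  # ['persona_image']
print(unknown_keys(example_schema, example_payload))      # ['resolution']
```

For nested schemas or type checks, a dedicated JSON Schema validation library is a better fit than this sketch.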

## List Model Configurations

> Retrieves a list of Model Configurations. By default, only organization-owned configurations are returned. Use `source=template` to retrieve template configurations instead. Supports pagination and filtering by model variant.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Configurations","description":"Operations related to user-defined configurations of model variants for API clients and product integrations."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelVariantIdQueryParameter":{"name":"model_variant_id","in":"query","required":false,"description":"Filter model configurations by the model_variant_id.","schema":{"type":"string"}},"SourceQueryParameter":{"name":"source","in":"query","required":false,"description":"Filter items by ownership source. Use 'org' for items owned by the user's organization, 'template' for items from the template organization. Defaults to 'org'.","schema":{"type":"string","enum":["org","template"],"default":"org"}},"LimitQueryParameter":{"name":"limit","in":"query","required":false,"description":"Number of items to return per page.","schema":{"type":"integer","default":20,"minimum":1,"maximum":100}},"OffsetQueryParameter":{"name":"offset","in":"query","required":false,"description":"Number of items to skip for pagination.","schema":{"type":"integer","default":0,"minimum":0}}},"schemas":{"ModelConfig":{"type":"object","description":"Represents a specific configuration of a ModelVariant.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"organization_id":{"type":"string","readOnly":true,"description":"Owning organization ID (from external IdP)."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured. NOT NULL."},"created_by":{"type":"string","readOnly":true,"description":"Creator's user ID (from external IdP)."},"title":{"type":"string","description":"Title for the configuration. NOT NULL, unique per organization_id."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant, adhering to its schema. NOT NULL, defaults to '{}'.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation. Defaults to CURRENT_TIMESTAMP."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update. Auto-updates."}},"required":["model_config_id","organization_id","model_id","model_variant_id","created_by","title","model_configurations","created_at","updated_at"]},"Pagination":{"type":"object","properties":{"limit":{"type":"integer","description":"The number of items returned in the current page."},"offset":{"type":"integer","description":"The number of items skipped before starting the current page."},"total_items":{"type":"integer","description":"The total number of items available that match the query."}},"required":["limit","offset","total_items"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-configs":{"get":{"tags":["Model Configurations"],"summary":"List Model Configurations","description":"Retrieves a list of Model Configurations. By default returns only organization-owned configurations. Use source='template' to retrieve template configurations instead. Supports pagination and filtering by model variant.","operationId":"listModelConfigs","parameters":[{"$ref":"#/components/parameters/ModelVariantIdQueryParameter"},{"$ref":"#/components/parameters/SourceQueryParameter"},{"$ref":"#/components/parameters/LimitQueryParameter"},{"$ref":"#/components/parameters/OffsetQueryParameter"}],"responses":{"200":{"description":"A paginated list of model configurations.","content":{"application/json":{"schema":{"type":"object","properties":{"data":{"type":"array","items":{"$ref":"#/components/schemas/ModelConfig"}},"pagination":{"$ref":"#/components/schemas/Pagination"}}}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"}}}}}}
```
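
The query parameters in the specification above (`limit`, `offset`, `source`, `model_variant_id`) can be combined to page through results. The sketch below assumes a caller-supplied `fetch_page` function (hypothetical) that takes a query string, performs the HTTP call, and returns the decoded JSON body with the `data` and `pagination` fields described in the 200 response.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode


def list_configs_query(limit: int = 20, offset: int = 0, source: str = "org",
                       model_variant_id: Optional[str] = None) -> str:
    """Build the query string for GET /model-configs from the documented parameters."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if source not in ("org", "template"):
        raise ValueError("source must be 'org' or 'template'")
    params = {"limit": limit, "offset": offset, "source": source}
    if model_variant_id is not None:
        params["model_variant_id"] = model_variant_id
    return urlencode(params)


def iter_configs(fetch_page: Callable[[str], dict], limit: int = 20,
                 **filters) -> Iterator[dict]:
    """Yield every configuration by advancing the offset until total_items is reached."""
    offset = 0
    while True:
        body = fetch_page(list_configs_query(limit=limit, offset=offset, **filters))
        items = body.get("data", [])
        yield from items
        offset += len(items)
        # Stop on an empty page too, so a miscounted total cannot loop forever.
        if not items or offset >= body["pagination"]["total_items"]:
            break
```

Validating the bounds locally mirrors the `minimum` and `maximum` constraints in the specification and avoids a round trip that would likely end in a 400 response.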

## POST /model-configs

> Create a new Model Configuration

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Configurations","description":"Operations related to user-defined configurations of model variants for API clients and product integrations."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"schemas":{"ModelConfigCreationRequest":{"type":"object","description":"Payload for creating a new Model Configuration.","properties":{"title":{"type":"string"},"model_variant_id":{"type":"string"},"model_configurations":{"type":"object","additionalProperties":true,"default":{}}},"required":["title","model_variant_id"]},"ModelConfig":{"type":"object","description":"Represents a specific configuration of a ModelVariant.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"organization_id":{"type":"string","readOnly":true,"description":"Owning organization ID (from external IdP)."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured. NOT NULL."},"created_by":{"type":"string","readOnly":true,"description":"Creator's user ID (from external IdP)."},"title":{"type":"string","description":"Title for the configuration. NOT NULL, unique per organization_id."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant, adhering to its schema. NOT NULL, defaults to '{}'.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation. Defaults to CURRENT_TIMESTAMP."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update. Auto-updates."}},"required":["model_config_id","organization_id","model_id","model_variant_id","created_by","title","model_configurations","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-configs":{"post":{"tags":["Model Configurations"],"summary":"Create a new Model Configuration","operationId":"createModelConfig","requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelConfigCreationRequest"}}}},"responses":{"201":{"description":"Model Configuration created.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelConfig"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"description":"Referenced ModelVariant not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}}}
```
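
A request body matching `ModelConfigCreationRequest` might be assembled as in the sketch below. The helper name `creation_payload` is hypothetical; the field names and the required set (`title`, `model_variant_id`) come from the specification above.

```python
import json
from typing import Optional


def creation_payload(title: str, model_variant_id: str,
                     model_configurations: Optional[dict] = None) -> bytes:
    """Serialize a ModelConfigCreationRequest body for POST /model-configs."""
    if not title or not model_variant_id:
        raise ValueError("title and model_variant_id are required")
    body = {
        "title": title,
        "model_variant_id": model_variant_id,
        # The spec defaults this field to {} when it is omitted.
        "model_configurations": model_configurations or {},
    }
    return json.dumps(body).encode("utf-8")
```

Send the resulting bytes with a `Content-Type: application/json` header. A 404 response to this call means the referenced model variant was not found, not that the endpoint is missing.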

## GET /model-configs/{model\_config\_id}

> Retrieve a specific Model Configuration

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Configurations","description":"Operations related to user-defined configurations of model variants for API clients and product integrations."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelConfigIdPathParameter":{"name":"model_config_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Model Configuration.","schema":{"type":"string","format":"uuid"}}},"schemas":{"ModelConfig":{"type":"object","description":"Represents a specific configuration of a ModelVariant.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"organization_id":{"type":"string","readOnly":true,"description":"Owning organization ID (from external IdP)."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured. NOT NULL."},"created_by":{"type":"string","readOnly":true,"description":"Creator's user ID (from external IdP)."},"title":{"type":"string","description":"Title for the configuration. NOT NULL, unique per organization_id."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant, adhering to its schema. NOT NULL, defaults to '{}'.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation. Defaults to CURRENT_TIMESTAMP."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update. Auto-updates."}},"required":["model_config_id","organization_id","model_id","model_variant_id","created_by","title","model_configurations","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-configs/{model_config_id}":{"get":{"tags":["Model Configurations"],"summary":"Retrieve a specific Model Configuration","operationId":"getModelConfigById","parameters":[{"$ref":"#/components/parameters/ModelConfigIdPathParameter"}],"responses":{"200":{"description":"Model Configuration details.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelConfig"}}}},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```

## PUT /model-configs/{model\_config\_id}

> Update an existing Model Configuration

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Configurations","description":"Operations related to user-defined configurations of model variants for API clients and product integrations."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelConfigIdPathParameter":{"name":"model_config_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Model Configuration.","schema":{"type":"string","format":"uuid"}}},"schemas":{"ModelConfigUpdateRequest":{"type":"object","description":"Payload for updating a Model Configuration (title and parameters only).","properties":{"title":{"type":"string"},"model_configurations":{"type":"object","additionalProperties":true}},"required":["title","model_configurations"]},"ModelConfig":{"type":"object","description":"Represents a specific configuration of a ModelVariant.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"organization_id":{"type":"string","readOnly":true,"description":"Owning organization ID (from external IdP)."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured. NOT NULL."},"created_by":{"type":"string","readOnly":true,"description":"Creator's user ID (from external IdP)."},"title":{"type":"string","description":"Title for the configuration. NOT NULL, unique per organization_id."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant, adhering to its schema. NOT NULL, defaults to '{}'.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation. Defaults to CURRENT_TIMESTAMP."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update. Auto-updates."}},"required":["model_config_id","organization_id","model_id","model_variant_id","created_by","title","model_configurations","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-configs/{model_config_id}":{"put":{"tags":["Model Configurations"],"summary":"Update an existing Model Configuration","operationId":"updateModelConfig","parameters":[{"$ref":"#/components/parameters/ModelConfigIdPathParameter"}],"requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelConfigUpdateRequest"}}}},"responses":{"200":{"description":"Model Configuration updated.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelConfig"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```
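
Because `ModelConfigUpdateRequest` requires both `title` and `model_configurations`, changing only one of them means resending the other unchanged. A minimal sketch of that read-modify-write step follows, assuming `current` is a `ModelConfig` object previously fetched with GET; the helper name `update_payload` is hypothetical.

```python
from typing import Optional


def update_payload(current: dict, title: Optional[str] = None,
                   model_configurations: Optional[dict] = None) -> dict:
    """Build a ModelConfigUpdateRequest, reusing current values for unchanged fields."""
    return {
        "title": current["title"] if title is None else title,
        "model_configurations": (
            current["model_configurations"]
            if model_configurations is None
            else model_configurations
        ),
    }
```

Read-only fields such as `model_config_id` and `created_at` are deliberately left out, since the update schema accepts the title and parameters only.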

## DELETE /model-configs/{model\_config\_id}

> Delete a Model Configuration

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Configurations","description":"Operations related to user-defined configurations of model variants for API clients and product integrations."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelConfigIdPathParameter":{"name":"model_config_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Model Configuration.","schema":{"type":"string","format":"uuid"}},"ReferencedByQueryParameter":{"name":"referenced_by","in":"query","required":false,"description":"Delete reference to the agent configuration.","schema":{"type":"string","format":"uuid"}}},"responses":{"NoContentSuccess":{"description":"Operation successful, no content to return."},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}},"schemas":{"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}}},"paths":{"/model-configs/{model_config_id}":{"delete":{"tags":["Model Configurations"],"summary":"Delete a Model Configuration","operationId":"deleteModelConfig","parameters":[{"$ref":"#/components/parameters/ModelConfigIdPathParameter"},{"$ref":"#/components/parameters/ReferencedByQueryParameter"}],"responses":{"204":{"$ref":"#/components/responses/NoContentSuccess"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"},"409":{"description":"Conflict - Cannot delete, referenced by AgentConfig.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}}}
```
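
The status codes listed for this endpoint can be turned into readable outcomes on the client side. The sketch below is one possible mapping; the wording of each message is a paraphrase of the specification, and the helper name `explain_delete` is hypothetical.

```python
DELETE_OUTCOMES = {
    204: "deleted",
    401: "API key is missing, invalid, or expired",
    403: "API key does not have permission for this configuration",
    404: "configuration not found",
    409: "cannot delete: still referenced by an agent configuration",
}


def explain_delete(status: int) -> str:
    """Translate a DELETE /model-configs/{model_config_id} status code into a message."""
    return DELETE_OUTCOMES.get(status, f"unexpected status {status}")
```

A 204 response carries no body, so clients should not try to parse JSON from it. A 409 is the case most worth surfacing to users, because the fix is to remove the reference from the agent configuration first.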

## Model Variants

Use model variant endpoints to discover public variants and retrieve the `configuration_schema` that defines the expected shape of `model_configurations`.

## List all Model Variants

> List all Model Variants.\
> If authenticated, returns variants based on filters (defaulting to all if no filters provided).\
> If unauthenticated, ONLY returns variants with status="public", ignoring other status filters.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Variants","description":"Operations related to specific variants of models."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelIdQueryParameter":{"name":"model_id","in":"query","required":false,"description":"Filter model variants by parent model_id.","schema":{"type":"string"}},"StatusQueryParameter":{"name":"status","in":"query","required":false,"description":"Filter by status.","schema":{"type":"string"}},"TagsQueryParameter":{"name":"tags","in":"query","required":false,"description":"Filter model variants by a comma-separated list of tags (AND logic).","style":"form","explode":false,"schema":{"type":"array","items":{"type":"string"}}},"LimitQueryParameter":{"name":"limit","in":"query","required":false,"description":"Number of items to return per page.","schema":{"type":"integer","default":20,"minimum":1,"maximum":100}},"OffsetQueryParameter":{"name":"offset","in":"query","required":false,"description":"Number of items to skip for pagination.","schema":{"type":"integer","default":0,"minimum":0}}},"schemas":{"ModelVariant":{"type":"object","description":"Represents a specific variant of a Model.","properties":{"model_variant_id":{"type":"string","description":"Client-provided unique identifier (e.g., \"ojin/oris-v1/standard\")."},"model_id":{"type":"string","description":"Identifier of the parent Model. NOT NULL."},"title":{"type":"string","description":"Display name for the variant. NOT NULL, unique per model_id."},"description":{"type":"object","additionalProperties":{"type":"string"},"description":"UI display texts. NOT NULL, defaults to '{}'."},"preview_media_url":{"type":"string","format":"url","nullable":true,"description":"URL for a preview media."},"configuration_schema":{"type":"object","description":"JSON schema for ModelConfig.model_configurations. NOT NULL, defaults to '{}'.","additionalProperties":true},"tags":{"type":"array","items":{"type":"string"},"description":"List of descriptive tags. NOT NULL, defaults to '[]'."},"status":{"type":"string","description":"Status (e.g., 'available', 'deprecated'). NOT NULL. List managed in code."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of creation. Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of last update. Server-generated. Auto-updates.","readOnly":true}},"required":["model_variant_id","model_id","title","description","configuration_schema","tags","status","created_at","updated_at"]},"Pagination":{"type":"object","properties":{"limit":{"type":"integer","description":"The number of items returned in the current page."},"offset":{"type":"integer","description":"The number of items skipped before starting the current page."},"total_items":{"type":"integer","description":"The total number of items available that match the query."}},"required":["limit","offset","total_items"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-variants":{"get":{"tags":["Model Variants"],"summary":"List all Model Variants","description":"List all Model Variants.\nIf authenticated, returns variants based on filters (defaulting to all if no filters provided).\nIf unauthenticated, ONLY returns variants with status=\"public\", ignoring other status filters.","operationId":"listModelVariants","parameters":[{"$ref":"#/components/parameters/ModelIdQueryParameter"},{"$ref":"#/components/parameters/StatusQueryParameter"},{"$ref":"#/components/parameters/TagsQueryParameter"},{"$ref":"#/components/parameters/LimitQueryParameter"},{"$ref":"#/components/parameters/OffsetQueryParameter"}],"responses":{"200":{"description":"A paginated list of model variants.","content":{"application/json":{"schema":{"type":"object","properties":{"data":{"type":"array","items":{"$ref":"#/components/schemas/ModelVariant"}},"pagination":{"$ref":"#/components/schemas/Pagination"}}}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"}}}}}}
```
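
The `pagination` object in the 200 response carries enough information to walk every page. The following is a minimal sketch in Python, not an official client. It assumes a caller-supplied `fetch_page(limit, offset)` function (hypothetical, not part of the Ojin API) that performs the HTTP request and returns the decoded JSON body with `data` and `pagination` keys.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def next_offset(pagination: Dict[str, int]) -> Optional[int]:
    """Return the offset of the next page, or None when nothing is left.

    Uses the three required fields of the Pagination schema:
    limit, offset and total_items.
    """
    if pagination["limit"] <= 0:
        # An empty page cannot advance the cursor, so stop here.
        return None
    following = pagination["offset"] + pagination["limit"]
    return following if following < pagination["total_items"] else None


def iter_items(
    fetch_page: Callable[[int, int], Dict[str, Any]],  # hypothetical helper
    page_size: int = 20,
) -> Iterator[Any]:
    """Yield every item across all pages, one request per page."""
    offset: Optional[int] = 0
    while offset is not None:
        body = fetch_page(page_size, offset)
        yield from body["data"]
        offset = next_offset(body["pagination"])
```

The same helper should work for any endpoint that returns the `Pagination` schema, such as the asset listing.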

## Retrieve a specific Model Variant

> Retrieve a specific Model Variant by ID.\
> If authenticated, returns the variant regardless of status.\
> If unauthenticated, ONLY returns the variant if status="public". Otherwise returns 404.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Model Variants","description":"Operations related to specific variants of models."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"ModelVariantIdPathParameter":{"name":"model_variant_id","in":"path","required":true,"description":"The unique identifier of the model variant.","schema":{"type":"string"}}},"schemas":{"ModelVariant":{"type":"object","description":"Represents a specific variant of a Model.","properties":{"model_variant_id":{"type":"string","description":"Client-provided unique identifier (e.g., \"ojin/oris-v1/standard\")."},"model_id":{"type":"string","description":"Identifier of the parent Model. NOT NULL."},"title":{"type":"string","description":"Display name for the variant. NOT NULL, unique per model_id."},"description":{"type":"object","additionalProperties":{"type":"string"},"description":"UI display texts. NOT NULL, defaults to '{}'."},"preview_media_url":{"type":"string","format":"url","nullable":true,"description":"URL for a preview media."},"configuration_schema":{"type":"object","description":"JSON schema for ModelConfig.model_configurations. NOT NULL, defaults to '{}'.","additionalProperties":true},"tags":{"type":"array","items":{"type":"string"},"description":"List of descriptive tags. NOT NULL, defaults to '[]'."},"status":{"type":"string","description":"Status (e.g., 'available', 'deprecated'). NOT NULL. List managed in code."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of creation. Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of last update. Server-generated. 
Auto-updates.","readOnly":true}},"required":["model_variant_id","model_id","title","description","configuration_schema","tags","status","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/model-variants/{model_variant_id}":{"get":{"tags":["Model Variants"],"summary":"Retrieve a specific Model Variant","description":"Retrieve a specific Model Variant by ID.\nIf authenticated, returns the variant regardless of status.\nIf unauthenticated, ONLY returns the variant if status=\"public\". Otherwise returns 404.","operationId":"getModelVariantById","parameters":[{"$ref":"#/components/parameters/ModelVariantIdPathParameter"}],"responses":{"200":{"description":"Model Variant details.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ModelVariant"}}}},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```
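
The example identifier in the schema (`ojin/oris-v1/standard`) contains slashes, which would otherwise be read as path separators. The sketch below builds the request URL and headers with the identifier percent-encoded. Whether the server expects encoded slashes is an assumption, and `variant_request` is a hypothetical helper name. The API key is optional here because the endpoint also serves unauthenticated callers.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

BASE_URL = "https://api.ojin.ai/v1"  # server URL listed in the spec


def variant_request(
    model_variant_id: str, api_key: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for GET /model-variants/{model_variant_id}."""
    # safe="" makes quote() encode "/" as %2F (assumed to be what the API expects).
    url = f"{BASE_URL}/model-variants/{quote(model_variant_id, safe='')}"
    headers = {"Accept": "application/json"}
    if api_key:
        # Header name taken from the APIKeyAuth security scheme.
        headers["X-API-Key"] = api_key
    return url, headers
```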

## Assets

Use asset endpoints to upload, register, retrieve, and delete media used by your Ojin integrations.

## Initiate a Multipart Asset Upload

> Starts the multipart asset upload process by creating a multipart upload session in S3.\
> The Core API generates a unique `asset_id` and receives an `upload_id` from S3.\
> Both IDs must be used in subsequent part signing and finalization requests.\
> No Asset record is created in the database at this stage.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"schemas":{"AssetInitiateUploadRequest":{"type":"object","description":"Payload to initiate a multipart asset upload.","properties":{"name":{"type":"string","description":"Original filename of the asset."},"category":{"type":"string","description":"Category for the asset (e.g., 'video', 'image')."},"content_type":{"type":"string","nullable":true,"description":"Client-declared MIME type of the file."},"size_bytes":{"type":"integer","format":"int64","nullable":true,"description":"Client-declared file size in bytes."}},"required":["name","category"]},"AssetInitiateUploadResponse":{"type":"object","description":"Response from initiating a multipart asset upload.","properties":{"asset_id":{"type":"string","format":"uuid","description":"A unique ID generated by Core API for this asset transaction."},"upload_id":{"type":"string","description":"The multipart upload ID from S3, used to identify this multipart upload session."},"s3_key":{"type":"string","description":"The S3 key for this asset."}},"required":["asset_id","upload_id"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. 
Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets/initiate-upload":{"post":{"tags":["Assets"],"summary":"Initiate a Multipart Asset Upload","description":"Starts the multipart asset upload process by creating a multipart upload session in S3.\nThe Core API generates a unique `asset_id` and receives an `upload_id` from S3.\nBoth IDs must be used in subsequent part signing and finalization requests.\nNo Asset record is created in the database at this stage.","operationId":"initiateAssetUpload","requestBody":{"description":"Initial metadata for the asset to be uploaded.","required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/AssetInitiateUploadRequest"}}}},"responses":{"200":{"description":"Multipart upload initiated successfully. Returns asset_id and upload_id.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AssetInitiateUploadResponse"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"}}}}}}
```
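
A request body for this endpoint can be assembled as in the sketch below. Only `name` and `category` are required by `AssetInitiateUploadRequest`, so the two nullable fields are left out when unknown. The helper name `initiate_upload_body` and the fallback to `mimetypes.guess_type` are illustrative assumptions, not part of the API.

```python
import mimetypes
from typing import Any, Dict, Optional


def initiate_upload_body(
    name: str,
    category: str,
    content_type: Optional[str] = None,
    size_bytes: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /assets/initiate-upload."""
    if not name or not category:
        raise ValueError("name and category are required")
    if content_type is None:
        # Best-effort guess from the file extension; may still be None.
        content_type, _ = mimetypes.guess_type(name)
    body: Dict[str, Any] = {"name": name, "category": category}
    if content_type is not None:
        body["content_type"] = content_type
    if size_bytes is not None:
        body["size_bytes"] = size_bytes
    return body
```

Keep the `asset_id` and `upload_id` from the response, since both are needed to sign parts and to finalize the upload.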

## Get Pre-signed URL for Upload Part

> Generates a pre-signed URL for uploading a specific part of a multipart upload.\
> Call this endpoint once for each part of the file.\
> Parts must be numbered sequentially starting from 1, and each part except the last must be at least 5 MB in size (S3 requirement).

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"schemas":{"AssetSignPartRequest":{"type":"object","description":"Payload to get a pre-signed URL for uploading a specific part of a multipart upload.","properties":{"s3_key":{"type":"string","description":"The S3 key for this asset."},"asset_id":{"type":"string","format":"uuid","description":"The asset ID from the initiate upload response."},"upload_id":{"type":"string","description":"The multipart upload ID from the initiate upload response."},"part_number":{"type":"integer","minimum":1,"maximum":10000,"description":"The part number for this upload part (1-10000)."}},"required":["asset_id","upload_id","part_number"]},"AssetSignPartResponse":{"type":"object","description":"Response containing the pre-signed URL for uploading a specific part.","properties":{"upload_url":{"type":"string","format":"url","description":"The pre-signed S3 URL to PUT this specific part to."},"part_number":{"type":"integer","description":"The part number this URL is for."}},"required":["upload_url","part_number"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. 
Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets/sign-part":{"post":{"tags":["Assets"],"summary":"Get Pre-signed URL for Upload Part","description":"Generates a pre-signed URL for uploading a specific part of a multipart upload.\nThis endpoint will be called multiple times, once for each part of the file.\nParts must be numbered sequentially starting from 1, and each part (except the last)\nmust be at least 5MB in size (S3 requirement).","operationId":"signAssetUploadPart","requestBody":{"description":"Details for the part to be signed.","required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/AssetSignPartRequest"}}}},"responses":{"200":{"description":"Pre-signed URL generated successfully for the specified part.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/AssetSignPartResponse"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"description":"Multipart upload session not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}}}
```
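
Before requesting signed URLs, a client needs to decide how to slice the file. The sketch below computes part numbers and byte ranges that respect the constraints stated above: sequential numbering from 1, a minimum size for every part except the last, and at most 10000 parts. The 8 MiB default and the function name `plan_parts` are assumptions chosen for illustration, and the minimum is taken as 5 MiB, the stricter reading of the limit.

```python
from typing import List, Tuple

MIN_PART_SIZE = 5 * 1024 * 1024  # minimum for every part except the last
MAX_PARTS = 10_000               # upper bound of part_number in the schema


def plan_parts(
    size_bytes: int, part_size: int = 8 * 1024 * 1024
) -> List[Tuple[int, int, int]]:
    """Return (part_number, start_offset, length) for each part of a file."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the multipart minimum")
    count = -(-size_bytes // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        (n + 1, n * part_size, min(part_size, size_bytes - n * part_size))
        for n in range(count)
    ]
```

Each tuple maps to one call to `/assets/sign-part` followed by one `PUT` of that byte range to the returned `upload_url`.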

## Finalize Multipart Upload and Create Asset Record

> Completes a multipart upload by combining all uploaded parts and creates the Asset metadata record in the Core API.\
> The client must provide all part numbers and their corresponding ETags from the S3 upload responses.\
> The Core API will complete the multipart upload in S3 and verify the final object before creating the database record.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"schemas":{"AssetFinalizationRequest":{"type":"object","description":"Payload to finalize a multipart asset upload and create the asset metadata record in DB.","properties":{"asset_id":{"type":"string","format":"uuid","description":"The unique ID received from the 'initiate-upload' step."},"upload_id":{"type":"string","description":"The multipart upload ID from the 'initiate-upload' step."},"name":{"type":"string","description":"The original filename (must be consistent with initiate request)."},"category":{"type":"string","description":"The asset category (must be consistent with initiate request)."},"content_type":{"type":"string","description":"Final confirmed MIME type of the asset."},"size_bytes":{"type":"integer","format":"int64","description":"Final confirmed size of the asset in bytes."},"parts":{"type":"array","items":{"$ref":"#/components/schemas/AssetUploadPart"},"description":"List of all uploaded parts with their ETags, in order.","minItems":1}},"required":["asset_id","upload_id","name","category","content_type","size_bytes","parts"]},"AssetUploadPart":{"type":"object","description":"Information about a completed upload part.","properties":{"part_number":{"type":"integer","minimum":1,"maximum":10000,"description":"The part number that was uploaded."},"etag":{"type":"string","description":"The ETag returned by S3 after successfully uploading this part."}},"required":["part_number","etag"]},"Asset":{"type":"object","description":"Represents a digital asset managed by the Core 
API.","properties":{"asset_id":{"type":"string","format":"uuid","description":"Unique identifier for the Asset, generated during upload initiation.","readOnly":true},"organization_id":{"type":"string","description":"ID of the Organisation (from external IdP) that owns this Asset. Server-set.","readOnly":true},"created_by":{"type":"string","description":"User ID of the creator (from external IdP). Server-set.","readOnly":true},"name":{"type":"string","description":"The original file name of the Asset. NOT NULL."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight'). NOT NULL. List of values managed in code."},"content_type":{"type":"string","description":"The MIME type of the asset. NOT NULL."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes. NOT NULL."},"etag":{"type":"string","description":"The ETag of the S3 object, used for integrity checking. NOT NULL."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of when this Asset record was created (finalized). Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the last update to this Asset record. Server-generated. Auto-updates on modification.","readOnly":true}},"required":["asset_id","organization_id","created_by","name","category","content_type","size_bytes","etag","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. 
Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ConflictError":{"description":"Conflict with the current state of the resource.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets":{"post":{"tags":["Assets"],"summary":"Finalize Multipart Upload and Create Asset Record","description":"Completes a multipart upload by combining all uploaded parts and creates the Asset\nmetadata record in the Core API. The client must provide all part numbers and their\ncorresponding ETags from the S3 upload responses. 
The Core API will complete the\nmultipart upload in S3 and verify the final object before creating the database record.","operationId":"createAssetRecord","requestBody":{"description":"Finalization details for the multipart asset upload.","required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/AssetFinalizationRequest"}}}},"responses":{"201":{"description":"Multipart upload completed successfully and Asset record created in DB.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/Asset"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"description":"Multipart upload session not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"409":{"$ref":"#/components/responses/ConflictError"}}}}}}
```
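
The finalization body has to list every uploaded part in order. A minimal sketch, assuming the client has collected the ETags in a dictionary keyed by part number while uploading. The helper name `finalize_body` is hypothetical, and ETag values are passed through unchanged.

```python
from typing import Any, Dict, Mapping


def finalize_body(
    asset_id: str,
    upload_id: str,
    name: str,
    category: str,
    content_type: str,
    size_bytes: int,
    etags: Mapping[int, str],
) -> Dict[str, Any]:
    """Build the JSON body for POST /assets (AssetFinalizationRequest)."""
    if not etags:
        raise ValueError("at least one uploaded part is required")
    numbers = sorted(etags)
    # Parts are numbered sequentially from 1, so a gap means a part is missing.
    if numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("part numbers must run from 1 without gaps")
    return {
        "asset_id": asset_id,
        "upload_id": upload_id,
        "name": name,
        "category": category,
        "content_type": content_type,
        "size_bytes": size_bytes,
        "parts": [{"part_number": n, "etag": etags[n]} for n in numbers],
    }
```

Checking for gaps on the client side is a design choice that surfaces a missing part before the request is sent, rather than relying on the server's 400 response.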

## List Assets

> Retrieves a list of Asset metadata. By default returns only organization-owned assets. Use source='template' to retrieve template assets instead. Supports pagination and filtering by category and name.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"SourceQueryParameter":{"name":"source","in":"query","required":false,"description":"Filter items by ownership source. Use 'org' for items owned by the user's organization, 'template' for items from the template organization. Defaults to 'org'.","schema":{"type":"string","enum":["org","template"],"default":"org"}},"AssetCategoryQueryParameter":{"name":"category","in":"query","required":false,"description":"Filter assets by category.","schema":{"type":"string"}},"AssetNameQueryParameter":{"name":"name","in":"query","required":false,"description":"Filter assets by name (e.g., for partial match - specific behavior TBD by implementation).","schema":{"type":"string"}},"LimitQueryParameter":{"name":"limit","in":"query","required":false,"description":"Number of items to return per page.","schema":{"type":"integer","default":20,"minimum":1,"maximum":100}},"OffsetQueryParameter":{"name":"offset","in":"query","required":false,"description":"Number of items to skip for pagination.","schema":{"type":"integer","default":0,"minimum":0}}},"schemas":{"Asset":{"type":"object","description":"Represents a digital asset managed by the Core API.","properties":{"asset_id":{"type":"string","format":"uuid","description":"Unique identifier for the Asset, generated during upload initiation.","readOnly":true},"organization_id":{"type":"string","description":"ID of the Organisation (from external IdP) that owns this Asset. 
Server-set.","readOnly":true},"created_by":{"type":"string","description":"User ID of the creator (from external IdP). Server-set.","readOnly":true},"name":{"type":"string","description":"The original file name of the Asset. NOT NULL."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight'). NOT NULL. List of values managed in code."},"content_type":{"type":"string","description":"The MIME type of the asset. NOT NULL."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes. NOT NULL."},"etag":{"type":"string","description":"The ETag of the S3 object, used for integrity checking. NOT NULL."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of when this Asset record was created (finalized). Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the last update to this Asset record. Server-generated. Auto-updates on modification.","readOnly":true}},"required":["asset_id","organization_id","created_by","name","category","content_type","size_bytes","etag","created_at","updated_at"]},"Pagination":{"type":"object","properties":{"limit":{"type":"integer","description":"The number of items returned in the current page."},"offset":{"type":"integer","description":"The number of items skipped before starting the current page."},"total_items":{"type":"integer","description":"The total number of items available that match the query."}},"required":["limit","offset","total_items"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. 
Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets":{"get":{"tags":["Assets"],"summary":"List Assets","description":"Retrieves a list of Asset metadata. By default returns only organization-owned assets. Use source='template' to retrieve template assets instead. Supports pagination and filtering by category and name.","operationId":"listAssets","parameters":[{"$ref":"#/components/parameters/SourceQueryParameter"},{"$ref":"#/components/parameters/AssetCategoryQueryParameter"},{"$ref":"#/components/parameters/AssetNameQueryParameter"},{"$ref":"#/components/parameters/LimitQueryParameter"},{"$ref":"#/components/parameters/OffsetQueryParameter"}],"responses":{"200":{"description":"A paginated list of Asset metadata.","content":{"application/json":{"schema":{"type":"object","properties":{"data":{"type":"array","items":{"$ref":"#/components/schemas/Asset"}},"pagination":{"$ref":"#/components/schemas/Pagination"}}}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"}}}}}}
```
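
The query parameters and their bounds come straight from the spec: `source` is `org` or `template`, `limit` ranges from 1 to 100, and `offset` is non-negative. The sketch below validates them locally and builds the request URL. The function name `list_assets_url` is an illustrative assumption.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

BASE_URL = "https://api.ojin.ai/v1"  # server URL listed in the spec


def list_assets_url(
    source: str = "org",
    category: Optional[str] = None,
    name: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> str:
    """Build the URL for GET /assets with validated query parameters."""
    if source not in ("org", "template"):
        raise ValueError("source must be 'org' or 'template'")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must be 0 or greater")
    params: Dict[str, Union[str, int]] = {
        "source": source,
        "limit": limit,
        "offset": offset,
    }
    if category:
        params["category"] = category
    if name:
        params["name"] = name
    return f"{BASE_URL}/assets?{urlencode(params)}"
```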

## Retrieve Asset Metadata and Download URL

> Retrieves metadata for a specific Asset, including a pre-signed URL for downloading the content. The user can access an asset if it belongs to their organization or to the shared template organization.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"AssetIdPathParameter":{"name":"asset_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Asset.","schema":{"type":"string","format":"uuid"}}},"schemas":{"Asset":{"type":"object","description":"Represents a digital asset managed by the Core API.","properties":{"asset_id":{"type":"string","format":"uuid","description":"Unique identifier for the Asset, generated during upload initiation.","readOnly":true},"organization_id":{"type":"string","description":"ID of the Organisation (from external IdP) that owns this Asset. Server-set.","readOnly":true},"created_by":{"type":"string","description":"User ID of the creator (from external IdP). Server-set.","readOnly":true},"name":{"type":"string","description":"The original file name of the Asset. NOT NULL."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight'). NOT NULL. List of values managed in code."},"content_type":{"type":"string","description":"The MIME type of the asset. NOT NULL."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes. NOT NULL."},"etag":{"type":"string","description":"The ETag of the S3 object, used for integrity checking. NOT NULL."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of when this Asset record was created (finalized). Server-generated. 
Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the last update to this Asset record. Server-generated. Auto-updates on modification.","readOnly":true}},"required":["asset_id","organization_id","created_by","name","category","content_type","size_bytes","etag","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets/{asset_id}/download":{"get":{"tags":["Assets"],"summary":"Retrieve Asset Metadata and Download URL","description":"Retrieves metadata for a specific Asset, including a pre-signed URL for downloading the content. 
The user can access an asset if it belongs to their organization or to the shared template organization.","operationId":"getAssetDownloadById","parameters":[{"$ref":"#/components/parameters/AssetIdPathParameter"}],"responses":{"200":{"description":"Successfully retrieved Asset metadata, including a download URL.","content":{"application/json":{"schema":{"allOf":[{"$ref":"#/components/schemas/Asset"},{"type":"object","properties":{"asset_url":{"type":"string","format":"uri","readOnly":true,"description":"A temporary, pre-signed URL to download the asset's content. This URL will expire."}}}]}}}},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```
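
In the 200 response, `asset_url` is added on top of the `Asset` fields and is not listed as required, so a client may want to check for it explicitly. The sketch below builds the endpoint URL and extracts the download link from a decoded response body. Both helper names are hypothetical.

```python
import uuid
from typing import Any, Dict

BASE_URL = "https://api.ojin.ai/v1"  # server URL listed in the spec


def download_endpoint(asset_id: str) -> str:
    """Return the URL for GET /assets/{asset_id}/download."""
    # uuid.UUID raises ValueError for a malformed identifier.
    canonical = str(uuid.UUID(asset_id))
    return f"{BASE_URL}/assets/{canonical}/download"


def extract_asset_url(body: Dict[str, Any]) -> str:
    """Pull the temporary pre-signed download URL out of the response body."""
    url = body.get("asset_url")
    if not url:
        raise ValueError("response did not include asset_url")
    return url
```

Because the pre-signed URL expires, it is safer to request a fresh one shortly before each download than to store it.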

## Retrieve Asset Metadata

> Retrieves metadata for a specific Asset. Does not include a download URL. The user can access an asset if it belongs to their organization or to the shared template organization.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"AssetIdPathParameter":{"name":"asset_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Asset.","schema":{"type":"string","format":"uuid"}}},"schemas":{"Asset":{"type":"object","description":"Represents a digital asset managed by the Core API.","properties":{"asset_id":{"type":"string","format":"uuid","description":"Unique identifier for the Asset, generated during upload initiation.","readOnly":true},"organization_id":{"type":"string","description":"ID of the Organisation (from external IdP) that owns this Asset. Server-set.","readOnly":true},"created_by":{"type":"string","description":"User ID of the creator (from external IdP). Server-set.","readOnly":true},"name":{"type":"string","description":"The original file name of the Asset. NOT NULL."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight'). NOT NULL. List of values managed in code."},"content_type":{"type":"string","description":"The MIME type of the asset. NOT NULL."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes. NOT NULL."},"etag":{"type":"string","description":"The ETag of the S3 object, used for integrity checking. NOT NULL."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of when this Asset record was created (finalized). Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the last update to this Asset record. Server-generated. Auto-updates on modification.","readOnly":true}},"required":["asset_id","organization_id","created_by","name","category","content_type","size_bytes","etag","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets/{asset_id}":{"get":{"tags":["Assets"],"summary":"Retrieve Asset Metadata","description":"Retrieves metadata for a specific Asset. Does not include a download URL. The user can access an asset if it belongs to their organization or to the shared template organization.","operationId":"getAssetById","parameters":[{"$ref":"#/components/parameters/AssetIdPathParameter"}],"responses":{"200":{"description":"Successfully retrieved Asset metadata.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/Asset"}}}},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```

## Delete an Asset

> Permanently deletes an Asset's metadata from the DB and its corresponding object from S3 (hard delete). This operation is restricted to assets owned by the user's organization and cannot be used on shared template assets.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"parameters":{"AssetIdPathParameter":{"name":"asset_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Asset.","schema":{"type":"string","format":"uuid"}}},"responses":{"NoContentSuccess":{"description":"Operation successful, no content to return."},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}},"schemas":{"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}}},"paths":{"/assets/{asset_id}":{"delete":{"tags":["Assets"],"summary":"Delete an Asset","description":"Permanently deletes an Asset's metadata from the DB and its corresponding object from S3 (hard delete). This operation is restricted to assets owned by the user's organization and cannot be used on shared template assets.","operationId":"deleteAsset","parameters":[{"$ref":"#/components/parameters/AssetIdPathParameter"}],"responses":{"204":{"$ref":"#/components/responses/NoContentSuccess"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"$ref":"#/components/responses/NotFoundError"}}}}}}
```
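
As a minimal sketch of calling this operation, the Python below builds (but does not send) the `DELETE` request described by the spec. The base URL, path, `X-API-Key` header, and status codes come from the spec above; the helper names `build_delete_asset_request` and `describe_delete_status` are illustrative and not part of any Ojin SDK.

```python
import uuid
import urllib.request

BASE_URL = "https://api.ojin.ai/v1"  # from the spec's `servers` entry

# Status codes documented for DELETE /assets/{asset_id}.
DELETE_OUTCOMES = {
    204: "deleted",
    401: "API key missing, invalid, or expired",
    403: "not permitted",
    404: "asset not found",
}


def build_delete_asset_request(asset_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a DELETE request for one asset.

    Hypothetical helper: only the URL shape and header name come from the spec.
    """
    # uuid.UUID raises ValueError if asset_id is not a valid UUID.
    canonical_id = str(uuid.UUID(asset_id))
    return urllib.request.Request(
        url=f"{BASE_URL}/assets/{canonical_id}",
        method="DELETE",
        headers={"X-API-Key": api_key},
    )


def describe_delete_status(status: int) -> str:
    """Map a documented status code to a short description."""
    return DELETE_OUTCOMES.get(status, f"unexpected status {status}")
```

Passing the request to `urllib.request.urlopen` would send it; note that `urlopen` raises `urllib.error.HTTPError` for 4xx responses, so the status mapping would be applied to the exception's `code` in that case. Because the delete is permanent, it may be worth confirming the asset ID before sending.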

{% hint style="warning" %}
The idle video generator endpoint still references the deprecated `ojin/oris-1.0` model name. The underlying service uses the current Oris Portrait model. This endpoint path will be updated in a future release.
{% endhint %}

## Generate idle video from image asset

> Triggers background generation of an idle video from a source image asset\
> using the ojin/oris-1.0 idle video generator.\
> \
> The job runs asynchronously and returns a `job_id` for status tracking.\
> \
> On completion, the generated video is saved as a new Asset in the organization.\
> The `result.asset_id` field in the job object will contain the new asset ID.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Assets","description":"Operations related to managing digital assets (images, videos, etc.)."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[{"APIKeyAuth":[]}],"components":{"securitySchemes":{"APIKeyAuth":{"type":"apiKey","in":"header","name":"X-API-Key","description":"API key for authenticated access to the Ojin REST API."}},"schemas":{"IdleVideoGenerationRequest":{"type":"object","description":"Request payload to trigger idle video generation.","properties":{"source_asset_id":{"type":"string","format":"uuid","description":"The ID of the source image asset to use for generating the idle video."},"reference_template":{"type":"string","description":"The reference motion template to use (e.g., 'v1', 'v2', 'v3').","enum":["v1","v2","v3"]}},"required":["source_asset_id"]},"IdleVideoGenerationResponse":{"type":"object","description":"Response from triggering idle video generation.","properties":{"job_id":{"type":"string","format":"uuid","description":"Identifier of the background job created for this generation request."}},"required":["job_id"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"BadRequestError":{"description":"Invalid request payload or parameters.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"UnauthorizedError":{"description":"Authentication token is missing, invalid, or expired.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}},"ForbiddenError":{"description":"Authenticated principal does not have permission.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}},"paths":{"/assets/generate/ojin/oris-1.0-idle-video-generator":{"post":{"tags":["Assets"],"summary":"Generate idle video from image asset","description":"Triggers background generation of an idle video from a source image asset\nusing the ojin/oris-1.0 idle video generator.\n\nThe job runs asynchronously and returns a `job_id` for status tracking.\n\nOn completion, the generated video is saved as a new Asset in the organization.\nThe `result.asset_id` field in the job object will contain the new asset ID.","operationId":"generateIdleVideo","requestBody":{"required":true,"content":{"application/json":{"schema":{"$ref":"#/components/schemas/IdleVideoGenerationRequest"}}}},"responses":{"202":{"description":"Job accepted and queued for processing.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/IdleVideoGenerationResponse"}}}},"400":{"$ref":"#/components/responses/BadRequestError"},"401":{"$ref":"#/components/responses/UnauthorizedError"},"403":{"$ref":"#/components/responses/ForbiddenError"},"404":{"description":"Source asset not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}}}
```
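
The request and response shapes above can be sketched in Python as follows. The path, field names, and the `v1`/`v2`/`v3` enum are taken from the spec; the helper names are hypothetical, and the job-status endpoint used to poll the returned `job_id` is not described in this section, so it is left out.

```python
import json
import uuid
from typing import Optional

# Path from the spec; it still carries the deprecated `ojin/oris-1.0` name.
IDLE_VIDEO_PATH = "/assets/generate/ojin/oris-1.0-idle-video-generator"
REFERENCE_TEMPLATES = ("v1", "v2", "v3")  # enum from IdleVideoGenerationRequest


def build_idle_video_request(source_asset_id: str,
                             reference_template: Optional[str] = None) -> dict:
    """Build the JSON body for the idle video generation request (hypothetical helper)."""
    # source_asset_id is required and must be a UUID.
    payload = {"source_asset_id": str(uuid.UUID(source_asset_id))}
    # reference_template is optional; when given it must be one of the enum values.
    if reference_template is not None:
        if reference_template not in REFERENCE_TEMPLATES:
            raise ValueError(f"reference_template must be one of {REFERENCE_TEMPLATES}")
        payload["reference_template"] = reference_template
    return payload


def extract_job_id(response_body: str) -> str:
    """Read the required `job_id` field from a 202 response body."""
    return json.loads(response_body)["job_id"]
```

The body would be sent as `application/json` in a `POST` to the path above, with the `X-API-Key` header set. A `202` response means the job was queued, not that the video exists yet.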

## Public Widget Endpoints

These unauthenticated endpoints are intended for browser clients such as the Ojin widget.

## Retrieve a Model Configuration (Public)

> Retrieves a specific Model Configuration by ID. This is the public variant of\
> `GET /model-configs/{model_config_id}`, designed for use by the Ojin widget\
> embedded on third-party domains. Uses permissive CORS and requires no authentication.\
> Resources are accessed by their UUID.\
> \
> Internal fields (`organization_id`, `created_by`) are stripped from the response.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Public API","description":"Public endpoints for unauthenticated browser clients such as the Ojin widget. These routes use permissive CORS (origin: '*')\nto allow the widget to be embedded on any third-party domain. No authentication is required."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[],"paths":{"/public/model-configs/{model_config_id}":{"get":{"tags":["Public API"],"summary":"Retrieve a Model Configuration (Public)","description":"Retrieves a specific Model Configuration by ID. This is the public variant of\n`GET /model-configs/{model_config_id}`, designed for use by the Ojin widget\nembedded on third-party domains. Uses permissive CORS and requires no authentication.\nResources are accessed by their UUID.\n\nInternal fields (organization_id, created_by) are stripped from the response.","operationId":"publicGetModelConfigById","parameters":[{"$ref":"#/components/parameters/ModelConfigIdPathParameter"}],"responses":{"200":{"description":"Model Configuration details (without internal fields).","content":{"application/json":{"schema":{"$ref":"#/components/schemas/PublicModelConfig"}}}},"404":{"$ref":"#/components/responses/NotFoundError"}}}}},"components":{"parameters":{"ModelConfigIdPathParameter":{"name":"model_config_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Model Configuration.","schema":{"type":"string","format":"uuid"}}},"schemas":{"PublicModelConfig":{"type":"object","description":"Public variant of ModelConfig for widget embedding.\nStrips internal fields (organization_id, created_by) that are not needed by the widget.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured."},"title":{"type":"string","description":"Title for the configuration."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update."}},"required":["model_config_id","model_id","model_variant_id","title","model_configurations","created_at","updated_at"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}
```

## Retrieve Asset Metadata with Download URL (Public)

> Retrieves metadata for a specific Asset, including a pre-signed download URL.\
> This is the public variant of `GET /assets/{asset_id}`, designed for use by the\
> Ojin widget embedded on third-party domains. Uses permissive CORS and requires\
> no authentication. Resources are accessed by their UUID.\
> \
> Internal fields (`organization_id`, `created_by`) are stripped from the response.

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"tags":[{"name":"Public API","description":"Public endpoints for unauthenticated browser clients such as the Ojin widget. These routes use permissive CORS (origin: '*')\nto allow the widget to be embedded on any third-party domain. No authentication is required."}],"servers":[{"url":"https://api.ojin.ai/v1","description":"Main backend server, version 1."}],"security":[],"paths":{"/public/assets/{asset_id}":{"get":{"tags":["Public API"],"summary":"Retrieve Asset Metadata with Download URL (Public)","description":"Retrieves metadata for a specific Asset, including a pre-signed download URL.\nThis is the public variant of `GET /assets/{asset_id}`, designed for use by the\nOjin widget embedded on third-party domains. Uses permissive CORS and requires\nno authentication. Resources are accessed by their UUID.\n\nInternal fields (organization_id, created_by) are stripped from the response.","operationId":"publicGetAssetById","parameters":[{"$ref":"#/components/parameters/AssetIdPathParameter"}],"responses":{"200":{"description":"Successfully retrieved Asset metadata with download URL (without internal fields).","content":{"application/json":{"schema":{"$ref":"#/components/schemas/PublicAsset"}}}},"404":{"$ref":"#/components/responses/NotFoundError"}}}}},"components":{"parameters":{"AssetIdPathParameter":{"name":"asset_id","in":"path","required":true,"description":"The unique identifier (UUID) of the Asset.","schema":{"type":"string","format":"uuid"}}},"schemas":{"PublicAsset":{"type":"object","description":"Public variant of Asset for widget embedding.\nStrips internal fields (organization_id, created_by) and includes a pre-signed download URL.","properties":{"asset_id":{"type":"string","format":"uuid","readOnly":true,"description":"Unique identifier for the Asset."},"name":{"type":"string","description":"The original file name of the Asset."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight')."},"content_type":{"type":"string","description":"The MIME type of the asset."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes."},"etag":{"type":"string","description":"The ETag of the S3 object."},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update."},"asset_url":{"type":"string","format":"uri","readOnly":true,"description":"A temporary, pre-signed URL to download the asset's content. This URL will expire."}},"required":["asset_id","name","category","content_type","size_bytes","etag","created_at","updated_at","asset_url"]},"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}},"responses":{"NotFoundError":{"description":"The requested resource was not found.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponse"}}}}}}}
```
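
To illustrate how a client might use this public endpoint, here is a small Python sketch that builds the URL (no API key is involved) and checks a decoded response against the `required` list of `PublicAsset`. The function names are illustrative assumptions; the URL shape and field names come from the spec above.

```python
import uuid
from typing import List

BASE_URL = "https://api.ojin.ai/v1"  # from the spec's `servers` entry

# `required` list of the PublicAsset schema.
PUBLIC_ASSET_REQUIRED = (
    "asset_id", "name", "category", "content_type", "size_bytes",
    "etag", "created_at", "updated_at", "asset_url",
)


def public_asset_url(asset_id: str) -> str:
    """Return the unauthenticated metadata URL for an asset (hypothetical helper)."""
    return f"{BASE_URL}/public/assets/{uuid.UUID(asset_id)}"


def missing_public_asset_fields(asset: dict) -> List[str]:
    """List required PublicAsset fields that are absent from a decoded response."""
    return [field for field in PUBLIC_ASSET_REQUIRED if field not in asset]
```

Since `asset_url` is a temporary pre-signed URL, it is probably safer to fetch the metadata again shortly before each download than to cache the URL.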

## Schemas

## The ModelVariant object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"ModelVariant":{"type":"object","description":"Represents a specific variant of a Model.","properties":{"model_variant_id":{"type":"string","description":"Client-provided unique identifier (e.g., \"ojin/oris-v1/standard\")."},"model_id":{"type":"string","description":"Identifier of the parent Model. NOT NULL."},"title":{"type":"string","description":"Display name for the variant. NOT NULL, unique per model_id."},"description":{"type":"object","additionalProperties":{"type":"string"},"description":"UI display texts. NOT NULL, defaults to '{}'."},"preview_media_url":{"type":"string","format":"url","nullable":true,"description":"URL for a preview media."},"configuration_schema":{"type":"object","description":"JSON schema for ModelConfig.model_configurations. NOT NULL, defaults to '{}'.","additionalProperties":true},"tags":{"type":"array","items":{"type":"string"},"description":"List of descriptive tags. NOT NULL, defaults to '[]'."},"status":{"type":"string","description":"Status (e.g., 'available', 'deprecated'). NOT NULL. List managed in code."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of creation. Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of last update. Server-generated. Auto-updates.","readOnly":true}},"required":["model_variant_id","model_id","title","description","configuration_schema","tags","status","created_at","updated_at"]}}}}
```

## The ModelConfig object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"ModelConfig":{"type":"object","description":"Represents a specific configuration of a ModelVariant.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"organization_id":{"type":"string","readOnly":true,"description":"Owning organization ID (from external IdP)."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured. NOT NULL."},"created_by":{"type":"string","readOnly":true,"description":"Creator's user ID (from external IdP)."},"title":{"type":"string","description":"Title for the configuration. NOT NULL, unique per organization_id."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant, adhering to its schema. NOT NULL, defaults to '{}'.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation. Defaults to CURRENT_TIMESTAMP."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update. Auto-updates."}},"required":["model_config_id","organization_id","model_id","model_variant_id","created_by","title","model_configurations","created_at","updated_at"]}}}}
```

## The ModelConfigCreationRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"ModelConfigCreationRequest":{"type":"object","description":"Payload for creating a new Model Configuration.","properties":{"title":{"type":"string"},"model_variant_id":{"type":"string"},"model_configurations":{"type":"object","additionalProperties":true,"default":{}}},"required":["title","model_variant_id"]}}}}
```

## The ModelConfigUpdateRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"ModelConfigUpdateRequest":{"type":"object","description":"Payload for updating a Model Configuration (title and parameters only).","properties":{"title":{"type":"string"},"model_configurations":{"type":"object","additionalProperties":true}},"required":["title","model_configurations"]}}}}
```
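
The create and update payloads differ in which fields are required: creation needs `title` and `model_variant_id` (with `model_configurations` defaulting to `{}`), while an update needs both `title` and `model_configurations` and cannot change the variant. A minimal Python sketch of that difference, with hypothetical helper names:

```python
from typing import Any, Dict, Optional


def build_model_config_create(title: str,
                              model_variant_id: str,
                              model_configurations: Optional[Dict[str, Any]] = None) -> dict:
    """Body for ModelConfigCreationRequest; model_configurations defaults to {}."""
    return {
        "title": title,
        "model_variant_id": model_variant_id,
        "model_configurations": dict(model_configurations or {}),
    }


def build_model_config_update(title: str,
                              model_configurations: Dict[str, Any]) -> dict:
    """Body for ModelConfigUpdateRequest; both fields are required by the schema."""
    if model_configurations is None:
        raise ValueError("model_configurations is required on update")
    return {"title": title, "model_configurations": dict(model_configurations)}
```

Because the update schema requires the full `model_configurations` object, a client that wants to change one parameter would likely read the current configuration first and send back the merged result; whether the server merges or replaces is not stated here.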

## The Pagination object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"Pagination":{"type":"object","properties":{"limit":{"type":"integer","description":"The number of items returned in the current page."},"offset":{"type":"integer","description":"The number of items skipped before starting the current page."},"total_items":{"type":"integer","description":"The total number of items available that match the query."}},"required":["limit","offset","total_items"]}}}}
```
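
As a sketch of how a client could page through results with this object, the helper below computes the next `offset` from a `Pagination` value. It assumes list endpoints accept an `offset` query parameter matching this field, which this section does not show.

```python
from typing import Optional


def next_offset(pagination: dict) -> Optional[int]:
    """Return the offset of the next page, or None when no items remain.

    Hypothetical helper; expects the three required Pagination fields.
    """
    limit = pagination["limit"]
    # An empty page cannot advance the offset, so stop instead of looping.
    if limit <= 0:
        return None
    following = pagination["offset"] + limit
    return following if following < pagination["total_items"] else None
```

For example, with `limit` 20, `offset` 0, and `total_items` 45, the next page would start at offset 20, and the page at offset 40 would be the last.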

## The Asset object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"Asset":{"type":"object","description":"Represents a digital asset managed by the Core API.","properties":{"asset_id":{"type":"string","format":"uuid","description":"Unique identifier for the Asset, generated during upload initiation.","readOnly":true},"organization_id":{"type":"string","description":"ID of the Organisation (from external IdP) that owns this Asset. Server-set.","readOnly":true},"created_by":{"type":"string","description":"User ID of the creator (from external IdP). Server-set.","readOnly":true},"name":{"type":"string","description":"The original file name of the Asset. NOT NULL."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight'). NOT NULL. List of values managed in code."},"content_type":{"type":"string","description":"The MIME type of the asset. NOT NULL."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes. NOT NULL."},"etag":{"type":"string","description":"The ETag of the S3 object, used for integrity checking. NOT NULL."},"created_at":{"type":"string","format":"date-time","description":"Timestamp of when this Asset record was created (finalized). Server-generated. Defaults to CURRENT_TIMESTAMP.","readOnly":true},"updated_at":{"type":"string","format":"date-time","description":"Timestamp of the last update to this Asset record. Server-generated. Auto-updates on modification.","readOnly":true}},"required":["asset_id","organization_id","created_by","name","category","content_type","size_bytes","etag","created_at","updated_at"]}}}}
```

## The AssetInitiateUploadRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetInitiateUploadRequest":{"type":"object","description":"Payload to initiate a multipart asset upload.","properties":{"name":{"type":"string","description":"Original filename of the asset."},"category":{"type":"string","description":"Category for the asset (e.g., 'video', 'image')."},"content_type":{"type":"string","nullable":true,"description":"Client-declared MIME type of the file."},"size_bytes":{"type":"integer","format":"int64","nullable":true,"description":"Client-declared file size in bytes."}},"required":["name","category"]}}}}
```

## The AssetInitiateUploadResponse object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetInitiateUploadResponse":{"type":"object","description":"Response from initiating a multipart asset upload.","properties":{"asset_id":{"type":"string","format":"uuid","description":"A unique ID generated by Core API for this asset transaction."},"upload_id":{"type":"string","description":"The multipart upload ID from S3, used to identify this multipart upload session."},"s3_key":{"type":"string","description":"The S3 key for this asset."}},"required":["asset_id","upload_id"]}}}}
```

## The AssetSignPartRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetSignPartRequest":{"type":"object","description":"Payload to get a pre-signed URL for uploading a specific part of a multipart upload.","properties":{"s3_key":{"type":"string","description":"The S3 key for this asset."},"asset_id":{"type":"string","format":"uuid","description":"The asset ID from the initiate upload response."},"upload_id":{"type":"string","description":"The multipart upload ID from the initiate upload response."},"part_number":{"type":"integer","minimum":1,"maximum":10000,"description":"The part number for this upload part (1-10000)."}},"required":["asset_id","upload_id","part_number"]}}}}
```
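
A client needs one sign-part request per chunk of the file. The Python sketch below plans those requests from a file size, assuming the pre-signed URLs target a standard S3 multipart upload (where every part except the last must be at least 5 MiB). The 8 MiB default part size and the helper name are assumptions; the 1 to 10000 range comes from the schema.

```python
from typing import List

MAX_PART_NUMBER = 10_000            # `maximum` for part_number in the schema
MIN_PART_SIZE = 5 * 1024 * 1024     # S3 minimum for all parts except the last


def plan_sign_part_requests(asset_id: str,
                            upload_id: str,
                            size_bytes: int,
                            part_size: int = 8 * 1024 * 1024) -> List[dict]:
    """Return one AssetSignPartRequest body per part (hypothetical helper)."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 multipart minimum")
    # Integer ceiling division: number of chunks needed to cover the file.
    part_count = (size_bytes + part_size - 1) // part_size
    if part_count > MAX_PART_NUMBER:
        raise ValueError("too many parts; choose a larger part_size")
    return [
        {"asset_id": asset_id, "upload_id": upload_id, "part_number": number}
        for number in range(1, part_count + 1)
    ]
```

Each body would be posted to the sign-part endpoint, and the chunk for that `part_number` would then be sent with `PUT` to the returned `upload_url`.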

## The AssetSignPartResponse object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetSignPartResponse":{"type":"object","description":"Response containing the pre-signed URL for uploading a specific part.","properties":{"upload_url":{"type":"string","format":"url","description":"The pre-signed S3 URL to PUT this specific part to."},"part_number":{"type":"integer","description":"The part number this URL is for."}},"required":["upload_url","part_number"]}}}}
```

## The AssetFinalizationRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetFinalizationRequest":{"type":"object","description":"Payload to finalize a multipart asset upload and create the asset metadata record in DB.","properties":{"asset_id":{"type":"string","format":"uuid","description":"The unique ID received from the 'initiate-upload' step."},"upload_id":{"type":"string","description":"The multipart upload ID from the 'initiate-upload' step."},"name":{"type":"string","description":"The original filename (must be consistent with initiate request)."},"category":{"type":"string","description":"The asset category (must be consistent with initiate request)."},"content_type":{"type":"string","description":"Final confirmed MIME type of the asset."},"size_bytes":{"type":"integer","format":"int64","description":"Final confirmed size of the asset in bytes."},"parts":{"type":"array","items":{"$ref":"#/components/schemas/AssetUploadPart"},"description":"List of all uploaded parts with their ETags, in order.","minItems":1}},"required":["asset_id","upload_id","name","category","content_type","size_bytes","parts"]},"AssetUploadPart":{"type":"object","description":"Information about a completed upload part.","properties":{"part_number":{"type":"integer","minimum":1,"maximum":10000,"description":"The part number that was uploaded."},"etag":{"type":"string","description":"The ETag returned by S3 after successfully uploading this part."}},"required":["part_number","etag"]}}}}
```
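
Once every part has been uploaded, the ETags returned by S3 are collected into the finalization payload. The following Python sketch assembles that body from a mapping of part number to ETag and emits the parts in ascending order, as the schema's "in order" wording suggests. The helper name and the mapping-based input are assumptions.

```python
from typing import Dict

MAX_PART_NUMBER = 10_000  # `maximum` for part_number in AssetUploadPart


def build_finalization_request(asset_id: str,
                               upload_id: str,
                               name: str,
                               category: str,
                               content_type: str,
                               size_bytes: int,
                               etags_by_part: Dict[int, str]) -> dict:
    """Build an AssetFinalizationRequest body (hypothetical helper)."""
    if not etags_by_part:
        raise ValueError("at least one uploaded part is required (minItems: 1)")
    for number in etags_by_part:
        if not 1 <= number <= MAX_PART_NUMBER:
            raise ValueError(f"part_number out of range: {number}")
    # Sort by part number so the list is in upload order.
    parts = [
        {"part_number": number, "etag": etags_by_part[number]}
        for number in sorted(etags_by_part)
    ]
    return {
        "asset_id": asset_id,
        "upload_id": upload_id,
        "name": name,
        "category": category,
        "content_type": content_type,
        "size_bytes": size_bytes,
        "parts": parts,
    }
```

The `name` and `category` values should match what was sent in the initiate request, as the schema descriptions require.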

## The AssetUploadPart object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"AssetUploadPart":{"type":"object","description":"Information about a completed upload part.","properties":{"part_number":{"type":"integer","minimum":1,"maximum":10000,"description":"The part number that was uploaded."},"etag":{"type":"string","description":"The ETag returned by S3 after successfully uploading this part."}},"required":["part_number","etag"]}}}}
```

## The IdleVideoGenerationRequest object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"IdleVideoGenerationRequest":{"type":"object","description":"Request payload to trigger idle video generation.","properties":{"source_asset_id":{"type":"string","format":"uuid","description":"The ID of the source image asset to use for generating the idle video."},"reference_template":{"type":"string","description":"The reference motion template to use (e.g., 'v1', 'v2', 'v3').","enum":["v1","v2","v3"]}},"required":["source_asset_id"]}}}}
```

## The IdleVideoGenerationResponse object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"IdleVideoGenerationResponse":{"type":"object","description":"Response from triggering idle video generation.","properties":{"job_id":{"type":"string","format":"uuid","description":"Identifier of the background job created for this generation request."}},"required":["job_id"]}}}}
```

## The PublicModelConfig object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"PublicModelConfig":{"type":"object","description":"Public variant of ModelConfig for widget embedding.\nStrips internal fields (organization_id, created_by) that are not needed by the widget.","properties":{"model_config_id":{"type":"string","format":"uuid","readOnly":true,"description":"Server-generated unique ID."},"model_id":{"type":"string","readOnly":true,"description":"ID of the parent Model, derived from the Model Variant."},"model_variant_id":{"type":"string","description":"ID of the ModelVariant being configured."},"title":{"type":"string","description":"Title for the configuration."},"model_configurations":{"type":"object","additionalProperties":true,"description":"Parameters for the model variant.","default":{}},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update."}},"required":["model_config_id","model_id","model_variant_id","title","model_configurations","created_at","updated_at"]}}}}
```

## The PublicAsset object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"PublicAsset":{"type":"object","description":"Public variant of Asset for widget embedding.\nStrips internal fields (organization_id, created_by) and includes a pre-signed download URL.","properties":{"asset_id":{"type":"string","format":"uuid","readOnly":true,"description":"Unique identifier for the Asset."},"name":{"type":"string","description":"The original file name of the Asset."},"category":{"type":"string","description":"Primary category of the Asset (e.g., 'video', 'image', 'weight')."},"content_type":{"type":"string","description":"The MIME type of the asset."},"size_bytes":{"type":"integer","format":"int64","description":"The size of the asset in bytes."},"etag":{"type":"string","description":"The ETag of the S3 object."},"created_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of creation."},"updated_at":{"type":"string","format":"date-time","readOnly":true,"description":"Timestamp of last update."},"asset_url":{"type":"string","format":"uri","readOnly":true,"description":"A temporary, pre-signed URL to download the asset's content. This URL will expire."}},"required":["asset_id","name","category","content_type","size_bytes","etag","created_at","updated_at","asset_url"]}}}}
```

## The ErrorResponse object

```json
{"openapi":"3.0.0","info":{"title":"Ojin REST API","version":"v1.0.0"},"components":{"schemas":{"ErrorResponse":{"type":"object","properties":{"code":{"type":"string","description":"A short, machine-readable error code string."},"message":{"type":"string","description":"A human-readable description of the error."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional. Additional structured details about the error."}},"required":["code","message"]}}}}
```
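
Every documented error response shares this shape, so a client can decode it once and branch on `code`. A minimal Python sketch, in which the `ApiError` class and `parse_error_response` function are illustrative names and the `not_found` code in the usage line is a made-up value:

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ApiError:
    """Decoded ErrorResponse; `details` is optional and may be null."""
    code: str
    message: str
    details: Optional[Dict[str, Any]] = None


def parse_error_response(body: str) -> ApiError:
    """Decode an ErrorResponse JSON body, checking the two required fields."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("ErrorResponse must be a JSON object")
    missing = [key for key in ("code", "message") if key not in data]
    if missing:
        raise ValueError(f"not an ErrorResponse, missing: {', '.join(missing)}")
    return ApiError(code=data["code"], message=data["message"],
                    details=data.get("details"))
```

Usage: `parse_error_response('{"code": "not_found", "message": "Asset not found."}')` yields an `ApiError` whose `details` is `None`. Branching on `code` rather than on `message` is usually more robust, since the message is meant for humans.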


# Support

Need help with the Ojin platform? We're here to assist you with any questions, issues, or feedback you may have.

## Contact Us

For support inquiries, please reach out to our team:

**Email**: <hello@ojin.ai>

Our support team will respond to your inquiry as soon as possible.

## What to Include in Your Support Request

To help us assist you more efficiently, please include the following information when contacting support:

* **Description**: A clear description of your issue or question
* **Model**: Which model you're working with (e.g., `ojin/oris-portrait`, `ojin/oris-voice`)
* **Error Messages**: Any error messages or codes you're encountering
* **Steps to Reproduce**: If applicable, steps to reproduce the issue
* **Expected vs Actual Behavior**: What you expected to happen vs what actually happened
* **Environment**: Your development environment details (language, framework, etc.)

{% hint style="info" %}
Before reaching out, check our [Troubleshooting](https://github.com/journee-live/ojin/blob/main/docs/public/best-practices/troubleshooting.md) guide for common issues and solutions.
{% endhint %}

## Additional Resources

* [Documentation](/)
* [Quickstart Guide](/getting-started/quickstart)
* [Oris Portrait — real-time API](/models/oris-portrait/api)
* [Oris Voice — real-time API](/models/oris-voice/api)
* [Troubleshooting](https://github.com/journee-live/ojin/blob/main/docs/public/best-practices/troubleshooting.md)


# Human Agent

> Give your AI agent a face — a lifelike visual avatar that speaks, listens, and expresses emotion with ultra-low-latency synchronized animation.

## Overview

The Human Agent is an end-to-end conversational AI application that combines speech-to-speech (STS) with a lifelike animated face powered by Ojin's avatar model [ojin/oris-portrait](/models/oris-portrait). Instead of wiring up STT, LLM, TTS, and avatar services yourself, you create an agent in the Ojin dashboard, embed a single widget on your site, and your users get a live, ultra-low-latency conversation with a fully animated avatar.

## Key Features

* **Lifelike visual avatar** — your agent has a face with synchronized lip movements and natural expressions
* **End-to-end conversational AI** — speech in, speech out, no pipeline assembly required
* **Ultra-low latency** — real-time WebRTC transport for audio, video, and signaling
* **Managed or bring-your-own stack** — Ojin manages the full speech-to-speech stack, or you bring your own provider
* **Drop-in widget** — one HTML tag to embed the agent in any web page
* **Dashboard configuration** — create, configure, and monitor agents without writing code

## Agent Modes

### Ojin Agent (managed)

You configure the personality and appearance — system prompt, face, voice, and behaviour. Ojin handles everything else: the conversational pipeline, avatar rendering, and infrastructure. You never see or manage the underlying providers.

### Third-Party Agent

You bring your own speech-to-speech provider — Hume, ElevenLabs Agents, or Ultravox. Ojin adds the visual avatar layer and runs the agent through Ojin infrastructure. You supply your provider API key and config ID in the dashboard; Ojin handles the rest.

## How It Works

1. **Create an agent** in the [Ojin dashboard](https://ojin.ai/dashboard) — pick a mode, configure it, go online
2. **Get your agent ID** from the agent settings page
3. **Embed the widget** on your site or call the Session API from your backend
4. **Your users interact** via WebRTC — audio, video, and real-time avatar in a single connection

## Use Cases

* **Customer Support** — let customers talk to a lifelike agent instead of a chatbot
* **Sales** — greet and qualify leads with a conversational avatar
* **Education** — build interactive tutors with natural speech and expressions
* **Healthcare** — create empathetic virtual health assistants
* **Reception** — deploy a digital receptionist on your website or kiosk

## Pricing

Agent sessions are metered based on usage. Visit [ojin.ai/pricing](https://ojin.ai/pricing) for details on plans and per-session costs.

## Quick Start

1. [**Create an API key**](/getting-started/authentication) — set up authentication for the Ojin platform
2. [**Create & configure your agent**](/apps/overview/configure) — set up appearance, voice, and behaviour
3. [**Widget Integration**](/apps/overview/widget-integration) — drop the agent into your web page
4. [**Session API Reference**](/apps/overview/api-reference) — for custom integrations beyond the widget


# Create & Configure

> Set up your Human Agent in the Ojin dashboard — pick a mode, configure appearance and behaviour, and go online.

## Prerequisites

1. An Ojin account with an active API key — [get your API key](/getting-started/authentication)

## Creating an Agent

Head to the [Ojin dashboard](https://ojin.ai/dashboard) and create a new agent. Give it a name and pick your mode:

* **Ojin Agent** — Ojin handles everything: the conversational pipeline and avatar rendering. You just configure the personality and appearance.
* **Third-Party Agent** — you bring your own speech-to-speech provider (Hume, ElevenLabs Agents, or Ultravox). Ojin adds the visual avatar.

### Starting from a Preset

If you choose Ojin Agent mode, you can start from a preset — a ready-made template that pre-fills the face, voice, system prompt, and behaviour for common use cases like Customer Support, Sales Assistant, or Receptionist. You can customize everything after creation.

Alternatively, start blank and configure each field yourself.

## Configuring an Ojin Agent

### System Prompt

Write a prompt that defines how the agent behaves — its personality, knowledge, tone, and any instructions for the conversation. This works the same as a system prompt for any LLM.

### Face

Pick a visual appearance for your agent. The face is powered by Ojin's avatar model [ojin/oris-portrait](/models/oris-portrait), which generates lifelike personas from a single reference image. Browse the available face configurations in the dashboard and select the one that best fits your agent's personality.

### Voice

Select a TTS voice for the agent's speech output. The voice determines how the agent sounds when it responds.

### Behaviour

Configure how the agent interacts:

* **Greeting** — the first thing the agent says when a session starts (e.g., "Hi, how can I help you today?")
* **Vocal burst** — a short sound the agent makes while listening (e.g., "mhm", "uh-huh") to signal active listening
* **Nudge** — a message the agent sends if the user stays silent for a while

### Advanced Settings

Fine-tune the agent's audio processing:

* **VAD stop seconds** — how long to wait after the user stops speaking before the agent responds (default: 0.5s)
* **Min volume** — minimum audio volume threshold to detect speech (default: 0.1)
* **Noise filter** — toggle background noise filtering (default: off)
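To make these settings concrete, here is a minimal sketch of how a stop delay and a volume threshold commonly interact in voice activity detection. It is not Ojin's implementation: the function name `end_of_turn_index`, the per-chunk volume list, and the 0.1 s chunk length are assumptions for illustration. Only the two defaults (0.5 s and 0.1) come from the settings above.

```python
import math
from typing import Optional, Sequence


def end_of_turn_index(
    volumes: Sequence[float],
    chunk_seconds: float = 0.1,     # ASSUMPTION: length of one analysed audio chunk
    vad_stop_seconds: float = 0.5,  # dashboard default
    min_volume: float = 0.1,        # dashboard default
) -> Optional[int]:
    """Return the index of the chunk at which the user's turn is considered over.

    A chunk counts as speech when its volume is at or above `min_volume`.
    The turn ends once speech has been heard and the silence that follows
    has lasted at least `vad_stop_seconds`.
    """
    # Number of consecutive silent chunks needed to cover the stop delay.
    needed = max(1, math.ceil(vad_stop_seconds / chunk_seconds - 1e-9))
    heard_speech = False
    silent_chunks = 0
    for i, volume in enumerate(volumes):
        if volume >= min_volume:
            heard_speech = True
            silent_chunks = 0
        elif heard_speech:
            silent_chunks += 1
            if silent_chunks >= needed:
                return i
    return None
```

In this model, a lower **VAD stop seconds** makes the agent answer sooner but raises the chance of cutting in during a natural pause, while a higher **Min volume** ignores more quiet background sound at the risk of missing soft speech.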

### Session Limit

Set the maximum duration for a single session in seconds. Default is 600 seconds (10 minutes). When the limit is reached, the session ends automatically.

## Configuring a Third-Party Agent

### Face

Same as Ojin Agent — pick a visual appearance using Ojin's avatar model [ojin/oris-portrait](/models/oris-portrait). Regardless of which provider powers the conversation, Ojin renders the avatar.

### Provider

Select your speech-to-speech provider from the dropdown:

* **Hume** — enter your Hume API key. Config ID is optional (Hume can use a default config).
* **ElevenLabs Agents** — enter your ElevenLabs API key and your ElevenLabs agent ID (required).
* **Ultravox** — enter your Ultravox API key. Config ID is optional.

{% hint style="warning" %}
Your provider API key is encrypted at rest and never displayed after you save it. To rotate a key, enter the new one in the dashboard — it takes effect for new sessions immediately.
{% endhint %}

## Going Online

When your agent is configured, toggle its status to **online** in the [dashboard](https://ojin.ai/dashboard). The system validates that all required fields are filled before allowing the agent to go live:

* **Ojin Agent** requires: system prompt, voice, and face
* **Third-Party Agent** requires: provider, API key, and face (plus any provider-specific required fields)

If anything is missing, the dashboard tells you which fields need attention.

## Concurrency

The **max concurrency** setting controls how many simultaneous sessions your agent can handle. Default is 1 — meaning one user can talk to the agent at a time. Increase this if you expect multiple concurrent users.

When all slots are in use, new connection attempts receive a `concurrency_limit` error with a `retry_after_seconds` hint.

## Next Steps

* [**Widget Integration**](/apps/overview/widget-integration) — add the agent to your web page
* [**Session API Reference**](/apps/overview/api-reference) — for custom integrations


# Widget Integration

> Embed a fully interactive Human Agent on your website with a single HTML tag.

## Quick Start

Add the Ojin widget script to your page and drop in the agent tag:

```html
<script src="https://widget.ojin.ai/ojin-agent.js"></script>
<ojin-agent agent-id="your-agent-id"></ojin-agent>
```

{% hint style="info" %}
The widget script URL is available in your agent's settings page in the [dashboard](https://ojin.ai/dashboard). Copy it from there to ensure you're using the latest version.
{% endhint %}

That's it. Your users get a live conversational agent with video, audio, and a real-time avatar — no backend code required.

## Appearance & Layout

The widget renders as a floating overlay on your page. It includes the avatar video feed, audio controls, and session management — all self-contained.

{% hint style="info" %}
Widget styling and positioning options are configured in your agent's settings in the [dashboard](https://ojin.ai/dashboard). Check your agent's widget configuration for available customization options.
{% endhint %}

## Finding Your Agent ID

After [creating and configuring your agent](/apps/overview/configure), copy the agent ID from the agent settings page in the [Ojin dashboard](https://ojin.ai/dashboard).

## Controlling Access

Agents are **private by default** — connections from any website are rejected until you configure access.

### Hostname Allowlist

In the [dashboard](https://ojin.ai/dashboard), add the domains that are allowed to connect to your agent:

| Configuration                    | Behaviour                                           |
| -------------------------------- | --------------------------------------------------- |
| No hostnames configured          | **Private** — all connections rejected              |
| `myapp.com`, `staging.myapp.com` | Only requests from those exact domains are accepted |
| `*`                              | **Public** — connections accepted from any domain   |

{% hint style="warning" %}
Using `*` (wildcard) allows anyone to connect to your agent. This is useful for demos but not recommended for production. Use specific hostnames to restrict access to your own domains.
{% endhint %}

{% hint style="info" %}
No API key is needed in the widget. Access is controlled entirely by the hostname allowlist — the server checks the request's origin against your configured domains.
{% endhint %}
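The allowlist rules in the table can be modelled with a few lines of Python. This is a sketch of the documented behaviour, not the server's code: the function name `is_origin_allowed` is hypothetical, and comparing only the hostname part of the `Origin` value (ignoring scheme and port) is an assumption.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def is_origin_allowed(origin: Optional[str], allowed_hostnames: Iterable[str]) -> bool:
    """Mirror the allowlist table: empty list = private, "*" = public, else exact match."""
    allowed = {h.strip().lower() for h in allowed_hostnames if h.strip()}
    if not allowed:
        return False  # no hostnames configured: agent is private
    if "*" in allowed:
        return True   # wildcard: agent is public
    if not origin:
        return False  # no Origin header, e.g. a server-side call
    # ASSUMPTION: only the hostname is compared, so "https://myapp.com" -> "myapp.com"
    hostname = urlsplit(origin).hostname
    return hostname is not None and hostname in allowed
```

Note that exact matching means `www.myapp.com` is not covered by an entry for `myapp.com`; each subdomain you serve the widget from needs its own entry.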

## How It Works

When the widget loads on your page:

1. The widget calls the Session API (`POST /v1/public/agents/connect`) with your agent ID
2. The server validates the request origin against the hostname allowlist
3. On success, the server returns a WebRTC room URL and token
4. The widget connects to the WebRTC room directly
5. Audio, video, and signaling all flow over a single WebRTC connection

The entire handshake takes a few seconds. After that, your user is in a live conversation with the agent.

## Custom Integrations

If you need more control than the widget provides — for example, building a native mobile app or a custom web experience — you can call the Session API directly from your backend and connect to the WebRTC room yourself.

See the [Session API Reference](/apps/overview/api-reference) for details.

## Troubleshooting

### Widget not loading

* Verify the widget script URL is correct and the script loads without errors (check browser console)
* Confirm the `agent-id` attribute matches your agent's ID in the dashboard

### "agent\_offline" error

* Check that your agent's status is set to **online** in the [dashboard](https://ojin.ai/dashboard)
* Ensure all required fields are filled — the dashboard won't let you go online with missing configuration

### "auth\_failed" error

* Verify your domain is in the agent's hostname allowlist in the [dashboard](https://ojin.ai/dashboard)
* Check that the `Origin` header sent by the browser matches one of your configured hostnames exactly
* If testing locally, add `localhost` or `127.0.0.1` to the allowlist

### "concurrency\_limit" error

* Your agent's maximum concurrent sessions are in use. Wait for an active session to end or increase the **max concurrency** setting in the dashboard

### No audio or video

* Ensure the user's browser has granted microphone and camera permissions
* Check that the WebRTC connection is not blocked by a firewall or corporate proxy


# Session API Reference

> Start agent sessions programmatically. The widget uses this API automatically — call it directly for custom integrations.

## Base URL

```
https://{region}.agents.ojin.ai
```

The `agents.ojin.ai` domain uses geoproximity routing to direct requests to the nearest regional endpoint.

## Start a Session

### `POST /v1/public/agents/connect`

Start a new session for a given agent. Returns connection details for joining the WebRTC room.

### Request

**Headers:**

| Header         | Required | Description        |
| -------------- | -------- | ------------------ |
| `Content-Type` | Yes      | `application/json` |

**Body:**

```json
{
  "agent_id": "your-agent-id",
  "client_user_ref": "optional-user-identifier"
}
```

| Field             | Type   | Required | Description                                                                                                     |
| ----------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------- |
| `agent_id`        | string | Yes      | The ID of the agent to start a session with                                                                     |
| `client_user_ref` | string | No       | An opaque identifier for your end user. Appears in call history for analytics and tracking. Max 256 characters. |
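A small helper can enforce the field rules before the request is sent. The sketch below is illustrative: `build_connect_payload` is a hypothetical name, and rejecting over-long values on the client (rather than letting the server respond) is a design choice made here, not documented behaviour.

```python
import json
from typing import Optional

MAX_CLIENT_USER_REF_LENGTH = 256  # limit stated in the field table


def build_connect_payload(agent_id: str, client_user_ref: Optional[str] = None) -> str:
    """Return the JSON body for POST /v1/public/agents/connect."""
    if not agent_id:
        raise ValueError("agent_id is required")
    payload = {"agent_id": agent_id}
    if client_user_ref is not None:
        if len(client_user_ref) > MAX_CLIENT_USER_REF_LENGTH:
            raise ValueError("client_user_ref must be at most 256 characters")
        payload["client_user_ref"] = client_user_ref
    return json.dumps(payload)
```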

### Response (200 OK)

```json
{
  "session_id": "550e8400-e29b-41d4-a716-446655440000",
  "room_url": "https://example.daily.co/room-name",
  "token": "eyJhbGciOiJIUzI1NiIs..."
}
```

| Field        | Type   | Description                               |
| ------------ | ------ | ----------------------------------------- |
| `session_id` | string | Unique identifier for this session        |
| `room_url`   | string | WebRTC room URL to connect to             |
| `token`      | string | Authentication token for joining the room |

Use `room_url` and `token` to connect to the WebRTC room from your client. If you're building a web application, you can use the [Daily JavaScript SDK](https://docs.daily.co/reference/daily-js) to join the room.

### Error Responses

All errors return a JSON body with `error_code` and `message`:

```json
{
  "error_code": "agent_unpublished",
  "message": "Agent exists but is currently unpublished"
}
```

For `concurrency_limit` errors, the response includes an additional `retry_after_seconds` field indicating how long to wait before retrying:

```json
{
  "error_code": "concurrency_limit",
  "message": "Max concurrency reached",
  "retry_after_seconds": 5
}
```

| Error Code             | HTTP Status | Description                                                      |
| ---------------------- | ----------- | ---------------------------------------------------------------- |
| `agent_not_found`      | 404         | Agent ID does not exist or is not visible to the caller          |
| `agent_unpublished`    | 403         | Agent exists but status is unpublished                           |
| `auth_failed`          | 403         | Hostname allowlist validation failed                             |
| `concurrency_limit`    | 429         | Max concurrency reached. Response includes `retry_after_seconds` |
| `room_creation_failed` | 502         | WebRTC room could not be created                                 |
| `orchestrator_error`   | 502         | Agent orchestrator is unreachable or returned an error           |
| `internal_error`       | 500         | Unexpected server error                                          |
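One way to act on these error codes is a small retry wrapper. The sketch below is an assumption-laden example rather than recommended client behaviour: `connect_with_retry` and the `post` callable are hypothetical, and treating the two 502 codes as retryable is a choice made here. Only the `retry_after_seconds` hint for `concurrency_limit` comes from the API description.

```python
import time
from typing import Callable, Tuple

# ASSUMPTION: which error codes from the table are worth retrying.
RETRYABLE = {"concurrency_limit", "room_creation_failed", "orchestrator_error"}


def connect_with_retry(
    post: Callable[[], Tuple[int, dict]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `post` until it returns HTTP 200 or a non-retryable error.

    `post` is a hypothetical callable that performs the HTTP request and
    returns (status_code, parsed_json_body).
    """
    for attempt in range(1, max_attempts + 1):
        status, body = post()
        if status == 200:
            return body  # contains session_id, room_url, token
        code = body.get("error_code", "unknown")
        if code not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"{code} (HTTP {status}): {body.get('message', '')}")
        # Use the server hint when present; otherwise wait `attempt` seconds.
        sleep(float(body.get("retry_after_seconds", attempt)))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing the request as a callable keeps the retry logic independent of the HTTP client, so the same wrapper works with `requests`, `httpx`, or a test double.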

### Authentication

Currently, the public Session API uses **hostname allowlist** authentication. When a browser request arrives, the server checks the `Origin` header against the domains configured on the agent:

* If the origin matches an allowed hostname, the request proceeds
* If no hostnames are configured, all requests are rejected (agent is private)
* If `*` is configured, all origins are accepted

No API key or token is needed from the client side when using the hostname allowlist.

{% hint style="info" %}
Server-side calls without a browser `Origin` header only work for agents configured with `*` in their allowed hostnames. Production server-to-server authentication is planned separately.
{% endhint %}

## Examples

### curl

```bash
# Works only for wildcard/demo agents because curl does not attach a browser Origin.
curl -X POST https://eu-central-1.agents.ojin.ai/v1/public/agents/connect \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "your-agent-id"}'
```

### Python

```python
import requests

# Works only for wildcard/demo agents because server-side HTTP clients do not
# attach a browser Origin.
response = requests.post(
    "https://eu-central-1.agents.ojin.ai/v1/public/agents/connect",
    json={"agent_id": "your-agent-id"},
    timeout=10,
)
response.raise_for_status()  # surface 4xx/5xx errors instead of failing on a missing key

data = response.json()
print(f"Session: {data['session_id']}")
print(f"Room: {data['room_url']}")
```

### JavaScript

```javascript
const response = await fetch(
    "https://eu-central-1.agents.ojin.ai/v1/public/agents/connect",
    {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ agent_id: "your-agent-id" })
    }
);

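// Error responses carry error_code and message instead of session details
if (!response.ok) {
    throw new Error(`Connect failed with HTTP ${response.status}`);
}

const { session_id, room_url, token } = await response.json();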
```

## Connecting to the Room

After receiving the `room_url` and `token` from the Session API, connect to the WebRTC room using the [Daily JavaScript SDK](https://docs.daily.co/reference/daily-js):

```bash
npm install @daily-co/daily-js
```

```javascript
import DailyIframe from '@daily-co/daily-js';

// Create a call frame (attaches to DOM automatically)
const callFrame = DailyIframe.createFrame();

// Join the room with the token from the Session API
await callFrame.join({ url: room_url, token: token });

// The agent's audio and video tracks are now available
callFrame.on('track-started', (event) => {
    if (event.participant && !event.participant.local) {
        // Remote participant = the agent
        // event.track contains the audio or video MediaStreamTrack
    }
});
```

{% hint style="info" %}
This is only needed for custom integrations. If you're using the [widget](/apps/overview/widget-integration), it handles the room connection automatically.
{% endhint %}


# ojin/oris-portrait

> A lifelike persona model that transforms reference images into natural animated personas

## Overview

The ojin/oris-portrait model is our flagship persona generation technology that creates realistic, expressive digital humans from a reference image. It excels at producing natural facial animations, lip-syncing, and emotional expressions that bring your persona to life with synchronized speech.

## Key Features

* **Full persona look control** - Generate a persona from any reference image. The persona behaves the same regardless of the image used
* **No training required** - Your persona is ready to use as soon as the reference image is uploaded, with no waiting
* **Natural Lip-Syncing** - Precise lip movements synchronized with speech audio
* **Emotional Expressions** - Support for multiple emotional states and expressions
* **Low Latency** - Sub-200ms latency for real-time applications
* **High Resolution** - Support for up to 720p output resolution

## Quick Start

Getting started with ojin/oris-portrait is simple:

1. [**Create an API key**](/getting-started/authentication) - Set up authentication for the Ojin platform
2. [**Use a persona template**](/models/oris-portrait/using-persona-template) - Use a persona template to generate your persona in seconds
3. [**Integrate with your application**](/models/oris-portrait/integrations) - Use either Pipecat or WebSocket API

## Use Cases

* **Virtual Assistants** - Create responsive customer service personas
* **Educational Content** - Develop engaging tutors and instructors
* **Entertainment** - Produce animated characters for games and media
* **Presentations** - Transform static slides into dynamic video presentations
* **Healthcare** - Build empathetic virtual health assistants


# Get started

This guide explains how to integrate the `ojin/oris-portrait` persona model into your applications using either Pipecat or WebSockets.

## Prerequisites

1. An Ojin account with an active API key. If you don't have one, [get your API key](/getting-started/authentication)
2. [Create a Persona](/models/oris-portrait/creating-persona) or use a [Persona Template](/models/oris-portrait/using-persona-template)
3. Save the Persona Configuration ID from the dashboard
4. Integrate with your application using either [Pipecat](#pipecat-integration) or [WebSockets](#websocket-integration)

{% hint style="info" %}
**Production deployments:** For secure, low-latency video applications, connect to the real-time WebSocket API from a backend server rather than a front-end client. This keeps your API key secure and lets you use a network transport suited to real-time video delivery under varying network conditions. Typically, WebRTC is used to deliver the final media stream to end users for smooth, reliable, low-latency playback.
{% endhint %}

{% tabs %}
{% tab title="Pipecat" %}

#### Pipecat Integration

[Pipecat](https://github.com/pipecat-ai/pipecat) is an open-source framework for building conversational AI pipelines. The `ojin/oris-portrait` model integrates with Pipecat through our dedicated `OjinVideoService`.

Clone the pipecat repository and open the ready-to-use [video-avatar-ojin-video-service example](https://github.com/ojinai/pipecat/blob/add-ojin/examples/video-avatar/video-avatar-ojin-video-service.py).

To get started, create a Python virtual environment in the cloned repository and install the requirements:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Create a `.env` file and add your Ojin credentials along with the API keys for the STT, LLM, and TTS services used by the example:

```bash
OJIN_API_KEY="your_api_key_here"
OJIN_CONFIG_ID="your_persona_id_here"
DEEPGRAM_API_KEY="your_deepgram_api_key_here"
CARTESIA_API_KEY="your_cartesia_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
```

Once configured, run the example and connect with a local WebRTC client (or Daily) to interact with a conversational, human-like avatar:

```bash
python examples/video-avatar/video-avatar-ojin-video-service.py
```

**How It Works**

1. The microphone captures speech input
2. Voice Activity Detection identifies speech segments
3. Deepgram transcribes user audio to text (STT)
4. OpenAI generates the assistant's reply (LLM)
5. Cartesia synthesizes the reply to audio (TTS)
6. The OjinVideoService animates your persona based on the TTS audio
7. Video and audio frames are streamed back to the client in real-time

{% hint style="info" %}
You can customize the pipeline by adding or removing components, or by adjusting their parameters to suit your needs.
{% endhint %}
{% endtab %}

{% tab title="WebSocket" %}

#### WebSocket Integration

For WebSocket integration, see our [API Reference →](/models/oris-portrait/api)
{% endtab %}
{% endtabs %}

## Next Steps

#### API Reference

Dive deeper into the model API for custom integrations

[API Reference →](/models/oris-portrait/api)


# Using a Persona Template

Learn how to create a persona from a template so you can get started with your application quickly.

## Prerequisites

* An Ojin account with an active API key

## Creating a Persona through the Dashboard

The simplest way to create a persona is through the Ojin Dashboard:

1. Log in to the [Ojin Dashboard](https://ojin.ai)
2. Navigate to the [**Oris Portrait**](https://ojin.ai/models/ojin/oris-portrait) section
3. Navigate to the [**Configs**](https://ojin.ai/models/ojin/oris-portrait/configs) sub-section
4. Select a persona template and press **Copy Template**
5. Open the newly created model configuration and save the **Model Config ID**, which your application will use
6. You can now integrate it through the [**model API endpoints**](https://ojin.ai/models/ojin/oris-portrait/docs)

## Next Steps

Once your persona is ready, you can:

#### Integration Guide

Learn how to integrate your persona using Pipecat or WebSocket

[View Integration Guide →](/models/oris-portrait/integrations)

#### API Reference

Explore the complete API documentation

[View API Reference →](/models/oris-portrait/api)


# Creating a custom Persona

Before you can start integrating with the ojin/oris-portrait model, you'll need to create a persona configuration. This guide walks you through the process of creating a persona that looks exactly how you want.

## Prerequisites

* An Ojin account with an active API key
* A high-quality reference image of the persona you want to animate (check [Reference Image best practices](#reference-image-best-practices) for more details)

{% hint style="info" %}
For best results, follow the reference image best practices below.
{% endhint %}

## Creating a Persona configuration through the Dashboard

The simplest way to create a persona is through the Ojin Dashboard:

1. Log in to the [Ojin Dashboard](https://ojin.ai)
2. Select the [**ojin/oris-portrait**](https://ojin.ai/models/ojin/oris-portrait) model
3. Navigate to the [**Configs**](https://ojin.ai/models/ojin/oris-portrait/configs) sub-section
4. Press the **New Configuration** button to create a new configuration
5. Fill in the required fields and upload a reference image. Make sure the image follows the best practices below.
6. Click **Create Configuration**
7. Open your newly created configuration and copy the **Model Config ID**, which your application will use

### Reference Image best practices

* **Image content**:
  * Mouth should be closed, but smiles or subtle expressions are fine
  * The eyes should be looking directly into the camera
  * Keep the expression balanced — the image is used as the base for both the idle loop and speech, so avoid extreme poses
* **Format**: JPEG, PNG, or WebP
* **Lighting**: Even lighting with no harsh shadows
* **Face Position**: The face should be clearly visible and centered
* **Background**: Simple backgrounds work best
* **Accessories**: Avoid sunglasses or items that obscure facial features
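The format requirement is easy to check before uploading. The sketch below identifies the three accepted formats from their leading bytes; `detect_image_format` is a hypothetical helper, and it only inspects the file signature, not the image content, lighting, or framing.

```python
from typing import Optional


def detect_image_format(data: bytes) -> Optional[str]:
    """Identify JPEG, PNG, or WebP from the file's leading magic bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

A typical use is `detect_image_format(open(path, "rb").read(16))`, rejecting the file locally when the result is `None`.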

## Next Steps

Once your persona is ready, you can:

#### Integration Guide

Learn how to integrate your persona using Pipecat or WebSocket

[View Integration Guide →](/models/oris-portrait/integrations)

#### API Reference

Explore the complete API documentation

[View API Reference →](/models/oris-portrait/api)


# API Reference

## Overview

Real-time talking head synthesis API. Send speech audio, receive synchronized video and audio frames.

After connecting and receiving `SessionReady`, send one initial audio message containing one frame's worth of silent audio to start the interaction on the server. The server then immediately begins streaming video and audio frames at 25 fps. When no speech audio has been sent, the server generates **silence frames** (persona at rest with idle animation). When you send speech audio, the server generates **speech frames** with lip-synced animation synchronized to your audio.

After the initial silence frame, you only need to send speech audio — no additional silence, padding, or keep-alive messages are required.
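As a rough sketch, one frame of silence at 25 fps covers 1/25 s, or 40 ms, of audio. The helper below builds that buffer. The audio format is an assumption: 16 kHz, 16-bit mono PCM is used here only as a placeholder, so check the `InteractionInput` documentation for the format the API expects. The name `silence_for_one_frame` is hypothetical.

```python
FRAMES_PER_SECOND = 25  # server frame rate stated above


def silence_for_one_frame(sample_rate_hz: int = 16000, bytes_per_sample: int = 2) -> bytes:
    """Return one video frame's worth (1/25 s = 40 ms) of silent mono PCM.

    ASSUMPTION: 16 kHz, 16-bit mono PCM. The real input format may differ.
    """
    samples = sample_rate_hz // FRAMES_PER_SECOND
    return bytes(samples * bytes_per_sample)  # bytes(n) yields n zero bytes
```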

{% hint style="info" %}
**Production deployments:** For secure, low-latency video applications, connect to the real-time WebSocket API from a backend server rather than a front-end client. This keeps your API key secure and lets you use a network transport suited to real-time video delivery under varying network conditions. Typically, WebRTC is used to deliver the final media stream to end users for smooth, reliable, low-latency playback.
{% endhint %}

***

## How It Works

1. **Connect** to the WebSocket endpoint with your API key and config ID
2. **Receive `SessionReady`** — the server has allocated inference resources for your session
3. **Send one initial audio message** containing one frame of silence to start the interaction
4. **The server starts streaming frames immediately** — silence frames with idle animation
5. **Send speech audio** whenever it becomes available (e.g., TTS output from your language model) — also buffer it locally for playback
6. **Receive speech frames** — the server transitions to lip-synced animation and returns to silence frames when audio runs out
7. **Render video frames** at 25fps, dropping excess silence frames to manage buffer size
8. **Start playing your buffered TTS audio** when the first speech frame (`frame_type` `1` or `3`) arrives from Ojin — stop when speech ends

### Frame Types

Every frame arrives as a binary `InteractionResponse` containing both a JPEG image and a PCM audio chunk. Frames are always delivered in order. The `frame_type` field classifies each frame:

| `frame_type` | Description                                                                                                                                         |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `0`          | **Idle** — persona at rest with idle animation. Generated automatically when no speech audio is queued                                              |
| `1`          | **Speech** — lip-synced animation generated from your audio input                                                                                   |
| `2`          | **Fade-out** — post-cancel ramp back toward idle after an interruption                                                                              |
| `3`          | **Start of speech** — first speech frame of a turn **resuming after an interruption/cancel**. Natural (uninterrupted) turns start with `1`, not `3` |
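In client code it can help to give these numeric values names. The enum below is a sketch: the member names and both helper functions are illustrative choices, and only the numeric values and their meanings come from the table.

```python
from enum import IntEnum


class FrameType(IntEnum):
    IDLE = 0             # persona at rest
    SPEECH = 1           # lip-synced frame
    FADE_OUT = 2         # ramp back to idle after a cancel
    START_OF_SPEECH = 3  # first speech frame when resuming after a cancel


def is_speech(frame_type: int) -> bool:
    """True for frames that carry lip-synced speech (types 1 and 3)."""
    return FrameType(frame_type) in (FrameType.SPEECH, FrameType.START_OF_SPEECH)


def is_idle(frame_type: int) -> bool:
    """True only for idle frames (type 0)."""
    return FrameType(frame_type) is FrameType.IDLE
```

`FrameType(value)` raises `ValueError` for an unknown number, which surfaces protocol changes early instead of silently misclassifying frames.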

### Faster-than-Realtime Generation

The server generates frames slightly **faster than realtime** to build a client-side buffer that prevents stuttering during speech. During speech bursts, generation is even faster. This means your frame buffer will grow over time if you don't manage it.

**You must drop idle frames to prevent unbounded buffer growth.** When your buffer starts growing beyond what you need for smooth playback, skip 1 out of every 2 idle frames (`frame_type == 0`) until the buffer shrinks back down. Only idle frames are safe to drop — never drop speech (`1`), start-of-speech (`3`), or fade-out (`2`) frames.

The right buffer target depends on your network conditions and latency requirements — start by observing your buffer size during playback and tuning from there. Keep it as low as possible to minimize latency, but high enough to absorb network jitter without starving playback.

```python
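from collections import deque

def next_frame(buffer: deque, target_buffer_size: int, state: dict):
    """Return the next frame to render, or None if the buffer is empty."""
    while buffer:
        frame = buffer.popleft()
        # While the buffer is too large, drop every other idle frame.
        # Only idle frames (frame_type == 0) are ever dropped.
        if frame.frame_type == 0 and len(buffer) > target_buffer_size:
            state["idle_seen"] = state.get("idle_seen", 0) + 1
            if state["idle_seen"] % 2 == 0:
                continue  # skip this idle frame and look at the next one
        return frame
    return None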
```
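When choosing a buffer target, it helps to translate frames into time. At 25 fps each frame represents 40 ms, so the arithmetic is simple. The helper names below are hypothetical, and any latency budget you pass in is your own choice rather than a documented recommendation.

```python
FRAME_RATE = 25  # frames per second, as stated for this API


def buffered_latency_ms(buffered_frames: int, frame_rate: int = FRAME_RATE) -> float:
    """Playback delay added by frames waiting in the buffer."""
    return buffered_frames * 1000 / frame_rate


def frames_for_latency(target_ms: float, frame_rate: int = FRAME_RATE) -> int:
    """Largest whole number of buffered frames that stays within a latency budget."""
    return int(target_ms * frame_rate // 1000)
```

For example, five buffered frames add 200 ms of delay, and a 100 ms budget leaves room for two frames.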

***

## Connection Flow

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: Connection
    Client->>Server: WebSocket Connect
    Server->>Client: SessionReady (JSON)
    Client->>Server: InteractionInput (silence audio for one frame)

    Note over Client,Server: Server Streams Immediately
    Server->>Client: Frame (idle, frame_type=0)
    Server->>Client: Frame (idle, frame_type=0)
    Server->>Client: Frame (idle, frame_type=0)

    Note over Client,Server: Client Sends Speech Audio
    Client->>Server: InteractionInput (TTS audio chunk 1)
    Client->>Server: InteractionInput (TTS audio chunk 2)

    Note over Client,Server: Server Transitions to Speech
    Server->>Client: Frame (speech, frame_type=1)
    Server->>Client: Frame (speech, frame_type=1)
    Server->>Client: Frame (speech, frame_type=1)
    Note right of Server: Burst: faster than realtime

    Note over Client,Server: Audio Runs Out → Back to Idle
    Server->>Client: Frame (idle, frame_type=0)
    Server->>Client: Frame (idle, frame_type=0)
    Note right of Client: Client drops excess idle frames
```

***

## WebSocket Handshake

## Open WebSocket connection

> Connect to the WebSocket endpoint, providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket and immediately begins streaming frames after sending `SessionReady`.
>
> **Recommended WebSocket settings:**
>
> * `ping_interval`: 30 seconds
> * `ping_timeout`: 10 seconds
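The handshake inputs can be collected in one place before they are handed to a WebSocket client. The sketch below only builds the values. `build_handshake` is a hypothetical helper, the endpoint `wss://models.ojin.ai/realtime` is the server URL listed in the API specification, and how you pass custom headers depends on your WebSocket library and its version.

```python
from urllib.parse import urlencode

REALTIME_URL = "wss://models.ojin.ai/realtime"  # server URL from the API spec


def build_handshake(api_key: str, config_id: str) -> dict:
    """Collect the URL, headers, and keep-alive settings for the WebSocket handshake."""
    if not api_key or not config_id:
        raise ValueError("api_key and config_id are both required")
    return {
        "url": f"{REALTIME_URL}?{urlencode({'config_id': config_id})}",
        # Raw API key, no "Bearer" prefix.
        "headers": {"Authorization": api_key},
        "ping_interval": 30,  # seconds, recommended setting
        "ping_timeout": 10,   # seconds, recommended setting
    }
```

Keeping the key in a header rather than the query string means it does not end up in URL logs; load it from an environment variable on your backend rather than hard-coding it.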

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"servers":[{"url":"wss://models.ojin.ai/realtime","description":"WebSocket endpoint"}],"security":[{"ApiKeyAuth":[]}],"components":{"securitySchemes":{"ApiKeyAuth":{"type":"apiKey","in":"header","name":"Authorization","description":"Raw API key (no `Bearer` prefix)."}},"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. **The server begins streaming frames immediately after this message.**\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional model-specific session parameters returned by the server."}}}}},"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions (e.g., no backend servers available), the server may send a plain text message instead of a structured JSON `ErrorResponse`. 
Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}},"paths":{"/":{"get":{"summary":"Open WebSocket connection","description":"Connect to the WebSocket endpoint providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket and immediately begins streaming frames after sending `SessionReady`.\n\n**Recommended WebSocket settings:**\n- `ping_interval`: 30 seconds\n- `ping_timeout`: 10 seconds","operationId":"wsHandshake","parameters":[{"in":"query","name":"config_id","required":true,"schema":{"type":"string"},"description":"Configuration ID for the persona, created in the Oris Portrait tab of the dashboard."},{"in":"header","name":"Authorization","required":true,"schema":{"type":"string"},"description":"Your raw API key. No `Bearer` prefix."}],"responses":{"101":{"description":"WebSocket upgrade successful. 
After the upgrade, the server sends a `SessionReady` JSON message and begins streaming binary `InteractionResponse` frames immediately.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/SessionReadyMessage"}}}},"401":{"description":"Unauthorized — invalid or missing API key.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponseMessage"}}}}}}}}}
```
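The handshake inputs described above can be assembled as in the following minimal sketch. `build_connection` is a hypothetical helper name, not part of any Ojin SDK; the endpoint URL, the `config_id` query parameter, the raw `Authorization` header, and the ping settings are taken from this page.

```python
from urllib.parse import urlencode

BASE_URL = "wss://models.ojin.ai/realtime"

def build_connection(api_key, config_id):
    """Return (url, headers, ws_options) for the WebSocket handshake."""
    if not api_key or not config_id:
        raise ValueError("api_key and config_id are both required")
    # URL-encode the query string so unusual config IDs cannot break the URL.
    url = f"{BASE_URL}?{urlencode({'config_id': config_id})}"
    headers = {"Authorization": api_key}   # raw key, no "Bearer" prefix
    ws_options = {"ping_interval": 30, "ping_timeout": 10}
    return url, headers, ws_options
```

With the `websockets` library, the result would typically be passed as `websockets.connect(url, additional_headers=headers, **ws_options)`; older `websockets` versions name the header argument `extra_headers`.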

***

## Message Format

{% hint style="info" %}
**Mixed message types:** Both JSON (text) and binary messages are exchanged on the same WebSocket connection. Your client must check the WebSocket frame type to distinguish them:

* **Text frames (JSON):** `SessionReady`, `ErrorResponse` (server → client), `EndInteraction`, `CancelInteraction` (client → server)
* **Binary frames:** `InteractionResponse` (server → client), `InteractionInput` (client → server)
{% endhint %}

{% hint style="info" %}
**Byte order:** All multi-byte integer fields in binary messages use **network byte order (big-endian)**.
{% endhint %}
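A minimal sketch of the frame-type check described above. It assumes a client library that delivers text frames as `str` and binary frames as `bytes` (as the Python `websockets` library does); `classify_message` is a hypothetical helper name.

```python
import json

def classify_message(data):
    """Classify a received WebSocket message.

    Returns ("binary", bytes), ("json", dict) or ("text", str).
    """
    if isinstance(data, (bytes, bytearray)):
        return "binary", bytes(data)      # InteractionResponse
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError:
        return "text", data               # plain text, e.g. a non-JSON error
    if isinstance(parsed, dict):
        return "json", parsed             # SessionReady / ErrorResponse
    return "text", data                   # valid JSON, but not a message object
```

Keeping the plain-text branch separate means a non-JSON error string never reaches code that expects a message dictionary.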

***

## Messages Reference

### Server → Client Messages

## The SessionReadyMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. **The server begins streaming frames immediately after this message.**\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional model-specific session parameters returned by the server."}}}}}}}}
```

## The InteractionResponseMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionResponseMessage":{"type":"object","description":"Binary message containing a video frame and synchronized audio chunk. The server streams these continuously after `SessionReady` — silence frames when idle, speech frames when processing your audio.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte  ]  Is final flag   — uint8, 1 = last frame, 0 = more coming\n[16 bytes]  Interaction ID  — UUID bytes\n[8 bytes ]  Timestamp       — uint64, milliseconds since Unix epoch\n[4 bytes ]  Usage           — uint32, usage metric\n[4 bytes ]  Frame index     — uint32, 0 = silence, 1 = speech\n[4 bytes ]  Num payloads    — uint32, number of payload entries\n\nFor each payload entry:\n  [4 bytes]  Data size       — uint32, byte length of payload data\n  [1 byte ]  Payload type    — uint8, 1 = audio, 2 = image\n  [N bytes]  Payload data    — raw payload bytes\n```\n\nPython unpack: `struct.unpack('!B16sQIII', header)` for the main header, `struct.unpack('!IB', entry)` for each payload entry.","required":["is_final","interaction_id","timestamp","usage","index","payloads"],"properties":{"is_final":{"type":"boolean","description":"`true` if this is the last frame for the current interaction. `false` if more frames are coming."},"interaction_id":{"type":"string","format":"uuid","description":"UUID identifying this response. Use to correlate frames across a single interaction."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the frame was sent."},"usage":{"type":"integer","format":"int32","description":"Usage metric for this response."},"index":{"type":"integer","format":"int32","enum":[0,1],"description":"Frame type. `0` = silence frame (idle animation, no speech input). `1` = speech frame (lip-synced to your audio). **Drop silence frames (`0`) to manage buffer size. 
Never drop speech frames (`1`).**"},"payloads":{"type":"array","description":"List of payload entries in this frame. Each frame typically contains one audio entry and one image entry.","items":{"type":"object","required":["payload_type","data_size","data"],"properties":{"payload_type":{"type":"integer","enum":[1,2],"description":"`1` = audio (PCM int16, 16kHz mono, 1,280 bytes = 640 samples = 40ms). `2` = image (JPEG-encoded, resolution depends on config e.g. 1280×720)."},"data_size":{"type":"integer","format":"int32","description":"Byte length of the payload data."},"data":{"type":"string","format":"binary","description":"Raw payload bytes. For audio: PCM int16 bytes. For image: JPEG bytes."}}}}}}}}}
````

## The ErrorResponseMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions (e.g., no backend servers available), the server may send a plain text message instead of a structured JSON `ErrorResponse`. Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}}}
```

### Client → Server Messages

## The InteractionInputMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionInputMessage":{"type":"object","description":"Binary message for sending speech audio to the server. **Only send speech audio** — do not send silence or padding. The server generates silence frames automatically.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte ]  Payload type   — uint8, always 1 for audio\n[8 bytes]  Timestamp      — uint64, milliseconds since Unix epoch\n[4 bytes]  Params size    — uint32, byte length of JSON params (0 if none)\n[N bytes]  Params JSON    — UTF-8 JSON (only present if params size > 0)\n[M bytes]  Audio payload  — raw PCM int16 speech audio\n```\n\nPython pack: `struct.pack('!BQI', payload_type, timestamp, params_size)`","required":["payload_type","timestamp","params_size","audio_payload"],"properties":{"payload_type":{"type":"integer","enum":[1],"description":"Always `1` for audio."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."},"params_size":{"type":"integer","format":"int32","minimum":0,"description":"Byte length of the JSON params block. `0` if no params."},"params":{"type":"object","nullable":true,"description":"Optional per-chunk parameters. Overrides session defaults for this audio chunk.","properties":{"speech_filter_amount":{"type":"number","format":"float","default":5,"description":"Smoothing for speech animation. Higher = smoother, less responsive."},"idle_filter_amount":{"type":"number","format":"float","default":1000,"description":"Smoothing for idle animation."},"idle_mouth_opening_scale":{"type":"number","format":"float","default":0,"description":"Mouth movement scale during idle. `0.0` = closed."},"speech_mouth_opening_scale":{"type":"number","format":"float","default":1,"description":"Mouth movement scale during speech. 
`1.0` = full movement."},"client_frame_index":{"type":"integer","format":"int32","default":0,"description":"Frame index the client is currently displaying. Helps the server manage silence-to-speech transitions smoothly."}}},"audio_payload":{"type":"string","format":"binary","description":"Raw PCM int16 speech audio. Requirements: 16,000 Hz sample rate, mono (1 channel), little-endian int16 samples. Entire message must be under 512 KB. Recommended chunk size: 400ms = 6,400 samples = 12,800 bytes."}}}}}}
````

## The EndInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"EndInteractionMessage":{"type":"object","description":"Signal graceful end of the session. The server finishes processing all queued audio and sends remaining frames, with the last frame marked `is_final: true`.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["endInteraction"]},"payload":{"type":"object","required":["timestamp"],"properties":{"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

## The CancelInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Portrait Realtime API","version":"1.0.0"},"components":{"schemas":{"CancelInteractionMessage":{"type":"object","description":"Immediately stop processing and discard all remaining frames. No final frame is sent. Use for interruptions (e.g., user starts speaking while the persona is talking).\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["cancelInteraction"]},"payload":{"type":"object","properties":{"timestamp":{"type":"integer","format":"int64","nullable":true,"description":"Optional. Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

***

## Message Details

### InteractionInput (Client → Server, Binary)

Binary message for sending speech audio to the server. **Only send speech audio** — do not send silence or padding.

**Binary structure:**

```
[1 byte ]  Payload type      — uint8, always 1 for audio
[8 bytes]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes]  Params size        — uint32, byte length of the JSON params block (0 if no params)
[N bytes]  Params JSON        — UTF-8 encoded JSON (only present if params size > 0)
[M bytes]  Audio payload      — raw PCM int16 speech audio data
```

**Header fields** use **big-endian** byte order. The PCM audio samples in the payload use **little-endian** (standard for PCM int16). In Python: `struct.pack('!BQI', payload_type, timestamp, params_size)`.

**Audio requirements:**

| Property         | Value                                              |
| ---------------- | -------------------------------------------------- |
| Format           | PCM signed 16-bit integers (little-endian samples) |
| Sample rate      | 16,000 Hz                                          |
| Channels         | 1 (mono)                                           |
| Max message size | 512 KB (entire binary message including header)    |
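The requirements above can be enforced when slicing audio, as in this sketch. The 400 ms default follows the recommended chunk size given in the `InteractionInputMessage` schema on this page. `chunk_pcm` is a hypothetical helper, and reading "512 KB" as 512 × 1024 bytes is an assumption.

```python
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
MAX_MESSAGE_BYTES = 512 * 1024   # assumption: "512 KB" means 512 * 1024 bytes
HEADER_BYTES = 13                # 1 + 8 + 4, excluding any params block

def chunk_pcm(pcm, chunk_ms=400):
    """Yield PCM int16 mono chunks of at most chunk_ms milliseconds."""
    if len(pcm) % BYTES_PER_SAMPLE:
        raise ValueError("PCM int16 data must have an even byte length")
    chunk_bytes = SAMPLE_RATE * chunk_ms // 1000 * BYTES_PER_SAMPLE
    # The whole message (header + audio) must stay under the size limit.
    if chunk_bytes <= 0 or chunk_bytes + HEADER_BYTES >= MAX_MESSAGE_BYTES:
        raise ValueError("chunk_ms out of range")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

One second of audio (32,000 bytes) splits into chunks of 12,800, 12,800, and 6,400 bytes.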

**Recommended streaming pattern:**

Forward speech audio to the server as it arrives from your TTS service. There is no need to buffer or accumulate audio before sending; buffering is only needed for local playback. Sending audio in real time keeps streaming responsive and playback smooth.

```python
import struct, json, time

def build_audio_message(audio_bytes):
    header = struct.pack('!BQI',
        1,                         # payload type: audio
        int(time.time() * 1000),   # timestamp ms
        0,                         # params size (unused for oris-portrait)
    )
    return header + audio_bytes
```
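The example above sends no params block. The binary layout also allows an optional JSON params block between the header and the audio. The following sketch packs one, on the assumption that the server accepts per-chunk params for your configuration. `build_audio_message_with_params` is a hypothetical name; `speech_filter_amount` is one of the params listed in the schema on this page.

```python
import json
import struct
import time

def build_audio_message_with_params(audio_bytes, params=None, timestamp_ms=None):
    """Build an InteractionInput message with an optional JSON params block."""
    params_json = json.dumps(params).encode("utf-8") if params else b""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # Header is big-endian: payload type (1 = audio), timestamp, params size.
    header = struct.pack("!BQI", 1, timestamp_ms, len(params_json))
    return header + params_json + audio_bytes
```

For example, `build_audio_message_with_params(chunk, {"speech_filter_amount": 3})` prepends the encoded params and sets the params size field to their byte length.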

***

### InteractionResponse (Server → Client, Binary)

Binary message containing a video frame and synchronized audio. The server streams these continuously after `SessionReady`. **Frames always arrive in order.**

**Binary structure:**

```
[1 byte  ]  Is final flag     — uint8, 1 = last frame for this interaction, 0 = more coming
[16 bytes]  Interaction ID     — UUID bytes (big-endian)
[8 bytes ]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes ]  Usage              — uint32, usage metric for this response
[4 bytes ]  Reserved           — uint32, reserved; ignore
[4 bytes ]  Num payloads       — uint32, number of payload entries that follow

For each payload entry:
  [4 bytes]  Data size          — uint32, byte length of the payload data only
  [1 byte ]  Payload type       — uint8, 1 = audio, 2 = image
  [N bytes]  Payload data       — raw payload bytes

[1 byte ]  Frame type         — uint8, appended after all payload entries: 0=idle, 1=speech, 2=fade-out, 3=start-of-speech
```

All multi-byte integers are **big-endian**. In Python: `struct.unpack('!B16sQIII', header_bytes)` for the main header, `struct.unpack('!IB', entry_bytes)` for each payload entry. The `Frame type` byte is a single trailing `uint8` after the last payload entry and is the authoritative frame classifier.

**Frame type:**

| Frame type | Meaning                                                                                                                 |
| ---------- | ----------------------------------------------------------------------------------------------------------------------- |
| `0`        | **Idle** — persona at rest with idle animation                                                                          |
| `1`        | **Speech** — lip-synced animation from your audio                                                                       |
| `2`        | **Fade-out** — post-cancel ramp toward idle                                                                             |
| `3`        | **Start of speech** — first speech frame of a turn resuming after an interruption/cancel (natural turns start with `1`) |

**Payload types:**

| Type      | Format                | Typical size per frame                                 |
| --------- | --------------------- | ------------------------------------------------------ |
| 1 (audio) | PCM int16, 16kHz mono | **1,280 bytes** (640 samples = 40ms at 25fps)          |
| 2 (image) | JPEG-encoded image    | Variable (resolution depends on config, e.g. 1280×720) |

**Parsing example:**

```python
import struct, uuid

HEADER_FMT = '!B16sQIII'
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 37 bytes
ENTRY_FMT = '!IB'
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)     # 5 bytes

def parse_response(data):
    # The 5th header field is reserved; ignore it.
    is_final, uuid_bytes, timestamp, usage, _reserved, num_payloads = \
        struct.unpack(HEADER_FMT, data[:HEADER_SIZE])

    offset = HEADER_SIZE
    image = audio = None

    for _ in range(num_payloads):
        size, ptype = struct.unpack(ENTRY_FMT, data[offset:offset + ENTRY_SIZE])
        offset += ENTRY_SIZE
        payload = data[offset:offset + size]
        offset += size

        if ptype == 2:
            image = payload   # JPEG bytes
        elif ptype == 1:
            audio = payload   # PCM int16 bytes

    # Trailing frame type byte, appended after the payload entries.
    frame_type = data[offset]

    return {
        'is_final': bool(is_final),
        'frame_type': frame_type,    # 0=idle, 1=speech, 2=fade-out, 3=start-of-speech
        'image': image,
        'audio': audio,
    }
```
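To unit-test a parser without a live connection, it can help to build synthetic frames that follow the binary layout above. This is a sketch for local testing only: `build_test_response` is a hypothetical helper, and the zeroed usage and reserved fields are placeholder values.

```python
import struct
import uuid

def build_test_response(frame_type, image=b"", audio=b"", is_final=False,
                        interaction_id=None, timestamp_ms=0):
    """Build a synthetic InteractionResponse for offline parser tests."""
    interaction_id = interaction_id or uuid.uuid4()
    payloads = []
    if audio:
        payloads.append((1, audio))   # payload type 1 = audio
    if image:
        payloads.append((2, image))   # payload type 2 = image
    # Main header: is_final, UUID, timestamp, usage, reserved, num payloads.
    out = struct.pack("!B16sQIII", int(is_final), interaction_id.bytes,
                      timestamp_ms, 0, 0, len(payloads))
    for ptype, body in payloads:
        out += struct.pack("!IB", len(body), ptype) + body
    return out + bytes([frame_type])  # trailing frame type byte
```

Feeding `build_test_response(1, image=b"JPEG", audio=b"\x00\x00" * 640)` to the `parse_response` function above should return `frame_type` 1 together with both payloads.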

***

### EndInteraction vs CancelInteraction

| Message             | Purpose         | Server behavior                                                                | Use case          |
| ------------------- | --------------- | ------------------------------------------------------------------------------ | ----------------- |
| `EndInteraction`    | Graceful finish | Completes processing, sends remaining frames with last marked `is_final: true` | Session end       |
| `CancelInteraction` | Immediate stop  | Stops processing, discards remaining frames                                    | User interruption |
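Both control messages are JSON text frames. A minimal sketch of building them from the schemas on this page follows; the function names are hypothetical.

```python
import json
import time

def _now_ms():
    return int(time.time() * 1000)

def end_interaction_message(timestamp_ms=None):
    """EndInteraction: payload.timestamp is required by the schema."""
    ts = _now_ms() if timestamp_ms is None else timestamp_ms
    return json.dumps({"type": "endInteraction", "payload": {"timestamp": ts}})

def cancel_interaction_message(timestamp_ms=None):
    """CancelInteraction: payload.timestamp is optional."""
    payload = {} if timestamp_ms is None else {"timestamp": timestamp_ms}
    return json.dumps({"type": "cancelInteraction", "payload": payload})
```

Send the returned string as a text frame, for example `await ws.send(cancel_interaction_message())` when the user interrupts.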

***

### ErrorResponse (Server → Client, JSON)

{% hint style="warning" %}
**Plain text errors:** In some error conditions (e.g., no backend servers available), the server may send a plain text message instead of a structured JSON `ErrorResponse`. Your client should handle non-JSON text messages gracefully.
{% endhint %}

**Error codes:**

| Code                  | Description                              |
| --------------------- | ---------------------------------------- |
| `AUTH_FAILED`         | Invalid API key                          |
| `UNAUTHORIZED`        | Caller lacks permission                  |
| `MISSING_CONFIG_ID`   | `config_id` query parameter not provided |
| `INVALID_MESSAGE`     | Malformed or unsupported message payload |
| `INVALID_HEADERS`     | Missing or invalid headers               |
| `MODEL_NOT_FOUND`     | Config ID not found or invalid           |
| `BACKEND_UNAVAILABLE` | No healthy inference backend available   |
| `RATE_LIMITED`        | Too many requests                        |
| `TIMEOUT`             | Operation exceeded processing time       |
| `CANCELLED`           | Interaction cancelled by client          |
| `INTERNAL_ERROR`      | Unexpected server error                  |
| `FRAME_SIZE_EXCEEDED` | Message exceeded 512KB limit             |
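A sketch of normalizing both error shapes into a `(code, message)` pair. `parse_error` and the `"PLAIN_TEXT"` marker are hypothetical local names, not server codes, and treating every non-JSON text frame as an error is an assumption.

```python
import json

def parse_error(text):
    """Return (code, message) for an error text frame, or None otherwise."""
    try:
        msg = json.loads(text)
    except json.JSONDecodeError:
        # Plain text error, e.g. when no backend server is available.
        return "PLAIN_TEXT", text.strip()
    if not isinstance(msg, dict) or msg.get("type") != "errorResponse":
        return None                        # some other JSON message
    payload = msg.get("payload") or {}
    return payload.get("code", "UNKNOWN"), payload.get("message", "")
```

A caller can then branch on the code, for instance backing off on `RATE_LIMITED` or `BACKEND_UNAVAILABLE` and failing fast on `AUTH_FAILED`.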

***

## Rate Limits & Constraints

| Constraint       | Value                                                   |
| ---------------- | ------------------------------------------------------- |
| Rate limit       | 6 requests per second                                   |
| Max message size | 512 KB per message                                      |
| Video output     | 25 fps target (generated slightly faster than realtime) |

Exceeding limits results in an `ErrorResponse` with code `RATE_LIMITED`.

***

## Best Practices

### Audio Input

* **Send one silence frame first** to start the conversation, then send speech audio when available
* Forward speech audio to the server as it arrives from your TTS service. There is no need to buffer or accumulate audio before sending (buffering is only needed for playback). Sending audio in real time keeps streaming responsive and playback smooth.

### Buffer Management

* Play frames at **25 fps** (40ms per frame)
* The server generates slightly faster than realtime — **you must drop idle frames** to prevent the buffer from growing
* **Drop strategy:** when the buffer grows beyond your target size, skip 1 out of every 2 idle frames (`frame_type == 0`) until the buffer shrinks back
* **Never drop speech (`1`), start-of-speech (`3`), or fade-out (`2`) frames**
* Tune your target buffer size based on your network conditions — keep it as low as possible for minimal latency
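The drop strategy above can be sketched as a small helper that skips every second idle frame while the buffer is over target and never touches other frame types. `IdleFrameDropper` is a hypothetical name, and frames are assumed to be dictionaries with a `frame_type` key, as returned by `parse_response`.

```python
from collections import deque

IDLE = 0

class IdleFrameDropper:
    """Skips every second idle frame while the buffer is above target."""

    def __init__(self, target_size):
        self.target_size = target_size
        self._idle_seen = 0

    def next_frame(self, buffer):
        """Pop and return the next frame to play, or None if empty."""
        while buffer:
            over_target = len(buffer) > self.target_size
            frame = buffer.popleft()
            if over_target and frame["frame_type"] == IDLE:
                self._idle_seen += 1
                if self._idle_seen % 2 == 0:
                    continue      # drop this idle frame, look at the next
            return frame          # speech, start-of-speech, fade-out: always kept
        return None
```

Call `next_frame(buffer)` once per 40 ms playback tick. Because the type check happens on the frame being dropped, a speech frame is never skipped even when the buffer is far over target.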

### Audio and Video Synchronization

Each `InteractionResponse` contains both a JPEG image and a PCM audio chunk. However, the **recommended approach for audio playback** is:

1. **Buffer your TTS/source audio locally** as it arrives from your speech service. This buffer is for later playback only; you do not need to buffer audio before sending it to Ojin.
2. **Forward TTS audio to Ojin immediately** as it arrives from your speech service
3. **Wait for the first speech frame** (`frame_type` `1` or `3`) to arrive from Ojin
4. **Start playing your buffered TTS audio** at that moment
5. **Stop audio playback** when speech ends — the first idle or fade-out frame (`frame_type` `0` or `2`) after speech
6. **Render video** from every frame regardless of type

Sending is immediate, but playback is gated on the first speech frame. This ensures audio and video stay in sync.

```python
# When TTS audio arrives from your speech service:
speech_audio_buffer.extend(tts_audio_chunk)      # buffer locally for playback
await ojin.send_audio(tts_audio_chunk)            # send to Ojin for lip-sync

# In your video playback loop:
frame = buffer.popleft()

# Speech frames are frame_type 1 (speech) and 3 (start-of-speech).
if frame.frame_type in (1, 3) and not audio_playing:
    start_audio_playback(speech_audio_buffer)     # begin draining the buffer
    audio_playing = True

# Non-speech frames are frame_type 0 (idle) and 2 (fade-out).
if frame.frame_type in (0, 2) and audio_playing:
    stop_audio_playback()
    audio_playing = False

render_video(frame.image)                         # always render the video
```

### Error Handling

* Handle both JSON `ErrorResponse` messages and plain text error strings
* Implement exponential backoff for reconnection
* Monitor server `load` in the `SessionReady` message
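A minimal sketch of an exponential backoff schedule for reconnection. The base delay, cap, attempt count, and jitter factor are illustrative assumptions, not values recommended by Ojin.

```python
import random

def backoff_delays(base=0.5, cap=30.0, attempts=6, jitter=0.0, rng=random.random):
    """Yield reconnect delays in seconds: base * 2**n, capped at `cap`.

    `jitter` adds up to that fraction of extra random delay per attempt.
    """
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        yield delay * (1.0 + jitter * rng())
```

A reconnect loop would typically `await asyncio.sleep(delay)` between attempts and start a fresh schedule once a `SessionReady` message arrives.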

### Interruption Handling

* Use `CancelInteraction` for immediate stops (e.g., user interrupts the bot)
* Use `EndInteraction` for graceful session endings
* Clear your frame buffer on interruption

***

## Complete Example

```python
import asyncio
import json
import struct
import time
from collections import deque
import numpy as np
import websockets
from dotenv import load_dotenv
import os

load_dotenv()

API_KEY = os.getenv("OJIN_API_KEY", "")
CONFIG_ID = os.getenv("OJIN_CONFIG_ID", "")
URL = f"wss://models.ojin.ai/realtime?config_id={CONFIG_ID}"

SAMPLE_RATE = 16000
FPS = 25
TARGET_BUFFER = 10  # Tune based on your network conditions

def build_audio_message(audio_bytes):
    """Build a binary InteractionInput message."""
    header = struct.pack('!BQI', 1, int(time.time() * 1000), 0)
    return header + audio_bytes

def parse_response(data):
    """Parse a binary InteractionResponse message."""
    fmt = '!B16sQIII'
    hdr_size = struct.calcsize(fmt)
    # The 5th header field is reserved; ignore it.
    is_final, uid_bytes, ts, usage, _reserved, n_payloads = struct.unpack(fmt, data[:hdr_size])

    offset = hdr_size
    image = audio = None
    for _ in range(n_payloads):
        size, ptype = struct.unpack('!IB', data[offset:offset+5])
        offset += 5
        if ptype == 2:
            image = data[offset:offset+size]
        elif ptype == 1:
            audio = data[offset:offset+size]
        offset += size

    # Trailing frame type byte, appended after the payload entries.
    frame_type = data[offset]

    return {
        'is_final': bool(is_final),
        'frame_type': frame_type,    # 0=idle, 1=speech, 2=fade-out, 3=start-of-speech
        'image': image,
        'audio': audio,
    }

async def main():
    headers = {"Authorization": API_KEY}
    # For older websockets versions, use extra_headers instead.
    async with websockets.connect(URL, additional_headers=headers, ping_interval=30, ping_timeout=10) as ws:
        # 1. Wait for SessionReady — server starts streaming frames immediately after
        msg = json.loads(await ws.recv())
        assert msg["type"] == "sessionReady"
        print(f"Session ready: {msg['payload']}")

        buffer = deque()
        skip_counter = 0
        playback_started = False
        frame_count = 0
        audio_sent = False

        # 2. Send one silence frame (40 ms of zeroed PCM int16) to start processing
        audio_data = b"\x00\x00" * (SAMPLE_RATE // FPS)
        await ws.send(build_audio_message(audio_data))

        # 3. Receive and process frames
        async for data in ws:
            if isinstance(data, str):
                msg = json.loads(data)
                if msg.get("type") == "errorResponse":
                    print(f"Error: {msg['payload']}")
                    break
                continue

            frame = parse_response(data)
            buffer.append(frame)
            frame_count += 1

            # Wait for initial buffer before playback
            if not playback_started:
                if len(buffer) >= TARGET_BUFFER:
                    playback_started = True
                    print(f"Buffer filled ({TARGET_BUFFER} frames), starting playback")
                continue

            # Consume one frame
            if buffer:
                play_frame = buffer.popleft()

                # Drop excess idle frames: skip 1 out of 2 when buffer is too large.
                # Only idle (frame_type 0) is safe to drop — keep speech (1),
                # start-of-speech (3), and fade-out (2).
                if len(buffer) > TARGET_BUFFER and play_frame['frame_type'] == 0:
                    skip_counter += 1
                    # Drop the next queued frame only if it is itself idle.
                    if skip_counter % 2 == 0 and buffer[0]['frame_type'] == 0:
                        buffer.popleft()  # drop one idle frame

                kind = {0: "idle", 1: "speech", 2: "fade-out", 3: "start-of-speech"}.get(
                    play_frame['frame_type'], "?"
                )
                print(f"[{kind}] frame #{frame_count}, buffer={len(buffer)}")

                # In a real app: render play_frame['image'] and play play_frame['audio']

            # Demo: send speech audio after receiving some silence frames
            if frame_count == 50 and not audio_sent:
                t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)
                audio_data = (32767 * 0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
                chunk_size = SAMPLE_RATE # 1s chunks
                for i in range(0, len(audio_data), chunk_size):
                    chunk = audio_data[i:i + chunk_size]
                    await ws.send(build_audio_message(chunk.tobytes()))
                audio_sent = True
                print("Sent 1 second of speech audio")

            if frame_count > 200:
                break

asyncio.run(main())
```

***

## Troubleshooting

### Connection Issues

* ✓ Verify API key and config ID
* ✓ Check that config exists in dashboard
* ✓ Ensure network allows WebSocket connections (port 443)
* ✓ Check the `Authorization` header uses the raw API key (no `Bearer` prefix)

### No Frames Received

* ✓ Confirm you received `SessionReady` — frames start streaming immediately after
* ✓ If sending speech audio: verify format is 16kHz, int16, mono with big-endian message header
* ✓ Check message size < 512KB

### Choppy Playback

* ✓ Play at 25fps (40ms per frame)
* ✓ Buffer some frames before starting playback
* ✓ Check network latency and jitter

### Growing Latency

* ✓ You **must** drop idle frames — the server generates faster than realtime
* ✓ Skip 1 out of 2 idle frames (`frame_type == 0`) when buffer grows beyond your target
* ✓ Never drop speech (`1`), start-of-speech (`3`), or fade-out (`2`) frames
* ✓ During speech bursts the buffer will grow temporarily — this is expected, trim idle frames afterward

### Frame Lag During Speech

* ✓ Reduce `speech_filter_amount` parameter (lower = more responsive, less smooth)

***

## Example Implementation

A complete working Python example integrating Ojin Oris Portrait with a speech-to-speech service (Hume EVI) is available here:

[**github.com/journee-live/speech-to-video-samples/tree/main/samples**](https://github.com/journee-live/speech-to-video-samples/tree/main/samples) (includes Hume STS → Oris Portrait walkthroughs)

The repository demonstrates the full integration pattern: microphone capture → STS service → TTS audio → Ojin lip-sync → synchronized video and audio playback at 25fps. It includes the buffer management and frame handling approach described in [Best Practices](#best-practices) above.

***


# Playback Tips

## Basics

* Video models such as `ojin/oris-portrait` deliver frames at a fixed rate (e.g. 25fps)
* The server departs from the fixed rate in some cases. For example, when the model delivers the first speech frames of a turn, it tries to send them at a higher rate (e.g. 50fps) to absorb network jitter
* The authoritative frame classifier is the `frame_type` field (0=idle, 1=speech, 2=fade-out, 3=start-of-speech). The legacy `index` (`frame_idx`) is the coarse 0/1 tag (0=silence, 1=speech) and is kept for backward compatibility

## Playback core

* The recommended playback approach is the audio-clock technique: audio playback is the master clock and video follows it
* Buffer incoming video frames so you can control their presentation rate based on the audio playhead
* Hold TTS audio from your speech service in a buffer as well, so that it is not played immediately. Wait for the first speech frame before starting audio playback
* Once speech starts, play audio at a steady rate to avoid crackling, and let speech video frames sync to the current audio playhead. If video frames arrive too slowly, never stop the audio. This creates a small desync, which you can correct by selectively skipping incoming speech video frames when the frame rate allows. A good compromise is to skip every second frame while the speech frame buffer is above a small threshold and video is lagging behind the audio.
* Decoding video frames must not reorder them. If you decode frames asynchronously (e.g. `createImageBitmap`, off-thread JPEG decode), concurrent decodes can resolve in a different order than the frames arrived in. Assign each frame a sequence number in arrival order, before dispatching the decode, and use that number to insert the decoded frame at the correct position in your playback queue. Otherwise a slow-decoding speech frame can land after a faster-decoding silence frame and your lipsync will visibly stutter.
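The sequence-number approach from the last point can be sketched as a small reorder buffer. `DecodeReorderBuffer` is a hypothetical name; the sketch assumes every registered frame eventually finishes decoding (a failed decode would need to be reported with a placeholder, or later frames would stall behind it).

```python
import heapq

class DecodeReorderBuffer:
    """Releases decoded frames strictly in arrival order."""

    def __init__(self):
        self._next_seq = 0    # sequence number for the next arriving frame
        self._next_out = 0    # sequence number expected next at the output
        self._pending = []    # min-heap of (seq, decoded_frame)

    def register_arrival(self):
        """Call when a frame arrives, before dispatching its decode."""
        seq = self._next_seq
        self._next_seq += 1
        return seq

    def decoded(self, seq, frame):
        """Call when a decode finishes; returns frames now ready, in order."""
        heapq.heappush(self._pending, (seq, frame))
        ready = []
        while self._pending and self._pending[0][0] == self._next_out:
            ready.append(heapq.heappop(self._pending)[1])
            self._next_out += 1
        return ready
```

Frames that finish decoding early wait in the heap until every earlier frame has been released, so the playback queue only ever receives frames in arrival order.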

## Turn detection (bot speaking, bot not speaking)

* Silence -> speech (bot starts speaking): when the first speech video frame is ready to be played, consider the bot to have started speaking and start audio playback
* Speech -> silence (bot stops speaking): when the first silence video frame after speech is played, consider the bot to have stopped speaking and stop audio playback
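These two transitions can be sketched as a small state machine driven by the `frame_type` of each frame about to be played. `TurnDetector` is a hypothetical name. Treating types `1` and `3` as speech and types `0` and `2` as non-speech follows the frame type table in the API reference.

```python
SPEECH_TYPES = {1, 3}       # speech, start-of-speech
NON_SPEECH_TYPES = {0, 2}   # idle, fade-out

class TurnDetector:
    """Tracks whether the bot is speaking, based on played frame types."""

    def __init__(self):
        self.is_speaking = False

    def on_frame_played(self, frame_type):
        """Return "started", "stopped" or None for the frame being played."""
        if frame_type in SPEECH_TYPES and not self.is_speaking:
            self.is_speaking = True
            return "started"     # start audio playback here
        if frame_type in NON_SPEECH_TYPES and self.is_speaking:
            self.is_speaking = False
            return "stopped"     # stop audio playback here
        return None
```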

## Interruptions

In order to interrupt our video models users must send a CancelInteractionMessage. The server will flush almost all video frames being processed at that moment and start generating silence frames right away. There might still be some old frames coming to the client that need to be handled. There are several approaches to handle interruptions based on your needs, it's usually a trade off between having jumps on your avatar or introducing interruption latency:

1. **Smooth transitions**: on interruption, the client discards no frames, neither buffered nor incoming. Transitions stay smooth, but the bot keeps speaking (moving its lips) for a while, because frames already sent or already in the client buffer cannot be cleared.
2. **Instant cut**: on interruption, the client clears its buffer and discards incoming speech frames until it receives the first silence frame after the interruption. This can cause a brief freeze or jump between the last speech frame played and the first silence frame, but interruption latency is zero and audio playback can stop right away.
3. **Smooth video / hard cut audio**: keep playing video frames for smooth transitions, but stop audio playback immediately. If the number of incoming and buffered frames is small, this usually gives the best experience.
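
The three strategies differ only in what happens to the video buffer and to audio playback at the moment of interruption. The following sketch shows that difference under stated assumptions: the `(is_speech, image_bytes)` frame shape, the `Strategy` enum, and the `InterruptionHandler` class are all hypothetical and not part of the Ojin API.

```python
from collections import deque
from enum import Enum
from typing import Deque, Tuple

Frame = Tuple[bool, bytes]  # (is_speech, image_bytes): hypothetical frame shape


class Strategy(Enum):
    SMOOTH = 1                  # keep all frames, keep audio
    INSTANT_CUT = 2             # drop buffered and stale speech frames, stop audio
    SMOOTH_VIDEO_CUT_AUDIO = 3  # keep all frames, stop audio


class InterruptionHandler:
    def __init__(self, strategy: Strategy) -> None:
        self.strategy = strategy
        self.video_buffer: Deque[Frame] = deque()
        self.audio_enabled = True
        self._dropping = False  # True while discarding stale speech frames

    def interrupt(self) -> None:
        """Call right after sending the cancel message to the server."""
        if self.strategy is Strategy.SMOOTH:
            return
        self.audio_enabled = False
        if self.strategy is Strategy.INSTANT_CUT:
            self.video_buffer.clear()
            self._dropping = True

    def on_frame(self, frame: Frame) -> None:
        """Call for every incoming video frame."""
        is_speech = frame[0]
        if self._dropping:
            if is_speech:
                return              # stale speech frame from before the cancel
            self._dropping = False  # first silence frame ends the discard window
        self.video_buffer.append(frame)

    def on_speech_start(self) -> None:
        """Call when the next turn starts so audio can play again."""
        self.audio_enabled = True
```

Your audio player would check `audio_enabled` before releasing samples, and the render loop would keep consuming `video_buffer` as usual.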

## Edge cases

The playback loop must handle these scenarios correctly:

1. **Audio exhausted during speech (deadlock prevention)**: when TTS audio buffer runs out while `is_speaking` is True and the video sync guard blocks frame consumption (`video_sent >= audio_released`), the loop must force a transition to idle. Otherwise frames pile up but are never consumed, and `is_speaking` stays True forever.
2. **Speech frames queued ahead of silence frames**: when transitioning from speech to silence, silence frames may be behind unconsumed speech frames in the buffer. The sync guard blocks those speech frames (no audio left), so the silence-transition check never sees a silence frame at position \[0]. The audio-exhaustion guard (edge case 1) resolves this by forcing idle mode.
3. **TTS audio arrives before video frames**: audio buffer fills but `is_speaking` is False because no speech video frame has arrived yet. Audio must NOT play until the first speech video frame is ready. Once it arrives, counters reset and sync begins.
4. **TTS audio arrives in bursts with gaps**: audio buffer may run dry mid-speech, then refill when more TTS audio arrives. During the dry period, video must pause (not advance). When audio resumes, sync continues from where it left off without losing frames.
5. **Video buffer drains completely during speech**: if the server is slow and no video frames are available, the last played frame should repeat. Audio continues uninterrupted. When video frames arrive again, they catch up via the sync mechanism.
6. **Rapid turn succession (speech -> silence -> speech)**: each speech turn must reset audio/video counters independently. A brief silence gap (even 1-2 frames) between turns must correctly trigger stop then start events.
7. **Zero-volume speech frames at turn boundary**: the first frame with `frame_type` speech (`1`/`3`, i.e. coarse `index`/`frame_idx=1`) might have zero volume (model warmup). It should NOT be treated as the first real speech frame; wait for a frame with actual audio content to trigger speech start.
8. **Interruption during silence**: if an interruption arrives while already in idle/silence mode, it should be a no-op for the playback loop (no state to clean up).
9. **Interruption with stale speech frames in buffer**: after interruption, the server sends silence but stale speech frames may still be in the client buffer. Depending on the interruption strategy, these must be either played through, discarded, or played without audio.
10. **Single speech frame turn**: a turn with only 1 speech frame followed by silence must correctly trigger speech\_start, play the frame with audio, then trigger speech\_stop on the next tick.


# Troubleshooting

## Connection Issues

* ✓ Verify API key and config ID
* ✓ Check that config exists in dashboard
* ✓ Ensure network allows WebSocket connections

## No Frames Received

* ✓ Confirm you received `SessionReady` before sending audio
* ✓ Verify audio format (16 kHz, int16, mono)
* ✓ Check message size < 512 KB

## Choppy Playback

* ✓ Play at exactly 25 fps
* ✓ Buffer at least 10 frames before playback
* ✓ Check network latency
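
As an illustration of the first two checks, the sketch below waits for a 10-frame pre-buffer and then paces playback at 25 fps against a fixed start time, so per-frame timing jitter does not accumulate. `play_paced` and the `show` callback are hypothetical names; a real client would run this alongside the receive loop rather than on a pre-filled queue.

```python
import time
from collections import deque
from typing import Callable, Deque

FPS = 25
FRAME_INTERVAL = 1.0 / FPS  # 40 ms per frame
PREBUFFER_FRAMES = 10       # frames required before playback starts


def play_paced(frames: Deque[bytes], show: Callable[[bytes], None],
               clock=time.monotonic, sleep=time.sleep) -> int:
    """Plays buffered frames at a fixed 25 fps and returns the number shown."""
    if len(frames) < PREBUFFER_FRAMES:
        return 0  # not enough buffered yet; keep receiving
    start = clock()
    shown = 0
    while frames:
        # Schedule each frame relative to the start time, not the previous frame.
        deadline = start + shown * FRAME_INTERVAL
        delay = deadline - clock()
        if delay > 0:
            sleep(delay)
        show(frames.popleft())
        shown += 1
    return shown
```

The `clock` and `sleep` parameters exist only so the pacing can be exercised with a fake clock in tests.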

## Frame Lag

* ✓ Reduce `speech_filter_amount` parameter


# ojin/oris-voice

> High-quality multilingual text-to-speech with voice cloning, preset speakers, and promptable voice design

## Overview

**Oris Voice** (`ojin/oris-voice`) is Ojin's streaming text-to-speech product. You get natural-sounding speech at 24 kHz with three voice modes: clone from a short reference clip, pick a built-in speaker, or describe the voice you want in natural language.

## Key Features

* **Voice Cloning** — Clone any voice from a short reference audio clip and optional transcript
* **Built-in Voices** — Choose from a library of built-in speaker identities
* **Voice Design** — Describe the desired voice characteristics in natural language
* **Streaming Output** — Audio chunks stream in real time as the model generates, enabling low-latency playback
* **Multilingual** — Supports Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian, with automatic language detection
* **High-Quality Audio** — 24 kHz, 16-bit PCM mono output

## Voice Modes

| Mode                | Description                                                                                                                   |
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| **Clone**           | Reproduce a voice from a reference audio sample. Provide `ref_audio` and optionally `ref_text`.                               |
| **Built-in Voices** | Use a built-in speaker identity. Provide `speaker` name and optional `instruct` for style instructions.                       |
| **Voice Design**    | Generate a voice from a natural language description. Provide `instruct` (e.g., "a warm female voice with a British accent"). |

## Quick Start

Getting started with ojin/oris-voice is simple:

1. [**Create an API key**](/getting-started/authentication) — Set up authentication for the Ojin platform
2. [**Create a configuration**](/models/oris-voice/creating-configuration) — Set up a voice configuration in the dashboard
3. [**Integrate with your application**](/models/oris-voice/integrations) — Use the WebSocket API

## Use Cases

* **Conversational AI** — Generate natural speech responses for chatbots and virtual assistants
* **Content Creation** — Produce voiceovers for videos, podcasts, and audiobooks
* **Accessibility** — Convert text content to speech for visually impaired users
* **Localization** — Generate speech in multiple languages from the same text
* **Voice Cloning** — Preserve and reproduce specific voice identities
* **Persona Pipelines** — Feed generated audio into ojin/oris-portrait for lip-synced video personas

## Supported Languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian, Auto (automatic detection).


# Get started

This guide explains how to integrate **Oris Voice** (`ojin/oris-voice`) into your applications using WebSockets.

## Prerequisites

1. An Ojin account with an active API key. If you don't have one, [get your API key](/getting-started/authentication).
2. [Create a voice configuration](/models/oris-voice/creating-configuration) or use an existing one.
3. Save the **Model Config ID** from the dashboard.

{% hint style="info" %}
**Production deployments:** Connect to the WebSocket API from a backend server to keep your API key secure. For end-user delivery, stream the generated audio through your own transport layer.
{% endhint %}

## WebSocket Integration

### Quick Example (Python)

This minimal example connects to **Oris Voice**, sends text, and saves the resulting audio as a WAV file.

**Install dependencies:**

```bash
pip install websockets python-dotenv
```

**Create a `.env` file:**

```bash
OJIN_API_KEY=your-api-key-here
OJIN_CONFIG_ID=your-config-id-here
```

**Run the script:**

```python
import asyncio
import json
import struct
import time
import wave
import os

import websockets
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("OJIN_API_KEY", "")
CONFIG_ID = os.getenv("OJIN_CONFIG_ID", "")
WS_URL = f"wss://models.ojin.ai/realtime?config_id={CONFIG_ID}"

# Oris Voice outputs 24 kHz, 16-bit PCM mono
SAMPLE_RATE = 24000
SAMPLE_WIDTH = 2
CHANNELS = 1

def build_text_message(text):
    """Build a binary InteractionInput for text."""
    payload = text.encode("utf-8")
    header = struct.pack("!BQI", 0, int(time.time() * 1000), 0)  # 0 = TEXT
    return header + payload

def parse_response(data):
    """Parse a binary InteractionResponse, extract audio payloads."""
    fmt = "!B16sQIII"
    hdr_size = struct.calcsize(fmt)
    is_final, _, _, _, _, num_payloads = struct.unpack(fmt, data[:hdr_size])

    offset = hdr_size
    audio_chunks = []
    for _ in range(num_payloads):
        size, ptype = struct.unpack("!IB", data[offset:offset + 5])
        offset += 5
        if ptype == 1 and size > 0:  # 1 = audio
            audio_chunks.append(data[offset:offset + size])
        offset += size

    return bool(is_final), audio_chunks

async def synthesize(text, output_path="output.wav"):
    headers = websockets.Headers()
    headers["Authorization"] = API_KEY

    async with websockets.connect(
        WS_URL,
        additional_headers=headers,
        open_timeout=None,
        ping_timeout=None,
    ) as ws:
        # 1. Wait for sessionReady
        while True:
            msg = await ws.recv()
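            if isinstance(msg, str):
                try:
                    parsed = json.loads(msg)
                except json.JSONDecodeError:
                    # Some error conditions produce plain text instead of JSON
                    raise RuntimeError(msg)
                if parsed.get("type") == "sessionReady":
                    break
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])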

        # 2. Send text input (binary) + endInteraction (JSON)
        await ws.send(build_text_message(text))
        await ws.send(json.dumps({
            "type": "endInteraction",
            "payload": {"timestamp": int(time.time() * 1000)},
        }))

        # 3. Collect audio chunks
        audio_data = []
        while True:
            msg = await ws.recv()
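            if isinstance(msg, str):
                try:
                    parsed = json.loads(msg)
                except json.JSONDecodeError:
                    # Some error conditions produce plain text instead of JSON
                    raise RuntimeError(msg)
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])
                continue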

            is_final, chunks = parse_response(msg)
            audio_data.extend(chunks)
            if is_final:
                break

        # 4. Write WAV file
        pcm = b"".join(audio_data)
        with wave.open(output_path, "wb") as wf:
            wf.setnchannels(CHANNELS)
            wf.setsampwidth(SAMPLE_WIDTH)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm)

        duration = len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
        print(f"Saved {output_path} ({duration:.2f}s)")

asyncio.run(synthesize("Hello, welcome to Ojin text to speech!"))
```

### Integration Flow

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: Connection
    Client->>Server: WebSocket Connect (Authorization header + config_id)
    Server->>Client: SessionReady (JSON)

    Note over Client,Server: Text-to-Speech
    Client->>Server: InteractionInput (text, binary)
    Client->>Server: EndInteraction (JSON)

    Note over Client,Server: Audio Streaming
    Server->>Client: InteractionResponse (audio chunk 1, binary)
    Server->>Client: InteractionResponse (audio chunk 2, binary)
    Server->>Client: InteractionResponse (audio chunk N, binary)
    Server->>Client: InteractionResponse (final, is_final=true)
```

### Streaming Playback

For real-time playback, process audio chunks as they arrive instead of waiting for the full response:

```python
# Inside the receive loop, play each chunk immediately:
while True:
    msg = await ws.recv()
    if isinstance(msg, str):
        # Handle JSON messages (errors, etc.)
        continue

    is_final, chunks = parse_response(msg)
    for chunk in chunks:
        play_audio(chunk)  # Feed to your audio output (e.g., pyaudio, sounddevice)

    if is_final:
        break
```

## Feeding TTS Output to a Persona Model

A common pattern is to use **Oris Voice** to generate speech, then pipe that audio into `ojin/oris-portrait` for lip-synced video. In this setup:

1. Send text to Oris Voice and receive streaming audio chunks
2. Forward each audio chunk to the persona model as an `InteractionInput` (audio, payload type `1`)
3. Buffer the audio locally for playback
4. Start audio playback when the persona model returns the first speech frame
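
The four steps can be sketched as a small bridge object. Several things here are assumptions rather than documented behavior: the persona model is assumed to accept the same 13-byte `InteractionInput` header layout as Oris Voice, with payload type `1` for audio, and `PersonaAudioBridge` is a hypothetical helper. The Troubleshooting checklist earlier in this documentation lists 16 kHz int16 mono input, while Oris Voice emits 24 kHz, so confirm the sample rate your persona configuration expects; resampling is not shown.

```python
import struct
import time
from collections import deque

AUDIO_PAYLOAD_TYPE = 1  # payload type for audio, per step 2 above


def build_audio_message(pcm_chunk: bytes) -> bytes:
    """Wraps a PCM chunk in an InteractionInput header (layout is an assumption)."""
    header = struct.pack("!BQI", AUDIO_PAYLOAD_TYPE, int(time.time() * 1000), 0)
    return header + pcm_chunk


class PersonaAudioBridge:
    """Buffers TTS audio locally while forwarding it to the persona model."""

    def __init__(self) -> None:
        self.local_audio = deque()  # chunks held back for playback (step 3)
        self.playing = False

    def on_tts_chunk(self, pcm_chunk: bytes) -> bytes:
        """Buffers the chunk and returns the binary message to send (step 2)."""
        self.local_audio.append(pcm_chunk)
        return build_audio_message(pcm_chunk)

    def on_first_speech_frame(self) -> None:
        """Call when the persona model returns its first speech frame (step 4)."""
        self.playing = True

    def next_audio(self):
        """Returns the next chunk to play, or None if playback cannot proceed yet."""
        if self.playing and self.local_audio:
            return self.local_audio.popleft()
        return None
```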

## Next Steps

### API Reference

Dive deeper into the binary protocol and message formats.

[API Reference →](/models/oris-voice/api)


# Creating a Configuration

Before you can start using the ojin/oris-voice model, you need to create a voice configuration. This guide walks you through the process.

## Prerequisites

* An Ojin account with an active API key
* For voice cloning: a reference audio clip of the target voice

## Creating a Configuration through the Dashboard

1. Log in to the [Ojin Dashboard](https://ojin.ai)
2. Select [**Oris Voice**](https://ojin.ai/models/ojin/oris-voice) (`ojin/oris-voice`)
3. Navigate to the [**Library**](https://ojin.ai/models/ojin/oris-voice/configs) sub-section
4. Press **New Configuration** to create a new configuration
5. Select the voice mode and fill in the required fields (see below)
6. Click **Create Configuration**
7. Open your newly created configuration and copy the **Model Config ID** for use in your application

## Voice Mode Configuration

### Clone Mode

Reproduces a voice from a reference audio sample.

| Field             | Required | Description                                                                    |
| ----------------- | -------- | ------------------------------------------------------------------------------ |
| `Voice Mode`      | Yes      | Set to `"Clone"`                                                               |
| `Reference audio` | Yes      | Reference audio file (WAV recommended). Can be a file upload or an asset UUID. |
| `Language`        | No       | Language of the text to synthesize (default: `"English"`).                     |

**Reference audio best practices:**

* **Duration**: 5-15 seconds of clean speech
* **Quality**: Clear recording with minimal background noise
* **Format**: WAV, 16 kHz or higher sample rate
* **Content**: Natural conversational speech (not whispered or shouted)

### Built-in Voices Mode

Uses a built-in speaker identity.

| Field        | Required | Description                                                                                    |
| ------------ | -------- | ---------------------------------------------------------------------------------------------- |
| `Voice Mode` | Yes      | Set to `"Built-in Voices"`                                                                     |
| `Speaker`    | Yes      | Name of the built-in speaker identity                                                          |
| `instruct`   | No       | Optional styling instruction layered on the selected speaker (e.g., "speak slowly and warmly") |
| `Language`   | No       | Language of the text to synthesize (default: `"English"`)                                      |

### Voice Design Mode

Generates a voice from a natural language description.

| Field        | Required | Description                                                                                    |
| ------------ | -------- | ---------------------------------------------------------------------------------------------- |
| `Voice Mode` | Yes      | Set to `"Design"`                                                                              |
| `instruct`   | Yes      | Natural language description of the desired voice (e.g., "a deep male voice with a calm tone") |
| `Language`   | No       | Language of the text to synthesize (default: `"English"`)                                      |

## Generation Parameters

These optional parameters can be set in any mode to control the generation behavior:

| Parameter            | Default | Description                                                                                                                                                                                                       |
| -------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `Temperature`        | `0.9`   | Controls randomness in speech generation. `0.1` = highly consistent, robotic. `0.9` = natural variation. `1.0`+ = maximum variety, may introduce artifacts. For production use cases, `0.7`–`0.9` is recommended. |
| `Top-k`              | `50`    | Limits token sampling to the top-k most probable candidates per step. Lower values (e.g., `20`) produce more predictable speech; higher values (e.g., `100`) allow more variety.                                  |
| `Max new tokens`     | `360`   | Maximum tokens to generate per interaction. At 12 Hz codec rate, `360` tokens ≈ 30 seconds of audio. Increase for longer utterances; decrease to cap generation time.                                             |
| `Repetition penalty` | `1.05`  | Discourages the model from repeating the same sounds or patterns. Values above `1.0` reduce repetition; too high (e.g., `1.5`+) may degrade naturalness.                                                          |
| `Random seed`        | `null`  | Set a fixed integer for reproducible output (same text + seed = same audio). Leave `null` for natural variation between generations.                                                                              |
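
The `Max new tokens` arithmetic follows directly from the 12 Hz codec rate in the table. A small helper, with hypothetical function names, makes the conversion explicit in both directions:

```python
import math

CODEC_RATE_HZ = 12  # tokens per second of audio, per the table above


def tokens_to_seconds(max_new_tokens: int) -> float:
    """Approximate audio duration covered by a token budget."""
    return max_new_tokens / CODEC_RATE_HZ


def tokens_for_seconds(seconds: float) -> int:
    """Smallest token budget that covers the requested duration."""
    return math.ceil(seconds * CODEC_RATE_HZ)


print(tokens_to_seconds(360))  # 30.0
print(tokens_for_seconds(45))  # 540
```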

## Next Steps

Once your configuration is ready, you can:

#### Integration Guide

Learn how to integrate TTS using WebSocket

[View Integration Guide →](/models/oris-voice/integrations)

#### API Reference

Explore the complete API documentation

[View API Reference →](/models/oris-voice/api)


# API Reference

## Overview

Real-time text-to-speech synthesis API. Send text, receive streaming PCM audio chunks.

After connecting and receiving `SessionReady`, send your text as a binary `InteractionInput` message followed by a JSON `EndInteraction`. The server synthesizes speech and streams back audio chunks as binary `InteractionResponse` messages. The final chunk has `is_final` set to `true`.

{% hint style="info" %}
**Production deployments:** Connect to the real-time WebSocket API from a backend server to keep your API key secure.
{% endhint %}

***

## How It Works

1. **Connect** to the WebSocket endpoint with your API key and config ID
2. **Receive `SessionReady`** — the server has allocated inference resources for your session
3. **Send text** as a binary `InteractionInput` message (payload type `0` for text)
4. **Send `EndInteraction`** (JSON) to signal that input is complete and synthesis should begin
5. **Receive audio chunks** — binary `InteractionResponse` messages containing PCM int16 audio at 24 kHz
6. **Detect completion** — the last response has `is_final: true`

### Audio Output Format

| Property        | Value                                              |
| --------------- | -------------------------------------------------- |
| Format          | PCM signed 16-bit integers (little-endian samples) |
| Sample rate     | 24,000 Hz                                          |
| Channels        | 1 (mono)                                           |
| Bits per sample | 16                                                 |

***

## Connection Flow

```mermaid
sequenceDiagram
    participant Client
    participant Server

    Note over Client,Server: Connection
    Client->>Server: WebSocket Connect
    Server->>Client: SessionReady (JSON)

    Note over Client,Server: Send Text
    Client->>Server: InteractionInput (text payload, binary)
    Client->>Server: EndInteraction (JSON)

    Note over Client,Server: Receive Audio Stream
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (audio chunk, binary)
    Server->>Client: InteractionResponse (final chunk, is_final=true)
```

***

## WebSocket Handshake

## Open WebSocket connection

> Connect to the WebSocket endpoint providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket. After sending `SessionReady`, the server waits for text input.
>
> **Recommended WebSocket settings:**
>
> * `open_timeout`: None (model loading may take time on cold start)
> * `ping_timeout`: None

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"servers":[{"url":"wss://models.ojin.ai/realtime","description":"Production WebSocket endpoint"}],"security":[{"ApiKeyAuth":[]}],"components":{"securitySchemes":{"ApiKeyAuth":{"type":"apiKey","in":"header","name":"Authorization","description":"Raw API key (no `Bearer` prefix)."}},"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. The server then waits for text input.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Model-specific session parameters for Oris Voice, including `sample_rate`, `channels`, and `bits_per_sample`.","properties":{"sample_rate":{"type":"integer","description":"Audio sample rate in Hz."},"channels":{"type":"integer","description":"Number of audio channels."},"bits_per_sample":{"type":"integer","description":"Bits per audio sample."}}}}}}},"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions, the server may send a plain text message instead of structured JSON. 
Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}},"paths":{"/":{"get":{"summary":"Open WebSocket connection","description":"Connect to the WebSocket endpoint providing an API key in the `Authorization` header and a `config_id` query parameter. The server upgrades the connection to WebSocket. After sending `SessionReady`, the server waits for text input.\n\n**Recommended WebSocket settings:**\n- `open_timeout`: None (model loading may take time on cold start)\n- `ping_timeout`: None","operationId":"wsHandshake","parameters":[{"in":"query","name":"config_id","required":true,"schema":{"type":"string"},"description":"Configuration ID for the TTS voice, created via API or in the Oris Voice tab of the dashboard."},{"in":"header","name":"Authorization","required":true,"schema":{"type":"string"},"description":"Your raw API key. No `Bearer` prefix."}],"responses":{"101":{"description":"WebSocket upgrade successful. 
After the upgrade, the server sends a `SessionReady` JSON message and waits for text input.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/SessionReadyMessage"}}}},"401":{"description":"Unauthorized — invalid or missing API key.","content":{"application/json":{"schema":{"$ref":"#/components/schemas/ErrorResponseMessage"}}}}}}}}}
```

***

## Message Format

{% hint style="info" %}
**Mixed message types:** JSON (text) and binary messages travel over the same WebSocket connection. Your client must check the WebSocket frame type of each received message to distinguish them:

* **Text frames (JSON):** `SessionReady`, `EndInteraction`, `CancelInteraction`, `ErrorResponse`
* **Binary frames:** `InteractionInput`, `InteractionResponse`
  {% endhint %}

{% hint style="info" %}
**Byte order:** All multi-byte integer fields in binary messages use **network byte order (big-endian)**.
{% endhint %}
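
A receive loop therefore needs to branch on the frame type before parsing. The sketch below assumes a WebSocket library that delivers text frames as `str` and binary frames as `bytes`, as the Python `websockets` library does; `classify_message` is a hypothetical helper. It also covers the plain-text error case mentioned in the `ErrorResponse` schema.

```python
import json
from typing import Any, Tuple, Union


def classify_message(msg: Union[str, bytes]) -> Tuple[str, Any]:
    """Returns (kind, value) where kind is "json", "text", or "binary"."""
    if isinstance(msg, (bytes, bytearray)):
        return "binary", bytes(msg)  # parse with big-endian struct formats
    try:
        parsed = json.loads(msg)
    except json.JSONDecodeError:
        return "text", msg           # plain-text message, e.g. an unstructured error
    if isinstance(parsed, dict) and "type" in parsed:
        return "json", parsed        # sessionReady, errorResponse, ...
    return "text", msg
```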

***

## Messages Reference

### Server -> Client Messages

## The SessionReadyMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"SessionReadyMessage":{"type":"object","description":"Sent once by the server after the WebSocket connection is established and inference resources are allocated. The server then waits for text input.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["sessionReady"]},"payload":{"type":"object","required":["trace_id","status","load"],"properties":{"trace_id":{"type":"string","format":"uuid","description":"Unique session identifier assigned by the server."},"status":{"type":"string","enum":["success"],"description":"Always `success`."},"load":{"type":"number","format":"float","minimum":0,"maximum":1,"description":"Current load of the inference server (0.0–1.0)."},"timestamp":{"type":"integer","format":"int64","description":"Server timestamp in milliseconds since Unix epoch."},"parameters":{"type":"object","additionalProperties":true,"nullable":true,"description":"Model-specific session parameters for Oris Voice, including `sample_rate`, `channels`, and `bits_per_sample`.","properties":{"sample_rate":{"type":"integer","description":"Audio sample rate in Hz."},"channels":{"type":"integer","description":"Number of audio channels."},"bits_per_sample":{"type":"integer","description":"Bits per audio sample."}}}}}}}}}}
```

## The InteractionResponseMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionResponseMessage":{"type":"object","description":"Binary message containing a streaming audio chunk. The server sends these after receiving text input and `EndInteraction`.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte  ]  Is final flag   — uint8, 1 = last chunk, 0 = more coming\n[16 bytes]  Interaction ID  — UUID bytes\n[8 bytes ]  Timestamp       — uint64, milliseconds since Unix epoch\n[4 bytes ]  Usage           — uint32, usage metric (audio duration in microseconds)\n[4 bytes ]  Index           — uint32, chunk index\n[4 bytes ]  Num payloads    — uint32, number of payload entries\n\nFor each payload entry:\n  [4 bytes]  Data size       — uint32, byte length of payload data\n  [1 byte ]  Payload type    — uint8, 1 = audio\n  [N bytes]  Payload data    — raw PCM int16 audio bytes\n```\n\nPython unpack: `struct.unpack('!B16sQIII', header)` for the main header, `struct.unpack('!IB', entry)` for each payload entry.","required":["is_final","interaction_id","timestamp","usage","index","payloads"],"properties":{"is_final":{"type":"boolean","description":"`true` if this is the last audio chunk for the current interaction."},"interaction_id":{"type":"string","format":"uuid","description":"UUID identifying this interaction."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the chunk was sent."},"usage":{"type":"integer","format":"int32","description":"Audio duration in microseconds for this chunk. Used for billing."},"index":{"type":"integer","format":"int32","description":"Chunk index (0-based, incrementing)."},"payloads":{"type":"array","description":"List of payload entries. 
Each chunk typically contains one audio entry.","items":{"type":"object","required":["payload_type","data_size","data"],"properties":{"payload_type":{"type":"integer","enum":[1],"description":"`1` = audio (PCM int16, 24 kHz, mono)."},"data_size":{"type":"integer","format":"int32","description":"Byte length of the audio data."},"data":{"type":"string","format":"binary","description":"Raw PCM int16 audio bytes at 24 kHz mono."}}}}}}}}}
````
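
The binary structure in the schema can be decoded with the two `struct` formats it names. The following is a minimal sketch that also exposes the header fields and rejects truncated messages; the `InteractionResponse` dataclass and `parse_interaction_response` function are hypothetical names, not part of an Ojin SDK.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import List

HEADER_FMT = "!B16sQIII"                   # is_final, id, timestamp, usage, index, num_payloads
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 37 bytes
ENTRY_FMT = "!IB"                          # data size, payload type
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)    # 5 bytes


@dataclass
class InteractionResponse:
    is_final: bool
    interaction_id: uuid.UUID
    timestamp_ms: int
    usage_us: int       # audio duration of this chunk in microseconds
    index: int
    audio: List[bytes]  # raw PCM int16 payloads


def parse_interaction_response(data: bytes) -> InteractionResponse:
    if len(data) < HEADER_SIZE:
        raise ValueError("message shorter than the fixed header")
    is_final, raw_id, ts, usage, index, count = struct.unpack(
        HEADER_FMT, data[:HEADER_SIZE])
    offset = HEADER_SIZE
    audio = []
    for _ in range(count):
        if offset + ENTRY_SIZE > len(data):
            raise ValueError("truncated payload entry header")
        size, ptype = struct.unpack(ENTRY_FMT, data[offset:offset + ENTRY_SIZE])
        offset += ENTRY_SIZE
        if offset + size > len(data):
            raise ValueError("truncated payload data")
        if ptype == 1:  # 1 = audio
            audio.append(data[offset:offset + size])
        offset += size
    return InteractionResponse(bool(is_final), uuid.UUID(bytes=raw_id),
                               ts, usage, index, audio)
```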

## The ErrorResponseMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"ErrorResponseMessage":{"type":"object","description":"Sent by the server when an error occurs.\n\n**Format:** JSON text frame.\n\n> **Note:** In some error conditions, the server may send a plain text message instead of structured JSON. Your client should handle non-JSON text messages gracefully.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["errorResponse"]},"payload":{"type":"object","required":["code","message","timestamp"],"properties":{"code":{"type":"string","description":"Machine-readable error code.","enum":["AUTH_FAILED","UNAUTHORIZED","MISSING_CONFIG_ID","INVALID_MESSAGE","INVALID_HEADERS","MODEL_NOT_FOUND","BACKEND_UNAVAILABLE","RATE_LIMITED","TIMEOUT","CANCELLED","INTERNAL_ERROR","FRAME_SIZE_EXCEEDED"]},"message":{"type":"string","description":"Human-readable description of the error."},"interaction_id":{"type":"string","nullable":true,"description":"The interaction ID related to the error, if applicable."},"details":{"type":"object","additionalProperties":true,"nullable":true,"description":"Optional additional structured details about the error."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the error was sent."}}}}}}}}
```

### Client -> Server Messages

## The InteractionInputMessage object

````json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"InteractionInputMessage":{"type":"object","description":"Binary message for sending text to the server for synthesis.\n\n**Format:** Binary frame.\n\n**Binary structure (big-endian):**\n```\n[1 byte ]  Payload type   — uint8, 0 for text\n[8 bytes]  Timestamp      — uint64, milliseconds since Unix epoch\n[4 bytes]  Params size    — uint32, byte length of JSON params (0 if none)\n[N bytes]  Params JSON    — UTF-8 JSON (only present if params size > 0)\n[M bytes]  Text payload   — UTF-8 encoded text to synthesize\n```\n\nPython pack: `struct.pack('!BQI', 0, timestamp, params_size)`","required":["payload_type","timestamp","params_size","text_payload"],"properties":{"payload_type":{"type":"integer","enum":[0],"description":"Always `0` for text."},"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."},"params_size":{"type":"integer","format":"int32","minimum":0,"description":"Byte length of the JSON params block. `0` if no params."},"text_payload":{"type":"string","description":"UTF-8 encoded text to synthesize. The model handles sentence segmentation internally."}}}}}}
````

## The EndInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"EndInteractionMessage":{"type":"object","description":"Signal that all text has been sent and synthesis should begin. The server processes the queued text and streams audio chunks, with the last chunk marked `is_final: true`.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["endInteraction"]},"payload":{"type":"object","required":["timestamp"],"properties":{"timestamp":{"type":"integer","format":"int64","description":"Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

## The CancelInteractionMessage object

```json
{"openapi":"3.0.3","info":{"title":"Ojin Oris Voice Realtime API","version":"1.0.0"},"components":{"schemas":{"CancelInteractionMessage":{"type":"object","description":"Immediately stop synthesis and discard remaining audio. No final chunk is sent.\n\n**Format:** JSON text frame.","required":["type","payload"],"properties":{"type":{"type":"string","enum":["cancelInteraction"]},"payload":{"type":"object","properties":{"timestamp":{"type":"integer","format":"int64","nullable":true,"description":"Optional. Milliseconds since Unix epoch when the message was sent."}}}}}}}}
```

***

## Message Details

### InteractionInput (Client -> Server, Binary)

Binary message for sending text to the server.

**Binary structure:**

```
[1 byte ]  Payload type      — uint8, 0 for text
[8 bytes]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes]  Params size        — uint32, byte length of the JSON params block (0 if no params)
[N bytes]  Params JSON        — UTF-8 encoded JSON (only present if params size > 0)
[M bytes]  Text payload       — UTF-8 encoded text to synthesize
```

**Header fields** use **big-endian** byte order. In Python: `struct.pack('!BQI', payload_type, timestamp, params_size)`.

**Text requirements:**

| Property         | Value                                           |
| ---------------- | ----------------------------------------------- |
| Encoding         | UTF-8                                           |
| Payload type     | `0` (text)                                      |
| Max message size | 512 KB (entire binary message including header) |

**Example:**

```python
import json
import struct
import time

def build_text_message(text, params=None):
    """Build a binary InteractionInput message for text."""
    payload = text.encode("utf-8")
    # Optional params block: UTF-8 JSON, empty (size 0) when no params are given.
    params_bytes = json.dumps(params).encode("utf-8") if params else b""
    # Big-endian header: payload type 0 (text), timestamp in ms, params size.
    header = struct.pack('!BQI', 0, int(time.time() * 1000), len(params_bytes))
    return header + params_bytes + payload
```
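
Because the 512 KB cap covers the whole frame (header, params, and text), it can help to check the size before sending. The following is a minimal sketch, assuming "512 KB" means 512 × 1024 bytes (the source does not say whether the unit is binary or decimal). `MAX_MESSAGE_BYTES` and `check_message_size` are hypothetical names, not part of the API.

```python
import struct
import time

# Assumption: "512 KB" is interpreted as 512 * 1024 bytes.
MAX_MESSAGE_BYTES = 512 * 1024

def check_message_size(message: bytes) -> bytes:
    """Return the message unchanged, or raise if it exceeds the assumed cap."""
    if len(message) > MAX_MESSAGE_BYTES:
        raise ValueError(
            f"message is {len(message)} bytes; limit is {MAX_MESSAGE_BYTES}"
        )
    return message

# Build a frame with the documented header layout (no params block).
header = struct.pack('!BQI', 0, int(time.time() * 1000), 0)
frame = check_message_size(header + "Hello".encode("utf-8"))
```

Checking on the client side avoids a round trip that would otherwise end in a `FRAME_SIZE_EXCEEDED` error.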

***

### InteractionResponse (Server -> Client, Binary)

Binary message containing an audio chunk. The server streams these after receiving text input and `EndInteraction`.

**Binary structure:**

```
[1 byte  ]  Is final flag     — uint8, 1 = last chunk for this interaction, 0 = more coming
[16 bytes]  Interaction ID     — UUID bytes (big-endian)
[8 bytes ]  Timestamp          — uint64, milliseconds since Unix epoch
[4 bytes ]  Usage              — uint32, usage metric for this response
[4 bytes ]  Index              — uint32, chunk index
[4 bytes ]  Num payloads       — uint32, number of payload entries that follow

For each payload entry:
  [4 bytes]  Data size          — uint32, byte length of the payload data
  [1 byte ]  Payload type       — uint8, 1 = audio
  [N bytes]  Payload data       — raw PCM int16 audio bytes

[1 byte ]  Frame type         — uint8, appended after all payload entries: 0=idle, 1=speech, 2=fade-out, 3=start-of-speech
```

All multi-byte integers are **big-endian**. In Python: `struct.unpack('!B16sQIII', header_bytes)` for the main header, `struct.unpack('!IB', entry_bytes)` for each payload entry. A single trailing `frame_type` byte (uint8) follows the last payload entry — oris-voice emits `1` (speech) for its audio chunks; the field is present for parity with the video models, which use the full `0`/`1`/`2`/`3` range.

**Payload types:**

| Type      | Format                 | Description                           |
| --------- | ---------------------- | ------------------------------------- |
| 1 (audio) | PCM int16, 24 kHz mono | Streaming audio chunk (variable size) |

**Parsing example:**

```python
import struct, uuid

HEADER_FMT = '!B16sQIII'
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 37 bytes
ENTRY_FMT = '!IB'
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)     # 5 bytes

def parse_response(data):
    is_final, uuid_bytes, timestamp, usage, index, num_payloads = \
        struct.unpack(HEADER_FMT, data[:HEADER_SIZE])

    offset = HEADER_SIZE
    audio_chunks = []

    for _ in range(num_payloads):
        size, ptype = struct.unpack(ENTRY_FMT, data[offset:offset + ENTRY_SIZE])
        offset += ENTRY_SIZE
        if ptype == 1 and size > 0:  # audio
            audio_chunks.append(data[offset:offset + size])
        offset += size

    # Trailing frame_type byte (oris-voice always emits 1 = speech).
    frame_type = data[offset]

    return {
        'is_final': bool(is_final),
        'interaction_id': str(uuid.UUID(bytes=uuid_bytes)),
        'index': index,            # chunk index (0-based, incrementing)
        'frame_type': frame_type,
        'audio_chunks': audio_chunks,
    }
```
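
To exercise a parser without a live connection, a frame can be assembled by hand from the layout above. This is a sketch for local testing only: `build_response` is a hypothetical helper, and the timestamp and usage values are placeholders rather than anything the server is known to send.

```python
import struct
import uuid

def build_response(audio: bytes, index: int = 0, is_final: bool = False,
                   frame_type: int = 1) -> bytes:
    """Assemble a synthetic InteractionResponse with one audio payload."""
    header = struct.pack(
        '!B16sQIII',
        1 if is_final else 0,   # is-final flag
        uuid.uuid4().bytes,     # interaction ID (random, for testing)
        0,                      # timestamp (placeholder)
        0,                      # usage (placeholder)
        index,                  # chunk index
        1,                      # number of payload entries
    )
    entry = struct.pack('!IB', len(audio), 1) + audio  # payload type 1 = audio
    return header + entry + bytes([frame_type])        # trailing frame_type byte

frame = build_response(b"\x00\x01" * 4, index=3, is_final=True)
```

Feeding `frame` to a parser such as `parse_response` should yield one 8-byte audio chunk with `is_final` set and `index` equal to 3.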

***

### EndInteraction vs CancelInteraction

| Message             | Purpose         | Server behavior                                                               | Use case                             |
| ------------------- | --------------- | ----------------------------------------------------------------------------- | ------------------------------------ |
| `EndInteraction`    | Graceful finish | Completes synthesis, sends remaining chunks with last marked `is_final: true` | Normal completion after sending text |
| `CancelInteraction` | Immediate stop  | Stops synthesis, discards remaining audio; no final chunk is sent             | User interruption or abort           |
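
Both control messages are JSON text frames with the same envelope. A minimal sketch of building them, following the schemas above; the function names are illustrative, not part of any SDK:

```python
import json
import time

def end_interaction() -> str:
    """JSON text frame asking the server to synthesize the queued text."""
    return json.dumps({
        "type": "endInteraction",
        "payload": {"timestamp": int(time.time() * 1000)},
    })

def cancel_interaction() -> str:
    """JSON text frame asking the server to stop and discard remaining audio."""
    return json.dumps({
        "type": "cancelInteraction",
        # The schema marks timestamp as optional for cancelInteraction.
        "payload": {"timestamp": int(time.time() * 1000)},
    })
```

Since no final chunk follows a cancel, a client that cancels should stop waiting for `is_final` on that interaction.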

***

### ErrorResponse (Server -> Client, JSON)

{% hint style="warning" %}
**Plain text errors:** In some error conditions (e.g., no backend servers available), the server may send a plain text message instead of a structured JSON `ErrorResponse`. Your client should handle non-JSON text messages gracefully.
{% endhint %}

**Error codes:**

| Code                  | Description                              |
| --------------------- | ---------------------------------------- |
| `AUTH_FAILED`         | Invalid API key                          |
| `UNAUTHORIZED`        | Caller lacks permission                  |
| `MISSING_CONFIG_ID`   | `config_id` query parameter not provided |
| `INVALID_MESSAGE`     | Malformed or unsupported message payload |
| `INVALID_HEADERS`     | Missing or invalid headers               |
| `MODEL_NOT_FOUND`     | Config ID not found or invalid           |
| `BACKEND_UNAVAILABLE` | No healthy inference backend available   |
| `RATE_LIMITED`        | Too many requests                        |
| `TIMEOUT`             | Operation exceeded processing time       |
| `CANCELLED`           | Interaction cancelled by client          |
| `INTERNAL_ERROR`      | Unexpected server error                  |
| `FRAME_SIZE_EXCEEDED` | Message exceeded 512 KB limit            |
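
Since text frames may be either structured JSON or a plain string, one way to handle both is to normalize them into a single shape first. This is a sketch under that assumption; `parse_text_frame` and the `"plainText"` label are hypothetical names chosen for illustration.

```python
import json

def parse_text_frame(msg: str) -> dict:
    """Normalize a text frame into a dict, tolerating plain-text errors."""
    try:
        parsed = json.loads(msg)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        # Plain-text fallback: no error code is available, keep the raw string.
        return {"type": "plainText", "code": None, "message": msg}
    payload = parsed.get("payload")
    if not isinstance(payload, dict):
        payload = {}
    return {
        "type": parsed.get("type"),
        "code": payload.get("code"),
        "message": payload.get("message"),
    }
```

A caller can then branch on `type == "errorResponse"` and on `code`, and treat `"plainText"` frames as errors without a code.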

***

## Rate Limits & Constraints

| Constraint            | Value                                              |
| --------------------- | -------------------------------------------------- |
| Max message size      | 512 KB per message                                 |
| Max generation length | \~30 seconds per interaction (360 tokens at 12 Hz) |

Exceeding a limit results in an `ErrorResponse` with the appropriate code, such as `FRAME_SIZE_EXCEEDED` for a message over 512 KB.
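
The generation cap can be turned into a duration and an approximate PCM size with simple arithmetic. The figures below come from the table above and the documented output format (24 kHz, int16, mono); the variable names are illustrative.

```python
TOKEN_RATE_HZ = 12      # tokens per second of audio, from the table above
MAX_TOKENS = 360        # generation cap, from the table above
SAMPLE_RATE = 24000     # output sample rate in Hz
BYTES_PER_SAMPLE = 2    # int16 mono

max_seconds = MAX_TOKENS / TOKEN_RATE_HZ
max_pcm_bytes = int(max_seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)
print(max_seconds, max_pcm_bytes)  # 30.0 1440000
```

So a single interaction yields at most about 1.44 MB of raw PCM, which is useful when sizing playback buffers.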

***

## Best Practices

### Text Input

* Send the full text in a single `InteractionInput` message, then immediately send `EndInteraction`
* The model handles sentence segmentation and streaming internally
* For very long texts, consider splitting into sentences and making separate requests
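
For the long-text case, splitting could look like the sketch below. It uses a naive regular expression that breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "Dr." will be split incorrectly. `split_sentences` is a hypothetical helper, not part of the API.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split after ., !, or ? followed by whitespace (a naive heuristic)."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

segments = split_sentences("First sentence. Second one! And a third?")
print(segments)  # ['First sentence.', 'Second one!', 'And a third?']
```

Each segment would then be sent as its own `InteractionInput` followed by `EndInteraction`, waiting for the chunk marked `is_final` before sending the next.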

### Streaming Playback

* Process audio chunks as they arrive for lowest perceived latency
* Buffer a small amount (2–3 chunks) before starting playback to absorb network jitter
* The server generates audio faster than realtime, so chunks will arrive ahead of playback
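
The pre-buffering idea can be sketched as a small queue that holds back playback until a few chunks have arrived. `JitterBuffer` is a hypothetical class, and the default of 3 chunks is taken from the guidance above rather than from any measured value.

```python
from collections import deque

class JitterBuffer:
    """Hold back playback until a few chunks have arrived."""

    def __init__(self, prebuffer_chunks: int = 3):
        self.prebuffer_chunks = prebuffer_chunks
        self.chunks = deque()
        self.started = False

    def push(self, chunk: bytes, is_final: bool = False) -> None:
        """Queue a chunk received from the server."""
        self.chunks.append(chunk)
        # Start once enough chunks are queued, or when a short stream ends.
        if len(self.chunks) >= self.prebuffer_chunks or is_final:
            self.started = True

    def pop(self):
        """Return the next chunk to play, or None if playback is not ready."""
        if self.started and self.chunks:
            return self.chunks.popleft()
        return None
```

The receive loop would call `push` for every audio payload, while the audio output callback calls `pop` and plays silence when it gets `None`.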

### Error Handling

* Handle both JSON `ErrorResponse` messages and plain text error strings
* Implement exponential backoff for reconnection
* Check the `SessionReady` message before sending any input
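
One common way to implement the reconnection backoff is doubling delays with random jitter. The base, cap, and attempt count below are illustrative assumptions, not values recommended by the API; `backoff_delays` is a hypothetical helper.

```python
import random

def backoff_delays(base: float = 0.5, cap: float = 30.0, attempts: int = 6):
    """Yield delays whose ceiling doubles each attempt, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random delay between 0 and the current ceiling.
        yield random.uniform(0, ceiling)
```

A reconnect loop would sleep for each yielded delay (for example with `asyncio.sleep`) before retrying the connection, and give up once the generator is exhausted.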

***

## Complete Example

```python
import asyncio
import json
import struct
import time
import wave
import os

import websockets
from dotenv import load_dotenv

load_dotenv()

API_KEY = os.getenv("OJIN_API_KEY", "")
CONFIG_ID = os.getenv("OJIN_CONFIG_ID", "")
WS_URL = f"wss://models.ojin.ai/realtime?config_id={CONFIG_ID}"

SAMPLE_RATE = 24000

def build_text_message(text):
    """Build a binary InteractionInput for text payload."""
    header = struct.pack('!BQI', 0, int(time.time() * 1000), 0)
    return header + text.encode('utf-8')

def parse_response(data):
    """Parse a binary InteractionResponse."""
    fmt = '!B16sQIII'
    hdr_size = struct.calcsize(fmt)
    is_final, _, _, _, _, num_payloads = struct.unpack(fmt, data[:hdr_size])

    offset = hdr_size
    audio = []
    for _ in range(num_payloads):
        size, ptype = struct.unpack('!IB', data[offset:offset + 5])
        offset += 5
        if ptype == 1 and size > 0:
            audio.append(data[offset:offset + size])
        offset += size

    return bool(is_final), audio

async def main():
    headers = websockets.Headers()
    headers["Authorization"] = API_KEY

    async with websockets.connect(
        WS_URL,
        additional_headers=headers,
        open_timeout=None,
        ping_timeout=None,
    ) as ws:
        # 1. Wait for session ready
        while True:
            msg = await ws.recv()
            if isinstance(msg, str):
                try:
                    parsed = json.loads(msg)
                except json.JSONDecodeError:
                    # The server may send a plain-text error instead of JSON.
                    raise RuntimeError(f"Server error: {msg}")
                if parsed.get("type") == "sessionReady":
                    print(f"Session ready (trace_id: {parsed['payload'].get('trace_id')})")
                    break
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])

        # 2. Send text + end interaction
        text = "Hello! This is a demonstration of Ojin Oris Voice text to speech."
        await ws.send(build_text_message(text))
        await ws.send(json.dumps({
            "type": "endInteraction",
            "payload": {"timestamp": int(time.time() * 1000)},
        }))
        print(f"Sent text ({len(text)} chars), waiting for audio...")

        # 3. Receive audio chunks
        audio_data = []
        chunk_count = 0
        t0 = time.monotonic()

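        while True:
            msg = await ws.recv()
            if isinstance(msg, str):
                try:
                    parsed = json.loads(msg)
                except json.JSONDecodeError:
                    # The server may send a plain-text error instead of JSON.
                    raise RuntimeError(f"Server error: {msg}")
                if parsed.get("type") == "errorResponse":
                    raise RuntimeError(parsed["payload"]["message"])
                continue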

            is_final, chunks = parse_response(msg)
            audio_data.extend(chunks)
            chunk_count += len(chunks)

            if is_final:
                break

        elapsed = time.monotonic() - t0

        # 4. Write WAV file
        pcm = b"".join(audio_data)
        duration = len(pcm) / (SAMPLE_RATE * 2)
        output_path = f"tts-output-{int(time.time())}.wav"

        with wave.open(output_path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(SAMPLE_RATE)
            wf.writeframes(pcm)

        print(f"Saved {output_path}")
        print(f"  Chunks: {chunk_count}")
        print(f"  Duration: {duration:.2f}s")
        print(f"  Elapsed: {elapsed:.2f}s")
        if duration > 0:
            print(f"  RTF: {elapsed / duration:.2f}x")

asyncio.run(main())
```

***

## Troubleshooting

### Connection Issues

* Verify API key and config ID
* Check that the config exists in the dashboard
* Ensure network allows WebSocket connections (port 443)
* The `Authorization` header uses the raw API key (no `Bearer` prefix)

### No Audio Received

* Confirm you received `SessionReady` before sending text
* Make sure you send `EndInteraction` after the text input — synthesis does not start until the server receives it
* Check message size is under 512 KB

### Audio Quality Issues

* Verify the output is written as 24 kHz, 16-bit mono WAV
* Check the `language` parameter matches your input text, or use `"Auto"`
* For voice cloning, ensure the reference audio is clean and at least 5 seconds long

### Unexpected Silence or Truncation

* Check `max_new_tokens` — the default (360) caps output at \~30 seconds
* If the text is very long, consider splitting into smaller segments

***


# Troubleshooting

## Connection Issues

* ✓ Verify API key and config ID
* ✓ Check that the config exists in the dashboard
* ✓ Ensure network allows WebSocket connections

## No Audio Output

* ✓ Verify text is sent as binary `InteractionInput` with payload type `0` (text), not `1` (audio)
* ✓ Confirm you sent `EndInteraction` JSON after the text — the model waits for this signal
* ✓ Check output format handling: PCM int16, 24 kHz mono (not 16 kHz — Oris Voice uses 24 kHz; persona video realtime APIs may differ)

## Garbled or Distorted Audio

* ✓ Ensure audio is decoded as 24,000 Hz sample rate (not 16,000 Hz)
* ✓ Verify little-endian byte order for PCM samples
* ✓ Check that you're reading the correct number of bytes per payload (from the payload size header)
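
Note that the byte orders differ: the message headers are big-endian, while the PCM samples are little-endian. A minimal sketch of decoding one audio payload into integer samples; `decode_pcm16` is a hypothetical helper:

```python
import struct

def decode_pcm16(payload: bytes) -> list[int]:
    """Decode little-endian signed 16-bit samples from an audio payload."""
    if len(payload) % 2:
        raise ValueError("PCM int16 payload must have an even byte length")
    count = len(payload) // 2
    return list(struct.unpack(f'<{count}h', payload))

samples = decode_pcm16(b'\x01\x00\xff\xff\x00\x80')
print(samples)  # [1, -1, -32768]
```

An odd payload length usually points to a wrong offset or size when slicing the frame.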

## Voice Clone Doesn't Sound Right

* ✓ Use 5–15 seconds of clean reference audio (longer is not better)
* ✓ WAV format preferred for clone reference
* ✓ Avoid reference audio with background noise or music
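
The length guideline can be checked locally before uploading a reference clip. The sketch below uses Python's standard `wave` module and assumes the reference is an uncompressed WAV file; both function names are hypothetical.

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def is_good_reference_length(path: str, min_s: float = 5.0,
                             max_s: float = 15.0) -> bool:
    """Check the clip against the 5 to 15 second guideline above."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

This only validates duration; background noise and music still need to be checked by ear.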


