ojin/oris-voice

High-quality multilingual text-to-speech with voice cloning, preset speakers, and promptable voice design

Overview

Oris Voice (ojin/oris-voice) is Ojin's streaming text-to-speech product. You get natural-sounding speech at 24 kHz with three voice modes: clone from a short reference clip, pick a built-in speaker, or describe the voice you want in natural language.

Key Features

  • Voice Cloning — Clone any voice from a short reference audio clip and optional transcript

  • Built-in Voices — Choose from a library of built-in speaker identities

  • Voice Design — Describe the desired voice characteristics in natural language

  • Streaming Output — Audio chunks stream in real time as the model generates, enabling low-latency playback

  • Multilingual — Supports Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian, with automatic language detection

  • High-Quality Audio — 24 kHz, 16-bit PCM mono output
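
Because the output format is fixed (24 kHz, 16-bit PCM, mono), streamed chunks can be written into a playable WAV file with the standard library alone. The sketch below assumes each chunk arrives as raw, headerless little-endian PCM bytes. That is an assumption about the wire format, not something this page specifies, and the function names are hypothetical.

```python
import io
import wave
from typing import Iterable

# Values taken from the audio spec above.
SAMPLE_RATE_HZ = 24_000
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
CHANNELS = 1            # mono


def pcm_chunks_to_wav(chunks: Iterable[bytes]) -> bytes:
    """Wrap raw PCM chunks in a WAV container and return the file bytes.

    Assumption: every chunk is headerless little-endian PCM.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        # Chunks are appended in arrival order, so this also works on a
        # generator that yields audio as it streams in.
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback duration of num_bytes of audio in this format."""
    return num_bytes / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)
```

At this format one second of audio is 48,000 bytes, which is a useful figure when sizing playback buffers.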

Voice Modes

Mode | Description
Clone | Reproduce a voice from a reference audio sample. Provide ref_audio and, optionally, ref_text.
Built-in Voices | Use a built-in speaker identity. Provide a speaker name and, optionally, instruct for style instructions.
Voice Design | Generate a voice from a natural language description. Provide instruct (e.g., "a warm female voice with a British accent").
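
The required and optional fields per mode can be checked before a request is sent. The following is a minimal sketch under stated assumptions: the mode identifiers ("clone", "builtin", "design"), the field name speaker, the treatment of ref_audio as an opaque string, and the function itself are all hypothetical. Only ref_audio, ref_text, and instruct are named on this page, and the actual request schema may differ.

```python
from typing import Dict, Optional


def build_voice_params(
    mode: str,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    speaker: Optional[str] = None,
    instruct: Optional[str] = None,
) -> Dict[str, str]:
    """Return only the fields relevant to the chosen voice mode.

    Mode names and the "speaker" key are illustrative assumptions.
    """
    if mode == "clone":
        if not ref_audio:
            raise ValueError("clone mode requires ref_audio")
        params = {"ref_audio": ref_audio}
        if ref_text:  # the transcript is optional
            params["ref_text"] = ref_text
        return params

    if mode == "builtin":
        if not speaker:
            raise ValueError("built-in voices require a speaker name")
        params = {"speaker": speaker}
        if instruct:  # style instructions are optional
            params["instruct"] = instruct
        return params

    if mode == "design":
        if not instruct:
            raise ValueError("voice design requires instruct")
        return {"instruct": instruct}

    raise ValueError(f"unknown voice mode: {mode!r}")
```

For example, build_voice_params("design", instruct="a warm female voice with a British accent") returns a dictionary containing only the instruct field, and calling it with mode "clone" but no ref_audio raises ValueError.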

Quick Start

To get started with ojin/oris-voice:

  1. Create an API key — Set up authentication for the Ojin platform

  2. Create a configuration — Set up a voice configuration in the dashboard

  3. Integrate with your application — Use the WebSocket API

Use Cases

  • Conversational AI — Generate natural speech responses for chatbots and virtual assistants

  • Content Creation — Produce voiceovers for videos, podcasts, and audiobooks

  • Accessibility — Convert text content to speech for visually impaired users

  • Localization — Generate speech in multiple languages from the same text

  • Voice Cloning — Preserve and reproduce specific voice identities

  • Persona Pipelines — Feed generated audio into ojin/oris-portrait for lip-synced video personas

Supported Languages

Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian. An Auto option detects the language automatically.
