Get started

This guide explains how to integrate the ojin/oris-portrait persona model into your application using either Pipecat or WebSockets.

Prerequisites

  1. An Ojin account with an active API key. If you don't have one, get your API key first

  2. Save the Persona Configuration ID from the dashboard

  3. Integrate with your application using either Pipecat or WebSockets

Staging deployments: For secure, low-latency video applications, connect to the real-time WebSocket API from a backend server rather than a front-end client. This keeps your API key off end-user devices and lets you use a network transport suited to real-time video delivery under varying network conditions. Typically, WebRTC is used to deliver the final media stream to end users for smooth, reliable, low-latency playback.

Pipecat Integration

Pipecat is an open-source framework for building conversational AI pipelines. The ojin/oris-portrait model integrates with Pipecat through our dedicated OjinVideoService.

Clone the Pipecat repository and open the ready-to-use video-avatar-ojin-video-service example.

To start using it, create a Python virtual environment in the repository and install the requirements:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Create a .env file and add your Ojin credentials along with the API keys for the STT, LLM and TTS services used by the example:

OJIN_API_KEY="your_api_key_here"
OJIN_CONFIG_ID="your_persona_id_here"
DEEPGRAM_API_KEY="your_deepgram_api_key_here"
CARTESIA_API_KEY="your_cartesia_api_key_here"
OPENAI_API_KEY="your_openai_api_key_here"
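Before running the example, it can help to confirm that none of these variables is missing or empty. The following is a minimal sketch, not part of the Pipecat example: the helper names `missing_vars` and `check_environment` are hypothetical, and the sketch assumes the .env values have already been loaded into the process environment.

```python
import os
from typing import List, Mapping

# Variable names taken from the .env template above.
REQUIRED_VARS = [
    "OJIN_API_KEY",
    "OJIN_CONFIG_ID",
    "DEEPGRAM_API_KEY",
    "CARTESIA_API_KEY",
    "OPENAI_API_KEY",
]


def missing_vars(env: Mapping[str, str], required: List[str] = REQUIRED_VARS) -> List[str]:
    """Return the required names that are absent or blank in `env` (hypothetical helper)."""
    return [name for name in required if not env.get(name, "").strip()]


def check_environment() -> None:
    """Raise a readable error if the process environment is incomplete (hypothetical helper)."""
    missing = missing_vars(os.environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `check_environment()` at the top of a script would stop it early with a list of the variables that still need values, rather than failing later inside one of the services.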

Once configured, run the example and connect with a local WebRTC client (or Daily) to interact with a conversational, human-like avatar:

python examples/video-avatar/video-avatar-ojin-video-service.py

How It Works

  1. The microphone captures speech input

  2. Voice Activity Detection identifies speech segments

  3. Deepgram transcribes user audio to text (STT)

  4. OpenAI generates the assistant's reply (LLM)

  5. Cartesia synthesizes the reply to audio (TTS)

  6. The OjinVideoService animates your persona based on the TTS audio

  7. Video and audio frames are streamed back to the client in real time

You can customize the pipeline by adding or removing components, or by adjusting their parameters to suit your needs.
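The flow described above can be pictured as an ordered chain of stages, each consuming the output of the previous one. The sketch below is a conceptual illustration in plain Python only: it does not use Pipecat or Ojin APIs, and `make_stage`, `run_pipeline` and the stage labels are hypothetical placeholders that just record the order in which data passes through.

```python
from typing import Callable, List

Stage = Callable[[str], str]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that appends its own name to the data trail."""
    def stage(data: str) -> str:
        return f"{data} -> {name}"
    return stage


# Stage order mirrors the steps listed above; the labels are illustrative only.
STAGE_NAMES = ["vad", "stt", "llm", "tts", "ojin_video", "output"]
STAGES: List[Stage] = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(data: str, stages: List[Stage]) -> str:
    """Pass the data through every stage in order and return the final result."""
    for stage in stages:
        data = stage(data)
    return data


print(run_pipeline("mic", STAGES))
# prints "mic -> vad -> stt -> llm -> tts -> ojin_video -> output"
```

Customizing the pipeline corresponds to editing the stage list: for example, dropping the `ojin_video` entry would leave an audio-only chain, while the remaining stages keep their relative order.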

Next Steps

API Reference

Dive deeper into the model API for custom integrations
