Realtime Agent¶
The realtime agent is designed to handle real-time interactions, such as voice conversations or live chat sessions. The realtime agent in AgentScope features:
Integration with OpenAI, DashScope, Gemini, and other realtime model APIs
Unified event interface to simplify interactions with different realtime models
Support for tool calling capabilities
Support for multi-agent interactions
Note
The realtime agent is currently under active development. We welcome community contributions, discussions, and feedback! If you’re interested in realtime agents, please join our discussion and development.
import asyncio
import os
from agentscope.agent import RealtimeAgent
from agentscope.realtime import (
    DashScopeRealtimeModel,
    OpenAIRealtimeModel,
    GeminiRealtimeModel,
)
Creating Realtime Models¶
AgentScope currently supports the following realtime model APIs:
| Provider | Class | Supported Models | Input Modalities | Tool Support |
|---|---|---|---|---|
| DashScope | DashScopeRealtimeModel | | Text, Audio, Image | No |
| OpenAI | OpenAIRealtimeModel | | Text, Audio | Yes |
| Gemini | GeminiRealtimeModel | | Text, Audio, Image | Yes |
Here are examples of initializing different realtime models:
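The sketch below is one hedged way to write such an initialization. The `model_name` and `api_key` keyword arguments are taken from the DashScope call used later in this tutorial; applying the same keywords to the OpenAI and Gemini classes is an assumption, as are the OpenAI and Gemini model names, the environment variable names for those two providers, and the helper `create_realtime_model` itself.

```python
import os

# provider -> (class name, model name, environment variable holding the API key)
# Only the DashScope row is taken from this tutorial; the other two rows are
# illustrative assumptions and should be checked against the provider docs.
MODEL_SPECS = {
    "dashscope": (
        "DashScopeRealtimeModel",
        "qwen3-omni-flash-realtime",
        "DASHSCOPE_API_KEY",
    ),
    "openai": (
        "OpenAIRealtimeModel",
        "gpt-4o-realtime-preview",  # assumed model name
        "OPENAI_API_KEY",  # assumed variable name
    ),
    "gemini": (
        "GeminiRealtimeModel",
        "gemini-2.0-flash-live-001",  # assumed model name
        "GEMINI_API_KEY",  # assumed variable name
    ),
}


def create_realtime_model(provider: str):
    """Hypothetical helper that builds a realtime model for one provider."""
    # Imported lazily so that this sketch can be loaded without agentscope.
    import agentscope.realtime as realtime

    class_name, model_name, env_var = MODEL_SPECS[provider]
    model_cls = getattr(realtime, class_name)
    # Assumes every class accepts the keywords shown for DashScope.
    return model_cls(model_name=model_name, api_key=os.getenv(env_var))
```

Calling `create_realtime_model("dashscope")` would then return a model that can be passed to `RealtimeAgent(model=...)`, provided the assumptions above hold for the installed version.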
The realtime model provides the following key methods:
| Method | Description |
|---|---|
| connect() | Establish WebSocket connection to the realtime model API |
| | Close the WebSocket connection |
| | Send audio/text/image data to the realtime model for processing |
The outgoing_queue parameter in connect() is an asyncio queue used to
forward events from the realtime model to the outside (e.g., the agent or frontend).
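The queue hand-off can be illustrated with the standard library alone. In the sketch below, `FakeRealtimeModel`, `FakeEvent`, and the event type strings are hypothetical stand-ins, not AgentScope classes or names. The sketch only shows how a `connect()` call that receives an `asyncio.Queue` can push events to whoever reads the queue.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical event carrying only a ``type`` field."""

    type: str


class FakeRealtimeModel:
    """Hypothetical stand-in for a realtime model; not an AgentScope class."""

    async def connect(self, outgoing_queue: asyncio.Queue) -> None:
        # A real model would open a WebSocket here; this stub keeps the queue
        # and announces the session through it.
        self._queue = outgoing_queue
        await self._queue.put(FakeEvent("session_created"))

    async def send(self, text: str) -> None:
        # Pretend the API answered and forward the result to the outside.
        await self._queue.put(FakeEvent(f"echo:{text}"))


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    model = FakeRealtimeModel()
    await model.connect(queue)
    await model.send("hello")
    # Drain the queue the way an agent or frontend bridge would.
    return [(await queue.get()).type for _ in range(2)]


print(asyncio.run(demo()))  # ['session_created', 'echo:hello']
```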
Model Events Interface¶
AgentScope provides a unified agentscope.realtime.ModelEvents interface to simplify
interactions with different realtime models. The following events are
supported:
Note
The “session” in ModelEvents refers to the WebSocket connection session between the realtime model and the model API, not the session between the frontend and backend.
| Event | Description |
|---|---|
| | Session is successfully created |
| | Session has ended |
| | Model begins generating a response |
| | Model finished generating a response |
| | Streaming audio data chunk from the model |
| | Audio response is complete |
| | Streaming transcription chunk of audio response |
| | Audio transcription is complete |
| | Streaming tool call parameters |
| | Tool call parameters are complete |
| | Streaming transcription chunk of user input |
| | User input transcription is complete |
| | Detected start of user audio input (VAD) |
| | Detected end of user audio input (VAD) |
| | An error occurred |
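Several of these events come in pairs: a stream of chunks followed by a completion event. The sketch below shows one way a consumer might assemble such a stream. The event type strings `transcript_delta` and `transcript_done` and the `TranscriptAssembler` class are hypothetical and chosen for illustration; the actual `ModelEvents` names may differ.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptAssembler:
    """Collects streaming chunks until a completion event arrives."""

    chunks: list = field(default_factory=list)
    completed: list = field(default_factory=list)

    def handle(self, event_type: str, text: str = "") -> None:
        if event_type == "transcript_delta":  # hypothetical type name
            self.chunks.append(text)
        elif event_type == "transcript_done":  # hypothetical type name
            self.completed.append("".join(self.chunks))
            self.chunks.clear()
        # Any other event type is ignored by this assembler.


assembler = TranscriptAssembler()
for event_type, text in [
    ("transcript_delta", "Hel"),
    ("transcript_delta", "lo"),
    ("transcript_done", ""),
]:
    assembler.handle(event_type, text)

print(assembler.completed)  # ['Hello']
```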
Creating a Realtime Agent¶
The RealtimeAgent serves as a bridge layer that:
Converts ModelEvents from realtime models into ServerEvents for frontend and other agents
Receives ClientEvents from frontend or other agents and forwards them to the realtime model API
Manages the agent’s lifecycle and event queues
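The first of these responsibilities, translating model-side events into frontend-facing ones, can be sketched as a small forwarding loop. Everything in the sketch is an assumption made for illustration: the type names in `MODEL_TO_SERVER`, the dictionary-shaped events, and the `None` sentinel used to end the loop. It is not how `RealtimeAgent` is implemented.

```python
import asyncio

# Hypothetical type names; the real ModelEvents/ServerEvents names may differ.
MODEL_TO_SERVER = {
    "model_response_created": "agent_response_created",
    "model_response_done": "agent_response_done",
}


async def bridge(
    model_queue: asyncio.Queue,
    frontend_queue: asyncio.Queue,
    agent_name: str,
) -> None:
    """Forward mapped events and tag them with the agent's name."""
    while True:
        event = await model_queue.get()
        if event is None:  # sentinel used by this sketch to stop the loop
            break
        server_type = MODEL_TO_SERVER.get(event["type"])
        if server_type is None:
            continue  # events without a mapping are dropped in this sketch
        await frontend_queue.put(
            {**event, "type": server_type, "agent_name": agent_name},
        )


async def demo() -> list:
    model_queue: asyncio.Queue = asyncio.Queue()
    frontend_queue: asyncio.Queue = asyncio.Queue()
    for item in ({"type": "model_response_created"}, {"type": "unmapped"}, None):
        model_queue.put_nowait(item)
    await bridge(model_queue, frontend_queue, "Friday")
    forwarded = []
    while not frontend_queue.empty():
        forwarded.append(frontend_queue.get_nowait())
    return forwarded


print(asyncio.run(demo()))
```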
Server and Client Events¶
AgentScope provides unified ServerEvents and ClientEvents for
communication between backend and frontend:
ServerEvents (Backend → Frontend):
| Event | Description |
|---|---|
| | Session created in backend |
| | Session updated in backend |
| | Session ended in backend |
| | Agent is ready to receive inputs |
| | Agent has ended |
| | Agent starts generating response |
| | Agent finished generating response |
| | Streaming audio chunk from agent |
| | Audio response complete |
| | Streaming transcription of agent response |
| | Transcription complete |
| | Streaming tool call data |
| | Tool call complete |
| | Tool execution result |
| | Streaming transcription of user input |
| | Input transcription complete |
| | User audio input started |
| | User audio input ended |
| | An error occurred |
ClientEvents (Frontend → Backend):
| Event | Description |
|---|---|
| | Create a new session with specified configuration |
| | End current session |
| | Request agent to generate response immediately |
| | Interrupt agent’s current response |
| | Append text input |
| | Append audio input |
| | Commit audio input (signal end of input) |
| | Append image input |
| | Send tool execution result |
Initializing a Realtime Agent¶
Here’s how to create and use a realtime agent:
async def example_realtime_agent() -> None:
    """Example of creating and using a realtime agent."""
    agent = RealtimeAgent(
        name="Friday",
        sys_prompt="You are a helpful assistant named Friday.",
        model=DashScopeRealtimeModel(
            model_name="qwen3-omni-flash-realtime",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
        ),
    )
    # Create a queue to receive messages from the agent
    outgoing_queue = asyncio.Queue()
    # Handle outgoing messages in a separate task
    async def handle_agent_messages():
        while True:
            event = await outgoing_queue.get()
            # Process the event (e.g., send to frontend via WebSocket)
            print(f"Agent event: {event.type}")
    # Start the message handling task; keep a reference so it can be cancelled
    handler_task = asyncio.create_task(handle_agent_messages())
    # Start the agent (establishes connection)
    await agent.start(outgoing_queue)
    # The agent is now ready to handle inputs
    # Stop the agent when done, then stop the message handling task
    await agent.stop()
    handler_task.cancel()
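A background consumer such as `handle_agent_messages` loops forever, so it runs until it is cancelled. The sketch below uses only the standard library (no AgentScope classes) to show one way to let the consumer finish the queued items and then cancel it. The names `consume` and `demo` are illustrative.

```python
import asyncio


async def consume(queue: asyncio.Queue, seen: list) -> None:
    """Record every item taken from the queue until cancelled."""
    while True:
        item = await queue.get()
        seen.append(item)
        queue.task_done()  # lets queue.join() know this item was handled


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    seen: list = []
    # Keep a reference to the task so it can be cancelled later.
    consumer = asyncio.create_task(consume(queue, seen))
    for item in ("a", "b"):
        await queue.put(item)
    await queue.join()  # wait until every queued item has been handled
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass  # expected: the consumer was cancelled on purpose
    return seen


print(asyncio.run(demo()))  # ['a', 'b']
```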
Starting Realtime Conversation¶
Now we can set up a realtime conversation between a user and a realtime agent.
Here we take FastAPI as an example backend framework to demonstrate how to set up a realtime conversation.
Backend Setup (Server-side):
The backend needs to:
Create a WebSocket endpoint to accept frontend connections
Create a RealtimeAgent when the session starts
Forward ClientEvents from frontend to the agent
Forward ServerEvents from agent to the frontend
import asyncio
import os
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from agentscope.agent import RealtimeAgent
from agentscope.realtime import (
    DashScopeRealtimeModel,
    ClientEvents,
    ServerEvents,
)
app = FastAPI()
@app.websocket("/ws/{user_id}/{session_id}")
async def websocket_endpoint(
    websocket: WebSocket,
    user_id: str,
    session_id: str,
):
    await websocket.accept()
    # Create queue for agent messages
    frontend_queue = asyncio.Queue()
    # Create agent
    agent = RealtimeAgent(
        name="Assistant",
        sys_prompt="You are a helpful assistant.",
        model=DashScopeRealtimeModel(
            model_name="qwen3-omni-flash-realtime",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
        ),
    )
    # Start agent
    await agent.start(frontend_queue)
    # Forward messages from agent to frontend
    async def send_to_frontend():
        while True:
            msg = await frontend_queue.get()
            await websocket.send_json(msg.model_dump())
    sender_task = asyncio.create_task(send_to_frontend())
    # Receive messages from frontend and forward to agent
    try:
        while True:
            data = await websocket.receive_json()
            client_event = ClientEvents.from_json(data)
            await agent.handle_input(client_event)
    except WebSocketDisconnect:
        # The frontend closed the connection
        pass
    finally:
        # Stop the forwarding task and release the model connection
        sender_task.cancel()
        await agent.stop()
Frontend Setup (Client-side):
The frontend needs to:
Establish WebSocket connection to the backend
Send CLIENT_SESSION_CREATE event to initialize the session
Capture audio from microphone and send via CLIENT_AUDIO_APPEND events
Receive and handle ServerEvents (e.g., play audio, display transcripts)
// Connect to WebSocket
const ws = new WebSocket('ws://localhost:8000/ws/user1/session1');
ws.onopen = () => {
    // Create session
    ws.send(JSON.stringify({
        type: 'client_session_create',
        config: {
            instructions: 'You are a helpful assistant.',
            user_name: 'User1'
        }
    }));
};
// Handle messages from backend
ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    if (data.type === 'response_audio_delta') {
        // Play audio chunk
        playAudio(data.delta);
    }
};
// Send audio data
function sendAudioChunk(audioData) {
    ws.send(JSON.stringify({
        type: 'client_audio_append',
        session_id: 'session1',
        audio: audioData, // base64 encoded
        format: { encoding: 'pcm16', sample_rate: 16000 }
    }));
}
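The same audio payload can also be built from Python, for example in a test client. The sketch below copies the field names from the JavaScript snippet above. The helper name `audio_append_payload` is hypothetical, and little-endian byte order for the 16-bit PCM samples is an assumption that should be checked against the backend's expectations.

```python
import base64
import json
import struct


def audio_append_payload(
    samples: list,
    session_id: str,
    sample_rate: int = 16000,
) -> str:
    """Serialize 16-bit PCM samples into a client_audio_append message."""
    # Pack the integers as little-endian signed 16-bit values (assumed order).
    pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps(
        {
            "type": "client_audio_append",
            "session_id": session_id,
            "audio": base64.b64encode(pcm_bytes).decode("ascii"),
            "format": {"encoding": "pcm16", "sample_rate": sample_rate},
        },
    )


message = json.loads(audio_append_payload([0, 1, -1], "session1"))
print(message["type"], len(base64.b64decode(message["audio"])))
```

Each sample occupies two bytes, so three samples decode back to six bytes of audio.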
For a complete working example, see
examples/agent/realtime_voice_agent/ in the AgentScope repository.
Multi-Agent Realtime Conversation¶
AgentScope supports multi-agent realtime interactions through the ChatRoom
class.
Note that most realtime model APIs currently support only single-user interactions. AgentScope’s architecture is designed to support multiple agents and users as API capabilities expand.
The Realtime ChatRoom¶
AgentScope introduces the ChatRoom class to manage multiple realtime
agents in a shared conversation space. The ChatRoom provides:
Centralized management of multiple RealtimeAgent instances
Automatic message broadcasting between agents
Unified message queue for frontend communication
Lifecycle management for all agents in the room
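The shared-queue and lifecycle ideas in this list can be sketched with the standard library. `EchoAgent` and `MiniRoom` below are hypothetical stand-ins, not the AgentScope `RealtimeAgent` or `ChatRoom`. The sketch only covers fanning one input out to every agent and collecting their output in a single queue, and it does not model agents hearing each other.

```python
import asyncio


class EchoAgent:
    """Hypothetical stand-in for a realtime agent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._outgoing = None

    async def start(self, outgoing_queue: asyncio.Queue) -> None:
        self._outgoing = outgoing_queue

    async def handle_input(self, text: str) -> None:
        await self._outgoing.put(f"{self.name} heard: {text}")

    async def stop(self) -> None:
        self._outgoing = None


class MiniRoom:
    """Hypothetical room that starts, feeds, and stops a group of agents."""

    def __init__(self, agents: list) -> None:
        self.agents = agents

    async def start(self, outgoing_queue: asyncio.Queue) -> None:
        # Every agent writes to the same queue, giving one stream for the frontend.
        await asyncio.gather(*(a.start(outgoing_queue) for a in self.agents))

    async def handle_input(self, text: str) -> None:
        # Delivered one agent at a time so the output order is deterministic.
        for agent in self.agents:
            await agent.handle_input(text)

    async def stop(self) -> None:
        await asyncio.gather(*(a.stop() for a in self.agents))


async def demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    room = MiniRoom([EchoAgent("Agent1"), EchoAgent("Agent2")])
    await room.start(queue)
    await room.handle_input("hi")
    await room.stop()
    return [queue.get_nowait() for _ in range(queue.qsize())]


print(asyncio.run(demo()))  # ['Agent1 heard: hi', 'Agent2 heard: hi']
```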
Using ChatRoom¶
The usage of ChatRoom is similar to RealtimeAgent:
async def example_chat_room() -> None:
    """Example of using ChatRoom with multiple realtime agents."""
    from agentscope.pipeline import ChatRoom
    from agentscope.agent import RealtimeAgent
    from agentscope.realtime import ClientEvents, DashScopeRealtimeModel
    # Create multiple agents
    agent1 = RealtimeAgent(
        name="Agent1",
        sys_prompt="You are Agent1, a helpful assistant.",
        model=DashScopeRealtimeModel(
            model_name="qwen3-omni-flash-realtime",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
        ),
    )
    agent2 = RealtimeAgent(
        name="Agent2",
        sys_prompt="You are Agent2, a helpful assistant.",
        model=DashScopeRealtimeModel(
            model_name="qwen3-omni-flash-realtime",
            api_key=os.getenv("DASHSCOPE_API_KEY"),
        ),
    )
    # Create a chat room with multiple agents
    chat_room = ChatRoom(agents=[agent1, agent2])
    # Create queue to receive messages from all agents
    outgoing_queue = asyncio.Queue()
    # Start the chat room
    await chat_room.start(outgoing_queue)
    # Handle input from frontend
    # The chat room will broadcast to all agents
    client_event = ClientEvents.ClientTextAppendEvent(
        session_id="session1",
        text="Hello everyone!",
    )
    await chat_room.handle_input(client_event)
    # Stop the chat room when done
    await chat_room.stop()
Roadmap¶
The realtime agent feature is currently experimental and under active development. Planned work includes:
Support for more realtime model APIs
Enhanced memory management for conversation history
Comprehensive tool calling support across all providers
Multi-user voice interaction support
Improved VAD (Voice Activity Detection) configuration
Better error handling and recovery mechanisms
We welcome contributions and feedback from the community to help shape the future of realtime agents in AgentScope!