GPT Realtime 使用指南

本页面介绍如何使用我们提供的 Realtime 系列模型（turing/gpt-realtime 与 turing/gpt-realtime-mini）。内容覆盖鉴权、连接方式（WebSocket / HTTP）、消息格式、Node.js / 浏览器 / Python 示例、流式事件处理、常用参数与最佳实践。

概述

Realtime 模型专为低延迟、长会话和流式交互设计。主要场景包括在线对话、多人协作编辑、实时代理/助手、语音实时转文本+理解等。

主要区别：

turing/gpt-realtime：完整功能，高质量推理与生成，适合对理解和长上下文有要求的场景。
turing/gpt-realtime-mini：轻量、低成本版本，延迟更低，适合高并发与成本敏感场景，但在复杂推理上可能略逊一筹。

认证

所有请求需在 HTTP header 中携带 Bearer token：

Header: Authorization: Bearer <YOUR_API_KEY>

请勿在客户端公开暴露主密钥；在浏览器场景下，应通过后端代理或短期签发的临时凭证进行呼叫。

连接方式选择

WebSocket（推荐）：适合低延迟、双向实时通信、需要服务端主动推送事件的场景。支持多路复用会话、流式响应、逐句/逐词输出。
HTTP REST（轮询或流式）：适合一次性请求或不方便维持长连接的环境。若使用 HTTP 流（chunked），可接收分块的生成结果，但延迟与复杂事件处理不如 WebSocket 灵活。

消息协议（WebSocket）

以下为推荐的 WebSocket JSON 消息格式（用作示例，实际协议以后端文档为准）：

客户端 -> 服务端（建立会话）

{
  "type": "session.create",
  "model": "turing/gpt-realtime",
  "session": {
    "id": "session-123",
    "metadata": { "user_id": "u-1" }
  }
}

客户端 -> 服务端（发送输入）

{
  "type": "input",
  "session_id": "session-123",
  "input": {
    "text": "你好，给我生成一段 50 字以内的产品介绍。"
  }
}

服务端 -> 客户端（流式输出）

{
  "type": "output.token",
  "session_id": "session-123",
  "token": "这是",
  "seq": 1
}

{
  "type": "output.complete",
  "session_id": "session-123",
  "finished": true
}

错误与状态也会以事件形式返回：error、session.closed、heartbeat 等。

Node.js（WebSocket）示例

以下示例使用 ws 库连接到 Realtime WebSocket。请将 WSS_URL 与 API_KEY 替换为实际值。

import WebSocket from 'ws';

const WSS_URL = 'wss://live-turing.cn.llm.tcljd.com/realtime';
const API_KEY = process.env.TURING_API_KEY;

const ws = new WebSocket(WSS_URL, {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});

ws.on('open', () => {
  // 创建会话
  ws.send(JSON.stringify({ type: 'session.create', model: 'turing/gpt-realtime', session: { id: 's1' } }));
  // 发送输入
  ws.send(JSON.stringify({ type: 'input', session_id: 's1', input: { text: '帮我写一段 40 字左右的招聘文案。' } }));
});

ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.type === 'output.token') process.stdout.write(msg.token);
  if (msg.type === 'output.complete') console.log('\n== 完成 ==');
  if (msg.type === 'error') console.error('Error:', msg.error);
});

ws.on('close', () => console.log('connection closed'));

浏览器端（通过后端代理）

建议在浏览器端不要直接使用主密钥。可以由后端生成临时 token 或代理 WebSocket。下面示例假设后端返回 wssUrl：

const ws = new WebSocket(wssUrl);
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'session.create', model: 'turing/gpt-realtime', session: { id: 'web-1' } }));
});

ws.addEventListener('message', (ev) => console.log('recv', JSON.parse(ev.data)));

Python（WebSocket）示例

使用 websockets 或 websocket-client 库：

import os
import asyncio
import websockets
import json

WSS_URL = 'wss://live-turing.cn.llm.tcljd.com/realtime'
API_KEY = os.environ.get('TURING_API_KEY')

async def main():
    async with websockets.connect(WSS_URL, extra_headers={ 'Authorization': f'Bearer {API_KEY}' }) as ws:
        await ws.send(json.dumps({ 'type': 'session.create', 'model': 'turing/gpt-realtime', 'session': { 'id': 'py-1' } }))
        await ws.send(json.dumps({ 'type': 'input', 'session_id': 'py-1', 'input': { 'text': '请生成一段产品简介，30 字左右。' } }))
        async for message in ws:
            msg = json.loads(message)
            if msg.get('type') == 'output.token':
                print(msg['token'], end='')
            if msg.get('type') == 'output.complete':
                print('\n-- done --')
                return

asyncio.run(main())

常用参数说明

model: 模型名（turing/gpt-realtime / turing/gpt-realtime-mini）
max_tokens: 最大生成长度（按 token 计）
temperature: 生成温度（0-2）
top_p: nucleus 采样阈值
stream: 是否启用流式输出（HTTP）
session.metadata: 会话级元信息（用户 ID、对话话题等）

示例：

{
  "model": "turing/gpt-realtime",
  "input": { "text": "写一封面向客户的新版发布通知。" },
  "max_tokens": 300,
  "temperature": 0.2
}

流式事件与断线重连

推荐使用心跳（heartbeat）事件保持连接活跃，服务端也会发送 session.keepalive 或类似事件。
发生短暂网络断连时，客户端应保存会话 ID 与最后的位点（如 seq），并在重连后尝试用 session.resume 恢复会话上下文（若服务端支持）。

性能与费用建议

如果延迟与并发为首要需求，优先选择 turing/gpt-realtime-mini 并结合更短的上下文窗口。
在需要高质量生成与复杂推理时，选择 turing/gpt-realtime 并适当提高 max_tokens 与 temperature 调优输出质量。

常见问题与调试

无法连接：检查 WSS 地址、网络、防火墙与 Authorization header；浏览器场景需通过后端代理。
收到 error 事件：查看 error.code 与 error.message，常见为鉴权/配额/超时错误。
输出不稳定或半截：确认客户端是否正确处理 output.token 与 output.complete 事件；HTTP chunking 时确保按 chunk 拼接。

示例 — 多轮对话管理要点

将用户与助手的历史作为会话元数据上传，或让服务端维护会话状态，避免每次传入完整上下文造成带宽与延迟问题。
对超长历史，考虑只保留摘要或重要轮次以节省 token 成本。

更新记录

2025-10-14: 添加 turing/gpt-realtime 与 turing/gpt-realtime-mini 的使用示例与最佳实践。

概述​

认证​

连接方式选择​

消息协议（WebSocket）​

Node.js（WebSocket）示例​

浏览器端（通过后端代理）​

Python（WebSocket）示例​

常用参数说明​

流式事件与断线重连​

性能与费用建议​

常见问题与调试​

示例 — 多轮对话管理要点​

更新记录​

概述

认证