
Speech-to-Text

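Transcribe audio from a URL into text, with support for multiple languages, speaker diarization, proper-noun recognition, and audio/video format conversion. The Turing platform uses an asynchronous task model: create a task to obtain a task_run_id, then poll for the task result.

For the full service entry point and billing details, see Audio Models → Speech-to-Text / ASR.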

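Why asynchronous?

STT tasks take longer than chat responses (long audio may need tens of seconds), and blocking synchronously on them increases pressure on the client. With the asynchronous create-and-poll model, you:

  • are not bound by HTTP connection timeouts
  • can run multiple tasks concurrently in batch scenarios
  • can transcribe long audio in the background while the front end does other work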
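
The batch point above can be sketched with a thread pool. This is a minimal sketch under assumptions: `transcribe_many` and `max_workers` are hypothetical names, and `transcribe_one` stands for any blocking function of your own that creates a task and polls it to completion; it is not something the platform provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def transcribe_many(
    audio_urls: list[str],
    transcribe_one: Callable[[str], dict],
    max_workers: int = 4,
) -> dict[str, dict]:
    """Run one blocking transcription call per URL in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with audio_urls.
        results = list(pool.map(transcribe_one, audio_urls))
    return dict(zip(audio_urls, results))
```

If any call raises, `pool.map` re-raises that exception while the results are being collected, so wrap `transcribe_one` yourself if partial results matter.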

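Basic usage

The /audio/transcriptions/runs endpoint accepts a JSON request body. The audio source is passed in the file field: set file.type to public_uri and file.file_uri to a publicly accessible audio URL.

1. Create a task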

curl "$TURING_BASE_URL/audio/transcriptions/runs" \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "service_name": "aliyun/tingwu",
    "language_code": "cn",
    "file": {
      "type": "public_uri",
      "file_uri": "https://example.com/speech.mp3"
    },
    "diarization": {
      "enabled": true,
      "speaker_number": 2
    },
    "turing_options": {
      "include_service_usages": true
    }
  }'

Response:

{
  "code": 0,
  "message": "Success",
  "data": {
    "task_run_id": "run_xxx",
    "task_run_state": "running"
  }
}

2. Poll for status

curl "$TURING_BASE_URL/audio/transcriptions/runs/run_xxx" \
  -H "Authorization: Bearer $TURING_API_KEY"

The task state moves from running to either completed or failed.

On success:

{
  "code": 0,
  "message": "Success",
  "data": {
    "task_run_id": "run_xxx",
    "task_run_state": "completed",
    "transcript": "Full transcript text...",
    "segments": [
      {
        "start_time": 0,
        "end_time": 3200,
        "text": "First segment",
        "words": []
      }
    ],
    "audio_info": {
      "duration_ms": 42300
    }
  }
}
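
To show how the `data` object of a completed task might be consumed, here is a minimal sketch that renders each segment as a timestamped line. It assumes `start_time` and `end_time` are millisecond offsets, which matches the `duration_ms` field in the sample but is not stated explicitly; `format_timestamp` and `render_segments` are hypothetical helper names.

```python
def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def render_segments(data: dict) -> list[str]:
    """Turn the `segments` array into one timestamped line per segment."""
    lines = []
    for seg in data.get("segments", []):
        # Assumption: start_time / end_time are in milliseconds.
        start = format_timestamp(seg["start_time"])
        end = format_timestamp(seg["end_time"])
        lines.append(f"[{start} - {end}] {seg['text']}")
    return lines
```

Pass in the `data` object from the polling response, not the outer envelope that carries `code` and `message`.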

Python example

import os
import time

import httpx

BASE = "https://live-turing.cn.llm.tcljd.com/api/v1"
# Read the key from the same environment variable the curl examples use.
API_KEY = os.environ["TURING_API_KEY"]
headers = {"Authorization": f"Bearer {API_KEY}"}


def transcribe(audio_url: str) -> dict:
    # Step 1: create the transcription task.
    create_resp = httpx.post(
        f"{BASE}/audio/transcriptions/runs",
        headers={**headers, "Content-Type": "application/json"},
        json={
            "service_name": "aliyun/tingwu",
            "language_code": "cn",
            "file": {
                "type": "public_uri",
                "file_uri": audio_url,
            },
            "diarization": {
                "enabled": True,
                "speaker_number": 2,
            },
            "turing_options": {
                "include_service_usages": True,
            },
        },
        timeout=30,
    )
    create_resp.raise_for_status()
    task_id = create_resp.json()["data"]["task_run_id"]

    # Step 2: poll every 2 seconds until the task reaches a terminal state.
    while True:
        status_resp = httpx.get(
            f"{BASE}/audio/transcriptions/runs/{task_id}",
            headers=headers,
            timeout=30,
        )
        status_resp.raise_for_status()
        body = status_resp.json()["data"]
        if body["task_run_state"] in {"completed", "failed"}:
            return body
        time.sleep(2)
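
The polling loop in transcribe has no upper bound, so a task that never leaves running would block the caller forever. One possible way to bound it is sketched below. `wait_for_task`, `fetch_state`, `max_wait_s`, and `interval_s` are hypothetical names, and the default values are illustrative rather than platform recommendations.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_state: Callable[[], dict],
    max_wait_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task is completed or failed, or raise after max_wait_s."""
    deadline = time.monotonic() + max_wait_s
    while True:
        # fetch_state returns the `data` object of one status response.
        data = fetch_state()
        if data["task_run_state"] in {"completed", "failed"}:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task still {data['task_run_state']} after {max_wait_s}s"
            )
        sleep(interval_s)
```

Here `fetch_state` would wrap the GET request from the example above and return its `data` field. Keeping the HTTP call outside the helper also makes the loop easy to test without a network.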

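Parameters

| Parameter | Purpose | Value |
|---|---|---|
| service_name | Transcription service | Set to aliyun/tingwu |
| language_code | Language of the audio | cn / en / ja |
| file.type | Audio source type | Set to public_uri |
| file.file_uri | Public URL of the audio | https://example.com/speech.mp3 |
| diarization.enabled | Whether to enable speaker diarization | true / false |
| diarization.speaker_number | Number of speakers | Positive integer, optional |
| phrase_set | Proper-noun recognition boost | Names of people, products, brands, and other terms that need emphasis |
| extra | Advanced configuration | For example, audio/video format conversion settings |
| turing_options.include_service_usages | Whether to return billing details | true / false |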
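
The parameters above can be assembled and checked on the client before a task is created. The following is a minimal sketch, not an official client: `build_request` and `ALLOWED_LANGUAGES` are hypothetical names, the checks only mirror the values listed in the table, and `phrase_set` and `extra` are left out because their exact shapes are not specified here.

```python
from typing import Optional

ALLOWED_LANGUAGES = {"cn", "en", "ja"}  # values listed in the parameter table


def build_request(
    audio_url: str,
    language_code: str = "cn",
    speaker_number: Optional[int] = None,
    include_usages: bool = False,
) -> dict:
    """Assemble a request body from the documented parameters."""
    if language_code not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language_code: {language_code!r}")
    body = {
        "service_name": "aliyun/tingwu",
        "language_code": language_code,
        "file": {"type": "public_uri", "file_uri": audio_url},
        "turing_options": {"include_service_usages": include_usages},
    }
    # Simplification for this sketch: diarization is enabled only when a
    # speaker count is supplied.
    if speaker_number is not None:
        if not isinstance(speaker_number, int) or speaker_number < 1:
            raise ValueError("speaker_number must be a positive integer")
        body["diarization"] = {"enabled": True, "speaker_number": speaker_number}
    return body
```

The returned dict can be passed as the JSON body of the create-task request.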

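Audio/video format conversion example

To convert the audio or video to a specific format before transcription, pass a format conversion configuration in extra: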

{
  "extra": {
    "Parameters": {
      "Transcoding": {
        "TargetAudioFormat": "mp3",
        "TargetVideoFormat": "mp4"
      }
    }
  }
}

Supported formats

mp3, mp4, mpeg, mpga, m4a, wav, webm
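
A quick client-side pre-check against this list can catch obviously unsupported files before a task is created. This is a sketch under a loud assumption: it only inspects the file extension in the URL path, which may not reflect the actual container format and is not necessarily how the service decides. `has_supported_extension` and `SUPPORTED_FORMATS` are hypothetical names.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def has_supported_extension(audio_url: str) -> bool:
    """Return True if the URL path ends in one of the listed formats."""
    # Ignore the query string and fragment; only the path carries the name.
    path = urlparse(audio_url).path
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_FORMATS
```

A URL without an extension returns False here, so treat the result as a hint rather than a hard gate.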

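Billing

Billed by audio duration, at per-second granularity; unit prices differ slightly between Chinese and multilingual audio. See Billing and Usage for details.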

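FAQ

  • How should the model parameter be passed? Use service_name to select the transcription service; for audio transcription, pass aliyun/tingwu.
  • Task stays in running → The transcription service is still processing; for high-volume workloads, poll with exponential backoff.
  • Proper nouns are recognized inaccurately → Use phrase_set to supply names of people, products, brands, and similar terms to improve recognition.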
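
The exponential backoff suggested for a task that stays in running could look like the sketch below. `backoff_delays` is a hypothetical helper, and the 1 s start, factor of 2, and 30 s cap are illustrative values, not platform recommendations.

```python
from typing import Iterator


def backoff_delays(
    initial_s: float = 1.0,
    factor: float = 2.0,
    max_delay_s: float = 30.0,
) -> Iterator[float]:
    """Yield an endless sequence of delays that grows geometrically up to a cap."""
    delay = initial_s
    while True:
        yield min(delay, max_delay_s)
        delay = min(delay * factor, max_delay_s)
```

A polling loop would take the next delay from this generator after each non-terminal status check and sleep for that long. With many tasks in flight, adding a small random jitter to each delay helps avoid synchronized bursts of requests.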

See also