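Speech-to-Text

Transcribe audio from a URL into text, with support for multiple languages, speaker diarization, proper-noun recognition, and audio/video format conversion. The Turing platform uses an asynchronous task model: you first create a task and receive a task_run_id, then poll for the task result.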

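Why asynchronous?

STT tasks take longer than chat responses (long audio can take tens of seconds), and a synchronous blocking call would hold connections open and amplify load on the client. The asynchronous create-then-poll model lets you:

Avoid HTTP connection timeout limits
Run multiple tasks concurrently in batch scenarios
Transcribe long audio in the background while the front end does other work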

Basic usage

The /audio/transcriptions/runs endpoint accepts a JSON request body. Pass the audio source in the file field: set file.type to public_uri and file.file_uri to a publicly accessible audio URL.

1. Create a task

curl "$TURING_BASE_URL/audio/transcriptions/runs" \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "service_name": "aliyun/tingwu",
    "language_code": "cn",
    "file": {
      "type": "public_uri",
      "file_uri": "https://example.com/speech.mp3"
    },
    "diarization": {
      "enabled": true,
      "speaker_number": 2
    },
    "turing_options": {
      "include_service_usages": true
    }
  }'

Response:

{
  "code": 0,
  "message": "Success",
  "data": {
    "task_run_id": "run_xxx",
    "task_run_state": "running"
  }
}
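
A small helper can pull task_run_id out of the response envelope shown above. This is only a sketch: the function name extract_task_run_id is hypothetical, and treating any non-zero code as an error is an assumption inferred from the "code": 0 / "message": "Success" pair in the sample, not something the response format is documented to guarantee.

```python
def extract_task_run_id(envelope: dict) -> str:
    """Return data.task_run_id from a create-task response envelope."""
    # Assumption: a non-zero "code" signals an error.
    if envelope.get("code") != 0:
        raise RuntimeError(f"create task failed: {envelope.get('message')}")
    return envelope["data"]["task_run_id"]


# The sample response from this section.
sample = {
    "code": 0,
    "message": "Success",
    "data": {"task_run_id": "run_xxx", "task_run_state": "running"},
}
print(extract_task_run_id(sample))
```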

2. Poll for status

curl "$TURING_BASE_URL/audio/transcriptions/runs/run_xxx" \
  -H "Authorization: Bearer $TURING_API_KEY"

The task state moves from running to either completed or failed.

On success:

{
  "code": 0,
  "message": "Success",
  "data": {
    "task_run_id": "run_xxx",
    "task_run_state": "completed",
    "transcript": "Full transcript text...",
    "segments": [
      {
        "start_time": 0,
        "end_time": 3200,
        "text": "First segment",
        "words": []
      }
    ],
    "audio_info": {
      "duration_ms": 42300
    }
  }
}
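
Once a task completes, the segments array can be turned into timestamped lines. The sketch below assumes start_time and end_time are millisecond offsets, which matches the duration_ms field in the sample but is not stated explicitly; the helper names format_timestamp and format_segments are hypothetical.

```python
def format_timestamp(ms: int) -> str:
    """Render a millisecond offset as MM:SS.mmm."""
    minutes, rest = divmod(ms, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{minutes:02d}:{seconds:02d}.{millis:03d}"


def format_segments(data: dict) -> list[str]:
    """Turn data["segments"] into "[start - end] text" lines."""
    # Assumption: start_time / end_time are in milliseconds.
    return [
        f"[{format_timestamp(seg['start_time'])} - "
        f"{format_timestamp(seg['end_time'])}] {seg['text']}"
        for seg in data.get("segments", [])
    ]


# The "data" object from the sample response, trimmed to the fields used here.
data = {
    "segments": [
        {"start_time": 0, "end_time": 3200, "text": "First segment", "words": []}
    ],
    "audio_info": {"duration_ms": 42300},
}
for line in format_segments(data):
    print(line)  # [00:00.000 - 00:03.200] First segment
```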

Python example

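import os
import time

import httpx

BASE = "https://live-turing.cn.llm.tcljd.com/api/v1"
# Read the key from the same environment variable the curl examples use.
API_KEY = os.environ["TURING_API_KEY"]
headers = {"Authorization": f"Bearer {API_KEY}"}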


def transcribe(audio_url: str) -> dict:
    create_resp = httpx.post(
        f"{BASE}/audio/transcriptions/runs",
        headers={**headers, "Content-Type": "application/json"},
        json={
            "service_name": "aliyun/tingwu",
            "language_code": "cn",
            "file": {
                "type": "public_uri",
                "file_uri": audio_url,
            },
            "diarization": {
                "enabled": True,
                "speaker_number": 2,
            },
            "turing_options": {
                "include_service_usages": True,
            },
        },
        timeout=30,
    )
    create_resp.raise_for_status()
    task_id = create_resp.json()["data"]["task_run_id"]

    # Poll every 2 seconds until the task reaches a terminal state.
    while True:
        status_resp = httpx.get(f"{BASE}/audio/transcriptions/runs/{task_id}", headers=headers, timeout=30)
        status_resp.raise_for_status()
        body = status_resp.json()["data"]
        if body["task_run_state"] in {"completed", "failed"}:
            return body  # the caller should check task_run_state for "failed"
        time.sleep(2)

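Parameters

Parameter	Purpose	Value
`service_name`	Transcription service	Set to `aliyun/tingwu`
`language_code`	Language of the audio	For example `cn` / `en` / `ja`
`file.type`	Audio source type	Set to `public_uri`
`file.file_uri`	Public URL of the audio	`https://example.com/speech.mp3`
`diarization.enabled`	Whether to enable speaker diarization	`true` / `false`
`diarization.speaker_number`	Number of speakers	Positive integer, optional
`phrase_set`	Proper-noun recognition boost	Names of people, products, brands, and other terms that need emphasis
`extra`	Advanced configuration	For example, audio/video format conversion settings
`turing_options.include_service_usages`	Whether to return billing details	`true` / `false`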
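
To show how these parameters fit together, here is a minimal sketch of a request-body builder. The function build_transcription_request is hypothetical. The value shape of phrase_set is not specified in this section, so the sketch passes it through untouched rather than guessing at a format.

```python
from typing import Any, Optional


def build_transcription_request(
    audio_url: str,
    language_code: str = "cn",
    diarization: bool = False,
    speaker_number: Optional[int] = None,
    phrase_set: Any = None,
    extra: Optional[dict] = None,
    include_service_usages: bool = False,
) -> dict:
    """Assemble the JSON body for POST /audio/transcriptions/runs."""
    body: dict = {
        "service_name": "aliyun/tingwu",
        "language_code": language_code,
        "file": {"type": "public_uri", "file_uri": audio_url},
        "diarization": {"enabled": diarization},
        "turing_options": {"include_service_usages": include_service_usages},
    }
    # speaker_number is optional, so include it only when given.
    if speaker_number is not None:
        body["diarization"]["speaker_number"] = speaker_number
    # phrase_set and extra are passed through as-is; their exact shapes
    # are not assumed here.
    if phrase_set is not None:
        body["phrase_set"] = phrase_set
    if extra is not None:
        body["extra"] = extra
    return body


request_body = build_transcription_request(
    "https://example.com/speech.mp3", diarization=True, speaker_number=2
)
```

The resulting dict can be passed directly as the json= argument of httpx.post.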

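Audio/video format conversion example

To convert the audio or video to a specific format before transcription, pass a format conversion configuration in extra: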

{
  "extra": {
    "Parameters": {
      "Transcoding": {
        "TargetAudioFormat": "mp3",
        "TargetVideoFormat": "mp4"
      }
    }
  }
}

Supported formats

mp3, mp4, mpeg, mpga, m4a, wav, webm.
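
A client-side pre-check against this list can catch obviously unsupported files before a task is created. The sketch below only looks at the file extension in the URL path; whether the service itself decides by extension or by file content is not stated here, so treat this as a convenience filter. The name has_supported_extension is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# The formats listed in this section.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def has_supported_extension(file_uri: str) -> bool:
    """Return True if the URL path ends in one of the listed extensions."""
    path = urlparse(file_uri).path  # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_EXTENSIONS


print(has_supported_extension("https://example.com/speech.mp3"))
```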

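Billing

Billed by audio duration, at per-second granularity; unit prices differ slightly between Chinese and multilingual audio. See Billing and Usage for details.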

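FAQ

How should I pass the model parameter? Use service_name to select the transcription service; for audio transcription, pass aliyun/tingwu.
Task stays in running → The transcription service is still processing; for large batches, poll with exponential backoff.
Proper nouns are not recognized accurately → Use phrase_set to supply names of people, products, brands, and similar terms to improve recognition.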
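
The exponential backoff suggested for long-running tasks could look roughly like the sketch below. The helper poll_with_backoff and all of its defaults (1 s initial delay, doubling, 30 s cap, 10 min overall timeout) are illustrative assumptions rather than platform recommendations. It takes a fetch_state callable so the HTTP call stays separate from the retry logic.

```python
import time
from typing import Callable


def poll_with_backoff(
    fetch_state: Callable[[], dict],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    factor: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_state() until task_run_state is terminal, waiting longer each time."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        body = fetch_state()
        if body["task_run_state"] in {"completed", "failed"}:
            return body
        # Give up instead of sleeping past the overall deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("task still running after timeout")
        sleep(delay)
        # Grow the wait geometrically, capped at max_delay.
        delay = min(delay * factor, max_delay)
```

In practice, fetch_state would wrap the GET request to /audio/transcriptions/runs/{task_run_id} and return the data object from the response.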
