文档转 Markdown / Document to Markdown

把 PDF、图片、Office 文档解析为 Markdown，底层基于 Azure Document Intelligence 的 prebuilt-layout 模型，能识别标题层级、段落、列表、表格与图片。

文件通过 multipart/form-data 上传，转换结果以 Markdown 内联返回。适合 RAG 预处理、知识库导入、文档内容抽取等「转完即用」的场景。

支持的格式

类别	扩展名
PDF	`pdf`
图片	`jpeg` / `jpg`、`png`、`bmp`、`tiff` / `tif`、`heif`
Office	`docx`、`xlsx`、`pptx`

上传不支持的扩展名（如 .txt、.md）会被拒绝。

图片抽取

文档中的图片会被 Azure 抽取出来，以 base64 内联在 figures[] 中返回。每张图片在 markdown 正文里有一个 placeholder 占位符（形如 ![figure 1.2](figure://1.2)），与 figures[] 中的对象 1:1 对应。你可以在渲染时把占位符替换为真实图片（用 data_base64 还原），或直接忽略占位符只保留文字。

鉴权

与平台其他接口一致，使用 Authorization: Bearer $TURING_API_KEY。注意 multipart/form-data 上传时不要手动设置 Content-Type——交给 HTTP 客户端自动生成带 boundary 的请求头。

快速开始

cURL
Python
Node.js

curl $TURING_BASE_URL/documents/azure/document2md \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "files=@report.pdf" \
  -F "files=@slides.pptx"

import os
import requests

resp = requests.post(
    "https://live-turing.cn.llm.tcljd.com/api/v1/documents/azure/document2md",
    headers={"Authorization": f"Bearer {os.environ['TURING_API_KEY']}"},
    files=[
        ("files", ("report.pdf", open("report.pdf", "rb"), "application/pdf")),
        (
            "files",
            (
                "slides.pptx",
                open("slides.pptx", "rb"),
                "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            ),
        ),
    ],
)

data = resp.json()["data"]
for result in data["results"]:
    print(result["file_name"], result["status"])
    print(result["markdown"])

import fs from "node:fs";

const form = new FormData();
form.append(
  "files",
  new Blob([fs.readFileSync("report.pdf")], { type: "application/pdf" }),
  "report.pdf",
);

const resp = await fetch(
  "https://live-turing.cn.llm.tcljd.com/api/v1/documents/azure/document2md",
  {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.TURING_API_KEY}` },
    body: form,
  },
);

const { data } = await resp.json();
for (const result of data.results) {
  console.log(result.file_name, result.status);
}

响应示例

{
  "trace_id": "trace-xxxxx",
  "code": 0,
  "at": "2026-06-08T08:00:00.000Z",
  "message": "Success",
  "data": {
    "results": [
      {
        "file_name": "report.md",
        "status": "succeeded",
        "markdown": "# 季度报告\n\n## 概述\n\n...\n\n![figure 1.2](figure://1.2)\n",
        "pages_processed": 8,
        "figures": [
          {
            "id": "1.2",
            "placeholder": "![figure 1.2](figure://1.2)",
            "content_type": "image/png",
            "data_base64": "iVBORw0KGgoAAAANSUhEUgAA..."
          }
        ]
      }
    ],
    "total_pages_processed": 8
  }
}

支持的格式​

图片抽取​

鉴权​

快速开始​

响应示例​

See also​

支持的格式

图片抽取

鉴权

快速开始

响应示例

See also