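Image Generation

This guide explains how to generate and edit images through the /v1/images/generations and /v1/images/edits endpoints. It covers:

Doubao Seedream (/v1/images/generations, OpenAI-compatible; /v1/images/edits is not yet supported)
OpenAI GPT-image-2 / GPT-image-1 (via Azure; /v1/images/generations for text-to-image and /v1/images/edits for image-to-image, with native transparent-background support)
Gemini Nano Banana (served through /v1/chat/completions; supports multi-image editing and streaming); see the Gemini Nano Banana section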

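Environment setup

1. Install the latest OpenAI Python SDK

pip install --upgrade openai

2. Make sure the base64 module is available

It ships with the Python 3 standard library, so no extra installation is needed.

Example 1: Generate an image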

from openai import OpenAI
import base64

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

img = client.images.generate(
    prompt="a futuristic city with flying cars and neon lights",  # image description
    model="doubao-seedream-4-5-251128",  # model to use
    n=1,
    size="1024x1024",
)

# Decode the base64 payload before writing it to disk.
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)

print("Image generated and saved as output.png")

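Example 2: Edit with multiple reference images

Supply several reference images plus a text description; the model returns one composite image.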

import base64
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

prompt = """
Generate a photorealistic image of a gift basket on a white background
labeled 'Relax & Unwind' with a ribbon and handwriting-like font,
containing all the items in the reference pictures.
"""

result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=[
        open("body-lotion.png", "rb"),
        open("bath-bomb.png", "rb"),
        open("incense-kit.png", "rb"),
        open("soap.png", "rb"),
    ],
    prompt=prompt,
    timeout=120,  # complex generations can be slow, so extend the timeout
)

image_bytes = base64.b64decode(result.data[0].b64_json)
with open("gift-basket.png", "wb") as f:
    f.write(image_bytes)

print("Image edited and saved as gift-basket.png")

Example 3: Edit a single image

import base64
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=open("body-lotion.png", "rb"),
    prompt="Change the color of the bottle to blue",
    timeout=120,
)

image_bytes = base64.b64decode(result.data[0].b64_json)
with open("blue-bottle.png", "wb") as f:
    f.write(image_bytes)

print("Image edited and saved as blue-bottle.png")

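Seedream batch generation

For Seedream, the platform automatically converts the n parameter into Seedream's batch-generation parameters:

# With n=3 the platform automatically adds:
# "sequential_image_generation": "auto"
# "sequential_image_generation_options": {"max_images": 3}
client.images.generate(
    model="doubao-seedream-4-5-251128",
    prompt="...",
    n=3,
)

sequential_image_generation accepts two values:

"auto": the model decides whether to return multiple images and how many, so n acts as an upper bound (max_images) rather than a guaranteed count
"disabled": only one image is generated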

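The conversion described above can be pictured as a small function. This is only a hypothetical sketch of the platform-side mapping, not its actual implementation: the name seedream_batch_params is invented for illustration, and treating n=1 as "disabled" is an assumption that the text does not state.

```python
def seedream_batch_params(n: int) -> dict:
    """Hypothetical sketch: map an OpenAI-style `n` to Seedream batch fields."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # Assumption: a single image does not need batch mode.
        return {"sequential_image_generation": "disabled"}
    return {
        "sequential_image_generation": "auto",
        # n becomes the upper bound; in "auto" mode the model may return fewer.
        "sequential_image_generation_options": {"max_images": n},
    }
```

Because the model decides the final count in "auto" mode, callers should probably iterate over every entry in the response's data list instead of indexing a fixed position.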
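Gemini Nano Banana

Image-generation models in the Gemini family use the standard /chat/completions endpoint (not /images/generations). Output is controlled through modalities: ["text", "image"] together with imageConfig.

Supported models

The default RPH (request rate limit) is 60. For available models, pricing, and specifications, refer to Image model list → Gemini Nano Banana.

Basic example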

curl -N $TURING_BASE_URL/chat/completions \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gemini-3-pro-image",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Generate a Banana with saying hello"}
    ],
    "modalities": ["text", "image"],
    "imageConfig": {
      "aspectRatio": "1:1",
      "imageSize": "1K",
      "imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
    }
  }'

Three input forms

Plain text

{
  "model": "$model",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Generate a Banana with saying hello"}
  ]
}

Multi-turn with images (image editing)

{
  "model": "$model",
  "stream": true,
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Please draw me a dog"}]},
    {"role": "assistant", "content": [
      {"type": "text", "text": "Here you go!"},
      {"type": "image_url", "image_url": {"url": "{image_url}", "detail": "low"}}
    ]},
    {"role": "user", "content": [{"type": "text", "text": "Add a hat on the dog"}]}
  ]
}

Controlling aspect ratio and resolution

{
  "model": "$model",
  "stream": false,
  "messages": [{"role": "user", "content": "Please draw a cute dog"}],
  "imageConfig": {
    "aspectRatio": "16:9",
    "imageSize": "1K",
    "imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
  }
}
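The request bodies above share one shape, so they can be assembled by a small helper. The following is a minimal sketch that only builds the JSON body; the function name build_image_chat_request and its defaults are illustrative assumptions, while the field names are taken from the examples above.

```python
import json


def build_image_chat_request(model, prompt, aspect_ratio="1:1",
                             image_size="1K", stream=False):
    """Assemble a /chat/completions body for a Gemini image model."""
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
        # Ask for both text and image output, as in the basic example.
        "modalities": ["text", "image"],
        "imageConfig": {
            "aspectRatio": aspect_ratio,
            "imageSize": image_size,
            "imageOutputOptions": {
                "mimeType": "image/jpeg",
                "compressionQuality": 95,
            },
        },
    }


body = build_image_chat_request(
    "turing/gemini-3-pro-image", "Please draw a cute dog", aspect_ratio="16:9"
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the -d payload of the curl command shown in the basic example.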

Supported aspect ratios and resolutions

Gemini 2.5 Flash Image

Aspect ratio	Resolution	Tokens
1:1	1024x1024	1290
2:3	832x1248	1290
3:2	1248x832	1290
3:4	864x1184	1290
4:3	1184x864	1290
4:5	896x1152	1290
5:4	1152x896	1290
9:16	768x1344	1290
16:9	1344x768	1290
21:9	1536x672	1290

Gemini 3 Pro Image / Gemini 3.1 Flash Image

Aspect ratio	1K resolution	1K Tokens	2K resolution	2K Tokens	4K resolution	4K Tokens
1:1	1024x1024	1210	2048x2048	1210	4096x4096	2000
2:3	848x1264	1210	1696x2528	1210	3392x5056	2000
3:2	1264x848	1210	2528x1696	1210	5056x3392	2000
3:4	896x1200	1210	1792x2400	1210	3584x4800	2000
4:3	1200x896	1210	2400x1792	1210	4800x3584	2000
4:5	928x1152	1210	1856x2304	1210	3712x4608	2000
5:4	1152x928	1210	2304x1856	1210	4608x3712	2000
9:16	768x1376	1210	1536x2752	1210	3072x5504	2000
16:9	1376x768	1210	2752x1536	1210	5504x3072	2000
21:9	1584x672	1210	3168x1344	1210	6336x2688	2000
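In the Gemini 3 table, every 2K and 4K resolution is exactly 2x and 4x the 1K resolution in the same row, so the output size can be derived from the 1K column alone. The lookup below is a sketch built from that table; the names BASE_1K and output_resolution are illustrative, not part of the API.

```python
# 1K resolutions copied from the Gemini 3 Pro Image / 3.1 Flash Image table.
BASE_1K = {
    "1:1": (1024, 1024),
    "2:3": (848, 1264),
    "3:2": (1264, 848),
    "3:4": (896, 1200),
    "4:3": (1200, 896),
    "4:5": (928, 1152),
    "5:4": (1152, 928),
    "9:16": (768, 1376),
    "16:9": (1376, 768),
    "21:9": (1584, 672),
}
SCALE = {"1K": 1, "2K": 2, "4K": 4}


def output_resolution(aspect_ratio: str, image_size: str = "1K") -> tuple:
    """Return (width, height) for an aspectRatio / imageSize pair."""
    width, height = BASE_1K[aspect_ratio]
    factor = SCALE[image_size]
    return width * factor, height * factor


print(output_resolution("16:9", "4K"))  # (5504, 3072), matching the table
```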

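Response format

Streaming (gemini-2.5-flash-image): the image is returned as an image_url block inside content.

Streaming, Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is returned in a separate images field, and content is an empty array.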

{
  "choices": [{
    "delta": {
      "content": [],
      "images": [{"type": "image_url", "image_url": {"url": "{Base64 string}"}}]
    }
  }]
}

Non-streaming (gemini-2.5-flash-image):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Here's your dog!"},
        {"type": "image_url", "image_url": {"url": "{generated_image_url}", "detail": "low"}}
      ]
    },
    "finish_reason": "stop"
  }]
}

Non-streaming, Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is in the choices[*].images field.
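Because the image location differs between model generations, client code may want one extractor that tolerates every layout described above. The sketch below works on already-parsed JSON (a full response or a single streamed chunk). The function name extract_image_urls is illustrative, and checking images both on the choice and inside message/delta is a defensive assumption.

```python
def extract_image_urls(payload: dict) -> list:
    """Collect image URLs / base64 strings from a response or stream chunk."""
    urls = []
    for choice in payload.get("choices", []):
        body = choice.get("delta") or choice.get("message") or {}
        content = body.get("content")
        # content may be a plain string (text only) or a list of blocks.
        blocks = list(content) if isinstance(content, list) else []
        blocks += body.get("images") or []    # Beta layout inside delta/message
        blocks += choice.get("images") or []  # Beta layout at choices[*].images
        for block in blocks:
            if isinstance(block, dict) and block.get("type") == "image_url":
                urls.append(block["image_url"]["url"])
    return urls


chunk = {"choices": [{"delta": {"content": [], "images": [
    {"type": "image_url", "image_url": {"url": "BASE64DATA"}}]}}]}
print(extract_image_urls(chunk))  # ['BASE64DATA']
```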

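Capabilities

Image generation: generates images from a prompt
Streaming output: supports real-time streaming responses
Image editing: pass earlier images back in a multi-turn conversation to modify an existing image incrementally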

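GPT-image-2

turing/gpt-image-2 is the latest-generation image model that OpenAI offers through Azure (public preview). Compared with gpt-image-1, it offers:

Arbitrary resolutions (up to 4K; long edge ≤ 3840 px; aspect ratio ≤ 3:1)
Reworked quality control (low is optimized for latency)
Native transparent backgrounds

It supports two endpoints:

POST /v1/images/generations (JSON body): text-to-image
POST /v1/images/edits (multipart/form-data): image-to-image / image editing

Parameter overview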

Parameter	Type	Required	Default	Description
`model`	string	Yes	-	Always `turing/gpt-image-2`
`prompt`	string	Yes	-	Text description, up to 32000 characters
`n`	int	No	`1`	Number of images returned per request, `1`-`10`
`size`	string	No	`"auto"`	`"auto"` or `<w>x<h>`: both sides must be multiples of 16; long edge ≤ 3840; aspect ratio ≤ 3:1; total pixels between 655,360 and 8,294,400
`quality`	string	No	`"high"`	`"low"` / `"medium"` / `"high"`; `low` is optimized for latency
`output_format`	string	No	`"png"`	`"png"` / `"jpeg"` (Azure does not yet support `webp`)
`output_compression`	int	No	`100`	`0`-`100`; applies only to `jpeg`
`background`	string	No	`"auto"`	`"transparent"` / `"opaque"` / `"auto"`; `transparent` requires `output_format="png"`
`moderation`	string	No	`"auto"`	`"auto"` / `"low"`; `low` applies more lenient content moderation
`user`	string	No	-	End-user identifier, useful for auditing

Unsupported parameters

response_format: GPT-image models always return base64 (b64_json); url is not supported
style: supported only by dall-e-3
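The rules in the parameter table and the unsupported-parameter list can be checked locally before a request is sent. The following is a minimal sketch; the helper name param_problems and its message wording are illustrative, and the server's own validation may differ.

```python
ALLOWED = {
    "quality": {"low", "medium", "high"},
    "output_format": {"png", "jpeg"},
    "background": {"auto", "transparent", "opaque"},
    "moderation": {"auto", "low"},
}
UNSUPPORTED = {"response_format", "style"}


def param_problems(params: dict) -> list:
    """Return human-readable problems; an empty list means no rule was hit."""
    problems = []
    for name in sorted(UNSUPPORTED.intersection(params)):
        problems.append(f"{name} is not supported by GPT-image models")
    if len(params.get("prompt", "")) > 32000:
        problems.append("prompt exceeds 32000 characters")
    if not 1 <= params.get("n", 1) <= 10:
        problems.append("n must be between 1 and 10")
    for name, allowed in ALLOWED.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    if not 0 <= params.get("output_compression", 100) <= 100:
        problems.append("output_compression must be between 0 and 100")
    # output_format defaults to "png", so only an explicit non-png conflicts.
    if (params.get("background") == "transparent"
            and params.get("output_format", "png") != "png"):
        problems.append("background=transparent requires output_format=png")
    return problems


print(param_problems({"prompt": "a leaf", "background": "transparent",
                      "output_format": "jpeg"}))
```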

Text-to-image

POST directly to /v1/images/generations with a JSON request body.

curl $TURING_BASE_URL/images/generations \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gpt-image-2",
    "prompt": "a close-up of a bear walking through a misty forest at dawn",
    "n": 1,
    "size": "1536x1024",
    "quality": "high",
    "output_format": "png"
  }' \
  | jq -r '.data[0].b64_json' | base64 -d > bear.png

Example response:

{
  "created": 1729753028,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANS..."
    }
  ],
  "usage": {
    "input_tokens": 50,
    "input_tokens_details": { "text_tokens": 50, "image_tokens": 0 },
    "output_tokens": 1568,
    "total_tokens": 1618
  }
}

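4K and arbitrary resolutions

gpt-image-2 is no longer limited to the three sizes 1024x1024 / 1024x1536 / 1536x1024; any custom <w>x<h> is accepted, for example 3840x2160 (4K landscape) or 2160x3840 (4K portrait). Just replace the size field in the request body above with the target size: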

{
  "model": "turing/gpt-image-2",
  "prompt": "cyberpunk Tokyo street, neon reflections on wet asphalt, 4k cinematic",
  "size": "3840x2160",
  "quality": "high"
}

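Constraints (strictly enforced):

width % 16 == 0 and height % 16 == 0
max(width, height) <= 3840
max(w/h, h/w) <= 3
655_360 <= width * height <= 8_294_400

Requests that violate any of these are rejected with a 4xx response, so it is best to validate on the caller side first.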
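A caller-side check for the four constraints above might look like the sketch below. The function name size_error is illustrative; it returns None for an acceptable size and a short reason otherwise.

```python
from typing import Optional


def size_error(size: str) -> Optional[str]:
    """Return None if `size` satisfies the constraints, else a reason."""
    if size == "auto":
        return None
    w_str, sep, h_str = size.partition("x")
    if not sep or not w_str.isdigit() or not h_str.isdigit():
        return "size must be 'auto' or '<w>x<h>'"
    width, height = int(w_str), int(h_str)
    if width % 16 or height % 16:
        return "both sides must be multiples of 16"
    if max(width, height) > 3840:
        return "long edge must be <= 3840"
    # Integer form of max(w/h, h/w) <= 3; avoids division by zero.
    if max(width, height) > 3 * min(width, height):
        return "aspect ratio must be <= 3:1"
    if not 655_360 <= width * height <= 8_294_400:
        return "total pixels must be between 655,360 and 8,294,400"
    return None


for candidate in ("3840x2160", "1000x1000", "3840x1024", "512x512"):
    print(candidate, "->", size_error(candidate))
```

Note that 3840x2160 sits exactly on the upper pixel bound (8,294,400), so it passes.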

Transparent background

When you need a transparent background, set both "background": "transparent" and "output_format": "png" in the request body:

curl $TURING_BASE_URL/images/generations \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gpt-image-2",
    "prompt": "a single red maple leaf, isolated",
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
  }'

Image editing (image-to-image)

Endpoint: POST /v1/images/edits, with a multipart/form-data request body.

Single input image

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Replace the background with a beach sunset scene" \
  -F "image=@input.png" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png

Multiple reference images (the field name changes to image[])

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Generate a gift basket containing all items in the reference images" \
  -F "image[]=@item1.png" \
  -F "image[]=@item2.png" \
  -F "image[]=@item3.png" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png

With a mask (inpainting)

Provide a mask (it must be a PNG; transparent pixels mark the area to edit). The model repaints only the transparent region:

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Add an orange cat in the transparent area" \
  -F "image=@input.png" \
  -F "mask=@mask.png" \
  -F "size=1024x1024" \
  -F "quality=high" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png
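If no mask file is at hand, one with a transparent rectangle can be produced using only the Python standard library. The sketch below writes a minimal RGBA PNG by hand; the function name make_mask_png and the box convention are illustrative. It assumes the mask should have the same dimensions as the input image, which is how OpenAI's edits endpoint behaves but is not stated here for this platform.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_mask_png(width: int, height: int, box: tuple) -> bytes:
    """Opaque black mask with a transparent box (left, top, right, bottom).

    right and bottom are exclusive; pixels inside the box get alpha 0.
    """
    left, top, right, bottom = box
    if not (0 <= left <= right <= width and 0 <= top <= bottom <= height):
        raise ValueError("box must lie inside the image")
    opaque, clear = b"\x00\x00\x00\xff", b"\x00\x00\x00\x00"
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # PNG filter type 0 (none) for every scanline
        if top <= y < bottom:
            rows += opaque * left + clear * (right - left) + opaque * (width - right)
        else:
            rows += opaque * width
    # 8-bit depth, color type 6 (RGBA), default compression/filter, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(bytes(rows))) + _chunk(b"IEND", b""))


with open("mask.png", "wb") as f:
    f.write(make_mask_png(1024, 1024, (256, 256, 768, 768)))
```

The resulting mask.png can be passed as the mask field in the curl command above.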

File limits

Input images may be PNG / JPG / JPEG; the mask must be PNG. Each file must not exceed 50 MB.
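These limits are cheap to check before uploading. The sketch below inspects only the file extension and size, not the actual file contents; the name upload_problem is illustrative, and interpreting 50 MB as 50 * 1024 * 1024 bytes is an assumption.

```python
import os
from typing import Optional

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means binary megabytes
IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def upload_problem(path: str, is_mask: bool = False,
                   max_bytes: int = MAX_BYTES) -> Optional[str]:
    """Return None if the file looks uploadable, else a short reason."""
    ext = os.path.splitext(path)[1].lower()
    allowed = {".png"} if is_mask else IMAGE_EXTS
    if ext not in allowed:
        return f"unsupported file type {ext or '(none)'}"
    if os.path.getsize(path) > max_bytes:
        return f"file is larger than {max_bytes} bytes"
    return None
```

A call such as upload_problem("mask.jpg", is_mask=True) reports the type problem without touching the network.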

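Parameters specific to image editing

Parameter	Type	Description
`image`	file	Input image (single); for multiple images, repeat the `image[]` field instead
`mask`	file	Optional PNG mask; the transparent region marks the area to edit
`input_fidelity`	string	`"low"` / `"high"`; controls how closely the input image is preserved
`output_format`	string	`"png"` / `"jpeg"`
`background`	string	`"auto"` / `"transparent"` (requires `output_format=png`)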

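Errors and timeouts

Scenario	Behavior
Rate limit exceeded	HTTP `429`; retry with exponential backoff
Prompt flagged by content moderation	HTTP 4xx, `error.code = "contentFilter"`
Output image flagged by content moderation	HTTP 4xx; `error.message` reads Generated image was filtered ...
Generation time per request	Typically around 120 seconds; complex 4K + high requests can take 180-240 seconds. A `timeout` of at least 300 s is recommended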
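The exponential-backoff advice for HTTP 429 can be sketched as a generic wrapper. This is one possible shape, not a prescribed implementation: call_with_backoff, its parameters, and the jitter strategy are all illustrative choices.

```python
import random
import time


def call_with_backoff(fn, is_retryable, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying won't fix.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

With the OpenAI Python SDK, is_retryable could be lambda e: isinstance(e, openai.RateLimitError), and fn a lambda wrapping client.images.generate(...). The SDK client also accepts a max_retries option, which may be enough for simple cases. Content-moderation errors should not be retried unchanged, since the same prompt will be filtered again.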

For complete field definitions and a Try-It console, see /api/create-image.
