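Image Generation
This guide explains how to generate and edit images through the /v1/images/generations and /v1/images/edits endpoints. It covers:
- Doubao Seedream (/v1/images/generations, OpenAI-compatible; /v1/images/edits is not yet supported)
- OpenAI GPT-image-2 / GPT-image-1 (Azure; /v1/images/generations for text-to-image plus /v1/images/edits for image-to-image, with native transparent backgrounds)
- Gemini Nano Banana (uses /v1/chat/completions; supports multi-image editing and streaming); see Gemini Nano Banana
Full model list and pricing: image generation model list.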
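Environment setup
1. Install the latest version of the OpenAI Python SDK
pip install --upgrade openai
2. The base64 module
It ships with the Python 3 standard library, so no extra installation is needed.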
Example 1: Generate an image
from openai import OpenAI
import base64
client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)
img = client.images.generate(
    prompt="a futuristic city with flying cars and neon lights",  # image description
    model="doubao-seedream-4-5-251128",  # model to use
    n=1,
    size="1024x1024",
)
# The image comes back base64-encoded; decode it before writing to disk.
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)
print("Image generated and saved as output.png")
Example 2: Edit with multiple reference images
Provide several reference images plus a text description, and the model outputs a single composite image.
import base64
from openai import OpenAI
client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)
prompt = """
Generate a photorealistic image of a gift basket on a white background
labeled 'Relax & Unwind' with a ribbon and handwriting-like font,
containing all the items in the reference pictures.
"""
result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=[
        open("body-lotion.png", "rb"),
        open("bath-bomb.png", "rb"),
        open("incense-kit.png", "rb"),
        open("soap.png", "rb"),
    ],
    prompt=prompt,
    timeout=120,  # complex generations can be slow; consider a longer timeout
)
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("gift-basket.png", "wb") as f:
    f.write(image_bytes)
print("Image edited and saved as gift-basket.png")
Example 3: Edit a single image
import base64
from openai import OpenAI
client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)
result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=open("body-lotion.png", "rb"),
    prompt="Change the color of the bottle to blue",
    timeout=120,
)
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("blue-bottle.png", "wb") as f:
    f.write(image_bytes)
print("Image edited and saved as blue-bottle.png")
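Seedream batch generation
For Seedream, the n parameter is automatically converted into batch-generation parameters:
# With n=3 the platform automatically adds:
# "sequential_image_generation": "auto"
# "sequential_image_generation_options": {"max_images": 3}
client.images.generate(
    model="doubao-seedream-4-5-251128",
    prompt="...",
    n=3,
)
Values of sequential_image_generation:
- "auto": the model decides whether to return multiple images and how many
- "disabled": only one image is generated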
Gemini Nano Banana
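Gemini-family image generation models use the standard /chat/completions endpoint (rather than /images/generations) and control the output through modalities: ["text", "image"] plus imageConfig.
Supported models
The default RPH (request rate limit) is 60. For available models, pricing, and specifications, refer to the image model list → Gemini Nano Banana.
Basic example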
curl -N $TURING_BASE_URL/chat/completions \
-H "Authorization: Bearer $TURING_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "turing/gemini-3-pro-image",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Generate a Banana with saying hello"}
],
"modalities": ["text", "image"],
"imageConfig": {
"aspectRatio": "1:1",
"imageSize": "1K",
"imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
}
}'
Three input forms
Plain text
{
"model": "$model",
"stream": true,
"messages": [
{"role": "user", "content": "Generate a Banana with saying hello"}
]
}
Multi-turn with images (image editing)
{
"model": "$model",
"stream": true,
"messages": [
{"role": "user", "content": [{"type": "text", "text": "Please draw me a dog"}]},
{"role": "assistant", "content": [
{"type": "text", "text": "Here you go!"},
{"type": "image_url", "image_url": {"url": "{image_url}", "detail": "low"}}
]},
{"role": "user", "content": [{"type": "text", "text": "Add a hat on the dog"}]}
]
}
Controlling aspect ratio and resolution
{
"model": "$model",
"stream": false,
"messages": [{"role": "user", "content": "Please draw a cute dog"}],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "1K",
"imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
}
}
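The same request body can be assembled in Python and checked before it is sent. This is a rough sketch, assuming that the aspect ratios listed in this document's tables and the imageSize values 1K / 2K / 4K are the only accepted ones; build_image_chat_request and both constants are hypothetical names, not part of any SDK. Per the tables, 2K and 4K appear only for Gemini 3 Pro Image / Gemini 3.1 Flash Image.

```python
# Values taken from the aspect-ratio tables in this document (assumed exhaustive).
SUPPORTED_ASPECT_RATIOS = {
    "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
}
SUPPORTED_IMAGE_SIZES = {"1K", "2K", "4K"}


def build_image_chat_request(model, prompt, aspect_ratio="1:1", image_size="1K",
                             mime_type="image/jpeg", compression_quality=95,
                             stream=False):
    """Build a /chat/completions body that asks for text plus image output."""
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio!r}")
    if image_size not in SUPPORTED_IMAGE_SIZES:
        raise ValueError(f"unsupported imageSize: {image_size!r}")
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["text", "image"],
        "imageConfig": {
            "aspectRatio": aspect_ratio,
            "imageSize": image_size,
            "imageOutputOptions": {
                "mimeType": mime_type,
                "compressionQuality": compression_quality,
            },
        },
    }
```

The returned dictionary can be serialized with json.dumps and posted to $TURING_BASE_URL/chat/completions with any HTTP client.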
Supported aspect ratios and resolutions
Gemini 2.5 Flash Image
| Aspect ratio | Resolution | Tokens |
|---|---|---|
| 1:1 | 1024x1024 | 1290 |
| 2:3 | 832x1248 | 1290 |
| 3:2 | 1248x832 | 1290 |
| 3:4 | 864x1184 | 1290 |
| 4:3 | 1184x864 | 1290 |
| 4:5 | 896x1152 | 1290 |
| 5:4 | 1152x896 | 1290 |
| 9:16 | 768x1344 | 1290 |
| 16:9 | 1344x768 | 1290 |
| 21:9 | 1536x672 | 1290 |
Gemini 3 Pro Image / Gemini 3.1 Flash Image
| Aspect ratio | 1K resolution | 1K Tokens | 2K resolution | 2K Tokens | 4K resolution | 4K Tokens |
|---|---|---|---|---|---|---|
| 1:1 | 1024x1024 | 1210 | 2048x2048 | 1210 | 4096x4096 | 2000 |
| 2:3 | 848x1264 | 1210 | 1696x2528 | 1210 | 3392x5056 | 2000 |
| 3:2 | 1264x848 | 1210 | 2528x1696 | 1210 | 5056x3392 | 2000 |
| 3:4 | 896x1200 | 1210 | 1792x2400 | 1210 | 3584x4800 | 2000 |
| 4:3 | 1200x896 | 1210 | 2400x1792 | 1210 | 4800x3584 | 2000 |
| 4:5 | 928x1152 | 1210 | 1856x2304 | 1210 | 3712x4608 | 2000 |
| 5:4 | 1152x928 | 1210 | 2304x1856 | 1210 | 4608x3712 | 2000 |
| 9:16 | 768x1376 | 1210 | 1536x2752 | 1210 | 3072x5504 | 2000 |
| 16:9 | 1376x768 | 1210 | 2752x1536 | 1210 | 5504x3072 | 2000 |
| 21:9 | 1584x672 | 1210 | 3168x1344 | 1210 | 6336x2688 | 2000 |
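Response format
Streaming (gemini-2.5-flash-image): the image is returned as an image_url block inside content.
Streaming, Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is returned in a separate images field, and content is an empty array.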
{
"choices": [{
"delta": {
"content": [],
"images": [{"type": "image_url", "image_url": {"url": "{Base64 string}"}}]
}
}]
}
Non-streaming (gemini-2.5-flash-image):
{
"choices": [{
"message": {
"role": "assistant",
"content": [
{"type": "text", "text": "Here's your dog!"},
{"type": "image_url", "image_url": {"url": "{generated_image_url}", "detail": "low"}}
]
},
"finish_reason": "stop"
}]
}
Non-streaming, Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is in the choices[*].images field.
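Since the image can arrive in content blocks or in a separate images field depending on the model, client code may want one extraction path that handles every shape described above. A minimal sketch over plain dictionaries (for example, parsed JSON); extract_image_urls is a hypothetical helper, and checking images on both the delta/message and the choice itself is an assumption made to cover the layouts shown here:

```python
def extract_image_urls(choice):
    """Collect image URLs (or base64 strings) from one entry of "choices"."""
    # Streaming chunks carry "delta"; non-streaming responses carry "message".
    body = choice.get("delta") or choice.get("message") or {}
    content = body.get("content")
    # Content may be a plain string (text only) or a list of typed blocks.
    blocks = list(content) if isinstance(content, list) else []
    # Beta models put images in a separate "images" field.
    blocks += body.get("images") or []
    blocks += choice.get("images") or []
    urls = []
    for block in blocks:
        if isinstance(block, dict) and block.get("type") == "image_url":
            url = (block.get("image_url") or {}).get("url")
            if url:
                urls.append(url)
    return urls
```

For a streamed response, calling this on every chunk and concatenating the results gathers all images of the stream.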
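Capabilities
- Image generation: generates images from a prompt
- Streaming output: supports real-time streaming responses
- Image editing: pass earlier images in a multi-turn conversation to make incremental changes to an existing image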
GPT-image-2
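turing/gpt-image-2 is OpenAI's latest-generation image model, served through Azure (public preview). Compared with gpt-image-1:
- Supports arbitrary resolutions (up to 4K, long edge ≤ 3840 px, aspect ratio ≤ 3:1)
- Reworked quality control (low is optimized for latency)
- Native transparent backgrounds
Two endpoints are supported:
- POST /v1/images/generations (JSON body): text-to-image
- POST /v1/images/edits (multipart/form-data): image-to-image / image editing
Parameter overview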
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Always turing/gpt-image-2 |
| prompt | string | Yes | - | Text description, up to 32000 characters |
| n | int | No | 1 | Number of images returned per request, 1-10 |
| size | string | No | "auto" | "auto" or <w>x<h>: both sides multiples of 16; long edge ≤ 3840; aspect ratio ≤ 3:1; total pixels 655,360 to 8,294,400 |
| quality | string | No | "high" | "low" / "medium" / "high"; low is optimized for latency |
| output_format | string | No | "png" | "png" / "jpeg" (Azure does not yet support webp) |
| output_compression | int | No | 100 | 0-100; only applies to jpeg |
| background | string | No | "auto" | "transparent" / "opaque" / "auto"; transparent must be paired with output_format="png" |
| moderation | string | No | "auto" | "auto" / "low"; low applies more lenient content moderation |
| user | string | No | - | End-user identifier, useful for auditing |
- response_format: the GPT-image series always returns base64 (b64_json); url is not supported
- style: supported only by dall-e-3
Text-to-image
Send a POST to /v1/images/generations with a JSON request body.
curl $TURING_BASE_URL/images/generations \
-H "Authorization: Bearer $TURING_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "turing/gpt-image-2",
"prompt": "a close-up of a bear walking through a misty forest at dawn",
"n": 1,
"size": "1536x1024",
"quality": "high",
"output_format": "png"
}' \
| jq -r '.data[0].b64_json' | base64 -d > bear.png
Example response:
{
"created": 1729753028,
"data": [
{
"b64_json": "iVBORw0KGgoAAAANS..."
}
],
"usage": {
"input_tokens": 50,
"input_tokens_details": { "text_tokens": 50, "image_tokens": 0 },
"output_tokens": 1568,
"total_tokens": 1618
}
}
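4K and arbitrary resolutions
gpt-image-2 is no longer limited to the three sizes 1024x1024 / 1024x1536 / 1536x1024; any <w>x<h> can be specified, for example 3840x2160 (4K landscape) or 2160x3840 (4K portrait). Just replace the size field in the request body above with the target size: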
{
"model": "turing/gpt-image-2",
"prompt": "cyberpunk Tokyo street, neon reflections on wet asphalt, 4k cinematic",
"size": "3840x2160",
"quality": "high"
}
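Constraints (strictly enforced):
- width % 16 == 0 and height % 16 == 0
- max(width, height) <= 3840
- max(w/h, h/w) <= 3
- 655_360 <= width * height <= 8_294_400
Requests that violate these constraints are rejected with a 4xx response, so validating on the caller side first is recommended.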
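A caller-side check for these four constraints could look like the sketch below. It is a minimal sketch, not the server's actual validation logic; the function name validate_gpt_image_2_size and the wording of the messages are hypothetical.

```python
def validate_gpt_image_2_size(size):
    """Return a list of constraint violations; an empty list means the size is valid."""
    if size == "auto":
        return []
    try:
        width_text, height_text = size.lower().split("x")
        width, height = int(width_text), int(height_text)
    except ValueError:
        return [f"size must be 'auto' or '<w>x<h>', got {size!r}"]
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    problems = []
    if width % 16 or height % 16:
        problems.append("both sides must be multiples of 16")
    if max(width, height) > 3840:
        problems.append("long edge must be <= 3840")
    # Integer form of max(w/h, h/w) <= 3, avoiding floating-point division.
    if max(width, height) > 3 * min(width, height):
        problems.append("aspect ratio must be <= 3:1")
    if not 655_360 <= width * height <= 8_294_400:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems
```

For example, validate_gpt_image_2_size("3840x2160") returns an empty list, while "4096x4096" fails both the long-edge and the total-pixel checks.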
Transparent background
When a transparent background is needed, set both "background": "transparent" and "output_format": "png" in the request body:
curl $TURING_BASE_URL/images/generations \
-H "Authorization: Bearer $TURING_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "turing/gpt-image-2",
"prompt": "a single red maple leaf, isolated",
"size": "1024x1024",
"background": "transparent",
"output_format": "png"
}'
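To avoid sending a transparent-background request with a conflicting output format, the pairing rule can be applied in one place. A small sketch, assuming request parameters are kept in a dictionary before the call; with_transparent_background is a hypothetical helper:

```python
def with_transparent_background(params):
    """Return a copy of params that requests a transparent PNG.

    Raises ValueError if params already asks for a non-PNG output format,
    since transparency must be paired with output_format="png".
    """
    requested_format = params.get("output_format", "png")
    if requested_format != "png":
        raise ValueError(
            "background='transparent' requires output_format='png', "
            f"got {requested_format!r}"
        )
    updated = dict(params)  # leave the caller's dictionary untouched
    updated["background"] = "transparent"
    updated["output_format"] = "png"
    return updated
```

The result can be serialized as the JSON body of the request shown above.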
Image editing (image-to-image)
Endpoint: POST /v1/images/edits; the request body is multipart/form-data.
Single input image
curl $TURING_BASE_URL/images/edits \
-H "Authorization: Bearer $TURING_API_KEY" \
-F "model=turing/gpt-image-2" \
-F "prompt=Replace the background with a beach sunset scene" \
-F "image=@input.png" \
-F "n=1" \
-F "size=1024x1024" \
-F "quality=medium" \
| jq -r '.data[0].b64_json' | base64 -d > output.png
Multiple reference images (the field name becomes image[])
curl $TURING_BASE_URL/images/edits \
-H "Authorization: Bearer $TURING_API_KEY" \
-F "model=turing/gpt-image-2" \
-F "prompt=Generate a gift basket containing all items in the reference images" \
-F "image[]=@item1.png" \
-F "image[]=@item2.png" \
-F "image[]=@item3.png" \
-F "n=1" \
-F "size=1024x1024" \
-F "quality=medium" \
| jq -r '.data[0].b64_json' | base64 -d > output.png
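With a mask (inpainting)
Provide a mask (it must be a PNG; transparent pixels mark the region to edit), and the model repaints only the transparent region: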
curl $TURING_BASE_URL/images/edits \
-H "Authorization: Bearer $TURING_API_KEY" \
-F "model=turing/gpt-image-2" \
-F "prompt=Add an orange cat in the transparent area" \
-F "image=@input.png" \
-F "mask=@mask.png" \
-F "size=1024x1024" \
-F "quality=high" \
| jq -r '.data[0].b64_json' | base64 -d > output.png
Input images may be PNG / JPG / JPEG; the mask must be PNG. Each file must not exceed 50 MB.
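A mask of this kind can be produced without an image library. The sketch below writes an 8-bit RGBA PNG using only the standard library: opaque everywhere except a fully transparent rectangle. It assumes the mask should have the same pixel dimensions as the input image, which this document does not state; make_rect_mask and _png_chunk are hypothetical helper names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _png_chunk(chunk_type, data):
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def make_rect_mask(width, height, box):
    """Return PNG bytes that are opaque except for a transparent rectangle.

    box is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    if not (0 <= left <= right <= width and 0 <= top <= bottom <= height):
        raise ValueError("box must lie inside the image")
    opaque = b"\x00\x00\x00\xff"       # alpha 255: keep this pixel
    transparent = b"\x00\x00\x00\x00"  # alpha 0: region to repaint
    # Every scanline starts with filter type 0 (no filtering).
    solid_row = b"\x00" + opaque * width
    edit_row = (b"\x00" + opaque * left + transparent * (right - left)
                + opaque * (width - right))
    raw = b"".join(edit_row if top <= y < bottom else solid_row
                   for y in range(height))
    # IHDR: width, height, bit depth 8, color type 6 (RGBA), then zeros for
    # compression method, filter method, and interlace method.
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (PNG_SIGNATURE
            + _png_chunk(b"IHDR", header)
            + _png_chunk(b"IDAT", zlib.compress(raw))
            + _png_chunk(b"IEND", b""))
```

Writing the result of make_rect_mask(1024, 1024, (256, 256, 768, 768)) to mask.png gives a mask whose central square is marked for repainting.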
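Edit-specific parameters
| Parameter | Type | Description |
|---|---|---|
| image | file | Input image (single); for multiple images, pass image[] repeatedly instead |
| mask | file | Optional PNG mask; transparent areas indicate the region to edit |
| input_fidelity | string | "low" / "high"; controls how closely the input image is preserved |
| output_format | string | "png" / "jpeg" |
| background | string | "auto" / "transparent" (requires output_format=png as well) |
Errors and timeouts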
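| Scenario | Behavior |
|---|---|
| Rate limit exceeded | HTTP 429; retry with exponential backoff |
| Prompt flagged by content moderation | HTTP 4xx, error.code = "contentFilter" |
| Output image flagged by content moderation | HTTP 4xx; error.message reports Generated image was filtered ... |
| Generation time per request | Typically 120 seconds; complex 4K + high requests can take 180-240 seconds; a timeout >= 300s is recommended |
Full field definitions and Try-It: /api/create-image.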
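The exponential backoff recommended for HTTP 429 can be sketched as a small wrapper. This is one possible approach, not a prescribed retry policy: call_with_backoff, its parameters, and the jitter strategy are all assumptions, and the caller supplies the predicate that decides whether an error is a rate-limit error.

```python
import random
import time


def call_with_backoff(call, is_rate_limited, max_attempts=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Run call(); on a rate-limit error wait, then retry with a doubled delay.

    The delay before retry k is min(max_delay, base_delay * 2**k) plus random
    jitter of up to half that delay. Other errors, and the error from the
    final attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as error:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_rate_limited(error):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With the OpenAI Python SDK, the predicate could be lambda e: isinstance(e, openai.RateLimitError), and the wrapped call could pass timeout=300 in line with the table above.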
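See also
- /api/create-image: full schema, all supported parameters, Try-It
- Image generation model list: per-model pricing, resolutions, retirement dates
- Gemini Nano Banana: Gemini image generation (uses /v1/chat/completions; supports multi-image editing and streaming)
- Video generation: video generation endpoints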