
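Image Generation

This guide explains how to generate and edit images through the /v1/images/generations and /v1/images/edits endpoints. It covers:

  • Doubao Seedream (/v1/images/generations, OpenAI-compatible; /v1/images/edits is not yet supported)
  • OpenAI GPT-image-2 / GPT-image-1 (via Azure; /v1/images/generations for text-to-image and /v1/images/edits for image-to-image, with native transparent-background support)
  • Gemini Nano Banana (served through /v1/chat/completions; supports multi-image editing and streaming). See the Gemini Nano Banana section.

Full model list and pricing: Image generation model list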


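Environment setup

1. Install the latest version of the OpenAI Python SDK

pip install --upgrade openai

2. Prepare the base64 module

It ships with the Python 3 standard library, so no extra installation is needed.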


Example 1: Generate an image

from openai import OpenAI
import base64

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

img = client.images.generate(
    prompt="a futuristic city with flying cars and neon lights",  # image description
    model="doubao-seedream-4-5-251128",  # model to use
    n=1,
    size="1024x1024",
)

# Decode the base64 payload and write it to disk
image_bytes = base64.b64decode(img.data[0].b64_json)
with open("output.png", "wb") as f:
    f.write(image_bytes)

print("Image generated and saved as output.png")

Example 2: Edit with multiple reference images

Provide several reference images plus a text description, and the model outputs a single composite image.

import base64
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

prompt = """
Generate a photorealistic image of a gift basket on a white background
labeled 'Relax & Unwind' with a ribbon and handwriting-like font,
containing all the items in the reference pictures.
"""

result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=[
        open("body-lotion.png", "rb"),
        open("bath-bomb.png", "rb"),
        open("incense-kit.png", "rb"),
        open("soap.png", "rb"),
    ],
    prompt=prompt,
    timeout=120,  # complex generations can be slow, so extend the timeout
)

# Decode the base64 payload and write it to disk
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("gift-basket.png", "wb") as f:
    f.write(image_bytes)

print("Image edited and saved as gift-basket.png")

Example 3: Edit a single image

import base64
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://live-turing.cn.llm.tcljd.com/api/v1",
)

result = client.images.edit(
    model="turing/gpt-image-2",
    n=1,
    size="1024x1024",
    image=open("body-lotion.png", "rb"),
    prompt="Change the color of the bottle to blue",
    timeout=120,
)

# Decode the base64 payload and write it to disk
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("blue-bottle.png", "wb") as f:
    f.write(image_bytes)

print("Image edited and saved as blue-bottle.png")

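Seedream batch generation

For Seedream, the n parameter is automatically converted into batch-generation parameters:

# With n=3, the platform automatically adds:
#   "sequential_image_generation": "auto"
#   "sequential_image_generation_options": {"max_images": 3}
client.images.generate(
    model="doubao-seedream-4-5-251128",
    prompt="...",
    n=3,
)

Values of sequential_image_generation:

  • "auto": the model decides whether to return multiple images, and how many
  • "disabled": only one image is generated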
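The conversion described above can be pictured with a small sketch. The helper name seedream_batch_params is hypothetical, and mapping n=1 to "disabled" is an assumption inferred from the option descriptions; the platform's actual server-side logic is not documented here.

```python
def seedream_batch_params(n: int) -> dict:
    """Illustrative only: mimic how the platform is described to translate `n`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        # Assumption: a single image needs no sequential generation.
        return {"sequential_image_generation": "disabled"}
    return {
        "sequential_image_generation": "auto",
        "sequential_image_generation_options": {"max_images": n},
    }


print(seedream_batch_params(3))
```

Because "auto" leaves the final count to the model, it is probably safer for callers to iterate over whatever the response's data list contains than to assume exactly n items.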

Gemini Nano Banana

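Image generation models in the Gemini family use the standard /chat/completions endpoint (rather than /images/generations). Output is controlled through modalities: ["text", "image"] together with imageConfig.

Supported models

The default RPH (request rate limit) is 60. For available models, pricing, and specifications, refer to Image model list → Gemini Nano Banana.

Basic example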

curl -N $TURING_BASE_URL/chat/completions \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gemini-3-pro-image",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Generate a Banana with saying hello"}
    ],
    "modalities": ["text", "image"],
    "imageConfig": {
      "aspectRatio": "1:1",
      "imageSize": "1K",
      "imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
    }
  }'

Three input forms

Plain text

{
  "model": "$model",
  "stream": true,
  "messages": [
    {"role": "user", "content": "Generate a Banana with saying hello"}
  ]
}

Multi-turn with images (image editing)

{
  "model": "$model",
  "stream": true,
  "messages": [
    {"role": "user", "content": [{"type": "text", "text": "Please draw me a dog"}]},
    {"role": "assistant", "content": [
      {"type": "text", "text": "Here you go!"},
      {"type": "image_url", "image_url": {"url": "{image_url}", "detail": "low"}}
    ]},
    {"role": "user", "content": [{"type": "text", "text": "Add a hat on the dog"}]}
  ]
}

Controlling aspect ratio and resolution

{
  "model": "$model",
  "stream": false,
  "messages": [{"role": "user", "content": "Please draw a cute dog"}],
  "imageConfig": {
    "aspectRatio": "16:9",
    "imageSize": "1K",
    "imageOutputOptions": {"mimeType": "image/jpeg", "compressionQuality": 95}
  }
}
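The request bodies above can also be assembled programmatically. The following is a minimal sketch: the function name build_image_chat_payload is hypothetical, the field names are copied from the JSON examples in this section, and the allowed values come from the ratio and resolution tables in this document. Whether the platform rejects other values is not confirmed here.

```python
import json

# Values taken from the aspect-ratio / resolution tables in this document.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
IMAGE_SIZES = {"1K", "2K", "4K"}


def build_image_chat_payload(model, prompt, aspect_ratio="1:1", image_size="1K", stream=False):
    """Build a /chat/completions request body for a Gemini image model."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if image_size not in IMAGE_SIZES:
        raise ValueError(f"unsupported image size: {image_size}")
    return {
        "model": model,
        "stream": stream,
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["text", "image"],
        "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
    }


payload = build_image_chat_payload("turing/gemini-3-pro-image", "Please draw a cute dog", "16:9")
print(json.dumps(payload, indent=2))
```

The printed JSON can be used as the -d body of the curl command shown earlier. Note that the 2K and 4K tiers appear only in the Gemini 3 table, so "1K" is likely the only meaningful value for gemini-2.5-flash-image.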

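Supported aspect ratios and resolutions

Gemini 2.5 Flash Image

| Aspect ratio | Resolution | Tokens |
|--------------|------------|--------|
| 1:1          | 1024x1024  | 1290   |
| 2:3          | 832x1248   | 1290   |
| 3:2          | 1248x832   | 1290   |
| 3:4          | 864x1184   | 1290   |
| 4:3          | 1184x864   | 1290   |
| 4:5          | 896x1152   | 1290   |
| 5:4          | 1152x896   | 1290   |
| 9:16         | 768x1344   | 1290   |
| 16:9         | 1344x768   | 1290   |
| 21:9         | 1536x672   | 1290   |

Gemini 3 Pro Image / Gemini 3.1 Flash Image

| Aspect ratio | 1K resolution | 1K tokens | 2K resolution | 2K tokens | 4K resolution | 4K tokens |
|--------------|---------------|-----------|---------------|-----------|---------------|-----------|
| 1:1          | 1024x1024     | 1210      | 2048x2048     | 1210      | 4096x4096     | 2000      |
| 2:3          | 848x1264      | 1210      | 1696x2528     | 1210      | 3392x5056     | 2000      |
| 3:2          | 1264x848      | 1210      | 2528x1696     | 1210      | 5056x3392     | 2000      |
| 3:4          | 896x1200      | 1210      | 1792x2400     | 1210      | 3584x4800     | 2000      |
| 4:3          | 1200x896      | 1210      | 2400x1792     | 1210      | 4800x3584     | 2000      |
| 4:5          | 928x1152      | 1210      | 1856x2304     | 1210      | 3712x4608     | 2000      |
| 5:4          | 1152x928      | 1210      | 2304x1856     | 1210      | 4608x3712     | 2000      |
| 9:16         | 768x1376      | 1210      | 1536x2752     | 1210      | 3072x5504     | 2000      |
| 16:9         | 1376x768      | 1210      | 2752x1536     | 1210      | 5504x3072     | 2000      |
| 21:9         | 1584x672      | 1210      | 3168x1344     | 1210      | 6336x2688     | 2000      |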

Response format

Streaming (gemini-2.5-flash-image): the image is returned as an image_url block inside content.

Streaming Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is returned in a separate images field, and content is an empty array.

{
  "choices": [{
    "delta": {
      "content": [],
      "images": [{"type": "image_url", "image_url": {"url": "{Base64 string}"}}]
    }
  }]
}

Non-streaming (gemini-2.5-flash-image):

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": [
        {"type": "text", "text": "Here's your dog!"},
        {"type": "image_url", "image_url": {"url": "{generated_image_url}", "detail": "low"}}
      ]
    },
    "finish_reason": "stop"
  }]
}

Non-streaming Beta (gemini-3-pro-image / gemini-3.1-flash-image / gemini-3.1-flash-lite-image): the image is in the choices[*].images field.
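Since the image can arrive either as an image_url block inside content or in a separate images field, client code may want to handle both shapes in one place. The sketch below is one way to do that; the function name extract_images is hypothetical, and checking images both on the choice itself and inside delta / message is a defensive assumption rather than documented behavior.

```python
def extract_images(choice: dict) -> list:
    """Collect image URL / base64 strings from one entry of `choices`."""
    found = []
    # Streaming chunks carry "delta"; non-streaming responses carry "message".
    body = choice.get("delta") or choice.get("message") or {}
    # Beta models: a separate `images` field (looked up on the choice and on its body).
    for block in (choice.get("images") or []) + (body.get("images") or []):
        found.append(block["image_url"]["url"])
    # gemini-2.5-flash-image: image_url blocks mixed into `content`.
    content = body.get("content")
    if isinstance(content, list):
        for block in content:
            if block.get("type") == "image_url":
                found.append(block["image_url"]["url"])
    return found


streaming_beta_chunk = {
    "delta": {
        "content": [],
        "images": [{"type": "image_url", "image_url": {"url": "AAAA"}}],
    }
}
print(extract_images(streaming_beta_chunk))
```

For streaming responses, the same function can be applied to each chunk's choices and the results accumulated.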

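Capabilities

  • Image generation: generate images from a prompt
  • Streaming output: real-time streaming responses are supported
  • Image editing: pass previously generated images in a multi-turn conversation to modify an existing image incrementally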

GPT-image-2

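turing/gpt-image-2 is OpenAI's latest-generation image model, provided through Azure (public preview). Compared with gpt-image-1, it offers:

  • Arbitrary resolutions (up to 4K; long edge ≤ 3840 px; aspect ratio ≤ 3:1)
  • Reworked quality control (low is optimized for latency)
  • Native transparent backgrounds

Two endpoints are supported:

  • POST /v1/images/generations (JSON body): text-to-image
  • POST /v1/images/edits (multipart/form-data): image-to-image / image editing

Parameter overview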

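| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| model | string | Yes | - | Fixed: turing/gpt-image-2 |
| prompt | string | Yes | - | Text description, up to 32000 characters |
| n | int | No | 1 | Number of images returned per request, 1-10 |
| size | string | No | "auto" | "auto" or <w>x<h>: both sides multiples of 16; long edge ≤ 3840; aspect ratio ≤ 3:1; total pixels 655,360 to 8,294,400 |
| quality | string | No | "high" | "low" / "medium" / "high"; low is optimized for latency |
| output_format | string | No | "png" | "png" / "jpeg" (Azure does not yet support webp) |
| output_compression | int | No | 100 | 0-100; only takes effect for jpeg |
| background | string | No | "auto" | "transparent" / "opaque" / "auto"; transparent must be combined with output_format="png" |
| moderation | string | No | "auto" | "auto" / "low"; low applies more lenient content moderation |
| user | string | No | - | End-user identifier, useful for auditing |

Unsupported parameters

  • response_format: the GPT-image series always returns base64 (b64_json); url is not supported
  • style: supported only by dall-e-3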
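Several of the rules above interact (for example, a transparent background requires PNG output), so it can help to check them before sending a request. The sketch below assembles a generation body and rejects the combinations the table rules out. The function name build_generation_payload is hypothetical, and raising an error for output_compression on non-JPEG output is a deliberately strict client-side choice, not documented server behavior.

```python
UNSUPPORTED = {"response_format", "style"}


def build_generation_payload(prompt, *, n=1, size="auto", quality="high",
                             output_format="png", output_compression=None,
                             background="auto", **extra):
    """Assemble a /v1/images/generations body for turing/gpt-image-2."""
    rejected = UNSUPPORTED.intersection(extra)
    if rejected:
        raise ValueError(f"unsupported parameters: {sorted(rejected)}")
    if not prompt or len(prompt) > 32000:
        raise ValueError("prompt must be 1 to 32000 characters")
    if not 1 <= n <= 10:
        raise ValueError("n must be between 1 and 10")
    if quality not in {"low", "medium", "high"}:
        raise ValueError(f"invalid quality: {quality}")
    if output_format not in {"png", "jpeg"}:
        raise ValueError(f"invalid output_format: {output_format}")
    if background not in {"transparent", "opaque", "auto"}:
        raise ValueError(f"invalid background: {background}")
    if background == "transparent" and output_format != "png":
        raise ValueError("transparent background requires output_format='png'")

    payload = {
        "model": "turing/gpt-image-2",
        "prompt": prompt,
        "n": n,
        "size": size,
        "quality": quality,
        "output_format": output_format,
        "background": background,
        **extra,  # e.g. moderation, user
    }
    if output_compression is not None:
        if output_format != "jpeg":
            raise ValueError("output_compression only applies to jpeg")
        if not 0 <= output_compression <= 100:
            raise ValueError("output_compression must be between 0 and 100")
        payload["output_compression"] = output_compression
    return payload


print(build_generation_payload("a single red maple leaf, isolated", background="transparent"))
```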

Text-to-image

POST directly to /v1/images/generations with a JSON request body.

curl $TURING_BASE_URL/images/generations \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gpt-image-2",
    "prompt": "a close-up of a bear walking through a misty forest at dawn",
    "n": 1,
    "size": "1536x1024",
    "quality": "high",
    "output_format": "png"
  }' \
  | jq -r '.data[0].b64_json' | base64 -d > bear.png

Response example

{
  "created": 1729753028,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANS..."
    }
  ],
  "usage": {
    "input_tokens": 50,
    "input_tokens_details": { "text_tokens": 50, "image_tokens": 0 },
    "output_tokens": 1568,
    "total_tokens": 1618
  }
}
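The jq and base64 pipeline above saves only the first image. When the response has already been parsed into a dictionary (for example with json.loads), every entry can be decoded in a loop. This is a minimal sketch that assumes the response has the shape shown above; the helper names save_images and summarize_usage are hypothetical.

```python
import base64
import os


def save_images(response: dict, out_dir: str = ".", prefix: str = "image", ext: str = "png") -> list:
    """Decode every b64_json entry in response["data"] and write it to disk."""
    paths = []
    for index, item in enumerate(response["data"]):
        path = os.path.join(out_dir, f"{prefix}-{index}.{ext}")
        with open(path, "wb") as f:
            f.write(base64.b64decode(item["b64_json"]))
        paths.append(path)
    return paths


def summarize_usage(response: dict) -> str:
    """Format the token counts from the usage block, if present."""
    usage = response.get("usage", {})
    return (
        f"input={usage.get('input_tokens')} "
        f"output={usage.get('output_tokens')} "
        f"total={usage.get('total_tokens')}"
    )
```

Calling save_images(parsed_response, prefix="bear") would write bear-0.png, bear-1.png, and so on, one file per returned image.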

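4K and arbitrary resolutions

gpt-image-2 is no longer limited to the three sizes 1024x1024 / 1024x1536 / 1536x1024. Any <w>x<h> can be specified, for example 3840x2160 (4K landscape) or 2160x3840 (4K portrait). Just replace the size field in the request body above with the target size:

{
  "model": "turing/gpt-image-2",
  "prompt": "cyberpunk Tokyo street, neon reflections on wet asphalt, 4k cinematic",
  "size": "3840x2160",
  "quality": "high"
}

Constraints (strictly enforced):

  • width % 16 == 0 and height % 16 == 0
  • max(width, height) <= 3840
  • max(w/h, h/w) <= 3
  • 655_360 <= width * height <= 8_294_400

Requests that violate these constraints are rejected with a 4xx response, so validate on the caller side first.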
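A caller-side check for the four constraints above might look like the following sketch. The function name validate_size is hypothetical; the numeric limits are copied from the list above. It returns the violated rules rather than raising, so that all problems can be reported at once.

```python
def validate_size(size: str) -> list:
    """Return the list of violated constraints; an empty list means the size passes."""
    if size == "auto":
        return []
    try:
        w_str, h_str = size.lower().split("x")
        width, height = int(w_str), int(h_str)
    except ValueError:
        return [f"size must be 'auto' or '<w>x<h>', got {size!r}"]
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]

    problems = []
    if width % 16 or height % 16:
        problems.append("both sides must be multiples of 16")
    if max(width, height) > 3840:
        problems.append("long edge must be <= 3840")
    # Integer form of max(w/h, h/w) <= 3, which avoids floating-point rounding.
    if max(width, height) > 3 * min(width, height):
        problems.append("aspect ratio must be <= 3:1")
    if not 655_360 <= width * height <= 8_294_400:
        problems.append("total pixels must be between 655,360 and 8,294,400")
    return problems


for candidate in ("3840x2160", "1000x1000", "3840x1024"):
    print(candidate, validate_size(candidate) or "ok")
```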

Transparent background

For a transparent background, set both "background": "transparent" and "output_format": "png" in the request body:

curl $TURING_BASE_URL/images/generations \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "turing/gpt-image-2",
    "prompt": "a single red maple leaf, isolated",
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
  }'

Image editing (image-to-image)

Endpoint: POST /v1/images/edits, with a multipart/form-data request body.

Single input image

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Replace the background with a beach sunset scene" \
  -F "image=@input.png" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png

Multiple reference images (the field name changes to image[])

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Generate a gift basket containing all items in the reference images" \
  -F "image[]=@item1.png" \
  -F "image[]=@item2.png" \
  -F "image[]=@item3.png" \
  -F "n=1" \
  -F "size=1024x1024" \
  -F "quality=medium" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png

With a mask (inpainting)

Provide a mask (it must be a PNG; transparent pixels mark the region to edit), and the model repaints only the transparent region:

curl $TURING_BASE_URL/images/edits \
  -H "Authorization: Bearer $TURING_API_KEY" \
  -F "model=turing/gpt-image-2" \
  -F "prompt=Add an orange cat in the transparent region" \
  -F "image=@input.png" \
  -F "mask=@mask.png" \
  -F "size=1024x1024" \
  -F "quality=high" \
  | jq -r '.data[0].b64_json' | base64 -d > output.png

File limits

Input images may be PNG / JPG / JPEG; the mask must be PNG; each file must not exceed 50 MB.
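These limits are cheap to check locally before uploading. The sketch below is one possible pre-flight check; the function name check_upload is hypothetical, it inspects only the file extension (not the actual image content), and it reads "50 MB" conservatively as 50,000,000 bytes, which is an assumption about how the platform counts.

```python
import os

# Conservative reading of "50 MB"; the platform may count 50 * 1024 * 1024 instead.
MAX_UPLOAD_BYTES = 50 * 1000 * 1000
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def check_upload(path: str, is_mask: bool = False) -> int:
    """Validate extension and size of a file for /v1/images/edits; return its size in bytes."""
    ext = os.path.splitext(path)[1].lower()
    allowed = {".png"} if is_mask else IMAGE_EXTENSIONS
    if ext not in allowed:
        kind = "mask" if is_mask else "input image"
        raise ValueError(f"{kind} must be one of {sorted(allowed)}, got {ext or 'no extension'}")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes, above the {MAX_UPLOAD_BYTES}-byte limit")
    return size
```

Calling check_upload("input.png") and check_upload("mask.png", is_mask=True) before building the multipart request surfaces these problems without a round trip.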

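Edit-specific parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| image | file | Input image (single); for multiple images, repeat the image[] field instead |
| mask | file | Optional PNG mask; transparent regions mark the area to edit |
| input_fidelity | string | "low" / "high"; controls how closely the input image is preserved |
| output_format | string | "png" / "jpeg" |
| background | string | "auto" / "transparent" (requires output_format=png) |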

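Errors and timeouts

| Scenario | Behavior |
|----------|----------|
| Rate limit exceeded | HTTP 429; retry with exponential backoff |
| Prompt flagged by content moderation | HTTP 4xx, error.code = "contentFilter" |
| Output image flagged by content moderation | HTTP 4xx, error.message indicates "Generated image was filtered ..." |
| Duration of a single generation | Typically 120 seconds; complex 4K + high requests can take 180-240 seconds; a timeout >= 300s is recommended |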
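The exponential backoff recommended for HTTP 429 can be sketched as follows. This is an illustration under stated assumptions, not platform-provided code: call_with_backoff is a hypothetical helper, send stands for any zero-argument callable that performs the request and returns a (status_code, body) pair, and the delay values are arbitrary defaults. Only 429 is retried, since a content-filter rejection would fail again with the same input.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or the attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts; hand the last 429 back to the caller
        # Double the delay on each retry, cap it, and add jitter so that
        # concurrent clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

The sleep parameter exists so the waiting behavior can be replaced in tests. In real use, send would wrap the HTTP call with the request timeout set to at least 300 seconds, as recommended above.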

Full field definitions and Try-It: /api/create-image

