GPT Image 2 API Documentation

Generate video

POST /api/ai/video/generate — submit a text-to-video or image-to-video task across Kling, Sora, Veo, Hailuo, Wan, and Seedance.

Submits a new video generation task. inputMode must match the model's capability, and credits are pre-debited when the task is accepted.

Model families

| Family | Model IDs | Notes |
| --- | --- | --- |
| Kling 2.1 | kling/v2-1-standard, kling/v2-1-pro, kling/v2-1-master-text-to-video | Standard and Pro are image-to-video only; Master is text-to-video only. |
| Sora 2 | sora-2-text-to-video, sora-2-pro-text-to-video, sora-2-image-to-video | Duration- and resolution-dependent pricing. |
| Hailuo 02 | hailuo/02-text-to-video-standard, hailuo/02-text-to-video-pro, hailuo/02-image-to-video-standard, hailuo/02-image-to-video-pro | Standard/Pro pricing differs mainly by duration tier. |
| Veo 3.1 | veo3_fast, veo3, veo3_lite | Flat-rate 8-second clips. Supports both text-to-video and image-to-video. |
| Wan 2.6 | wan/2-6-text-to-video, wan/2-6-image-to-video | Duration and resolution both affect credits. |
| Seedance | bytedance/seedance-1.5-pro | Supports both text and image input modes; duration and resolution affect credits. |

Input mode rules

  • Models whose IDs contain text-to-video require inputMode: text_to_video.
  • Models whose IDs contain image-to-video require inputMode: image_to_video and at least one inputImageUrls entry.
  • Kling 2.1 Standard and Pro (kling/v2-1-standard, kling/v2-1-pro) carry no mode suffix but are image-to-video only.
  • Veo 3.1 (veo3_fast, veo3, veo3_lite) and Seedance (bytedance/seedance-1.5-pro) support both modes.
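The mode rules and the model table above can be captured in a small client-side check before submitting a request. This is an illustrative sketch, not an official SDK; `allowedInputModes` and `checkRequest` are names introduced here.

```typescript
type InputMode = "text_to_video" | "image_to_video";

// Dual-mode models per the model family table (Veo 3.1 and Seedance).
const DUAL_MODE = new Set([
  "veo3_fast", "veo3", "veo3_lite", "bytedance/seedance-1.5-pro",
]);

// Derive which input modes a model ID accepts.
function allowedInputModes(model: string): InputMode[] {
  if (DUAL_MODE.has(model)) return ["text_to_video", "image_to_video"];
  if (model.includes("text-to-video")) return ["text_to_video"];
  if (model.includes("image-to-video")) return ["image_to_video"];
  // Kling 2.1 Standard/Pro carry no mode suffix but are image-to-video only.
  if (model.startsWith("kling/")) return ["image_to_video"];
  throw new Error(`Unknown model: ${model}`);
}

// Validate a request body locally; returns an error message or null.
function checkRequest(body: {
  model: string;
  inputMode: InputMode;
  inputImageUrls?: string[];
}): string | null {
  if (!allowedInputModes(body.model).includes(body.inputMode)) {
    return `${body.model} does not accept inputMode=${body.inputMode}`;
  }
  if (body.inputMode === "image_to_video" && !body.inputImageUrls?.length) {
    return "image_to_video requires at least one inputImageUrls entry";
  }
  return null; // looks valid
}
```

Running this check locally avoids burning a round trip (and a 400 response) on a request the API would reject anyway.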

Field notes

  • duration is model-specific. Common allowed values are 5, 10, 15, or 20; Veo is fixed at 8.
  • inputImageUrls currently feeds the first image into the video provider path. Keep the array focused on the exact frame you want animated.
  • aspectRatio and provider-specific controls vary by model family.
  • For exact per-model credit numbers, see Pricing.

Video tasks typically complete in 30–120 seconds depending on model and duration. The same polling loop used for images applies.
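The polling loop mentioned above can be sketched as follows. The accepted-task sample later in this page shows status: 0, so this sketch treats 0 as pending and any other value as terminal; the real terminal codes are an assumption here, so check the Get task status reference. The `getTask` parameter stands in for a GET /api/ai/tasks/{id} call and is injectable for testing; it is not part of the API.

```typescript
type TaskSnapshot = { taskId: string; status: number; [k: string]: unknown };

// Poll a submitted task until it leaves the pending state or times out.
async function pollTask(
  taskId: string,
  getTask: (id: string) => Promise<TaskSnapshot>,
  intervalMs = 5000,
  timeoutMs = 180_000, // video tasks typically finish in 30-120 s
): Promise<TaskSnapshot> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const snap = await getTask(taskId);
    if (snap.status !== 0) return snap; // any non-pending status is terminal here
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} still pending after ${timeoutMs} ms`);
}
```

A 5-second interval is a reasonable default given the 30-120 second completion window; tighter intervals mostly add request volume, not latency.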

POST /api/ai/video/generate

Authorization

bearerAuth (header): Authorization: Bearer <token>

API keys are in closed beta — send your key as Authorization: Bearer <key>. In the meantime, first-party usage from the web dashboard is authenticated via session cookie. See Authentication for details.


Request Body

application/json


Response Body

application/json


Example request:

curl -X POST "https://gptimage2api.org/api/ai/video/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A slow cinematic dolly around a ceramic coffee cup on a kitchen island, morning light, 35mm film",
    "model": "sora-2-text-to-video",
    "inputMode": "text_to_video",
    "aspectRatio": "16:9",
    "duration": 5
  }'
Accepted (task created):

{
  "taskId": "tsk_01J9XA5M2R9W4QZC4PYJF3N7ND",
  "status": 0,
  "creditsUsed": 12
}

Invalid request:

{
  "error": "Invalid request",
  "details": {}
}

Unauthorized:

{
  "error": "Unauthorized"
}

Insufficient credits:

{
  "error": "Insufficient credits",
  "required": 12
}

Other errors:

{
  "error": "string"
}
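The error bodies above are distinguishable by their `error` string, which is useful because the insufficient-credits case is recoverable (top up and retry) while the others are not. This excerpt does not state the HTTP status codes, so the sketch below dispatches on the body alone; `classifyError` is a name introduced here.

```typescript
type GenerateError =
  | { kind: "invalid_request"; details: unknown }
  | { kind: "unauthorized" }
  | { kind: "insufficient_credits"; required: number }
  | { kind: "other"; message: string };

// Map the documented error bodies to a discriminated union.
function classifyError(body: {
  error: string;
  details?: unknown;
  required?: number;
}): GenerateError {
  switch (body.error) {
    case "Invalid request":
      return { kind: "invalid_request", details: body.details };
    case "Unauthorized":
      return { kind: "unauthorized" };
    case "Insufficient credits":
      return { kind: "insufficient_credits", required: body.required ?? 0 };
    default:
      return { kind: "other", message: body.error };
  }
}
```

Because credits are pre-debited on acceptance, the insufficient-credits check happens before a task is created, so no `taskId` is returned in that case.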

Generate image

POST /api/ai/image/generate — submit a text-to-image or image-to-image task across GPT Image 2 and Nano Banana models.

Get task status

GET /api/ai/tasks/{id} — fetch the current snapshot of a submitted task.

Request body fields (* = required):

  • prompt* (string): 1 to 4000 characters.
  • model* (string): video model ID; see Pricing for the current public list and credit rules. One of "kling/v2-1-standard", "kling/v2-1-pro", "kling/v2-1-master-text-to-video", "sora-2-text-to-video", "sora-2-pro-text-to-video", "sora-2-image-to-video", "hailuo/02-text-to-video-standard", "hailuo/02-text-to-video-pro", "hailuo/02-image-to-video-standard", "hailuo/02-image-to-video-pro", "veo3_fast", "veo3", "veo3_lite", "wan/2-6-text-to-video", "wan/2-6-image-to-video", "bytedance/seedance-1.5-pro".
  • inputMode* (string): must match the selected model's capability. One of "text_to_video", "image_to_video".
  • negativePrompt? (string): up to 2000 characters.
  • width? (integer): 64 to 4096.
  • height? (integer): 64 to 4096.
  • aspectRatio? (string).
  • duration? (number): 1 to 60. Duration in seconds; allowed values are model-specific (e.g., Kling accepts 5 or 10).
  • fps? (integer): 1 to 60.
  • seed? (string).
  • guidanceScale? (number): 0 to 100.
  • motionStrength? (number): 0 to 100.
  • cameraMotion? (string).
  • inputImageUrls? (array of image URLs): up to 5 items. Required for inputMode: image_to_video; individual models cap lower (Veo 3.1 up to 3, others typically 1).
  • providerInput? (object): advanced passthrough for provider-specific fields that override defaults. This object is not schema-validated.
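The schema above translates into a request-body type like the following. This is a hand-written sketch mirroring the fields listed here, not the official TypeScript definitions the page links to; the documented length and range constraints cannot be expressed in the type system, so they appear as comments only.

```typescript
type VideoModel =
  | "kling/v2-1-standard" | "kling/v2-1-pro" | "kling/v2-1-master-text-to-video"
  | "sora-2-text-to-video" | "sora-2-pro-text-to-video" | "sora-2-image-to-video"
  | "hailuo/02-text-to-video-standard" | "hailuo/02-text-to-video-pro"
  | "hailuo/02-image-to-video-standard" | "hailuo/02-image-to-video-pro"
  | "veo3_fast" | "veo3" | "veo3_lite"
  | "wan/2-6-text-to-video" | "wan/2-6-image-to-video"
  | "bytedance/seedance-1.5-pro";

interface GenerateVideoRequest {
  prompt: string;                  // 1..4000 chars
  model: VideoModel;
  inputMode: "text_to_video" | "image_to_video";
  negativePrompt?: string;         // up to 2000 chars
  width?: number;                  // 64..4096
  height?: number;                 // 64..4096
  aspectRatio?: string;
  duration?: number;               // 1..60 s; allowed values are model-specific
  fps?: number;                    // 1..60
  seed?: string;
  guidanceScale?: number;          // 0..100
  motionStrength?: number;         // 0..100
  cameraMotion?: string;
  inputImageUrls?: string[];       // up to 5; required for image_to_video
  providerInput?: Record<string, unknown>; // not schema-validated
}
```

A union type for `model` lets the compiler catch typos in model IDs at build time instead of surfacing them as Invalid request errors at runtime.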