GPT Image 2 API Documentation

Generate video

POST /api/ai/video/generate — submit a text-to-video or image-to-video task across Kling, Sora, Veo, Hailuo, Wan, and Seedance.

Submits a new video generation task. inputMode must match the model's capability, and credits are pre-debited when the task is accepted.

Model families

| Family | Model IDs | Notes |
| --- | --- | --- |
| Kling 2.1 | kling/v2-1-standard, kling/v2-1-pro, kling/v2-1-master-text-to-video | Standard and Pro are image-to-video only; Master is text-to-video only. |
| Sora 2 | sora-2-text-to-video, sora-2-pro-text-to-video, sora-2-image-to-video | Duration- and resolution-dependent pricing. |
| Hailuo 02 | hailuo/02-text-to-video-standard, hailuo/02-text-to-video-pro, hailuo/02-image-to-video-standard, hailuo/02-image-to-video-pro | Standard/Pro pricing differs mainly by duration tier. |
| Veo 3.1 | veo3_fast, veo3, veo3_lite | Flat-rate 8-second clips. Supports both text-to-video and image-to-video. |
| Wan 2.6 | wan/2-6-text-to-video, wan/2-6-image-to-video | Duration and resolution both affect credits. |
| Seedance | bytedance/seedance-1.5-pro | Supports both text and image input modes; duration and resolution affect credits. |

Input mode rules

  • Models whose IDs contain text-to-video require inputMode: text_to_video.
  • Models whose IDs contain image-to-video require inputMode: image_to_video and at least one inputImageUrls entry.
  • Kling 2.1 Standard and Pro (kling/v2-1-standard, kling/v2-1-pro) carry no mode suffix but are image-to-video only.
  • Veo 3.1 (veo3_fast, veo3, veo3_lite) and Seedance (bytedance/seedance-1.5-pro) support both modes.
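The mode rules and the model table above can be captured in a small client-side check before submitting a request. This is an illustrative sketch, not an official SDK; `allowedInputModes` and `checkRequest` are names introduced here.

```typescript
type InputMode = "text_to_video" | "image_to_video";

// Dual-mode models per the model family table (Veo 3.1 and Seedance).
const DUAL_MODE = new Set([
  "veo3_fast", "veo3", "veo3_lite", "bytedance/seedance-1.5-pro",
]);

// Derive which input modes a model ID accepts.
function allowedInputModes(model: string): InputMode[] {
  if (DUAL_MODE.has(model)) return ["text_to_video", "image_to_video"];
  if (model.includes("text-to-video")) return ["text_to_video"];
  if (model.includes("image-to-video")) return ["image_to_video"];
  // Kling 2.1 Standard/Pro carry no mode suffix but are image-to-video only.
  if (model.startsWith("kling/")) return ["image_to_video"];
  throw new Error(`Unknown model: ${model}`);
}

// Validate a request body locally; returns an error message or null.
function checkRequest(body: {
  model: string;
  inputMode: InputMode;
  inputImageUrls?: string[];
}): string | null {
  if (!allowedInputModes(body.model).includes(body.inputMode)) {
    return `${body.model} does not accept inputMode=${body.inputMode}`;
  }
  if (body.inputMode === "image_to_video" && !body.inputImageUrls?.length) {
    return "image_to_video requires at least one inputImageUrls entry";
  }
  return null; // looks valid
}
```

Running this check locally avoids burning a round trip (and a 400 response) on a request the API would reject anyway.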

Field notes

  • duration is model-specific. Common allowed values are 5, 10, 15, or 20; Veo is fixed at 8.
  • inputImageUrls currently feeds the first image into the video provider path. Keep the array focused on the exact frame you want animated.
  • aspectRatio and provider-specific controls vary by model family.
  • For exact per-model credit numbers, see Pricing.

Video tasks typically complete in 30–120 seconds depending on model and duration. The same polling loop used for images applies.
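The polling loop mentioned above can be sketched as follows. The accepted-task sample later in this page shows status: 0, so this sketch treats 0 as pending and any other value as terminal; the real terminal codes are an assumption here, so check the Get task status reference. The `getTask` parameter stands in for a GET /api/ai/tasks/{id} call and is injectable for testing; it is not part of the API.

```typescript
type TaskSnapshot = { taskId: string; status: number; [k: string]: unknown };

// Poll a submitted task until it leaves the pending state or times out.
async function pollTask(
  taskId: string,
  getTask: (id: string) => Promise<TaskSnapshot>,
  intervalMs = 5000,
  timeoutMs = 180_000, // video tasks typically finish in 30-120 s
): Promise<TaskSnapshot> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const snap = await getTask(taskId);
    if (snap.status !== 0) return snap; // any non-pending status is terminal here
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} still pending after ${timeoutMs} ms`);
}
```

A 5-second interval is a reasonable default given the 30-120 second completion window; tighter intervals mostly add request volume, not latency.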

POST /api/ai/video/generate

Authorization

bearerAuth (header): Authorization: Bearer <token>

API keys are in closed beta — send your key as Authorization: Bearer <key>. In the meantime, first-party usage from the web dashboard is authenticated via session cookie. See Authentication for details.


Request Body

application/json


Response Body

application/json


Example request:

curl -X POST "https://gptimage2api.org/api/ai/video/generate" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A slow cinematic dolly around a ceramic coffee cup on a kitchen island, morning light, 35mm film",
    "model": "sora-2-text-to-video",
    "inputMode": "text_to_video",
    "aspectRatio": "16:9",
    "duration": 5
  }'
Accepted (task created):

{
  "taskId": "tsk_01J9XA5M2R9W4QZC4PYJF3N7ND",
  "status": 0,
  "creditsUsed": 12
}

Invalid request:

{
  "error": "Invalid request",
  "details": {}
}

Unauthorized:

{
  "error": "Unauthorized"
}

Insufficient credits:

{
  "error": "Insufficient credits",
  "required": 12
}

Other errors:

{
  "error": "string"
}
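The error bodies above are distinguishable by their `error` string, which is useful because the insufficient-credits case is recoverable (top up and retry) while the others are not. This excerpt does not state the HTTP status codes, so the sketch below dispatches on the body alone; `classifyError` is a name introduced here.

```typescript
type GenerateError =
  | { kind: "invalid_request"; details: unknown }
  | { kind: "unauthorized" }
  | { kind: "insufficient_credits"; required: number }
  | { kind: "other"; message: string };

// Map the documented error bodies to a discriminated union.
function classifyError(body: {
  error: string;
  details?: unknown;
  required?: number;
}): GenerateError {
  switch (body.error) {
    case "Invalid request":
      return { kind: "invalid_request", details: body.details };
    case "Unauthorized":
      return { kind: "unauthorized" };
    case "Insufficient credits":
      return { kind: "insufficient_credits", required: body.required ?? 0 };
    default:
      return { kind: "other", message: body.error };
  }
}
```

Because credits are pre-debited on acceptance, the insufficient-credits check happens before a task is created, so no `taskId` is returned in that case.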

Generate image

POST /api/ai/image/generate — submit a text-to-image or image-to-image task across GPT Image 2 and Nano Banana models.

Get task status

GET /api/ai/tasks/{id} — fetch the current snapshot of a submitted task.

Request body fields (* = required):

  • prompt* (string): 1 to 4000 characters.
  • model* (string): video model ID; see Pricing for the current public list and credit rules. One of "kling/v2-1-standard", "kling/v2-1-pro", "kling/v2-1-master-text-to-video", "sora-2-text-to-video", "sora-2-pro-text-to-video", "sora-2-image-to-video", "hailuo/02-text-to-video-standard", "hailuo/02-text-to-video-pro", "hailuo/02-image-to-video-standard", "hailuo/02-image-to-video-pro", "veo3_fast", "veo3", "veo3_lite", "wan/2-6-text-to-video", "wan/2-6-image-to-video", "bytedance/seedance-1.5-pro".
  • inputMode* (string): must match the selected model's capability. One of "text_to_video", "image_to_video".
  • negativePrompt? (string): up to 2000 characters.
  • width? (integer): 64 to 4096.
  • height? (integer): 64 to 4096.
  • aspectRatio? (string).
  • duration? (number): 1 to 60. Duration in seconds; allowed values are model-specific (e.g., Kling accepts 5 or 10).
  • fps? (integer): 1 to 60.
  • seed? (string).
  • guidanceScale? (number): 0 to 100.
  • motionStrength? (number): 0 to 100.
  • cameraMotion? (string).
  • inputImageUrls? (array of image URLs): up to 5 items. Required for inputMode: image_to_video; individual models cap lower (Veo 3.1 up to 3, others typically 1).
  • providerInput? (object): advanced passthrough for provider-specific fields that override defaults. This object is not schema-validated.
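The schema above translates into a request-body type like the following. This is a hand-written sketch mirroring the fields listed here, not the official TypeScript definitions the page links to; the documented length and range constraints cannot be expressed in the type system, so they appear as comments only.

```typescript
type VideoModel =
  | "kling/v2-1-standard" | "kling/v2-1-pro" | "kling/v2-1-master-text-to-video"
  | "sora-2-text-to-video" | "sora-2-pro-text-to-video" | "sora-2-image-to-video"
  | "hailuo/02-text-to-video-standard" | "hailuo/02-text-to-video-pro"
  | "hailuo/02-image-to-video-standard" | "hailuo/02-image-to-video-pro"
  | "veo3_fast" | "veo3" | "veo3_lite"
  | "wan/2-6-text-to-video" | "wan/2-6-image-to-video"
  | "bytedance/seedance-1.5-pro";

interface GenerateVideoRequest {
  prompt: string;                  // 1..4000 chars
  model: VideoModel;
  inputMode: "text_to_video" | "image_to_video";
  negativePrompt?: string;         // up to 2000 chars
  width?: number;                  // 64..4096
  height?: number;                 // 64..4096
  aspectRatio?: string;
  duration?: number;               // 1..60 s; allowed values are model-specific
  fps?: number;                    // 1..60
  seed?: string;
  guidanceScale?: number;          // 0..100
  motionStrength?: number;         // 0..100
  cameraMotion?: string;
  inputImageUrls?: string[];       // up to 5; required for image_to_video
  providerInput?: Record<string, unknown>; // not schema-validated
}
```

A union type for `model` lets the compiler catch typos in model IDs at build time instead of surfacing them as Invalid request errors at runtime.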