Seedance Video - OrcaRouter

OrcaRouter serves the Seedance video models on the same submit-and-poll endpoint as Kling. You send model: byteplus/dreamina-seedance-2-0-260128, OrcaRouter routes the request to the upstream Ark /contents/generations/tasks API, and you poll the task ID through OrcaRouter until the task finishes (typically 30 seconds to 5 minutes, depending on duration, resolution, and generate_audio).

Currently available model. Only Seedance 2.0 is provisioned right now, under the backend name byteplus/dreamina-seedance-2-0-260128. The capability table below lists the rest of the Seedance family for reference, but they are not yet selectable in the playground or routable through OrcaRouter — use byteplus/dreamina-seedance-2-0-260128 for every request for now.

The submit endpoint POST /v1/video/generations and the fetch endpoint GET /v1/video/generations/{task_id} are shared with Kling. The request body differs: Kling uses prompt + image + metadata.{mode, aspect_ratio, image_list, ...}, whereas Seedance uses prompt + metadata.{content[], ratio, duration, generate_audio, watermark, ...}. The prefix on model selects which schema is honored.

Models

Model	T2V	I2V (first)	I2V (first+last)	Multimodal ref¹	Video edit²	Generate audio³	Duration	Available
`byteplus/dreamina-seedance-2-0-260128` (2.0)	✓	✓	✓	✓ full	✓	✓	4 – 15 s	✓
`byteplus/seedance-2.0-fast`	✓	✓	✓	✓ full	✓	✓	4 – 15 s	planned
`byteplus/seedance-1-5-pro`	✓	✓	✓	image only		✓	4 – 12 s	planned
`byteplus/seedance-1-0-pro`	✓	✓	✓	image only			2 – 12 s	planned
`byteplus/seedance-1-0-pro-fast`	✓	✓		image only			2 – 12 s	planned
`byteplus/seedance-1-0-lite-i2v`		✓	✓	image only			2 – 12 s	planned
`byteplus/seedance-1-0-lite-t2v`	✓			image only			2 – 12 s	planned

¹ Multimodal reference: the metadata.content[] array can carry image_url / video_url / audio_url items with role markers (reference_image / reference_video / reference_audio). “Full” means combinations of image + video + audio are accepted.
² Video edit: pass a video_url content item to apply prompt-driven edits to the source video (subject swap, region inpainting, etc.).
³ Generate audio: the upstream auto-generates a soundtrack matching the video. Enable it with metadata.generate_audio: true.

The submit endpoint is the same for every model: POST /v1/video/generations. What changes is which metadata fields the upstream honors, per the table above. See the upstream Seedance capability matrix for the authoritative per-model feature list.

Submit a task

Send a POST to /v1/video/generations with model, prompt, and any upstream-specific parameters under metadata:

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "A girl holding a fox, the girl opens her eyes, looks gently at the camera, the fox hugs affectionately, the camera slowly pulls out, the girl'\''s hair is blown by the wind",
    "metadata": {
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/foxgirl.png" } }
      ],
      "ratio": "16:9",
      "duration": 5,
      "generate_audio": true,
      "watermark": false
    }
  }'

Response carries the task ID (same envelope as Kling — OrcaRouter normalizes across providers):

{
  "id": "task_9q9oz6tjtgABYWC1QIqoz3sscgVz7ycw",
  "task_id": "task_9q9oz6tjtgABYWC1QIqoz3sscgVz7ycw",
  "object": "video",
  "model": "byteplus/dreamina-seedance-2-0-260128",
  "status": "queued",
  "progress": 0,
  "created_at": 1777975188
}

OrcaRouter wraps your prompt as the text item inside Seedance’s content[] array automatically — you don’t need to pass a {type: "text"} item yourself. Any text item you supply in metadata.content[] is replaced by your top-level prompt. Other content items (image_url, video_url, audio_url) pass through unchanged.
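The same submit call can be sketched in Python with only the standard library. The helper names build_submit_request and submit_task are hypothetical and not part of any OrcaRouter SDK; the URL, headers, and body layout are copied from the curl example above, and the task_id lookup assumes the response envelope shown there.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.orcarouter.ai/v1/video/generations"
DEFAULT_MODEL = "byteplus/dreamina-seedance-2-0-260128"


def build_submit_request(api_key, prompt, metadata=None, model=DEFAULT_MODEL):
    """Build (but do not send) the POST request for a Seedance task."""
    body = {"model": model, "prompt": prompt}
    if metadata:
        # Upstream-specific parameters live under "metadata".
        body["metadata"] = metadata
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_task(api_key, prompt, metadata=None, timeout=30):
    """Send the request and return the task ID from the response envelope."""
    request = build_submit_request(api_key, prompt, metadata)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        envelope = json.load(response)
    return envelope["task_id"]
```

A call such as submit_task("sk-orca-...", "the cat starts dancing", {"ratio": "16:9", "duration": 5}) would return the task ID to poll with. No text item is added to content[] here, since the top-level prompt is wrapped for you.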

Body fields

These fields go inside metadata. Combine them as shown in the endpoint variants below.

Field	Type	Notes
`content`	array	Multimodal reference items. Each item: `{type, image_url? | video_url? | audio_url?, role?}`. Skip if pure text-to-video.
`ratio`	string	Aspect ratio. `16:9` / `9:16` / `1:1` / `4:3` / `3:4` / `21:9` / `adaptive`. `adaptive` infers from input.
`duration`	integer	Seconds. Allowed range depends on model — see the table above.
`resolution`	string	`480p` / `720p` / `1080p`. Default `720p`. `1080p` only on `seedance-2.0` / `seedance-2.0-fast` / `seedance-1-5-pro` / `seedance-1-0-pro` / `seedance-1-0-pro-fast`.
`generate_audio`	boolean	Auto-generate a synced soundtrack. Default `false`. Only on `seedance-2.0` / `2.0-fast` / `1-5-pro`.
`watermark`	boolean	Imprint the upstream watermark. Default upstream-defined.
`seed`	integer	Random seed for reproducibility.
`service_tier`	string	`default` (online) or `flex` (offline / lower priority, higher quota). Defaults to `default`.
`return_last_frame`	boolean	Return the final frame as an image alongside the MP4. Default `false`.
`callback_url`	string	Webhook URL — receives status changes instead of (or alongside) polling.
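A client-side pre-flight check can catch obviously invalid values before a request is sent. The sketch below is an assumption-laden illustration: check_metadata is a hypothetical helper, the allowed values are copied from the table above, and the 4 to 15 second range applies only to Seedance 2.0 as listed in the Models table. The upstream remains the authority on what is accepted.

```python
ALLOWED_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
ALLOWED_SERVICE_TIERS = {"default", "flex"}
# Seconds; from the Models table row for Seedance 2.0 only.
DURATION_RANGE_2_0 = (4, 15)
BOOLEAN_FIELDS = ("generate_audio", "watermark", "return_last_frame")


def check_metadata(metadata):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    enum_fields = (
        ("ratio", ALLOWED_RATIOS),
        ("resolution", ALLOWED_RESOLUTIONS),
        ("service_tier", ALLOWED_SERVICE_TIERS),
    )
    for key, allowed in enum_fields:
        value = metadata.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key}={value!r} is not one of {sorted(allowed)}")

    duration = metadata.get("duration")
    if duration is not None:
        low, high = DURATION_RANGE_2_0
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(duration, bool) or not isinstance(duration, int):
            problems.append("duration must be an integer number of seconds")
        elif not low <= duration <= high:
            problems.append(f"duration={duration} is outside {low}-{high} s")

    for key in BOOLEAN_FIELDS:
        if key in metadata and not isinstance(metadata[key], bool):
            problems.append(f"{key} must be a boolean")
    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.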

content[] item shape

Each item in metadata.content is one of four shapes:

{ "type": "image_url", "image_url": { "url": "https://..." }, "role": "first_frame" }
{ "type": "video_url", "video_url": { "url": "https://..." }, "role": "reference_video" }
{ "type": "audio_url", "audio_url": { "url": "https://..." }, "role": "reference_audio" }
{ "type": "text",      "text": "..." }    // automatically replaced by top-level prompt

role values:

`role`	Purpose
`first_frame`	Anchor this image as the first frame of the generated video.
`end_frame`	Anchor this image as the last frame (use with first_frame for first+last frame i2v).
`reference_image`	Style / subject reference (Multimodal reference variant; can pass multiple).
`reference_video`	Style / motion reference, or the source video for editing / extension.
`reference_audio`	Background music or voice reference (audio-video generation).

To refer to a content item from inside the prompt, use the [Image 1], [Video 1], [Audio 1] syntax. Each index is 1-based and counts only items of that type, in array order.
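The per-type, 1-based numbering can be sketched as a small helper that builds content[] items and derives the label each one would have in the prompt. Both content_item and prompt_labels are hypothetical names introduced for illustration; the item shapes and label syntax follow the descriptions above.

```python
# Maps a content item type to the word used in its prompt label.
TYPE_LABELS = {"image_url": "Image", "video_url": "Video", "audio_url": "Audio"}


def content_item(kind, url, role=None):
    """Build one content[] item, e.g. content_item("image_url", url, "first_frame")."""
    if kind not in TYPE_LABELS:
        raise ValueError(f"unsupported content type: {kind!r}")
    item = {"type": kind, kind: {"url": url}}
    if role is not None:
        item["role"] = role
    return item


def prompt_labels(content):
    """Return the prompt label for each item, numbered 1-based per type."""
    counters = {}
    labels = []
    for item in content:
        kind = item["type"]
        if kind not in TYPE_LABELS:
            # Text items have no label; keep positions aligned with content.
            labels.append(None)
            continue
        counters[kind] = counters.get(kind, 0) + 1
        labels.append(f"[{TYPE_LABELS[kind]} {counters[kind]}]")
    return labels
```

For two images followed by one video and one audio item, the labels are [Image 1], [Image 2], [Video 1], [Audio 1]: the video is Video 1 even though it is the third array element.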

Poll for results

Use the task ID returned at submit time:

curl https://api.orcarouter.ai/v1/video/generations/task_9q9oz6tjtgABYWC1QIqoz3sscgVz7ycw \
  -H "Authorization: Bearer sk-orca-..."

Response shape is wrapped (identical to Kling):

{
  "code": "success",
  "message": "",
  "data": {
    "task_id": "task_9q9oz6tjtgABYWC1QIqoz3sscgVz7ycw",
    "status": "SUCCESS",
    "progress": "100%",
    "result_url": "https://ark-content-generation-ap-southeast-1.tos-ap-southeast-1.volces.com/.../video.mp4",
    "submit_time": 1777975188,
    "start_time": 1777975241,
    "finish_time": 1777975277,
    "fail_reason": ""
  }
}

Status values in the polling response are normalized to uppercase across providers:

Status	Upstream Seedance status	Meaning
`NOT_START`	(transient)	Task row created, not yet dispatched
`SUBMITTED`	`queued`	Sent to upstream, waiting in the queue
`IN_PROGRESS`	`running`	Upstream is rendering
`SUCCESS`	`succeeded`	Done. `data.result_url` carries the MP4
`FAILURE`	`failed`	Failed. `data.fail_reason` has the reason

Progress is a percent string ("50%", "100%"), not an integer. Poll every 5 to 10 seconds. A 5-second 720p clip typically completes in 30 to 60 seconds; 1080p clips with audio, 15-second clips, and multimodal-reference clips can take 3 to 5 minutes.

The result_url is an upstream-signed TOS URL with a short TTL, so download or rehost the file promptly if you need long retention.
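A polling loop over the statuses above might look like the following sketch. The names fetch_task, wait_for_video, and parse_progress are hypothetical, the 10-minute max_wait is an arbitrary assumption rather than a documented limit, and parse_progress assumes the "NN%" shape shown in the example response.

```python
import json
import time
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://api.orcarouter.ai/v1/video/generations"


def parse_progress(progress):
    """Convert a percent string such as "50%" into the integer 50."""
    return int(progress.strip().rstrip("%"))


def fetch_task(api_key, task_id, timeout=30):
    """GET the task and return the inner "data" object of the wrapped response."""
    request = urllib.request.Request(
        f"{BASE_URL}/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["data"]


def wait_for_video(fetch, interval=5.0, max_wait=600.0, sleep=time.sleep):
    """Call fetch() until the task is terminal; return result_url on success."""
    waited = 0.0
    while True:
        data = fetch()
        status = data["status"]
        if status == "SUCCESS":
            return data["result_url"]
        if status == "FAILURE":
            raise RuntimeError(data.get("fail_reason") or "task failed")
        if waited >= max_wait:
            raise TimeoutError(f"task still {status} after {waited:.0f} s")
        # NOT_START, SUBMITTED, and IN_PROGRESS all mean "keep waiting".
        sleep(interval)
        waited += interval
```

Typical use would be wait_for_video(lambda: fetch_task(api_key, task_id)). Passing fetch and sleep as parameters keeps the loop testable without network access or real delays.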

Endpoint variants

All variants share POST /v1/video/generations. Which Seedance feature path the upstream serves is determined by the metadata.content[] items and role markers — not by URL.

Text-to-video

Just model + prompt + optional metadata. No content items means pure text-to-video:

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "Photorealistic style: under a clear blue sky, a vast expanse of white daisy fields stretches out. The camera gradually zooms in on a single daisy with glistening dewdrops on its petals.",
    "metadata": {
      "ratio": "16:9",
      "duration": 5,
      "watermark": true
    }
  }'

Image-to-video — first frame

Pass one image item with role: "first_frame":

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "the cat starts dancing energetically",
    "metadata": {
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/cat.png" }, "role": "first_frame" }
      ],
      "ratio": "adaptive",
      "duration": 5,
      "generate_audio": true
    }
  }'

Image-to-video — first and last frame

Two image items, one each for first_frame and end_frame:

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "Create a 360-degree orbiting camera shot from start to end frame.",
    "metadata": {
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/start.jpg" }, "role": "first_frame" },
        { "type": "image_url", "image_url": { "url": "https://example.com/end.jpg" },   "role": "end_frame"   }
      ],
      "ratio": "16:9",
      "duration": 6
    }
  }'

Multimodal reference — image + video + audio

Combine reference_image / reference_video / reference_audio items. Reference them in the prompt with [Image N] / [Video N] / [Audio N] indices (1-based, per type):

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "Use the first-person POV framing from [Video 1] throughout, and use [Audio 1] as the background music. First-person POV fruit tea promotional ad: [Image 1] hands pick a dew-covered apple; [Image 2] holds the finished drink up to the camera.",
    "metadata": {
      "content": [
        { "type": "image_url", "image_url": { "url": "https://example.com/tea_pic1.jpg" }, "role": "reference_image" },
        { "type": "image_url", "image_url": { "url": "https://example.com/tea_pic2.jpg" }, "role": "reference_image" },
        { "type": "video_url", "video_url": { "url": "https://example.com/tea_video1.mp4" }, "role": "reference_video" },
        { "type": "audio_url", "audio_url": { "url": "https://example.com/tea_audio1.mp3" }, "role": "reference_audio" }
      ],
      "ratio": "16:9",
      "duration": 11,
      "generate_audio": true,
      "watermark": false
    }
  }'

Available on seedance-2.0 and seedance-2.0-fast (full image + video + audio combinations); seedance-1-5-pro and seedance-1-0-* accept only reference_image items.
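The availability rule above can be expressed as a small client-side check. This is a sketch under assumptions: rejected_reference_roles is a hypothetical helper, and the two model IDs treated as "full" are the Models table entries for Seedance 2.0 and 2.0 fast (only the first is routable at the moment).

```python
# Models listed with "full" multimodal reference support in the Models table.
FULL_MULTIMODAL_MODELS = {
    "byteplus/dreamina-seedance-2-0-260128",
    "byteplus/seedance-2.0-fast",
}
REFERENCE_ROLES = {"reference_image", "reference_video", "reference_audio"}


def rejected_reference_roles(model, content):
    """Return reference roles used in content that the model is not listed as accepting."""
    if model in FULL_MULTIMODAL_MODELS:
        allowed = REFERENCE_ROLES
    else:
        # Other Seedance models are listed as "image only".
        allowed = {"reference_image"}
    used = {item.get("role") for item in content} & REFERENCE_ROLES
    return sorted(used - allowed)
```

An empty list means every reference role in the request is listed as supported for that model; first_frame and end_frame items are ignored by this check.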

Video editing / extension

Pass {type: "video_url", role: "reference_video"} and ask the prompt to modify or extend it:

curl https://api.orcarouter.ai/v1/video/generations \
  -H "Authorization: Bearer sk-orca-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "byteplus/dreamina-seedance-2-0-260128",
    "prompt": "Change all the fruits in [Video 1] into fresh fruits.",
    "metadata": {
      "content": [
        { "type": "video_url", "video_url": { "url": "https://example.com/source.mp4" }, "role": "reference_video" }
      ],
      "ratio": "adaptive",
      "duration": 6
    }
  }'

Available on seedance-2.0 and seedance-2.0-fast only.

Webhooks

Pass metadata.callback_url: "https://your.domain/webhook" to receive a POST when the task transitions to SUCCESS or FAILURE. The payload mirrors the polling response. If you poll and also set a callback, you receive both; they are independent.
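A minimal receiver for such a callback could be sketched with Python's built-in http.server. Everything here is illustrative: handle_callback, CallbackHandler, and serve are hypothetical names, the payload is assumed to have exactly the wrapped shape of the polling response, and replying with 204 is an assumption because the status code OrcaRouter expects is not stated here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_callback(raw_body):
    """Parse a callback body and return (task_id, status, detail).

    detail is result_url on SUCCESS and fail_reason otherwise.
    """
    data = json.loads(raw_body)["data"]
    status = data["status"]
    if status == "SUCCESS":
        detail = data.get("result_url")
    else:
        detail = data.get("fail_reason")
    return data["task_id"], status, detail


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            task_id, status, detail = handle_callback(self.rfile.read(length))
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or an unexpected payload shape.
            self.send_response(400)
            self.end_headers()
            return
        print(f"{task_id}: {status} ({detail})")
        self.send_response(204)  # assumed acknowledgement code
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Because polling and callbacks are independent, a handler like this should tolerate being told about a task that a poller has already seen finish.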

Billing

OrcaRouter passes through the upstream per-task token charge with no markup, so the final cost matches ByteDance Ark’s published rate card. The upstream completion_tokens / total_tokens from the task result are converted to quota at the model’s per-token rate set in your Channel Margin config. A small pre-consume hold is reserved at submit time, and the difference settles on success. See Operations / Billing & Usage.
