Batch Inference (Batch Job) Development Guide
Batch inference is designed for scenarios that require offline processing of large volumes of LLM requests, such as batch text generation, data labeling, and content moderation. You create batch jobs by uploading input files; the system processes them asynchronously and provides output file downloads when complete.
Supported Models
Batch inference currently supports only the following LLM models:
| Model ID | Description |
|---|---|
| deepseek-ai/deepseek-v3.2 | DeepSeek V3.2 standard, for general conversation and text generation |
| deepseek-ai/deepseek-v3.2/thinking | DeepSeek V3.2 thinking mode, for complex reasoning and chain-of-thought tasks |
All requests in a single batch input file must use the same model; mixing different models in one job is not supported.
Development Flow Overview
Prepare the input file: write request data in JSONL format.
The input file must be JSONL (one JSON object per line), where each line represents one Chat Completions request.
Per-Line Structure
Each line must include the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| custom_id | string | Yes | Unique request identifier for matching results in the output |
| body | object | Yes | Request body, aligned with Chat Completions API parameters |
body Parameters
The body structure matches the Chat Completions request body. Main fields:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation message list |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature, 0–2 |
| top_p | number | No | Nucleus sampling parameter |
Example input file:

```jsonl
{"custom_id":"req-001","body":{"messages":[{"role":"user","content":"Describe artificial intelligence in one sentence."}],"max_tokens":500}}
{"custom_id":"req-002","body":{"messages":[{"role":"system","content":"You are a professional technical writing assistant."},{"role":"user","content":"Explain what REST API is"}],"max_tokens":800}}
{"custom_id":"req-003","body":{"messages":[{"role":"user","content":"Derive the Pythagorean theorem"}],"max_tokens":2000}}
```
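For larger jobs it is usually easier to generate the JSONL input programmatically than to write it by hand. The sketch below uses only the fields documented above (`custom_id`, `body.messages`, `body.max_tokens`); the prompts, file name, and helper function are illustrative:

```python
import json

def build_batch_line(custom_id, messages, max_tokens=500):
    """Serialize one batch request as a single JSONL line."""
    record = {
        "custom_id": custom_id,
        "body": {"messages": messages, "max_tokens": max_tokens},
    }
    # ensure_ascii=False keeps non-ASCII text readable; the file is UTF-8 either way
    return json.dumps(record, ensure_ascii=False)

prompts = [
    "Describe artificial intelligence in one sentence.",
    "Explain what REST API is",
]
lines = [
    build_batch_line(f"req-{i + 1:03d}", [{"role": "user", "content": p}])
    for i, p in enumerate(prompts)
]

# One JSON object per line, newline-separated
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```

Keeping `custom_id` values sequential and unique (here `req-001`, `req-002`, …) makes it straightforward to match each output record back to its input.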
File Limits
- Format: UTF-8 encoded JSONL
- File size: Recommended not to exceed 5 GB per file
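Before uploading, it can help to validate the file locally against these limits and the per-line schema. A minimal checker, assuming only the requirements stated above (UTF-8 JSONL, required `custom_id` and `body.messages`, unique `custom_id` values; the function name and error messages are illustrative):

```python
import json

def validate_batch_file(path, max_bytes=5 * 1024**3):
    """Return a list of problems found in a batch input file (empty list = OK)."""
    errors = []
    seen_ids = set()
    with open(path, "rb") as f:
        data = f.read()
    if len(data) > max_bytes:
        errors.append(f"file exceeds {max_bytes} bytes")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {lineno}: not valid JSON")
            continue
        cid = obj.get("custom_id")
        if not isinstance(cid, str) or not cid:
            errors.append(f"line {lineno}: missing or invalid custom_id")
        elif cid in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {cid!r}")
        else:
            seen_ids.add(cid)
        body = obj.get("body")
        if not isinstance(body, dict) or not isinstance(body.get("messages"), list):
            errors.append(f"line {lineno}: body.messages missing or not a list")
    return errors
```

Running this before submission catches the most common failure modes (malformed JSON, duplicate identifiers) without waiting for the asynchronous job to report them.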
Batch processing does not yet support uploading input files or downloading output files via API. If you need this capability, please contact our sales team.