Batch Inference (Batch Job) Development Guide
Batch inference is designed for scenarios that require offline processing of large volumes of LLM requests, such as batch text generation, data labeling, and content moderation. You create batch jobs by uploading input files; the system processes them asynchronously and provides output file downloads when complete.
Supported Models
Batch inference currently supports only the following LLM models:
| Model ID | Description |
|---|---|
| deepseek-ai/deepseek-v3.2 | DeepSeek V3.2 standard, for general conversation and text generation |
| deepseek-ai/deepseek-v3.2/thinking | DeepSeek V3.2 thinking mode, for complex reasoning and chain-of-thought tasks |
All requests in a single batch input file must use the same model; mixing different models in one job is not supported.
Development Flow Overview
Prepare the input file: write request data in JSONL format.
The input file must be JSONL (one JSON object per line), where each line represents one Chat Completions request.
Per-Line Structure
Each line must include the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| custom_id | string | Yes | Unique request identifier for matching results in the output |
| body | object | Yes | Request body, aligned with Chat Completions API parameters |
body Parameters
The body structure matches the Chat Completions request body. Main fields:
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation message list |
| max_tokens | integer | No | Maximum tokens to generate |
| temperature | number | No | Sampling temperature, 0–2 |
| top_p | number | No | Nucleus sampling parameter |
Example input file:

```jsonl
{"custom_id":"req-001","body":{"messages":[{"role":"user","content":"Describe artificial intelligence in one sentence."}],"max_tokens":500}}
{"custom_id":"req-002","body":{"messages":[{"role":"system","content":"You are a professional technical writing assistant."},{"role":"user","content":"Explain what REST API is"}],"max_tokens":800}}
{"custom_id":"req-003","body":{"messages":[{"role":"user","content":"Derive the Pythagorean theorem"}],"max_tokens":2000}}
```
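For larger jobs it is usually easier to generate the JSONL input programmatically than to write it by hand. The sketch below uses only the fields documented above (`custom_id`, `body.messages`, `body.max_tokens`); the prompts, file name, and helper function are illustrative:

```python
import json

def build_batch_line(custom_id, messages, max_tokens=500):
    """Serialize one batch request as a single JSONL line."""
    record = {
        "custom_id": custom_id,
        "body": {"messages": messages, "max_tokens": max_tokens},
    }
    # ensure_ascii=False keeps non-ASCII text readable; the file is UTF-8 either way
    return json.dumps(record, ensure_ascii=False)

prompts = [
    "Describe artificial intelligence in one sentence.",
    "Explain what REST API is",
]
lines = [
    build_batch_line(f"req-{i + 1:03d}", [{"role": "user", "content": p}])
    for i, p in enumerate(prompts)
]

# One JSON object per line, newline-separated
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")
```

Keeping `custom_id` values sequential and unique (here `req-001`, `req-002`, …) makes it straightforward to match each output record back to its input.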
File Limits
- Format: UTF-8 encoded JSONL
- File size: Recommended not to exceed 5 GB per file
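Before uploading, it can help to validate the file locally against these limits and the per-line schema. A minimal checker, assuming only the requirements stated above (UTF-8 JSONL, required `custom_id` and `body.messages`, unique `custom_id` values; the function name and error messages are illustrative):

```python
import json

def validate_batch_file(path, max_bytes=5 * 1024**3):
    """Return a list of problems found in a batch input file (empty list = OK)."""
    errors = []
    seen_ids = set()
    with open(path, "rb") as f:
        data = f.read()
    if len(data) > max_bytes:
        errors.append(f"file exceeds {max_bytes} bytes")
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {lineno}: not valid JSON")
            continue
        cid = obj.get("custom_id")
        if not isinstance(cid, str) or not cid:
            errors.append(f"line {lineno}: missing or invalid custom_id")
        elif cid in seen_ids:
            errors.append(f"line {lineno}: duplicate custom_id {cid!r}")
        else:
            seen_ids.add(cid)
        body = obj.get("body")
        if not isinstance(body, dict) or not isinstance(body.get("messages"), list):
            errors.append(f"line {lineno}: body.messages missing or not a list")
    return errors
```

Running this before submission catches the most common failure modes (malformed JSON, duplicate identifiers) without waiting for the asynchronous job to report them.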
Batch processing does not yet support uploading input files or downloading output files via API. If you need this capability, please contact our sales team.