Batch Inference (Batch Job) Development Guide

Batch inference is designed for scenarios that require offline processing of large volumes of LLM requests, such as batch text generation, data labeling, and content moderation. You create batch jobs by uploading input files; the system processes them asynchronously and makes the output file available for download once processing completes.

Supported Models

Batch inference currently supports only the following models:

| Model ID | Description |
| --- | --- |
| deepseek-ai/deepseek-v3.2 | DeepSeek V3.2 standard mode, for general conversation and text generation |
| deepseek-ai/deepseek-v3.2/thinking | DeepSeek V3.2 thinking mode, for complex reasoning and chain-of-thought tasks |
All requests in a single batch input file must use the same model; mixing different models in one job is not supported.

Development Flow Overview

1. Prepare input file: write request data in JSONL format
2. Create batch job: upload the input file to start a job
3. Wait for processing: the system processes requests asynchronously
4. Download results: retrieve the output file once the job completes

Input File Format

The input file must be JSONL (one JSON object per line), where each line represents one Chat Completions request.

Per-Line Structure

Each line must include the following fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| custom_id | string | Yes | Unique request identifier, used to match results in the output file |
| body | object | Yes | Request body, aligned with the Chat Completions API parameters |

body Parameters

The body structure matches the Chat Completions request body. Main fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | List of conversation messages |
| max_tokens | integer | No | Maximum number of tokens to generate |
| temperature | number | No | Sampling temperature, range 0–2 |
| top_p | number | No | Nucleus sampling parameter |

Input File Example

{"custom_id":"req-001","body":{"messages":[{"role":"user","content":"Describe artificial intelligence in one sentence."}],"max_tokens":500}}
{"custom_id":"req-002","body":{"messages":[{"role":"system","content":"You are a professional technical writing assistant."},{"role":"user","content":"Explain what REST API is"}],"max_tokens":800}}
{"custom_id":"req-003","body":{"messages":[{"role":"user","content":"Derive the Pythagorean theorem"}],"max_tokens":2000}}

File Limits

  • Format: UTF-8 encoded JSONL
  • File size: Recommended not to exceed 5 GB per file
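Because a batch job fails or partially fails if the input file violates these limits, it is worth validating the file before upload. The following is a minimal local pre-check sketch, not an official tool: it verifies the file decodes as UTF-8, stays under the recommended size, and that every line is valid JSON carrying the required `custom_id` and `body` fields, with `custom_id` values unique.

```python
import json
import os

def validate_batch_input(path, max_bytes=5 * 1024**3):
    """Return a list of problems found in a batch input JSONL file.

    Checks: file size, UTF-8 decodability, per-line JSON validity,
    required custom_id (string, unique) and body (object) fields.
    """
    errors = []
    if os.path.getsize(path) > max_bytes:
        errors.append(f"file exceeds {max_bytes} bytes")
    seen_ids = set()
    # open() raises UnicodeDecodeError if the file is not valid UTF-8.
    with open(path, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # skip blank lines
            try:
                obj = json.loads(raw)
            except json.JSONDecodeError:
                errors.append(f"line {lineno}: not valid JSON")
                continue
            cid = obj.get("custom_id")
            if not isinstance(cid, str):
                errors.append(f"line {lineno}: missing string custom_id")
            elif cid in seen_ids:
                errors.append(f"line {lineno}: duplicate custom_id {cid!r}")
            else:
                seen_ids.add(cid)
            if not isinstance(obj.get("body"), dict):
                errors.append(f"line {lineno}: missing body object")
    return errors
```

An empty return list means the file passed all local checks; anything else lists the offending line numbers so the file can be fixed before the job is created.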

Batch processing does not yet support uploading input files or downloading output files via API. If you need this feature, please contact our sales team.