Configuring Custom Models: Vision & Parallel Tool Calls

Overview

This page documents configuration patterns for using custom models (e.g. Qwen or custom fine-tunes) with LiteLLM as a proxy and OpenCode as the agentic frontend.

When using non-standard models, both LiteLLM and OpenCode require explicit configuration to advertise and support advanced capabilities like vision (image input) and parallel tool calls. Without proper setup, these features silently fail.


Vision / Multimodal Support for Custom Models

Configuring image support for custom models in a llama.cpp + LiteLLM + OpenCode setup.

The Problem

LiteLLM does not infer vision support for arbitrary custom_openai models. OpenCode also performs its own preflight modality check and does not read LiteLLM's supports_vision metadata for custom providers. Both layers must be configured independently.

LiteLLM Side
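Declare vision support on the model entry in the LiteLLM proxy config via model_info. A minimal sketch of a config.yaml entry, assuming a llama.cpp server on localhost:8080; the model name, api_base, and api_key are placeholders for your deployment:

model_list:
  - model_name: Qwen3.6-35B-A3B
    litellm_params:
      model: openai/Qwen3.6-35B-A3B        # OpenAI-compatible route to llama.cpp
      api_base: http://localhost:8080/v1
      api_key: none                        # llama.cpp typically ignores the key
    model_info:
      supports_vision: true                # advertise image input for this model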

OpenCode Side
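OpenCode's preflight check reads modalities from its own provider config, not from LiteLLM, so image input must be declared there as well. The model entry below goes under the provider's models map in OpenCode's config file (commonly opencode.json; the exact path depends on your setup).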

Working OpenCode Model Shape

"Qwen3.6-35B-A3B": {
  "name": "Qwen3.6-35B-A3B",
  "limit": {
    "context": 262144,
    "output": 65536
  },
  "modalities": {
    "input": ["text", "image"],
    "output": ["text"]
  }
}
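With "image" listed under modalities.input, OpenCode lets you attach images to a prompt and sends them as OpenAI-style image_url content parts, which LiteLLM relays to the backend. Note that llama.cpp only processes the image if the server was started with the model's multimodal projector (the --mmproj flag), so the chain must be multimodal end to end.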

Parallel Tool Calls

Enabling parallel tool calls for a custom model served through LiteLLM and consumed by OpenCode.

The Problem

Setting parallel_tool_calls: true in model_info is metadata only; LiteLLM does not automatically forward it to the API. Without an explicit pass-through, the downstream inference server never receives the flag.

OpenCode Side
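In this setup the fix lives on the LiteLLM side; OpenCode surfaces multiple tool calls from a single response without a dedicated setting. If your OpenCode version's model schema exposes a tool_call flag (this is an assumption; check the schema for your build), make sure it is enabled on the model entry:

"Qwen3.6-35B-A3B": {
  ...
  "tool_call": true
}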

LiteLLM Side

Fix: Add parallel_tool_calls inside litellm_params:

"litellm_params": {
  ...
  "parallel_tool_calls": true
}
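For context, a complete LiteLLM model entry might look like the following sketch in config.yaml form. The model name is carried over from the vision example above; the api_base and the choice of vLLM are assumptions, so adjust for your deployment:

model_list:
  - model_name: Qwen3.6-35B-A3B
    litellm_params:
      model: openai/Qwen3.6-35B-A3B
      api_base: http://localhost:8000/v1   # vLLM or another OpenAI-compatible server
      parallel_tool_calls: true            # actually forwarded with each request
    model_info:
      parallel_tool_calls: true            # metadata only, as noted above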

Prerequisites

  1. LiteLLM must be v1.61.0 or newer (parallel tool call pass-through was added in that release)
  2. The downstream inference server (vLLM, TGI, etc.) must support parallel tool calls
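
To verify the flag actually reaches the backend, send a request with tools through the proxy and check that the response carries multiple entries in the tool_calls array. A sketch, assuming the proxy listens on localhost:4000 and no master key is enforced (the bearer token is a placeholder):

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-placeholder" \
  -d '{
    "model": "Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "What is the weather in Paris and in Tokyo?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "parallel_tool_calls": true
  }'

A model that supports the feature should answer with a single assistant message whose tool_calls array contains one call per city.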

Files Reference
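
The configuration above lives in two files; paths are typical defaults and may differ in your deployment:

  1. LiteLLM proxy config (e.g. config.yaml, passed via litellm --config): model_list entries with litellm_params and model_info
  2. OpenCode config (e.g. opencode.json in the project root, or a global config under ~/.config/opencode/): provider and model definitions, including modalities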

