
PaddleOCR-VL

Architecture

Inference with vLLM & SGLang

Set vLLM configuration:

vllm_config.yaml

gpu-memory-utilization: 0.8
max-num-seqs: 1024
max-num-batched-tokens: 1024

Start the inference server:

docker run \
    -it \
    --rm \
    --gpus all \
    --network host \
    -v ./vllm_config.yaml:/app/vllm_config.yaml \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server \
    paddlex_genai_server \
    --model_name PaddleOCR-VL-0.9B \
    --host 0.0.0.0 \
    --port 8118 \
    --backend vllm \
    --backend_config /app/vllm_config.yaml
Tip: To use SGLang instead of vLLM, pass --backend sglang in place of --backend vllm.
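
The inference server can take a while to load the model before it accepts connections. As a minimal sketch, the helper below polls the TCP port from the command above (8118) until something is listening. It checks only that the port is open, not that the model is ready. The name wait_for_port and its parameters are illustrative and not part of PaddleX.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 300.0, interval_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it only tests that the port is open.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, calling wait_for_port("127.0.0.1", 8118) before starting the PaddleX server would block until the inference server's port is reachable or five minutes have passed.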

Run PaddleX Server

Docker

docker run --gpus all \
    --name paddlex \
    -v $PWD:/paddle \
    --shm-size=8G \
    --network=host \
    -it \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 \
    /bin/bash

CLI

Install dependencies:

python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install paddlex
python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl

paddlex --install serving
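
To confirm that the packages installed above are visible to the current Python environment, a small check along these lines may help. It reads installed distribution metadata using only the standard library. The function name installed_versions is illustrative.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names taken from the pip commands above
    for name, version in installed_versions(["paddlepaddle-gpu", "paddlex", "safetensors"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```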

Configure pipeline:

# Get the pipeline config file
paddlex --get_pipeline_config PaddleOCR-VL

PaddleOCR-VL.yaml

# Change backend and server_url to match your own inference server.
VLRecognition:
  genai_config:
    backend: vllm
    server_url: http://127.0.0.1:8118
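
If you would rather script this edit than make it by hand, the following is one possible sketch. It rewrites the backend and server_url entries under the first genai_config key by editing text lines. It is not a YAML parser, and it assumes genai_config contains only simple key: value children. The function name set_genai_config is hypothetical, and the sample text in the demo is a reduced stand-in for the real PaddleOCR-VL.yaml.

```python
def set_genai_config(yaml_text: str, backend: str, server_url: str) -> str:
    """Set backend/server_url under the first 'genai_config:' key (line-based edit)."""
    lines = yaml_text.splitlines()
    out = []
    i = 0
    done = False
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        if done or line.strip() != "genai_config:":
            continue
        base = len(line) - len(line.lstrip())
        child_indent = None
        kept = []
        # Consume the existing children of genai_config
        while i < len(lines):
            child = lines[i]
            indent = len(child) - len(child.lstrip())
            if child.strip():
                if indent <= base:
                    break  # dedent: the genai_config block has ended
                if child_indent is None:
                    child_indent = indent
            key = child.strip().split(":", 1)[0]
            if key not in ("backend", "server_url"):
                kept.append(child)  # keep unrelated keys and blank lines
            i += 1
        pad = " " * (child_indent if child_indent is not None else base + 2)
        out.append(f"{pad}backend: {backend}")
        out.append(f"{pad}server_url: {server_url}")
        out.extend(kept)
        done = True
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Reduced sample for illustration only; the real file has more keys
    sample = "VLRecognition:\n  genai_config:\n    backend: native\n"
    print(set_genai_config(sample, "vllm", "http://127.0.0.1:8118"), end="")
```

To apply it to the real file, read PaddleOCR-VL.yaml into a string, pass it through the function, and write the result back.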

Run server:

paddlex --serve --pipeline PaddleOCR-VL
# or pass a specific pipeline config file
paddlex --serve --pipeline PaddleOCR-VL.yaml

API Doc