# PaddleOCR-VL
## Architecture

*(Figure: PaddleOCR-VL architecture diagram)*
## Inference with vLLM & SGLang
Set the vLLM configuration:

`vllm_config.yaml`:

```yaml
gpu-memory-utilization: 0.8
max-num-seqs: 1024
max-num-batched-tokens: 1024
```
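These keys mirror vLLM's engine arguments. For reference only (the `paddlex_genai_server` below reads them from the YAML), an equivalent standalone vLLM launch would look roughly like this; the model path is a placeholder:

```bash
# Reference sketch: the same settings passed as vLLM CLI flags
vllm serve <model-path> \
  --gpu-memory-utilization 0.8 \
  --max-num-seqs 1024 \
  --max-num-batched-tokens 1024
```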
Start the inference server:

```bash
docker run \
  -it \
  --rm \
  --gpus all \
  --network host \
  -v ./vllm_config.yaml:/app/vllm_config.yaml \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddlex-genai-vllm-server \
  paddlex_genai_server \
    --model_name PaddleOCR-VL-0.9B \
    --host 0.0.0.0 \
    --port 8118 \
    --backend vllm \
    --backend_config /app/vllm_config.yaml
```
> **Tip:** Pass `--backend sglang` instead of `--backend vllm` to use SGLang.
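Once the container is up, you can sanity-check the endpoint. This is a minimal sketch that assumes the server exposes vLLM's usual `/health` route and an OpenAI-compatible `/v1/models` route on the port configured above; adjust if your deployment differs:

```bash
# Liveness probe (assumes a vLLM-style /health endpoint)
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8118/health

# List the served model(s) (assumes an OpenAI-compatible API)
curl -s http://127.0.0.1:8118/v1/models
```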
## Run PaddleX Server
### Docker

Start a PaddlePaddle GPU container:
```bash
docker run --gpus all \
  --name paddlex \
  -v $PWD:/paddle \
  --shm-size=8G \
  --network=host \
  -it \
  ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.0.0-gpu-cuda12.6-cudnn9.5-trt10.5 \
  /bin/bash
```
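Inside the container, it is worth confirming that PaddlePaddle can see the GPU before continuing; `paddle.utils.run_check()` is PaddlePaddle's built-in installation check:

```bash
# Verify the PaddlePaddle installation and GPU availability
python -c "import paddle; paddle.utils.run_check()"
```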
### CLI

Install dependencies:

```bash
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -m pip install paddlex
python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
paddlex --install serving
```
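To confirm the installation, print the package versions (assuming both packages expose a standard `__version__` attribute):

```bash
# Quick sanity check; assumes __version__ is defined on both packages
python -c "import paddle, paddlex; print(paddle.__version__, paddlex.__version__)"
```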
Configure the pipeline:

```bash
# Get the pipeline config file
paddlex --get_pipeline_config PaddleOCR-VL
```
Then edit `PaddleOCR-VL.yaml` so that `backend` and `server_url` point at the inference server you started above (use `backend: sglang` if you launched the SGLang backend):

```yaml
VLRecognition:
  genai_config:
    backend: vllm
    server_url: http://127.0.0.1:8118
```
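If you prefer to script the edit, a tool like yq can set the fields in place; this sketch assumes yq v4 is installed:

```bash
# Hypothetical in-place edit of the generated config (yq v4 syntax)
yq -i '.VLRecognition.genai_config.backend = "vllm"' PaddleOCR-VL.yaml
yq -i '.VLRecognition.genai_config.server_url = "http://127.0.0.1:8118"' PaddleOCR-VL.yaml
```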
Run the server:

```bash
paddlex --serve --pipeline PaddleOCR-VL
# or pass a specific pipeline config file
paddlex --serve --pipeline PaddleOCR-VL.yaml
```
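With the server up (PaddleX serving listens on port 8080 by default), you can send a document for parsing. The request below is a sketch: the `/layout-parsing` route and the `file`/`fileType` fields follow the pattern of PaddleX's document-parsing serving APIs, so check the serving docs for the exact schema of this pipeline:

```bash
# Hypothetical request: send a base64-encoded image to the pipeline server.
# fileType: 0 = PDF, 1 = image (PaddleX serving convention).
IMAGE_B64=$(base64 -w0 sample.png)
curl -s http://127.0.0.1:8080/layout-parsing \
  -H "Content-Type: application/json" \
  -d "{\"file\": \"${IMAGE_B64}\", \"fileType\": 1}"
```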