
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python · License: Apache License 2.0
Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt, gpt-oss, inference, kimi, llama, llm, llm-serving, model-serving, moe, openai, pytorch, qwen, qwen3, tpu, transformer
Created: Feb 9, 2023
Updated: Feb 18, 2026
Last push: Feb 18, 2026
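
As a quick illustration of the offline-inference entry point the description refers to, here is a minimal sketch using vLLM's `LLM` and `SamplingParams` Python API, following the project's quickstart pattern; the model name and prompts are illustrative placeholders.

```python
# Minimal offline-inference sketch with vLLM's Python API.
# The model ID below is illustrative; any supported Hugging Face model works.
from vllm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "High-throughput LLM serving means",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model; vLLM manages GPU memory for the KV cache internally.
llm = LLM(model="facebook/opt-125m")

# Generate completions for all prompts as one batch.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")
```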