TensorRT-LLM
hpcaitech/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
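A minimal sketch of what using the high-level Python LLM API described above typically looks like. This assumes `tensorrt_llm` is installed and an NVIDIA GPU is available; the model name and sampling values are illustrative placeholders, not taken from this repository.

```python
# Sketch of the high-level TensorRT LLM Python API (LLM API).
# Assumes tensorrt_llm is installed and a supported NVIDIA GPU is present.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    # Sampling settings are illustrative values, not repository defaults.
    sampling = SamplingParams(temperature=0.8, top_p=0.95)

    # LLM() loads the model and builds/loads an optimized TensorRT engine.
    # The model identifier here is a hypothetical example.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # generate() runs batched inference and returns one result per prompt.
    for output in llm.generate(prompts, sampling):
        print(output.prompt, "->", output.outputs[0].text)

if __name__ == "__main__":
    main()
```

In practice the same `LLM` object is reused across many `generate()` calls, since engine construction is the expensive step.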
- Stars: 1
- Forks: 0
- Open issues: 0
- Watchers: 1
- Size: 1631.6 MB
- License: Apache License 2.0
- Created: Oct 13, 2025
- Updated: Jan 4, 2026
- Last push: Oct 13, 2025