TensorRT-LLM
hpcaitech/TensorRT-LLM
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
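A minimal sketch of what using the high-level Python LLM API described above typically looks like. This assumes `tensorrt_llm` is installed and an NVIDIA GPU is available; the model name and sampling values are illustrative placeholders, not taken from this repository.

```python
# Sketch of the high-level TensorRT LLM Python API (LLM API).
# Assumes tensorrt_llm is installed and a supported NVIDIA GPU is present.
from tensorrt_llm import LLM, SamplingParams

def main():
    prompts = [
        "Hello, my name is",
        "The capital of France is",
    ]
    # Sampling settings are illustrative values, not repository defaults.
    sampling = SamplingParams(temperature=0.8, top_p=0.95)

    # LLM() loads the model and builds/loads an optimized TensorRT engine.
    # The model identifier here is a hypothetical example.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # generate() runs batched inference and returns one result per prompt.
    for output in llm.generate(prompts, sampling):
        print(output.prompt, "->", output.outputs[0].text)

if __name__ == "__main__":
    main()
```

In practice the same `LLM` object is reused across many `generate()` calls, since engine construction is the expensive step.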
- Stars: 1
- Forks: 0
- Open issues: 0
- Watchers: 1
- Size: 1631.6 MB
- License: Apache License 2.0
- Created: Oct 13, 2025
- Updated: Jan 4, 2026
- Last push: Oct 13, 2025