tarsier
bytedance/tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with good capability of general video understanding.
539stars
Forks
31
Open issues
30
Watchers
539
Size
85.3 MB
PythonApache License 2.0
research
Created: Jul 5, 2024
Updated: Apr 23, 2026
Last push: Aug 14, 2025