tarsier
bytedance/tarsier
Tarsier -- a family of large-scale video-language models, which is designed to generate high-quality video descriptions , together with good capability of general video understanding.
530stars
Forks
29
Open issues
30
Watchers
530
Size
85.3 MB
PythonApache License 2.0
research
Created: Jul 5, 2024
Updated: Mar 12, 2026
Last push: Aug 14, 2025