ByteDance Open-Sources UI-TARS-1.5, a Multimodal Agent Model Built for Visual-Language Tasks| TMTPOST

Apr. 18, 2025

TechNode/TMTPost: ByteDance's Doubao large model team has officially released UI-TARS-1.5 as open source. The model is a multimodal agent built on a vision-language model architecture and designed to execute tasks efficiently in virtual environments. According to the team, UI-TARS-1.5 achieves state-of-the-art (SOTA) performance on seven GUI (graphical user interface) benchmarks. It also demonstrates long-horizon reasoning in gaming scenarios and robust interaction in open virtual spaces, which the team describes as a first for models of its kind. The open-source release is intended to accelerate research and development of intelligent agents that can navigate and act in visually rich, interactive environments.

Subscribe To Our News