Milestone Launches Vision Language Model For Traffic Video

The generative AI-powered plug-in analyzes submitted video clips and produces natural-language descriptions within seconds, based on user-defined prompts.

Milestone Systems has released a new Vision Language Model (VLM) purpose-built for traffic understanding and real-world video intelligence. Powered by NVIDIA Cosmos Reason, the VLM underpins two new offerings: Video Summarization for XProtect® Video Management Software and Hafnia VLM as a Service (VLMaaS) for third-party integrations.

Designed to reduce video overload and manual review, the new solutions enable faster insights, automated reporting, and scalable AI-powered video intelligence across traffic and public safety environments.

Modern video systems generate massive volumes of footage, making manual review time-consuming and inefficient. Milestone’s new Video Summarization tool addresses this challenge by converting raw video into concise, searchable text summaries directly inside the XProtect Smart Client.

Early deployments indicate that video summarization can reduce false-alarm fatigue among operators by up to 30%, allowing teams to focus on relevant incidents rather than noise or irrelevant motion.

Milestone also introduced Hafnia VLM as a Service (VLMaaS), providing developers, integrators, and partners with API access to production-ready vision language intelligence.

VLMaaS eliminates the complexity of building and managing AI infrastructure, enabling rapid development of AI-powered applications regardless of existing analytics maturity. The platform supports everything from MVP testing to enterprise-scale deployments.

Milestone reports that VLMaaS can reduce development effort by up to

Andrew Burnett, Acting Chief Technology Officer at Milestone Systems, said the new offerings directly address long-standing industry challenges.

“With the Vision Language Model as a Service and Video Summarization for XProtect, we’re tackling some of the most challenging bottlenecks: video overload and time-consuming manual work. Operators get immediate insight directly within XProtect; builders get API-first access to production-ready intelligence without bespoke training or heavy infrastructure.”

He added that specialization in real-world traffic video and responsibly sourced data allows customers to deploy confidently and extract value from existing systems.

Cities such as Genoa, Italy, and Dubuque, Iowa, are among early adopters, using the new capabilities to advance intelligent traffic management initiatives.

Both solutions are powered by Milestone’s Hafnia VLM, fine-tuned on 75,000 hours of responsibly sourced real-world traffic video from Europe and the United States. Data preparation uses NVIDIA Cosmos Curator, and the model is deployed across cloud and regional data centers.

This combination of NVIDIA Cosmos Reason and Milestone’s domain-specific training positions Hafnia as one of the industry’s most advanced video AI platforms.
