Technical Report · 2025 · Deployed Production System

Automated Multimodal Analysis of Conversational Video Content

AI-Assisted Video Annotation via Structured Cinematographic Knowledge Formalization
Sedat Bin Vedat · Hagia Labs, AI Yapım
Kerem Çatay · AI Yapım, Ay Yapım
Arda Sar · Hagia Labs
Enes Kutay Yarkan · Hagia Labs
Meftun Akarsu · Hagia Labs
Recep Kaan Karaman · Hagia Labs

Abstract

We present a deployed AI-assisted video annotation system that achieved an 83% reduction in production annotation time through structured formalization of expert cinematographic knowledge. Evaluated across 12 production projects over 3 months, the system reduced annotation effort from 150–200 person-hours to 25–30 person-hours while improving annotation accuracy (F1: 91% vs 89%) and consistency (terminology standardization: 94% vs 82%).

Our core contribution is a methodology for encoding tacit director expertise into JSON-formatted schemas that enable Google Gemini to serve as an intelligent annotation assistant. The system structures cinematographic conventions (shot types, camera movements, composition principles) into machine-readable specifications that guide AI analysis and enable iterative human-AI validation loops.

Deployed in multilingual production environments supporting nine languages, the system demonstrates 95% translation accuracy for technical terminology and 91% cross-lingual semantic consistency. Statistical analysis across projects shows significant improvements in temporal precision, fewer revision cycles, and lower multilingual communication overhead.
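
To make the idea of a JSON-formatted cinematographic schema more concrete, the sketch below shows one way such a specification might look and how a returned annotation could be checked against it during human-AI validation. Everything here is an illustrative assumption: the field names (`shot_type`, `camera_movement`, `composition`, `start_s`, `end_s`), the vocabularies, and the helpers `validate_annotation` and `build_prompt` are hypothetical and are not taken from the deployed system. No model API is called; the sketch only builds the prompt text and validates a record.

```python
import json

# Hypothetical schema: field names and vocabularies are illustrative assumptions,
# not the schema used by the deployed system.
SHOT_SCHEMA = {
    "shot_type": ["wide", "medium", "close_up", "extreme_close_up"],
    "camera_movement": ["static", "pan", "tilt", "dolly", "handheld"],
    "composition": ["rule_of_thirds", "centered", "symmetrical"],
}


def validate_annotation(annotation, schema=SHOT_SCHEMA):
    """Return a list of problems found in one annotation record (empty if valid)."""
    problems = []
    # Each categorical field must use a value from the controlled vocabulary.
    for field, allowed in schema.items():
        value = annotation.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} is not one of {allowed}")
    # The time span must be numeric, non-negative, and correctly ordered.
    start, end = annotation.get("start_s"), annotation.get("end_s")
    numeric = isinstance(start, (int, float)) and isinstance(end, (int, float))
    if not (numeric and 0 <= start < end):
        problems.append("start_s/end_s must be numbers with 0 <= start_s < end_s")
    return problems


def build_prompt(schema=SHOT_SCHEMA):
    """Embed the schema in an instruction string that could be sent to a multimodal model."""
    return (
        "Annotate each shot as a JSON object using only these values:\n"
        + json.dumps(schema, indent=2)
    )


good = {"shot_type": "close_up", "camera_movement": "static",
        "composition": "centered", "start_s": 12.0, "end_s": 15.5}
bad = {"shot_type": "zoomy", "camera_movement": "pan",
       "composition": "centered", "start_s": 20.0, "end_s": 18.0}

print(validate_annotation(good))      # valid record: no problems reported
print(len(validate_annotation(bad)))  # unknown shot type and reversed time span
```

Restricting each field to a fixed vocabulary is one plausible way to obtain the kind of terminology consistency the abstract reports, since any out-of-vocabulary value is flagged for human review rather than silently accepted.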

Keywords: Video Annotation, Large Multimodal Models, Gemini, Production Workflow, Human-AI Collaboration, Structured Metadata, Knowledge Formalization

Contact: sedat@hagiaproject.com