NarrAD: Automatic Generation of Audio Descriptions for Movies with Rich Narrative Context
Audio Description (AD) is narration that makes videos accessible to visually impaired people by conveying their key visual elements. Automating AD generation for long-form videos such as movies and dramas therefore has high social value, but it is challenging for two reasons. First, AD must reflect the narrative context of the entire movie, including the storyline, the names of characters and places, and the cultural setting. Second, to avoid disrupting the viewer's immersion, AD must not overlap with the characters' dialogue, so many visual elements have to be delivered in concise sentences. This paper presents NarrAD, a training-free AD generation framework that meets both requirements by leveraging the rich narrative context in movie scripts and curating information across narration slots. Experiments on the MAD dataset show that our approach outperforms prior work on both captioning and LLM-based metrics. In a user study with 600 participants, NarrAD achieves the highest scores for user experience and movie comprehension. NarrAD's AD samples are available at https://bit.ly/4aSwOTr.
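The abstract states that AD must fit between dialogue lines but does not say how the available narration slots are found. The sketch below assumes a slot is a silent gap between timed dialogue lines (for example, from subtitles) that lasts at least `min_gap` seconds. It also assumes narration is spoken at a fixed rate, so each slot gets a word budget. The 2.5 words-per-second default, the `Slot` type, and `find_narration_slots` are hypothetical illustrations, not NarrAD's implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A dialogue-free interval (seconds) and how many words fit in it."""
    start: float
    end: float
    max_words: int


def find_narration_slots(dialogues, clip_start, clip_end,
                         min_gap=1.0, words_per_second=2.5):
    """Return silent gaps between dialogue lines, each with a word budget.

    Assumption (hypothetical): narration is read at a constant
    `words_per_second`, and gaps shorter than `min_gap` are unusable.
    """
    # Keep only dialogue that overlaps the clip, clipped to its bounds.
    intervals = sorted(
        (max(s, clip_start), min(e, clip_end))
        for s, e in dialogues
        if e > clip_start and s < clip_end
    )
    # Merge overlapping or touching dialogue intervals.
    merged = []
    for s, e in intervals:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    # Walk the merged intervals; a sentinel at clip_end captures the tail gap.
    slots = []
    cursor = clip_start
    for s, e in merged + [[clip_end, clip_end]]:
        gap = s - cursor
        if gap >= min_gap:
            slots.append(Slot(cursor, s, math.floor(gap * words_per_second)))
        cursor = max(cursor, e)
    return slots


slots = find_narration_slots([(2.0, 5.0), (4.5, 8.0), (12.0, 13.0)], 0.0, 20.0)
for slot in slots:
    print(slot)
```

Because each budget is proportional to the gap length, short gaps force very terse descriptions, which is the concision constraint the abstract describes.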