Unsupervised Audio-Visual Lecture Segmentation

10/29/2022
by Darshan Singh S, et al.

Over the last decade, online lecture videos have become increasingly popular, and their use rose sharply during the pandemic. However, video-language research has primarily focused on instructional videos or movies, and tools to help students navigate the growing number of online lectures are lacking. Our first contribution is to facilitate research in the educational domain by introducing AVLectures, a large-scale dataset consisting of 86 courses with over 2,350 lectures covering various STEM subjects. Each course contains video lectures, transcripts, OCR outputs for lecture frames, and optionally lecture notes, slides, assignments, and related educational content that can inspire a variety of tasks. Our second contribution is video lecture segmentation, which splits lectures into bite-sized topics and shows promise in improving learner engagement. We formulate lecture segmentation as an unsupervised task that leverages visual, textual, and OCR cues from the lecture, with clip representations fine-tuned on a self-supervised pretext task of matching the narration with the temporally aligned visual content. We use these representations to generate segments with TW-FINCH, a temporally consistent 1-nearest neighbor algorithm. We evaluate our method on 15 courses and compare it against various visual and textual baselines, outperforming all of them. Our comprehensive ablation studies also identify the key factors driving the success of our approach.
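
To make the clustering idea concrete, here is a minimal Python sketch of one round of temporally weighted 1-nearest-neighbor linking over time-ordered clip embeddings. It is an illustration under stated assumptions, not the authors' implementation. The function names (`cosine_distance`, `first_neighbor_segments`, `boundaries`) are hypothetical. The weighting used here, cosine distance multiplied by the normalized temporal gap, is an assumption of this sketch. The toy 2-D embeddings stand in for the learned clip representations. A full hierarchical method would repeat the linking on cluster means until a desired number of segments is reached; that recursion is omitted.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def first_neighbor_segments(clips):
    """Label time-ordered clips by one round of temporally weighted 1-NN linking.

    Assumption of this sketch: the distance between clips i and j is their
    cosine distance scaled by the normalized temporal gap |i - j| / n.
    """
    n = len(clips)

    # Find each clip's single nearest neighbor under the weighted distance.
    nearest = []
    for i in range(n):
        best = min(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_distance(clips[i], clips[j]) * abs(i - j) / n,
        )
        nearest.append(best)

    # Union-find: clips linked through nearest-neighbor edges share a cluster.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in enumerate(nearest):
        parent[find(i)] = find(j)

    # Relabel clusters 0, 1, 2, ... in order of first appearance in time.
    seen = {}
    labels = []
    for i in range(n):
        root = find(i)
        seen.setdefault(root, len(seen))
        labels.append(seen[root])
    return labels


def boundaries(labels):
    """Return the indices at which the cluster label changes."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy example: three clips on one topic followed by three on another.
clips = [[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.1, 0.9], [0.05, 1]]
labels = first_neighbor_segments(clips)
print(labels)              # [0, 0, 0, 1, 1, 1]
print(boundaries(labels))  # [3]
```

Scaling the feature distance by the temporal gap discourages links between clips that look similar but lie far apart in the lecture, which is what keeps the resulting segments contiguous in time.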

research
10/26/2020

Classification of Important Segments in Educational Videos using Multimodal Features

Videos are a commonly-used type of content in learning during Web search...
research
09/03/2021

PEEK: A Large Dataset of Learner Engagement with Educational Videos

Educational recommenders have received much less attention in comparison...
research
01/14/2022

CLUE: Contextualised Unified Explainable Learning of User Engagement in Video Lectures

Predicting contextualised engagement in videos is a long-standing proble...
research
03/20/2021

Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation

Action segmentation refers to inferring boundaries of semantically consi...
research
05/11/2016

Unsupervised Semantic Action Discovery from Video Collections

Human communication takes many forms, including speech, text and instruc...
research
10/12/2022

LiveSeg: Unsupervised Multimodal Temporal Segmentation of Long Livestream Videos

Livestream videos have become a significant part of online learning, whe...
