Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding

Shengyuan Ye, Bei Ouyang, Tianyi Qian, Liekang Zeng, Mu Yuan, Xiaowen Chu, Weijie Hong, Xu Chen

December 2025

Venus Overview

Abstract

Vision-language models (VLMs) have demonstrated impressive multimodal comprehension capabilities and are being deployed in an increasing number of online video understanding applications. While recent efforts extensively explore advancing VLMs’ reasoning power in these cases, deployment constraints are overlooked, leading to overwhelming system overhead in real-world deployments. To address that, we propose Venus, an on-device memory-and-retrieval system for efficient online video understanding. Venus proposes an edge–cloud disaggregated architecture that sinks memory construction and keyframe retrieval from cloud to edge, operating in two stages. In the ingestion stage, Venus continuously processes streaming edge videos via scene segmentation and clustering, where the selected keyframes are embedded with a multimodal embedding model to build a hierarchical memory for efficient storage and retrieval. In the querying stage, Venus indexes incoming queries from memory, and employs a threshold-based progressive sampling algorithm for keyframe selection that enhances diversity and adaptively balances system cost and reasoning accuracy.

Type

Conference paper

Publication

In IEEE International Conference on Computer Communications (INFOCOM), Tokyo, Japan, 18-21 May 2026, CCF-A, Acceptance rate = 18.9% (329/1740)

Shengyuan Ye

Ph.D. student at SMCLab

He is a Ph.D. student at School of Computer Science and Engineering, Sun Yat-sen University. His research interests include Resource-efficient AI Systems and Applications with Mobile AI.

Liekang Zeng

Ph.D., Sun Yat-sen University

He obtained Ph.D. degree at School of Computer Science and Engineering, Sun Yat-sen University. His research interest lies in building edge intelligence systems with real-time responsiveness, systematic resource efficiency, and theoretical performance guarantee.

Mu Yuan

Ph.D., USTC

He received PhD degree from University of Science and Technology of China in 2024, advised by Prof. Xiang-Yang Li and Prof. Lan Zhang. He was a postdoctoral fellow at CUHK, working with Prof. Guoliang Xing.His research interest lie in designing theory-backed algorithms and building innovative systems for AI workloads.

Xiaowen Chu

Professor, HKUST(GZ)
Acting Head, Data Science and Analytics Thrust

Dr. Chu is currently a Professor at the Data Science and Analytics Thrust, Information Hub of HKUST(GZ), and an Affiliate Professor in the Department of Computer Science and Engineering, HKUST. His current research interests include GPU Computing, Distributed Machine Learning, Cloud Computing, and Wireless Networks. He is especially interested in the modelling, parallel algorithm design, application optimization, and energy efficiency of GPU computing.

Xu Chen

Professor and Assistant Dean, Sun Yat-sen University
Director, Institute of Advanced Networking & Computing Systems

Xu Chen is a Full Professor with Sun Yat-sen University, Director of Institute of Advanced Networking and Computing Systems (IANCS), and the Vice Director of National Engineering Research Laboratory of Digital Homes. His research interest includes edge computing and cloud computing, federated learning, cloud-native intelligent robots, distributed artificial intelligence, intelligent big data analysis, and computing power network.