D01 | 468 | SelectSum: Topic-Based Selective Summarization of Speech-Based Videos | Wattasseril, Jobin Idiculla; Döllner, Jürgen
D02 | 469 | Real-time Visualizer for Turntablist Performance | Hamanaka, Masatoshi
D03 | 494 | Multi-Dimensional Exploration of Media Collection Metadata | Khan, Omar Shahbaz; Duane, Aaron; Hasnan, Hariz; Blavec, Noé Le; Ouvrard, Pierre; Verdon, Johan; d’Orazio, Laurent; Thierry, Constance; Jónsson, Björn Þór
D04 | 470 | DriveCoach: Smart Driving Assistance with Multimodal Risk Prediction and Risk Adaptive Behavior Recommendation | Gan, Wenbin; Dao, Minh-Son; Zettsu, Koji
D05 | 472 | System Demo of Modeling Smart University Campus Virtual Environments | Fernandez Roblero, Jaime Boanerjes; Ali, Muhammad Intizar
D06 | 473 | AMDA: Advancing Multimedia Data Annotation for Human-Centric Situations | Mohamed Serouis, Ibrahim; Sèdes, Florence
D07 | 475 | FencBuddy: Action-aware Depth Perception Training for Fencing Attacks | Peng, Hung-Yao; Zhong, Zi-Heng; Tsai, Cheng-Chih; Chiang, Ching-Yeh; Pan, Tse-Yu
D08 | 477 | WaveFontStyler: Font Style Transfer Based on Sound | Izumi, Kota; Yanai, Keiji
D09 | 479 | Training a Segmentation-based Visual Anonymization Service for Street Scenes | Korb, Martin; Bailer, Werner
D10 | 481 | CleverFox: Integrating Visual Mnemonics with AI for Enhanced Language Learning | Chiang, Yung-Chu; Tang, Zi-Xian; Luo, Yi-Ching; Chang, Jason S.
D11 | 482 | Fingering Prediction for Classical Guitar: Dataset Creation and Model Development | Iino, Nami; Iino, Akinaru
D12 | 483 | An Implementation of Networked JamSketch | Kitahara, Tetsuro; Tsutsumi, Takuya; Nagoshi, Takaaki; Suzuki, Taizan
D13 | 485 | Using Language Models to Generate and Forget the Narrative Memories of an Assistive Robot | Garcia Contreras, Angel Fernando; Chang, Wen-Yu; Kawano, Seiya; Chen, Yun-Nung; Yoshino, Koichiro
D14 | 486 | Better Image Segmentation with Classification: Guiding Zero-Shot Models Using Class Activation Maps | Borgli, Hanna; Stensland, Håkon Kvale; Halvorsen, Pål
D15 | 488 | Transformer-Based Audio Generation Conditioned by 2D Latent Maps: A Demonstration | Limberg, Christian; Zhang, Zhe; Kastner, Marc A.
D16 | 489 | KuzushijiFontDiff: Diffusion Model for Japanese Kuzushiji Font Generation | Yuan, Honghui; Yanai, Keiji
D17 | 490 | SceneTextStyler: Editing Text with Style Transformation | Yuan, Honghui; Yanai, Keiji
D18 | 492 | Multimodal Interoperability with the CLAMS Platform | Lynch, Kelley; Rim, Kyeongmin; King, Owen; Pustejovsky, James
D19 | 493 | Enhancing User Control in AI-Based Video Summarization for Social Media | Kontostathis, Ioannis; Apostolidis, Evlampios; Apostolidis, Konstantinos; Mezaris, Vasileios
D20 | 496 | Movie Retrieval Systems Using Genre-guided Multimodal Learning Techniques | Huang, Wei-Lun; Hidayati, Shintami Chusnul; Pan, Tse-Yu
D21 | 497 | A User Identification and Reading Style Detection System Based on Eye Movement Patterns During Reading | Kongmeesub, Onanong; Gurrin, Cathal; Nie, Dongyun
D22 | 484 | Federated Learning with Multimodal-Sensing and Knowledge Distillation: An Application on Real-World Benchmark Dataset | Le, Duy-Dong; Huynh, Duy-Thanh; Bao, Pham The
D23 | 499 | Efficient Deployment of Multimodal AI Models: Leveraging Pruning, Quantization and Multi-Objective Optimization for Edge Computing | Vu, Dang; Dang, Tien; Nguyen, Quoc-Trung; Pham, Tan
D24 | 466 | Badminton Footwork Practice via an Immersive Virtual Reality System | Jheng, Duen-Chian; Harchan, Bill Louis; Kostka de Sztemberg, Berenika Nawoja; Hsu, Jen-Hao; Hu, Min-Chun
D25 | 480 | RoboDJ: Live Commentary Robots System Driven by Physical- and Cyber-world Observations | Kawanishi, Yasutomo; Nakamura, Yutaka; Shintani, Taiken; Ishi, Carlos T.; Kawano, Seiya; Yoshino, Koichiro; Minato, Takashi; Minoh, Michihiko
D26 | 487 | Leveraging Latent Diffusion in 3D Gaussian Splatting for Novel View Synthesis | Li, Bohan; Li, Xingyi; Liang, Yangwen; Wang, Shuangquan; Song, Kee-Bong