Oral Sessions

Day 1: 8 January

11:00 – 12:00
Best Paper Session
Chair: Toshihiko Yamasaki (The University of Tokyo)
Paper ID | Paper Title | Authors
196 | RoLD: Robot Latent Diffusion for Multi-task Policy Modeling | Tan, Wenhui; Liu, Bei; Zhang, Junbo; Song, Ruihua; Fu, Jianlong
379 | TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration | Li, Yizhou; Liu, Zihua; Monno, Yusuke; Okutomi, Masatoshi
451 | ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing | Zhang, Zhihui; Pang, Jinhui; Li, Jianan; Hao, Xiaoshuai
462 | Flat Local Minima for Continual Learning on Semantic Segmentation | Huang, Zhongzhan; Liang, Mingfu; Liang, Senwei; Zhong, Shanshan
15:30 – 16:30
Oral Session 1: Content Generation
Chair: Luwei Zhang (The University of Tokyo)
Paper ID | Paper Title | Authors
268 | AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies | Lincker, Elise; Guinaudeau, Camille; Satoh, Shin’ichi
310 | KuzushijiDiffuser: Japanese Kuzushiji Font Generation with FontDiffuser | Yuan, Honghui; Yanai, Keiji
167 | Saliency Guided Optimization Of Diffusion Latents | Wang, Xiwen; Zhou, Jizhe; Li, Mao; Zhu, Xuekang; Li, Cheng
308 | Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation | Chen, Zhuowei; Huang, Mengqi; Chen, Nan; Mao, Zhendong
16:45 – 17:45
Oral Session 2: Audio Analysis
Chair: Ling Xiao (The University of Tokyo)
Paper ID | Paper Title | Authors
273 | Operatic Singing Voice Synthesis From Inexperienced Voice Considering Tempo and Vowel Change | Sugahara, Aoto; Kishimoto, Soma; Adachi, Yuji; Tai, Kiyoto; Takashima, Ryoichi; Takiguchi, Tetsuya
129 | Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation | Lv, Yishan; Luo, Jing; Ju, Boyuan; Yang, Xinyu
430 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Li, Feng; Luo, Jiusong; Xia, Wanjun
374 | SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models | Chang, Ding-Chi; Li, Shiou-Chi; Huang, Jen-Wei

Day 2: 9 January

9:30 – 10:30
Oral Session 3: Object Detection, Recognition, and Tracking
Chair: Wei-Ta Chu (National Cheng Kung University)
Paper ID | Paper Title | Authors
236 | MineTinyNet-YOLO: An Efficient Small Object Detection Method for Complex Underground Coal Mine Scenarios | Yaling, Hao; Wei, Wu
436 | Mix-YOLONet: Deep Image Dehazing for Improving Object Detection | Lim, Xin; Wong, Lai-Kuan; Loh, Yuen Peng; Gu, Ke; Lin, Weisi
411 | Counting Unique Objects in Geo-Tagged Street Images: A Case Study Of Homeless Encampments in Los Angeles | Ghasemi, Narges; Kim, Seon Ho; Alfarrarjeh, Abdullah; Shahabi, Cyrus
181 | HCV: Lightweight Hybrid CNN-Vision Transformer for Visual Object Tracking | Chen, Liang-Chia; Chu, Wei-Ta
10:45 – 11:30
Oral Session 4: Trusted and Explainable AI
Chair: Kazuaki Nakamura (Tokyo University of Science)
Paper ID | Paper Title | Authors
174 | Detoxification of Unlabeled Dataset: Reducing Implicit Class Imbalance Using Pseudo-Jacobian of GAN’s Generator | Suyama, Kosei; Nakamura, Kazuaki
244 | Making Strides Security in Multimodal Fake News Detection Models: A Comprehensive Analysis of Adversarial Attacks | Si, Jiahua; Wang, Youze; Hu, Wenbo; Liu, Qiang; Hong, Richang
415 | AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection | Xu, Xiaoman; Li, Xiangrun; Wang, Taihang; Jiang, Ye
15:00 – 15:45
Oral Session 5: Signal Processing
Chair: Masahiro Toyoura (University of Yamanashi)
Paper ID | Paper Title | Authors
297 | Uncertainty-guided Joint Semi-supervised Segmentation and Registration of Cardiac Images | Chen, Junjian; Yang, Xuan
337 | Wavelet Integrated Convolutional Neural Network for ECG Signal Denoising | Terada, Takamasa; Toyoura, Masahiro
392 | MPPQNet: A Moment-Preserving Product Quantization Neural Network for Progressive 3D Point Cloud Transmission | Cheng, Shyi-Chyi; Chen, Yen-Lin; Li, Shih-Yu

Day 3: 10 January

10:30 – 11:30
Oral Session 6: Recognition and Reasoning
Chair: Satoshi Yamasaki (NEC)
Paper ID | Paper Title | Authors
218 | A Multi-Expert Collaborative Framework for Multimodal Named Entity Recognition | Xu, Bo; Jiang, Haiqi; Wei, Shouang; Du, Ming; Song, Hui; Wang, Hongya
266 | SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition | Sharma, Nikhil; Sun, Changchang; Zhao, Zhenghao; Ngu, Anne Hee Hiong; Latapie, Hugo; Yan, Yan
395 | Open-vocabulary Scene Graph Generation via Synonym-based Predicate Descriptor | Goto, Yuta; Yamazaki, Satoshi; Shibata, Takashi; Liu, Jianquan
274 | Grounding Deliberate Reasoning in Multimodal Large Language Models | Chen, Jiaxing; Liu, Yuxuan; Li, Dehu; An, Xiang; Deng, Weimo; Feng, Ziyong; Zhao, Yongle; Xie, Yin
15:00 – 16:00
Special Session: MLLMA
Chair: Rajiv Ratn Shah (IIIT-Delhi)
Paper ID | Paper Title | Authors
193 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Huang, Jia-Hong; Zhu, Hongyi; Shen, Yixian; Rudinac, Stevan; Kanoulas, Evangelos
288 | Enhanced Anomaly Detection in 3D Motion through Language-Inspired Occlusion-Aware Modeling | Li, Su; Wang, Liang; Wang, Jianye; Zhang, Ziheng; Zhang, Junjun; Zhang, Lei
364 | Evaluating VQA Models' Consistency in the Scientific Domain | C. Quan, Khanh-An; Guinaudeau, Camille; Satoh, Shin’ichi
Panel Discussion
16:15 – 17:00
Oral Session 7: Search and Retrieval
Chair: Nicolas Michel (The University of Tokyo)
Paper ID | Paper Title | Authors
346 | RobSparse: Automatic Search for GPU-Friendly Robust and Sparse Vision Transformers | Su, Yulan; Zhang, Sisi; Wang, Yan; Wang, Xingbin; Zhao, Lutan; Dan, Meng; Hou, Rui
232 | Image-Generation AI Model Retrieval by Contrastive Learning-based Style Distance Calculation | Vu, Thi Ngoc Anh; Shoji, Yoshiyuki; Oe, Yuma; Pham, Huu Long; Ohshima, Hiroaki
414 | Dynamic Exploration Graph: A Novel Approach for Efficient Nearest Neighbor Search in Evolving Multimedia Datasets | Hezel, Nico; Barthel, Kai Uwe; Schilling, Bruno; Schall, Konstantin; Jung, Klaus