Oral Sessions
Day 1: 8 January
Paper ID | Paper Title | Authors |
---|---|---|
196 | RoLD: Robot Latent Diffusion for Multi-task Policy Modeling | Tan, Wenhui; Liu, Bei; Zhang, Junbo; Song, Ruihua; Fu, Jianlong |
379 | TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration | Li, Yizhou; Liu, Zihua; Monno, Yusuke; Okutomi, Masatoshi |
451 | ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing | Zhang, Zhihui; Pang, Jinhui; Li, Jianan; Hao, Xiaoshuai |
462 | Flat Local Minima for Continual Learning on Semantic Segmentation | Huang, Zhongzhan; Liang, Mingfu; Liang, Senwei; Zhong, Shanshan |
Paper ID | Paper Title | Authors |
---|---|---|
268 | AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies | Lincker, Elise; Guinaudeau, Camille; Satoh, Shin’ichi |
310 | KuzushijiDiffuser: Japanese Kuzushiji Font Generation with FontDiffuser | Yuan, Honghui; Yanai, Keiji |
167 | Saliency Guided Optimization Of Diffusion Latents | Wang, Xiwen; Zhou, Jizhe; Li, Mao; Zhu, Xuekang; Li, Cheng |
308 | Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation | Chen, Zhuowei; Huang, Mengqi; Chen, Nan; Mao, Zhendong |
Paper ID | Paper Title | Authors |
---|---|---|
273 | Operatic Singing Voice Synthesis From Inexperienced Voice Considering Tempo and Vowel Change | Sugahara, Aoto; Kishimoto, Soma; Adachi, Yuji; Tai, Kiyoto; Takashima, Ryoichi; Takiguchi, Tetsuya |
129 | Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation | Lv, Yishan; Luo, Jing; Ju, Boyuan; Yang, Xinyu |
430 | WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition | Li, Feng; Luo, Jiusong; Xia, Wanjun |
374 | SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models | Chang, Ding-Chi; Li, Shiou-Chi; Huang, Jen-Wei |
Day 2: 9 January
Paper ID | Paper Title | Authors |
---|---|---|
236 | MineTinyNet-YOLO: An Efficient Small Object Detection Method for Complex Underground Coal Mine Scenarios | Yaling, Hao; Wei, Wu |
436 | Mix-YOLONet: Deep Image Dehazing for Improving Object Detection | Lim, Xin; Wong, Lai-Kuan; Loh, Yuen Peng; Gu, Ke; Lin, Weisi |
411 | Counting Unique Objects in Geo-Tagged Street Images: A Case Study Of Homeless Encampments in Los Angeles | Ghasemi, Narges; Kim, Seon Ho; Alfarrarjeh, Abdullah; Shahabi, Cyrus |
181 | HCV: Lightweight Hybrid CNN-Vision Transformer for Visual Object Tracking | Chen, Liang-Chia; Chu, Wei-Ta |
Paper ID | Paper Title | Authors |
---|---|---|
174 | Detoxification of Unlabeled Dataset: Reducing Implicit Class Imbalance Using Pseudo-Jacobian of GAN’s Generator | Suyama, Kosei; Nakamura, Kazuaki |
244 | Making Strides Security in Multimodal Fake News Detection Models: A Comprehensive Analysis of Adversarial Attacks | Si, Jiahua; Wang, Youze; Hu, Wenbo; Liu, Qiang; Hong, Richang |
415 | AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection | Xu, Xiaoman; Li, Xiangrun; Wang, Taihang; Jiang, Ye |
Paper ID | Paper Title | Authors |
---|---|---|
297 | Uncertainty-guided Joint Semi-supervised Segmentation and Registration of Cardiac Images | Chen, Junjian; Yang, Xuan |
337 | Wavelet Integrated Convolutional Neural Network for ECG Signal Denoising | Terada, Takamasa; Toyoura, Masahiro |
392 | MPPQNet: A Moment-Preserving Product Quantization Neural Network for Progressive 3D Point Cloud Transmission | Cheng, Shyi-Chyi; Chen, Yen-Lin; Li, Shih-Yu |
Day 3: 10 January
Paper ID | Paper Title | Authors |
---|---|---|
218 | A Multi-Expert Collaborative Framework for Multimodal Named Entity Recognition | Xu, Bo; Jiang, Haiqi; Wei, Shouang; Du, Ming; Song, Hui; Wang, Hongya |
266 | SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition | Sharma, Nikhil; Sun, Changchang; Zhao, Zhenghao; Ngu, Anne Hee Hiong; Latapie, Hugo; Yan, Yan |
395 | Open-vocabulary Scene Graph Generation via Synonym-based Predicate Descriptor | Goto, Yuta; Yamazaki, Satoshi; Shibata, Takashi; Liu, Jianquan |
274 | Grounding Deliberate Reasoning in Multimodal Large Language Models | Chen, Jiaxing; Liu, Yuxuan; Li, Dehu; An, Xiang; Deng, Weimo; Feng, Ziyong; Zhao, Yongle; Xie, Yin |
Paper ID | Paper Title | Authors |
---|---|---|
193 | Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Huang, Jia-Hong; Zhu, Hongyi; Shen, Yixian; Rudinac, Stevan; Kanoulas, Evangelos |
288 | Enhanced Anomaly Detection in 3D Motion through Language-Inspired Occlusion-Aware Modeling | Li, Su; Wang, Liang; Wang, Jianye; Zhang, Ziheng; Zhang, Junjun; Zhang, Lei |
364 | Evaluating VQA Models' Consistency in the Scientific Domain | Quan, Khanh-An C.; Guinaudeau, Camille; Satoh, Shin’ichi |
Panel Discussion
Paper ID | Paper Title | Authors |
---|---|---|
346 | RobSparse: Automatic Search for GPU-Friendly Robust and Sparse Vision Transformers | Su, Yulan; Zhang, Sisi; Wang, Yan; Wang, Xingbin; Zhao, Lutan; Dan, Meng; Hou, Rui |
232 | Image-Generation AI Model Retrieval by Contrastive Learning-based Style Distance Calculation | Vu, Thi Ngoc Anh; Shoji, Yoshiyuki; Oe, Yuma; Pham, Huu Long; Ohshima, Hiroaki |
414 | Dynamic Exploration Graph: A Novel Approach for Efficient Nearest Neighbor Search in Evolving Multimedia Datasets | Hezel, Nico; Barthel, Kai Uwe; Schilling, Bruno; Schall, Konstantin; Jung, Klaus |