Oral Presentations

paperID authors title
117 Wang, Zhensu ; Peng, Weilong ; Wang, Le ; Wu, Zhizhe ; Zhu, Peican ; Tang, Keke EIA: Edge-aware Imperceptible Adversarial Attacks on 3D Point Clouds
127 Zhang, Jiahao ; Gao, Guangyu ; Zhao, Xiao MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms
129 Lv, Yishan; Luo, Jing; Ju, Boyuan; Yang, Xinyu Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation
140 Li, Xiuhong; Zhu, Xinyue; Li, Boyuan; Li, Songlin; Wang, Luyao; Jia, Zhenhong Infrared Small Target Detection with Feature Refinement and Context Enhancement
167 Wang, Xiwen; Zhou, Jizhe; Li, Mao; Zhu, Xuekang; Li, Cheng Saliency Guided Optimization Of Diffusion Latents
174 Suyama, Kosei; Nakamura, Kazuaki Detoxification of Unlabeled Dataset: Reducing Implicit Class Imbalance Using Pseudo-Jacobian of GAN’s Generator
193 Huang, Jia-Hong; Zhu, Hongyi; Shen, Yixian; Rudinac, Stevan; Kanoulas, Evangelos Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models
196 Tan, Wenhui ; Liu, Bei ; Zhang, Junbo ; Song, Ruihua ; Fu, Jianlong RoLD: Robot Latent Diffusion for Multi-task Policy Modeling
199 Zhu, Jian ; Sheng, Mingkai ; Huang, Zhangmin ; Chang, Jingfei ; Long, Jian ; Jiang, Jinling ; Liu, Lei ; Luo, Cheng CLIP Multi-modal Hashing for Multimedia Retrieval
218 Xu, Bo; Jiang, Haiqi; Wei, Shouang; Du, Ming; Song, Hui; Wang, Hongya A Multi-Expert Collaborative Framework for Multimodal Named Entity Recognition
223 Yang, Xiukang ; Ge, Jingguo ; Li, Hui ; Li, Liangxiong ; Wu, Bingzhen Integrating S1&S2 Framework for Enhanced Semantic Match in Person Re-identification
232 Vu, Thi Ngoc Anh ; Shoji, Yoshiyuki ; Oe, Yuma ; Pham, Huu Long ; Ohshima, Hiroaki Image-Generation AI Model Retrieval by Contrastive Learning-based Style Distance Calculation
236 Yaling, Hao; Wei, Wu MineTinyNet-YOLO: An Efficient Small Object Detection Method for Complex Underground Coal Mine Scenarios
244 Si, Jiahua ; Wang, Youze ; Hu, Wenbo ; Liu, Qiang ; Hong, Richang Making strides Security in Multimodal Fake News Detection Models: A Comprehensive Analysis of Adversarial Attacks
266 Sharma, Nikhil ; Sun, Changchang ; Zhao, Zhenghao ; Ngu, Anne Hee Hiong ; Latapie, Hugo ; Yan, Yan SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition
268 Lincker, Elise ; Guinaudeau, Camille ; Satoh, Shin’ichi AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies
273 Sugahara, Aoto ; Kishimoto, Soma ; Adachi, Yuji ; Tai, Kiyoto ; Takashima, Ryoichi ; Takiguchi, Tetsuya Operatic Singing Voice Synthesis From Inexperienced Voice Considering Tempo and Vowel Change
274 Chen, Jiaxing ; Liu, Yuxuan ; Li, Dehu ; An, Xiang ; Deng, Weimo ; Feng, Ziyong ; Zhao, Yongle ; Xie, Yin Grounding Deliberate Reasoning in Multimodal Large Language Models
305 Zhu, Deli ; Xu, Zhao ; Yang, Yunong MambaTalk: Speech-driven 3D Facial Animation with Mamba
308 Chen, Zhuowei; Huang, Mengqi; Chen, Nan; Mao, Zhendong Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation
310 Yuan, Honghui; Yanai, Keiji KuzushijiDiffuser: Japanese Kuzushiji Font Generation with FontDiffuser
331 Minghui, Hou ; Gang, Wang ; Zhiyang, Wang ; Tongzhou, Zhang ; Baorui, Ma BLCC: A Benchmark for Multi-LiDAR and Multi-Camera Calibration
346 Su, Yulan; Zhang, Sisi; Wang, Yan; Wang, Xingbin; Zhao, Lutan; Dan, Meng; Hou, Rui RobSparse: Automatic Search for GPU-Friendly Robust and Sparse Vision Transformers
359 Zhao, Hui; Qi, Na; Zhu, Qing; Lin, Xiumin SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction
364 C. Quan, Khanh-An ; Guinaudeau, Camille ; Satoh, Shin’ichi Evaluating VQA Models' Consistency in the Scientific Domain
374 Chang, Ding-Chi; Li, Shiou-Chi; Huang, Jen-Wei SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models
379 Li, Yizhou; Liu, Zihua; Monno, Yusuke; Okutomi, Masatoshi TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration
385 Lu, Lingyi; Xu, Xin; Wang, Xiao Style Separation and Content Recovery for Generalizable Sketch Re-identification and A New Benchmark
393 Zhang, Zhengzhuo; Zhuang, Liansheng Progressive Neural Architecture Generation with Weaker Predictors
395 Goto, Yuta ; Yamazaki, Satoshi ; Shibata, Takashi ; Liu, Jianquan Open-vocabulary Scene Graph Generation via Synonym-based Predicate Descriptor
415 Xu, Xiaoman; Li, Xiangrun; Wang, Taihang; Jiang, Ye AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection
430 Li, Feng; Luo, Jiusong; Xia, Wanjun WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
451 Zhang, Zhihui ; Pang, Jinhui ; Li, Jianan ; Hao, Xiaoshuai ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing
462 Huang, Zhongzhan; Liang, Mingfu; Liang, Senwei; Zhong, Shanshan Flat Local Minima for Continual learning on Semantic Segmentation
214 Wei, Wei; Zhang, Bingkun; Wang, Yibing TACST: Time-Aware Transformer for Robust Speech Emotion Recognition
215 Wei, Wei; Zhang, Bingkun; Wang, Yibing TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion
288 Li, Su ; Wang, Liang ; Wang, Jianye ; Zhang, Ziheng ; Zhang, Junjun ; Zhang, Lei Enhanced Anomaly Detection in 3D Motion through Language-Inspired Occlusion-Aware Modeling
411 Ghasemi, Narges ; Kim, Seon Ho ; Alfarrarjeh, Abdullah ; Shahabi, Cyrus Counting Unique Objects in Geo-Tagged Street Images: A Case Study Of Homeless Encampments in Los Angeles