117 |
Wang, Zhensu ; Peng, Weilong ; Wang, Le ; Wu, Zhizhe ; Zhu, Peican ; Tang, Keke |
EIA: Edge-aware Imperceptible Adversarial Attacks on 3D Point Clouds |
127 |
Zhang, Jiahao ; Gao, Guangyu ; Zhao, Xiao |
MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms |
129 |
Lv, Yishan; Luo, Jing; Ju, Boyuan; Yang, Xinyu |
Small Tunes Transformer: Exploring Macro & Micro-Level Hierarchies for Skeleton-Conditioned Melody Generation |
140 |
Li, Xiuhong; Zhu, Xinyue; Li, Boyuan; Li, Songlin; Wang, Luyao; Jia, Zhenhong |
Infrared Small Target Detection with Feature Refinement and Context Enhancement |
167 |
Wang, Xiwen; Zhou, Jizhe; Li, Mao; Zhu, Xuekang; Li, Cheng |
Saliency Guided Optimization Of Diffusion Latents |
174 |
Suyama, Kosei; Nakamura, Kazuaki |
Detoxification of Unlabeled Dataset: Reducing Implicit Class Imbalance Using Pseudo-Jacobian of GAN’s Generator |
193 |
Huang, Jia-Hong; Zhu, Hongyi; Shen, Yixian; Rudinac, Stevan; Kanoulas, Evangelos |
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models |
196 |
Tan, Wenhui ; Liu, Bei ; Zhang, Junbo ; Song, Ruihua ; Fu, Jianlong |
RoLD: Robot Latent Diffusion for Multi-task Policy Modeling |
199 |
Zhu, Jian ; Sheng, Mingkai ; Huang, Zhangmin ; Chang, Jingfei ; Long, Jian ; Jiang, Jinling ; Liu, Lei ; Luo, Cheng |
CLIP Multi-modal Hashing for Multimedia Retrieval |
218 |
Xu, Bo; Jiang, Haiqi; Wei, Shouang; Du, Ming; Song, Hui; Wang, Hongya |
A Multi-Expert Collaborative Framework for Multimodal Named Entity Recognition |
223 |
Yang, Xiukang ; Ge, Jingguo ; Li, Hui ; Li, Liangxiong ; Wu, Bingzhen |
Integrating S1&S2 Framework for Enhanced Semantic Match in Person Re-identification |
232 |
Vu, Thi Ngoc Anh ; Shoji, Yoshiyuki ; Oe, Yuma ; Pham, Huu Long ; Ohshima, Hiroaki |
Image-Generation AI Model Retrieval by Contrastive Learning-based Style Distance Calculation |
236 |
Yaling, Hao; Wei, Wu |
MineTinyNet-YOLO: An Efficient Small Object Detection Method for Complex Underground Coal Mine Scenarios |
244 |
Si, Jiahua ; Wang, Youze ; Hu, Wenbo ; Liu, Qiang ; Hong, Richang |
Making strides Security in Multimodal Fake News Detection Models: A Comprehensive Analysis of Adversarial Attacks |
266 |
Sharma, Nikhil ; Sun, Changchang ; Zhao, Zhenghao ; Ngu, Anne Hee Hiong ; Latapie, Hugo ; Yan, Yan |
SSDL: Sensor-to-Skeleton Diffusion Model with Lipschitz Regularization for Human Activity Recognition |
268 |
Lincker, Elise ; Guinaudeau, Camille ; Satoh, Shin’ichi |
AD2AT: Audio Description to Alternative Text, a Dataset of Alternative Text from Movies |
273 |
Sugahara, Aoto ; Kishimoto, Soma ; Adachi, Yuji ; Tai, Kiyoto ; Takashima, Ryoichi ; Takiguchi, Tetsuya |
Operatic Singing Voice Synthesis From Inexperienced Voice Considering Tempo and Vowel Change |
274 |
Chen, Jiaxing ; Liu, Yuxuan ; Li, Dehu ; An, Xiang ; Deng, Weimo ; Feng, Ziyong ; Zhao, Yongle ; Xie, Yin |
Grounding Deliberate Reasoning in Multimodal Large Language Models |
305 |
Zhu, Deli ; Xu, Zhao ; Yang, Yunong |
MambaTalk: Speech-driven 3D Facial Animation with Mamba |
308 |
Chen, Zhuowei; Huang, Mengqi; Chen, Nan; Mao, Zhendong |
Skin-Adapter: Fine-Grained Skin-Color Preservation for Text-to-Image Generation |
310 |
Yuan, Honghui; Yanai, Keiji |
KuzushijiDiffuser: Japanese Kuzushiji Font Generation with FontDiffuser |
331 |
Minghui, Hou ; Gang, Wang ; Zhiyang, Wang ; Tongzhou, Zhang ; Baorui, Ma |
BLCC: A Benchmark for Multi-LiDAR and Multi-Camera Calibration |
346 |
Su, Yulan; Zhang, Sisi; Wang, Yan; Wang, Xingbin; Zhao, Lutan; Dan, Meng; Hou, Rui |
RobSparse: Automatic Search for GPU-Friendly Robust and Sparse Vision Transformers |
359 |
Zhao, Hui; Qi, Na; Zhu, Qing; Lin, Xiumin |
SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction |
364 |
C. Quan, Khanh-An ; Guinaudeau, Camille ; Satoh, Shin’ichi |
Evaluating VQA Models' Consistency in the Scientific Domain |
374 |
Chang, Ding-Chi; Li, Shiou-Chi; Huang, Jen-Wei |
SPLGAN-TTS: Learning Semantic and Prosody to Enhance the Text-to-Speech Quality of Lightweight GAN Models |
379 |
Li, Yizhou; Liu, Zihua; Monno, Yusuke; Okutomi, Masatoshi |
TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration |
385 |
Lu, Lingyi; Xu, Xin; Wang, Xiao |
Style Separation and Content Recovery for Generalizable Sketch Re-identification and A New Benchmark |
393 |
Zhang, Zhengzhuo; Zhuang, Liansheng |
Progressive Neural Architecture Generation with Weaker Predictors |
395 |
Goto, Yuta ; Yamazaki, Satoshi ; Shibata, Takashi ; Liu, Jianquan |
Open-vocabulary Scene Graph Generation via Synonym-based Predicate Descriptor |
415 |
Xu, Xiaoman; Li, Xiangrun; Wang, Taihang; Jiang, Ye |
AMPLE: Emotion-Aware Multimodal Fusion Prompt Learning for Fake News Detection |
430 |
Li, Feng; Luo, Jiusong; Xia, Wanjun |
WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition |
451 |
Zhang, Zhihui ; Pang, Jinhui ; Li, Jianan ; Hao, Xiaoshuai |
ESC-MISR: Enhancing Spatial Correlations for Multi-Image Super-Resolution in Remote Sensing |
462 |
Huang, Zhongzhan; Liang, Mingfu; Liang, Senwei; Zhong, Shanshan |
Flat Local Minima for Continual Learning on Semantic Segmentation |
214 |
Wei, Wei; Zhang, Bingkun; Wang, Yibing |
TACST: Time-Aware Transformer for Robust Speech Emotion Recognition |
215 |
Wei, Wei; Zhang, Bingkun; Wang, Yibing |
TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion |
288 |
Li, Su ; Wang, Liang ; Wang, Jianye ; Zhang, Ziheng ; Zhang, Junjun ; Zhang, Lei |
Enhanced Anomaly Detection in 3D Motion through Language-Inspired Occlusion-Aware Modeling |
411 |
Ghasemi, Narges ; Kim, Seon Ho ; Alfarrarjeh, Abdullah ; Shahabi, Cyrus |
Counting Unique Objects in Geo-Tagged Street Images: A Case Study Of Homeless Encampments in Los Angeles |