Poster Sessions
To Presenters
- Please set up your poster after 1:00 PM and before your poster session starts.
Day 1: 8 January 14:00 – 15:30
Poster ID | Paper ID | Paper Title | Authors |
---|---|---|---|
PS1-1 | 120 | Quantized-ViT Efficient Training via Fisher Matrix Regularization | Shang, Yuzhang; Liu, Gaowen; Kompella, Ramana; Yan, Yan |
PS1-2 | 121 | Saliency based data augmentation for few-shot video action recognition | Kong, Yongqiang; Wang, Yunhong; Li, Annan |
PS1-3 | 128 | Hybrid Scalable Video Coding with Neural Compression and Enhancement for Streaming Media | Ye, Yuyao; Yang, Jiayu; Zhao, Yang; Gao, Mengping; Cao, Hongbin; Wang, Ronggang |
PS1-4 | 130 | Pubic Symphysis-Fetal Head Segmentation Network Using BiFormer Attention Mechanism and Multipath Dilated Convolution | Cai, Pengzhou; Jiang, Lu; Li, Yanxin; Liu, Xiaojuan; Lan, Libin |
PS1-5 | 131 | DART: Depth-Enhanced Accurate and Real-Time Background Matting | Li, Guofeng; Li, Hanxi; Li, Bo; Wu, Lin; Cheng, Yan |
PS1-6 | 141 | MLP-AMDC: A MLP Architecture for Adaptive-Mask-based Dual-Camera snapshot hyperspectral imaging | Cai, Zeyu; Chen, Xunhao; Zhang, Can; Chen, Yuchong; Yang, Jiming; Shi, Wubin; Jin, Chengqian; Da, Feipeng |
PS1-7 | 144 | Kiite World: Socializing Map-Based Music Exploration Through Playlist Sharing and Synchronized Listening | Tsukuda, Kosetsu; Takahashi, Takumi; Ishida, Keisuke; Hamasaki, Masahiro; Goto, Masataka |
PS1-8 | 146 | Enhancing Environmental Monitoring through Multispectral Imaging: The WasteMS Dataset for Semantic Segmentation of Lakeside Waste | Zhu, Qinfeng; Weng, Ningxin; Fan, Lei; Cai, Yuanzhi |
PS1-9 | 158 | Frequency-aware Convolution for Sound Event Detection | Song, Tao; Zhang, Wenwen |
PS1-10 | 163 | MSD-YOLO: An efficient algorithm for small target detection | Liu, Dongyu; Zhu, Yuan; Liu, Rui; Xing, Zhecong; Geng, Weiyang; Wang, Yanqiang |
PS1-11 | 166 | Robust Active Speaker Detection in Challenging Environments Using GNN-Fused Multi-Modal Cues and Body Language | Li, Yongqian; Luo, Yong; Zhou, Xin |
PS1-12 | 172 | Intra-Class Compact Facial Expression Recognition Based on Amplitude Phase Separation | Tian, Xiang; Zhang, Yuan; Mu, Chang; Zhang, Ziyang |
PS1-13 | 176 | PA2Net: Pyramid Attention Aggregation Network for Saliency detection | Yu, Jizhe; Liu, Yu; Wu, Xiaoshuai; Xu, Kaiping; Li, Jiangquan |
PS1-14 | 188 | LIESA: Low-light Image Enhancement with Semantic Awareness | Zhang, Jingyao; Hao, Shijie; Sun, Fuming; Rao, Yuan |
PS1-15 | 195 | Deep Dual Internal Learning for Hyperspectral Image Super-Resolution | Sun, Yongqing; Liu, Hong; Chang, Qiong; Han, Xianhua |
PS1-16 | 198 | Zero-shot sketch-based image retrieval with hybrid information fusion and sample relationship modeling | Wu, Weijie; Li, Jun; Wu, Zhijian; Xu, Jianhua |
PS1-17 | 206 | The Right to an Explanation under the GDPR and the AI Act | Juliussen, Bjørn Aslak |
PS1-18 | 221 | Improving singing voice transcription generalization with AI generated accompaniments | Perez, Miguel; Kirchhoff, Holger; Grosche, Peter; Serra, Xavier |
PS1-19 | 228 | LITA: LMM-guided Image-Text Alignment for Art Assessment | Sunada, Tatsumi; Shiohara, Kaede; Xiao, Ling; Yamasaki, Toshihiko |
PS1-20 | 229 | Towards Inclusive Education: Multimodal Classification of Textbook Images for Accessibility | Yadav, Saumya; Lincker, Élise; Huron, Caroline; Martin, Stéphanie; Guinaudeau, Camille; Satoh, Shin’ichi; Shukla, Jainendra |
PS1-21 | 296 | GWUNet: A UNet with Gated Attention and Improved Wavelet Transform for Thyroid Nodules Segmentation | Zheng, Shuijing; Yu, Suxi; Wang, Yi; Wen, Jing |
PS1-22 | 111 | SCLSTE: Semi-Supervised Contrastive Learning-Guided Scene Text Editing | Yin, Min; Xie, Liang; Liang, HaoRan; Zhao, Xing; Chen, Ben; Liang, RongHua |
Day 2: 9 January 13:30 – 15:00
Poster ID | Paper ID | Paper Title | Authors |
---|---|---|---|
PS2-1 | 192 | Comparative Analysis of Relevance Feedback Techniques for Image Retrieval | Vadicamo, Lucia; Scotti, Francesca; Dearle, Alan; Connor, Richard |
PS2-2 | 241 | Understanding the Roles of Visual Modality in Multimodal Dialogue: An Empirical Study | Cao, Qian; Song, Ruihua; Chen, Xu |
PS2-3 | 242 | DistillSleep: Leverage Self-Distillation to Improve Performance After Representation Learning for Sleep Staging | Yu, Le; Zhang, Xianchao; Qian, Shuxia; Sun, Hong |
PS2-4 | 246 | Temporal Closeness for Enhanced Cross-Modal Retrieval of Sensor and Image Data | Yamamoto, Shuhei; Kando, Noriko |
PS2-5 | 247 | An Analytical Method for Rendering Plenoptic Cameras 2.0 on 3D Multi-Layer Displays | Losfeld, Armand; Seznec, Nicolas; Van Bogaert, Laurie; Lafruit, Gauthier; Teratani, Mehrdad |
PS2-6 | 251 | QRALadder: QoE and Resource Consumption-Aware Encoding Ladder Optimization for Live Video Streaming | Zhu, Yingqian; Gao, Guanyu |
PS2-7 | 256 | Boosting Human Pose Estimation via Heatmap Refinement | Jiang, Ling; Liu, Zhuocheng; Li, Kaige; Wu, Wei |
PS2-8 | 265 | FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation | Imajuku, Yuki; Yamakata, Yoko; Aizawa, Kiyoharu |
PS2-9 | 283 | LLMs-based Augmentation for Domain Adaptation in Long-tailed Food Datasets | Wang, Qing; Ngo, Chong Wah; Lim, Ee-Peng; Sun, Qianru |
PS2-10 | 292 | Music2MIDI: Pop Music to MIDI Piano Cover Generation | Yip, Tin Yui; Chau, Chuck-jee |
PS2-11 | 293 | Balancing Efficiency and Accuracy: An Analysis of Sampling for Video Copy Detection | Chen, Xiangyu; Satoh, Shinichi |
PS2-12 | 295 | One-Shot Generative Domain Adaptation by Constructing Self-Amplifying Datasets | Xiang, Yanru; Li, Yi |
PS2-13 | 306 | Visual Anomaly Detection on Topological Connectivity under Improved YOLOv8 | Li, Yu; Xie, Zhenping |
PS2-14 | 315 | HierArtEx: Hierarchical Representations and Art Experts Supporting the Retrieval of Museums in the Metaverse | Falcon, Alex; Abdari, Ali; Serra, Giuseppe |
PS2-15 | 317 | DocMamba: Robust Document Image Dewarping via Selective State Space Sequence Modeling | Han, Miaolin; Li, Huibin |
PS2-16 | 326 | Real-Time Action Detection in Volleyball Matches Using DETR Architecture | Shih, Mu-Jan; Hsu, Yi-Yu |
PS2-17 | 332 | Select and Order: Enhancing Few-Shot Image Classification through In-Context Learning | Huang, Hujiang; Xie, Yu; Gao, Jun; Fan, Chuanliu; Cao, Ziqiang |
PS2-18 | 336 | SMG-Diff: Adversarial Attack Method Based on Semantic Mask-Guided Diffusion | Zhang, Yongliang; Liu, Jing |
PS2-19 | 344 | Dual-Task Feedback Learning for Tongue Detection via Super-Resolution Integration | Sun, Ying; Wei, Meiyi; Chen, Gang |
PS2-20 | 354 | Towards Visual Storytelling by Understanding Narrative Context through Scene-Graphs | Phueaksri, Itthisak; Kastner, Marc A.; Kawanishi, Yasutomo; Komamizu, Takahiro; Ide, Ichiro |
PS2-21 | 456 | AMFT-YOLO: An Adaptive Multi-Scale YOLO Algorithm with Multi-Level Feature Fusion for Object Detection in UAV Scenes | Wang, Tiebiao; Li, Xiaoyang; Cui, Zhenchao |
PS2-22 | 276 | Lightweight Dual Grouped Large-Kernel Convolutions for Salient Object Detection Network | Liu, Jiajie; Zhang, Zhibin |
PS2-23 | 312 | Modeling High-order Relationships between Human and Video for Emotion Recognition | Ai, Hanxu; Tao, Xiaomei; Li, Xingbing; Gan, Yanling |
DP | 117 | EIA: Edge-aware Imperceptible Adversarial Attacks on 3D Point Clouds | Wang, Zhensu; Peng, Weilong; Wang, Le; Wu, Zhizhe; Zhu, Peican; Tang, Keke |
DP | 127 | MKSNet: Advanced Small Object Detection in Remote Sensing Imagery with Multi-Kernel and Dual Attention Mechanisms | Zhang, Jiahao; Gao, Guangyu; Zhao, Xiao |
DP | 140 | Infrared Small Target Detection with Feature Refinement and Context Enhancement | Li, Xiuhong; Zhu, Xinyue; Li, Boyuan; Li, Songlin; Wang, Luyao; Jia, Zhenhong |
DP | 173 | Modality-Specific Hashing: Transform Cross-Modal Retrieval into Single-Modal Retrieval | Ding, Guohui; Li, Zhonghua; Ren, Yongqiang |
DP | 178 | Multimodal Prompt Learning for Audio Visual Scene-aware Dialog | Xu, Feifei; Jia, Fumiaoyue; Zhou, Wang |
DP | 182 | MSA-Former: Multi-Scale Adaptive Transformer for Image Snow Removal | Wang, Bin; Chen, Zekun; Zhang, Lei; Liang, Shili; Guo, Sijia; Kang, Xinyu; Li, Huajing |
DP | 184 | SES-Net: Multi-dimensional Spot-Edge-Surface Network for Nuclei Segmentation | Lu, Congjian; Zhou, Shuwang; Shan, Ke; Zhang, Hongkuan; Liu, Zhaoyang |
DP | 189 | PianoPal: A Robotic Multimedia System for Interactive Piano Instruction Based on Q-learning and Real-time Feedback | Wang, Yufei; Yao, Junfeng; Wang, Zefeng |
DP | 199 | CLIP Multi-modal Hashing for Multimedia Retrieval | Zhu, Jian; Sheng, Mingkai; Huang, Zhangmin; Chang, Jingfei; Long, Jian; Jiang, Jinling; Liu, Lei; Luo, Cheng |
DP | 223 | Integrating S1&S2 Framework for Enhanced Semantic Match in Person Re-identification | Yang, Xiukang; Ge, Jingguo; Li, Hui; Li, Liangxiong; Wu, Bingzhen |
DP | 237 | Hyper-NeuS: Hypernetworks for Neural SDF Implicit Surface Reconstruction by Volume Rendering | Li, Jingkun; Qi, Na; Zhu, Qing |
DP | 253 | Structural Information-guided Fine-grained Texture Image Inpainting | Fang, Zhiyi; Qian, Yi; Dai, Xiyue |
DP | 272 | GFA-UDIS: Global-to-Flow Alignment for Unsupervised Deep Image Stitching | Han, Sijia; Zhang, Zhibin |
DP | 275 | Joint Decision Network with Modality-Specific and Dual Interactive Features for Fake News Detection | Wu, Fei; Zhou, Ruixuan; Ji, Yimu; Jing, Xiao-Yuan |
DP | 277 | MS-SAM: Multi-Scale SAM based on Dynamic Weighted Agent Attention | Yang, Enhui; Zhang, Zhibin |
DP | 281 | Multi-Modal Information Multi-Angle Mining For Multimedia Recommendation | Zhu, Yijie; Li, MingYong |
DP | 305 | MambaTalk: Speech-driven 3D Facial Animation with Mamba | Zhu, Deli; Xu, Zhao; Yang, Yunong |
Day 3: 10 January 13:30 – 15:00
Poster ID | Paper ID | Paper Title | Authors |
---|---|---|---|
PS3-1 | 356 | Rotation Methods for 360-degree Videos in Virtual Reality - A Comparative Study | Hürst, Wolfgang; Zeches, Leo |
PS3-2 | 360 | Camouflaged Object Detection Based on Localization Guidance and Multi-Scale Refinement | Wang, JinYang; Wu, Wei |
PS3-3 | 362 | Poseidon: A NAS-Based Ensemble Defense Method against Multiple Perturbations | Su, Yulan; Zhang, Sisi; Lin, Zechao; Wang, Xingbin; Zhao, Lutan; Meng, Dan; Hou, Rui |
PS3-4 | 363 | MM-CARP: Multimodal Model with Cross-modal retrieval-Augmented and visual Region Perception | Guo, Junhao; Fu, Chenhan; Wang, Guoming; Lu, Rongxing; Chen, Dong; Tang, Siliang |
PS3-5 | 365 | Revisit Data Association in Semantic SLAM Systems for Autonomous Parking | Shao, Xuan; Huang, Leming; Liu, Xinghua |
PS3-6 | 368 | Lightweight Motion-Aware Video Super-Resolution for Compressed Videos | Kwon, Ilhwan; Li, Jun; Shah, Rajiv Ratn; Prasad, Mukesh |
PS3-7 | 373 | Vision-Language Pretraining for Variable-shot Image Classification | Papadopoulos, Sotirios; Ioannidis, Konstantinos; Vrochidis, Stefanos; Kompatsiaris, Ioannis; Patras, Ioannis |
PS3-8 | 377 | A Multi-Aspect Multi-Granularity Pronunciation Assessment Method Based on Branchformer Encoder and Hierarchical Aggregation | Du, Wenxu; Wumaier, Aishan; Shi, Yahui; Yi, Nian; Liu, Dehua |
PS3-9 | 386 | SCANet: Semantic Coherence Attention Network for Clothing Change Person Re-identification | Yang, Dajiang; Wu, Wei; Lee, Yuxing |
PS3-10 | 417 | Toward A Full Pipeline Approach to Autonomous Drone Landing Site Identification: From Terrain Survey to Embedded Classifier | Springer, Joshua David; Guðmundsson, Gylfi Þór; Kyas, Marcel |
PS3-11 | 429 | Innovative Lifelog Visualization and Exploration in Virtual Reality - A Comparative Study | Hürst, Wolfgang; Visser, Yannick |
PS3-12 | 435 | Synchronization and Calibration of Video Sequences acquired using Multiple Plenoptic 2.0 Cameras | Bonatto, Daniele; Fernandes Pinto Fachada, Sarah; Sancho, Jaime; Juarez, Eduardo; Lafruit, Gauthier; Teratani, Mehrdad |
PS3-13 | 444 | A Dual-Branch Model for Color Constancy | Chen, Zhaoxin; Ma, Bo |
PS3-14 | 445 | Data-free Functional Projection of Large Language Models onto Social Media Tagging Domain | Mu, Wenchuan; Lim, Kwan Hui |
PS3-15 | 455 | MDT-Net: a mask decoder tuning strategy for CLIP-based zero-shot 3D Classification | Yan, Hao; Bai, Jing |
PS3-16 | 458 | Optimally Planning Drone Trajectory to Capture a 3D Gaussian Splatting Object | Wu, Cheng-Yuan; Sun, Yuan-Chun; Lee, Cheng-Tse; Hsu, Cheng-Hsin |
PS3-17 | 230 | Quantifying Image-Adjective Associations by Leveraging Large-Scale Pretrained Models | Matsuhira, Chihaya; Kastner, Marc A.; Komamizu, Takahiro; Hirayama, Takatsugu; Ide, Ichiro |
PS3-18 | 137 | Can masking background and object reduce static bias for zero-shot action recognition? | Fukuzawa, Takumi; Hara, Kensho; Kataoka, Hirokatsu; Tamaki, Toru |
PS3-19 | 355 | CalorieVoL: Integrating Volumetric Context into Multimodal Large Language Models for Image-based Calorie Estimation | Tanabe, Hikaru; Yanai, Keiji |
PS3-20 | 416 | Multimodal Engagement Prediction in Human-Robot Interaction using Transformer Neural Networks | Lim, Jia Yap; See, John; Dondrup, Christian |
PS3-21 | 431 | What Should Autonomous Robots Verbalize and What Should They Not? | Yoshihara, Daichi; Yuguchi, Akishige; Kawano, Seiya; Iio, Takamasa; Yoshino, Koichiro |
PS3-22 | 438 | BiCA-YOLO: Bidirectional Feature Enhancement and Cross Coordinate Attention for Small Object Detection | Lv, Jinyan; Xiao, Guoqiang |
DP | 307 | Frequency-Based Unsupervised Low-Light Image Enhancement Framework | Wang, Haodian |
DP | 309 | Target-Oriented Dynamic Denoising Curriculum Learning for Multimodal Stance Detection | Suo, Zihao; Pan, Shanliang |
DP | 316 | Noise-robust Separating Multi-source Aliased Vibration Signal Based on Transformer Demucs | Jiang, Wanchang; Jiang, Yuxin |
DP | 321 | gFlow: Distributed Real-Time Reverse Remote Rendering System Model | Xu, Yixiao; Li, Yubo; Xu, Wanzhao; Gu, Yicheng; Wang, Yun; Ma, Jiangyuan; Qi, Zhengwei |
DP | 331 | BLCC: A Benchmark for Multi-LiDAR and Multi-Camera Calibration | Minghui, Hou; Gang, Wang; Zhiyang, Wang; Tongzhou, Zhang; Baorui, Ma |
DP | 342 | MC-YOLO: Multi-scale Transmission Line Defect Target Recognition Network | Wang, Jingdong; Ding, Xu; Meng, Fanqi |
DP | 350 | A Novel Human Abnormal Posture Detection Method Based on Spatial-Topological Feature Fusion of Skeleton | Ma, Yuefeng; Cheng, Zhiqi; Liu, Deheng; Tang, Shiying |
DP | 359 | SSCDUF: Spatial-Spectral Correlation Transformer Based on Deep Unfolding Framework for Hyperspectral Image Reconstruction | Zhao, Hui; Qi, Na; Zhu, Qing; Lin, Xiumin |
DP | 383 | Cross-View Geo-Localization via Learning Correspondence Semantic Similarity Knowledge | Chen, Guanli; Huang, Guoheng; Yuan, Xiaochen; Chen, Xuhang; Zhong, Guo; Pun, Chi-Man |
DP | 385 | Style Separation and Content Recovery for Generalizable Sketch Re-identification and A New Benchmark | Lu, Lingyi; Xu, Xin; Wang, Xiao |
DP | 387 | Chain of Thought Guided Few-shot Fine-tuning of LLMs for Multimodal Aspect-based Sentiment Classification | Wu, Hao; Yang, Danping; Liu, Peng; Li, Xianxian |
DP | 393 | Progressive Neural Architecture Generation with Weaker Predictors | Zhang, Zhengzhuo; Zhuang, Liansheng |
DP | 420 | Self-Supervised Reference-based Image Super-Resolution with Conditional Diffusion Model | Shi, Shuai; Qi, Na; Li, Yezi; Zhu, Qing |
DP | 447 | TPS-YOLO: The Efficient Tiny Person Detection Network Based on Improved YOLOv8 and Model Pruning | Yao, Li; Huang, Qianni; Wan, Yan |
DP | 460 | MICAN: Multi-modal Inconsistency-based Cooperation Attention Network for fake news detection | Yi, Zepu; Lu, Songfeng; Tang, Xueming; Zhu, Jianxin; Wu, Junjun |
DP | 214 | TACST: Time-Aware Transformer for Robust Speech Emotion Recognition | Wei, Wei; Zhang, Bingkun; Wang, Yibing |
DP | 215 | TS-MEFM: A New Multimodal Speech Emotion Recognition Network Based on Speech and Text Fusion | Wei, Wei; Zhang, Bingkun; Wang, Yibing |