Title: An invariant structure approach for media representation and recognition
Speaker: Dr. Yu QIAO (乔宇)
Host: Prof. Jian YAO (姚剑), School of Remote Sensing and Information Engineering, Wuhan University

This talk is divided into two parts. In the first part, I will explain our recent work on the structural representation of media, using speech as an example. One of the major challenges in speech engineering is dealing with the non-linguistic variations contained in speech signals. These variations are caused by differences in speakers, communication channels, environmental noise, etc. Modern speech approaches mainly rely on statistical methods (such as GMM and HMM) to model the distributions of acoustic features. These methods typically require a large amount of training data, and it is well known that the performance of speech recognizers drops significantly when training and testing conditions are mismatched. We proposed an invariant structural representation of speech that aims to remove the non-linguistic factors from speech signals. Unlike classical speech models, the structural representation uses globally contrastive features to model the global and dynamic aspects of speech and discards the local and static features. It can be proved that these contrastive features (f-divergences) are invariant to any invertible transformation and are thus robust to non-linguistic variations. Experimental results on connected Japanese vowel utterances show that the structural approach achieves better recognition rates than HMM. In the second part, I will review several ongoing projects in the Multimedia Laboratory, Shenzhen Institutes of Advanced Technology, including image retrieval, activity classification, 3D reconstruction, and face recognition.
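As a small illustration of the invariance claim in the abstract, the sketch below checks one special case numerically: the KL divergence (one member of the f-divergence family) between two 1-D Gaussians is unchanged when the same invertible affine map x -> a*x + b is applied to both. The Gaussian parameters, the map, and the helper names `kl_gauss` and `affine` are assumptions made up for illustration; they are not taken from the talk, and this is not the speaker's implementation.

```python
import math


def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians (closed form)."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def affine(m, s, a, b):
    """Mean and std of a*x + b when x ~ N(m, s^2); a must be non-zero."""
    if a == 0:
        raise ValueError("a must be non-zero for the map to be invertible")
    return a * m + b, abs(a) * s


# Two hypothetical distributions (mean, std); the values are made up.
p = (0.0, 1.0)
q = (2.0, 1.5)

# Divergence in the original feature space.
before = kl_gauss(*p, *q)

# Apply the same invertible map to both distributions, standing in for
# a non-linguistic distortion such as a channel or speaker change.
a, b = -3.0, 7.0
after = kl_gauss(*affine(*p, a, b), *affine(*q, a, b))

print(before, after)
```

The two printed values agree up to floating-point rounding, while the means and standard deviations themselves change. This covers only the affine, Gaussian case; the general statement for arbitrary invertible transformations is the one referred to in the abstract.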
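Yu QIAO (乔宇) is an associate researcher and doctoral supervisor. He received his Ph.D. in engineering from the University of Electro-Communications, Japan, in 2006. In 2010 he joined the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, as executive director of the Multimedia Integration Research Laboratory, and since 2012 he has served as deputy director of its Institute of Integration Technology. Before returning to China, he was a project assistant professor in the electronics and information department of the University of Tokyo. He was selected for merit-based support under the Chinese Academy of Sciences "Hundred Talents Program" and has received the Lu Jiaxi Young Talent Award as well as several academic awards at domestic and international conferences. His main research interests include image processing, computer vision, speech recognition, handwriting analysis, and machine learning. He has published more than 80 papers in journals and conferences including IEEE T-PAMI, IEEE T-IP, IEEE T-SP, and CVPR, more than 30 of them as first author.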