Abstract: Audio-visual segmentation (AVS) is a challenging multimodal task that requires fusing spatio-temporal audio-visual features to achieve pixel-wise segmentation of sounding objects. This ...
Abstract: Multimodal Emotion Recognition in Conversation (MERC) is an important element in human-machine interaction. It allows machines to automatically identify and track the emotional status of ...