ByteDance's latest Seedance 2.0 release produces hyper-real outputs from fairly simple text and image inputs, blurring the line between real and AI-generated content.
Abstract: Audio-visual segmentation (AVS) is a challenging multimodal task that requires fusing spatial-temporal audio-visual features to achieve pixel-wise segmentation of sounding objects. This ...
Abstract: Multimodal Emotion Recognition in Conversation (MERC) is an important element in human-machine interaction. It allows machines to automatically identify and track the emotional status of ...