Abstract: Multimodal automatic speech recognition (ASR) has attracted considerable attention because it improves recognition accuracy by incorporating information from additional modalities. However, most ...