Abstract: Document Understanding (DU) in long-context scenarios with complex layouts remains a significant challenge in vision-language research. Although Large Vision-Language Models (LVLMs) excel ...
Abstract: Vibration sensing and infrared thermal imaging technology have been widely used in machine health monitoring. Multimodal fault diagnosis combining vibration and infrared thermal data ...