Vim Visual Model Example

ByteDance’s next-gen AI model can generate clips based on text, images, audio, and video

Big Tech’s race to leapfrog the latest AI models continues with the launch of ByteDance’s next-gen video generator. In a blog post, ByteDance – the China-based company behind TikTok – says Seedance ...

IEEE

Popeye: A Unified Visual-Language Model for Multisource Ship Detection From Remote Sensing Imagery

Abstract: Ship detection needs to identify ship locations from remote sensing scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the ...

IEEE

Merging Context Clustering With Visual State Space Models for Medical Image Segmentation

Abstract: Medical image segmentation demands the aggregation of global and local feature representations, posing a challenge for current methodologies in handling both long-range and short-range ...

Frontiers

Vision-language models for zero-shot weed detection and visual reasoning in UAV-based precision agriculture

1 Khalifa University Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi, United Arab Emirates 2 College of Information Technology, United Arab Emirates University, Al-Ain, Abu Dhabi, ...

GitHub

High-level visual representations in the human brain are aligned with large language models

The human brain extracts complex information from visual inputs, including objects, their spatial and semantic interrelations, and their interactions with the environment. However, a quantitative ...

GitHub

[NeurIPS 2025] ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models

ChartMuseum is a chart question answering benchmark designed to evaluate reasoning capabilities of large vision-language models (LVLMs) over real-world chart images. The benchmark consists of 1162 ...
