Abstract: Open-vocabulary semantic segmentation aims to assign pixel-level labels using arbitrary text queries, but existing CLIP-based methods often produce diffuse similarity maps and struggle with ...
Abstract: Visual dialog aims to answer multi-round questions by effectively integrating the dialog history with the relevant image content. Existing methods in visual dialog ...