Abstract: Open-vocabulary semantic segmentation aims to assign pixel-level labels using arbitrary text queries, but existing CLIP-based methods often produce diffuse similarity maps and struggle with ...
Abstract: Visual dialog aims to facilitate the answering of multi-round questions by effectively integrating dialog history and the relevant content of images. Existing methods in visual dialog ...