E-FineR is a training-free, fully automated framework for vocabulary-free fine-grained visual recognition. This repository accompanies the research paper: Vocabulary-free Fine-grained Visual ...
Abstract: Grounded Multimodal Named Entity Recognition (GMNER) aims to extract named entities, their types, and corresponding visual objects from image-text pairs. However, existing GMNER methods rely ...