Abstract: Zero-shot image captioning can harness the knowledge of pre-trained visual language models (VLMs) and language models (LMs) to generate captions for target-domain images without paired ...
Abstract: Programming-based approaches to reasoning tasks have substantially expanded the types of questions models can answer about visual scenes. Yet on benchmark visual reasoning data, when models ...
For decades, my creative journey as a jazz pianist and composer has been deeply intertwined with visual art. Each piece I create is rooted in the rhythms, harmonies, and improvisational spirit of jazz ...