What Can GPT-4 Do for Zero-shot Visual Recognition?
Since the launch of ChatGPT in Nov ’22, it has captivated the tech world, spurring a wave of investment in Generative AI. From the unveiling of the large multimodal model GPT-4 in Mar ’23 to the integration of GPT-4 with Vision into the ChatGPT platform in Sep ’23 🎯, it’s been a thrilling journey. To top it off, OpenAI hosted its first DevDay on ChatGPT’s first anniversary, releasing the GPT-4V API 👩‍💻. This major milestone opens avenues for extensive academic evaluations 🏫.
The work discussed here studies the capabilities of GPT-4 on the image recognition task. This blog post introduces and summarizes the study.
Underlying Task
The authors assess the capabilities of GPT-4 for visual recognition (classification) by evaluating its performance on 16 different benchmarks spanning images, videos, and point clouds.
Since GPT-4V accepts only 2D images, videos and point clouds are first converted into them. For videos, a few frames are sampled from each clip; for point clouds, the object is rendered from multiple viewpoints, following MVCNN.
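The study does not spell out the exact frame-selection scheme here, but a common choice is uniform temporal sampling: split the video into equal segments and take the middle frame of each. A minimal sketch (the function name and segment-midpoint strategy are my own assumptions, not the authors' code):

```python
def sample_frame_indices(num_frames: int, num_samples: int) -> list[int]:
    """Pick evenly spaced frame indices so a video is summarized by a few 2D images.

    This is an illustrative uniform-sampling sketch, not the paper's actual pipeline.
    """
    if num_samples >= num_frames:
        # Short video: just keep every frame.
        return list(range(num_frames))
    segment = num_frames / num_samples
    # Take the midpoint of each of num_samples equal temporal segments.
    return [int(segment * (i + 0.5)) for i in range(num_samples)]


# e.g., a 300-frame clip reduced to 4 representative frames
print(sample_frame_indices(300, 4))  # → [37, 112, 187, 262]
```

The sampled frames (or the MVCNN-style rendered views, for point clouds) can then each be sent to GPT-4V as ordinary images.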