
VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images

Zhaonan Li, Kyle R. Chickering, Bangzheng Li, Jacob Dineen, Xiao Ye, Zhikun Xu, Shijie Lu, Yuxi Huang, Ming Shen, Bach Nguyen, Jaya Adithya Pavuluri, Mau Son Nguyen, Sanika Chavan, Ngoc Minh Thu Le, Muhao Chen, Ben Zhou

CVPR Workshop on Visual Concepts (VisCon), 2026 (Oral)

Abstract

A useful test of visual concept learning is not just whether a model can recognize a concept in a single image, but whether it can preserve and manipulate concept-level properties under transformation and transfer them to new scenes. We introduce VisAnalog, a controlled suite for this setting on natural images. Each example instantiates the analogy A:B::C:?, where image B and a hidden target image D are produced by applying the same deterministic transformation sequence to source images A and C, respectively. Given A, B, and C, a model must answer a multiple-choice question about D. The benchmark contains 617 human-validated questions spanning one- to four-step transformations such as zoom, quadrant swap, rotation, flip, and hue rotation. Across strong proprietary and open-source VLMs, end-to-end accuracy is substantially lower than oracle accuracy (measured when D is shown directly) and degrades sharply as transformation depth increases, while human performance remains near ceiling. A program-conditioned evaluation further separates failures of relation inference from failures of transformation application, showing that inferring the visual relation from A to B is the dominant bottleneck, with additional application errors emerging on harder multi-step cases.
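The A:B::C:? construction can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: images are nested lists of RGB tuples, and the transform names (`rotate90`, `flip_horizontal`, `quadrant_swap`, `zoom2x`, `hue_rotate`) and their exact semantics (clockwise rotation, diagonal quadrant swap, 2x center zoom, hue shift in degrees) are assumptions chosen for the example. What it shows is the key property of the benchmark: B and D come from one shared deterministic program, so a model must infer that program from (A, B) before it can reason about D.

```python
import colorsys
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: an image is a list of rows of (r, g, b) pixels in [0, 1].
Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]
Transform = Callable[[Image], Image]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def quadrant_swap(img: Image) -> Image:
    """Swap each quadrant with its diagonal opposite (assumes even height and width)."""
    h, w = len(img), len(img[0])
    return [[img[(y + h // 2) % h][(x + w // 2) % w] for x in range(w)] for y in range(h)]


def zoom2x(img: Image) -> Image:
    """Crop the central half of each axis and upsample it back with nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    top, left = h // 4, w // 4
    return [[img[top + y // 2][left + x // 2] for x in range(w)] for y in range(h)]


def hue_rotate(degrees: float) -> Transform:
    """Return a transform that shifts every pixel's hue by `degrees`, keeping saturation and value."""
    shift = (degrees % 360) / 360.0

    def apply(img: Image) -> Image:
        out = []
        for row in img:
            new_row = []
            for r, g, b in row:
                hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
                new_row.append(colorsys.hsv_to_rgb((hue + shift) % 1.0, sat, val))
            out.append(new_row)
        return out

    return apply


def apply_program(img: Image, program: Sequence[Transform]) -> Image:
    """Apply a deterministic transformation sequence step by step."""
    for step in program:
        img = step(img)
    return img


def make_analogy(a: Image, c: Image, program: Sequence[Transform]) -> Tuple[Image, Image]:
    """Build (B, D) for A:B::C:? by running the same program on A and on C."""
    return apply_program(a, program), apply_program(c, program)


# Usage: 2x2 toy "images" and a hypothetical two-step program.
RED, GREEN, BLUE, WHITE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)
A = [[RED, GREEN], [BLUE, WHITE]]
C = [[WHITE, RED], [GREEN, BLUE]]
B, D = make_analogy(A, C, [rotate90, flip_horizontal])
# A model would see A, B, and C; D stays hidden and is only used to write the question.
```

Because a clockwise rotation followed by a horizontal mirror is a transpose, B here is A transposed and D is C transposed. In a sketch like this, a program-conditioned variant would amount to handing the model the `program` list itself, so that only its application, not its inference from (A, B), remains to be tested.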

Citation

@article{li2026visanalog,
  title={VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images},
  author={Li, Zhaonan and Chickering, Kyle R and Li, Bangzheng and Dineen, Jacob and Ye, Xiao and Xu, Zhikun and Lu, Shijie and Huang, Yuxi and Shen, Ming and Nguyen, Bach and others},
  journal={arXiv preprint arXiv:2605.23141},
  year={2026}
}