Foundation Models as Data Engines: Label-Efficient Learning in Modern Computer Vision
Published in AI Revolution: Research, Ethics and Society, 2026
The field of computer vision is undergoing a fundamental transition to a paradigm in which foundation models serve as autonomous data engines. Historically, the “data hunger” of deep learning created a significant bottleneck, with manual annotation consuming nearly 80% of project timelines and imposing unsustainable costs. This paper defines a Data Engine as an iterative system that automatically curates, labels, and verifies its own training data to improve itself or its associated student models. We present a supervision-centered taxonomy that traces the evolution of vision learning from the manual supervised era to the Autonomous Frontier, where AI verifiers and self-improving loops increasingly replace direct human labor. The survey analyzes three converging developments: automated curation and labeling, in which models such as SAM, DINO, and CLIP function as generative engines for large-scale dataset construction; knowledge agglomeration, in which semantic and geometric intelligence from massive “teacher” models is consolidated into efficient, edge-ready “student” models through distillation; and domain-specific adaptation, which highlights the rise of vertical engines in label-scarce fields such as medical imaging, remote sensing, and cellular biology. Finally, we outline a roadmap toward Artificial Super-Intelligence loops, emphasizing a shift from model-centric design toward supervision-centric system engineering. This paradigm promises to democratize visual intelligence by enabling self-evolving AI systems.
Recommended citation: Darbandi, Mohammad Reza, et al. "Foundation Models as Data Engines: Label-Efficient Learning in Modern Computer Vision." International Conference on the AI Revolution. Cham: Springer Nature Switzerland, 2026.
