We present OMNI-EDIT, which is an omnipotent editor to handle seven different image editing tasks with any aspect ratio seamlessly. Our contribution is in two folds:
(1) OMNI-EDIT is trained by utilizing the supervision from seven different specialist models to ensure task coverage.
(2) we utilize importance sampling based on the scores provided by large multimodal models (like GPT-4o) instead of CLIP-score to improve the data quality.
We synthesize the large scale dataset through specialist distillation and cost-efficient LMM filtering. Our released version contains 1.2M pairs covering seven different skills like addition, swaping, removal, attribute modification, background change, environment change and sytle transfer. The dataset has been filtered with VIEScore.
@article{wei2024omniedit,
title={OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision},
author={Wei, Cong and Xiong, Zheyang and Ren, Weiming and Du, Xinrun and Zhang, Ge and Chen, Wenhu},
journal={arXiv preprint arXiv:2411.07199},
year={2024}
}