Our contributions are threefold:
(1) We construct and release EditReward-Data, a large-scale (200K) preference dataset for image editing, distinguished by its high-quality manual annotations and diversity of sources.
(2) We train and release EditReward, a VLM-based reward model trained on EditReward-Data that demonstrates superior alignment with human preferences.
(3) We propose EditReward-Bench, a new benchmark featuring a more challenging multi-way preference ranking task that provides a more robust evaluation of reward models.
We evaluate our approach on a suite of three established public benchmarks and on our newly proposed EditReward-Bench, which together provide a more comprehensive assessment of image editing quality.
To demonstrate EditReward's practical utility as a data supervisor, we conducted a data curation experiment designed to improve a state-of-the-art editing model. We employed our reward model to score the ~46,000 examples in the ShareGPT-4o-Image dataset, from which we selected a high-quality subset of the top 20,000 samples. This curated dataset was then used to fine-tune the powerful Step1X-Edit model.
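Below is a minimal sketch of this curation pipeline. The `editreward` import and the `load_model` / `score` interface are hypothetical placeholders (the released API may differ); only the score-then-select-top-20K procedure follows the description above.

```python
import json
import heapq

# Hypothetical reward-model interface; the actual EditReward API may differ.
from editreward import load_model  # assumption: placeholder import


def curate_top_k(dataset_path: str, output_path: str, k: int = 20_000) -> None:
    """Score every (instruction, source, edited) example and keep the top-k."""
    model = load_model("EditReward")  # hypothetical loader
    scored = []

    with open(dataset_path) as f:
        for line in f:  # assumes one JSON example per line
            ex = json.loads(line)
            s = model.score(  # hypothetical scoring call
                instruction=ex["instruction"],
                source_image=ex["source_image"],
                edited_image=ex["edited_image"],
            )
            scored.append((s, ex))

    # Keep the k highest-scoring examples as the fine-tuning subset.
    top = heapq.nlargest(k, scored, key=lambda t: t[0])
    with open(output_path, "w") as f:
        for s, ex in top:
            ex["reward_score"] = s
            f.write(json.dumps(ex) + "\n")


if __name__ == "__main__":
    # File names are illustrative only.
    curate_top_k("sharegpt4o_image.jsonl", "curated_top20k.jsonl")
```

The resulting curated subset is what would then be used to fine-tune Step1X-Edit.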
@misc{wu2025editrewardhumanalignedrewardmodel,
title={EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing},
author={Keming Wu and Sicong Jiang and Max Ku and Ping Nie and Minghao Liu and Wenhu Chen},
year={2025},
eprint={2509.26346},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.26346},
}