Our contributions are threefold:
(1) We construct and release EditReward-Data, a large-scale (200K) preference dataset for image editing, distinguished by its high-quality manual annotations and diversity of sources.
(2) We train and release EditReward, a VLM-based reward model trained on EditReward-Data that demonstrates superior alignment with human preferences.
(3) We propose EditReward-Bench, a new benchmark built around a multi-way preference ranking task that is more challenging than standard pairwise comparison and enables a more robust evaluation of reward models.
We evaluate EditReward on three established public benchmarks as well as on our newly proposed EditReward-Bench, which together provide a more comprehensive assessment of image editing quality.
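As an illustration of how such preference alignment can be measured, the sketch below computes pairwise accuracy of a reward model against multi-way human rankings. The data layout (a human-ranked list of candidate edits per instruction) and the `score_fn` wrapper are assumptions for illustration, not the official EditReward-Bench protocol.

```python
from itertools import combinations
from typing import Callable, Sequence

def pairwise_accuracy(
    groups: Sequence[Sequence[dict]],   # each group: candidate edits ordered best-to-worst by humans
    score_fn: Callable[[dict], float],  # hypothetical wrapper returning a scalar reward score
) -> float:
    """Fraction of candidate pairs where the reward model agrees with the human order."""
    correct, total = 0, 0
    for candidates in groups:
        scores = [score_fn(c) for c in candidates]
        # The human ranking is the list order, so candidate i should outscore candidate j for i < j.
        for i, j in combinations(range(len(candidates)), 2):
            total += 1
            if scores[i] > scores[j]:
                correct += 1
    return correct / max(total, 1)
```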
To demonstrate EditReward's practical utility as a data supervisor, we conducted a data curation experiment designed to improve a state-of-the-art editing model. We employed our reward model to score the ~46,000 examples in the ShareGPT-4o-Image dataset, from which we selected a high-quality subset of the top 20,000 samples. This curated dataset was then used to fine-tune the powerful Step1X-Edit model.
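A minimal sketch of this scoring-and-filtering step is given below. The JSONL data layout and the `score_fn` wrapper around EditReward are assumptions for illustration, not the released API.

```python
import heapq
import json
from typing import Callable

def curate_top_k(
    dataset_path: str,
    output_path: str,
    score_fn: Callable[[dict], float],  # hypothetical wrapper: one example -> scalar reward score
    keep_top_k: int = 20_000,
) -> None:
    """Score every (source image, instruction, edited image) example and keep the top-k."""
    scored = []
    with open(dataset_path, "r", encoding="utf-8") as f:
        for line in f:                  # assumes one JSON example per line
            example = json.loads(line)
            scored.append((score_fn(example), example))

    # Rank by reward score and keep only the highest-scoring examples
    # (e.g. the top 20,000 of ~46,000 for ShareGPT-4o-Image).
    top = heapq.nlargest(keep_top_k, scored, key=lambda pair: pair[0])

    with open(output_path, "w", encoding="utf-8") as f:
        for _, example in top:
            f.write(json.dumps(example) + "\n")
```

The curated subset written by this step is what would then be used to fine-tune the editing model (Step1X-Edit in our experiment).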
@article{wu2025editreward,
  title   = {EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing},
  author  = {Wu, Keming and Jiang, Sicong and Ku, Max and Nie, Ping and Liu, Minghao and Chen, Wenhu},
  journal = {arXiv preprint arXiv:2509.26346},
  year    = {2025}
}