Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

♠️Yubo Wang, Xiang Yue, ♠️Wenhu Chen
♠️University of Waterloo, Carnegie Mellon University
yubo.wang.sunny@gmail.com, wenhuchen@uwaterloo.ca

Abstract

Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and propose Critique Fine-Tuning (CFT), a strategy where models learn to critique noisy responses rather than simply imitate correct ones. Inspired by human learning processes that emphasize critical thinking, CFT encourages deeper analysis and nuanced understanding, traits often overlooked by standard SFT. To validate the effectiveness of CFT, we construct a 50K-sample dataset from WebInstruct, using GPT-4o as the teacher to generate critiques in the form of (input=[query; noisy response], output=critique). CFT on this dataset yields a consistent 4–10% improvement over SFT on six math benchmarks with different base models such as Qwen2.5, Qwen2.5-Math and DeepSeek-Math. We further expand to the MetaMath and NuminaMath datasets and observe similar gains over SFT. Notably, our Qwen2.5-Math-CFT model, trained on just 50K samples, matches or outperforms competitive models such as AceMath and Qwen2.5-Math-Instruct on most benchmarks, both of which use over 2M samples. Ablation studies show that CFT is robust to the source of the noisy responses and to the choice of teacher critique model. Based on these findings, we argue that critique-based training offers a more effective alternative for advancing the reasoning of language models.

Figure 1: Comparison between CFT and SFT on 50K samples from WebInstruct. SFT-verified means SFT on responses validated by GPT-4o; SFT-GPT4o means SFT on responses generated by GPT-4o. CFT is our approach, which trains on the critiques provided by GPT-4o.
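
The setups in Figure 1 differ in what the model reads as input and what it is trained to produce. The sketch below illustrates that difference using the (input=[query; noisy response], output=critique) form from the abstract. The `Record` fields, the prompt wording, the `build_*` helpers, and the toy record are all illustrative assumptions, not the authors' actual data pipeline or template.

```python
from dataclasses import dataclass


@dataclass
class Record:
    query: str           # the instruction or question
    noisy_response: str  # a candidate answer that may contain errors
    critique: str        # teacher-written critique of the noisy response


def build_sft_example(rec: Record, reference_response: str) -> dict:
    """SFT: the model reads the query and learns to imitate a response."""
    return {"input": rec.query, "output": reference_response}


def build_cft_example(rec: Record) -> dict:
    """CFT: the model reads [query; noisy response] and learns to write the critique."""
    # Hypothetical prompt layout; the real template may differ.
    prompt = (
        f"Question:\n{rec.query}\n\n"
        f"Candidate solution:\n{rec.noisy_response}\n\n"
        "Critique the candidate solution."
    )
    return {"input": prompt, "output": rec.critique}


# Toy record, made up for illustration only.
rec = Record(
    query="What is 2 * (2/3)?",
    noisy_response="2 * (2/3) = 4/6 = 1/3",
    critique="The product 2 * (2/3) is 4/3, not 4/6, so the response is incorrect.",
)
sft = build_sft_example(rec, reference_response="2 * (2/3) = 4/3")
cft = build_cft_example(rec)
```

In this sketch the SFT target is a solution, while the CFT target is a judgment about someone else's solution; the query itself is identical in both cases.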

CFT vs. SFT

To evaluate the effectiveness of CFT, we compare it with various SFT methods on three 7B-scale base models using mathematical reasoning benchmarks. All models are trained on the WebInstruct subset.

[Figure: main results]

Qwen2.5-Math-7B-CFT vs. Other Competitive Models

We compare our best 7B CFT model with other competitive models of different scales. For this comparison, we expand the evaluation benchmarks to cover broader STEM topics.

[Figure: comparison with other reasoning models]

Training Dynamics Comparison

Training dynamics of different methods on Qwen2.5-Math-7B across key mathematical reasoning benchmarks. We compare CFT with two SFT variants: SFT-V (trained on WebInstruct-verified) and SFT-G (trained on WebInstruct-GPT4o). The x-axis represents training steps, and the y-axis shows the accuracy on each benchmark. CFT demonstrates consistently better performance and faster convergence across most tasks.

[Figure: training dynamics]

Comparison with RL-based Method

We compare the efficiency and performance of our method with RL-based methods for mathematical reasoning. Key findings: (1) our method achieves similar performance at 144x lower computation cost, and (2) it shows strong performance on rigorous math tests such as Minerva-Math and AMC23.

[Figure: comparison with RL-based method]

Case Studies

Here we showcase detailed examples comparing three models: the CFT model (Qwen2.5-Math-7B-CFT), the original model (Qwen2.5-Math-7B), and the SFT model. These cases demonstrate how different training approaches affect mathematical reasoning abilities.

Case Study 1: Sequential Fraction Operations

Problem:

Sam is carrying a 2 gallon bucket of milk to his house from the barn, which is 3 miles away from his house. However, the bucket has a leak in it. For each mile he walks, there is $\frac{2}{3}$ as much milk in the bucket as at the beginning of the mile. How many gallons of milk will be in the bucket when Sam gets home?

CFT Model Solution

Step-by-step solution:

  1. Initial Amount: 2 gallons
  2. After First Mile:
    $2 \cdot \frac{2}{3} = \frac{4}{3}$ gallons
  3. After Second Mile:
    $\frac{4}{3} \cdot \frac{2}{3} = \frac{8}{9}$ gallons
  4. After Third Mile:
    $\frac{8}{9} \cdot \frac{2}{3} = \frac{16}{27}$ gallons

Final Answer: $\boxed{\frac{16}{27}}$ gallons (Correct)

Original Model Solution

Step-by-step reasoning:

The model attempts to solve using sequential calculations:

  1. Initially, the bucket contains 2 gallons of milk.
  2. After the first mile: $2 \times \frac{2}{3}$
  3. After the second mile: $2 \times (\frac{2}{3})^2$
  4. After the third mile: $2 \times (\frac{2}{3})^3$

However, the model makes calculation errors in the final steps.

Final Answer: $\boxed{\frac{8}{27}}$ gallons (Incorrect)

Key Observations:

  • The CFT model maintains precise step-by-step calculations, showing each intermediate fraction.
  • The original model has the right approach but makes calculation errors in the final steps.
  • CFT demonstrates better ability to track and compute sequential fraction multiplications.
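
The arithmetic in this case study can be checked with exact fractions. The short Python sketch below is only a verification aid with variable names of our choosing; it is not part of either model's output.

```python
from fractions import Fraction

milk = Fraction(2)     # gallons in the bucket at the barn
keep = Fraction(2, 3)  # fraction of milk remaining after each mile

amounts = []           # milk left after mile 1, 2, 3
for _ in range(3):
    milk *= keep
    amounts.append(milk)

# Closed form the original model set up: 2 * (2/3)^3
closed_form = 2 * keep**3
```

Both the step-by-step loop and the closed form evaluate to 16/27, so the original model's setup was sound and its boxed answer of 8/27 comes from the final evaluation alone.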

Case Study 2: Geometric Area Problem

Problem:

As shown, a square is constructed on the hypotenuse of a right triangle whose legs have lengths 3 units and 4 units. What is the area of the pentagon $ABCDE$, in square units?

[Note: The figure shows a pentagon formed by a right triangle with legs 3 and 4 units, with a square constructed on its hypotenuse]

CFT Model Solution

Step-by-step solution:

  1. Calculate hypotenuse:
    Using Pythagorean theorem: $\sqrt{3^2 + 4^2} = 5$ units
  2. Calculate square area:
    Area = $5^2 = 25$ square units
  3. Calculate triangle area:
    Area = $\frac{1}{2} \times 3 \times 4 = 6$ square units
  4. Total pentagon area:
    Area = Square + Triangle = $25 + 6 = 31$ square units

Final Answer: $\boxed{31}$ square units (Correct)

SFT Model Solution

Step-by-step solution:

  1. Calculate hypotenuse:
    Using Pythagorean theorem: $\sqrt{3^2 + 4^2} = 5$ units
  2. Calculate triangle area:
    Area = $\frac{1}{2} \times 3 \times 4 = 6$ square units
  3. Calculate square area:
    Area = $5^2 = 25$ square units
  4. Critical Error:
    Incorrectly subtracts triangle from square instead of adding

Final Answer: $\boxed{19}$ square units (Incorrect)

Key Observations:

  • While both models correctly apply the Pythagorean theorem, their approaches to combining areas differ significantly.
  • The CFT model correctly recognizes that the pentagon is the sum of the square and triangle areas.
  • The SFT model makes a critical conceptual error by subtracting instead of adding the areas.
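
The two answers differ only in the sign used when combining the areas, which a few lines of Python make explicit. This is a verification sketch with our own variable names, assuming (as the figure note states) that the pentagon is the triangle joined to the square on its hypotenuse.

```python
leg_a, leg_b = 3, 4

# The square sits on the hypotenuse, so its area is hypotenuse^2 = a^2 + b^2.
square_area = leg_a**2 + leg_b**2
triangle_area = leg_a * leg_b / 2

pentagon_area = square_area + triangle_area  # areas are joined, so add
sft_style_area = square_area - triangle_area  # the subtraction the SFT model performed
```

Here `pentagon_area` is 31 and `sft_style_area` is 19, matching the two boxed answers above.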

Case Study 3: Worker Efficiency Problem

Problem:

Each good worker can paint my new house alone in 12 hours. Each bad worker can paint my house alone in 36 hours. I need my house painted in 3 hours. If I can only find 3 good workers, how many bad workers must I also find in order to have my house painted on time?

CFT Model Solution

Step-by-step solution:

  1. Good worker rate:
    Each good worker: $\frac{1}{12}$ house/hour
    3 good workers: $3 \times \frac{1}{12} = \frac{1}{4}$ house/hour
  2. Time available:
    3 hours total
  3. Good workers' contribution:
    In 3 hours: $3 \times \frac{1}{4} = \frac{3}{4}$ of house
  4. Remaining work for bad workers:
    $1 - \frac{3}{4} = \frac{1}{4}$ of house
  5. Bad worker contribution:
    Each bad worker in 3 hours: $3 \times \frac{1}{36} = \frac{1}{12}$ of house
  6. Number of bad workers needed:
    $\frac{1/4}{1/12} = 3$ workers

Final Answer: $\boxed{3}$ bad workers (Correct)

SFT Model Solution

Analysis of approach:

  1. Started with correct work rates:
    • Good worker: $\frac{1}{12}$ house/hour
    • Bad worker: $\frac{1}{36}$ house/hour
  2. Made several critical errors:
    • Incorrectly interpreted the remaining work
    • Mixed up rate equations
    • Introduced unrelated concepts (tank filling)
  3. Arrived at incorrect conclusion through faulty reasoning

Final Answer: $\boxed{12}$ bad workers (Incorrect)

Key Observations:

  • The CFT model maintains clear focus on the problem's core concept of work rates and time.
  • The SFT model shows significant confusion in problem interpretation and solution approach.
  • CFT demonstrates superior ability in handling multistep rate problems with different worker types.
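
The work-rate reasoning in the CFT solution can be reproduced with exact fractions. The sketch below follows the same six steps; the variable names are ours, and the `math.ceil` call is added only to handle cases where the division would not come out to a whole number of workers.

```python
import math
from fractions import Fraction

good_rate = Fraction(1, 12)  # houses per hour for one good worker
bad_rate = Fraction(1, 36)   # houses per hour for one bad worker
deadline_hours = 3
good_workers = 3

done_by_good = good_workers * good_rate * deadline_hours  # share painted by good workers
remaining = 1 - done_by_good                               # share left for bad workers
per_bad_worker = bad_rate * deadline_hours                 # share one bad worker paints in time

# Round up, since a fraction of a worker cannot be hired.
bad_workers_needed = math.ceil(remaining / per_bad_worker)
```

With the numbers from the problem, the good workers cover 3/4 of the house, each bad worker covers 1/12, and `bad_workers_needed` comes out to 3.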

Reference

Please kindly cite our paper if you use our code or results:
@misc{wang2025critiquefinetuninglearningcritique,
      title={Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate},
      author={Yubo Wang and Xiang Yue and Wenhu Chen},
      year={2025},
      eprint={2501.17703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.17703},
}