Investigating the consequences of accidentally grading CoT during RL
We identified limited accidental CoT grading in some released models, fixed the affected reward pathways, and found no clear evidence that monitorability had degraded.