Language Model Self-improvement by Reinforcement Learning Contemplation
Recommended citation: Jing-Cheng Pang, Peng-Yuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation. ICLR 2024.