Language Model Self-improvement by Reinforcement Learning Contemplation without External Supervision

Recommended citation: Jing-Cheng Pang, Kaiyuan Li, Peng-Yuan Wang, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation without External Supervision. Journal of Artificial Intelligence Research (JAIR), to appear. /files/pdf/jair_rlc.pdf