Language Model Self-improvement by Reinforcement Learning Contemplation

Published:

Recommended citation: Jing-Cheng Pang, Kaiyuan Li, Peng-Yuan Wang, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, and Yang Yu. "Language Model Self-improvement by Reinforcement Learning Contemplation without External Supervision." Submitted to Journal of Artificial Intelligence Research (JAIR). /files/pdf/jair_rlc.pdf