[DAI] Language Model Self-improvement by Reinforcement Learning Contemplation
Recommended citation: Jing-Cheng Pang, Pengyuan Wang, Nan Tang, Kaiyuan Li, Xionghui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation. In: DAI (Poster Paper Track), 2023. /files/pdf/dai23_rlc.pdf