Publications
Manuscripts
• Jing-Cheng Pang, Kaiyuan Li, Pengyuan Wang, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation without External Supervision. Submitted to Journal of Artificial Intelligence Research (JAIR).
• Jing-Cheng Pang, Tian Xu, Shengyi Jiang, Yu-Ren Liu and Yang Yu. Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization. Submitted to IEEE Transactions on Neural Networks and Learning Systems (TNNLS).
Preprints
• Zhilong Zhang, Ruifeng Chen, Junyin Ye, Yihao Sun, Pengyuan Wang, Jing-Cheng Pang, Kaiyuan Li, Tianshuo Liu, Haoxin Lin, Yang Yu, Zhi-Hua Zhou. WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making. CoRR abs/2411.05619, 2024.
• Yuting Tang*, Xin-Qiang Cai*, Jing-Cheng Pang, Qiyu Wu, Yao-Xiang Ding and Masashi Sugiyama. Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning. CoRR abs/2410.20176, 2024.
• Jing-Cheng Pang*, Heng-Bo Fan*, Pengyuan Wang*, Jia-Hao Xiao*, Nan Tang, Si-Hang Yang, Chengxing Jia, Sheng-Jun Huang and Yang Yu. Empowering Language Models with Active Inquiry for Deeper Understanding. CoRR abs/2402.03719, 2024.
• Rong-Jun Qin, Jing-Cheng Pang and Yang Yu. Improving Fictitious Play Reinforcement Learning with Expanding Models. CoRR abs/1907.01077, 2019.
Conference Papers
• Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang and Yang Yu. Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts. In: NeurIPS, 2024.
• Jing-Cheng Pang*, Pengyuan Wang*, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation. In: ICLR, 2024.
• Jing-Cheng Pang*, Pengyuan Wang*, Nan Tang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang and Yang Yu. Language Model Self-improvement by Reinforcement Learning Contemplation. In: DAI (Poster Paper Track), 2023.
• Jing-Cheng Pang*, Xinyu Yang*, Si-Hang Yang, Xiong-Hui Chen and Yang Yu. Natural Language Instruction-following with Task-related Language Development and Translation. In: NeurIPS, 2023.
• Jing-Cheng Pang*, Si-Hang Yang*, Xiong-Hui Chen, Xinyu Yang, Yang Yu, Mas Ma, Ziqi Guo, Howard Yang and Bill Huang. Object-Oriented Option Framework for Robotics Manipulation in Clutter. In: IROS (Oral presentation), 2023.
• Xu-Hui Liu*, Zhenghai Xue*, Jing-Cheng Pang, Shengyi Jiang, Feng Xu and Yang Yu. Regret Minimization Experience Replay in Off-Policy Reinforcement Learning. In: NeurIPS, 2021.
• Shengyi Jiang, Jing-Cheng Pang and Yang Yu. Offline Imitation Learning with a Misspecified Simulator. In: NeurIPS, 2020.
Journal Papers
• Chengxing Jia*, Fuxiang Zhang*, Tian Xu, Jing-Cheng Pang, Zongzhang Zhang and Yang Yu. Model Gradient: Unified Model and Policy Learning in Model-based Reinforcement Learning. Frontiers of Computer Science, 2024.