AI Summary (English)
Title: The State of Post-Training in 2025
Summary:
This email summarizes a NeurIPS tutorial on post-training in language models, highlighting significant advancements since 2024. The author, Nathan Lambert, expresses increased optimism about open post-training methods, while acknowledging that they still lag behind proprietary models such as GPT-4. The tutorial categorizes post-training methods into instruction finetuning, preference finetuning, and reinforcement finetuning, and emphasizes the growing importance and cost-effectiveness of post-training relative to pretraining. Data from ChatBotArena shows accelerated model progress despite relatively stable model sizes, suggesting post-training's impact.
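The three-stage pipeline the tutorial names can be sketched as a toy composition. Everything here is illustrative: the "model" is just a dict that records which stages ran, and the function names stand in for real training code, not any actual API.

```python
# Toy, runnable sketch of the three post-training stages named in the
# tutorial: instruction finetuning, preference finetuning, and
# reinforcement finetuning. All names are illustrative stand-ins.

def instruction_finetune(model, sft_data):
    # Stage 1: supervised finetuning on instruction-response pairs.
    return {**model, "stages": model["stages"] + ["instruction"]}

def preference_finetune(model, pref_pairs):
    # Stage 2: optimization against pairwise preference data.
    return {**model, "stages": model["stages"] + ["preference"]}

def reinforcement_finetune(model, reward_tasks):
    # Stage 3: RL against task-level reward signals.
    return {**model, "stages": model["stages"] + ["reinforcement"]}

base = {"name": "base-model", "stages": []}
post_trained = reinforcement_finetune(
    preference_finetune(
        instruction_finetune(base, sft_data=[]),
        pref_pairs=[]),
    reward_tasks=[])
```

The point of the composition is only the ordering: each stage consumes the output of the previous one, mirroring how the tutorial presents the three categories as sequential phases of a post-training recipe.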
Key Points:
1. 📈 Post-training's impact on model performance has significantly increased, becoming a crucial area for improving models, especially given limitations in scaling pretraining.
2. 💰 Post-training, while cheaper than pretraining, is becoming increasingly expensive: in Q1 2023, post-training at the scale of LLaMA was estimated at around $1 million for a large academic project.
3. 🤖 Post-training is becoming less reliant on human data, with AI feedback offering a cost-effective alternative.
4. 🎯 Three main categories of post-training methods are now established: instruction finetuning, preference finetuning, and reinforcement finetuning.
5. 🏆 ChatBotArena Elo ratings demonstrate accelerated model progress due to post-training improvements, even without significant increases in model size.
6. 🤔 Post-training alone is insufficient for a complete understanding of training reasoning language models; it's a crucial step, but not the whole picture.
7. ⚖️ Concerns about violating terms of service when using foundation model outputs for research have decreased, with distillation from strong models becoming common practice.
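Point 5's Elo ratings come from head-to-head model comparisons. A minimal sketch of the classic pairwise Elo update is below; the K-factor is a hypothetical choice, and ChatBotArena's actual methodology fits a Bradley-Terry model over many battles rather than applying this online update.

```python
# Minimal sketch of a pairwise Elo update, as used by head-to-head
# model leaderboards. K-factor of 32 is a hypothetical choice.

def elo_update(r_a, r_b, score_a, k=32.0):
    """Return updated ratings after one comparison.

    score_a: 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    """
    # Expected score for A under the Elo logistic model.
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new

# Example: a 1200-rated model beats a 1300-rated one, so it gains
# more than it would against an equal-rated opponent.
new_a, new_b = elo_update(1200.0, 1300.0, 1.0)
```

A useful sanity check on the update is that rating is conserved: whatever A gains, B loses, so the sum of the two ratings is unchanged.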