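Agentic RL: veRL AgentLoop End-to-End Flow and Computation Details (Async Rollout, State Machine, Tool-Interaction)
Agentic RL: veRL Infra AgentLoop Code Walkthrough (Multi-turn Inference and Ray Trainer)
Agentic RL: veRL Infra AgentLoop (AgentLoopManager, Async Rollout, and Hybrid Inference-Training)
Agentic RL: Rethinking DPO (KL-Regularized RL, the Implicit Reward Model, and Its Shortcomings)
Agentic RL: Understanding SFT and RL from a Distributional Perspective (Forward/Reverse KL, Distributions and Rewards)
Agentic RL: Reward Model Insights (Bradley-Terry, MLE, and Deep Learning)
Agentic RL: Tokenizer Encode/Decode Asymmetry and Token-in-Token-out (the Root Cause of RL Training Collapse)
Agentic RL: veRL MultiTurn Tool Use and Coding Agent SFT (Cold Start for RL)
Agentic RL: veRL FSDP SFT Trainer Supplement (Teacher Forcing, Shift Labels/Logits, Loss Mask)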