Research Blog

Latest research and insights from Kiibo AI

RL Post-Training for LLMs: From "RLHF" to Reasoning-First Agents

How reinforcement learning post-training evolved from RLHF to sophisticated reasoning-first agents in 2025.

How offline RL is experiencing a renaissance through diffusion models, standardized baselines, and practical offline-to-online transitions.

How foundation models are transforming preference-based RL by serving as scalable feedback sources.

The field is shifting from "can it solve the benchmark?" to "can it keep working when the world shifts?"

Multi-agent RL has matured beyond emergence demos into practical tools for coordination.