At the end of an interview, Kimi founder Yang Zhilin used AI training as an analogy for how a CEO manages a team.
- SFT (supervised fine-tuning): top-down management, where the manager gives team members detailed instructions. It can be efficient, but it risks suppressing initiative and creativity.
- RL (reinforcement learning): set a clear goal, or reward, and let team members figure out the method themselves. It can maximize creativity, but it also creates the risk of "reward hacking," where people game the metric instead of achieving the real objective.
"The risk of managing with SFT is that people lose creativity. The risk of managing with RL is that the reward gets hacked. It can look like you achieved a good result, while in reality you still missed the final goal."
He described balancing those two approaches as one of the CEO's key responsibilities, and said he is experimenting with an RL-centered model that uses SFT as a supporting mechanism to provide direction.
Earlier this year, in a post titled "Management Is Saying You're Good at What You're Good At, and Bad at What You're Bad At," I wrote about organizational reinforcement learning. At this point, companies that do annual performance reviews and work in two-week sprints are simply moving too slowly. https://lnkd.in/g6WnScFc
"One day in AI is like one year in the human world. I am not sure how many days in the human world equal one year in AI, though." - Yang Zhilin https://lnkd.in/gayvtdGH

Kimi Founder Yang Zhilin: K2, Agentic LLMs, Brains in Vats, and the Beginning of Infinity