Ten years ago, "distillation" was just a way to squeeze massive AI models into smaller versions that could run on your phone.
Then it became a way to replicate the capabilities of larger models in smaller ones, putting those capabilities in front of a much wider audience.
By 2023, we used it to copy the "smarts" of giant models (like GPT-4) into open-source models so everyone could use them.
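Copying the "smarts" of a teacher model usually means training the student to match the teacher's softened output distribution rather than hard labels. A minimal sketch of that classic soft-label objective, with toy logits and a hypothetical `distillation_loss` helper (names and numbers are illustrative, not from any particular library):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the
    student's -- the signal the student is trained to minimize."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for one 3-class example.
teacher = [4.0, 1.0, 0.5]
student = [3.0, 1.5, 0.2]
print(distillation_loss(teacher, student))
```

A higher temperature exposes the teacher's "dark knowledge": how it ranks the wrong answers, not just which answer wins.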
Now, it is a way to teach models to think and reason.
In 2026, models use self-distillation to act as their own teachers: they analyze their own mistakes and get smarter without humans steering or evaluating them.
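One way a model can teach itself without human evaluation is to sample several candidate answers, keep only those it can verify on its own, and treat the survivors as new training data. A deliberately toy sketch of that loop (the `noisy_solve` model and the arithmetic self-check are illustrative assumptions, not the method any specific system uses):

```python
import random

random.seed(0)

def noisy_solve(a, b):
    """Stand-in for sampling an answer from an imperfect model:
    sometimes right, sometimes off by one."""
    return a + b + random.choice([-1, 0, 0, 0, 1])

def self_distill(problems, samples=8):
    """Sample several answers per problem, keep only the ones that pass
    the model's own verification step, and return them as training data
    -- no human labels involved."""
    dataset = []
    for a, b in problems:
        for _ in range(samples):
            guess = noisy_solve(a, b)
            if guess == a + b:  # self-verification step
                dataset.append(((a, b), guess))
                break
    return dataset

data = self_distill([(2, 3), (7, 1), (4, 4)])
print(data)
```

Fine-tuning on `data` closes the loop: the model becomes the teacher of its own next version.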
The big shift:
"We have moved from simply copying answers to actually teaching models how to reason."
It started as compression.
Then it became replication.
Now, it is reasoning.