A Literature Review
LLMs predict plausible next or masked tokens without actual understanding: they pattern-match against the training distribution.
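As a toy illustration of what "pattern matching from the training distribution" can mean, the sketch below builds a bigram predictor that returns whichever token most often followed the previous one in its training text. It is a deliberately minimal stand-in, not a description of how any real LLM works; the names `train_bigram`, `predict_next`, and the example corpus are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev`
    was never seen with a follower in the training text."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # most frequent follower of "the"
print(predict_next(model, "dog"))  # unseen context: no prediction
```

The predictor can only echo continuations it has already counted, which is the behaviour this position attributes, in far more sophisticated form, to LLMs. Whether that analogy holds at scale is exactly what the reviewed literature disputes.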
RL and test-time compute surface pre-existing capabilities rather than creating new reasoning abilities.
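One way to make the "surfacing" claim concrete is the pass@k calculation, sketched here under the simplifying assumption that samples are independent and each succeeds with a fixed probability `p`. The function name `pass_at_k` and the example probabilities are illustrative assumptions, not figures from the reviewed work.

```python
def pass_at_k(p, k):
    """Probability that at least one of k samples is correct, assuming
    independent samples with per-sample success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p) ** k

# A rare but nonzero capability becomes likely to appear with more samples.
rare = [pass_at_k(0.05, k) for k in (1, 8, 64)]

# A capability with zero probability never appears, however many samples.
absent = pass_at_k(0.0, 1000)

print(rare, absent)
```

Under these assumptions, extra sampling raises the chance of observing a solution the model could already produce occasionally, but cannot yield one it assigns zero probability. That is the sense in which, on this view, RL and test-time compute concentrate or reveal existing behaviour rather than add new reasoning ability.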