A Literature Review by Prayson Wilfred Daniel
LLMs predict plausible next or masked tokens without actual understanding. Pattern matching from training distribution. RL and test-time compute surface pre-existing capabilities rather than creating new reasoning abilities.
LLMs predict plausible next or masked tokens without actual understanding. Pattern matching from training distribution.
RL and test-time compute surface pre-existing capabilities rather than creating new reasoning abilities.