LLMs Still Can't Plan; Can LRMs?

Distillnation

A Preliminary Evaluation of OpenAI's o1 on PlanBench

Sep 24, 2024

In this episode, we discuss the paper "LLMs Still Can’t Plan; Can LRMs? A Preliminary Evaluation of OpenAI’s o1 on PlanBench". The authors examine whether Large Reasoning Models (LRMs), specifically OpenAI’s o1 (Strawberry), represent a real departure from traditional Large Language Models (LLMs) on planning tasks. LLMs have made minimal progress over the years on PlanBench, a benchmark developed to test AI's planning abilities, and o1 promises to change that.

This episode raises critical questions about the future of AI in reasoning:

Can LRMs like o1 truly reason, or do they just simulate reasoning better than previous models?
How far are we from AI models that can robustly plan and reason like humans?
What are the limitations and costs associated with these new systems, and can they be trusted in real-world applications?

Tune in to hear about the results, their implications, and whether we are witnessing a significant step forward in AI's ability to reason and plan.

Paper: https://arxiv.org/abs/2409.13373
NotebookLM: https://notebooklm.google.com/
