5 Comments

Great interview! I have been out of the loop for a bit, but this was a great way to get back into the conversation.

In the initial section, I was tracking three important distinctions: logical vs. sub-logical, linguistic vs. non-linguistic, and tacit vs. explicit. Would love to build a chart to track the dynamics between these concepts.

His case of approximate retrieval is very useful for my own work in education and instruction. I also found his analysis of what humans vs. LLMs do well and poorly to be one of the best I have heard to date.

Humans struggle with generation while finding revision and verification much easier.

For LLMs, the outcome of both generation and verification is largely dependent on training data and is still approximate. This is powerful stuff, and it's insight that educators and other professionals need to hear.

Daniel, could you recommend any reading on the LLM + deep reinforcement learning system combinations he spoke of that can both generate and reason? I'd love to dig into both topics a little deeper.


Many thanks for listening! Let me get back to you on references/papers when I have a chance to dig some up.


Thanks for the high-quality interview.

I'm a bit unsure what to think of System 1 vs. System 2 analogies, given that the brain isn't physically separated in two in this way.

I'm not sure if it's being implied that LLMs + RLHF could be sufficient to achieve a) reasoning, or b) human-type intelligence. Either way, it's an interesting question. If it is b), then one question is whether we can create the reward function of humans without building from the ground up with genes (which are what create "selection").

Again, it's a good thought that ye raise. Hassabis, in his recent interview with Dwarkesh, highlights how chess players are far more efficient at selecting which moves to consider: not just more efficient than a tree search like Stockfish, but more efficient than a reinforcement learning + tree search solution like AlphaZero. That leads to the question of whether this gap in efficiency is because of a) a lack of data, b) an incomplete or wrong reward function, or c) RL + tree search not being enough. My guess is that RL + tree search is enough, BUT one probably can't get the correct RL objective function without building out a system of genes. Said differently, if you don't build out the genes, then you need to figure out the mechanism those genes use and replicate it some other way, and for that you need RL + tree search + an understanding of that other mechanism.
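To make the efficiency point concrete, here is a minimal toy sketch (not AlphaZero or Stockfish themselves) contrasting exhaustive move expansion with policy-guided selection, which is roughly where an AlphaZero-style system gets its efficiency over plain tree search. All names here (legal_moves, policy_prior, top_k) are hypothetical stand-ins, not any real engine's API.

```python
import math
import random

def legal_moves(state):
    # Toy stand-in: pretend every position has 30 legal moves.
    return list(range(30))

def policy_prior(state, moves):
    # Hypothetical learned policy: assigns a probability to each move.
    # Faked here with a softmax over random scores.
    scores = [random.random() for _ in moves]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def exhaustive_expansion(state):
    # Breadth-first style: every legal move gets considered.
    return legal_moves(state)

def policy_guided_expansion(state, top_k=4):
    # Policy-guided style: the learned prior concentrates search
    # on a handful of promising moves, pruning the rest.
    moves = legal_moves(state)
    priors = policy_prior(state, moves)
    ranked = sorted(zip(moves, priors), key=lambda mp: mp[1], reverse=True)
    return [m for m, _ in ranked[:top_k]]

state = "toy-position"
print(len(exhaustive_expansion(state)), "moves expanded exhaustively")
print(len(policy_guided_expansion(state)), "moves expanded with a policy prior")
```

The open question in the comment above is whether a prior learned this way (from self-play data and a hand-specified reward) can ever become as selective as a human player's, or whether that selectivity depends on something the genes provide that the reward function doesn't capture.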


Came for the AI stayed for the jokes :)

A really enjoyable discussion - thanks Daniel.


Thanks for listening! 🙂
