Understanding Higher-Level Intelligence from AI, Psychology, and Neuroscience Perspectives - Discussion
#talk #ai
https://www.youtube.com/watch?v=j7eVimhSvnQ&list=WL&index=6&t=2251s
- People:
- Jacob Andreas
- Max Dabagia
- Michael Frank
- Noah D. Goodman (psychology, CS @ Stanford)
- Trevor Darrell
First of all, I think it's really cool that academics in CS are now willing to label themselves as studying "intelligence" and "cognition". These have always felt, and I guess still feel, like very murky, hard-to-pin-down terms. But the surprising power of transformers for language modeling, and the power of language modeling for "general tasks", has perhaps given us confidence that we now have the tools to take a real stab at understanding intelligence.
Summary
The panel generally discussed the philosophical ramifications of the success of language models, making several references to Wittgenstein along the way.
- Why has language proven to be so powerful?
- Is language "complete"? Is it sufficient for reasoning? Is it necessary?
- What is the role of other modalities?
- Implications of the platonic representation hypothesis. We can ask the normative question: do we really want one language model, one way of representing the world, to dominate? Or would it be better to have some kind of agentic specialization?
- The meta-question: what is really safe to take away (about cognition) from the successes and failures of attempts to create cognition?
Some interesting tidbits:
- LLMs only became good at long-range reference and other things humans do with language after being trained on code!
- In hindsight, maybe the success of language models isn't so surprising, because language was constructed to be the easiest, most concise way of representing the world and human concepts.
- Several examples against the necessity of language:
- Learning piano
- Manipulation teaches perception about abstraction
  - Acquiring number sense via manipulation of objects (no need for language)
- Thinking of a language model as a Bayesian estimator: it uses the context to update its posterior distribution over "what external context caused these words to appear?"
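The Bayesian-estimator framing above can be made concrete with a toy sketch (my own illustration, not from the talk). All contexts and word probabilities below are made up; each observed word re-weights a prior over latent contexts via Bayes' rule, assuming words are conditionally independent given the context.

```python
# Hypothetical latent contexts with made-up word distributions.
contexts = {
    "cooking":  {"recipe": 0.40, "stir": 0.30, "model": 0.05, "data": 0.05, "heat": 0.20},
    "ml_paper": {"recipe": 0.05, "stir": 0.02, "model": 0.45, "data": 0.38, "heat": 0.10},
}
prior = {"cooking": 0.5, "ml_paper": 0.5}

def posterior(words, prior, contexts):
    """P(context | words): multiply prior by per-word likelihoods, then normalize."""
    scores = {c: prior[c] for c in contexts}
    for w in words:
        for c in contexts:
            # Tiny floor for out-of-vocabulary words so no context hits zero.
            scores[c] *= contexts[c].get(w, 1e-9)
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Observing "model data" shifts belief sharply toward the ml_paper context.
post = posterior(["model", "data"], prior, contexts)
```

The point of the sketch: "predicting the next word well" forces the model to maintain something like this posterior over what situation is generating the text.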
My thoughts
- I like the question about what is safe to take away from these attempts at artificial cognition, towards understanding human cognition.
  - In particular, I have this hypothesis that biology (and therefore intelligence) is sort of an accumulation of accidents throughout history;
- And now, we're sort of trying to "rediscover the key accidents" in creating our own intelligence.
- E.g.
- the transformer architecture: was key to the inductive biases that led to GPT breakthroughs
    - the fact that using Adam for optimization, rather than SGD, greatly decreases sample complexity and improves quality
- even neural networks in the first place
- Which of these are accidents, which of these were necessary?
- If we want to come up with a theory of cognition, we'll need to be able to separate out what is purely symmetry breaking, vs what is truly fundamental. [related:: what is purely symmetry breaking vs actually fundamental and new]
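As a side note on the Adam-vs-SGD point above: a minimal sketch (mine, not from the talk) of why Adam's per-coordinate step-size adaptation helps. On an ill-conditioned quadratic, plain SGD must use a learning rate small enough for the steepest direction, so it crawls along the shallow one; Adam rescales each coordinate by a running estimate of squared gradients. This is a toy stability argument, not a demonstration of the sample-complexity claim for LLM training.

```python
import math

def loss_and_grad(x, y, a=100.0, b=1.0):
    """Ill-conditioned quadratic: steep in x (curvature a), shallow in y (curvature b)."""
    return 0.5 * (a * x**2 + b * y**2), (a * x, b * y)

def run_sgd(steps=200, lr=0.01):
    # lr must stay below 2/a = 0.02 or the x-coordinate diverges.
    x, y = 1.0, 1.0
    for _ in range(steps):
        _, (gx, gy) = loss_and_grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return loss_and_grad(x, y)[0]

def run_adam(steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias-corrected first/second moments.
    x, y = 1.0, 1.0
    m, v = [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        _, g = loss_and_grad(x, y)
        p = [x, y]
        for i in range(2):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2
            mhat = m[i] / (1 - b1**t)
            vhat = v[i] / (1 - b2**t)
            p[i] -= lr * mhat / (math.sqrt(vhat) + eps)
        x, y = p
    return loss_and_grad(x, y)[0]
```

Both optimizers drive the loss far below its starting value of 50.5 here; the interesting contrast is that Adam tolerates a 10x larger learning rate because its effective step is normalized per coordinate.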