Noah Goodman - Cognitive science requires causal abstraction analysis
Metadata
- URL: https://www.youtube.com/watch?v=zPkRyTLjYHI
- Publisher: Causality in Cognition Lab
- Published Date: 2023-11-20
Notes
Summary
It's interesting to hear a psychologist's take on LLMs and interpretability.
Goodman starts with this "flash-forward" where in 2030, psychology is to a certain extent solved by LLMs. That is, any reproducible findings from psychology are displayed by LLMs. Not only that, LLMs are able to display the psychological characteristics of a wide, representative range of the human population.
The question is, if this occurs, is there still a place for psychology?
He presents 3 camps:
- Application: The operational utility (i.e. predictive power) is all that matters. Therefore, psychology is solved.
- Explanation: It's not over until we can actually explain how our minds work.
- Issue: unclear what it means to be a "satisfactory explanation"
- Mechanism: We still want to understand the circuits by which intelligence operates.
- Issue: a priori very non-obvious that there should be simple, discoverable, circuits
He then reveals himself as a mechanism-ist, and presents some work where, given a hypothesis about the causal mechanism an LM is employing for a task, one can test whether this is indeed the case via interventions. Basically, one assumes the linear representation hypothesis, and then tests whether there exist directions such that changing the representation along that direction is isomorphic to changing a variable in the causal structure one posits.
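The intervention test he describes can be sketched in a toy setting. Everything here (the "model", the direction `d`, the readout) is my own illustration of the idea, not code from the talk: we hypothesize that a causal variable V is encoded linearly along a direction `d` in the hidden state, and check that patching the component along `d` from a counterfactual input changes the output as if V itself had been changed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothesized direction encoding the causal variable V (unit norm)
d = rng.normal(size=8)
d /= np.linalg.norm(d)

def encode(v, noise_scale=1.0):
    """Toy 'model' representation: V stored along d, plus orthogonal clutter."""
    clutter = rng.normal(size=8) * noise_scale
    clutter -= d * (clutter @ d)  # keep clutter orthogonal to d
    return v * d + clutter

def readout(h):
    """Toy downstream behavior that depends only on V's coordinate along d."""
    return float(h @ d > 0.5)

def intervene(h_base, h_source):
    """Replace the component of h_base along d with that of h_source."""
    return h_base + d * ((h_source - h_base) @ d)

h_lo = encode(0.0)  # input where V = 0
h_hi = encode(1.0)  # counterfactual input where V = 1

# If the linear hypothesis holds, patching along d flips the behavior,
# isomorphic to setting V = 1 in the posited causal model.
assert readout(h_lo) == 0.0
assert readout(intervene(h_lo, h_hi)) == 1.0
```

In the real setting, `h` would be an activation inside the LM and the test is whether outputs under such patches match the counterfactual predictions of the hypothesized causal model.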
Interesting tidbits
- One answer that Goodman presents to "what could be a good explanation?" is Feyerabend's position in Against Method--that an explanation's value is derived "solely from the consensus of a community"
- This reminds me a bit of what I was "arguing" in Locally Meaningful, Globally Meaningless - a Call to Action for the Future of Work#The Academic Water Cycle: that having self-contained academic silos is very important for the goal of having academics do useless work.
- Also, of the potential that agreement is more important than truth.
My thoughts
- What is the purpose of interpretability?
- The point here is interpretability for the purpose of understanding
- What reasons do we have to expect that circuits should be simple or complex?
- I have this hypothesis that some phenomena just don't have simple explanations. That is, some functions that nature computes just are not simulatable by small circuits.
- One example of this is protein folding, where decades of attempts to give a simple, first principles algorithm failed, while the eventual solution was a deep neural net. [related:: the role of complexity and machine learning in science and math]
- [?] although, maybe people have worked on interpretability for AlphaFold?
- Maybe something similar is true for something like AlphaGo?
- There is a question though of learnability vs computability: learning general circuits is hard, so maybe it's not the case that small circuits don't exist, but rather that in the restricted class of circuits which can be learned efficiently, the smallest (approximate) representation is complicated?
Related
- what is understanding
- the role of complexity and machine learning in science and math
- Understanding Higher-Level Intelligence from AI, Psychology, and Neuroscience Perspectives - Discussion
Highlights
so when they discover that there are now AI surrogates that let them predict how humans will respond in very complicated real world situations, they think this is great, and they just say, that's all we need in order to design interventions and use psychology the way we always wanted.
Note: Reminds me of this distinction between prediction and inversion (à la Manish Raghavan)
This is what Feyerabend said back in Against Method. Anything goes. And so it's okay. It's defined by the community what counts as a satisfying explanation, and that's what we're doing.
Note: The silo theory of everything.
So this is the idea of causal abstraction. We want a mathematical theory of when one causal system is a faithful, full abstraction of another.
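Note: My gloss of the condition, from the standard causal abstraction literature (not stated in this form in the talk): an abstraction map $\tau$ from low-level states to high-level states, together with a map $\omega$ from low-level interventions to high-level interventions, such that intervening then abstracting agrees with abstracting then intervening:

$$\tau\big(\mathrm{do}_L(i)(s)\big) \;=\; \mathrm{do}_H\big(\omega(i)\big)\big(\tau(s)\big)$$

i.e. the diagram commutes for every low-level state $s$ and intervention $i$ in the relevant class.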