the platonic representation hypothesis
- there's evidence that large, competent models converge to similar representations (especially at early layers)
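- e.g. via model stitching: the early layers of one model can be connected to the later layers of another through a simple learned stitching layer, with little loss in performance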
- reminds me of the sedimentary or hierarchical path of progress: since all of our (human) intelligence was built one layer at a time, perhaps the first couple of layers of representations are the same across different tasks because that's just how the world works.
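
a rough sketch of the stitching idea, to make it concrete. this is a toy, not how stitching is actually run on real networks: the two "models" here are hypothetical random linear views of the same latent features, and the stitching layer is fit by least squares on activations rather than trained on a downstream task loss. all names (`fit_stitch`, `apply_stitch`, `relative_error`) are made up for illustration.

```python
import numpy as np

def fit_stitch(acts_a, acts_b):
    """Fit an affine map from model A's activations to model B's (least squares)."""
    # Append a constant column so the map has a bias term.
    a_aug = np.hstack([acts_a, np.ones((acts_a.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(a_aug, acts_b, rcond=None)
    return weights

def apply_stitch(acts_a, weights):
    """Map model A's activations into model B's activation space."""
    a_aug = np.hstack([acts_a, np.ones((acts_a.shape[0], 1))])
    return a_aug @ weights

def relative_error(pred, target):
    """Norm of the residual relative to the norm of the target."""
    return np.linalg.norm(pred - target) / np.linalg.norm(target)

rng = np.random.default_rng(0)

# Toy assumption: both models encode the same 8 underlying "world" features,
# each in its own random 16-dimensional basis.
latent = rng.normal(size=(500, 8))
acts_a = latent @ rng.normal(size=(8, 16))
acts_b = latent @ rng.normal(size=(8, 16))

# Control: activations that share nothing with model A.
unrelated = rng.normal(size=(500, 16))

train, test = slice(0, 400), slice(400, 500)

w_shared = fit_stitch(acts_a[train], acts_b[train])
err_shared = relative_error(apply_stitch(acts_a[test], w_shared), acts_b[test])

w_unrelated = fit_stitch(acts_a[train], unrelated[train])
err_unrelated = relative_error(apply_stitch(acts_a[test], w_unrelated), unrelated[test])

print("held-out stitching error, shared latent:", err_shared)
print("held-out stitching error, unrelated:", err_unrelated)
```

when the two sets of activations are just different bases for the same underlying features, a simple affine stitch maps one onto the other almost perfectly on held-out data; when they share nothing, it can't. real stitching experiments ask the analogous question of trained networks, using task performance after stitching as the measure.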