The ability to handle situations (or tasks) that differ from previously encountered situations - On the Measure of Intelligence

Grokking is an instance of the minimum description length principle.

If you have a problem, you can just memorize a point-wise input to output mapping.
This has zero generalization.
But from there, you can keep pruning your mapping, making it simpler, a.k.a. more compressed.
The program that generalizes best (while still performing well on the training set) is the shortest one. (Or is it…? See How to build conscious machines.)
→ **Generalization is memorization + regularization** ←
(this type of generalization is still limited to in distribution, however)
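
A minimal sketch of the memorize-then-compress idea, under loudly stated assumptions: the task (modular addition), the train/test split, and the use of a character count as a crude stand-in for description length are all chosen here for illustration and are not from the sources. The short rule is also written by hand rather than found by pruning the lookup table.

```python
# Toy task (assumed for illustration): y = (a + b) % P
P = 7
all_pairs = [(a, b) for a in range(P) for b in range(P)]
train = [pair for i, pair in enumerate(all_pairs) if i % 2 == 0]
test = [pair for i, pair in enumerate(all_pairs) if i % 2 == 1]  # held out, same distribution


def target(a, b):
    return (a + b) % P


# Hypothesis 1: point-wise memorization of the training set.
lookup = {pair: target(*pair) for pair in train}


def memorizer(a, b):
    # Returns None for any input that was never seen during training.
    return lookup.get((a, b))


# Hypothesis 2: a compressed rule that reproduces the same training outputs.
def rule(a, b):
    return (a + b) % P


def accuracy(model, pairs):
    return sum(model(a, b) == target(a, b) for a, b in pairs) / len(pairs)


# Crude description length: characters needed to write each hypothesis down.
memorizer_length = len(repr(lookup))
rule_length = len("(a + b) % P")

print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, test), memorizer_length)
print("rule:     ", accuracy(rule, train), accuracy(rule, test), rule_length)
```

Both hypotheses fit the training set perfectly, but only the shorter one carries over to the held-out pairs. Those held-out pairs still come from the same distribution, so this only illustrates the in-distribution case.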


Breadth of training predicts breadth of transfer. That is, the more contexts in which something is learned, the more the learner creates abstract models, and the less they rely on any particular example. Learners become better at applying their knowledge to a situation they’ve never seen before, which is the essence of creativity. - David Epstein, Range: Why Generalists Triumph in a Specialized World.

Beyond the facts, I looked for laws. Naturally, this led me - more than once - to hasty and incorrect generalizations. Especially in my younger years, when my knowledge - book-acquired - and my experience in life were still inadequate. But in every sphere, barring none, I felt that I could only move and act when I held in my hand the thread of the general. - Trotzki

  • Beware of hasty generalization → “from the particular to the general”
  • Theory is necessary for practice → “from the general to the particular”

Intelligence is skill-acquisition efficiency.

The size of the skill-space you can navigate within a given time / budget is the generality of the intelligence.
Joscha Bach calls this the ability to make models, which is the same thing. Being good at a single task is a skill. Having a model that allows you to pick up different skills is intelligence.


General intelligence is not a task-specific skill, but the ability to pick up any new task quickly and sample-efficiently.

It is relatively easy to write an algorithm or train a model that solves a specific task, even ARC, better than every human. One can throw enough compute at it to model the entire distribution (i.e. brute-force it the deep-learning way), or spend a lot of compute on discrete program search.



References

On the Measure of Intelligence

https://x.com/_saurabh/status/1763626711407816930?s=20
https://x.com/fchollet/status/1763692655408779455?s=20
https://arxiv.org/abs/1911.01547
https://arxiv.org/abs/2402.19450