general_value_functions_sutton_slides.pdf

Cumulant … the signal being predicted, accumulated over time. It replaces the reward signal in traditional value functions. It could be anything measurable: sensor readings, state features, auxiliary rewards, or even the outputs of other value functions.
Policy … the target policy whose consequences are being predicted. It can be different from the behavior policy actually being followed, in which case the prediction is learned off-policy.
Termination function … a state-dependent discount factor (a value between 0 and 1 for each state) that can model episodic boundaries more flexibly than a constant discount. A value of 0 at a state cuts the accumulation off there.
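
A minimal sketch of how the three pieces fit together, assuming a made-up 5-state chain (not from the slides) and plain tabular TD(0). The names `cumulant`, `target_policy`, `termination`, and `learn_gvf` are hypothetical, chosen only for illustration. To keep it short, the agent behaves according to the target policy itself, so no off-policy correction is shown.

```python
import random

N_STATES = 5  # hypothetical toy world: states 0..4 on a line; state 4 ends the episode

def cumulant(next_state):
    # Signal to accumulate: 1 per step, so this GVF predicts "steps until the end".
    return 1.0

def target_policy(state):
    # Policy the question is about: always step right (+1).
    return +1

def termination(next_state):
    # State-dependent discount: 0 at the end state cuts off the sum, 1 elsewhere.
    return 0.0 if next_state == N_STATES - 1 else 1.0

def learn_gvf(episodes=500, alpha=0.5, seed=0):
    """Tabular TD(0) estimate of the GVF (on-policy: behavior = target policy)."""
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    for _ in range(episodes):
        s = rng.randrange(N_STATES - 1)  # start in a random non-terminal state
        while s != N_STATES - 1:
            s_next = s + target_policy(s)
            # TD target: cumulant plus the discounted prediction at the next state.
            target = cumulant(s_next) + termination(s_next) * v[s_next]
            v[s] += alpha * (target - v[s])
            s = s_next
    return v

if __name__ == "__main__":
    print(learn_gvf())
```

Under these assumptions the learned values should approach the number of steps remaining from each state (4, 3, 2, 1, and 0 for the end state). Swapping `cumulant` for a different signal, or `termination` for a different cut-off rule, changes the question being asked without touching the learning loop.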

References

value function
curiosity
off-policy
prediction

reinforcement learning

http://incompleteideas.net/Talks/luganoreduced.pdf

Fun comparison in explanation between 3.5 Sonnet (I think) and 4.0 Opus:
https://claude.ai/chat/96e338c0-ba8b-4e53-aaac-ce4b5abc2b54
https://claude.ai/chat/f5b96432-c579-48ac-b8b1-a0c60d7cbcfb