( is the current experience)

Unlike offline learning, online learning makes no distinction between training and deplyoment. The dataset is built from experiences the agent itself collects through interaction in the real world. Training is still conducted via mini-batch gradient descent, so it learns from a random subset of its past experience.

Why store past experience?

  • It works stably for offline learnig
  • Memory replay occurs in natural intelligence (albeit not with raw experience)
  • Replaying samples multiple times extracts more information / is more sample efficient
  • Stability