Crafter is a procedurally generated 2D survival game designed as a benchmark for reinforcement learning research. It was created as a simpler, more tractable alternative to Minecraft that keeps similar challenges: tech trees, resource gathering, and long-term planning.
Game Mechanics
Procedural world generation with resources (wood, stone, coal, iron, diamonds)
Crafting system with hierarchical dependencies (e.g., wood → wooden pickaxe → stone → stone pickaxe → iron)
Survival elements: health, food, water, and energy (restored by sleeping), plus enemies (zombies, skeletons)
Day/night cycle: visibility drops at night and zombies become more common
22 achievements, whose unlock rates serve as the evaluation metric
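The Crafter paper combines the per-achievement success rates into a single score with a geometric mean, S = exp((1/N) Σᵢ ln(1 + sᵢ)) − 1, where sᵢ is the success rate of achievement i in percent and N = 22. A minimal sketch of that aggregation (the function name `crafter_score` is hypothetical):

```python
import math
from typing import Sequence


def crafter_score(success_rates: Sequence[float]) -> float:
    """Geometric-mean score over per-achievement success rates given in percent [0, 100]."""
    if not success_rates:
        raise ValueError("need at least one achievement")
    # Adding 1 inside the log keeps achievements with a 0% success rate finite.
    mean_log = sum(math.log(1.0 + s) for s in success_rates) / len(success_rates)
    return math.exp(mean_log) - 1.0


print(crafter_score([100.0, 0.0]))  # sqrt(101) - 1, about 9.05
```

The geometric mean rewards breadth: success rates of [100, 0] give about 9.05 rather than the arithmetic mean's 50, so mastering a few easy achievements counts for little.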
Crafter exposes fundamental challenges for traditional RL:
Sparse rewards: many actions are needed before any achievement is unlocked
Long-term credit assignment: collecting a diamond requires an iron pickaxe, which takes ~20+ correct sequential decisions to reach
Exploration vs. exploitation: the agent must balance immediate survival with resource gathering
Compositional structure: the agent must understand tool hierarchies and resource dependencies
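The compositional, long-horizon structure becomes concrete if the tech tree is treated as a dependency graph and ordered depth-first. The prerequisite table below is a simplified assumption (quantities and placement rules omitted), not Crafter's actual recipe data, and `plan` is a hypothetical helper.

```python
from typing import Dict, List, Set

# Simplified, illustrative prerequisite graph (an assumption, not Crafter's data file).
# Each key lists what must be obtained, or placed nearby, before it.
PREREQS: Dict[str, List[str]] = {
    "wood": [],
    "table": ["wood"],
    "wood_pickaxe": ["wood", "table"],
    "stone": ["wood_pickaxe"],
    "coal": ["wood_pickaxe"],
    "stone_pickaxe": ["wood", "stone", "table"],
    "iron": ["stone_pickaxe"],
    "furnace": ["stone", "table"],
    "iron_pickaxe": ["wood", "coal", "iron", "table", "furnace"],
    "diamond": ["iron_pickaxe"],
}


def plan(target: str, prereqs: Dict[str, List[str]] = PREREQS) -> List[str]:
    """Return the items needed for `target`, each listed after its prerequisites."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(item: str) -> None:
        if item in visited:
            return
        visited.add(item)
        for dep in prereqs[item]:
            visit(dep)  # depth-first: resolve dependencies before the item itself
        order.append(item)

    visit(target)
    return order


print(plan("diamond"))
# ['wood', 'table', 'wood_pickaxe', 'coal', 'stone', 'stone_pickaxe', 'iron', 'furnace', 'iron_pickaxe', 'diamond']
```

Under this simplified graph, ten milestones stand between a fresh agent and a diamond, and each one hides many low-level moves (navigating, collecting repeatedly), which is where the credit-assignment difficulty comes from.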
Results on Crafter show that pre-trained models with general world knowledge can outperform task-specific learning on complex planning tasks.
Traditional RL approaches struggle:
PPO: ~1.5 score after 1M environment steps
Rainbow DQN: ~2.0 score after 1M steps
DreamerV3: ~10 score after millions of steps (previous SOTA)
But an LLM used zero-shot, prompted with the LaTeX source of the Crafter paper, does far better:
SPRING (GPT-4): ~17.8 score with zero training
→ World knowledge and reasoning can substitute for millions of trial-and-error iterations. An LLM that reads the game's documentation outperforms RL agents trained for millions of gameplay steps. Learning world models from scratch (especially for a single, specific environment), when general priors could be learned first instead, is incredibly inefficient.
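The zero-shot setup can be sketched as a decision step with no gradient updates: build a prompt from the paper source and the current state, query the model, and map its reply back to a legal action. This is a simplified illustration, not SPRING's actual pipeline, which is more elaborate. The `llm` callable is a hypothetical stand-in for any chat-completion client, and `observation_text` assumes Crafter's pixel frame has already been converted to a text description elsewhere.

```python
import re
from typing import Callable

# Crafter's 17 discrete action names (listed for the prompt; order is illustrative).
ACTIONS = [
    "noop", "move_left", "move_right", "move_up", "move_down", "do", "sleep",
    "place_stone", "place_table", "place_furnace", "place_plant",
    "make_wood_pickaxe", "make_stone_pickaxe", "make_iron_pickaxe",
    "make_wood_sword", "make_stone_sword", "make_iron_sword",
]


def build_prompt(paper_latex: str, observation_text: str) -> str:
    """Combine the paper source, a text description of the state, and the action list."""
    return (
        "Here is the LaTeX source of the paper describing the game:\n"
        f"{paper_latex}\n\n"
        f"Current observation: {observation_text}\n\n"
        f"Valid actions: {', '.join(ACTIONS)}\n"
        "Think step by step, then end your reply with exactly one valid action."
    )


def parse_action(reply: str) -> str:
    """Return the last valid action name mentioned in the reply, or 'noop'."""
    # Tokenize on whole [a-z_] runs so 'move_down' is never mistaken for 'do'.
    tokens = re.findall(r"[a-z_]+", reply.lower())
    for token in reversed(tokens):
        if token in ACTIONS:
            return token
    return "noop"


def choose_action(
    llm: Callable[[str], str], paper_latex: str, observation_text: str
) -> str:
    """One zero-shot decision: a prompt and a parse, with no training involved."""
    return parse_action(llm(build_prompt(paper_latex, observation_text)))
```

Parsing whole tokens and taking the last valid one is a deliberate choice: a plain substring search would find `do` inside `move_down`, and models that reason before answering usually state the decision at the end.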