Crafter is a procedurally generated 2D survival game designed as a benchmark for reinforcement learning research. It was created as a simpler, more tractable alternative to Minecraft that keeps similar complexity in its tech tree, resource gathering, and long-term planning requirements.

Craftax: https://github.com/MichaelTMatthews/Craftax is a JAX version of Crafter with a NetHack-like expansion.

Game Mechanics

Procedurally generated world with resources (wood, stone, coal, iron, diamonds)
Crafting system with hierarchical dependencies (e.g., need wood → wooden pickaxe → stone → stone pickaxe → iron)
Survival elements: health, food, water, enemies (zombies, skeletons)
Day/night cycle affecting gameplay dynamics
22 achievement unlocks used as evaluation metrics
Pixel vs symbolic observations: most people train on symbolic observations since it is much faster and learning perception is often not the point.
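
How the 22 achievements collapse into one number is worth spelling out. A minimal sketch, assuming the score definition as I recall it from the Crafter paper: a geometric mean over per-achievement success rates (in percent), offset by 1 so that zeros do not wipe out the score. The function name and the example agent are made up for illustration.

```python
import math

def crafter_score(success_rates_percent):
    """Aggregate per-achievement success rates (each in [0, 100]) into one score.

    Assumed definition: exp(mean(ln(1 + s_i))) - 1, i.e. a geometric mean
    of (1 + s_i) minus 1.
    """
    if not success_rates_percent:
        raise ValueError("need at least one achievement success rate")
    log_sum = sum(math.log(1.0 + s) for s in success_rates_percent)
    return math.exp(log_sum / len(success_rates_percent)) - 1.0

# Hypothetical agent: unlocks 5 easy achievements in 90% of episodes,
# and never unlocks the other 17.
rates = [90.0] * 5 + [0.0] * 17
score = crafter_score(rates)
arithmetic_mean = sum(rates) / len(rates)
print(f"score={score:.2f}, arithmetic mean={arithmetic_mean:.2f}")
```

The geometric mean sits far below the arithmetic mean for this agent, which is the point of the metric: breadth across the tech tree counts for more than mastering a few easy achievements.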

Crafter exposes fundamental challenges for traditional RL:

Sparse reward - many actions are needed before any achievement is unlocked
Long-term credit assignment - crafting diamond tools requires 20+ correct sequential decisions
Exploration vs exploitation - must balance immediate survival with resource gathering
Compositional structure - understanding tool hierarchies and resource dependencies
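
The compositional structure can be sketched as a prerequisite graph. This is a toy version built from the example chain in Game Mechanics, not the full Crafter tech tree; the item names, the `PREREQS` table, and `unlock_order` are all hypothetical, and the extra wood requirement for the stone pickaxe is my assumption.

```python
# Simplified, hypothetical prerequisite graph. The real game has more
# items (table, furnace, swords, ...) that are left out here.
PREREQS = {
    "wood": [],
    "wooden_pickaxe": ["wood"],
    "stone": ["wooden_pickaxe"],
    "stone_pickaxe": ["wood", "stone"],  # assumption: also needs wood
    "iron": ["stone_pickaxe"],
}

def unlock_order(goal, prereqs=PREREQS):
    """Return items in an order where every prerequisite precedes its dependent.

    Depth-first post-order traversal; assumes the graph has no cycles.
    """
    order, seen = [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        for dep in prereqs[item]:
            visit(dep)
        order.append(item)

    visit(goal)
    return order

print(unlock_order("iron"))
```

An agent with no prior has to discover this ordering from reward alone, and reward only arrives once a whole prefix of the chain has been completed.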

Results on Crafter suggest that pre-trained world models can outperform task-specific learning on complex planning tasks.

Traditional RL approaches struggle:
PPO: ~1.5 score after 1M environment steps
Rainbow DQN: ~2.0 score after 1M steps
DreamerV3: ~10 score after millions of steps (previous SOTA)

But using an LLM zero-shot, prompted with the LaTeX source of the Crafter paper:
SPRING (GPT-4): ~17.8 score with zero training
World knowledge and reasoning can substitute for millions of trial-and-error iterations.

An LLM that reads game documentation outperforms RL agents with millions of gameplay steps. Learning world models from scratch (esp. for a single, specific environment), when general priors could be learnt first instead, is incredibly inefficient and stupid.


Strong baselines:


