looped transformer

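Decouples compute from data and model size: the same weight-shared block is applied repeatedly, so compute grows with the number of loops while the parameter count stays fixed.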

Demonstrated at scale: Scaling Latent Reasoning via Looped Language Models
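
A minimal NumPy sketch of the core idea, assuming the simplest variant: one pre-norm transformer block (single-head self-attention plus a ReLU MLP, no causal mask, no learned norm parameters) whose weights are reused on every pass. Published looped models may add input re-injection, masking, or adaptive exit, none of which is shown here. All function names (`init_block`, `block`, `looped_forward`, `n_params`) are illustrative, not taken from either paper.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalise each token vector (no learned scale/shift, for brevity).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_block(d_model, d_ff, seed=0):
    # A single set of weights; the loop below reuses it on every iteration.
    rng = np.random.default_rng(seed)
    s_in, s_ff = 1 / np.sqrt(d_model), 1 / np.sqrt(d_ff)
    shapes = {"Wq": (d_model, d_model), "Wk": (d_model, d_model),
              "Wv": (d_model, d_model), "Wo": (d_model, d_model),
              "W1": (d_model, d_ff), "W2": (d_ff, d_model)}
    return {k: rng.normal(0.0, s_ff if k == "W2" else s_in, shape)
            for k, shape in shapes.items()}

def block(x, p):
    # Pre-norm single-head self-attention + ReLU MLP, both with residuals.
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    att = softmax(q @ k.T / np.sqrt(x.shape[-1]))
    x = x + (att @ v) @ p["Wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["W1"], 0.0) @ p["W2"]

def looped_forward(x, p, n_loops):
    # Same parameters p on every pass: compute scales with n_loops,
    # the parameter count does not.
    for _ in range(n_loops):
        x = block(x, p)
    return x

def n_params(p):
    return sum(w.size for w in p.values())

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 16))          # 8 tokens, d_model = 16
p = init_block(d_model=16, d_ff=64)
for n in (1, 4, 16):
    y = looped_forward(x, p, n)
    print(n, y.shape, n_params(p))    # output shape and parameter count do not change with n
```

Because the weights are tied, running 2 loops and then 3 more is the same computation as running 5 loops, which is what lets the loop count act as a compute knob chosen independently of model size.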

Todo

https://arxiv.org/pdf/2301.13196
The appendix also has some interesting material, e.g. a link to a paper showing how deep a ReLU network needs to be to approximate a polynomial to a given precision.
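
The note does not name the linked paper, so as an illustration of this kind of result (not necessarily the linked paper's construction), here is a sketch of Yarotsky's (2017) well-known sawtooth approximation of x² on [0, 1]. The tent map g is a single 3-unit ReLU layer; composing it s times gives g_s, and f_m(x) = x − Σ_{s=1}^{m} g_s(x)/4^s is the piecewise-linear interpolant of x² on a grid of spacing 2^{-m}, with maximum error 2^{-2m-2}. Reaching error ε therefore needs roughly ½·log₂(1/ε) compositions, i.e. depth grows logarithmically in the precision, and each extra "layer" is the same tent layer reused. Function names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # Tent map g on [0, 1] as a single ReLU layer with 3 units:
    # g(x) = 2x for x <= 1/2, 2(1 - x) for x >= 1/2.
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def approx_square(x, m):
    # f_m(x) = x - sum_{s=1..m} g_s(x) / 4**s, where g_s = g composed s times.
    # Computing g_s means passing through the same tent layer s times.
    x = np.asarray(x, dtype=float)
    out, g = x.copy(), x.copy()
    for s in range(1, m + 1):
        g = tent(g)
        out -= g / 4**s
    return out

xs = np.linspace(0.0, 1.0, 2**10 + 1)  # dyadic grid on [0, 1]
for m in range(1, 7):
    err = np.max(np.abs(xs**2 - approx_square(xs, m)))
    print(m, err, 4.0 ** -(m + 1))     # observed max error vs. bound 2^(-2m-2)
```

Each additional composition cuts the worst-case error by a factor of 4, so precision is bought with depth rather than width; general polynomials follow from x² via xy = ((x + y)² − x² − y²)/2.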


Backlinks

  • Scaling Latent Reasoning via Looped Language Models
  • Universal Transformers
  • dynamical-systems lens on neural networks

Created with Quartz v4.5.2 © 2026