You can view neural networks either as static functions or as dynamical systems, where a rule is applied iteratively.
The dynamical-systems framing lets you ask about emergence, attractors, self-organization, and stability of patterns.
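To make "attractor" concrete, here is a minimal sketch of a rule applied iteratively. The rule (`math.cos`), the starting states, and the helper name `iterate` are illustrative choices, not anything specific to neural networks; the point is only that very different initial states can end up at the same place.

```python
import math

def iterate(rule, x0, steps):
    """Apply `rule` to `x0` repeatedly and return the whole trajectory."""
    trajectory = [x0]
    for _ in range(steps):
        trajectory.append(rule(trajectory[-1]))
    return trajectory

# Three different starting states, one shared rule.
starts = (-2.0, 0.1, 3.0)
finals = [iterate(math.cos, x0, 100)[-1] for x0 in starts]

# Spread between the end states: tiny if they share an attractor.
spread = max(finals) - min(finals)
print(finals, spread)
```

All three trajectories settle on the same fixed point of `cos`, which is the simplest kind of attractor; stability here means that small perturbations of the state shrink under further iteration.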

The Universal Transformer is the explicit application of this idea to transformers: instead of L different transformer blocks, you have one block applied T times, with T either fixed or chosen adaptively per token via adaptive computation time.
Pure weight tying seems to cost some raw modeling power compared with letting distinct layers learn distinct functions, but the idea is alive in looped transformers, diffusion models, and state space models, which are explicitly recurrent.
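The "one block applied T times" idea can be sketched in a few lines. This is a toy under stated assumptions, not the Universal Transformer itself: `shared_block` is a hypothetical residual `tanh` update standing in for a real transformer block, the weights and halting parameters are made-up numbers, and the halting rule is a simplified version of adaptive computation time.

```python
import math

def shared_block(h, weights):
    """One weight-tied update, h + tanh(W h): a toy stand-in for a transformer block."""
    return [
        h_i + math.tanh(sum(w * h_j for w, h_j in zip(row, h)))
        for h_i, row in zip(h, weights)
    ]

def apply_fixed(h, weights, num_steps):
    """Apply the same block num_steps times (fixed T)."""
    for _ in range(num_steps):
        h = shared_block(h, weights)
    return h

def apply_adaptive(h, weights, halt_w, halt_b, max_steps, eps=0.01):
    """Apply the block until the accumulated halting probability reaches 1 - eps.

    Simplified: full adaptive computation time also averages the
    intermediate states, using the halting probabilities as weights.
    """
    cumulative, steps = 0.0, 0
    while steps < max_steps and cumulative < 1.0 - eps:
        h = shared_block(h, weights)
        logit = sum(w * h_j for w, h_j in zip(halt_w, h)) + halt_b
        cumulative += 1.0 / (1.0 + math.exp(-logit))  # sigmoid halting unit
        steps += 1
    return h, steps

weights = [[0.5, -0.2], [0.1, 0.3]]  # hypothetical shared parameters
tokens = [[1.0, 0.0], [-0.5, 2.0]]   # one small state vector per token
for token in tokens:
    _, steps = apply_adaptive(token, weights, halt_w=[0.4, 0.4], halt_b=-1.0, max_steps=8)
    print(steps)  # number of times the shared block ran for this token
```

Note that `weights` is the only parameter set: depth comes from repetition, and the per-token step count is decided by the state itself rather than by the architecture.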

Meta-learned and open-ended NCAs are DS², while for self-modifying NCAs (where the parameters are part of the structure), the parameters can be folded into the state space.
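Folding parameters into the state space can be illustrated with a scalar toy. Everything here is an assumption made for illustration: the state rule `update_state`, the self-modification rule `update_params`, and the constants are hypothetical and do not come from any particular NCA. The sketch only shows that a system whose rule changes its own parameter can be rewritten as a single fixed map on the pair `(x, theta)`.

```python
import math

def update_state(x, theta):
    """State update under the current parameter (toy rule)."""
    return math.tanh(theta * x)

def update_params(x, theta):
    """Hypothetical self-modification rule: the parameter drifts toward the state."""
    return theta + 0.1 * (x - theta)

def augmented_step(z):
    """One step of the folded system on z = (x, theta)."""
    x, theta = z
    # Both updates read the old (x, theta), so this is one simultaneous map.
    return (update_state(x, theta), update_params(x, theta))

z = (0.5, 1.5)
trajectory = [z]
for _ in range(10):
    z = augmented_step(z)
    trajectory.append(z)
print(trajectory[-1])
```

Once written this way, `augmented_step` never changes: the rule that used to vary with `theta` is now a fixed rule on a larger state.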