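SwiGLU
GLU, but with the sigmoid gate replaced by Swish:
SwiGLU(x) = Swish(xW) ⊙ (xV)
Here Swish(z) = z·σ(βz), with β = 1 in practice (also known as SiLU), and ⊙ denotes element-wise multiplication. Most modern open-weight LLMs use SwiGLU in their feed-forward layers.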
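
The SwiGLU formula can be made concrete with a minimal sketch in plain Python. It assumes bias-free projections, β = 1 in Swish, x as a row vector, and toy weight values chosen only for illustration. The helper names (sigmoid, swish, matvec, swiglu) are hypothetical and not taken from any library.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(z, beta=1.0):
    """Swish(z) = z * sigmoid(beta * z); beta = 1 is assumed here."""
    return z * sigmoid(beta * z)


def matvec(x, M):
    """Row vector x (length d_in) times matrix M (d_in rows, d_out columns)."""
    return [sum(xi * mij for xi, mij in zip(x, col)) for col in zip(*M)]


def swiglu(x, W, V):
    """SwiGLU(x) = Swish(xW) * (xV), multiplied element-wise, no biases."""
    gate = [swish(g) for g in matvec(x, W)]   # Swish(xW)
    value = matvec(x, V)                      # xV
    return [g * v for g, v in zip(gate, value)]


# Toy input and weights (illustrative values only): d_in = 2, d_out = 3.
x = [1.0, -2.0]
W = [[0.5, 0.0, 1.0],
     [0.0, 0.5, 1.0]]
V = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]

y = swiglu(x, W, V)
print(y)
```

The gate branch xW passes through Swish while the value branch xV stays linear, so each output component is the linear projection scaled by a smooth, input-dependent gate. A real feed-forward block would typically follow this with a further output projection, which is omitted here.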