Should render without errors (works with MathJax but not with KaTeX; writing a plugin for this is more effort than just adapting it manually once MathJax 4.0 comes out):
$$\displaylines{a \\ b}$$
$$\displaylines{c \\ d}$$
Should be centered:
Big callout
Stuff
Stuff
Agent $i_1$ goes first, picks an action $a^{i_1}$ aiming for positive advantage $A_\pi^{i_1}(o, a^{i_1}) > 0$
Agent $i_2$, knowing $a^{i_1}$, chooses $a^{i_2}$ for positive $A_\pi^{i_2}(o, a^{i_1}, a^{i_2}) > 0$
Agent $i_3$, knowing $(a^{i_1}, a^{i_2})$, …
Stuff
Stuff $i_1$ goes first, picks an action $a^{i_1}$ aiming for positive advantage $A_\pi^{i_1}(o, a^{i_1}) > 0$
Agent $i_2$, knowing $a^{i_1}$, chooses $a^{i_2}$ for positive $A_\pi^{i_2}(o, a^{i_1}, a^{i_2}) > 0$
Agent $i_3$, knowing $(a^{i_1}, a^{i_2})$, …
Agent
Referencing a header with a link in the title should work
[[test#text]]
Referencing a section with alt text works: Stuff
Referencing a section without alt text should also work! ^f39253
Sanity check:
$$? = \begin{cases} a & \text{if condition} \\ b & \text{if other condition} \end{cases}$$
This should work inline too! $c = \cases{a \\ b}$
[[test]]
↑ this should display as a link even if there’s a tab in the line above it lol
I have NO idea what’s wrong here:
Hazard Rate
The hazard rate (or failure rate) $h(t)$ represents the instantaneous rate of failure at time $t$, given survival up to that time. For the exponential distribution, it’s defined as:
$$h(t) = \frac{p(t)}{P(T > t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$
The constant hazard rate $\lambda$ is a direct consequence of the memoryless property: the failure rate doesn’t change over time. This means an exponentially distributed component is just as likely to fail in the next instant whether it’s brand new or has been running for years.
The relationship $P(T < t + dt \mid T > t) = \lambda \cdot dt$ (given I’ve lasted until time $t$, what’s the probability I’ll fail within the next $dt$) tells us that for a small time interval $dt$, the probability of failure is approximately $\lambda$ times the length of that interval, regardless of how long the component has already survived.
$\implies P(t \lt T \le t + dt) = \lambda P(T \gt t)dt$
What about this? Bruh, this fixes it.. it’s the nested callout latex…
Hazard Rate
The hazard rate (or failure rate) $h(t)$ represents the instantaneous rate of failure at time $t$, given survival up to that time. For the exponential distribution, it’s defined as:
$$h(t) = \frac{p(t)}{P(T > t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda$$
The constant hazard rate $\lambda$ is a direct consequence of the memoryless property: the failure rate doesn’t change over time. This means an exponentially distributed component is just as likely to fail in the next instant whether it’s brand new or has been running for years.
The relationship $P(T < t + dt \mid T > t) = \lambda \cdot dt$ (given I’ve lasted until time $t$, what’s the probability I’ll fail within the next $dt$) tells us that for a small time interval $dt$, the probability of failure is approximately $\lambda$ times the length of that interval, regardless of how long the component has already survived.
When we do [[taylor expansion]] for the exponential and make a small dt approximation, we get the hazard rate:
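$$P(T < t + dt \mid T > t) = 1 - e^{-\lambda\,dt} = 1 - \left[1 - \lambda\,dt + \tfrac{1}{2}\lambda^2 dt^2 - \dots\right] \approx \lambda\,dt$$
$$\implies P(t < T \le t + dt) = \lambda P(T > t)\,dt$$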
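A quick numerical check of the small-$dt$ approximation, as a minimal sketch: it evaluates $P(T < t + dt \mid T > t)/dt$ in closed form for an exponential lifetime at a few ages $t$. The function name `hazard_estimate` and the values of `lam` and `dt` are picked for illustration only.

```python
import math

def hazard_estimate(lam, t, dt):
    """P(T < t + dt | T > t) / dt for T ~ Exponential(lam), from the survival function."""
    survive_t = math.exp(-lam * t)            # P(T > t)
    survive_t_dt = math.exp(-lam * (t + dt))  # P(T > t + dt)
    return (survive_t - survive_t_dt) / survive_t / dt

lam, dt = 0.5, 1e-4  # illustrative values, not from the note
# Same estimate for a brand-new component (t=0) and for older ones (t=1, t=10).
estimates = [hazard_estimate(lam, t, dt) for t in (0.0, 1.0, 10.0)]
print(estimates)
```

All three estimates should sit close to `lam`, and close to each other, which is the memoryless property in numbers.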
This callout should be properly indented (the indentation of that numbered list shouldn’t stop after the numbered list ends, since there’s an empty newline (with `>`)):
Training procedure
Evolution loop (CMA-ES):
Start of a generation: Sample a population of parameter vectors θ
For each individual:
Load parameters θ
Initialize fresh graph: random connections via $P(A_{ij}=1) \sim \mathcal{N}_{[0,1]}(\mu_{\text{conn}}, \sigma_{\text{conn}})$
Run development phase if enabled ($T_{SA}$ steps of spontaneous activity)
Run multiple episodes, keeping network graph between episodes
Return fitness (average reward over episodes)
CMA-ES updates its distribution based on fitnesses
Repeat for … generations
Each individual gets its own graph that persists across episodes but not across generations.
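The loop above can be sketched roughly as follows. This is not CMA-ES: a plain elite-averaging Gaussian evolution strategy stands in for it so the example stays stdlib-only. `run_episode` is a hypothetical toy task, the development phase is omitted, and all names and constants here are assumptions for illustration.

```python
import random

def truncated_normal(mu, sigma, rng):
    """Rejection-sample N(mu, sigma) restricted to [0, 1]."""
    while True:
        x = rng.gauss(mu, sigma)
        if 0.0 <= x <= 1.0:
            return x

def init_graph(n_nodes, mu_conn, sigma_conn, rng):
    """Fresh adjacency matrix: each edge draws its own connection probability."""
    return [[1 if rng.random() < truncated_normal(mu_conn, sigma_conn, rng) else 0
             for _ in range(n_nodes)] for _ in range(n_nodes)]

def run_episode(theta, graph, rng):
    """Hypothetical toy episode: mutates the graph in place, rewards small theta."""
    i, j = rng.randrange(len(graph)), rng.randrange(len(graph))
    graph[i][j] ^= 1  # stand-in for structural change during an episode
    return -sum(x * x for x in theta)

def evaluate(theta, rng, n_episodes=3, n_nodes=4):
    graph = init_graph(n_nodes, 0.5, 0.2, rng)  # fresh graph per individual
    # The same graph object is reused, so its state carries across episodes.
    rewards = [run_episode(theta, graph, rng) for _ in range(n_episodes)]
    return sum(rewards) / n_episodes            # fitness = average reward

def evolve(dim=3, pop_size=8, generations=20, sigma=0.3, seed=0):
    rng = random.Random(seed)
    mean = [1.0] * dim
    for _ in range(generations):
        # Start of a generation: sample a population of parameter vectors.
        population = [[m + sigma * rng.gauss(0, 1) for m in mean]
                      for _ in range(pop_size)]
        ranked = sorted(population, key=lambda th: evaluate(th, rng), reverse=True)
        elite = ranked[: pop_size // 2]
        # Simplified distribution update: move the mean to the elite average.
        mean = [sum(th[k] for th in elite) / len(elite) for k in range(dim)]
    return mean

best = evolve()
print(best)
```

Because `evaluate` builds a new graph on every call, nothing structural is inherited between generations; only the search distribution over the parameters carries over.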
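Information flow (per timestep):
$$h^{t+1} = f_{\theta_h}(G^t)$$
$$e_{ij}^{t+1} = f_{\theta_e}(e_{ij}^t, h_i^{t+1}, h_j^{t+1}, r^t)$$
$$w_{ij} = e_{ij,0}$$
$$v^{t+1} = \tanh(\hat{v}^t \cdot w^t), \quad \hat{v}_i = \begin{cases} o_i & \text{if } i \in \text{Input} \\ v_i & \text{otherwise} \end{cases}$$
(repeat `rnn_iters` times)
Actions = $v_{\text{output\_nodes}}$: argmax (discrete) or raw “concatted” activations (continuous)
Weight changes: via edge state updates $e_{ij}^{t+1} = f_{\theta_e}(\dots)$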
Both use evolved rules θ fixed at birth
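A minimal sketch of the activation update and action readout described above, in plain Python. The weights are held fixed here, so the evolved node and edge update rules are not modelled. The row-vector convention (`w[i][j]` is the weight from node `i` to node `j`) and the toy three-node graph are assumptions.

```python
import math

def rnn_step(v, w, obs, input_nodes):
    """One update v_next = tanh(v_hat . w), with input nodes clamped to observations."""
    n = len(v)
    v_hat = [obs[i] if i in input_nodes else v[i] for i in range(n)]
    return [math.tanh(sum(v_hat[i] * w[i][j] for i in range(n))) for j in range(n)]

def act(v, output_nodes, discrete=True):
    """Argmax over output activations (discrete) or the raw activations (continuous)."""
    outs = [v[k] for k in output_nodes]
    if discrete:
        return max(range(len(outs)), key=outs.__getitem__)
    return outs

# Toy graph: node 0 is the input, nodes 1 and 2 are the outputs.
w = [[0.0, 1.0, -1.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
v = [0.0, 0.0, 0.0]
rnn_iters = 2
for _ in range(rnn_iters):
    v = rnn_step(v, w, obs={0: 0.5}, input_nodes={0})
action = act(v, output_nodes=[1, 2])
print(v, action)
```

With a positive observation on node 0, output node 1 ends up with the larger activation, so the discrete readout picks index 0 of the output list.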
Core challenge: Discover both structural rules (which connections to form) and learning rules (how to update weights) using only episodic rewards - no supervision on topology or weights.
The comments should be aligned:
The task code length varies from around 200 to 400+ LOC, with a gym-env-like structure