Mechanistic Foundations of Goal-Directed Control

Alma Lago

Cajal Neuroscience Center (CSIC) · University of the Basque Country (UPV/EHU)

how do control strategies form, compete, and get resolved through experience?

two routes, one task. a learning agent faces motor tasks with varying demands. two control strategies compete: a reactive route (respond to the present) and a prospective route (anticipate the future). task demand determines which route is reliable.

a gate arbitrates. an attention mechanism reads the training history and routes control, converging toward theoretically motivated confidence thresholds. this gate is the central object of study.

one parameter decides. context window k, not network depth, determines whether the gate forms at all. for k ≤ 4 no arbitration emerges. for k ≥ 8 the gate begins resolving, with full separation by k = 32. the transition has a clear two-phase structure: a dead zone, then an abrupt breakthrough.

a diagnostic, not a replacement. commitment dynamics are well described by a closed-form EMA. the EMA models the rate of commitment analytically but cannot reproduce the arbitration itself. where experimental curves are noisy and ambiguous, this analytical surrogate provides a clean theoretical reference.

these are circuit-level phase transition signatures of the same kind found in transformer induction heads (Olsson et al., 2022), extending mechanistic interpretability beyond sequence prediction to embodied control architectures.

phase transition dynamics

figure: the learned gate (navy) against the closed-form EMA prediction (dashed), for each context window k.

R² (top-right of each panel) quantifies how well the EMA prediction fits the learned gate. at small k, improvement is gradual; at large k, the dynamics slow beyond what 200 post-snap epochs can resolve. k = 32 sits at the crossing point: a clean breakthrough well within the observation window. this distinction between linear improvement and abrupt breakthrough reflects a qualitative difference in circuit formation (Zhao et al., 2024).

c(t) = c_∞ − (c_∞ − 0.5) · e^{−(t−τ_snap)/k}

interpretation is based on the analytical model, not the raw experimental curves.
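
a minimal sketch of the closed-form commitment curve above, assuming t is measured in epochs and c_∞ = 0.9 purely for illustration (not a reported value). it also shows why the EMA ablation uses α = 1/k: a discrete EMA with that rate, started at 0.5, decays as (1 − 1/k)^n ≈ e^{−n/k}, so it follows c(t) with time constant k. the function names and the R² helper are hypothetical, not the authors' code.

```python
import math


def closed_form_commitment(t, k, c_inf, tau_snap=0.0):
    """c(t) = c_inf - (c_inf - 0.5) * exp(-(t - tau_snap) / k), meant for t >= tau_snap."""
    return c_inf - (c_inf - 0.5) * math.exp(-(t - tau_snap) / k)


def discrete_ema(n_steps, k, c_inf):
    """Discrete EMA with alpha = 1/k, starting at 0.5 (chance) and pulled toward c_inf."""
    alpha = 1.0 / k
    c = 0.5
    trace = [c]
    for _ in range(n_steps):
        c += alpha * (c_inf - c)  # closed form: c_n = c_inf - (c_inf - 0.5) * (1 - alpha) ** n
        trace.append(c)
    return trace


def r_squared(observed, predicted):
    """Coefficient of determination of `predicted` against `observed`."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    k, c_inf, epochs = 32, 0.9, 200  # c_inf is illustrative; 200 matches the post-snap window
    ema = discrete_ema(epochs, k, c_inf)
    curve = [closed_form_commitment(t, k, c_inf) for t in range(epochs + 1)]
    print(f"R^2 = {r_squared(ema, curve):.4f}")
```

here the discrete EMA stands in for a measured gate trace only to check the α = 1/k ↔ time-constant-k correspondence; fitting real curves would pass the recorded confidences as `observed` instead.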

control strategy arbitration


the heatmap shows the learned confidence landscape across task demand and training time. context window k governs whether the gate forms at all. below the threshold (k ≤ 4), no structure appears. for k ≥ 8 the gate begins resolving, and the phase diagram is fully resolved at k = 32 ★. for k ≥ 64 the gains saturate.

the colorbar scale is global across all k, so panels can be compared directly.

EMA ablation: why not just use EMA?

because EMA is blind to task demand. temporal integration alone cannot arbitrate: an EMA averages over history at a fixed rate and has no state-dependent mechanism, so task demand is invisible to it. the result is Δ = 0.0006 for the EMA ablation versus Δ = 0.192 for the attention gate. EMA is useful as a diagnostic model of commitment dynamics, but it cannot implement the arbitration itself. the attention gate implements the state-dependent routing that EMA cannot, and crucially it scales: Δ grows monotonically with k until it saturates at k ≥ 64, consistent with the scaling behaviour observed in transformer-based systems.

attention gate (k = 32): Δ = 0.192. prospective / reactive structure emerges.

EMA ablation (α = 1/k): Δ = 0.0006. uniform; task demand is invisible.
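
a toy illustration of why temporal integration alone cannot arbitrate, assuming a hypothetical history of (task demand, outcome) pairs where outcome 1.0 means the prospective route succeeded. the EMA gate never reads the current demand, so its output is identical for high and low demand; a softmax attention gate that scores past entries by similarity to the current demand separates them. the similarity score, β = 10, and the `separation` stand-in for Δ are all assumptions, and the toy values are not comparable to the reported Δ = 0.192 and Δ = 0.0006.

```python
import math

# Hypothetical toy setup: each history entry is (task demand, outcome), where
# outcome = 1.0 if the prospective route succeeded and 0.0 if the reactive one did.


def ema_gate(query, keys, values, k):
    """EMA with alpha = 1/k over the last k outcomes. `query` and `keys` are
    accepted but never read, so the current task demand cannot affect the output."""
    alpha = 1.0 / k
    c = 0.5
    for v in values[-k:]:
        c += alpha * (v - c)
    return c


def attention_gate(query, keys, values, k, beta=10.0):
    """Softmax attention over the last k entries, scored by the similarity
    between the current demand (query) and each past demand (key)."""
    keys, values = keys[-k:], values[-k:]
    scores = [-beta * (query - key) ** 2 for key in keys]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def separation(gate, keys, values, k, low=0.1, high=0.9):
    """Toy stand-in for Δ: gap in prospective confidence between high and low demand."""
    return abs(gate(high, keys, values, k) - gate(low, keys, values, k))


demands = [0.1, 0.9] * 32
outcomes = [1.0 if d > 0.5 else 0.0 for d in demands]
print(separation(ema_gate, demands, outcomes, k=32))        # 0.0
print(separation(attention_gate, demands, outcomes, k=32))  # about 0.997
```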

robustness

each cell is a phase diagram for a given (k, ED) pair.

the k-threshold remains stable across all tested settings. across network depth (NL ∈ {1, 2, 3}), NL = 1 is sufficient for emergence and NL = 2 achieves sharper separation; NL = 3 provides no clear gain in arbitration. learning rate is the most sensitive hyperparameter: too low, and the gate never forms; too high, and the reactive strategy does not form at all while the prospective one fails to stabilise, so the structure collapses. the learning rate results are included for completeness: they confirm why a learning rate of order 1e-3 was selected. the threshold itself does not depend on it.

panels: network depth, learning rate. colorbar endpoints: prospective, reactive; the scale is global across all conditions, so direct comparison is valid.

result summary

dead zone (k ≤ 4): no arbitration structure forms.

optimal ★ (k = 32): Δ = 0.192, phase diagram fully resolved.

saturation (k ≥ 64): Δ ≈ 0.198, diminishing returns.

predicted threshold (k ≥ K_steps ≈ 10): the gate needs the full prospective trajectory.
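
a toy reading of the predicted threshold, assuming the condition is simply that a context window of k steps must span one prospective trajectory of K_steps ≈ 10 steps. `trajectory_coverage` is a hypothetical helper that reports what fraction of such a trajectory fits in the window; it encodes only the stated condition, not the learned dynamics.

```python
K_STEPS = 10  # predicted prospective trajectory length (K_steps ≈ 10 in the summary)


def trajectory_coverage(k, k_steps=K_STEPS):
    """Fraction of a k_steps-long trajectory that fits inside a context window of k steps."""
    return min(k, k_steps) / k_steps


for k in (4, 8, 16, 32, 64):
    print(f"k = {k:>2}: coverage = {trajectory_coverage(k):.1f}, full trajectory = {k >= K_STEPS}")
```

under this reading, the dead-zone windows (k ≤ 4) see less than half of a trajectory, while every k ≥ 16 sees all of it.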

context window k, not network depth, is the binding constraint.

the depth ablation (NL ∈ {1, 2, 3}) shows a plateau at NL = 2 (Δ: 0.164 → 0.192 → 0.195), confirming that k drives the transition.
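
the plateau can be made concrete by computing relative gains from the reported depth ablation values. the helper name is hypothetical; the Δ values are the ones quoted above. going from NL = 1 to NL = 2 gains about 17%, while going from NL = 2 to NL = 3 gains under 2%.

```python
# Δ reported in the depth ablation, keyed by number of layers NL
depth_delta = {1: 0.164, 2: 0.192, 3: 0.195}


def relative_gains(deltas):
    """Relative change in Δ between consecutive depths: (Δ_b - Δ_a) / Δ_a."""
    layers = sorted(deltas)
    return {(a, b): (deltas[b] - deltas[a]) / deltas[a] for a, b in zip(layers, layers[1:])}


for (a, b), gain in relative_gains(depth_delta).items():
    print(f"NL {a} -> {b}: {gain:+.1%}")
```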

citation

@misc{lago2026mechanistic,
  title   = {Mechanistic Foundations of Goal-Directed Control},
  author  = {Lago, Alma},
  year    = {2026},
  eprint  = {2603.15248},
  archivePrefix = {arXiv},
  primaryClass  = {cs.LG},
  url     = {https://arxiv.org/abs/2603.15248},
}