how do control strategies form, compete, and get resolved through experience?
Cajal Neuroscience Center (CSIC) · University of the Basque Country (UPV/EHU)
select k and compare the learned gate (navy) against the closed-form EMA prediction (dashed).
R² shown top-right quantifies fit quality. at small k, improvement is gradual; at large k, the dynamics slow beyond what 200 post-snap epochs can resolve. k = 32 sits at the crossing point: a clean breakthrough well within the observation window. this distinction between linear improvement and abrupt breakthrough reflects a qualitative difference in circuit formation (Zhao et al., 2024).
interpretation is based on the analytical model, not the raw experimental curves
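a minimal sketch of how a closed-form EMA curve and its R² could be computed. the page does not give the EMA parameterisation, so the smoothing factor alpha = 1/k, the unit-step input, and the function names `ema_closed_form`, `ema_recursive`, and `r_squared` are all assumptions made for illustration, not the paper's implementation.

```python
def ema_closed_form(n_epochs, k):
    """Closed-form EMA response to a unit step input.

    ASSUMPTION: smoothing factor alpha = 1/k (not stated in the source).
    Starting from 0, the EMA after t updates is 1 - (1 - alpha)**t.
    """
    alpha = 1.0 / k
    return [1.0 - (1.0 - alpha) ** t for t in range(1, n_epochs + 1)]


def ema_recursive(inputs, k):
    """Step-by-step EMA, used here only to check the closed form."""
    alpha = 1.0 / k  # same assumed parameterisation as above
    state = 0.0
    trace = []
    for x in inputs:
        state = (1.0 - alpha) * state + alpha * x
        trace.append(state)
    return trace


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


n_epochs = 200  # length of the post-snap observation window
for k in (8, 32, 256):
    predicted = ema_closed_form(n_epochs, k)
    simulated = ema_recursive([1.0] * n_epochs, k)
    fit = r_squared(simulated, predicted)
    # fraction of the asymptote reached by the end of the window
    print(k, round(predicted[-1], 3), round(fit, 6))
```

under this assumed parameterisation, larger k leaves the curve further from its asymptote after 200 epochs, which is one way to read the statement that the dynamics slow beyond what the window can resolve. in practice the `observed` argument would be the learned gate trace rather than a simulated EMA.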
select k to watch arbitration emerge.
the heatmap reveals the learned confidence landscape across task demand and training time. context window k governs whether the gate forms at all. below the threshold (k ≤ 4), no structure appears. at k ≥ 8 the gate begins resolving, and the phase diagram is fully resolved at k = 32 ★. at k ≥ 64 gains saturate.
colorbar scale is global across all k, so direct comparison is valid.
EMA is blind to task demand. temporal integration alone cannot arbitrate, because EMA lacks the state-dependent mechanism that arbitration requires. the result is Δ = 0.0006 for EMA versus Δ = 0.192 for the attention gate. EMA remains useful as a diagnostic model of commitment dynamics, but it cannot implement the arbitration itself. the attention gate supplies the state-dependent routing, and crucially it scales: Δ grows monotonically with k up to saturation at k ≥ 64, consistent with the scaling behaviour observed in transformer-based systems.
select condition to compare. each cell is a phase diagram for a given (k, ED) pair.
the k-threshold remains stable across all tested settings. across network depth (NL ∈ {1, 2, 3}), NL = 1 is sufficient for emergence and NL = 2 achieves sharper separation; NL = 3 provides no clear arbitration gain. learning rate is the most sensitive hyperparameter: too low, and the gate never forms; too high, and the reactive strategy does not form at all while the prospective one fails to stabilise, so the structure collapses. the learning rate results are included for completeness: they confirm why a learning rate on the order of 1e-3 was selected, and the k-threshold does not depend on it.
colorbar scale is global across all conditions, so direct comparison is valid.
context window k, not network depth, is the binding constraint.
the depth ablation (NL ∈ {1, 2, 3}) shows a plateau at NL = 2 (Δ: 0.164 → 0.192 → 0.195), confirming that k drives the transition.
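the plateau can be read directly off the increments between successive depths. the short sketch below only does that arithmetic on the three Δ values quoted above; pairing them with NL = 1, 2, 3 in that order is an assumption based on the order in which they are listed, and the variable names are illustrative.

```python
# Δ values quoted for the depth ablation.
# ASSUMPTION: listed in order of NL = 1, 2, 3.
delta_by_depth = {1: 0.164, 2: 0.192, 3: 0.195}

depths = sorted(delta_by_depth)
gains = {
    (a, b): round(delta_by_depth[b] - delta_by_depth[a], 3)
    for a, b in zip(depths, depths[1:])
}

for (a, b), gain in gains.items():
    print(f"NL {a} -> {b}: gain in Δ = {gain:.3f}")
# NL 1 -> 2: gain in Δ = 0.028
# NL 2 -> 3: gain in Δ = 0.003
```

the second increment is roughly a tenth of the first, which is what the plateau at NL = 2 refers to.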
@misc{lago2026mechanistic,
title = {Mechanistic Foundations of Goal-Directed Control},
author = {Lago, Alma},
year = {2026},
eprint = {2603.15248},
archivePrefix = {arXiv},
primaryClass = {cs.LG},
url = {https://arxiv.org/abs/2603.15248},
}