首页 /研究 /Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

OTHER

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

Donghwan Lee

发表年份: 2026
访问权限: 开放获取

摘要

Q-value iteration (Q-VI) is usually analyzed through the \(γ\)-contraction of the Bellman operator. This argument proves convergence to \(Q^*\), but it gives only a coarse account of when the induced greedy policy becomes optimal. We study discounted Q-VI as a switching system and focus on the practically optimal solution set (POSS), the set of \(Q\)-functions whose tie-broken greedy policies are optimal. The main result shows that Q-VI reaches the optimal action class in finite time by entering an invariant tube around \(\mathcal X_1=Q^*+\operatorname{span}(\mathbf 1)\), which is contained in the POSS. For every \(\varepsilon>0\), the distance to \(\mathcal X_1\) satisfies an exponential bound with rate \((\barρ+\varepsilon)^k\), where \(\barρ\) is the joint spectral radius of the projected switching family restricted to directions transverse to \(\mathcal X_1\). When \(\barρ<γ\), this transverse convergence is faster than the classical contraction rate. The analysis separates fast policy identification from the subsequent convergence to \(Q^*\), which may still be governed by the all-ones mode. We also give spectral and graph-theoretic conditions under which the strict inequality \(\barρ<γ\) holds or fails.

关键词

math.OCcs.AIeess.SY

Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration

摘要

关键词

相关论文

Statistical Learning Theory

Fractional Differential Equations

Applied Nonlinear Control

Genetic Programming: On the Programming of Computers by Means of Natural Selection