Graded Transformers
Tony Shaska
- Year: 2025
- Access: Open access
Abstract
We introduce the Graded Transformer framework, a new class of sequence models that embeds algebraic inductive biases through grading transformations on vector spaces. Extending Graded Neural Networks (GNNs), we propose two architectures: the Linearly Graded Transformer (LGT) and the Exponentially Graded Transformer (EGT). These models apply parameterized scaling operators, governed by fixed or learnable grading tuples and, in the case of EGT, exponential factors, to encode hierarchical structure in attention and representation layers and to improve efficiency for structured data. We establish rigorous guarantees, including universal approximation theorems for continuous and Sobolev functions, reduced sample complexity via effective VC dimension bounds, Lipschitz continuity of graded operations, and robustness to perturbations. A graded loss ensures gradient stability and alignment with domain priors during optimization. By treating grades as differentiable parameters, the framework enables adaptive feature prioritization, overcoming limitations of fixed grades in earlier models. The Graded Transformer provides a mathematically principled approach to hierarchical learning and neuro-symbolic reasoning. Applications include algebraic geometry (moduli spaces and zeta functions), physics (multiscale systems), natural language processing (syntactic parsing), biological sequence analysis (variant prediction), robotics and autonomous systems (safety-critical prioritization), the automotive industry (certifiable AI for ADAS), and blockchain and financial cryptography (secure coding and structured prediction).
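To make the idea of a grading-governed scaling operator concrete, the following is a minimal sketch, not the paper's definition. It assumes that a linear grading multiplies coordinate i by its grade q_i and that an exponential grading multiplies it by lam ** q_i for a base lam > 0. The function names, the parameter `lam`, and both formulas are illustrative assumptions; the abstract does not specify the exact form of the operators.

```python
import numpy as np


def linear_grading(x, grades):
    """Assumed linear form: scale coordinate i by its grade q_i."""
    x = np.asarray(x, dtype=float)
    q = np.asarray(grades, dtype=float)
    # Broadcasting lets x be a single vector (d,) or a sequence (seq_len, d).
    return x * q


def exponential_grading(x, grades, lam):
    """Assumed exponential form: scale coordinate i by lam ** q_i."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    x = np.asarray(x, dtype=float)
    q = np.asarray(grades, dtype=float)
    return x * np.power(lam, q)


# Hypothetical grading tuple: higher grades mark higher-priority features.
grades = np.array([0.0, 1.0, 2.0])
x = np.ones(3)

lin = linear_grading(x, grades)                 # [0., 1., 2.]
expo = exponential_grading(x, grades, lam=2.0)  # [1., 2., 4.]
```

Under these assumptions both operators are diagonal, so they cost one multiplication per coordinate, and making the grades learnable would amount to treating `grades` as a trainable parameter vector.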