Home › Knowledge Base › Rademacher

Rademacher

No mentions found

This entity hasn't been tracked yet, or Iris is still building its knowledge base.

Related Articles from SNS

Quantum Reservoir Computing and Risk Bounds

arXiv:2501.08640v2 Announce Type: replace Abstract: We propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits.

arXiv CS 8d ago

Does Order Matter : Connecting The Law of Robustness to Robust Generalization

Announce Type: replace Abstract: Bubeck and Selke (2021) propose the connection between the Law of Robustness and robust generalization error as an open problem. The Law of Robustness states that overparameterization is necessary for models to interpolate robustly, i.e., the interpolating function is required to be Lipschitz. (2023) extend this law to arbitrary data distributions, proving that the Lipschitz constant satisfies $L = \Omega(n^{1/d})$. Robust generalization, on the other hand,...

arXiv CS 6d ago

Sign Lock-In: Randomly Initialized Weight Signs Persist and Bottleneck Sub-Bit Model Compression

arXiv:2602.17063v2 Announce Type: replace Abstract: Sub-bit model compression targets storage below one bit per weight; as magnitudes are aggressively compressed, the sign bit becomes a fixed-cost bottleneck. Across Transformers, CNNs, and MLPs, learned sign matrices resist low-rank approximation and are spectrally indistinguishable from an i.i.d. This randomness gives rise to the lower bound of sub-bit model compression -- the one-bit wall.

arXiv CS 7d ago

From Scaling to Structured Expressivity: Rethinking Transformers for CTR Prediction

arXiv:2511.12081v2 Announce Type: replace Abstract: Despite massive investments in scale, deep models for click-through rate (CTR) prediction often exhibit rapidly diminishing returns -- a stark contrast to the {predictable scaling laws} seen in large language models (LLMs). We identify the root cause as a {fundamental} \textit{structural misalignment}: {standard} Transformers assume sequential compositionality, whereas CTR data demand combinatorial reasoning over {heterogeneous} fields. To...

arXiv CS 8d ago

Rationality Measurement and Theory for Reinforcement Learning Agents

Announce Type: replace Abstract: This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, culminating over the trajectory in deployment, is defined to be expected...

arXiv CS 9d ago

Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification

arXiv:2510.02779v4 Announce Type: replace Abstract: Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring...

arXiv CS 7d ago