
Fractal–Hyperbolic Degeneracy in Overparameterized Learning Manifolds

A Companion Explainer

3 Pilgrim LLC

Version 1.0 · February 5, 2026



1) Why This Paper Exists

Modern overparameterized neural networks exhibit a constellation of strange, widely observed behaviors: flat minima, hyperbolic curvature, fractally rough loss boundaries, low intrinsic dimension, and low‑rank Fisher spectra. These observations are real but fragmented in the literature. There is no minimal theory explaining why these features co‑occur or how they jointly enable efficient training at scale. This paper proposes that all of them arise from three simple primitives governing the geometry of overparameterized optimization. The goal is to unify disparate empirical results into a causal and structural account, and to derive a practical three‑phase training protocol from that structure.


2) What the Paper Says (Plain‑Language Summary)

The paper introduces three primitives that together explain the observed cluster of paradoxes in large‑scale learning:

  1. Gradient Erosion (Negative‑Space Carving).
    Training removes redundant directions rather than filling the space. Erosion collapses vast degenerate regions into a resistant low‑dimensional core, producing flat minima, low intrinsic dimension, and the observed fractal roughness of loss boundaries (see the first sketch after this list).

  2. Fisher Metric as a Parametric Friction Field.
    The Fisher information defines local friction: high eigenvalues mark directions tightly constrained by the data; low eigenvalues mark sloppy, weakly constrained directions. This explains low‑rank Fisher structure and why curvature spectra show hyperbolic traits (see the second sketch after this list).

  3. Overparameterization as Degeneracy Amplifier.
    Extra parameters create vast families of nearly equivalent solutions. Degeneracy is not a bug but a feature: it accelerates exploration and enables erosion to find the stable manifold (the first sketch after this list exhibits such a family directly).
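
To make erosion and degeneracy concrete, here is a minimal, self‑contained sketch (Python with NumPy; illustrative, not code from the paper) using the classic two‑parameter toy model f(x) = w1·w2·x. Every pair with w1·w2 = 2 fits the data exactly, so the global minima form a one‑dimensional hyperbola rather than a point, and at any such minimum the Hessian has an exactly flat eigenvalue along that curve.

    import numpy as np

    # Toy overparameterized model: f(x) = w1 * w2 * x, target slope 2.0.
    # Every (w1, w2) with w1 * w2 = 2 is a global minimum, so the solution
    # set is a hyperbola: a continuous family of equivalent solutions.
    rng = np.random.default_rng(0)
    x = rng.normal(size=256)
    y = 2.0 * x

    w1, w2, lr = 3.0, 0.1, 0.01
    for _ in range(2000):
        err = w1 * w2 * x - y          # residuals of L = 0.5 * mean(err^2)
        g1 = np.mean(err * w2 * x)     # dL/dw1
        g2 = np.mean(err * w1 * x)     # dL/dw2
        w1, w2 = w1 - lr * g1, w2 - lr * g2

    print(f"w1 * w2 = {w1 * w2:.4f}  (target 2.0)")

    # At the minimum (err ~ 0) the Hessian is E[x^2] * [[w2^2, w1*w2],
    # [w1*w2, w1^2]], whose determinant is zero: one stiff eigenvalue,
    # one exactly flat one along the hyperbola.
    Ex2 = np.mean(x ** 2)
    H = Ex2 * np.array([[w2 ** 2, w1 * w2], [w1 * w2, w1 ** 2]])
    print("Hessian eigenvalues at the minimum:", np.linalg.eigvalsh(H))

Which point on the hyperbola gradient descent reaches depends on the initialization; the flat direction along the curve is precisely the degenerate residue that erosion leaves behind.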
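
The friction‑field claim can be probed numerically as well. The sketch below (again illustrative; every name in it is ours, not the paper's) forms the empirical Fisher, i.e. the second moment of per‑example gradients, for a linear model on inputs with a decaying covariance spectrum, and prints how quickly its eigenvalues fall off: a few stiff, tightly constrained directions and many sloppy ones.

    import numpy as np

    # Empirical Fisher F = mean_i g_i g_i^T over per-example gradients g_i.
    # For a linear model w . x under squared loss, g_i = (w . x_i - y_i) * x_i.
    rng = np.random.default_rng(0)
    d, n = 20, 5000
    scales = 2.0 ** -np.arange(d)          # decaying input covariance
    X = rng.normal(size=(n, d)) * scales
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    w = np.zeros(d)                        # evaluate the Fisher at init
    G = (X @ w - y)[:, None] * X           # per-example gradients, shape (n, d)
    F = G.T @ G / n                        # empirical Fisher
    eigs = np.linalg.eigvalsh(F)[::-1]     # eigenvalues, descending
    print("top 5 eigenvalues:", np.round(eigs[:5], 4))
    print("ratio lambda_10 / lambda_1:", eigs[9] / eigs[0])

In this toy setting the sloppy spectrum is inherited from the data covariance; the paper's stronger claim is that trained networks show the same stiff‑versus‑sloppy split, and that it acts as a friction field for optimization.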

From these primitives, the paper derives the Target Acquisition Protocol, a three‑phase training method that exploits the evolving geometry of the loss manifold.

Across toy experiments and a small transformer, the protocol reaches matched accuracy in 15–35% fewer training steps at equal compute.
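
The explainer does not enumerate the three phases here, so the skeleton below is purely hypothetical: the phase names, boundaries, and hyperparameters are invented for illustration and should not be read as the paper's actual protocol. It only shows the mechanical shape a phase‑switched training loop can take, reusing the w1·w2 toy problem from the first sketch above.

    import numpy as np

    # HYPOTHETICAL three-phase schedule; names and numbers are invented
    # here, not taken from the paper's Target Acquisition Protocol.
    PHASES = [
        # (name, fraction of steps, learning rate, weight decay)
        ("explore", 0.3, 5e-2, 0.0),    # exploit degeneracy: move fast
        ("erode",   0.5, 2e-2, 1e-3),   # carve away redundant directions
        ("acquire", 0.2, 5e-3, 0.0),    # settle onto the stable core
    ]

    def phase_at(step, total):
        """Return (lr, weight_decay) for the current step."""
        edge = 0.0
        for _, frac, lr, wd in PHASES:
            edge += frac * total
            if step < edge:
                return lr, wd
        return PHASES[-1][2], PHASES[-1][3]

    rng = np.random.default_rng(0)
    x = rng.normal(size=256)
    y = 2.0 * x
    w = np.array([3.0, 0.1])
    total = 2000
    for step in range(total):
        lr, wd = phase_at(step, total)
        err = w[0] * w[1] * x - y
        g = np.array([np.mean(err * w[1] * x), np.mean(err * w[0] * x)])
        w -= lr * (g + wd * w)             # simple L2 weight decay
    print("w1 * w2 =", round(w[0] * w[1], 4))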


3) What Distinguishes This Framework


4) Theoretical Implications (Assuming the Work Is Correct)


5) Potential Implications (Downstream, Not Predictions)

A) Training Efficiency & Scaling

B) Model Design

C) Theory of Generalization

D) Tooling & Infrastructure