What If Neural Networks Have the Signal Backwards?
The paper claimed, via Noether's theorem, that per-type carrier budgets are conserved during training. The experiments showed they're not: budgets drift upward under soft penalties.
PART 1 OF 2
Justin Harris & Claude | TroponinIQ Research | April 2026
What If The Vectors Have It Backwards?
"I've been thinking about this for years," Justin said. "The action
potential is one bit. The information is in the serotonin, the dopamine,
the GABA. We've been building AI backwards."
In your brain, the action potential --- the electrical spike that
travels down a neuron --- is informationally stupid. It's a binary
switch: fire or don't fire. One bit. The actual computational payload is
carried by the neurotransmitter cocktail released at the synapse.
In AI, it's flipped. The activation vector is high-dimensional and
information-rich. The weight is a boring scalar multiplier.
So we built it the other way.
The Uni-Bit Vector Gate: 100 binary gates per layer, 10 scalar
"carriers" per gate organized into 5 neurotransmitter types ---
monoamine, GABA, glutamate, acetylcholine, neuropeptide. When a gate
fires, it releases its carriers. When it doesn't? Zero cost. Silence is
free.
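The layer described above can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions: the function name, the zero-threshold firing rule, and the random carrier initialization are illustrative, not the paper's exact implementation.

```python
import numpy as np

N_GATES, N_CARRIERS, N_TYPES = 100, 10, 5   # 2 carriers per neurotransmitter type

rng = np.random.default_rng(0)
carriers = rng.normal(size=(N_GATES, N_CARRIERS))   # learned scalar payloads

def ubvg_forward(pre_activation):
    """One Uni-Bit Vector Gate layer: a 1-bit fire decision per gate,
    with the information carried by the released carrier vector."""
    fires = (pre_activation > 0.0).astype(float)    # one bit: fire or don't
    return fires[:, None] * carriers                # silent gates release nothing

x = rng.normal(size=N_GATES)
released = ubvg_forward(x)
# Silence is free: a gate that doesn't fire contributes exactly zero.
assert np.all(released[x <= 0.0] == 0.0)
```

The design point is the asymmetry: the gate decision carries one bit, while all the representational capacity lives in the 10 carriers it releases.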
➤ The expensive thing in the brain isn't the signal. It's the silence
that makes the signal meaningful. ➤
Fixing the Symmetry
The first version (v1) had a broken symmetry --- and it mattered more
than expected. The output projection was reading raw carrier values,
giving each one a distinct meaning. If you rotate carriers within a type
by 45° you get different outputs. That's broken symmetry.
Noether's theorem says: no symmetry → no conservation law. Budget
drift was the symptom.
The fix (v2): instead of reading raw carrier values, compute the
energy in each type:

Bⱼ = ∑ᵢ ∈ type j |sᵢ|²

Energy is rotation-invariant. Now O(kⱼ) rotation within type j is a
genuine symmetry of the full forward pass. Noether's theorem now has
something to work with.
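The invariance is easy to verify numerically. A sketch, assuming the 10-carriers / 5-types layout described earlier (2 carriers per type, so the within-type rotations are O(2)):

```python
import numpy as np

K = 2                                   # carriers per type: 10 carriers / 5 types
rng = np.random.default_rng(1)
s = rng.normal(size=(5, K))             # one gate's carriers, grouped by type

def type_energies(s):
    return (s ** 2).sum(axis=1)         # B_j = sum over i in type j of |s_i|^2

# Rotate the carriers within each type by 45 degrees (an O(2) element).
theta = np.deg2rad(45.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
s_rot = s @ R.T

# v1 read the raw carriers, so this rotation changed the output.
# v2 reads the energies, which the rotation leaves untouched.
assert np.allclose(type_energies(s), type_energies(s_rot))
```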
Side effect: Layer-0 firing rates dropped from 20% to 3.5% --- right in
the biological range. Without any sparsity penalty.
Figure 1 --- Gate firing rates during training. v2 architecture:
Layer-0 settles at 3.5% (biological 1--10% range) without any sparsity
regularizer. Emerges from information economics.
➤ The symmetry fix was about theory. The sparsity drop was a free
gift from physics. ➤
What Does Noether Actually Conserve Here?
NOT the budget (total energy Bⱼ). That would require scale
invariance, which you don't have.
WHAT IT CONSERVES: the carrier unit-direction within each type ---
θⱼ = sⱼ / ‖sⱼ‖, where sⱼ collects the carriers of type j. The "recipe"
for how energy is distributed within the type, independent of total
dose.
The math: since the gradient is radial within each type subspace when
the loss has O(kⱼ) symmetry, gradient flow scales all carriers in a type
by the same factor. The ratios don't change. The direction is preserved.
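A toy gradient-flow check makes this concrete. The sketch assumes exactly the premise above: the loss sees the carriers only through the type energy, so the gradient is radial (here f = ½‖s‖², giving grad = s):

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.normal(size=2)                  # the carriers of one type
unit0 = s / np.linalg.norm(s)           # the initial "recipe"

# L = f(||s||^2) implies grad = f'(||s||^2) * 2s -- always parallel to s.
for _ in range(200):
    s = s - 0.01 * s                    # every carrier scaled by the same factor

# The dose (energy) changed; the recipe (unit direction) did not.
assert np.allclose(s / np.linalg.norm(s), unit0)
```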
Figure 2 --- Type budget evolution. Left: soft penalty --- budgets
drift upward (O(kⱼ) conserves angles, not magnitude). Right: hard
projection --- flat by construction. Both train equally well.
The biological analogy: you can deplete a synapse (change the dose), but
under pure Hebbian gradient flow the relative mix of vesicle contents
--- the recipe --- is preserved.
➤ Noether's theorem conserves the recipe, not the dose. ➤
Adam Breaks It (But In A Derivable Way)
Then we tested it. Measured angular drift through training. Expected:
~0 under gradient flow. Got: drift = 0.52 under Adam.
If Adam breaks symmetry, wouldn't that fail the entire model?
How can it learn without the symmetry laws to build from?
*--- Justin Harris, interrupting a training run
Great question. The answer is that Adam doesn't break the symmetry
randomly --- it breaks it in a completely predictable way.
Since the gradient is radial: gᵢ ∝ sᵢ, Adam's second moment estimate
is vᵢ ≈ sᵢ². Therefore:
Δsᵢ ∝ gᵢ / √vᵢ ≈ sign(sᵢ) ← NOT radial
Adam systematically rotates all carriers toward the ±45° equi-magnitude
point --- the "sign attractor."
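The sign-attractor mechanics can be reproduced in a toy two-carrier simulation. This is a sketch, not the paper's training setup: it assumes the growing-energy regime (mirroring the upward budget drift reported earlier) and uses Adam's steady-state update Δsᵢ ∝ sign(sᵢ) directly:

```python
import numpy as np

s = np.array([2.0, 0.5])                        # unequal carrier mix in one type
equi = np.array([1.0, 1.0]) / np.sqrt(2.0)      # the +45 degree equi-magnitude point

# Radial gradient with growing energy: g = -s. Adam's second moment gives
# v_i ~ g_i^2, so the step is -g_i / sqrt(v_i) = sign(s_i) -- not radial.
for _ in range(10_000):
    s = s + 0.01 * np.sign(s)                   # same-size step for every carrier

# Equal additive steps erase the magnitude ratio: direction -> +-45 degrees.
assert np.allclose(s / np.linalg.norm(s), equi, atol=0.01)
```

Under SGD the step would be proportional to s itself, leaving the mix ratio untouched; Adam's per-coordinate normalization is exactly what rotates it.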
Prediction: Adam drift saturates at ~0.51. SGD drift stays at 0.00.
➤ Adam doesn't break symmetry chaotically. It breaks it toward a
fixed point you can calculate in advance. ➤
SGD Confirmed It To Machine Precision
SGD... let's goooo! Ya boi
*--- Justin Harris, upon seeing the results
We trained three models side by side with identical seeds:
Adam       | drift = 0.5100 | train accuracy = 96.2%          | sign attractor --- CONFIRMED
SGD + mom. | drift = 0.0000 | train accuracy = 50.0% (chance) | Noether --- CONFIRMED
Pure SGD   | drift = 0.0000 | train accuracy = 47.8% (chance) | Noether --- CONFIRMED
Figure 3 --- Angular drift vs. epoch for three optimizers. SGD curves
(green) hug zero to machine precision across all 80 training epochs.
Adam (red) converges exactly to the analytically-predicted sign
attractor at ~0.51.
Zero. The angular drift under pure SGD is zero to machine precision
for 80 epochs. The Noether prediction is confirmed as exactly as we can
measure it.
But look at the accuracy. SGD can't train the network. It stays at
chance. Why? The straight-through estimator produces sparse, noisy
gradients. Adam's adaptive scaling gives every parameter a normalized
update regardless of gradient magnitude. That's why it works. That's
also why it breaks the symmetry.
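For readers unfamiliar with it, the straight-through estimator (STE) runs the hard binary gate forward but substitutes a surrogate derivative backward. A minimal sketch; the clipping window used here is a common choice and an assumption, not confirmed by the paper:

```python
import numpy as np

def ste_gate(pre, grad_out):
    """Hard 0/1 gate forward; surrogate gradient backward (straight-through)."""
    fired = (pre > 0.0).astype(float)               # exact binary forward pass
    grad_pre = grad_out * (np.abs(pre) < 1.0)       # pretend d(fire)/d(pre) = 1 near 0
    return fired, grad_pre

pre = np.array([-2.0, -0.3, 0.4, 1.5])
fired, grad_pre = ste_gate(pre, np.ones(4))
assert np.array_equal(fired, [0.0, 0.0, 1.0, 1.0])
# Gates far from threshold get zero gradient -- sparse, noisy signal,
# which is exactly what starves magnitude-sensitive SGD.
assert np.array_equal(grad_pre, [0.0, 1.0, 1.0, 0.0])
```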
The architecture NEEDS Adam to learn.
But Adam breaks what makes the architecture theoretically interesting.
Real neurons face the same tension --- their 'optimizer' doesn't
preserve symmetries, so they evolved explicit homeostatic mechanisms
instead.
*--- From the research log
➤ We confirmed a century-old theorem in a 21st-century neural
network. ➤
Angular Deviation as a Failure Signal
Figure 4 --- AUC comparison for failure prediction. Soft angular
deviation score: AUC = 0.558 (weak positive signal). Budget violation:
AUC = 0.489 (null). Hard variant: null throughout.
Using the angular deviation from the training-time EMA profile as a
per-example failure predictor gives AUC = 0.558 for the soft model --- a
weak but real improvement over chance. It's above the null; it's below
the 0.7 target. Honest data.
The signal is there but small. At n = 100 gates, the synthetic task is
too simple to create a strong confident-correct vs.
confused-incorrect separation. This is a scale question, not an
architecture question.
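The score itself is simple to compute. A sketch: the function name, the two-carrier shapes, and the example directions are illustrative assumptions; only the idea (angle against the training-time EMA direction profile) comes from the text above.

```python
import numpy as np

def angular_deviation(carriers, ema_dir):
    """Angle between each example's carrier direction and the
    training-time EMA direction profile for one type."""
    d = carriers / (np.linalg.norm(carriers, axis=-1, keepdims=True) + 1e-12)
    cos = np.clip(d @ ema_dir, -1.0, 1.0)
    return np.arccos(cos)                 # radians; larger = further off-recipe

ema_dir = np.array([1.0, 0.0])            # learned unit direction for one type
on_recipe  = np.array([[0.99, 0.05], [1.00, -0.03]])
off_recipe = np.array([[0.20, 0.90], [-0.50, 0.80]])

# Off-recipe examples score strictly higher -- the raw material for the AUC.
assert angular_deviation(on_recipe, ema_dir).max() \
     < angular_deviation(off_recipe, ema_dir).min()
```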
➤ The angular signal is a real but weak failure predictor at this
scale. Part 2 shows what happens when we stress-test it. ➤
What's Next
In Part 2, we test whether conservation violations predict failures
under distribution shift --- and discover that the conservation signal
measures something more fundamental, and more interesting, than we
expected.
Spoiler: the signal goes the wrong way. And the reason why is the most
interesting result of the whole project.
Continue to Part 2 →