I Tracked 35 Supplements and 2 Years of HRV Data. The Result Surprised Me.

I wanted to write the article every biohacking blog promises and never delivers.

“Two years of personal HRV data. 35 supplements ranked by effect size. Statistically validated. No bro-science.”

I had the data. I had the discipline to track everything. I sat down expecting to publish a clean ranking — winners on top, expensive disappointments at the bottom, a few clear recommendations. The kind of post that gets shared on Twitter.

I’m not publishing that article. Because once I ran the numbers properly, the ranking dissolved — twice.

What follows is a long-form walk through what 576 days of Whoop HRV plus 38 documented supplement intake periods actually let you conclude. It’s also a case study in how a careful-looking N=1 analysis can fool the person running it — even when the person running it is paying attention. I made two methodology mistakes that almost survived. I caught both. They are the kind of mistakes every “I tried X and my HRV went up Y%” post on the internet is making invisibly.

The Dataset

What	Detail
HRV source	Whoop 4.0, rMSSD measured during slow-wave sleep
HRV window	2024-10-10 → 2026-05-11 (576 days with valid HRV)
HRV range	19 ms (worst day) → 122 ms (best day)
Supplements tracked	35 unique products, 38 documented intake periods
Logging precision	Amazon order CSV → period start + end date per product
Other signals	Resting HR, Recovery score, Sleep duration, Sleep performance, Daily strain

The logging discipline was the easy part: every supplement order from Amazon got mapped to a start_date → end_date period via a simple import script. Whoop syncs continuously. Two years of data, hands-off.
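A minimal sketch of that import step follows. It assumes a hypothetical CSV with `order_date`, `product` and `days_supply` columns; Amazon's real export uses different headers, and days of supply would have to be added by hand. A repeat order that arrives before the previous supply runs out (plus a grace window) extends the same intake period. A longer gap opens a new one, which is one way 35 products could turn into 38 periods.

```python
import csv
import io
from datetime import date, timedelta

# Hypothetical export: the real Amazon order CSV uses different column names,
# and "days_supply" is assumed to be filled in by hand per product.
SAMPLE_CSV = """order_date,product,days_supply
2025-08-31,Taurine,60
2025-10-28,Taurine,60
2025-11-21,Sunflower lecithin,90
2026-03-01,Taurine,30
"""


def orders_to_periods(csv_text, grace_days=7):
    """Merge repeat orders of one product into (product, start, end) intake periods."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO dates sort correctly as strings, so each product ends up in date order.
    rows.sort(key=lambda r: (r["product"], r["order_date"]))
    open_periods = {}  # product -> [start, end] of the period currently being extended
    periods = []
    for row in rows:
        product = row["product"]
        start = date.fromisoformat(row["order_date"])
        end = start + timedelta(days=int(row["days_supply"]))
        current = open_periods.get(product)
        if current and start <= current[1] + timedelta(days=grace_days):
            # Reorder arrived before the last supply ran out (plus grace): same period.
            current[1] = max(current[1], end)
        else:
            if current:
                periods.append((product, current[0], current[1]))
            open_periods[product] = [start, end]
    periods.extend((p, s, e) for p, (s, e) in open_periods.items())
    return sorted(periods, key=lambda p: (p[1], p[0]))


for product, start, end in orders_to_periods(SAMPLE_CSV):
    print(product, start, end)
```

The grace window is a judgment call. If it is too short, every late reorder splits one continuous intake into two periods. If it is too long, a genuine break disappears.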

What I didn’t track granularly enough — and this turned out to matter more than the tracking — was: alcohol grams per day, subjective stress on a 1-10 scale, travel days, inflammation markers, and one non-supplement variable I’ll mention later. We’ll come back to all of it.

Attempt 1: The Naïve Analysis

The obvious approach: for each supplement, compare HRV during the intake window to a baseline window before it.

28-day pre-baseline. Welch’s t-test for unequal variances. Mann-Whitney U as a non-parametric backup. Cohen’s d for effect size. Standard playbook.
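All three statistics can be computed with the standard library alone. This sketch returns three things:

- Welch's t with its Welch-Satterthwaite degrees of freedom.
- Cohen's d using a pooled standard deviation (one of several common variants).
- The Mann-Whitney U statistic.

Turning t and U into p-values needs the t distribution or a normal approximation, for example `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.mannwhitneyu`, and that step is left out here. `pre` and `during` are assumed to be lists of daily HRV values in ms.

```python
import math
from statistics import mean, variance


def welch_t(pre, during):
    """Welch's t statistic (during - pre) and Welch-Satterthwaite degrees of freedom."""
    se2_pre = variance(pre) / len(pre)  # squared standard error of each mean
    se2_during = variance(during) / len(during)
    t = (mean(during) - mean(pre)) / math.sqrt(se2_pre + se2_during)
    df = (se2_pre + se2_during) ** 2 / (
        se2_pre ** 2 / (len(pre) - 1) + se2_during ** 2 / (len(during) - 1)
    )
    return t, df


def cohens_d(pre, during):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(pre), len(during)
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(during)) / (n1 + n2 - 2)
    return (mean(during) - mean(pre)) / math.sqrt(pooled_var)


def mann_whitney_u(pre, during):
    """U statistic for `during` vs `pre`, using average ranks for ties."""
    values = list(pre) + list(during)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n2 = len(during)
    return sum(ranks[len(pre):]) - n2 * (n2 + 1) / 2
```

None of these statistics knows anything about time. They treat every day in each window as an independent draw, which is exactly the assumption a trending series breaks.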

Here’s the top of the ranking it produced:

Supplement	Started	ΔHRV (vs pre-baseline)	Cohen’s d	p (Welch)
Sunflower lecithin	2025-11-21	+32.4 ms (+78%)	2.02	<0.001
L-Tyrosine	2025-11-21	+32.4 ms (+78%)	2.02	<0.001
PEA (Palmitoylethanolamide)	2025-11-23	+31.2 ms (+73%)	1.87	<0.001
P5P (active Vitamin B6)	2025-11-21	+30.1 ms (+73%)	1.84	<0.001
Magnesium Bisglycinate	2025-11-23	+26.7 ms (+63%)	1.83	<0.001
Taurine	2025-08-31	+25.0 ms (+61%)	1.33	<0.001

A Cohen’s d above 1.5 is, in clinical research, a massive effect — bigger than most prescription antihypertensives, bigger than caffeine on alertness.

If I’d stopped here, I’d be telling you that lecithin (a phospholipid you can get from two eggs) raised my HRV by 78%, p < 0.001.

Two things saved me from publishing that.

Clue 1: Multiple supplements with identical effect sizes

Look closer. Sunflower lecithin and L-Tyrosine show exactly the same +32.4 ms. Butea Superba and Magnesium Bisglycinate show exactly the same +26.7 ms (further down the list).

Why? Because I started them on the same day and took them over the same span. They share the same intake window. They share the same pre-baseline. The math has no way to attribute the effect to one or the other.

In fact: during the lecithin window, an average of 14.2 other supplements were active simultaneously. The “lecithin effect” is the effect of whatever was happening to my body during late November 2025 onwards — which includes, but is in no way reducible to, lecithin.
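Counting overlap is straightforward once each period is a start/end pair. This sketch assumes inclusive date ranges and uses hypothetical supplement names. For every day of the target window, it counts how many other periods were active, then averages those counts. Two periods with identical windows will always produce identical before-vs-during numbers, which is exactly Clue 1.

```python
from datetime import date, timedelta


def mean_concurrent(periods, target):
    """Average count of *other* periods active on each day of `target`'s window.

    `periods` maps a name to an inclusive (start, end) pair of dates.
    """
    start, end = periods[target]
    counts = []
    day = start
    while day <= end:
        counts.append(sum(
            1 for name, (s, e) in periods.items() if name != target and s <= day <= e
        ))
        day += timedelta(days=1)
    return sum(counts) / len(counts)


# Hypothetical toy stack: B shares A's exact window, C overlaps A's second half.
periods = {
    "A": (date(2025, 11, 1), date(2025, 11, 10)),
    "B": (date(2025, 11, 1), date(2025, 11, 10)),
    "C": (date(2025, 11, 6), date(2025, 11, 20)),
}
print(mean_concurrent(periods, "A"))
```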

Clue 2: The pre-baseline wasn’t a baseline

I plotted monthly HRV averages across the whole 19-month window. This is where everything broke.

Month	Avg HRV (ms)
Oct 2024	86.7
Dec 2024	78.4
Feb 2025	62.0
May 2025	53.8
Aug 2025	40.6 ← trough
Nov 2025	48.3
Feb 2026	81.6
May 2026	82.6

My HRV traced a 19-month U-curve: from 87 ms in late 2024, down to 41 ms in August 2025, then back up to 82 ms by early 2026. That is a swing of more than 45 ms, roughly 2 standard deviations of the entire dataset.
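Here is a sketch of the monthly roll-up, with the swing expressed in standard deviations. It assumes the daily series is stored as a `{date: hrv_ms}` dict. It divides by the population SD of all daily values. The article doesn't say which SD it used, so treat the ratio as approximate.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def monthly_means(daily):
    """Group a {date: hrv_ms} series into {"YYYY-MM": mean_ms}, in month order."""
    buckets = defaultdict(list)
    for day, hrv in daily.items():
        buckets[day.strftime("%Y-%m")].append(hrv)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def swing_in_sd(daily):
    """Peak-to-trough range of the monthly means, raw and in daily-SD units."""
    months = monthly_means(daily)
    swing = max(months.values()) - min(months.values())
    return swing, swing / pstdev(list(daily.values()))
```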

That trough dwarfs anything a single supplement could plausibly do. And I started most of the “winning” supplements in October–December 2025, right at the bottom of the U. Their pre-baseline windows captured the trough; their intake windows captured the recovery climbing back to normal. The supplements got credit for biology that was going to happen regardless.

The first lesson was obvious in hindsight: before-vs-during analysis is structurally broken in a trending dataset. Whatever caused the U-curve had to be controlled for, not averaged into.

Attempt 2: Controlling for the Trend

If the HRV trajectory has its own slow dynamics, the right comparison is not “during the supplement vs. the 4 weeks before”, but “during the supplement vs. what HRV would have done anyway.”

I implemented this with a 60-day rolling-median baseline: for each day, compute the expected HRV from the surrounding 60 days. Then compute the daily residual — how much that day deviates from its local expectation. A real supplement effect would show up as a positive (or negative) bias in residuals during the intake window, with confidence intervals that don’t include zero.
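Below is a minimal version of the residual calculation. It assumes a centred window of 30 days on either side of each day; the article says only "the surrounding 60 days". The minimum of 30 valid neighbours is also a hypothetical choice.

One consequence of centring: the window also contains the intake days themselves. A long, sustained effect can therefore be partly absorbed into the baseline. A trailing window avoids that, at the cost of lagging behind the drift.

```python
from datetime import date, timedelta
from statistics import median


def rolling_median_residuals(daily, half_window=30, min_neighbours=30):
    """Residual = the day's HRV minus the median HRV of the days around it.

    The window is centred: `half_window` days before and after, plus the day
    itself. Missing days are skipped; days with fewer than `min_neighbours`
    valid values in their window get no residual.
    """
    residuals = {}
    for day, hrv in daily.items():
        window = []
        for offset in range(-half_window, half_window + 1):
            neighbour = day + timedelta(days=offset)
            if neighbour in daily:
                window.append(daily[neighbour])
        if len(window) >= min_neighbours:
            residuals[day] = hrv - median(window)
    return residuals
```

A slow linear drift produces residuals of zero in the middle of the series, which is the point. Only deviations from the local level survive.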

I bootstrapped each window 1000 times to get a proper 95% confidence interval.
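The bootstrap step can be sketched as a percentile bootstrap over one window's daily residuals. It resamples days independently, which ignores the day-to-day autocorrelation in HRV. The resulting intervals are therefore probably somewhat too narrow; a block bootstrap is the usual fix. The seed and the "interval excludes zero" significance rule are assumptions for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(residuals, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean residual of one intake window.

    Days are resampled independently (i.i.d.), which ignores HRV's
    day-to-day autocorrelation; a block bootstrap would be more conservative.
    """
    rng = random.Random(seed)
    n = len(residuals)
    boot_means = sorted(mean(rng.choices(residuals, k=n)) for _ in range(n_boot))
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return mean(residuals), (lower, upper)


def excludes_zero(ci):
    """The 'Significant?' column: the interval lies entirely above or below zero."""
    lower, upper = ci
    return lower > 0 or upper < 0
```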

The result, ranked by residual:

Supplement	n days	Δresidual (ms)	95% CI	Significant?
Psyllium husk	41	+2.7	[-2.1, +8.1]	No
Saccharomyces boulardii	43	+2.5	[-1.9, +7.3]	No
Digestive Enzymes	43	+2.5	[-2.2, +7.7]	No
Taurine (powder, recent)	29	+2.5	[-3.3, +8.3]	No
Bergamot	34	+1.5	[-3.6, +7.4]	No
Vitamin B complex	87	+1.4	[-1.7, +4.7]	No
L-Tyrosine	172	+0.25	[-2.4, +2.5]	No
Magnesium Bisglycinate	61	-0.5	[-3.7, +2.3]	No
Vitamin A	205	-1.3	[-3.3, +0.7]	No
CoQ10 (Ubiquinol)	61	-2.3	[-6.3, +1.2]	No
Uridine monophosphate	61	-2.3	[-6.7, +1.8]	No
Omega-3 algae oil	21	-14.0	[-18.1, -9.6]	Yes

Out of 33 supplement periods with sufficient data, exactly one reached statistical significance. And it was negative.

The +32 ms lecithin effect — gone. The +27 ms magnesium effect — gone. CoQ10, tyrosine, PEA, P5P, Butea Superba — all within noise. Almost the entire ranking from Attempt 1 collapsed into a confidence interval straddling zero.

I almost stopped here. Two analyses, one conclusion: most of the “effects” are timing artifacts. Clean enough for an article.

Then I had to deal with one more thing.

Attempt 3: The Confound I Had to Exclude

There’s a stretch in the middle of my 19 months during which a non-supplement variable in my life was clearly affecting my HRV. It’s outside the scope of this article and I won’t be sharing the specifics — but it has to be controlled for, so I’m flagging it.

The relevant fact for the methodology is simple: during that window, my HRV was being moved by something that had nothing to do with the supplements I was studying.

That means two things:

Any supplement whose intake window overlapped the confound period had a negative bias in its measured effect — its “during” days were artificially compressed.
Any supplement whose pre-baseline fell inside the confound period and whose intake window started near its end had a positive bias: its baseline captured suppressed days, and its intake window captured the regression back to my actual baseline.

That second one is the dangerous one. It’s part of why several supplements I started in late summer and autumn 2025 showed inflated naïve effects: their pre-baselines lived in the confounded valley, and their during-windows lived on the recovery climb out of it.

Putting numbers on the confound

I anchored the analysis with HRV medians from two clean windows on either side of the confound:

Pre-anchor (the months before the confound period): median HRV 60.0 ms (n=87 days)
Post-anchor (the months after the confound period): median HRV 58.1 ms (n=92 days)
Actual HRV during the confound window: median 52.2 ms (n=153 days)

Linear interpolation between the anchors says HRV “should have been” around 59 ms during the confound window. Actual was 52 ms. The confound cost me about 7 ms of HRV on average during that period.
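Here is the arithmetic behind the 7 ms figure. It assumes the two anchor medians sit at the two ends of the confound window and that HRV would have moved in a straight line between them. Averaged over the window, a straight line equals the midpoint of its endpoints: (60.0 + 58.1) / 2 = 59.05 ms. Subtracting the actual median gives 59.05 - 52.2 = 6.85 ms, the same offset Approach D applies.

```python
from statistics import mean

PRE_ANCHOR_MS = 60.0   # median HRV in the months before the confound (n=87)
POST_ANCHOR_MS = 58.1  # median HRV in the months after it (n=92)
ACTUAL_MS = 52.2       # median HRV inside the confound window (n=153)
WINDOW_DAYS = 153


def interpolated_expectation(pre, post, n_days):
    """Expected HRV for each day, on a straight line from `pre` to `post`."""
    return [pre + (post - pre) * (i + 0.5) / n_days for i in range(n_days)]


expected = interpolated_expectation(PRE_ANCHOR_MS, POST_ANCHOR_MS, WINDOW_DAYS)
offset = mean(expected) - ACTUAL_MS
print(f"expected {mean(expected):.2f} ms, actual {ACTUAL_MS} ms, offset {offset:.2f} ms")
```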

A small but real effect — and importantly, smaller than the full 45 ms U-curve. Which means: most of the U-curve is NOT explained by this confound. The 87 → 60 ms decline from late 2024 through early 2025 happened before the confound period started. The 58 → 82 ms climb from late 2025 onward happened entirely after it ended. The confound added about 7 ms of compression on top of an already-existing slow drift.

That matters because it means the methodology problem from Attempt 1 is real with or without the confound. The detrending in Attempt 2 was still necessary.

Re-running the analysis two ways

I corrected for the confound two ways and compared:

Approach C: Hard Exclude. Drop every day inside the confound window from both the baseline and intake windows. If a period loses too many days to be statistically meaningful, flag it as “insufficient.” This is the most conservative correction, at the cost of some periods becoming unanalyzable.

Approach D: Offset Correction. Add +6.85 ms to every day inside the confound window (the offset that brings the window’s actual median back to the interpolated expectation). Then re-run both the naïve before-vs-during AND the rolling-median residual analyses on the corrected daily series. This preserves all the data and trades a small modeling assumption for statistical power.
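Both corrections can be sketched on a `{date: hrv_ms}` series, together with the naïve 28-day comparison they feed into. The confound dates in the toy check are hypothetical. So is the 14-day minimum below which a window counts as "insufficient"; the article doesn't state its threshold.

```python
from datetime import date, timedelta
from statistics import mean


def in_window(day, start, end):
    """True if `day` falls inside the inclusive [start, end] range."""
    return start <= day <= end


def hard_exclude(daily, confound_start, confound_end):
    """Approach C: drop every day inside the (inclusive) confound window."""
    return {d: v for d, v in daily.items() if not in_window(d, confound_start, confound_end)}


def offset_correct(daily, confound_start, confound_end, offset_ms=6.85):
    """Approach D: add a constant offset to every day inside the confound window."""
    return {
        d: (v + offset_ms if in_window(d, confound_start, confound_end) else v)
        for d, v in daily.items()
    }


def naive_delta(daily, start, end, baseline_days=28, min_days=14):
    """Mean HRV during [start, end] minus the mean of the preceding 28 days.

    Returns None ("insufficient") when either side has fewer than `min_days`
    valid days; the 14-day threshold is a hypothetical choice.
    """
    pre = [v for d, v in daily.items() if start - timedelta(days=baseline_days) <= d < start]
    during = [v for d, v in daily.items() if in_window(d, start, end)]
    if len(pre) < min_days or len(during) < min_days:
        return None
    return mean(during) - mean(pre)
```

Consider a toy case where the pre-baseline sits entirely inside the confound window. Hard Exclude returns `None`, the "insufficient" case, while Offset Correction returns a shrunken delta. That mirrors the Taurine row.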

Both approaches agreed:

Supplement	Original Δ (ms)	Hard-Exclude Δ (ms)	Offset-Corrected Δ (ms)	Detrended Residual, ms (95% CI)
Creatine Monohydrate (Mar 2025)	-12.2	-7.4	-6.9	+0.9 [-1.1, +2.7]
Taurine (Aug 2025)	+25.0	insufficient	+19.0	-0.0 [-1.9, +1.8]
Magnesium Bisglycinate (Oct 2025)	+14.0	insufficient	+7.7	-0.9 [-3.0, +1.0]
CoQ10 (Sep 2025)	+2.0	insufficient	-3.7	-2.3 [-6.2, +1.2]
Vitamin A (Oct 2025)	+11.6	+12.2	+9.1	-1.3 [-3.3, +0.7]
Sunflower lecithin (Nov 2025)	+32.4	+32.4	+32.4	+0.2 [-2.2, +2.6]
L-Tyrosine (Nov 2025)	+32.4	+32.4	+32.4	+0.2 [-2.4, +2.5]

What changed:

The “Creatine lowers HRV” story was wrong. The 2025 creatine window was 76% inside the confound period. Correcting for that shrinks the naïve effect from -12 ms to -7 ms, and detrending takes it to +0.9 ms (no effect). Creatine wasn’t doing anything to my HRV; the confound was.
The “Taurine raises HRV by 25 ms” story was wrong. Starting taurine in August 2025 meant its baseline was at the bottom of the trough and its during-window covered the entire recovery climb. With offset correction the effect shrinks to +19 ms naïve; with detrending it goes to literally zero. Taurine looked like one of the biggest winners in the entire dataset. It wasn’t.
CoQ10 silently flipped sign. Naïve analysis said +2 ms; offset-corrected said -3.7 ms; detrended says noise. The naïve result wasn’t even a “real” winner, and the corrected one isn’t a real loser either. But the sign-flip is a useful warning: in a confounded dataset, even small naïve effects are unreliable in their direction.
The big November-2025 winners didn’t change. Sunflower lecithin, L-tyrosine, P5P, PEA, magnesium — their start dates are all after the confound window, so neither Hard Exclude nor Offset Correction changes their naïve numbers. They were already explained by the detrended analysis (zero effect after controlling for slow drift). The confound correction doesn’t add or subtract evidence; the detrending already showed they were artifacts.

After all three corrections — drift, confound, multiple comparison — exactly one period in the entire 19-month dataset still showed a statistically significant effect. The same one as before: Omega-3 algae oil, 21 days, -14 ms residual, 95% CI [-18.1, -9.6]. And that result is almost certainly an edge-effect artifact: the 21-day window was too short for a stable rolling baseline, and those particular days happened to fall on the two worst HRV weeks of the entire dataset (calendar weeks 43–45 of 2025). The algae oil didn’t cause the dip — it was simply present during it.
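The article doesn't say which multiple-comparison correction it applied, but the arithmetic is a useful sanity check. With 33 periods tested at α = 0.05, about 33 × 0.05 ≈ 1.65 false positives are expected from noise alone. A single significant period is roughly what chance predicts.

One standard correction on p-values is the Holm-Bonferroni step-down, sketched here with hypothetical p-values. The Bonferroni equivalent for confidence intervals would be roughly a 99.85% interval per period.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: one reject flag per p-value, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # once one test fails, every larger p-value is retained too
        reject[i] = True
    return reject


N_PERIODS = 33
ALPHA = 0.05
print("expected false positives:", N_PERIODS * ALPHA)
print("Bonferroni per-period CI level:", 1 - ALPHA / N_PERIODS)
print(holm_reject([0.001, 0.04, 0.03, 0.5]))  # hypothetical p-values
```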

Out of 38 documented intake periods, with three independent statistical corrections applied, the supplement evidence is consistent with: nothing detectable.

Why N=1 Supplement Analysis Is So Hard

A fair question after reading this far: “Then why bother tracking at all?”

Because the dataset proved one thing very clearly: the methodology used in most online supplement self-experiments doesn’t work.

When someone tweets “I started X and my HRV went up 15%”, they’re almost always doing one of these:

Trap 1: Their baseline is too short

A 7-day or 14-day pre-baseline can’t capture personal rhythms. If they happened to have a stressful week before starting, the supplement gets credit for the rebound once the stress fades. My HRV had a 45 ms U-curve over 19 months. Most people living a normal life will see 10–20 ms swings over a few weeks.
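A small simulation makes this trap concrete. It assumes HRV is pure Gaussian noise around 60 ms with a 10 ms SD, with no trend and no supplement effect; these are illustrative numbers, not my data. The simulation "starts" a placebo only after a week that averaged below a hypothetical 57 ms threshold. The naïve during-minus-before difference comes out clearly positive anyway. With no selection on the baseline, it averages out to roughly zero.

```python
import random


def naive_placebo_effect(threshold_ms=57.0, n_days=20000, baseline_days=7,
                         during_days=28, seed=1):
    """Average naive during-minus-before 'effect' of a supplement that does nothing.

    Daily HRV is simulated as pure Gaussian noise (mean 60 ms, SD 10 ms): no
    trend, no supplement effect. A start day is used only if the preceding
    `baseline_days` averaged below `threshold_ms`; pass None to use every day.
    """
    rng = random.Random(seed)
    hrv = [rng.gauss(60.0, 10.0) for _ in range(n_days)]
    deltas = []
    for start in range(baseline_days, n_days - during_days):
        pre = sum(hrv[start - baseline_days:start]) / baseline_days
        if threshold_ms is not None and pre >= threshold_ms:
            continue  # no rough week beforehand, so no "start" here
        during = sum(hrv[start:start + during_days]) / during_days
        deltas.append(during - pre)
    return sum(deltas) / len(deltas)


print(f"start after a bad week: {naive_placebo_effect():+.1f} ms")
print(f"start on any day:       {naive_placebo_effect(None):+.1f} ms")
```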

Trap 2: They start multiple supplements at once

I added six supplements in a five-day window in November 2025. The data physically cannot tell them apart. If you start Magnesium, Ashwagandha, and Omega-3 in the same week and your HRV goes up, you’ve learned “adding stuff to my morning routine correlates with feeling better” — which is mostly placebo-adjacent.

Trap 3: They don’t control for what else changed

Most “supplement worked!” stories are confounded by: better sleep that week, a deload week from training, recovering from a cold, less alcohol, a new partner, hot weather (HRV is temperature-sensitive), travel, or — in my case — a non-supplement life variable. You don’t see those variables in the screenshot.

The confound in my dataset is the kind of thing every honest self-experimenter eventually runs into. Real life is not RCT-clean. The discipline isn’t pretending the confounds don’t exist — it’s being willing to flag and excise them.

Trap 4: They publish the wins, not the losses

This one is the worst. If I’d published the naïve analysis above, my “lecithin +78%” claim would already be on Twitter. And I’d never have published the other half: the same dataset says creatine lowered my HRV by 19% during its 198-day window in 2025, also “statistically significant,” and also a complete artifact of timing and an unrelated confound.

Selection bias × bad methodology = an industry.

What the Data Did Prove

Despite the negative result, the analysis taught me real things:

1. Slow HRV drift exists and dominates

A 45 ms U-curve over 19 months means HRV has its own dynamics on timescales longer than most supplement trials. Even after correcting for the unrelated confound, the residual slow drift is still huge. Either you trial for >6 months with rotating on/off blocks, or you don’t trial at all.

2. Supplement stacks of 15+ are unanalyzable

I currently take 17 things daily. Even if 3 of them are working, I can’t tell which 3. The information you gain from adding another supplement to an active stack is roughly zero. You can only learn from supplements you take in isolation.

3. Training, sleep, and life events matter more than I want to admit

Across the 19 months, day-to-day HRV correlated more strongly with the previous day’s strain and sleep duration than with any supplement flag. The biggest HRV moves I can explain came from a vacation week (my Whoop 7-day HRV average jumped 12 ms) and from a stretch of 3 nights below 5 hours of sleep (HRV dropped 18 ms).

The CoQ10 question is interesting in a vacuum. It’s not interesting compared to whether you slept 7 hours last night.

4. The supplements that have actual RCT-grade evidence aren’t going anywhere

I’m not going to stop taking:

Creatine monohydrate, with 30+ years of solid evidence for strength, plus more tentative evidence for cognition and neuroprotection. My N=1 can’t detect a 0.5 ms HRV effect; that doesn’t mean it isn’t real. (And as we saw, the apparent “creatine lowered HRV” finding was a confound, not a result.)
Omega-3 (EPA+DHA from fish oil) — cardiovascular and anti-inflammatory evidence is robust. I switched products mid-year, which is one reason the “algae oil” period looks weird in this analysis.
Vitamin D3 — I’m at 47°N latitude; my serum levels were 18 ng/mL pre-supplementation. This is endocrinology, not biohacking.
Magnesium glycinate — sleep-onset effect is plausible, but the evidence in my dataset isn’t from HRV; it’s from subjective sleep latency.

The decision criterion isn’t “did my Whoop go up?”. It’s “does mechanism + literature + cost + side-effect profile make this rational?”

5. What I’m dropping

After this analysis, my drop list:

Butea Superba (Thai herb) — no RCT evidence in men with normal baseline endocrine status, my data is uninterpretable, €30/month is buying nothing.
Lecithin at the dose I was taking — choline status was already fine via diet; the “phospholipid for cognition” pitch doesn’t hold up at sub-therapeutic doses.
Uridine monophosphate — the cognitive enhancement evidence is shaky outside specific clinical populations; I can’t detect any effect.
Astaxanthin — I love the antioxidant story; my dataset can’t distinguish it from placebo at consumer doses.

That’s ~€80/month back. The savings will fund the controlled crossover trial I’m setting up next.

What I’m Doing Differently for Year 3

Concretely, the next 12 months of data collection will look different:

Better daily logging (one-tap entries each evening):

Alcohol grams
Subjective stress 1–10
Travel day yes/no
Major non-supplement lifestyle changes (so the next confound window gets flagged before it contaminates the analysis, not after)
Resting heart rate variance

Add inflammation and metabolic markers quarterly: hs-CRP, IL-6, homocysteine. If a supplement is doing something systemic, it may show up in these markers before it shows up in HRV.

N=1 crossover trial design:

Pick 2–3 candidate supplements I genuinely want to evaluate (current shortlist: CoQ10, Ashwagandha-KSM-66, Bergamot). For each one:

4 weeks ON / 4 weeks OFF / 4 weeks ON / 4 weeks OFF
During every ON/OFF transition, change nothing else in the stack
Blind where possible — capsule re-packaging into unmarked vials
Pre-register the hypothesis (HRV-delta on vs off) before starting

That’s 16 weeks per supplement; run 2–3 of them back to back and that’s roughly a year of real data. The result will still be N=1. But it’ll be N=1 done at the level the question deserves.
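Here is a sketch of the ABAB schedule and the on/off comparison. It assumes every day of a block counts toward it (no washout), that HRV is stored as a `{date: hrv_ms}` dict, and a hypothetical start date. Each 4-week block is 28 days, so one ABAB run covers 112 days.

```python
from datetime import date, timedelta
from statistics import mean


def crossover_schedule(start, block_weeks=4, pattern=("ON", "OFF", "ON", "OFF")):
    """(phase, first_day, last_day) for each block of an ABAB on/off design."""
    blocks = []
    for i, phase in enumerate(pattern):
        first = start + timedelta(weeks=block_weeks * i)
        last = first + timedelta(weeks=block_weeks) - timedelta(days=1)
        blocks.append((phase, first, last))
    return blocks


def on_minus_off(daily, blocks):
    """Mean HRV over all ON days minus mean HRV over all OFF days."""
    on, off = [], []
    for phase, first, last in blocks:
        for day, hrv in daily.items():
            if first <= day <= last:
                (on if phase == "ON" else off).append(hrv)
    return mean(on) - mean(off)


for block in crossover_schedule(date(2026, 6, 1)):  # hypothetical start date
    print(*block)
```

With only two ON and two OFF blocks, days within a block are not independent of each other. The block, not the day, is the real unit of replication, so the pre-registered hypothesis should say up front how the ON-minus-OFF difference will be judged.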

Continue the daily Whoop logging. That part of the system actually worked.

The Hard Truth About Self-Quantified Supplementation

The supplement industry is worth ~$170 billion. The volume of “I tried this and felt great” content online is essentially infinite. The amount of actual data is microscopic, and most of it — including the first two analyses I almost published — is contaminated by exactly the methodological errors I described above.

The right standard for an N=1 conclusion is the same as for an RCT: could a sober person look at this data and disagree with my conclusion? If yes, the conclusion isn’t ready.

After two years and 35 supplements, three rounds of statistical correction, and one confound I had to acknowledge separately, I have one finding I trust:

I cannot, from before-vs-during analysis, reliably attribute HRV changes to individual supplements when those supplements are taken alongside 10+ others against a baseline of slow drift and at least one unrelated life variable I have to control for separately.

That’s it. That’s the article.

I know it’s not the ranking everyone wants. But it’s the only ranking the data supports — which is also a useful thing to know before you spend another year and another four figures.

I’ll publish year 3 in May 2027. The methodology will be better. The conclusions might still be modest. That’s how this should work.

Tracking my own data daily with: Whoop 4.0 (HRV + sleep + recovery), and a self-built Mission Control dashboard for the supplement log.

Affiliate disclosure: the Whoop link contains my referral at no extra cost to you. No affiliate links on supplements — for obvious reasons, given the contents of this article.

This is not medical advice. Talk to your doctor before starting or stopping anything you’re taking for health reasons.