Why running-shoe reviewers contradict each other on the same shoe

May 28, 2026

The Saucony Endorphin Pro 5 came out earlier this year. EDDBUD called it probably his favorite first look at an Endorphin Pro yet. FORDY Runs said Saucony had “nailed it” for marathon racers who mildly pronate. Believe in the Run gave it a red light, graded its performance a C-to-D, and called it “a race day shoe from two years ago.” Doctors of Running found it noticeably firmer and stiffer than the Pro 4. Kofuzi suggested grabbing the discounted Pro 4 instead.

Five credible review outlets. Same shoe. One enthusiastic recommendation. One outright rejection. Three caveated takes spread between. None of them are wrong.

And the verdicts above aren’t even on the same scale. Believe in the Run uses colored lights and letter grades. EDDBUD scores out of 12. Doctors of Running scores out of 12 in their solo reviews — but does no numerical scoring at all in their comparison and podcast formats, which is what you’re hearing in the Endorphin Pro 5 case. Kofuzi never gives numbers. Comparing their verdicts as if they were a single comparable number is a mistake before you’ve started reading.

That’s not a flaw in the reviewing ecosystem. It’s a structural feature of it — and once you understand why it happens, reading running-shoe reviews stops being a guessing game.

Across reviewer disagreements, roughly eight reasons recur when two trusted reviewers rate the same shoe wildly differently. Each is covered below, with the cases that illustrate it best.

1. Tester biomechanics decide more than people admit

In their multi-shoe disagreements podcast, Doctors of Running walked through their split takes on the New Balance Rebel v4. Matt said he loved it — “I really love this thing… got a ton of miles on it.” His co-host landed in a spot just behind the rear outsole patch; the shoe rocked backwards onto his heel and produced, in his own words, “literally the only shoe that has ever made my achilles hurt ever.” He ended up giving the pair to his mom — who, apparently, loves them. Same shoe, same publication, same review window. One verdict positive, the other negative — and the cause was a landing-zone geometry that interacted with one tester’s stride and not the other’s.

This isn’t an outlier. Look at the Nike Pegasus Premium. FORDY’s verdict was blunt: “Do not buy this shoe… no reason to buy this if you are a runner, absolutely no point whatsoever. This has no place in anybody’s shoe rotation.” EDDBUD totaled 10.3 out of 12 overall — which sounds like a recommendation, until you see the sub-scores. His system scores four categories out of 3 each (upper, midsole, outsole, value); the Pegasus Premium got 2.5/3 on value and 2.5/3 on midsole, with the verdict: “way too expensive… not essential by any stretch.” A strong aggregate, in his rubric, doesn’t translate to “buy it” — construction quality and price are tracked separately. The Run Testers, in a Vomero Premium vs. Pegasus Premium head-to-head, picked the Vomero as the definite winner and said the Pegasus Premium “just misses the mark.” All three reviewers describe the same physical sensation: a soft ReactX wedge above a rigid full-length Air Zoom unit. They diverge on whether that sensation is acceptable — and that depends almost entirely on landing geometry. Nike explicitly positions the shoe for “neutral runners who land mid to forefoot.” Reviewers inside that band tolerated the stack; reviewers outside it called it unstable.
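To make the scoring arithmetic above concrete, here is a minimal Python sketch. It assumes only what is stated in this section: four categories worth 3 points each, a 10.3 total, and 2.5 on both value and midsole. The upper and outsole scores are not quoted here, so the sketch derives only their combined total rather than guessing them; the variable names are illustrative.

```python
# Figures quoted above for the Pegasus Premium in EDDBUD's rubric.
MAX_PER_CATEGORY = 3.0
CATEGORIES = ("upper", "midsole", "outsole", "value")

total = 10.3
known = {"midsole": 2.5, "value": 2.5}

# Four categories at 3 points each give the 12-point ceiling.
max_total = MAX_PER_CATEGORY * len(CATEGORIES)

# Points left over for the categories that were not quoted.
remaining = round(total - sum(known.values()), 1)
unquoted = [c for c in CATEGORIES if c not in known]
max_remaining = MAX_PER_CATEGORY * len(unquoted)

print(f"total: {total}/{max_total:g}")
print(f"{' + '.join(unquoted)}: {remaining}/{max_remaining:g}")
```

Half a point off on value barely moves the total, which is why the aggregate alone can read as a recommendation while the spoken verdict does not.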

The takeaway: a recommendation from a midfoot striker and a pan from a heel striker are not contradictory data points. They’re the same shoe filtered through two biomechanically different runners. The verdict is more about them than about the shoe.

2. The pace and distance the shoe was tested at

The Endorphin Pro 5 split makes more sense once you read the disagreement, not just the headline. Doctors of Running framed the Pro 5 as a meaningfully different ride from the Pro 4 — firmer, stiffer, with the PWRRUN HG layer brought out more. FORDY positioned it specifically as a marathon racer for runners who mildly pronate. Believe in the Run tested it to 10 miles and reported forefoot fatigue setting in late in that distance.

These reviewers aren’t disagreeing about the shoe. They’re disagreeing about which question the shoe is being asked to answer. At a 5K-to-half-marathon pace, the firm PWRRUN PB bottom layer feels snappy. Over longer runs, that same firmness becomes the late-run fatigue Believe in the Run identified. Both verdicts are correct for the use cases they tested.

You’ll see the same pattern in lighter daily trainers. Kofuzi tested the Rebel v4 for 100 miles and described its sweet spot as “an easy run with strides” or longer easy efforts: a shoe he reached for with excitement on those days, but found cumbersome for faster workouts. Run Moore tested it earlier in his rotation and called it a nice all-encompassing everyday trainer that delivered bang for the buck. The shoe lives in a narrow speed band; reviewers who stayed in that band loved it, while reviewers who pushed past it found its limits.

When you read a verdict, the implicit question is: at what pace and distance was this tested? If the reviewer doesn’t say, the rating tells you less than you think it does.

3. Foot shape vs. the shoe’s last

The Rebel v4 also shows this. New Balance widened the last meaningfully between v3 and v4. Run Moore called the new fit true to size with the wider, longer last as a clear improvement. EDDBUD totaled 9.9 out of 12 overall — but rated the upper just 2 out of 3, in his words “one of the lowest scores I’ve given in a while.” He called the upper “the elephant in the room” and ended up switching out the laces and gluing in the insole to get a usable fit. Kofuzi landed in between, calling the v4 “borderline a half size too big.”

None of these takes is wrong. The shoe got wider. If your feet are wide, that’s a feature. If your feet are narrow, the same change is a fit problem severe enough to require aftermarket modifications to get a usable shoe.

This is the easiest disagreement to read past once you know it’s there. A reviewer’s foot shape is, frustratingly, often not stated, but the language gives it away. “Had to cinch” describes a narrow-footed reviewer in a wide shoe. “Snug” and “ran narrow” describe the reverse: a wider foot in a narrow shoe. “Roomy,” “wiggle room,” “true to size” describe a fit-match. Treat the verdict as data about that reviewer’s feet meeting that shoe’s last, not as a universal claim.

4. Surface and conditions

The HOKA Tecton X 3 is a $275 carbon-plated trail shoe. Run Moore, in a Cascadia Elite vs. Tecton X 3 head-to-head, said all the hype had been exceeded — he “freaking loved” the shoe. Believe in the Run graded it A — even bumping to grade S for “top tier” — but flagged the soft upper compromising lockdown on technical terrain: “foot slides side to side.” The Ginger Runner found the shoe genuinely awesome on the run but couldn’t justify the $275 price tag — “I’m just not getting that much spectacle out of it.” Seth James DeMoor framed it more cautiously: a buffed-out trail shoe that will show up on a lot of starting lines, possibly too much shoe for shorter trail races.

The reviewers aren’t disagreeing about the shoe. They’re testing it on different ground. On smooth fire roads and buffed singletrack, the soft upper isn’t a problem. On scrabbly technical terrain, the same upper becomes the deal-breaker Believe in the Run identified.

You get the same pattern with the Adidas Adizero Evo SL ATR — Adidas’s winterized all-terrain spin on the standard Evo SL. FORDY ran it on frosty British pavement and called the result a stroke of genius — “Thank you Adidas… you need to go out and buy this one.” EDDBUD ran it on actual mud and was blunt: “absolutely useless on mud… shallow 2mm lugs compressing into soft foam.” The shoe is genuinely good at the surface FORDY tested it on. It’s genuinely less good at the one EDDBUD tested it on. Both takes are accurate.

When a reviewer rates a trail or winter shoe, the rating describes the shoe at the surface they tested it on — full stop. The marketing might claim “all-terrain”; the reviewer is reporting on whatever ground was outside their door that week.

5. Break-in time and how many miles got tested

The On Cloudmonster 3 is the cleanest case of this in our catalog. Sort the verdicts by miles tested:

The Run Testers, 28 miles tested → “this one isn’t the hit for me… long list of shoes I’d rather be looking at ahead of it”
Doctors of Running, ~50 miles tested → a positive verdict for easy daily and walking miles, with a noted firmer feel than v2
Ben Is Running, ~100 km (about 62 miles) tested (in a Cloudmonster 3 vs. Hyper 3 head-to-head) → “if I could only have one, I would pick the Cloud Monster 3,” with the explicit caveat: “give it a chance because it does take a little bit of time to break in”
Run Moore, extended testing → the shoe became one he kept reaching for and now uses more often in his rotation

EDDBUD also mentioned the foam needing to soften up over the initial miles. The shoe ships firm; somewhere between miles 30 and 50, the CloudTec pods compress into something poppier and more responsive. Reviewers who tested through that window were experiencing a different shoe than reviewers who finished earlier.

This is a methodology question the industry doesn’t really discuss publicly. A 28-mile review is honest; a 100-km review is also honest. They’re reviewing different states of the same product. If you read only the 28-mile take, you walk away with a partial picture — not because the reviewer was wrong, but because the shoe wasn’t done changing yet.
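The list above mixes miles and kilometers, so a small Python sketch can put the quoted distances on one scale before sorting. It uses only the figures given in this section. Run Moore's "extended testing" has no stated number, so it is left as unknown rather than guessed, and the approximate figures are treated as exact purely for ordering.

```python
KM_PER_MILE = 1.609344

# (reviewer, distance, unit) as quoted above; None means no figure was stated.
tests = [
    ("The Run Testers", 28, "mi"),
    ("Doctors of Running", 50, "mi"),   # quoted as ~50 miles
    ("Ben Is Running", 100, "km"),      # quoted as ~100 km
    ("Run Moore", None, None),          # "extended testing", no number given
]

def to_miles(distance, unit):
    """Convert a quoted distance to miles; pass unknowns through as None."""
    if distance is None:
        return None
    return distance / KM_PER_MILE if unit == "km" else float(distance)

rows = [(name, to_miles(d, u)) for name, d, u in tests]
# Known distances first, ascending; unknown mileage sorts last.
rows.sort(key=lambda r: (r[1] is None, r[1] or 0.0))

for name, miles in rows:
    label = "not stated" if miles is None else f"{miles:.0f} mi"
    print(f"{name}: {label}")
```

On one scale, the 28-mile verdict sits below the break-in window described above and the 100 km verdict sits past it.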

When you read a review, check the mileage. Most YouTube reviewers state it. If they don’t, apply moderate skepticism to initial-impression reviews of shoes from brands known for break-in effects: On’s CloudTec, Brooks’s DNA Loft, and historically Saucony’s PWRRUN PB all behave this way.

6. What the reviewer is comparing it to

The Brooks Ghost Max 3 is the textbook case. The verdicts in our catalog span the full range:

Doctors of Running (in a 3-trainer roundup with Ghost 17 + Hurricane 25): the Max 3 works much better than v2 and returns to what made the original Ghost Max good
EDDBUD: scored it 10.2 out of 12 — “well built and an upgrade over the standard Ghost 17”
FORDY: a qualified positive — a good shoe for bigger people and runners prioritizing comfort and stability
Kofuzi: a plush, comfy daily trainer that existing Brooks customers will love, though he wished Brooks had pushed itself further
Believe in the Run (in a Ghost Max vs. Glycerin Max head-to-head): “your beginners max cushion shoe… feels very similar to me sort of like a Clifton”
The Run Testers: a thumbs up but qualified — “not a shoe that I would pick up very much in comparison to the softer, bouncier shoes I like… it was fine”
Run Moore (in a 3-way comparison across the Brooks Max family): “very lukewarm tepid… probably the clunkiest of the three.” His pick of the Brooks Max family is the Hyperion Max 3, not this one.

Doctors of Running evaluated it as an update over the Ghost Max 2 and called it a real improvement. Run Moore evaluated it against the rest of the Brooks Max family and found it the weakest of three. Believe in the Run evaluated it against the broader $150 max-cushion market and concluded that with a bit more budget the Glycerin Max wins. The Run Testers evaluated it against the softer, bouncier dailies they prefer and found it competent but uninspiring.

None of these reviewers is wrong. They’re each internally consistent. But the verdict is meaningless without the comparison set, and the comparison set is usually implicit, not stated.

The Asics Gel-Cumulus 28 has the same problem in miniature. EDDBUD scored it 10.9 out of 12 — “ideal for neutral runners that want a nice, increased stack, but a more traditional upper feel.” The Run Testers called it “perfectly acceptable… solid daily trainer… not that exciting.” Same shoe. EDDBUD evaluated it within max-cushion long-run cruisers. The Run Testers evaluated it against the hotter Novablast 5 and Nimbus and found it generic. Within-brand cannibalization can drop a perfectly competent shoe several rungs.

The On Cloudsurfer 2 has the version-update variant. The Run Testers (in a Cloudsurfer 2 vs. Cloudmonster 2 head-to-head) picked the Cloudsurfer 2: “easy choice… everything about the shoe I prefer.” Believe in the Run felt the shoe might just be a miss — “hopeful for the next version.” Both reviewers acknowledged the same change: the shoe got firmer between v1 and v2. They disagreed on whether that change was an improvement.

When you read a review, ask: what is this being compared to? “The previous version,” “another shoe by the same brand,” and “the broader category” are three different comparison sets, and each produces a different verdict on the same shoe.

7. What question the reviewer was asking

Back to the Endorphin Pro 5. FORDY’s positive verdict was specifically for marathon and half-marathon racers who mildly pronate — Saucony “nailed it” for that runner. Doctors of Running’s video was framed as a comparison to the Pro 4, with the takeaway that the Pro 5 was firmer and more aggressive — and longtime Pro 4 fans should expect a different ride.

These are different shoes when you frame them that way — or rather, they’re the same shoe answering different questions. FORDY asked: is this a good stability marathon racer? Yes. Doctors of Running asked: should Pro 4 fans expect the same shoe? No. Same shoe; different implicit questions; different verdicts.

A verdict is an answer; it presupposes a question. Most reviewers don’t state their question explicitly. If you don’t know what question they were asking, the answer is less useful than it looks.

8. Foam-character preference and tolerance for experimentation

The Brooks Glycerin Flex shows this. Believe in the Run gave it a green light: “I’ve been enjoying it quite a bit.” Kofuzi pushed back directly on the “gimmick” framing: “I don’t think the Flex is a gimmick. I definitely enjoy the Flex.” The Run Testers found it solid but worried about over-saturation in Brooks’s daily-trainer range — “may end up being to its detriment.” Run Moore couldn’t get along with it but suspected “maybe it’s a me thing.” FORDY was outright dismissive: “I’m not sure who’s going to want to go and buy this shoe… I don’t think it’s worked.”

The verbal gradient is the disagreement: from “green light, enjoying it” to “I don’t think it’s worked.” The verdicts show that the disagreement isn’t about facts: every reviewer noticed the decoupled flex-groove design, and every reviewer noticed it was lighter than the standard Glycerin. They disagreed on whether the experiment was worth running.

Believe in the Run and Kofuzi framed the flex grooves as innovation worth rewarding. FORDY framed them as a failure of concept. Neither side is operating on bad information.

You’ll see this pattern with most experimental shoes — anything that materially changes the formula a brand is known for. Some reviewers reward the willingness to try; others reward fidelity to what works. The shoe is the same; the principle being applied to it is different.

How to read shoe reviews when you know all this

If you’ve read this far, the short version is: a verdict is a single signal, stripped of the context that determines whether it applies to you.

A practical checklist:

Read the verdict in its own system. A red light from Believe in the Run is not a converted number; it’s a recommendation in their rubric. EDDBUD’s 10.3 out of 12 means something different from a 10/12 elsewhere. Kofuzi never says a number at all; his recommendation lives in how he describes the shoe, not in a score. Any aggregator, ours included, that flattens these into a single comparable rating does real harm to the underlying signal. Read the original language, not the conversion.
Read at least three reviews. Any single take, no matter how well-respected the reviewer, is one data point filtered through one set of feet, one stride, one mileage, one comparison frame.
Find the reviewer whose body type, stride, mileage, and target distance most resemble yours. A glowing review from someone who runs your weekly mileage at your paces on your surfaces is more useful than the same glowing review from someone who doesn’t.
Ask what they’re comparing it to. “Compared to what?” is the single most useful question to apply to a verdict. If the reviewer doesn’t say, look for it in the body of the review.
Note the mileage when the verdict was rendered. Initial-impression reviews of break-in-heavy shoes can describe a state that doesn’t exist after mile 50.
Treat divergence as information, not noise. When two reviewers split widely, the reason they split is usually more informative than either verdict alone. It tells you what dimension of the shoe is sensitive to who you are.
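One way to picture the "find the reviewer who resembles you" item in the checklist above is as a simple context-overlap count. The Python sketch below is illustrative only: it is not how Next Pair or any reviewer works, and the field names, the placeholder reviewers, and their attributes and verdicts are all invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    """Hypothetical summary of who ran in the shoe and how."""
    strike: str        # e.g. "heel", "midfoot"
    foot_width: str    # e.g. "narrow", "wide"
    surface: str       # e.g. "road", "trail"
    distance: str      # e.g. "10k", "marathon"

def overlap(runner: Context, reviewer: Context) -> int:
    """Count how many context fields match (0 to 4)."""
    fields = ("strike", "foot_width", "surface", "distance")
    return sum(getattr(runner, f) == getattr(reviewer, f) for f in fields)

# Placeholder data: these are NOT real reviewers or real verdicts.
me = Context("heel", "narrow", "road", "marathon")
reviews = {
    "Reviewer A": (Context("midfoot", "wide", "road", "10k"), "loved it"),
    "Reviewer B": (Context("heel", "narrow", "road", "marathon"), "too firm late"),
    "Reviewer C": (Context("heel", "wide", "trail", "marathon"), "fine"),
}

# Rank reviewers by how closely their context resembles mine.
ranked = sorted(reviews, key=lambda n: overlap(me, reviews[n][0]), reverse=True)
for name in ranked:
    ctx, verdict = reviews[name]
    print(f"{name}: {overlap(me, ctx)}/4 match, said: {verdict}")
```

The point is the ordering, not the score: the verdict to read first is the one from the closest context, whatever that verdict turns out to be.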

The implicit promise of a shoe review is that the verdict generalizes. It usually doesn’t. The work of figuring out whether a shoe is right for you is almost entirely the work of figuring out which reviewers’ contexts match yours — and how to weight what each said accordingly.

This is the problem Next Pair was built to help with. We pull what each reviewer actually said about a shoe, name them, link to the source video, and let you weight their take against your runner profile — so you can quickly find the reviewer whose context most resembles yours and go watch their full review. We don’t try to flatten different reviewers’ systems into a single comparable number, because the systems aren’t comparable.

All quoted material is from publicly available reviews and was verified against video transcripts before publication. Next Pair’s catalog is built directly on the work of these reviewers; this post is a methodology explainer, not a critique of any individual reviewer or outlet.

Reviewers cited

In order of first appearance. Go watch their work — they’re the source of everything here.

EDDBUD — YouTube
FORDY RUNS — YouTube
Believe in the Run — YouTube
Doctors of Running — YouTube
Kofuzi — YouTube
The Run Testers — YouTube
Run Moore — YouTube
The Ginger Runner — YouTube
Seth James DeMoor — YouTube
Ben Is Running — YouTube