DraftGap reads a draft as a bag of champion pairs, which means it literally cannot tell you that a team of five AP champions is beatable with one magic-resist item. This post shows a concrete case where that blind spot produces a nearly 25-point win-probability error, then measures how much the same blind spot costs DraftGap on 32,750 real ranked matches neither tool had ever seen. Both halves use DraftGap's own published formula, run against DraftGap's own frozen data snapshot.
DraftGap shortcomings
In its own FAQ section, DraftGap explicitly acknowledges its limitations:
Does DraftGap have any shortcomings? DraftGap is not perfect, and there are several things to keep in mind. The overall team comp identity is not taken into account. The synergy of duos within a team are used in the calculations, but the tool does not know about team comp identity like 'engage' or 'poke'. Damage composition is also not used in the calculation (but it is shown, above the team winrate), so you need to keep this in mind on your own. These shortcomings result from the fact that there is not enough data to make a perfect prediction. And we do not want to incorporate opinions like 'malphite is an engage champion' into the tool, as using just data is the most objective way to make a decision.
LoLDraftAI, a neural net trained on raw matches, reasons about the draft as a whole rather than as a bag of independent pair statistics. Here's a concrete case where that matters.
DraftGap shortcoming example: full-AP draft
To surface this concretely, we constructed a deliberately adversarial partial draft: one team of five, no enemy team yet, with all five champions dealing only magic damage. Both tools were given the same input:
- Top: Anivia
- Jungle: Zyra
- Middle: Zilean
- Bottom: Brand
- Support: Elise
DraftGap puts this team at a 64.88% win chance, because each of these champions has a strong solo winrate and they pair well on paper. LoLDraftAI, which reasons about the damage profile of the whole team, puts the same draft at 40.2%. A single magic-resist item on every enemy trivializes the entire team's damage, and LoLDraftAI picks that up from the composition alone; DraftGap has no way to see it.
DraftGap prediction:

LoLDraftAI prediction:

This isn't just about the overall win rate; it leaks into champion suggestions too. Against a full-AP team, LoLDraftAI rates MR-stacking counter-picks (Ornn, Galio, Kassadin) as extremely strong, because it recognizes that all of the enemy's damage is magic. DraftGap, which does not see the damage profile, treats those picks as average.
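To see why a score built only from single-champion and pair statistics cannot register a team-wide property, here is a toy sketch. It is not DraftGap's actual formula: the additive log-odds rule, the function names, and every number below are invented for illustration. The only thing taken from the text is that all five champions in this draft deal magic damage.

```python
import math
from itertools import combinations

def logit(p):
    return math.log(p / (1.0 - p))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_win_prob(team, solo_wr, duo_delta):
    """Toy additive score: solo log-odds plus duo adjustments (illustrative only)."""
    score = sum(logit(solo_wr[champ]) for champ in team)
    for a, b in combinations(sorted(team), 2):
        score += duo_delta.get((a, b), 0.0)  # pairs without data contribute nothing
    return sigmoid(score)

def magic_share(team, damage_type):
    """A team-level feature that the toy score above never receives as input."""
    return sum(damage_type[champ] == "magic" for champ in team) / len(team)

team = ["Anivia", "Zyra", "Zilean", "Brand", "Elise"]
solo_wr = {champ: 0.52 for champ in team}         # invented win rates
duo_delta = {("Brand", "Zyra"): 0.05}             # invented bonus; keys are sorted pairs
damage_type = {champ: "magic" for champ in team}  # as described for this draft

p = pairwise_win_prob(team, solo_wr, duo_delta)
share = magic_share(team, damage_type)
print(f"toy pairwise estimate: {p:.3f}, magic damage share: {share:.0%}")
```

Because `pairwise_win_prob` takes only `solo_wr` and `duo_delta`, changing `damage_type` cannot move its output. Any model of this shape needs an explicit team-level term before damage composition can affect the prediction.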
Shortcomings conclusion
The full-AP comp isn't an isolated case. The same blind spot applies to any composition-level dynamic (side differences, carry counts, CC distribution, total damage, scaling curves). Below, we measure how much these blind spots add up to on ~33,000 real ranked games neither tool had seen before.
Statistical accuracy comparison
Holdout methodology
Scoring a model against matches it was trained on is meaningless, so we use a strict temporal holdout:
- DraftGap snapshot frozen at 2026-04-18, 12:57 UTC. That's the date field of their current-patch.json dataset. Since DraftGap is a pure statistical formula over a fixed snapshot, anything played after this is out-of-sample for it.
- LoLDraftAI training cutoff at 2026-04-17, 23:25 UTC, the latest game timestamp in the model's training data.
- Eval set = matches played after both cutoffs. Neither tool has ever seen these games.
The eval set is further restricted to match DraftGap's own data scope: ranked solo/duo, emerald+ (DraftGap pulls its data from Lolalytics at tier=emerald_plus), EUW1 and KR regions. No filter on game duration: remakes and short games are part of the population both tools try to predict, so there's no principled reason to hide them.
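The holdout rule above can be sketched as a simple filter. The two cutoffs, the regions, and the emerald+ scope come from the text. The `Match` record, its field names, and the queue and tier labels are assumptions made for illustration, not the schema used in the actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Cutoffs quoted in the text.
DRAFTGAP_SNAPSHOT = datetime(2026, 4, 18, 12, 57, tzinfo=timezone.utc)
LOLDRAFTAI_CUTOFF = datetime(2026, 4, 17, 23, 25, tzinfo=timezone.utc)
# A match is out-of-sample for both tools only if it is later than both cutoffs.
HOLDOUT_START = max(DRAFTGAP_SNAPSHOT, LOLDRAFTAI_CUTOFF)

ALLOWED_REGIONS = {"EUW1", "KR"}
# Assumed labels for "emerald and above" and for ranked solo/duo.
ALLOWED_TIERS = {"EMERALD", "DIAMOND", "MASTER", "GRANDMASTER", "CHALLENGER"}
RANKED_SOLO = "RANKED_SOLO_5x5"

@dataclass(frozen=True)
class Match:
    """Hypothetical match record; field names are illustrative."""
    played_at: datetime
    region: str
    queue: str
    tier: str

def in_eval_set(m: Match) -> bool:
    """True if the match is after both cutoffs and inside DraftGap's data scope."""
    return (
        m.played_at > HOLDOUT_START
        and m.queue == RANKED_SOLO
        and m.region in ALLOWED_REGIONS
        and m.tier in ALLOWED_TIERS
    )
```

Note that there is deliberately no duration check, matching the decision above to keep remakes and short games in the population.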
Matching the evaluation to DraftGap's scope
If LoLDraftAI wins on even footing, it wins on the merits, so we remove every asymmetry that would otherwise give it an unfair head start over DraftGap:
- Side-agnostic prediction (no blue/red knowledge), matching DraftGap's side-blind nature.
- A single fixed elo bucket for every match, matching DraftGap's flat emerald+ aggregate.
- Matches where any (champion, role) cell has fewer than 50 games in DraftGap's current-patch dataset are excluded, since DraftGap falls back to a default baseline in those cells (no real data to work with); leaving them in would be evaluating it outside its supported regime.
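The 50-game exclusion rule in the last bullet can be sketched as follows. The threshold comes from the text. The function name and the `games_by_cell` lookup are hypothetical stand-ins for whatever structure holds DraftGap's per-cell game counts.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[str, str]  # (champion, role)
MIN_GAMES = 50          # threshold stated in the text

def draft_is_supported(
    picks: Iterable[Cell],
    games_by_cell: Dict[Cell, int],
    min_games: int = MIN_GAMES,
) -> bool:
    """Keep a match only if every one of its (champion, role) cells has enough games.

    A cell that is missing from the lookup is treated as having zero games.
    """
    return all(games_by_cell.get(cell, 0) >= min_games for cell in picks)

# Invented counts, purely to show the behaviour.
games_by_cell = {("Anivia", "top"): 12, ("Anivia", "middle"): 4300, ("Brand", "bottom"): 950}

print(draft_is_supported([("Anivia", "middle"), ("Brand", "bottom")], games_by_cell))
print(draft_is_supported([("Anivia", "top"), ("Brand", "bottom")], games_by_cell))
```

In a real run, `picks` would contain all ten (champion, role) cells of the match, so one rare pick on either team is enough to drop the game.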
Headline results
Quick definitions:
- Log loss (lower = better): how surprised the model is by the true outcome. ln(2) ≈ 0.693 is the coinflip score; anything below that beats 50/50.
- Brier score (lower = better): mean squared error between predicted probability and outcome.
- Accuracy (higher = better): fraction of games won by the team the model favored.
- ECE, expected calibration error (lower = better): average gap between stated probability and observed frequency, so lower means stated confidence matches reality.
Parenthetical ranges are 95% bootstrap confidence intervals: the spread of each metric when the matches are resampled with replacement. On every one of these, LoLDraftAI beats DraftGap.
| Model | Log loss | Accuracy | Brier | ECE |
|---|---|---|---|---|
| DraftGap | 0.6869 (0.6851–0.6887) | 54.66% (54.12–55.20) | 0.2469 (0.2460–0.2478) | 0.0199 |
| LoLDraftAI | 0.6829 (0.6811–0.6846) | 55.88% (55.34–56.42) | 0.2449 (0.2441–0.2458) | 0.0088 |
Evaluated on 32,750 matches. A few things to note:
- On log loss, Brier score, and accuracy, the 95% confidence intervals do not overlap. That's the strongest signal here: the difference is not explained by sampling noise.
- ECE is roughly 2× lower for LoLDraftAI. That's a calibration story: when LoLDraftAI says "65%", it's close to right; DraftGap overshoots the confident end.
- The shipped LoLDraftAI product additionally uses side information and fine-grained elo (features DraftGap doesn't have), but they weren't needed to produce this result.
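For readers who want to recompute the headline numbers, here is a minimal sketch of log loss, Brier score, accuracy, and a bootstrap confidence interval in plain Python. The text only says the intervals are 95% bootstrap intervals over resampled matches. The percentile method, the resample count, and the tie rule for accuracy (a prediction of exactly 0.5 counts as predicting a loss) are assumptions.

```python
import math
import random

def log_loss(preds, outcomes, eps=1e-12):
    """Mean negative log-likelihood of the observed outcomes (1 = win, 0 = loss)."""
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(outcomes)

def brier(preds, outcomes):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(outcomes)

def accuracy(preds, outcomes):
    """Fraction of matches where the favored side (p > 0.5) matches the outcome."""
    return sum((p > 0.5) == bool(y) for p, y in zip(preds, outcomes)) / len(outcomes)

def bootstrap_ci(metric, preds, outcomes, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample matches with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([preds[i] for i in idx], [outcomes[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

Calling `bootstrap_ci(log_loss, preds, outcomes)` once per model gives intervals comparable in spirit to the parenthetical ranges in the table. Since both tools are scored on the same matches, resampling the same indices for both and bootstrapping the difference would be a tighter comparison than checking whether the two intervals overlap.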
Calibration
A reliability diagram: for each bucket of predicted win probability (x-axis), we plot the actual observed win rate for matches in that bucket (y-axis). A perfectly calibrated model sits exactly on the dashed diagonal: when it says "70%", it wins 70% of the time. The further a curve drifts from the diagonal, the less its stated probabilities match reality.

DraftGap is overconfident at both ends: drafts it rates as losing win more often than it predicts, and drafts it rates as winning win less often than it predicts, so its confident predictions are further from reality than LoLDraftAI's. This pattern is consistent with a model that doesn't account for interaction effects beyond champion pairs.
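The reliability diagram and the ECE figure both come from the same binning step, sketched below. The choice of ten equal-width bins is an assumption, since the text does not state the binning used. ECE is computed here in its common form, as the count-weighted average of the absolute gap between mean predicted probability and observed win rate per bin.

```python
def reliability_bins(preds, outcomes, n_bins=10):
    """Return (mean_predicted, observed_win_rate, count) for each non-empty bin."""
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]
    for p, y in zip(preds, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        acc[b][0] += p
        acc[b][1] += y
        acc[b][2] += 1
    return [(s / c, w / c, c) for s, w, c in acc if c > 0]

def expected_calibration_error(preds, outcomes, n_bins=10):
    """Count-weighted mean of |mean predicted - observed| across bins."""
    n = len(outcomes)
    return sum(
        (count / n) * abs(mean_p - observed)
        for mean_p, observed, count in reliability_bins(preds, outcomes, n_bins)
    )

# Invented example: a model that says 90% but is right only 60% of the time.
overconfident = expected_calibration_error([0.9] * 10, [1] * 6 + [0] * 4)
# Invented example: a model that says 70% and is right 70% of the time.
calibrated = expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3)
print(overconfident, calibrated)
```

Plotting `mean_predicted` against `observed_win_rate` for each bin gives the points of a reliability diagram; a curve above the diagonal on the left and below it on the right is the overconfidence pattern described above.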
Slices by elo and region
To make sure the aggregate win isn't hiding a single-bucket fluke, here's the same comparison broken down by elo bracket and region. Lower log loss is better.
| Elo bucket | n | DG log loss | LoLDraftAI log loss | DG acc | LoLDraftAI acc |
|---|---|---|---|---|---|
| Master+ | 4,824 | 0.6818 | 0.6814 | 55.76% | 55.97% |
| Diamond I–II | 7,065 | 0.6879 | 0.6818 | 54.58% | 56.35% |
| Diamond III–IV | 9,225 | 0.6885 | 0.6824 | 54.51% | 56.25% |
| Emerald I | 11,636 | 0.6871 | 0.6846 | 54.37% | 55.27% |
| Region | n | DG log loss | LoLDraftAI log loss | DG acc | LoLDraftAI acc |
|---|---|---|---|---|---|
| EUW1 | 17,977 | 0.6871 | 0.6828 | 54.68% | 55.97% |
| KR | 14,773 | 0.6866 | 0.6831 | 54.63% | 55.78% |
LoLDraftAI wins in every slice, so the aggregate win isn't a bucket artifact. The log-loss gap is largest in the two Diamond buckets (about 0.006 each) and smaller at the extremes: Master+ (0.0004, where both tools have the most variance due to smaller sample sizes per champion) and Emerald I (0.0025, which dominates DraftGap's tier=emerald_plus data pull by volume, so it's the bucket DraftGap is implicitly optimized for).
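A per-slice breakdown like the two tables above amounts to grouping matches by a key and recomputing the metric within each group. The sketch below assumes each match is a dict with hypothetical keys (`elo`, `region`, `won`, and one prediction field per tool); only the slice counts in the sanity check are taken from the tables.

```python
import math
from collections import defaultdict

def log_loss_by_slice(rows, key, pred_field, eps=1e-12):
    """Group rows by `key` and return {slice: (n, mean log loss)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = {}
    for name, group in groups.items():
        total = 0.0
        for row in group:
            p = min(max(row[pred_field], eps), 1.0 - eps)
            total -= math.log(p if row["won"] else 1.0 - p)
        result[name] = (len(group), total / len(group))
    return result

# Sanity check on the published tables: both breakdowns cover the full eval set.
elo_counts = {"Master+": 4824, "Diamond I-II": 7065, "Diamond III-IV": 9225, "Emerald I": 11636}
region_counts = {"EUW1": 17977, "KR": 14773}
assert sum(elo_counts.values()) == sum(region_counts.values()) == 32750

# Invented rows, only to show the call shape.
rows = [
    {"elo": "Master+", "region": "KR", "won": 1, "dg": 0.60, "lda": 0.55},
    {"elo": "Master+", "region": "EUW1", "won": 0, "dg": 0.50, "lda": 0.45},
    {"elo": "Emerald I", "region": "KR", "won": 1, "dg": 0.50, "lda": 0.50},
]
by_elo = log_loss_by_slice(rows, "elo", "dg")
by_region = log_loss_by_slice(rows, "region", "lda")
```

Running the same function with each tool's prediction field and comparing the results slice by slice reproduces the structure of the tables.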
Where DraftGap is strong
DraftGap is open-source, free, and runs entirely in the browser, so there's no backend roundtrip or paywall. Its formula is fully inspectable: you can see exactly why it rated a matchup the way it did, and duos / counters are exposed as separate components, which is useful for teaching the why. For anyone who wants a fast, transparent pairwise tool (or a codebase to fork), nothing in this comparison takes away from those strengths. This comparison is strictly about predictive accuracy on full 5v5 drafts, where the pairwise formulation runs out of expressiveness.
Related comparisons
- iTero vs LoLDraftAI — a trained tree-ensemble version of the same pairwise-stats architecture, tested via a 100-draft tournament.
- LoLTheory vs LoLDraftAI — a per-elo logistic regression over ~14,500 pairwise features, scored on the same 33k-match holdout.
Try LoLDraftAI
LoLDraftAI is free to use in the browser, or as a desktop app that auto-imports your draft from the League client.
- Web tool — no install
- Desktop app — live draft tracking
Appendix A: Reproducibility
DraftGap's predictions above were produced by a Python port of their open-source formula, run against the exact JSON datasets their website was serving on 2026-04-18 at 12:57 UTC. On randomly-sampled drafts, our port agrees with draftgap.com to four decimal places.
For full reproducibility, the following are published as frozen downloads: the two DraftGap JSON snapshots that their tool served on 2026-04-18, and a CSV with one row per eval match (both tools' predicted winrates, actual outcome, the ten champions, elo, region, patch). Readers can re-verify any individual prediction on draftgap.com and recompute the aggregate metrics.
- predictions.csv — per-match predictions and outcomes.
- draftgap-current-patch.json — DraftGap current-patch dataset, frozen 2026-04-18T12:57Z.
- draftgap-30-days.json — DraftGap 30-day dataset, frozen 2026-04-18T12:57Z.
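As a starting point for recomputing the aggregates from the per-match CSV, here is a minimal loader. The column names (`draftgap_wr`, `loldraftai_wr`, `team1_win`) are placeholders, since the actual header of predictions.csv is not given here; adjust them to match the file. The in-memory sample rows are invented so the sketch runs without the download.

```python
import csv
import io
import math

# Placeholder column names; replace with the real header of predictions.csv.
DG_COL, LDA_COL, OUTCOME_COL = "draftgap_wr", "loldraftai_wr", "team1_win"

def load_predictions(fileobj):
    """Read (draftgap_pred, loldraftai_pred, outcome) tuples from an open CSV file."""
    return [
        (float(r[DG_COL]), float(r[LDA_COL]), int(r[OUTCOME_COL]))
        for r in csv.DictReader(fileobj)
    ]

def mean_log_loss(preds, outcomes, eps=1e-12):
    total = 0.0
    for p, y in zip(preds, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p if y else 1.0 - p)
    return total / len(outcomes)

# Invented sample standing in for the real file.
sample = io.StringIO(
    "draftgap_wr,loldraftai_wr,team1_win\n"
    "0.60,0.55,1\n"
    "0.45,0.40,0\n"
)
rows = load_predictions(sample)
outcomes = [y for _, _, y in rows]
dg_loss = mean_log_loss([dg for dg, _, _ in rows], outcomes)
lda_loss = mean_log_loss([lda for _, lda, _ in rows], outcomes)
print(f"DraftGap log loss: {dg_loss:.4f}, LoLDraftAI log loss: {lda_loss:.4f}")
```

With the real file, replace the `io.StringIO` sample with `open("predictions.csv", newline="")` and pass the file object to `load_predictions`.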