iTero vs LoLDraftAI: A Detailed Comparison

iTero is a trained model over the same lolalytics-style pairwise aggregates that other pairwise-stats tools wrap in a formula (its API response signature is consistent with a gradient-boosted tree ensemble, more on that below). That architecture has a structural limit: it evaluates champions pair by pair, never as a whole draft, and it exposes no team-level win probability you can score against real matches. So this post tests iTero the only way you can: by making it draft head-to-head against LoLDraftAI across 100 scenarios, and letting a third-party judge (DraftGap itself) score the outcomes.

The setup: 100 head-to-head greedy drafts, 50 starting scenarios (each given a different first pick for variety), each played twice, once with LoLDraftAI on Blue and once on Red, to cancel out side advantage. At each pick, the active model greedily selects its top recommendation under the standard B-R-R-B-B-R-R-B-B-R pick order. Both candidate sets are filtered at ≥ 0.5% playrate per (champion, role) on the current patch. Each final 5v5 is then scored by two judges: DraftGap (a third-party tool, neutral to both contestants) and LoLDraftAI's own win-probability head (flagged as non-independent).

How iTero's model actually works

We can tell a lot about iTero's model from what its API returns. For each candidate champion in a lane, the response includes "feature importances", signed contributions for keys like champ_wr, champ_gold, counter_MIDDLE_wr, syn_JUNGLE_wr, gold_multiplier, match_tier, mastery, plus "ratings" on a 0-5 scale for archetype traits (eco, snowball, early_scaling_diff, etc.). Their methodology article confirms a two-stage design: "I have a separate model which first predicts the Gold @ 12 minutes before then going on to predict the outcome."

That signature (per-feature contribution scores, SHAP-style: each feature's signed pull on the final prediction, over hand-crafted pairwise features brute-forced over all ~160 champions for each lane) is consistent with a gradient-boosted tree ensemble over lolalytics-style aggregates: champion solo winrates, pair synergies, pair counters, per-role gold differentials. DraftGap is a simpler deterministic formula over the same kind of aggregates, not a trained model, just an arithmetic combination of the pairwise stats. Both iTero and DraftGap are therefore pairwise-statistics tools: they consume the draft as a bag of champion-pair numbers. Neither has any mechanism to reason about composition-level dynamics.

LoLDraftAI, by contrast, is a deep neural network trained end-to-end on raw match records, not aggregates. It learns its own champion embeddings and reasons about the draft as a whole. That architectural difference is what drives most of the disagreements below.

Qualitative example: a draft that can't reach its win condition

The cleanest way to see where iTero's approach falls short is one of the 100 drafts from the tournament described above: a Master+ scenario where the two tools made the following picks:

  • Blue (LoLDraftAI): Shen, Bel'Veth, Swain, Draven, Lux (Lux SUP was the forced first pick that seeded the scenario).
  • Red (iTero): Ornn, Rek'Sai, Aurelion Sol, Yasuo, Milio.

iTero outputs no team-level win probability, so its "verdict" on this 5v5 is implicit: these are the five champions it picked on Red, in order, under the same pick sequence LoLDraftAI used for Blue. And Red's draft is fine on paper. Four of the five isolated 1v1 lane matchups favor Red; three of Red's four carries (Ornn, Aurelion Sol, Yasuo) are dominant late-game champions, and Milio is a scaling enchanter built specifically to keep Yasuo alive into that late game. If you read the draft as five independent matchups, which is all a pairwise-stats model like iTero or DraftGap is architecturally able to do, Red looks at worst even.

LoLDraftAI reads the draft as a whole and disagrees: 62.1% for Blue, +3,523 gold at 15 minutes. The model's per-duration breakdown explains why in one picture:

LoLDraftAI analysis of seed #20: 62.1% Blue overall, per-position impact and gold@15 table, plus a Predicted Winrate Over Time chart that crosses from Blue-favored early to Red-favored after about 27 minutes.

The time-bucketed chart on the right is unambiguous: LoLDraftAI itself agrees Red wins ~68% of 35+ minute games. The problem is the 0–25 minute bucket is 77.9% Blue and the crossover point sits around minute 27. In Master+, games that are 3.5k gold down at 15 rarely reach minute 35; one side closes the game before the scaling curve ever catches up.

Masking champions out (using LoLDraftAI's unknown-champion feature to see how each pick contributes) pins down the load-bearing picks. Removing Bel'Veth drops Blue's 0–25 minute WR from 77.9% to 65.8%; removing Aurelion Sol cuts Red's 35+ minute WR by 7.9 points and no other pick moves the late-game bucket by more than ~3. That isn't a coincidence. Bel'Veth and Aurelion Sol are two of the most skewed early-vs-late champions in the game, as the "Win Rate by Game Length" panel on their pages shows. The whole match comes down to whether Blue can close before Aurelion Sol comes online; in Master+, it usually can.

DraftGap rates the same 5v5 at 35.5% for Blue. Its formula has no column for "the game ends before Aurelion Sol's third item" because no such column exists in a lolalytics dataset. The 27-point gap between the two judges is the whole story of this article in one draft: pairwise stats can evaluate a teamfight at minute 35, they can't evaluate whether the game ever gets to minute 35.

Does seed #20 generalize? A look across all 100 drafts

For each of the 100 final 5v5s from the tournament we computed:

  • Disagreement = p(Blue wins) by LoLDraftAI p(Blue wins) by DraftGap.
  • Predicted gold lead at 15 minutes from LoLDraftAI's per-position gold regression heads (summed over five roles, Blue − Red).

The two quantities are strongly correlated: Pearson r = +0.59 (n = 100). Pearson r measures how tightly two numbers move together: 0 means unrelated, ±1 means perfect lockstep. A permutation test with 5,000 random re-pairings gives p < 0.001: a correlation this strong arose in less than 0.1% of shuffles. When LoLDraftAI thinks Blue gets ahead early, it also thinks Blue wins more often, and that's exactly when DraftGap disagrees with it.

Scatter plot: predicted gold lead at 15 min vs LoLDraftAI − DraftGap disagreement. Left panel signed (r = +0.59), right panel magnitude (r = +0.22).

Slicing the same 100 drafts by the magnitude of the predicted gold lead makes the effect more concrete. The mean absolute disagreement between the two judges trends up with the predicted lead:

|Predicted gold lead @ 15min|nMean |LoLDraftAI − DraftGap|
Q1: 32 – 497 g2510.4 pp
Q2: 497 – 1,108 g2513.3 pp
Q3: 1,108 – 2,162 g2512.6 pp
Q4: 2,162 – 4,882 g2514.8 pp

When the predicted early-game swing is small, the two models mostly agree. When LoLDraftAI predicts that one side is going to be thousands of gold ahead by minute 15, DraftGap can't see any of that and the judges drift ~50% farther apart. This is the numerical version of the seed-#20 story: pairwise stats structurally can't encode tempo.

Tournament results

Scoring all 100 final 5v5s with both judges gives the headline scoreboard:

JudgeLoLDraftAI win rateMean WR margin (LoLDraftAI)
DraftGap (third-party)55.0% (45.0–65.0)+0.6% (−1.0 to +2.2)
LoLDraftAI (non-independent)93.0% (88.0–98.0)+13.2% (+11.6 to +14.9)

Two different judges look at the same 100 drafts and reach very different verdicts. DraftGap's formula calls it a near-tie (55.0%, CI 45.0–65.0); LoLDraftAI's own head calls it a decisive +13-point margin. We should discount the second number heavily (LoLDraftAI judging drafts it itself picked is textbook circular), but the disagreement between the two judges is the story. DraftGap and iTero are both pairwise-statistics tools built on the same kind of lolalytics-style aggregates (DraftGap is the formula version, iTero is the trained tree ensemble version), so a judge built on pairwise stats will systematically under-rate the tempo dynamics we demonstrated above. DraftGap can't see what LoLDraftAI sees, and that gold-correlation result from the last section is exactly why.

Full tournament data (all 100 drafts, per-pick decisions, both judges' scores, and LoLDraftAI's predicted gold at 15 min per draft) is available at drafts_scored.jsonl.

Feature coverage: what each tool actually ships

Beyond predictive accuracy, the other reason to choose one tool over the other is simply whether it supports what you want to do with it. Here's a concrete comparison, based on observing both tools' web UIs and APIs. Legend: supported, ~ partial / limited, not available.

FeatureLoLDraftAIiTero
Calibrated team win probability for a full 5v5✔ (side-aware + side-agnostic modes)✘ (per-candidate score only; no team WR)
User-selectable patch✔ (historical patches)✘ (current patch only, not even labeled)
Pro play model (fine-tuned on pro drafts)✔ (Plus tier)
Rune / perk recommendations evaluated by the WR model✔ (web tool)✘ on web; desktop-only, not built on a WR model
Timeline prediction (win % by game duration)✔ (0-25, 25-30, 30-35, 35+ min buckets)✘ (early/mid/late scaling diffs only)
Predicted gold lead per position✔ (5 positions at 15 min)~ (single predicted gold number per candidate)
Side-agnostic / side-aware switch✘ (side is fixed input, no merged mode)
Per-champion / per-matchup SEO pages✔ (every champion × role, per patch)✘ (only blog articles)
Bans in the drafting UI✔ (desktop app)~ (API supports bans; web UI does not expose them)
Partial-draft recommendations
Desktop app with League Client auto-detection✔ (Base/Plus tier)✔ (free)
Mastery-aware recommendations✔ (optional "Consider Mastery" toggle)
Champion Pool Builder (tiered S+/S/A+ recs)
Personality Quiz / champion suggestion by playstyle
In-game overlays (map timers, skill order, damage, etc.)✔ (desktop-only)
Account-level macro coaching✔ (desktop-only, "AI Macro Coach")

The split is roughly: LoLDraftAI is deeper on the draft-analysis model itself (calibrated WR, patch awareness, pro mode, timeline, side-awareness, WR-grounded rune recommendations), while iTero is broader in scope with features outside the draft-prediction core (personality quiz, pool builder, in-game overlays, macro coach).

Where iTero is ahead

iTero ships product features LoLDraftAI doesn't: in-game overlays, a Champion Pool Builder, and more polished mastery integration. If those matter to you more than draft-prediction accuracy, iTero is a reasonable choice. But on the drafting model itself, LoLDraftAI has the deeper architecture, broader input coverage (patches, pro play, timeline), and a calibrated win-probability output you can evaluate against real matches.

  • DraftGap vs LoLDraftAI — the formula version of the same pairwise-stats architecture, scored on a 33,000-match strict holdout with log loss and calibration metrics.
  • LoLTheory vs LoLDraftAI — a per-elo logistic regression over ~14,500 pairwise features, scored on the same 33k-match holdout.

Try LoLDraftAI

LoLDraftAI is free to use in the browser, or as a desktop app that auto-imports your draft from the League client.