Magic: The Gathering Win Expectancy Part Two

We continue from Part 1.

Drawing on my days of winning data science competitions, I started building a Jupyter notebook to find a viable approach to MTG win expectancy.

Step 1: Parse 17Lands Data

The first challenge is transforming raw historical replay logs into structured snapshots. To do this, I pulled down 71 massive, compressed datasets from 17Lands.

A single MTG game is not one row of data. It is hundreds. Every land played, combat phase declared, and spell cast creates a new game state. When you expand millions of matches into individual turn-by-turn snapshots, you quickly end up with millions of rows of training data.

Processing this in Python immediately triggered severe out-of-memory crashes.

The default pandas C parser simply could not read files of this size into memory in one pass. To make the parsing pipeline robust, I had to switch to strict chunked reads: parsing each file in 50,000-row chunks, extracting only the relevant columns, and caching the intermediate results as pickle files.
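
The chunked approach can be sketched roughly as follows. This is a minimal illustration, not the actual notebook code: the column names in KEEP_COLUMNS and the helper name cache_in_chunks are assumptions made for the example, and the real 17Lands replay files have a much wider schema.

```python
from pathlib import Path

import pandas as pd

# Hypothetical column names, chosen only for illustration.
KEEP_COLUMNS = ["draft_id", "turn", "user_life", "oppo_life", "won"]


def cache_in_chunks(csv_path, cache_dir, columns=KEEP_COLUMNS, chunk_rows=50_000):
    """Stream one CSV in fixed-size chunks and pickle each trimmed chunk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(csv_path).name.split(".")[0]
    written = []
    # chunksize turns read_csv into an iterator, so only one chunk is held
    # in memory at a time; usecols drops unneeded columns while parsing;
    # compression="infer" lets pandas open .csv.gz files directly.
    reader = pd.read_csv(
        csv_path, usecols=columns, chunksize=chunk_rows, compression="infer"
    )
    for i, chunk in enumerate(reader):
        out_path = cache_dir / f"{stem}_{i:05d}.pkl"
        chunk.to_pickle(out_path)
        written.append(out_path)
    return written
```

Each cached chunk can later be loaded with pd.read_pickle and processed on its own, so a crash partway through does not force a full re-parse.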

Now, I had clean training data.

Step 2: Figure Out Which Features Actually Matter

Now the fun starts: feature engineering.

In data science, you create “features” from the data you have available. These can be simple things like dates or names, but you can also combine multiple inputs. You are trying to find the levers that get your model to the right answer as quickly as possible.

Some signals are obvious, like life total and card differentials. But Magic is a game of hidden information and subtle momentum. To give the model genuine tactical intuition, I pruned away the noise and engineered 28 highly predictive, purely numeric features.

Feature Category	Engineered Metric	What It Quantifies
Physical Metrics	delta_life, delta_hand, delta_board, delta_library	The raw physical state of the board and resources.
Momentum & Pace	cumulative_tempo, momentum_life, momentum_board	Total mana spent over the game, and turn-over-turn delta swings.
Bluffing & Interaction	oppo_open_mana, oppo_bluff_threat_index, user_bluff_threat_index	Open mana multiplied by cards in hand. Measures holding instants/interaction.
Tension & Complexity	board_complexity, board_stall_index	Total creatures on board versus how deadlocked the combat state is.
Resource Deficit	user_mana_screw_proxy, user_mana_deficit, user_flood_proxy	How far behind a player is on mana development relative to the current turn.
Velocity & Pressure	user_card_velocity, life_race_ratio, board_to_hand_ratio	How fast cards are moving, and objective evaluation of who is the “beatdown”.

Instead of hardcoding assumptions about who is winning, we allow the machine learning model to discover these complex patterns directly from millions of historical outcomes.
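
To make a few of the metrics in the table concrete, here is a small sketch of how they might be computed from one game's turn-by-turn snapshots. The input keys (user_life, oppo_hand, user_mana_spent, and so on) and the function name are hypothetical, and the exact formulas are assumptions, apart from the two the table describes: the bluff threat index as open mana multiplied by cards in hand, and cumulative tempo as total mana spent.

```python
def engineer_features(snapshots):
    """Turn one game's ordered snapshots (dicts) into numeric feature rows."""
    rows = []
    prev_delta_life = None
    cumulative_tempo = 0
    for snap in snapshots:
        # Differentials are taken from the user's point of view (assumed).
        delta_life = snap["user_life"] - snap["oppo_life"]
        delta_hand = snap["user_hand"] - snap["oppo_hand"]
        # Running total of mana the user has spent so far this game.
        cumulative_tempo += snap["user_mana_spent"]
        # Turn-over-turn swing in the life differential; 0 on the first row.
        momentum_life = 0 if prev_delta_life is None else delta_life - prev_delta_life
        rows.append({
            "delta_life": delta_life,
            "delta_hand": delta_hand,
            "cumulative_tempo": cumulative_tempo,
            "momentum_life": momentum_life,
            # Open mana multiplied by cards in hand, as in the table.
            "oppo_bluff_threat_index": snap["oppo_open_mana"] * snap["oppo_hand"],
        })
        prev_delta_life = delta_life
    return rows
```

Every output value is a plain number, which keeps the rows ready for a tree-based model without any further encoding.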

Step 3: XGBoost GPU

Even with top-tier hardware (NVIDIA RTX 5090 and 192GB of system RAM), the raw training data will easily exhaust VRAM. To solve this, I downcast the engineered features to float32 and used XGBoost’s QuantileDMatrix. By quantizing the continuous feature values into at most 256 discrete bins (a consistent max_bin=256, so each bin index fits in 8 bits), the dataset fit into GPU memory, allowing the model to train and optimize trees in minutes rather than days.

Step 4: Moving into Production

The results of the training run were encouraging. The final model achieved a ROC-AUC of 0.794 with a near-perfect calibration curve, meaning the model’s predicted win expectancy maps almost 1-to-1 onto the actual historical win fractions in the 17Lands test dataset.
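
For readers unfamiliar with calibration, the check boils down to bucketing predictions and comparing each bucket's average prediction with its observed win rate. The sketch below is one simple way to do that with equal-width bins; the function name and the bin count are assumptions for illustration, not the notebook's actual evaluation code.

```python
def calibration_table(predicted, won, n_bins=10):
    """Compare mean predicted win probability with observed win rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, w in zip(predicted, won):
        # Equal-width bins over [0, 1]; a prediction of exactly 1.0
        # is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, w))
    table = []
    for members in bins:
        if not members:
            continue  # skip bins with no predictions
        mean_pred = sum(p for p, _ in members) / len(members)
        win_rate = sum(w for _, w in members) / len(members)
        table.append((mean_pred, win_rate, len(members)))
    return table
```

In a well-calibrated model the first two values in each row are close: games predicted at roughly 70% should be won roughly 70% of the time.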

But a Python model in a Jupyter notebook is useless for live, local game tracking. I needed to run this model inside a lightweight, cross-platform Tauri desktop app built in Rust.

The solution was exporting the trained XGBoost model to an ONNX file.

The Outcome

Now, when you play a match on MTG Arena, a background Rust thread tails your local log file, reconstructs the full game state, and feeds it into the exported ONNX model.

The model runs in less than a millisecond, giving you a live win expectancy graph and real-time advanced metrics.