We continue from Part 1.

Drawing on my days of winning data science competitions, I started building a Jupyter notebook to find a viable approach to MTG win expectancy.
Step 1: Parse 17Lands Data
The first challenge is transforming raw historical replay logs into structured snapshots. To do this, I pulled down 71 massive, compressed datasets from 17Lands.
A single MTG game is not one row of data. It is hundreds. Every land played, combat phase declared, and spell cast creates a new game state. When you expand millions of matches into individual turn-by-turn snapshots, the row count balloons to orders of magnitude more than the number of matches.
Processing this in Python immediately triggered severe out-of-memory crashes.
The default pandas C parser simply could not read files of this size into memory in one pass. To make the parsing pipeline robust, I implemented a strict chunked strategy: read each file in 50,000-row chunks, keep only the relevant columns, and cache the intermediate results as pickle files.
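
As a rough illustration of that chunked strategy, here is a minimal sketch using `pandas.read_csv` with `chunksize` and `usecols`. The column names, the cache file naming, and the `parse_in_chunks` helper are all hypothetical; the real 17Lands schema and my actual pipeline differ in the details.

```python
import os

import pandas as pd

CHUNK_ROWS = 50_000  # chunk size mentioned in the text

# Hypothetical column names, for illustration only.
KEEP_COLUMNS = ["game_id", "turn", "user_life", "oppo_life", "won"]


def parse_in_chunks(csv_path, cache_dir, columns=KEEP_COLUMNS, chunk_rows=CHUNK_ROWS):
    """Stream a CSV (gzip is inferred from a .gz extension) and cache each chunk.

    Only `columns` are parsed, so unused columns never reach memory.
    Returns the list of pickle paths that were written.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cached_paths = []
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file at once.
    reader = pd.read_csv(csv_path, usecols=columns, chunksize=chunk_rows)
    for i, chunk in enumerate(reader):
        out_path = os.path.join(cache_dir, f"chunk_{i:05d}.pkl")
        chunk.to_pickle(out_path)
        cached_paths.append(out_path)
    return cached_paths
```

Peak memory is bounded by one chunk rather than one file, and a crash partway through only costs the chunk in flight.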

Now, I had clean training data.
Step 2: Figure Out Which Features Actually Matter
Now the fun part starts: feature engineering.

In data science you create “features” from the data you have available. These can be simple things like dates or names, but you can also create combinations of multiple inputs. You are trying to find which levers to pull to lead your model to the right answer as quickly as possible.
Some signals are obvious, like life total and card differentials. But Magic is a game of hidden information and subtle momentum. To give the model genuine tactical intuition, I pruned away the noise and engineered 28 highly predictive, purely numeric features.
| Feature Category | Engineered Metric | What It Quantifies |
|---|---|---|
| Physical Metrics | delta_life, delta_hand, delta_board, delta_library | The raw physical state of the board and resources. |
| Momentum & Pace | cumulative_tempo, momentum_life, momentum_board | Total mana spent over the game, and turn-over-turn delta swings. |
| Bluffing & Interaction | oppo_open_mana, oppo_bluff_threat_index, user_bluff_threat_index | Open mana multiplied by cards in hand. Measures holding instants/interaction. |
| Tension & Complexity | board_complexity, board_stall_index | Total creatures on board versus how deadlocked the combat state is. |
| Resource Deficit | user_mana_screw_proxy, user_mana_deficit, user_flood_proxy | How far behind a player is on mana development relative to the current turn. |
| Velocity & Pressure | user_card_velocity, life_race_ratio, board_to_hand_ratio | How fast cards are moving, and objective evaluation of who is the “beatdown”. |
Instead of hardcoding assumptions about who is winning, we allow the machine learning model to discover these complex patterns directly from millions of historical outcomes.
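
To make a few of the table entries concrete, here is a minimal sketch of how such features might be computed from a single snapshot. The `Snapshot` fields are hypothetical, and the formulas are my simplified reading of the descriptions in the table (for example, the mana deficit assumes "on curve" means one land per turn), not the exact production definitions.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical field names for one turn-level game state.
    turn: int            # assumed: the user's own turn count
    user_life: int
    oppo_life: int
    user_hand: int
    oppo_hand: int
    user_lands: int
    oppo_open_mana: int


def engineer_features(s: Snapshot) -> dict:
    """Compute a handful of the numeric features described in the table."""
    return {
        # Differentials: positive means the user is ahead.
        "delta_life": float(s.user_life - s.oppo_life),
        "delta_hand": float(s.user_hand - s.oppo_hand),
        # Open mana multiplied by cards in hand, as stated in the table.
        "oppo_bluff_threat_index": float(s.oppo_open_mana * s.oppo_hand),
        # Assumption: one land per turn is "on curve", so the deficit is
        # how many land drops behind the turn number the user is.
        "user_mana_deficit": float(max(0, s.turn - s.user_lands)),
    }


example = engineer_features(
    Snapshot(turn=4, user_life=18, oppo_life=20, user_hand=5,
             oppo_hand=7, user_lands=2, oppo_open_mana=3)
)
```

In this example the opponent has three open mana and seven cards in hand, giving a bluff threat index of 21, while the user is two land drops behind on turn 4.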

Step 3: XGBoost GPU
Even with top-tier hardware (NVIDIA RTX 5090 and 192GB of system RAM), the raw training data will easily exhaust VRAM. To solve this, I downcast the engineered features to float32 and used XGBoost’s QuantileDMatrix. With max_bin=256, each continuous feature value is compressed into one of 256 discrete bins, which fits in 8 bits. The binned dataset fit in GPU memory, allowing the model to train and optimize trees in minutes rather than days.
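
To show why binning saves so much memory, here is a small NumPy sketch of the general idea: replace each float32 value with the index of the quantile bin it falls into. This is only an illustration of the concept, not XGBoost's internal sketching algorithm, and `quantile_bin` is a hypothetical helper.

```python
import numpy as np


def quantile_bin(column, max_bin=256):
    """Map a float column to uint8 bin indices using quantile cut points.

    Returns (bins, cuts). With max_bin <= 256 every index fits in 8 bits.
    """
    column = np.asarray(column, dtype=np.float32)
    # max_bin - 1 interior cut points split the data into at most max_bin bins.
    quantiles = np.linspace(0.0, 1.0, max_bin + 1)[1:-1]
    cuts = np.unique(np.quantile(column, quantiles))
    # Each value becomes the number of cut points at or below it.
    bins = np.searchsorted(cuts, column, side="right")
    return bins.astype(np.uint8), cuts


rng = np.random.default_rng(0)
feature = rng.normal(size=10_000).astype(np.float32)
binned, cut_points = quantile_bin(feature)

# float32 uses 4 bytes per value, uint8 uses 1.
compression_ratio = feature.nbytes / binned.nbytes
```

Tree splits only need to know which side of a threshold a value falls on, so storing bin indices instead of raw floats loses very little while cutting memory per value by a factor of four.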

Step 4: Moving into Production
The results of the training run were very encouraging. The final model achieved a ROC-AUC of 0.794 with a near-perfect calibration curve, meaning the model’s predicted win expectancy maps almost 1-to-1 onto actual historical win fractions in the 17Lands test dataset.
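
For readers unfamiliar with calibration, here is a minimal sketch of how such a check can be computed: bucket the predictions, then compare the mean predicted win probability in each bucket with the observed win fraction. The `calibration_table` helper and the toy data are hypothetical and only illustrate the idea.

```python
import numpy as np


def calibration_table(y_true, y_pred, n_bins=10):
    """Return (mean_predicted, observed_win_fraction, count) per non-empty bucket."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so a prediction of exactly 1.0
    # lands in the last bucket instead of overflowing.
    bucket = np.digitize(y_pred, edges[1:-1])
    rows = []
    for b in range(n_bins):
        mask = bucket == b
        if mask.any():
            rows.append((
                float(y_pred[mask].mean()),
                float(y_true[mask].mean()),
                int(mask.sum()),
            ))
    return rows


# Toy data: the model says 25% for four games (one was won)
# and 75% for four games (three were won).
toy_pred = [0.25] * 4 + [0.75] * 4
toy_true = [1, 0, 0, 0, 1, 1, 1, 0]
toy_rows = calibration_table(toy_true, toy_pred)
```

A well-calibrated model produces rows where the first two numbers are close to each other in every bucket, which is what a calibration curve hugging the diagonal means.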

But a Python model in a Jupyter notebook is useless for live, local game tracking. I needed to run this model inside a lightweight, cross-platform Tauri desktop app built in Rust.
The solution was exporting the trained XGBoost model to an ONNX file.
The Outcome
Now, when you play a match on MTG Arena, a background Rust thread tails your local log file, reconstructs the absolute truth of the game state, and feeds it into the compiled ONNX model.
The model runs in less than a millisecond, giving you a live win expectancy graph and real-time advanced metrics.

Turn 2 missed land drop? The system instantly flags a “Severe Tempo Deficit” and the graph dips.
Opponent passes with three lands up and a full grip? The UI slides in a “High Opponent Bluff Threat” badge.
You find your line, attack, and swing the win expectancy by 12% in a single turn? The event log dynamically highlights a “Game-Changing Play”.
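
The swing detection in that last example can be sketched in a few lines. The app itself is written in Rust; this version is in Python to match the notebook code, and both the `find_game_changing_turns` helper and the choice to treat 12% as a threshold on swings in either direction are assumptions for illustration.

```python
SWING_THRESHOLD = 0.12  # assumed threshold, taken from the 12% example above


def find_game_changing_turns(win_expectancy, threshold=SWING_THRESHOLD):
    """Return (turn_index, swing) for each turn-over-turn swing at or past the threshold.

    `win_expectancy` is a sequence of per-turn model outputs in [0, 1].
    Swings in both directions are flagged (an assumption).
    """
    flagged = []
    for i in range(1, len(win_expectancy)):
        swing = win_expectancy[i] - win_expectancy[i - 1]
        if abs(swing) >= threshold:
            flagged.append((i, round(swing, 4)))
    return flagged


# Turn 2 is a big swing in the user's favor; turn 4 is a big swing against.
flags = find_game_changing_turns([0.50, 0.52, 0.70, 0.68, 0.40])
```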
It’s not an unreality. It’s DeckLogic.

