Methodology
Models
We will investigate two models for estimating within-game win probability:
- The lifelines time-varying Cox proportional hazards model (see the lifelines documentation), and
- An XGBoost regression model with the `survival:cox` objective function.
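For reference, both models share the proportional hazards form. The summary below uses our own notation: \(h_0(t)\) is a baseline hazard shared by all games, \(x(t)\) is the vector of in-game covariates at time \(t\) (for example, the scoring margin), and \(\beta\) holds the coefficients that lifelines estimates.

\[
h\bigl(t \mid x(t)\bigr) = h_0(t)\,\exp\bigl(\beta^\top x(t)\bigr)
\]

With the `survival:cox` objective, XGBoost replaces the linear term \(\beta^\top x\) with the boosted ensemble's output \(f(x)\) for a row with features \(x\), and returns predictions on the hazard-ratio scale:

\[
h(t \mid x) = h_0(t)\,\mathrm{HR}(x), \qquad \mathrm{HR}(x) = \exp\bigl(f(x)\bigr)
\]

Because \(h_0(t)\) cancels out of the Cox partial likelihood, the raw scores \(\beta^\top x(t)\) and \(\mathrm{HR}(x)\) are relative risks rather than probabilities, which is one reason a separate calibration step is useful.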
To accurately compare the models to each other and the NBA’s own win probability, we will split the dataset into two pieces: build (80%) and holdout (20%). The build dataset will be used for model training and hyperparameter tuning while the holdout dataset will be used for comparing the models. The build dataset will be broken down for each model:
| Model | Build dataset | Proportion of build (total) | Description |
|---|---|---|---|
| lifelines | Train | 75% (60%) | Model training data. |
| lifelines | Tune | 25% (20%) | Hyperparameter tuning data. |
| XGBoost | Train | 75% (60%) | Model training data. |
| XGBoost | Stopping/Tune | 25% (20%) | Data for hyperparameter tuning and early stopping* |
*We will use early stopping to determine the number of boosting rounds for the XGBoost model.

All of these datasets will be stratified by season and by the target so that each model is built on representative data.
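The two-stage split (80/20 into build and holdout, then 75/25 of build) with stratification by season and target can be sketched in plain Python as below. This is a minimal sketch under our own assumptions: the split is made at the game level so that every time step of a game lands in the same dataset, and the `stratified_split` helper and the `game_id`/`season`/`target` keys are hypothetical names, not the project's code.

```python
import random
from collections import defaultdict


def stratified_split(games, frac, seed=0):
    """Split whole games into two lists, stratified on (season, target).

    Each game is a dict with the hypothetical keys "game_id", "season" and
    "target". Splitting at the game level keeps every time step of a game
    in the same split, so no game leaks between train and tune/holdout.
    """
    strata = defaultdict(list)
    for game in games:
        strata[(game["season"], game["target"])].append(game)

    rng = random.Random(seed)
    first, second = [], []
    for key in sorted(strata):  # sorted so the split is reproducible
        members = list(strata[key])
        rng.shuffle(members)
        cut = round(frac * len(members))
        first.extend(members[:cut])
        second.extend(members[cut:])
    return first, second


# Toy data: 100 games in each of three seasons; the target is 1 in 60 of them.
games = [
    {"game_id": f"{season}-{i}", "season": season, "target": int(i < 60)}
    for season in (2019, 2020, 2021)
    for i in range(100)
]

build, holdout = stratified_split(games, frac=0.8)  # 80% build, 20% holdout
train, tune = stratified_split(build, frac=0.75)    # 60% / 20% of the total
print(len(build), len(holdout), len(train), len(tune))  # 240 60 180 60
```

Because each (season, target) stratum is cut separately, every split keeps the same 60/40 target mix and the same season mix as the full dataset.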
Hyperparameter tuning
Both models have hyperparameters that we will tune using hyperopt.
Lifelines
We started with the following hyperparameter search space for the lifelines model:
| Hyperparameter | Search space |
|---|---|
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
To make the search more efficient, we narrowed it to the following space after some initial trials:
| Hyperparameter | Search space |
|---|---|
| | \(Unif(0.05, 0.15)\) |
| | \(Unif(0, 0.015)\) |
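The narrowing step can be illustrated with one simple heuristic: keep the best few trials, take the span of each hyperparameter's values among them, and pad that span slightly. This is an assumption for illustration, not necessarily the procedure behind the ranges above; `narrow_space` and the names `param_a`/`param_b` are hypothetical, since the tables do not name the hyperparameters.

```python
def narrow_space(trials, bounds, top_k=5, pad=0.1):
    """Shrink each hyperparameter's range to the span of the best trials.

    `trials` is a list of (params, loss) pairs where lower loss is better, and
    `bounds` maps each hyperparameter name to its current (low, high) range.
    Each new range is padded by `pad` times its width and clipped to `bounds`.
    """
    best = sorted(trials, key=lambda trial: trial[1])[:top_k]
    narrowed = {}
    for name, (low, high) in bounds.items():
        values = [params[name] for params, _ in best]
        width = max(values) - min(values)
        narrowed[name] = (
            max(low, min(values) - pad * width),
            min(high, max(values) + pad * width),
        )
    return narrowed


# Hypothetical trial results over an initial Unif(0, 1) x Unif(0, 1) space.
trials = [
    ({"param_a": 0.10, "param_b": 0.010}, 0.30),
    ({"param_a": 0.12, "param_b": 0.005}, 0.31),
    ({"param_a": 0.08, "param_b": 0.012}, 0.32),
    ({"param_a": 0.60, "param_b": 0.400}, 0.45),
    ({"param_a": 0.90, "param_b": 0.800}, 0.50),
]
bounds = {"param_a": (0.0, 1.0), "param_b": (0.0, 1.0)}
print(narrow_space(trials, bounds, top_k=3))
```

Repeating this after each batch of trials gives an iterative narrowing; the risk is cutting off a good region too early, so a generous `pad` or `top_k` is the safer choice.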
XGBoost
Based on discussions with heytheredli and on iteratively narrowing the range of each parameter to make the search more effective, we started with the following space:
| Hyperparameter | Search space |
|---|---|
| | \(Unif(0, 0.01)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(QUnif(2, 15, 1)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(Unif(0, 1)\) |
| | \(QUnif(100, 600, 10)\) |
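The notation presumably maps onto hyperopt's distributions: \(Unif(a, b)\) to `hp.uniform` and \(QUnif(a, b, q)\) to `hp.quniform`, which returns a value like round(uniform(a, b) / q) * q. The standard-library sketch below mimics both (the helpers `unif` and `qunif` are our own) to show which values each row of the table can produce.

```python
import random


def unif(rng, low, high):
    """Unif(low, high): a float drawn uniformly between low and high."""
    return rng.uniform(low, high)


def qunif(rng, low, high, q):
    """QUnif(low, high, q): round(Unif(low, high) / q) * q, like hyperopt's hp.quniform."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(0)
depths = [qunif(rng, 2, 15, 1) for _ in range(10_000)]
rounds = [qunif(rng, 100, 600, 10) for _ in range(10_000)]

# Only whole values 2..15 and multiples of 10 between 100 and 600 can occur.
print(sorted(set(depths)))
print(min(rounds), max(rounds))
# The endpoints cover half-width intervals, so they are drawn about half as often.
print(depths.count(2) / depths.count(8))
```

Unlike this sketch, hyperopt returns quantized values as floats (for example `4.0`), so integer-valued parameters are usually cast with `int()` before being passed to XGBoost.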
After further iteration, we used the following final space:
| Hyperparameter | Search space |
|---|---|
| | \(Unif(0, 0.01)\) |
| | \(Unif(0.4, 1)\) |
| | 1 |
| | 4 |
| | \(Unif(0.5, 1)\) |
| | \(Unif(0.6, 1)\) |
| | \(Unif(0.25, 0.75)\) |
| | \(Unif(0.5, 1)\) |
| | 1 |
| | \(Unif(0, 0.5)\) |
| | \(QUnif(510, 530, 1)\) |
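The early stopping described in the footnote to the build-split table picks the number of boosting rounds from the loss on the Stopping/Tune data. XGBoost exposes this through the `early_stopping_rounds` argument of `xgb.train` and records the result in `best_iteration`; the plain-Python sketch below shows the rule itself. The helper name and the loss values (which could be a metric such as XGBoost's `cox-nloglik`) are made up for illustration.

```python
def early_stopping_round(val_losses, patience):
    """Return (best_round, last_round_trained) for per-round validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive rounds; the kept model uses the best round seen so far.
    Rounds are 0-indexed, like XGBoost's `best_iteration`.
    """
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            return best_round, i
    return best_round, len(val_losses) - 1


# Made-up validation losses on the Stopping/Tune split, one per boosting round.
val_losses = [0.70, 0.65, 0.62, 0.61, 0.612, 0.615, 0.611, 0.620, 0.630]
print(early_stopping_round(val_losses, patience=3))  # (3, 6)
```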
We also added a monotonic constraint to ensure that the model output is monotonic in scoring margin.
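XGBoost takes monotonic constraints through its `monotone_constraints` parameter, a string such as `"(1,0,0)"` with one entry per feature: `1` for non-decreasing, `-1` for non-increasing, `0` for unconstrained. A small helper (hypothetical, as are the feature names) keeps that string aligned with the column order. The sign `+1` assumes the modeled event is the home team winning and that the margin is home minus away; with a different target or margin definition the sign would flip.

```python
def monotone_constraints(feature_names, directions):
    """Build an XGBoost `monotone_constraints` string such as "(1,0,0)".

    `directions` maps a feature name to 1 (prediction non-decreasing in that
    feature) or -1 (non-increasing); unlisted features get 0 (unconstrained).
    The order follows `feature_names`, which must match the training columns.
    """
    unknown = set(directions) - set(feature_names)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return "(" + ",".join(str(directions.get(name, 0)) for name in feature_names) + ")"


# Hypothetical feature order; "scoring_margin" is assumed to be home minus away.
features = ["scoring_margin", "seconds_remaining", "home_possession"]
params = {
    "objective": "survival:cox",
    "monotone_constraints": monotone_constraints(features, {"scoring_margin": 1}),
}
print(params["monotone_constraints"])  # (1,0,0)
```

Building the string from named features, rather than typing it by hand, avoids silently constraining the wrong column if the feature order changes.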
Calibration
We will use isotonic regression to calibrate each model's output so that it can be interpreted directly as a win probability.
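Isotonic regression fits the best non-decreasing map from a model's raw score to the observed win rate. The sketch below implements it with the Pool Adjacent Violators algorithm and a step-function lookup; scikit-learn's `IsotonicRegression` is the usual off-the-shelf choice and interpolates linearly between fitted points instead. Fitting the calibrator on the tune split, and the toy scores and labels, are our assumptions for illustration.

```python
from itertools import groupby


def isotonic_fit(scores, labels):
    """Fit a non-decreasing step function from raw scores to probabilities.

    Uses Pool Adjacent Violators: visit scores in ascending order and merge
    neighbouring blocks whenever an earlier block's mean label exceeds a later
    one's. Returns (upper_score_of_each_block, mean_label_of_each_block).
    """
    blocks = []  # each block is [sum_of_labels, count, largest_score]
    for score, group in groupby(sorted(zip(scores, labels)), key=lambda p: p[0]):
        ys = [y for _, y in group]  # tied scores always share one block
        blocks.append([float(sum(ys)), len(ys), score])
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def isotonic_predict(model, score):
    """Calibrated probability for one raw score (step function, no interpolation)."""
    uppers, means = model
    for upper, mean in zip(uppers, means):
        if score <= upper:
            return mean
    return means[-1]  # scores above the fitted range get the top block's mean


# Hypothetical raw scores on the tune split and whether the team went on to win.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]
model = isotonic_fit(scores, labels)
print(model)  # ([0.1, 0.4, 0.5, 0.6], [0.0, 0.3333333333333333, 1.0, 1.0])
print(isotonic_predict(model, 0.35))  # 0.3333333333333333
```

Because the fitted map is non-decreasing, it preserves the ordering of the raw scores (apart from the ties it creates within a block), so calibration changes the probability scale far more than it changes ranking metrics such as AUROC.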
Model evaluation
We will compare each survival model with the NBA’s own win probability using AUROC. Specifically, we will plot the AUROC at each time step from 0 to 2880 seconds (the 48 minutes of regulation); this metric is based on a similar concept introduced in scikit-survival.
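A per-time-step AUROC can be sketched as follows: at each game-clock time, collect every game's predicted win probability at that moment and compute the AUROC against the final outcome. This is our simplified reading of the metric, not scikit-survival's `cumulative_dynamic_auc` (which also handles censored survival times); the helper names, the data layout and the toy numbers are hypothetical. The same function can score the NBA's own predictions by passing them in as `predictions`.

```python
def auroc(scores, labels):
    """AUROC as the Mann-Whitney statistic: the chance that a randomly chosen
    winner gets a higher score than a randomly chosen loser (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one winner and one loser")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_by_time(predictions, outcomes, times):
    """AUROC of the win probabilities available at each game-clock time.

    `predictions[game][t]` is the predicted win probability `t` seconds into
    the game, and `outcomes[game]` is 1 if that team won, else 0.
    """
    curve = {}
    for t in times:
        games = [g for g in predictions if t in predictions[g]]
        curve[t] = auroc([predictions[g][t] for g in games], [outcomes[g] for g in games])
    return curve


# Hypothetical predictions at tip-off, halftime (1440 s) and the final buzzer (2880 s).
predictions = {
    "g1": {0: 0.62, 1440: 0.70, 2880: 0.99},
    "g2": {0: 0.60, 1440: 0.40, 2880: 0.02},
    "g3": {0: 0.50, 1440: 0.65, 2880: 0.97},
    "g4": {0: 0.45, 1440: 0.55, 2880: 0.10},
}
outcomes = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}
print(auroc_by_time(predictions, outcomes, [0, 1440, 2880]))
# {0: 0.75, 1440: 1.0, 2880: 1.0}
```

The pairwise loop is quadratic in the number of games; for a full season of holdout games a rank-based AUROC computation would be the more practical choice, with the same result.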