Methodology

Models

We will investigate two models for estimating within-game win probability:

  • The lifelines time-varying Cox proportional hazards model, and

  • An XGBoost regression model with the survival:cox objective function.

To fairly compare the models to each other and to the NBA's own win probability, we will split the dataset into two pieces: build (80%) and holdout (20%). The build dataset will be used for model training and hyperparameter tuning, while the holdout dataset will be reserved for comparing the models. The build dataset will be split further for each model:

| Model     | Build dataset | Proportion of build (total) | Description                          |
|-----------|---------------|-----------------------------|--------------------------------------|
| lifelines | Train         | 75% (60%)                   | Model training data.                 |
| lifelines | Tune          | 25% (20%)                   | Hyperparameter tuning data.          |
| xgboost   | Train         | 75% (60%)                   | Model training data.                 |
| xgboost   | Stopping/Tune | 25% (20%)                   | Data for tuning and early stopping.* |

The datasets will be stratified by season and by the target to ensure that the models are being built on representative data.

\* We will use early stopping to determine the number of boosting rounds for the model.
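
To make the scheme concrete, here is a minimal sketch of the split using scikit-learn; the `games` DataFrame and its `season` and `target` columns are hypothetical names for illustration.

```python
# A minimal sketch of the splitting scheme, assuming a DataFrame `games`
# with one row per game and hypothetical `season` and `target` columns.
from sklearn.model_selection import train_test_split

# Stratify on season x target so every split sees representative data.
strata = games["season"].astype(str) + "_" + games["target"].astype(str)

# 80/20 build/holdout, then 75/25 train/tune within build (60/20 overall).
build, holdout = train_test_split(
    games, test_size=0.20, stratify=strata, random_state=0
)
train, tune = train_test_split(
    build, test_size=0.25, stratify=strata.loc[build.index], random_state=0
)
```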

Hyperparameter tuning

Both models have hyperparameters that we will tune using hyperopt.
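
The tuning loop follows hyperopt's standard `fmin` pattern, sketched below with the lifelines search space (defined in the next section) for concreteness; `evaluate` is a hypothetical objective that fits a model on the train split and returns its loss on the tune split.

```python
# A minimal sketch of the hyperopt tuning loop used for both models.
from hyperopt import Trials, fmin, hp, tpe

# `evaluate` (hypothetical) fits a model with the sampled hyperparameters
# on the train split and returns its loss on the tune split.
space = {
    "penalizer": hp.uniform("penalizer", 0, 1),
    "l1_ratio": hp.uniform("l1_ratio", 0, 1),
}

trials = Trials()
best = fmin(fn=evaluate, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)
```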

Lifelines

We will use the following hyperparameter search space for the lifelines model.

| Hyperparameter | Search space   |
|----------------|----------------|
| penalizer      | \(Unif(0, 1)\) |
| l1_ratio       | \(Unif(0, 1)\) |

To make the most of the tuning budget, we narrowed the search to the following space after some initial trials:

| Hyperparameter | Search space         |
|----------------|----------------------|
| penalizer      | \(Unif(0.05, 0.15)\) |
| l1_ratio       | \(Unif(0, 0.015)\)   |
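
For reference, fitting the tuned model looks roughly like the sketch below; the long-format DataFrame and its column names are illustrative assumptions, and the hyperparameter values are simply points drawn from the narrowed space above.

```python
# A minimal sketch of fitting the lifelines time-varying Cox model.
from lifelines import CoxTimeVaryingFitter

ctv = CoxTimeVaryingFitter(penalizer=0.10, l1_ratio=0.01)
ctv.fit(
    train_long,            # long format: one row per game-state interval
    id_col="game_id",      # identifies the subject (a game)
    start_col="start",     # interval start, in game seconds
    stop_col="stop",       # interval stop, in game seconds
    event_col="home_win",  # whether the event occurred at `stop`
)
ctv.print_summary()
```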

XGBoost

Based on discussions with heytheredli, we started with the following space and then iteratively limited the range for each parameter to maximize the effectiveness of the search:

| Hyperparameter    | Search space            |
|-------------------|-------------------------|
| learning_rate     | \(Unif(0, 0.01)\)       |
| subsample         | \(Unif(0, 1)\)          |
| max_delta_step    | \(Unif(0, 1)\)          |
| max_depth         | \(QUnif(2, 15, 1)\)     |
| gamma             | \(Unif(0, 1)\)          |
| reg_alpha         | \(Unif(0, 1)\)          |
| reg_lambda        | \(Unif(0, 1)\)          |
| colsample_bytree  | \(Unif(0, 1)\)          |
| colsample_bylevel | \(Unif(0, 1)\)          |
| colsample_bynode  | \(Unif(0, 1)\)          |
| min_child_weight  | \(QUnif(100, 600, 10)\) |

After iterating, we settled on the following space:

| Hyperparameter    | Search space           |
|-------------------|------------------------|
| learning_rate     | \(Unif(0, 0.01)\)      |
| subsample         | \(Unif(0.4, 1)\)       |
| max_delta_step    | 1                      |
| max_depth         | 4                      |
| gamma             | \(Unif(0.5, 1)\)       |
| reg_alpha         | \(Unif(0.6, 1)\)       |
| reg_lambda        | \(Unif(0.25, 0.75)\)   |
| colsample_bytree  | \(Unif(0.5, 1)\)       |
| colsample_bylevel | 1                      |
| colsample_bynode  | \(Unif(0, 0.5)\)       |
| min_child_weight  | \(QUnif(510, 530, 1)\) |

We also added a monotonic constraint to ensure that the model output is monotonic in scoring margin.
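
A training call that puts these pieces together is sketched below; the feature matrices, the particular hyperparameter values (drawn from the narrowed space), and the assumption that scoring margin is the first of three features are all illustrative.

```python
# A minimal sketch of training the XGBoost survival model. Labels follow
# the survival:cox convention: survival time, negated when censored.
import xgboost as xgb

params = {
    "objective": "survival:cox",
    "learning_rate": 0.005,
    "subsample": 0.7,
    "max_delta_step": 1,
    "max_depth": 4,
    "gamma": 0.75,
    "reg_alpha": 0.8,
    "reg_lambda": 0.5,
    "colsample_bytree": 0.75,
    "colsample_bylevel": 1,
    "colsample_bynode": 0.25,
    "min_child_weight": 520,
    # One entry per feature; +1 forces the output to be non-decreasing in
    # the first feature (assumed here to be scoring margin).
    "monotone_constraints": "(1,0,0)",
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dstop = xgb.DMatrix(X_stop, label=y_stop)  # stopping/tune split

booster = xgb.train(
    params,
    dtrain,
    num_boost_round=5000,
    evals=[(dstop, "stop")],
    early_stopping_rounds=50,  # determines the number of boosting rounds
)
```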

Calibration

We will use isotonic regression to calibrate the output probabilities from each model so that the outputs can be interpreted directly as win probabilities.
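
A minimal sketch of the calibration step, assuming `p_raw` holds a model's uncalibrated probabilities on the tune split and `won` is the corresponding binary outcome:

```python
# Fit a monotone mapping from raw scores to calibrated probabilities.
from sklearn.isotonic import IsotonicRegression

iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(p_raw, won)                    # learn mapping on the tune split
p_calibrated = iso.predict(p_holdout)  # apply to holdout predictions
```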

Model evaluation

We will compare each survival model with the NBA win probability output using AUROC. Specifically, we will generate a plot of the AUROC at each time step from 0 to 2880 seconds (48 minutes); this metric is based on a similar time-dependent AUC concept introduced in scikit-survival.
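
A minimal sketch of the per-time-step computation, assuming a holdout DataFrame with hypothetical `seconds`, `win_prob`, and `home_win` columns:

```python
# Compute AUROC of a win probability column at each second of the game.
from sklearn.metrics import roc_auc_score

def auroc_by_second(df, prob_col="win_prob"):
    out = {}
    for t in range(0, 2881):
        snap = df[df["seconds"] == t]
        if snap["home_win"].nunique() == 2:  # need both outcomes present
            out[t] = roc_auc_score(snap["home_win"], snap[prob_col])
    return out
```

Plotting these values for each model alongside the NBA's own estimates then shows how each model's discrimination evolves over the course of a game.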