Methodology
Models
We will investigate two models for estimating within-game win probability:
- The lifelines time-varying Cox proportional hazards model (see here), and
- An XGBoost regression model with the survival:cox objective function.
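For orientation, both models build on the Cox proportional hazards form. What follows is the standard textbook statement, not anything specific to this project, and the symbols \(h_0\), \(\beta\), \(x(t)\), and \(f\) are notation introduced here for illustration only:

\[
h(t \mid x(t)) = h_0(t) \exp\left(\beta^\top x(t)\right)
\]

Here \(h_0(t)\) is the baseline hazard and \(x(t)\) is the covariate vector observed at time \(t\). With the survival:cox objective, XGBoost keeps the same proportional hazards structure but replaces the linear term with a boosted tree ensemble \(f\), and its raw predictions are on the hazard ratio scale \(\exp(f(x))\):

\[
h(t \mid x) = h_0(t) \exp\left(f(x)\right)
\]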
To accurately compare the models to each other and the NBA’s own win probability, we will split the dataset into two pieces: build (80%) and holdout (20%). The build dataset will be used for model training and hyperparameter tuning while the holdout dataset will be used for comparing the models. The build dataset will be broken down for each model:
| Model | Build datasets | Proportion of build (total) | Description |
| --- | --- | --- | --- |
| lifelines | Train | 75% (60%) | Model training data. |
| lifelines | Tune | 25% (20%) | Hyperparameter tuning data. |
| XGBoost | Train | 75% (60%) | Model training data. |
| XGBoost | Stopping/Tune | 25% (20%) | Data for tuning and early stopping * |

The datasets will be stratified by season and by the target to ensure that the models are being built on representative data.
* We will use early stopping to determine the number of boosting rounds for the model.
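The split described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: it assumes the unit being split is a game, and the field names `season` and `home_win` and the helper `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(games, holdout_frac=0.2, tune_frac=0.25, seed=42):
    """Split games into train/tune/holdout within each (season, target) stratum.

    holdout_frac is taken from the full dataset; tune_frac is taken from the
    remaining build data, giving roughly 60% / 20% / 20% overall.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for game in games:
        # Hypothetical field names: "season" and "home_win" (the target).
        strata[(game["season"], game["home_win"])].append(game)

    splits = {"train": [], "tune": [], "holdout": []}
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)
        n_holdout = round(len(members) * holdout_frac)
        n_tune = round((len(members) - n_holdout) * tune_frac)
        splits["holdout"] += members[:n_holdout]
        splits["tune"] += members[n_holdout:n_holdout + n_tune]
        splits["train"] += members[n_holdout + n_tune:]
    return splits


# Toy data: two seasons, balanced outcomes, 50 games per stratum.
toy_games = [
    {"game_id": f"{season}-{win}-{i}", "season": season, "home_win": win}
    for season in ("2017-18", "2018-19")
    for win in (0, 1)
    for i in range(50)
]
toy_splits = stratified_split(toy_games)
print({name: len(rows) for name, rows in toy_splits.items()})
```

Splitting inside each stratum keeps the season mix and the win rate approximately equal across the three pieces, which is the point of stratifying.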
Hyperparameter tuning
Both models have hyperparameters that we will tune using hyperopt.
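As a reading aid for the search-space notation used in this section, here is a minimal sketch of what \(Unif(a, b)\) and \(QUnif(a, b, q)\) draws look like. It assumes hyperopt's documented definition of a quantized uniform draw, `round(uniform(low, high) / q) * q`; the helpers `unif` and `qunif` are illustrative stand-ins written for this example, not hyperopt functions.

```python
import random


def unif(rng, low, high):
    """Draw a value uniformly between low and high."""
    return rng.uniform(low, high)


def qunif(rng, low, high, q):
    """Draw uniformly between low and high, then snap to a multiple of q."""
    return round(rng.uniform(low, high) / q) * q


rng = random.Random(0)
# A QUnif(100, 600, 10) draw is always a multiple of 10 between 100 and 600.
print([qunif(rng, 100, 600, 10) for _ in range(5)])
# A Unif(0, 0.01) draw is any real value in that interval.
print([unif(rng, 0, 0.01) for _ in range(3)])
```

Quantized ranges suit integer-valued hyperparameters; note that hyperopt itself returns such draws as floats, so callers usually cast them to `int`.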
Lifelines
We will use the following hyperparameter search space for the lifelines model.

| Hyperparameter | Search space |
| --- | --- |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |

After some initial trials, we narrowed the search to the following space to make the tuning more effective:

| Hyperparameter | Search space |
| --- | --- |
|  | \(Unif(0.05, 0.15)\) |
|  | \(Unif(0, 0.015)\) |

XGBoost
Based on discussions with heytheredli, we started with the following search space and then iteratively narrowed the range of each parameter to make the search more effective:

| Hyperparameter | Search space |
| --- | --- |
|  | \(Unif(0, 0.01)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(QUnif(2, 15, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(Unif(0, 1)\) |
|  | \(QUnif(100, 600, 10)\) |

After iteration, we used the following space:

| Hyperparameter | Search space |
| --- | --- |
|  | \(Unif(0, 0.01)\) |
|  | \(Unif(0.4, 1)\) |
|  | 1 |
|  | 4 |
|  | \(Unif(0.5, 1)\) |
|  | \(Unif(0.6, 1)\) |
|  | \(Unif(0.25, 0.75)\) |
|  | \(Unif(0.5, 1)\) |
|  | 1 |
|  | \(Unif(0, 0.5)\) |
|  | \(QUnif(510, 530, 1)\) |

We also added a monotonic constraint to ensure that the model output is monotonic in scoring margin.
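In XGBoost, a constraint of this kind is passed through the `monotone_constraints` parameter, which accepts a tuple-style string with one entry per feature (1 for increasing, -1 for decreasing, 0 for unconstrained). The sketch below only builds the parameter dictionary; the feature names, the helper `build_monotone_constraints`, and the choice of an increasing direction are assumptions made for illustration, since the correct sign depends on how the event and the margin are defined.

```python
def build_monotone_constraints(feature_names, directions):
    """Return XGBoost's tuple-style constraint string, e.g. "(1,0,0)".

    directions maps a feature name to 1 (increasing) or -1 (decreasing);
    features not listed are left unconstrained (0).
    """
    flags = [str(directions.get(name, 0)) for name in feature_names]
    return "(" + ",".join(flags) + ")"


# Hypothetical feature order; it must match the column order of the training data.
feature_names = ["SCOREMARGIN", "HOME_NET_RATING", "VISITOR_NET_RATING"]

params = {
    "objective": "survival:cox",
    # Assumed direction: the model output may only rise as the margin rises.
    "monotone_constraints": build_monotone_constraints(
        feature_names, {"SCOREMARGIN": 1}
    ),
}
print(params)
```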
Calibration
We will use isotonic regression to calibrate the output probabilities from each model to ensure that we have interpretable outputs.
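To make the calibration step concrete, here is a dependency-free sketch of isotonic regression using the pool-adjacent-violators algorithm. It is an illustration of the technique, not the project's code: the function names are hypothetical, and it returns a step function, whereas a library implementation such as scikit-learn's `IsotonicRegression` interpolates between fitted points.

```python
import bisect
from collections import defaultdict


def fit_isotonic(scores, outcomes):
    """Fit a non-decreasing step function mapping raw scores to win rates."""
    # Aggregate tied scores first so each distinct score forms one block.
    totals = defaultdict(lambda: [0.0, 0])
    for score, outcome in zip(scores, outcomes):
        totals[score][0] += outcome
        totals[score][1] += 1

    blocks = []  # each block: [sum of outcomes, count, right edge (max score)]
    for score in sorted(totals):
        total, count = totals[score]
        blocks.append([total, count, score])
        # Pool adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, right = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right

    thresholds = [block[2] for block in blocks]
    values = [block[0] / block[1] for block in blocks]
    return thresholds, values


def calibrate(score, thresholds, values):
    """Look up the calibrated probability; scores past the last edge are clipped."""
    index = bisect.bisect_left(thresholds, score)
    return values[min(index, len(values) - 1)]


raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
wins = [0, 1, 0, 1, 1, 1]
thresholds, values = fit_isotonic(raw_scores, wins)
print(calibrate(0.25, thresholds, values))
```

Fitting the calibrator on data the model was not trained on is the usual practice, since calibrating on the training set tends to overstate how reliable the raw scores are.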
Model evaluation
We will compare each survival model with the NBA win probability output using AUROC. Specifically, we will plot the AUROC at each time step from 0 to 2880 seconds (48 minutes); this metric is based on a similar concept introduced in scikit-survival.
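A per-time-step AUROC can be sketched as follows. This is a simplified stand-in under stated assumptions: every game's final outcome is known (no censoring), each row carries one prediction for one game at one time step, and the row layout and function names are hypothetical. scikit-survival's `cumulative_dynamic_auc` addresses the more general censored setting.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the share of (win, loss) pairs ranked correctly; ties count half."""
    wins = [s for y, s in zip(labels, scores) if y == 1]
    losses = [s for y, s in zip(labels, scores) if y == 0]
    if not wins or not losses:
        return float("nan")  # undefined when only one class is present
    correct = sum((w > l) + 0.5 * (w == l) for w in wins for l in losses)
    return correct / (len(wins) * len(losses))


def auroc_by_time(rows):
    """rows: iterable of (time_seconds, home_win, predicted_win_probability)."""
    grouped = defaultdict(lambda: ([], []))
    for time_seconds, home_win, probability in rows:
        grouped[time_seconds][0].append(home_win)
        grouped[time_seconds][1].append(probability)
    return {t: auroc(*grouped[t]) for t in sorted(grouped)}


# Toy predictions for four games at tip-off (0 s) and at the final buzzer (2880 s).
toy_rows = [
    (0, 1, 0.55), (0, 0, 0.55), (0, 1, 0.50), (0, 0, 0.60),
    (2880, 1, 0.99), (2880, 0, 0.02), (2880, 1, 0.90), (2880, 0, 0.10),
]
for time_seconds, value in auroc_by_time(toy_rows).items():
    print(time_seconds, value)
```

Plotting the resulting values against time gives one curve per model, which makes it easy to see where in the game one model separates winners from losers better than another.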