Oscars prediction model

February 8, 2020

Oscars 2020 prediction

Regularized Conditional logistic model

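To model the relationship between winning an Oscar ($Y$) and the predictors ($X$), we use a penalised conditional logistic regression model. Suppose we have $K$ independent competitions (strata), the $k$-th comprising $n_k$ nominees of whom $m_k$ win. Writing $x_{ki}$ for the predictor vector of nominee $i$ in stratum $k$ and indexing the winners first, the conditional likelihood for stratum $k$ takes the standard form

$$L_k(\beta) = \frac{\exp\left(\sum_{i=1}^{m_k} x_{ki}^\top \beta\right)}{\sum_{J \in S_{m_k}^{n_k}} \exp\left(\sum_{j \in J} x_{kj}^\top \beta\right)},$$

where $S_{m_k}^{n_k}$ is the collection of all $\binom{n_k}{m_k}$ index sets $\{i_1,\dots,i_{m_k}\}$ with $1 \le i_1 < \dots < i_{m_k} \le n_k$. Because the competitions are independent, the full likelihood is the product over strata:

$$L(\beta) = \prod_{k=1}^{K} L_k(\beta).$$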
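As a numerical illustration of the stratum-level conditional likelihood, the following is a minimal Python sketch, not the code used to produce the results in this report. The function names `stratum_conditional_likelihood` and `conditional_likelihood` are hypothetical, and the denominator is enumerated by brute force over all subsets of size $m_k$, which is only practical for small strata such as award categories.

```python
from itertools import combinations

import numpy as np


def stratum_conditional_likelihood(X, winners, beta):
    """Conditional likelihood of one stratum (hypothetical helper).

    X       : (n_k, p) array of predictors for the nominees in the stratum
    winners : indices of the m_k nominees that actually won
    beta    : (p,) coefficient vector
    """
    eta = X @ beta                      # linear predictor for each nominee
    m = len(winners)
    numerator = np.exp(eta[list(winners)].sum())
    # Sum over every possible set of m winners among the n_k nominees.
    denominator = sum(
        np.exp(eta[list(subset)].sum())
        for subset in combinations(range(len(eta)), m)
    )
    return numerator / denominator


def conditional_likelihood(strata, beta):
    """Product of stratum likelihoods; strata is a list of (X, winners) pairs."""
    return float(
        np.prod([stratum_conditional_likelihood(X, w, beta) for X, w in strata])
    )
```

With all coefficients equal to zero, every nominee is equally likely to win, so a stratum with five nominees and one winner has conditional likelihood 1/5.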

Regularization

In high-dimensional settings, penalized methods such as the Lasso can reduce variance (and thereby improve prediction accuracy) and identify the subset of predictors most strongly associated with the response. We therefore recast our likelihood to include an $L_1$ penalty.

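Our problem is now expressed as

$$\hat{\beta} = \arg\max_{\beta} \left\{ \log L(\beta) - \lambda \sum_{j=1}^{p} |\beta_j| \right\},$$

where $L(\beta)$ is the conditional likelihood, $p$ is the number of predictors, and the tuning parameter $\lambda \ge 0$ is chosen by cross-validation. Equivalently, the model bounds the coefficients such that $\sum_{j=1}^p|\beta_j|\le t$ for a pre-specified parameter $t$. Because the penalty depends on the scale of each predictor, we standardize the predictors before modelling the probability of winning the Oscar.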
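The standardization step and the $L_1$-penalised objective can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation behind this report: it assumes exactly one winner per stratum (so the conditional likelihood reduces to a within-stratum softmax), and the names `standardize`, `log_conditional_likelihood` and `penalized_objective` are hypothetical.

```python
import numpy as np


def standardize(X):
    """Centre each column and scale it to unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                 # leave constant columns unscaled
    return (X - mean) / std


def log_conditional_likelihood(X, strata, y, beta):
    """Log conditional likelihood, assuming one winner per stratum.

    strata : (n,) array of stratum labels; y : (n,) 0/1 winner indicator.
    """
    eta = X @ beta
    total = 0.0
    for k in np.unique(strata):
        idx = strata == k
        e = eta[idx]
        mx = e.max()                    # log-sum-exp for numerical stability
        total += e[y[idx] == 1].sum() - (mx + np.log(np.exp(e - mx).sum()))
    return float(total)


def penalized_objective(X, strata, y, beta, lam):
    """Objective to maximise: log-likelihood minus lam times the L1 norm."""
    return log_conditional_likelihood(X, strata, y, beta) - lam * np.abs(beta).sum()
```

Standardizing first means a single value of $\lambda$ penalises every coefficient on a comparable scale, regardless of the units of the underlying predictor.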

Cross-validation

## optimal lambda value:

## [1] 2.132296
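One way a value of $\lambda$ like this could be selected is sketched below in Python. This is a hedged illustration, not the procedure that produced the value above: it assumes one winner per stratum, fits the model with a simple proximal-gradient (ISTA) loop rather than a dedicated solver, holds out whole competitions in each fold, and scores each candidate $\lambda$ by held-out log conditional likelihood. All function names are hypothetical.

```python
import numpy as np


def stratum_probs(eta, strata):
    """Softmax of the linear predictor within each stratum."""
    p = np.empty_like(eta, dtype=float)
    for k in np.unique(strata):
        idx = strata == k
        e = np.exp(eta[idx] - eta[idx].max())
        p[idx] = e / e.sum()
    return p


def loglik(X, strata, y, beta):
    """Log conditional likelihood with one winner per stratum."""
    p = stratum_probs(X @ beta, strata)
    return float(np.log(p[y == 1]).sum())


def fit_lasso_clogit(X, strata, y, lam, n_iter=500):
    """Proximal-gradient ascent on loglik(beta) - lam * ||beta||_1."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # conservative step size
    for _ in range(n_iter):
        grad = X.T @ (y - stratum_probs(X @ beta, strata))
        z = beta + step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def cv_lambda(X, strata, y, lambdas, n_folds=5, seed=0):
    """Pick lambda by K-fold CV, holding out whole strata in each fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(np.unique(strata)), n_folds)
    scores = []
    for lam in lambdas:
        total = 0.0
        for held_out in folds:
            test = np.isin(strata, held_out)
            beta = fit_lasso_clogit(X[~test], strata[~test], y[~test], lam)
            total += loglik(X[test], strata[test], y[test], beta)
        scores.append(total)
    return lambdas[int(np.argmax(scores))], scores
```

Splitting by stratum rather than by row keeps every nominee of a given competition in the same fold, which the conditional likelihood requires.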

Next, we present the values of $\beta$ corresponding to the best value of $\lambda$ .

## estimated betas:

Oscars 2020 predictions

## All Predictions for 2020 Oscars:

## Winners for 2020 Oscars:
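Given fitted coefficients, per-nominee win probabilities and a predicted winner for each category can be derived as in the following Python sketch. It assumes one winner per category, so that the probabilities within a category are a softmax of the linear predictor; the function names and the placeholder nominee labels in the usage note are hypothetical and do not correspond to the results of this report.

```python
import numpy as np


def predict_win_probabilities(X, strata, beta):
    """Within-category win probabilities (softmax of X @ beta per stratum)."""
    eta = X @ beta
    probs = np.empty_like(eta, dtype=float)
    for k in np.unique(strata):
        idx = strata == k
        e = np.exp(eta[idx] - eta[idx].max())   # subtract max for stability
        probs[idx] = e / e.sum()
    return probs


def predicted_winners(names, strata, probs):
    """Map each category to the nominee with the highest win probability."""
    winners = {}
    for k in np.unique(strata):
        idx = np.flatnonzero(strata == k)
        winners[k] = names[idx[np.argmax(probs[idx])]]
    return winners
```

For example, calling `predicted_winners(names, strata, predict_win_probabilities(X, strata, beta))` with `names` holding placeholder labels such as `"nominee_1"` returns one label per category.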

The predicted winners are