Predicting offensive play calling

Is it possible to predict whether a team will run or pass on a given play? Probably -- after all, even the casual football observer could label a given scenario as a "passing situation" with reasonable accuracy. Similarly, most fans know that on 3rd & 1, up 14 in the 4th quarter, a team is likely to run the ball. But can we use machine learning to improve upon these simple rules of thumb? Are some teams more easily predictable than others? And does predictability matter for a team's offensive success? These are the questions the following analysis will attempt to answer, using play-by-play data from the 2015 college football season, courtesy of Reddit.

First, we load in the data and do some formatting:

library(tidyr)
library(dplyr)
library(stringr)
library(ggplot2)
library(gbm)
library(scales)
library(rvest)

df <- read.csv('play.csv') %>% select(-X) %>% mutate(Year=2015)

df <- df %>% arrange(Game.Code,Play.Number) %>%
  mutate(margin = Offense.Points - Defense.Points) %>%
  group_by(Game.Code,Drive.Number) %>% 
  mutate(playofdrive = dense_rank(Play.Number)) %>%
  ungroup() %>% filter(!is.na(Drive.Number))

df$Play.Type <- as.character(df$Play.Type)
df$scoreTextVector <- as.character(df$scoreTextVector)

# Filter out 4th downs, penalties, and other special teams plays
df <- df %>% filter(Down < 4, !Play.Type %in% c('PENALTY','ATTEMPT','FIELD GOAL','PUNT')) 

df$Play.Type <- recode(df$Play.Type, RUSH="0",PASS="1")
df$Play.Type <- as.numeric(df$Play.Type)

df$Period.Number <- as.factor(df$Period.Number)
df$Period.Number <- recode(df$Period.Number,`5`="OT",`6`="OT",`7`="OT",`8`="OT")

# Change sacks and scrambles to pass plays (vectorized; no need for a row-wise loop)
df$Play.Type[grepl('sack|scramble', df$scoreTextVector)] <- 1

# Record the previous play type within each drive
# (3 = first play of a drive, i.e. no previous play)
df <- df %>%
  group_by(Game.Code, Drive.Number) %>%
  mutate(PreviousPlay = ifelse(playofdrive == 1, 3, lag(Play.Type))) %>%
  ungroup()

We now have a dataset of 117,752 plays across 852 games, with a number of features to describe the game context at the time of each play. We exclude 4th downs, special teams plays, and penalties to keep things simple. 

Next, we randomly split the data into training and test sets. We'll fit a boosted tree model on the training data, then use that model to make a run/pass prediction on each play in the test set given the following features: down, distance, spot (yard line), quarter, and score margin. Our label is 0 for a run and 1 for a pass.

indices <- sample(1:nrow(df), size=0.5*nrow(df))
testdf <- df[indices,]
traindf <- df[-indices,]

gb_play <- gbm(Play.Type ~ Down + Distance + Spot + margin + Period.Number,
               data = traindf, distribution = 'bernoulli', 
               n.trees = 10000, shrinkage = .005)

predict_play <- predict(gb_play, testdf, n.trees = 10000, type='response')
results <- cbind(testdf,predict_play)

# Classify using a 0.5 threshold and compute accuracy
# (>= avoids leaving a prediction unset at exactly 0.5)
results$prediction <- as.numeric(results$predict_play >= 0.5)
results$correct <- results$prediction == results$Play.Type
mean(results$correct)

This model correctly predicts the play call 64.58% of the time. 
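For context, it helps to compare against a naive baseline that ignores game situation entirely and always guesses the more common play type. A quick sketch, using the training and test sets defined above:

```r
# Naive baseline: always predict whichever play type (0 = run, 1 = pass)
# was more common in the training data, and score it on the test set
majority <- as.numeric(mean(traindf$Play.Type) > 0.5)
mean(testdf$Play.Type == majority)
```

The gap between this baseline and the model's 64.58% is the value the game-context features are actually adding.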

Below is a look at how well-calibrated the model is. Ideally, if a model says some event is 80% likely to happen, it should happen roughly 80% of the time over a large enough sample. We group all of the plays from our test set into buckets (10 percent increments) based on how likely the model thinks a pass play is, and check how often pass plays were actually called in these situations. For example, among plays that the model assigned a 70-80% probability of being a pass, 75.2% of those plays actually were passes. So the model appears to be well-calibrated.
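One way to build this calibration table, bucketing the test-set predictions into 10% bins and comparing the average predicted pass probability against the observed pass rate in each bin (a sketch, using the `results` data frame from above):

```r
# Bucket predicted pass probabilities into deciles and compare
# predicted vs. observed pass rates within each bucket
calibration <- results %>%
  mutate(bucket = cut(predict_play, breaks = seq(0, 1, by = 0.1))) %>%
  group_by(bucket) %>%
  summarise(predicted = mean(predict_play),
            actual = mean(Play.Type),
            plays = n())
calibration
```

A well-calibrated model will show `predicted` and `actual` tracking each other closely across buckets.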

Below is a matrix of run/pass probabilities for a given down & distance in a generic ("neutral") scenario (2nd quarter, tie game, ball on own 35-yard line):

scenario <- data.frame(
  Down=rep(c(1,2,3),each=20),
  Distance=rep(seq(from=1,to=20,by=1),3))

scenario$Spot <- 65
scenario$margin <- 0
# Period.Number must be a factor with the same levels as the training data
scenario$Period.Number <- factor(2, levels = levels(df$Period.Number))

predict_scenario <- predict(gb_play, scenario, n.trees = 10000, type='response')
scenario_results <- cbind(scenario,predict_scenario)

scenario_results <- scenario_results %>% select(-Spot,-margin)

plot2 <- ggplot(filter(scenario_results,Distance<=15),
                aes(x=Down,y=Distance,fill=predict_scenario))
plot2 <- plot2 + geom_tile()
plot2 <- plot2 + scale_fill_gradient2(low='#cf5300',mid='lightgoldenrodyellow',
                                      high='forestgreen',midpoint=0.5,name='Probability of Pass')
plot2 <- plot2 + theme(legend.position='bottom')
plot2 <- plot2 + geom_text(aes(label=percent(predict_scenario),y=Distance,
                               color=predict_scenario),size=3)
plot2 <- plot2 + scale_colour_gradient2(low='black',mid='black',high='white',
                                        midpoint=0.43,guide='none')
plot2 <- plot2 + labs(title = "Probability of Pass Given Down & Distance\n2nd Quarter, Tie Game, Ball On Own 35") +
  theme(plot.title = element_text(hjust = 0.5))
plot2

We can feed in any scenario and receive back a run/pass probability. 
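For example, a sketch of querying the model for a single hypothetical situation -- say, 3rd & 8 from midfield, trailing by 7 in the 4th quarter:

```r
# One hypothetical situation: 3rd & 8 at midfield, down 7, 4th quarter
one_play <- data.frame(Down = 3, Distance = 8, Spot = 50, margin = -7,
                       Period.Number = factor(4, levels = levels(df$Period.Number)))
predict(gb_play, one_play, n.trees = 10000, type = 'response')
```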

Interestingly, spot (the yard line the team is located on) is the most influential predictor in the group:

> summary(gb_play)
                        var   rel.inf
Spot                   Spot 31.854970
margin               margin 21.847967
Distance           Distance 21.186705
Down                   Down 16.234035
Period.Number Period.Number  8.876323

We can also fit models for each team separately. Which teams are most unpredictable? And is unpredictability related to offensive efficiency?

teams <- read.csv('team.csv')

teamcodes <- df %>% group_by(Offense.Team.Code) %>% 
  count(Offense.Team.Code) %>%
  filter(n > 500) %>%
  select(Offense.Team.Code)

teamcodes <- as.vector(teamcodes$Offense.Team.Code)

results_tm <- list()

for(i in 1:length(teamcodes)){
  tmdf <- df %>% filter(Offense.Team.Code == teamcodes[i])
  
  indices <- sample(1:nrow(tmdf), size = 0.4*nrow(tmdf))
  testdf <- tmdf[indices,]
  traindf <- tmdf[-indices,]
  
  gb_play <- gbm(Play.Type ~ Down + Distance + Spot + margin + Period.Number,
                 data = traindf, distribution = 'bernoulli', 
                 n.trees = 3000, shrinkage = .01)
  
  predict_play <- predict(gb_play, testdf, n.trees = 3000, type='response')
  results <- cbind(testdf,predict_play)
  
  # Classify using a 0.5 threshold and compute accuracy
  # (vectorized, which also avoids reusing the outer loop index i)
  results$prediction <- as.numeric(results$predict_play >= 0.5)
  results$correct <- results$prediction == results$Play.Type
  
  team <- as.character(tmdf[1,'Offense.Team.Code'])
  correct <- mean(results$correct)
  
  results_tm[[team]] <- correct
}

results_tm <- do.call(rbind.data.frame, results_tm)
results_tm$Team <- as.integer(rownames(results_tm))
colnames(results_tm) <- c('Correct','Team')

results_tm <- left_join(x=results_tm,
                        y=teams[,c('Team.Code','Name')],
                        by=c("Team"="Team.Code"))
                        
results_tm$Name <- as.character(results_tm$Name)
results_tm %>% arrange(-Correct) %>% select(-Team) %>% top_n(n=10,wt=Correct) # Most Predictable

      Correct             Name
1  0.8394737        Air Force
2  0.8318318             Navy
3  0.8067485 Georgia Southern
4  0.7913669             Army
5  0.7804054     Georgia Tech
6  0.7803030 Washington State
7  0.7759740       New Mexico
8  0.7600000           Auburn
9  0.7573770              LSU
10 0.7470588        Wisconsin

results_tm %>% arrange(Correct) %>% select(-Team) %>% top_n(n=10,wt=-Correct) # Least Predictable

     Correct          Name
1  0.5530547          Troy
2  0.5555556      Marshall
3  0.5658537 West Virginia
4  0.5660377       Alabama
5  0.5809249        Kansas
6  0.5884477  Old Dominion
7  0.5893417    Penn State
8  0.5936599        Purdue
9  0.5977860          UTEP
10 0.6018809      Kentucky

You can see the most and least predictable teams above. The most predictable aren't too surprising: six of the top ten run the run-heavy flexbone option offense or some variant thereof (Air Force, Navy, Army, Georgia Southern, Georgia Tech, and New Mexico; Georgia Tech head coach Paul Johnson previously coached at Georgia Southern and Navy). Washington State, under Mike Leach, of course runs the pass-heavy "air raid" scheme. Auburn ranked 86th of 128 teams in yards per play after ranking 13th and 7th in the previous two seasons (causality could run both ways here). And LSU's offensive coordinator was fired shortly into the 2016 season.

The least predictable list is less obvious. Alabama sticks out, as they won the national championship despite ranking 47th in yards per play. But unpredictability clearly didn't help Kansas, which ranked 124th of 128 teams in yards per play; UTEP and Purdue were similarly awful on offense. So perhaps there is a mix here of unpredictability as a good thing (catching your opponent off guard) and unpredictability as a bad thing (making inappropriate play calls in certain game situations). And of course, calling an 'unpredictable' play is not always the same as calling the 'right' play.

We can also check if unpredictability is related to offensive efficiency. We'll scrape efficiency data from ESPN and plot it against teams' predictability (how often its play call could be guessed correctly). 

effurl <- "http://www.espn.com/college-football/statistics/teamratings/_/year/2015/sort/offEfficiency/tab/efficiency"

eff <- effurl %>%
  read_html() %>%
  html_nodes(xpath='//*[@id="my-teams-table"]/div/div[1]/table') %>%
  html_table(fill=TRUE)

eff <- eff[[1]]
colnames(eff) <- eff[2,]
eff <- eff[-c(1,2),]
eff <- eff %>% select(1:6)
eff <- eff %>% mutate(TEAM = str_split_fixed(eff$TEAM,", ", n = 2)[,1],
                      CONF = str_split_fixed(eff$TEAM,", ", n = 2)[,2])

eff$OFFENSE <- as.numeric(eff$OFFENSE)
eff <- eff[!eff$TEAM=="TEAM",]

###

scatdf <- left_join(x=results_tm[,c('Name','Correct')],
                    y=eff[,c('TEAM','OFFENSE')],
                    by=c("Name"="TEAM"))

plot3 <- ggplot(filter(scatdf,Correct<0.7),aes(x=Correct,y=OFFENSE))
plot3 <- plot3 + geom_point(pch = 21, size = 4, color = 'gray',fill='steelblue4')
plot3 <- plot3 + geom_smooth(method="lm")
plot3 <- plot3 + xlab('% of Plays Predicted Correctly') + ylab('Offensive Efficiency')
plot3

Setting aside the highly predictable option teams (the `Correct < 0.7` filter above excludes them as outliers), there appears to be very little relationship between play call predictability and offensive efficiency. However, a proper analysis would probably control for offensive talent. After all, play calling is likely to matter only on the margins: excellent play calling is unlikely to turn a team of bad players into a good offensive unit, and vice versa. Put differently, of two teams with equal raw talent, the one with quality play calling is likely to outperform the one with poor play calling.

Further analysis would also expand the play 'types' that can be predicted beyond simple run/pass. There are numerous 'types' of pass plays and run plays, often with very different intentions.