Predicting the Next Elite NFL Quarterbacks with Machine-Learning Using Only College Statistics
Our goal is to use machine learning to predict elite NFL quarterbacks based solely on their college statistics. Not merely good quarterbacks, but elite, Super Bowl-winning quarterbacks.
To predict who will be the next elite NFL quarterback coming out of the college ranks, we need a way to identify what makes an NFL quarterback elite. Once we know what an elite quarterback is, we can use those quarterbacks' college statistics to train machine-learning algorithms to predict which college quarterbacks will be elite.
Is it winning one or more Super Bowls? Is it yards? Is it touchdowns? Is it wins? Is it longevity and wins? Or is it something else? We think it is a combination of the above, and here is why.
Most teams today measure success by winning the Super Bowl. So are all Super Bowl-winning quarterbacks elite? Well, yes and no. Across the 21 Super Bowls from Super Bowl 30 through Super Bowl 50, only 14 quarterbacks have won. So we would call this a pretty elite group of 14 men. However, are they all elite when it comes to their NFL statistics? We think not. And if not, is there a way to predict how a quarterback will perform in the NFL based on college statistics? We think the analysis below will convince you that the answer is yes.
Super Bowl 30: Troy Aikman
Super Bowl 31: Brett Favre
Super Bowl 32: John Elway
Super Bowl 33: John Elway
Super Bowl 34: Kurt Warner
Super Bowl 35: Trent Dilfer
Super Bowl 36: Tom Brady
Super Bowl 37: Brad Johnson
Super Bowl 38: Tom Brady
Super Bowl 39: Tom Brady
Super Bowl 40: Ben Roethlisberger
Super Bowl 41: Peyton Manning
Super Bowl 42: Eli Manning
Super Bowl 43: Ben Roethlisberger
Super Bowl 44: Drew Brees
Super Bowl 45: Aaron Rodgers
Super Bowl 46: Eli Manning
Super Bowl 47: Joe Flacco
Super Bowl 48: Russell Wilson
Super Bowl 49: Tom Brady
Super Bowl 50: Peyton Manning
So looking at the list above and having the opportunity to pick any of the quarterbacks in their prime, putting aside rivalries and team loyalty, whom would you pick to most benefit your team? To assist you, here are some career stats for the above listed quarterbacks.
The columns are Super Bowl wins, career completion percentage, interceptions divided by touchdowns, wins, years in the league, and wins divided by years.
We would choose Tom Brady. Why? He has won 4 championships, has averaged 10.76 wins per year over his 17-year career, has an interception-to-TD ratio of 33%, and has completed almost 64% of his passes.
Not to disparage anyone, but if you had to rank all the players on the list, who would be last? Two players jump out at us immediately: Trent Dilfer and Brad Johnson, because both have very low win totals when divided by their years in the league. But if we had to pick the last choice, it would be Dilfer, because he has the lowest completion percentage and has more interceptions than touchdowns.
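The two derived columns in this comparison are simple ratios. The sketch below shows one way to compute them; the function names are ours, the 183-win total is back-calculated from the 10.76 average quoted above rather than taken from the stats table, and the interception and touchdown counts are round placeholder numbers.

```python
def wins_per_year(wins: int, years: int) -> float:
    """Average number of wins per season in the league."""
    if years <= 0:
        raise ValueError("years must be positive")
    return wins / years


def int_td_ratio(interceptions: int, touchdowns: int) -> float:
    """Interceptions per touchdown pass; above 1.0 means more INTs than TDs."""
    if touchdowns <= 0:
        raise ValueError("touchdowns must be positive")
    return interceptions / touchdowns


# Assumption: 183 wins is back-calculated from the quoted 10.76 average.
print(round(wins_per_year(183, 17), 2))   # 10.76

# Placeholder counts, chosen only to produce a one-in-three ratio.
print(round(int_td_ratio(100, 300), 2))   # 0.33
```

A ratio above 1.0 from int_td_ratio flags a quarterback with more interceptions than touchdowns, which is the trait that puts Dilfer last in our ranking.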
Now, let’s employ a machine-learning technique called clustering to visualize how these quarterbacks relate based on the above stats. Below is the graphical representation of the relationships and then below that is the text version of clusters by name. We are only using the completion %, interception / TD ratio and wins / years ratio as the other columns skew the data.
Cluster 1: Aikman, Brees, Rodgers
Cluster 2: Brady, P Manning, Wilson
Cluster 3: Warner, Dilfer, Johnson
Cluster 4: Favre, Elway, Roethlisberger, E Manning, Flacco
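For readers who want to try this kind of grouping themselves, here is a minimal sketch using k-means. The article does not say which clustering algorithm produced the clusters above, so k-means is an assumption, as are the helper names. The six rows are made-up placeholder values for hypothetical players, not the career stats of the quarterbacks listed, and the example uses two clusters only to keep the toy data readable.

```python
import math
import random


def standardize(rows):
    """Rescale each column to zero mean and unit variance so no stat dominates."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder rows: (completion %, INT/TD ratio, wins per year).
stats = {
    "QB A": (65.0, 0.35, 10.5),
    "QB B": (64.0, 0.38, 10.0),
    "QB C": (66.0, 0.33, 11.0),
    "QB D": (56.0, 0.95, 5.0),
    "QB E": (55.0, 1.05, 4.5),
    "QB F": (57.0, 0.90, 5.5),
}

names = list(stats)
labels = kmeans(standardize([stats[n] for n in names]), k=2)
for cluster_id in sorted(set(labels)):
    members = [n for n, lab in zip(names, labels) if lab == cluster_id]
    print(f"Cluster {cluster_id + 1}: {', '.join(members)}")
```

Standardizing first matters because completion percentage and wins per year are on much larger scales than the interception ratio. In practice a library implementation such as scikit-learn's KMeans would be the more usual choice.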
Interestingly, we picked Brady as our top QB and Dilfer as our last QB. Brady is in Cluster 2, which includes two multiple Super Bowl winners. Its 3 quarterbacks have 7 wins between them, an average of 2.33 wins per player. Dilfer, however, is clustered with Johnson and Warner; that cluster has 3 wins, or 1.00 per player.
Here is a breakdown of Super Bowl (SB) wins by cluster:
Cluster 1: Aikman, Brees, Rodgers 1.67 wins per player
Cluster 2: Brady, P Manning, Wilson 2.33 wins per player
Cluster 3: Warner, Dilfer, Johnson 1.00 wins per player
Cluster 4: Favre, Elway, Roethlisberger, E Manning, Flacco 1.60 wins per player
Remember that SB wins are not factored into the clustering so it is significant that the clustering algorithm puts the players together in a way where Cluster 2 has significantly more wins than the others. We think that most of you would agree that Clusters 1, 2 & 4 are made up of elite NFL quarterbacks.
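The per-cluster averages are easy to recompute. In the sketch below, the cluster membership comes from the list above, while the career Super Bowl win counts (through Super Bowl 50) are filled in from the public record rather than from our stats table, so treat them as an assumption. The function name is ours.

```python
# Career Super Bowl wins as a starting QB, through Super Bowl 50 (assumption:
# taken from the public record, not from the article's table).
sb_wins = {
    "Aikman": 3, "Brees": 1, "Rodgers": 1,
    "Brady": 4, "P Manning": 2, "Wilson": 1,
    "Warner": 1, "Dilfer": 1, "Johnson": 1,
    "Favre": 1, "Elway": 2, "Roethlisberger": 2, "E Manning": 2, "Flacco": 1,
}

# Cluster membership as listed for the NFL-stat clustering.
nfl_clusters = {
    1: ["Aikman", "Brees", "Rodgers"],
    2: ["Brady", "P Manning", "Wilson"],
    3: ["Warner", "Dilfer", "Johnson"],
    4: ["Favre", "Elway", "Roethlisberger", "E Manning", "Flacco"],
}


def wins_per_player(clusters, wins):
    """Average Super Bowl wins per quarterback for each cluster."""
    return {cid: sum(wins[name] for name in names) / len(names)
            for cid, names in clusters.items()}


averages = wins_per_player(nfl_clusters, sb_wins)
for cid, avg in averages.items():
    print(f"Cluster {cid}: {avg:.2f} wins per player")
```

The same function works for any other grouping of these quarterbacks, as long as every name in the cluster lists appears in sb_wins.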
Now that we have identified elite quarterbacks based on their NFL stats, the next step will be to see if we can find any relationships between these players’ college stats that would help predict how they might perform in the NFL. Simply put, can we put together clusters of the same quarterbacks, in the same or similar way as above, using only their college statistics?
Below is a list of college statistics for the above quarterbacks. The columns are completion %, INT/TD ratio, QB rating, and total TDs. Please note that we could not get college statistics for Kurt Warner, so he is not included in this analysis.
We will now use the same clustering algorithm we used above using only the college statistics for each SB winner. Below is a graphical representation of the clustering and the text version of the clusters.
Cluster 1: Favre, Dilfer
Cluster 2: Johnson
Cluster 3: Elway, Roethlisberger, P Manning, E Manning, Flacco, Brees, Wilson
Cluster 4: Aikman, Brady, Rodgers
Remember, this clustering is based only on college statistics, and it looks very similar to the clustering based on their NFL statistics.
Interestingly, the clustering algorithm still separates out Dilfer and Johnson based on their college stats. This time, however, it groups Favre with Dilfer. Favre's college stats in the above list are not impressive, so this makes sense.
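One way to put a number on how closely the two groupings agree is the Rand index: the share of player pairs that are either together in both clusterings or apart in both. This measure is our suggestion and was not part of the original analysis. The cluster assignments below are copied from the two lists above, with Warner left out because he has no college stats; the function name is ours.

```python
from itertools import combinations

# Cluster number for each quarterback in the NFL-stat clustering (Warner omitted).
nfl = {
    "Aikman": 1, "Brees": 1, "Rodgers": 1,
    "Brady": 2, "P Manning": 2, "Wilson": 2,
    "Dilfer": 3, "Johnson": 3,
    "Favre": 4, "Elway": 4, "Roethlisberger": 4, "E Manning": 4, "Flacco": 4,
}

# Cluster number for each quarterback in the college-stat clustering.
college = {
    "Favre": 1, "Dilfer": 1,
    "Johnson": 2,
    "Elway": 3, "Roethlisberger": 3, "P Manning": 3, "E Manning": 3,
    "Flacco": 3, "Brees": 3, "Wilson": 3,
    "Aikman": 4, "Brady": 4, "Rodgers": 4,
}


def rand_index(a, b):
    """Fraction of pairs on which two clusterings agree (same vs. different cluster)."""
    names = sorted(set(a) & set(b))
    pairs = list(combinations(names, 2))
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


print(f"Rand index: {rand_index(nfl, college):.2f}")
```

A value of 1.0 means the two groupings are identical. The plain Rand index is not corrected for chance agreement, so a chance-adjusted variant (for example scikit-learn's adjusted_rand_score) is a more conservative check.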
We think Cluster 3 is interesting as it includes 4 multiple SB winners, but Cluster 4 captures 8 SB wins among 3 players, or 2.67 SB wins per player.
What this analysis demonstrates is that college quarterback statistics do have value in predicting the next elite NFL quarterback.
Now, let's put this all together and try to make some predictions about other quarterbacks. We will use the elite quarterbacks identified above to train a machine-learning algorithm on their college statistics. We also need to identify quarterbacks we want to avoid, so we can use their college stats to teach the algorithm what type of quarterback we do not want to draft. For this we will identify highly touted players and high draft picks who did not have success in the NFL.
Below is the list of quarterbacks we compiled for elite and not elite to train the algorithm:
Quarterbacks to Avoid
Having used the college stats of the quarterbacks on each list above to train our algorithm, we can now feed in other quarterbacks' college stats (some current starters and some recent draft picks) and start predicting how they rate in relation to the two lists above. If they are on the “yes” list below, there is a high probability they will be elite. If they are on the “no” list below, there is a high probability that they will not be elite. The number next to the “yes” or “no” is the probability that they belong on that list.
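To make the train-then-predict step concrete, here is a minimal sketch of a two-class model that returns a “yes” or “no” together with a probability. The article does not name the algorithm we used, so logistic regression is an assumption here, as are all function names. Every number in the training rows is a made-up placeholder for a hypothetical player, not a real quarterback's college stats.

```python
import math


def fit_scaler(rows):
    """Column means and standard deviations, learned from the training rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(rows, labels, epochs=2000, lr=0.1):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(row, w, b):
    """Return ("yes" or "no", probability of belonging to that list)."""
    p_yes = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
    return ("yes", p_yes) if p_yes >= 0.5 else ("no", 1.0 - p_yes)


# Placeholder rows: (completion %, INT/TD ratio, QB rating, total TDs).
elite = [(64.0, 0.35, 150.0, 70), (62.0, 0.40, 145.0, 65), (66.0, 0.30, 155.0, 80)]
avoid = [(54.0, 0.80, 120.0, 40), (56.0, 0.75, 125.0, 45), (53.0, 0.90, 115.0, 35)]

means, stds = fit_scaler(elite + avoid)
X = [scale(r, means, stds) for r in elite + avoid]
y = [1] * len(elite) + [0] * len(avoid)
w, b = train(X, y)

# A hypothetical prospect, scaled with the training-set means and deviations.
prospect = scale((63.0, 0.38, 148.0, 68), means, stds)
label, prob = predict(prospect, w, b)
print(f"{label} {prob:.2f}")
```

The new player is scaled with the means and deviations learned from the training lists, so the prediction depends only on those lists and on the player's own college numbers. With so few training rows, the probabilities from a model like this should be read as rough rankings rather than calibrated odds.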
Below are the results…
Remember, the above “yes” and “no” lists are based only on college statistics of the players as they existed on draft day.
We have also run this algorithm on the 2017 QB draft class, and the results on who is projected to be elite are very interesting. If you would like to work with us on the draft or consult with us on any other machine-learning projects, please let us know.
We are looking for one or two NFL teams to work with under an exclusive agreement to share our data and analysis.
Visit us at: MachineLearningConsultant.com