
Era Adjustments (Part 3)

All data via nhl.com

In this update, I have added the first 50 NHL seasons to my data and switched the source to the newest NHL.com stats, which are described as more accurate. In summary, I now have adjusted points values for every NHL season using the most accurate point totals available. I also created some new visuals and a hall of fame score, which are explained below.

As a note, for seasonal and career stats, I only used seasons in which the player appeared in at least half of the possible games (GP >= max possible GP / 2). This was done to ensure the distribution represented NHL regulars. As a result, you may notice some seasons are missing or point totals are off (ex. Marcel Dionne's '88-89 was omitted from analysis and totals because he only played in 37 games, and 37 < 80/2).
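To make the cutoff concrete, here is a minimal Python sketch of the half-season rule. The function name, the dictionary layout, and the second player are hypothetical and only there for illustration; the Dionne row uses the numbers quoted above.

```python
def is_qualifying_season(games_played, max_possible_games):
    """Return True when the player appeared in at least half of the possible games."""
    return games_played >= max_possible_games / 2


# Illustrative rows: the first is the Dionne example from the text,
# the second is a made-up player sitting exactly on the cutoff.
seasons = [
    {"player": "Marcel Dionne", "season": "1988-89", "gp": 37, "max_gp": 80},
    {"player": "Hypothetical Player", "season": "1988-89", "gp": 40, "max_gp": 80},
]

# Keep only the seasons that pass the GP >= max GP / 2 rule.
kept = [s for s in seasons if is_qualifying_season(s["gp"], s["max_gp"])]

for s in kept:
    print(s["player"], s["season"], s["gp"])
```

Because the comparison is `>=`, a player with exactly half of the possible games (40 of 80) is kept, while Dionne's 37 games fall short.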

Here are updated graphs from my previous parts. The HOF analysis and the updated Tableau are below.


As you may notice, the earlier seasons of the NHL were much more volatile, which we should expect as the league was just forming and many variables were changing during this time, such as GP, # of players, # of teams, etc. The trends mentioned in earlier parts appear much less pronounced, but multipliers for top players remained in the same general range.

Here is the graph of top seasons in history, still dominated by post-expansion players. Most players receive a negative adjustment, which is displayed as points under the white line y=x. The one cluster of points forming a trend at a greater slope (where 25 pts. -> 60 adjusted pts.) consists of players from the league's first several seasons, when the scoring rate was much lower.

Similarly, here are the career leaders. The 80s still dominate, in part because seasons and careers were generally longer than in the past (so more total GP) and because the newest era of players hasn't retired yet. Nonetheless, there is a much better mix of talent, which helps achieve the original goal of adjusting points.


Just as last time, here are players visualized by hall of fame status. We see that adjusted points generally do a good job of distinguishing which players make the HOF, especially for forwards. Most players in the top right region are either in the HOF or ineligible.

As this appeared to be a good indicator, I created a HOF likelihood using logistic regression with the data. The model takes GP, PPG, adj. P, and Position (F/D) as inputs and was trained on all eligible careers (including HOFers). The result gives the likelihood a player will make the HOF according to the data. Although position was a statistically significant variable, defensemen simply aren't judged by points on the same level as forwards, so the model had a tough time predicting defensemen. On the other hand, the results for forwards appear to have much more value. Overall, HOF status can't be fully predicted with so few variables, as voters place emphasis on awards won as well as international play, which are both outside the current scope. In Stat Shot by Rob Vollman, a HOF model is developed which accounts for these variables and performs very well.
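For readers curious what a model like this looks like in practice, here is a rough, self-contained Python sketch of logistic regression on the four inputs named above. It is not the code behind the scores in this post: the career rows are invented for illustration, the fitting is plain gradient descent on standardized features, and every function name here is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Scale each column to mean 0 and unit variance; return scaled rows and the stats."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hof_likelihood(features, w, b, means, stds):
    """Apply the fitted model to one career: (GP, PPG, adj. P, is_forward)."""
    scaled = [(v - m) / s for v, m, s in zip(features, means, stds)]
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, scaled)))


# Made-up careers: (GP, PPG, adjusted points, 1 if forward else 0).
careers = [
    (1400, 1.10, 1500, 1), (1200, 0.95, 1100, 1), (1100, 0.70, 800, 0),
    (1300, 0.65, 850, 0), (900, 0.55, 480, 1), (700, 0.40, 300, 1),
    (800, 0.30, 250, 0), (1000, 0.60, 620, 1),
]
in_hof = [1, 1, 1, 1, 0, 0, 0, 0]  # made-up labels

X, means, stds = standardize(careers)
w, b = fit_logistic(X, in_hof)

strong = hof_likelihood((1400, 1.20, 1600, 1), w, b, means, stds)
weak = hof_likelihood((600, 0.30, 200, 1), w, b, means, stds)
print(round(strong, 3), round(weak, 3))
```

In this sketch, "applying the model to ineligible players" just means calling `hof_likelihood` on careers that were not part of the training rows. With real data, a statistics package would also report significance for each input, which this toy version does not.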




As shown in the visual, the model was applied to ineligible players to get a sense of who we should expect to make the HOF in future years. Here are the top 20s separated by position. Defensemen were given much lower likelihoods than forwards as a group, but Erik Karlsson stands out as a near lock. Zubov and Gonchar should eventually make their way in, as expected. The long list of eligibles following the top group is much less likely to get in, both by the model's predictions and by looking at the names. Several are past the point of eligibility and the rest are a group of fringe contenders. Towards the bottom we see Letang leading the group of current players. Other active players who (in my opinion) have a real chance of entering the HOF include Zdeno Chara (0.38) and Drew Doughty (0.22). Their low scores show again that, without award recognition, defensive defensemen are ignored.

The forwards model provides a much clearer picture of what to expect. The top 5 are complete locks by the model and by general public opinion. In the next group are several players who are nearly identical stat-wise to many of the early inductees, but my hockey history knowledge cannot explain why they are omitted and others are not. The next grouping is a list of current era players, most of whom will likely be strong candidates for the HOF in the future. McDavid stands out because of his very strong PPG numbers. When setting a minimum for games played, the list is pretty much what is expected from this current batch of players, and Herberts, Denneny, and Carson are omitted.

Here are some scores for players in the HOF as a basis for comparison:
Gretzky - 0.9999280 (Highest F)
Orr - 0.9988738 (Highest D)
Lidstrom - 0.9550488 (5th D)
Maurice Richard - 0.9508771 (17th F)
Leetch - 0.9231727 (8th D)
Selanne - 0.9199010 (21st F)
Lindros - 0.8099994 (42nd F)
Hawerchuk - 0.7659057 (48th F)
Bure - 0.6784484 (65th F)
Pronger - 0.66935832 (20th D)
The Worst Scores:

Bob Gainey - 0.002817570 (Lowest F)
Jack Stewart - 0.003486746 (Lowest D)


Finally, here is the updated Tableau for NHL careers. If you mouse over a player, you can see their HOF score and the other data used in the model.


