
Can We Predict NHL Success from World Juniors Performance?

Every year during the World Juniors, we hear two prevailing opinions on prospect analysis from fans and analysts: 1) it's too small a sample to mean anything, and 2) this player will excel in the NHL because of his WJC performance. Obviously, there is a middle ground between these two perspectives, but to my knowledge, it hasn’t been explored in great detail. In this piece, I’ll look at what we can learn from World Juniors data.

To perform this analysis, I’ll be using NHL career data from Rob Vollman (I’m using a slightly older version cut off at the end of the 16-17 season), World Juniors data collected from Elite Prospects, and draft data via Hockey Reference. The first decision I made was to consider only the World Junior performance of players who have been drafted. Although this is limiting, I think it’s safe to restrict the data to players truly considered NHL prospects.


The first exploratory step is to look at the relationship between World Juniors stats and NHL stats. In this correlation plot, we can see that WJC goals, points, and points per game have only weak predictive value for NHL pts/gp (R of 0.32, 0.38, and 0.37 respectively, or R^2 of 0.10, 0.14, and 0.13). This is a weak relationship, though it may improve once we account for the small sample and the many confounding factors in play.
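For anyone who wants to reproduce this kind of check, here is a minimal Python sketch of how R and R^2 can be computed for one pair of columns. The numbers in `wjc_ppg` and `nhl_ppg` are made-up placeholders, not values from the dataset, and `pearson_r` is just an illustrative helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical player-seasons: WJC points per game vs. NHL points per game.
wjc_ppg = [0.2, 0.5, 0.8, 1.0, 1.4, 0.3]
nhl_ppg = [0.35, 0.20, 0.60, 0.40, 0.75, 0.45]

r = pearson_r(wjc_ppg, nhl_ppg)
print(f"R = {r:.2f}, R^2 = {r * r:.2f}")
```

R^2 is simply R squared, which is why correlations in the 0.3 to 0.4 range explain only about a tenth of the variation in NHL scoring.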

I chose to examine this relationship further by graphing WJC points per game against NHL points per game. Here we get some more interesting information and can also dig into the flaws. Splitting the data by when the player participated in the World Juniors, we learn that the most predictive data comes from players participating two years before their draft season (usually exceptional talents) and players participating two years after their draft season (by this point they are realized talents). We can also see from this graph that time period may have a significant influence on results. Recent draftees such as Pavel Zacha and Jesse Puljujarvi haven’t established themselves yet, and older players such as Peter Forsberg and Jeremy Roenick played in a tournament arguably much different from the one played today.

As such, I limited the data to players participating in the World Juniors between 2000 and 2010, and furthermore selected only players from the top nine countries in the dataset: Canada, Czech Republic, Czechoslovakia, Finland, Russia, Slovakia, Sweden, Switzerland, USA. The other countries participated in the tournament infrequently and rarely had players drafted into the NHL.
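The filter described above could look roughly like this in Python. The row layout (`player`, `season`, `team` keys) and the function name are assumptions for illustration, and I treat the 2000 to 2010 range as inclusive on both ends.

```python
# Countries kept in the analysis, as listed in the post.
TOP_COUNTRIES = {
    "Canada", "Czech Republic", "Czechoslovakia", "Finland", "Russia",
    "Slovakia", "Sweden", "Switzerland", "USA",
}


def filter_player_seasons(rows, first_season=2000, last_season=2010):
    """Keep rows inside the season window (inclusive) and from a top country."""
    return [
        row for row in rows
        if first_season <= row["season"] <= last_season
        and row["team"] in TOP_COUNTRIES
    ]


# Hypothetical rows; the field names are placeholders, not the real schema.
rows = [
    {"player": "Player A", "season": 1992, "team": "Sweden"},
    {"player": "Player B", "season": 2005, "team": "Canada"},
    {"player": "Player C", "season": 2008, "team": "Kazakhstan"},
    {"player": "Player D", "season": 2010, "team": "Switzerland"},
]

kept = filter_player_seasons(rows)
print([row["player"] for row in kept])
```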


With this data, we can see a slightly stronger overall relationship than was seen before. As I mentioned earlier, draft-minus-2 players are rare, and it appears countries have shied away from using these players in recent seasons. Overall, there still isn’t much of a relationship, and it appears that WJC performance doesn’t meaningfully differentiate between levels of NHL performance.

I then realized that just by making a WJC roster, a prospect is setting himself apart from his peers, and we should expect that, on average, any player in the WJC is more likely to have an NHL career than one who is not.

With that information at hand, I decided to fit a logistic regression model for the probability that a WJC prospect plays more than 200 NHL games. As inputs, I used points per game, the percentage of the team’s points scored (to account for overall team strength), position (forward/defense), draft status, and team. I chose to include team because it is harder for prospects of equal skill level to make the roster for Canada than for, say, Switzerland. Looking at the coefficients, we can see that the team factors, point percentage, and position are statistically significant. Canada does not appear in the table because it is the reference level, so each team coefficient is measured relative to Canada. One of the most interesting results is that playing for Canada significantly increases a prospect’s odds of playing 200 GP, while the USA, and to a lesser extent the Czech Republic and Sweden, come next and the rest of the countries lag behind.

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            -10.8885   601.9339  -0.018  0.98557
PPG_wjc                 -0.3695     0.4348  -0.850  0.39541
Ppct_wjc                21.7979     4.4817   4.864 1.15e-06 ***
Position_wjcF           -0.5942     0.2226  -2.670  0.00759 **
status-1                12.3682   601.9340   0.021  0.98361
status0                 12.3891   601.9338   0.021  0.98358
status1                 11.8024   601.9338   0.020  0.98436
status2                 11.0607   601.9339   0.018  0.98534
Team_wjcCzech Republic  -2.5376     0.3841  -6.606 3.95e-11 ***
Team_wjcFinland         -3.3896     0.4026  -8.420  < 2e-16 ***
Team_wjcRussia          -3.4491     0.4325  -7.974 1.54e-15 ***
Team_wjcSlovakia        -3.8334     0.5115  -7.494 6.70e-14 ***
Team_wjcSweden          -2.7932     0.3533  -7.907 2.63e-15 ***
Team_wjcSwitzerland     -4.2073     0.6486  -6.487 8.76e-11 ***
Team_wjcUSA             -1.4367     0.2913  -4.932 8.12e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
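To make the table a little more concrete, here is a rough Python sketch that turns a few of the coefficients into odds ratios and scores one hypothetical stat line. It rests on several assumptions that the post does not spell out: that Canada is the reference team and defense the reference position, that `Ppct_wjc` is a fraction between 0 and 1, and that `status0` marks a draft-year appearance. The function names and the example stat line are made up for illustration.

```python
import math

# A subset of the coefficients, copied from the table above.
COEF = {
    "(Intercept)": -10.8885,
    "PPG_wjc": -0.3695,
    "Ppct_wjc": 21.7979,
    "Position_wjcF": -0.5942,
    "status0": 12.3891,
    "Team_wjcUSA": -1.4367,
    "Team_wjcSweden": -2.7932,
    "Team_wjcSwitzerland": -4.2073,
}


def odds_ratio(term):
    """Multiplicative change in the odds relative to the reference level."""
    return math.exp(COEF[term])


def gp200_probability(ppg, ppct, forward, status_term, team_term=None):
    """Predicted probability of 200+ NHL GP for one player-season.

    team_term=None means the assumed reference team (Canada).
    """
    logit = (COEF["(Intercept)"]
             + COEF["PPG_wjc"] * ppg
             + COEF["Ppct_wjc"] * ppct
             + COEF[status_term])
    if forward:
        logit += COEF["Position_wjcF"]
    if team_term is not None:
        logit += COEF[team_term]
    # Inverse logit (sigmoid) maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical forward: 1.0 PPG, 8% of his team's points, draft-year season.
canada = gp200_probability(1.0, 0.08, True, "status0")
swiss = gp200_probability(1.0, 0.08, True, "status0", "Team_wjcSwitzerland")
print(f"Canada: {canada:.3f}  Switzerland: {swiss:.3f}")
print(f"USA odds ratio vs. Canada: {odds_ratio('Team_wjcUSA'):.3f}")
```

Under those assumptions, the same stat line scores far lower for Switzerland than for Canada, which is the team effect described above. The very large standard errors on the intercept and status terms suggest those particular estimates are poorly determined, so I would not read much into the individual status values.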

We can also measure how well the model separates the two groups by looking at the area under its ROC curve (AUC), which comes out to ~0.75. This isn’t the strongest result, but it shows that some rather simple information from the World Juniors is reasonably indicative of whether a player will reach 200 NHL games.
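For readers unfamiliar with AUC, one way to think about it is as the share of (successful, unsuccessful) player pairs in which the model gives the successful player the higher probability. Here is a small Python sketch of that pairwise definition; the labels and scores are invented, and `roc_auc` is an illustrative helper rather than the code used for the model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (reached 200 GP?) and model probabilities.
played_200 = [True, True, False, True, False, False]
predicted = [0.91, 0.60, 0.72, 0.45, 0.30, 0.10]

auc = roc_auc(played_200, predicted)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 means the model ranks pairs no better than a coin flip, and 1.0 means every player who reached the threshold was ranked above every player who did not.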

In conclusion, I think we can learn several interesting things about prospects from World Juniors data. First and most importantly, simply making a World Junior team for one of the top countries is quite an accomplishment and greatly improves the probability that the prospect will play over 200 NHL games. I’d recommend that prospect models take this information into serious consideration when projecting NHL success. Secondly, World Junior stats on their own aren’t very predictive, and we shouldn’t rave about a prospect or consider him the next big thing because of how he did in one major tournament. The WJC is certainly fun to watch, and scouts may be able to learn a lot, but just looking at player stats isn’t helpful.

I also thought I’d share the top 25 pre-draft player-seasons from the model, most of whom have seen pretty significant NHL success.

             Player Season_wjc GP_threshold   GP_prob
1      Brandon Reid       2000        FALSE 0.9891933
2        Jeff Taffe       2000        FALSE 0.9676776
3    Steven Stamkos       2008         TRUE 0.9470016
4      John Tavares       2009         TRUE 0.9453516
5      Drew Doughty       2008         TRUE 0.9437260
6      Jason Spezza       2001         TRUE 0.9389219
7      John Tavares       2008         TRUE 0.9265671
8      Dany Heatley       2000         TRUE 0.9249912
9    Matt Pettinger       2000         TRUE 0.9249912
10      Phil Kessel       2006         TRUE 0.9218822
11       Ryan Ellis       2009         TRUE 0.9173350
12     Patrick Kane       2007         TRUE 0.9149276
13      Taylor Hall       2010         TRUE 0.9141479
14    Sidney Crosby       2005         TRUE 0.9116260
15 Jordan Schroeder       2008        FALSE 0.9097607
16     Mark Popovic       2001        FALSE 0.8916862
17  Jay Bouwmeester       2001         TRUE 0.8896476
18  Scottie Upshall       2002         TRUE 0.8834741
19     Jarret Stoll       2002         TRUE 0.8834741
20     Danny Syvret       2005        FALSE 0.8782902
21     Colin Wilson       2008         TRUE 0.8777814
22    Sidney Crosby       2004         TRUE 0.8707736
23      Karl Alzner       2007         TRUE 0.8700641
24  Jay Bouwmeester       2002         TRUE 0.8667440
25      Dan Hamhuis       2001         TRUE 0.8588982

