Every year during the World Juniors, we hear two prevailing opinions on prospect analysis from fans and analysts: 1) the sample is too small to mean anything, and 2) this player will excel in the NHL because of his WJC performance. Obviously, there is a middle ground between these two perspectives, but to my knowledge, it hasn’t been explored in great detail. In this piece, I’ll be looking at what we can learn from World Juniors data.
To perform this analysis, I’ll be using NHL career data from Rob Vollman (a slightly older version, cut off at the end of the 16-17 season), World Juniors data collected from Elite Prospects, and draft data via Hockey Reference. The first decision I made was to consider only the World Junior performance of players who have been drafted. Although this is limiting, I think it’s reasonable to restrict the data to players truly considered NHL prospects.
The first exploratory step is to look at the relationship between World Juniors stats and NHL stats. In this correlation plot, we can see that WJC goals, points, and points per game are only weakly predictive of NHL pts/gp (R of 0.32, 0.38, and 0.37 respectively, or R^2 of 0.10, 0.14, and 0.13). The relationship may strengthen once we account for the small sample and the many confounding factors in play.
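For anyone who wants to run this kind of check themselves, here is a minimal Python sketch of how R and R^2 are computed. The player values are made-up placeholders rather than figures from the dataset, and `pearson_r` is a helper written just for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical WJC points per game and NHL points per game for five players.
wjc_ppg = [0.4, 1.2, 0.8, 1.5, 0.6]
nhl_ppg = [0.30, 0.55, 0.20, 0.90, 0.45]

r = pearson_r(wjc_ppg, nhl_ppg)
r_squared = r ** 2  # share of variance in NHL pts/gp explained by WJC pts/gp
print(f"R = {r:.2f}, R^2 = {r_squared:.2f}")
```

Squaring R is what turns a correlation of about 0.38 into only about 14% of variance explained, which is why these relationships count as weak.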
I chose to further examine this relationship by graphing WJC points per game vs. NHL points per game. Here we can get some more interesting information, as well as dig further into the flaws. When splitting the data by when the player participated in the World Juniors relative to his draft, we learn that the data is most predictive for players participating two years before their draft season (usually very exceptional talents) and two years after their draft season (by this point they are realized talents). We can also see from this graph that time period may have a significant influence on results. Recent draftees such as Pavel Zacha and Jesse Puljujarvi haven’t established themselves up to this point, and older players such as Peter Forsberg and Jeremy Roenick played in a tournament arguably much different from the one played today.
As such, I limited the data to players participating in the World Juniors between 2000 and 2010, and furthermore selected only players from the top nine countries in the dataset: Canada, Czech Republic, Czechoslovakia, Finland, Russia, Slovakia, Sweden, Switzerland, USA. The other countries participated in the tournament infrequently and rarely had players drafted into the NHL.
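The filter described above could be sketched in Python as follows. The record layout and field names (`season`, `team`) are assumptions for illustration, the rows are invented, and I am assuming the 2000 to 2010 window is inclusive at both ends.

```python
TOP_COUNTRIES = {
    "Canada", "Czech Republic", "Czechoslovakia", "Finland", "Russia",
    "Slovakia", "Sweden", "Switzerland", "USA",
}

def keep_row(row, first_season=2000, last_season=2010):
    """True if the player-season is inside the study window and country list."""
    return (first_season <= row["season"] <= last_season
            and row["team"] in TOP_COUNTRIES)

# Hypothetical player-season records; the field names are assumptions.
rows = [
    {"player": "Player A", "season": 1992, "team": "Sweden"},
    {"player": "Player B", "season": 2005, "team": "Canada"},
    {"player": "Player C", "season": 2008, "team": "Kazakhstan"},
    {"player": "Player D", "season": 2010, "team": "Finland"},
]

filtered = [row for row in rows if keep_row(row)]
print([row["player"] for row in filtered])
```

Here Player A is dropped for being outside the window and Player C for playing for a country outside the list, leaving Players B and D.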
With this data, we can see a slightly stronger overall relationship than was seen before. As I mentioned earlier, draft-minus-2 players are rare, and it appears countries have shied away from using these players in recent seasons. Overall, there still isn’t much of a relationship, and it appears that WJC performance doesn’t distinguish future NHL performance at all.
I then came to the realization that just by making a WJC roster, a prospect is setting himself apart from his peers, and we should expect that, on average, any player in the WJC is more likely to have an NHL career than one who is not.
With that in mind, I built a logistic regression model of the probability that a WJC prospect plays more than 200 NHL games. As inputs, I used points per game, the percentage of his team’s points that the player scored (to account for overall team strength), position (forward/defense), draft status, and team. I chose to include team because it is harder for prospects of equal skill level to make Canada’s roster than, say, Switzerland’s. Looking at the coefficients, we can see that the team factors, point percentage, and position are statistically significant, while points per game is not. One of the most interesting results is that playing for Canada (the baseline that the other team coefficients are measured against) significantly increases a prospect’s odds of playing 200 GP, while playing for USA or Sweden has some value and the rest of the countries lag behind.
Coefficients:
                       Estimate Std. Error z value Pr(>|z|)
(Intercept)            -10.8885   601.9339  -0.018  0.98557
PPG_wjc                 -0.3695     0.4348  -0.850  0.39541
Ppct_wjc                21.7979     4.4817   4.864 1.15e-06 ***
Position_wjcF           -0.5942     0.2226  -2.670  0.00759 **
status-1                12.3682   601.9340   0.021  0.98361
status0                 12.3891   601.9338   0.021  0.98358
status1                 11.8024   601.9338   0.020  0.98436
status2                 11.0607   601.9339   0.018  0.98534
Team_wjcCzech Republic  -2.5376     0.3841  -6.606 3.95e-11 ***
Team_wjcFinland         -3.3896     0.4026  -8.420  < 2e-16 ***
Team_wjcRussia          -3.4491     0.4325  -7.974 1.54e-15 ***
Team_wjcSlovakia        -3.8334     0.5115  -7.494 6.70e-14 ***
Team_wjcSweden          -2.7932     0.3533  -7.907 2.63e-15 ***
Team_wjcSwitzerland     -4.2073     0.6486  -6.487 8.76e-11 ***
Team_wjcUSA             -1.4367     0.2913  -4.932 8.12e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
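To make the table easier to read, here is a rough Python sketch that turns the coefficients above into odds ratios and predicted probabilities. It rests on a few assumptions: that Canada is the reference team (it has no row of its own), that `Ppct_wjc` is a fraction between 0 and 1, and that the example player is invented. The status terms carry very large standard errors, so treat the output as illustrative only.

```python
import math

# Coefficients copied from the table above (log-odds scale).
INTERCEPT = -10.8885
COEF = {
    "PPG_wjc": -0.3695,
    "Ppct_wjc": 21.7979,
    "Position_wjcF": -0.5942,
}
STATUS = {-1: 12.3682, 0: 12.3891, 1: 11.8024, 2: 11.0607}
TEAM = {
    "Canada": 0.0,  # assumed reference level (it has no row in the table)
    "Czech Republic": -2.5376, "Finland": -3.3896, "Russia": -3.4491,
    "Slovakia": -3.8334, "Sweden": -2.7932, "Switzerland": -4.2073,
    "USA": -1.4367,
}

def gp_probability(ppg, ppct, is_forward, status, team):
    """Modelled probability of playing more than 200 NHL games."""
    z = (INTERCEPT
         + COEF["PPG_wjc"] * ppg
         + COEF["Ppct_wjc"] * ppct  # ppct assumed to be a 0-1 fraction
         + COEF["Position_wjcF"] * (1 if is_forward else 0)
         + STATUS[status]
         + TEAM[team])
    return 1.0 / (1.0 + math.exp(-z))  # logistic (inverse logit) function

# Odds ratio versus Canada for each team, all else being equal.
odds_ratio = {team: math.exp(beta) for team, beta in TEAM.items()}

# Hypothetical draft-year forward: 1.0 points per game, 10% of team points.
for team in ("Canada", "USA", "Switzerland"):
    p = gp_probability(1.0, 0.10, True, 0, team)
    print(f"{team}: odds ratio {odds_ratio[team]:.2f}, probability {p:.2f}")
```

Because the team coefficients are on the log-odds scale, exponentiating them gives the multiplicative change in odds relative to an otherwise identical Canadian player.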
We can also measure the model’s accuracy by looking at its area under the ROC curve (AUC), which comes out to ~0.75. This isn’t an especially strong score (0.5 is chance and 1.0 is perfect), but it shows that some rather simple information from the World Juniors is reasonably indicative of whether a player will reach 200 NHL games.
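For reference, AUC can be read as the chance that a randomly chosen player who reached 200 games gets a higher predicted probability than a randomly chosen player who did not. A small Python sketch of that definition follows; the outcomes and probabilities are invented, and `auc` is a helper written for this example rather than the function used in the original analysis.

```python
def auc(labels, scores):
    """Share of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical outcomes (True = played more than 200 NHL games)
# paired with hypothetical model probabilities.
played_200 = [True, True, False, True, False, False]
predicted = [0.91, 0.62, 0.70, 0.55, 0.30, 0.12]
print(f"AUC = {auc(played_200, predicted):.2f}")
```

In this toy data, 7 of the 9 positive/negative pairs are ordered correctly, so the AUC is about 0.78, close to the range the model reaches.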
In conclusion, I think we can learn several interesting things about prospects from World Juniors data. First and most importantly, simply making a World Junior team for one of the top countries is quite an accomplishment and greatly improves the probability that the prospect will play over 200 NHL games. I’d recommend that prospect models take this information into serious consideration when projecting NHL success. Secondly, World Junior stats on their own aren’t very predictive, and we shouldn’t rave about a prospect or consider him the next big thing because of how he did in a major tournament. The WJC is certainly fun to watch, and scouts may be able to learn a lot, but just looking at player stats isn’t helpful.
I also thought I’d share the top 25 pre-draft player-seasons from the model, most of whom have seen pretty significant NHL success.
   Player           Season_wjc GP_threshold GP_prob
1  Brandon Reid           2000        FALSE 0.9891933
2  Jeff Taffe             2000        FALSE 0.9676776
3  Steven Stamkos         2008         TRUE 0.9470016
4  John Tavares           2009         TRUE 0.9453516
5  Drew Doughty           2008         TRUE 0.9437260
6  Jason Spezza           2001         TRUE 0.9389219
7  John Tavares           2008         TRUE 0.9265671
8  Dany Heatley           2000         TRUE 0.9249912
9  Matt Pettinger         2000         TRUE 0.9249912
10 Phil Kessel            2006         TRUE 0.9218822
11 Ryan Ellis             2009         TRUE 0.9173350
12 Patrick Kane           2007         TRUE 0.9149276
13 Taylor Hall            2010         TRUE 0.9141479
14 Sidney Crosby          2005         TRUE 0.9116260
15 Jordan Schroeder       2008        FALSE 0.9097607
16 Mark Popovic           2001        FALSE 0.8916862
17 Jay Bouwmeester        2001         TRUE 0.8896476
18 Scottie Upshall        2002         TRUE 0.8834741
19 Jarret Stoll           2002         TRUE 0.8834741
20 Danny Syvret           2005        FALSE 0.8782902
21 Colin Wilson           2008         TRUE 0.8777814
22 Sidney Crosby          2004         TRUE 0.8707736
23 Karl Alzner            2007         TRUE 0.8700641
24 Jay Bouwmeester        2002         TRUE 0.8667440
25 Dan Hamhuis            2001         TRUE 0.8588982