I know Layne Vashro's draft models have come up on occasion here. Vashro is one of the statistical modelers of NBA matters whose predictions have received increasing attention over the past few years. Several of his models appear on the Nylon Calculus website here and in the "Our Stats" section here. You can also access more on his basketball homepage.
I've long been intrigued by his models as a tool for greater insights into draft prospects, but I've also had questions about how certain aspects of the models should be interpreted. I contacted him with these questions and he was kind enough both to respond and to allow me to post the responses on Jazzfanz. So for those who are interested in the models, I hope these questions and answers give more insight on how they might be used. If you find anything useful here, don't hesitate to send this Timberwolves fan and U. of Utah anthropologist a tweet thanking him for coming onto Jazzfanz.
NOTE: This is a very long post. If you're primarily interested in Vashro's take on players likely to be available to the Jazz, go to nearly the end of the post.
My understanding of your modeling is that you use the statistics and attributes of college (or Euro/international) players coming into previous drafts and statistically observe what kind of production those players produced in the NBA..... Is that basically correct?
Correct.
When using players’ NBA production to create the models that will allow you to make all of these predictions, what measure(s) of NBA production do you use? Why do you like this/these measure(s) over other possible measures?
This is a decision that is key to all of the models. I calculate an average of Win Shares and RAPM-wins (which is just RAPM converted into a version where wins are accrued with more minutes) for each player-season. I then roll those up into a two-year rolling average. A past player's "Win Peak" is then the highest point he ever reaches in that two-year rolling average. The goal is to get as uncontroversial a list of player rankings as possible. Here is that resulting list if you are curious. (BTW, any player who peaked before 1990 is measured only in WS, and any player whose career started before 1990 is not included in my models.)
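To make the "Win Peak" procedure concrete, here is a short Python sketch. The per-season numbers are made up purely for illustration; they are not Vashro's actual data:

```python
from statistics import mean

# Made-up (win_shares, rapm_wins) pairs for one player's five seasons.
seasons = [(2.0, 1.5), (5.0, 4.0), (9.0, 8.0), (8.0, 9.0), (6.0, 5.0)]

# Each season's value is the average of the two win metrics.
season_values = [mean(pair) for pair in seasons]

# Two-year rolling average across consecutive seasons.
rolling = [mean(season_values[i - 1:i + 1]) for i in range(1, len(season_values))]

# "Win Peak" is the player's best two-year stretch.
win_peak = max(rolling)  # 8.5 for the toy numbers above
```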
How is what you are trying to do different from or similar to what Kevin Pelton does in his draft projections? Is there more of a difference in philosophy or formula? Would you say that there’s typically more agreement between your models and his or between your models and NBA GMs’ assessments?
Fundamentally, our approaches are quite similar. The big difference is that Kevin calculates expected box-scores for each number separately, then uses that expected box-score to generate an expected value. I actually have a model that does this as well, but I haven't posted it publicly or even updated it for this season. I'm not sure whose model generally fits closer to GM consensus, but mine seems to be less controversial this season.
If I understand correctly, your EWP (Expected Wins Produced at NBA peak) is the model that your other models are based on, or at least the one that seems to get highlighted most often. Is that a fair assessment? And then the Humble model adds a variable into the EWP equation--and uses a somewhat different formula/procedure--that tries to simulate scouts'/draft boards' assessments of draft prospects. Correct? Is it fair to interpret this as a kind of average between the EWP model and scouts'/draft boards' assessments?
EWP is more the flagship. It is the simplest approach and the version I've been using the longest. The Humble is not quite as you describe, but it is functionally similar. The technical procedure is, as you note, somewhat different in that I use a machine learning algorithm rather than linear regression. You are also correct that the other key distinction is that I include a variable to capture NBA scout consensus; however, that variable is included alongside all of the other variables from the EWP model, not the EWP's ultimate output.
Could you tell more about how you get that extra variable? Is it based on many or a few sources?
I calculated the expected value of each pick over the past 25 seasons, then use mock draft rankings to assign that expected value to a "scouting" variable in the model. I previously used a combination of Ford and DX to get that expected rank, but currently I am only using DX, which I try to update whenever they do. I only stopped including Chad because I'm lazy.
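So the scouting variable is essentially a lookup from a prospect's mock-draft rank to the value that draft slot has historically produced. A minimal sketch, with invented slot values standing in for the 25 seasons of historical pick data:

```python
# Hypothetical expected peak wins by draft slot; the real values would be
# estimated from 25 seasons of historical picks.
pick_value = {1: 12.0, 2: 10.5, 3: 9.5, 4: 8.8, 5: 8.2}

def scouting_variable(mock_rank):
    """Map a prospect's mock-draft rank to the historical value of that slot."""
    # Ranks beyond the table fall back to the lowest tabulated slot value.
    return pick_value.get(mock_rank, min(pick_value.values()))
```

A prospect mocked 3rd would carry the historical value of the #3 pick into the model as his "scouting" feature.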
And then you have an “Average” column? This is the average of the Humble and EWP models, I take it. If the Humble model is something of an average already, how should we interpret this model?
Correct, it is just an average. As you probably gathered from the above, the Humble is actually a distinct output from the EWP. Typically, averaging across models gives you better predictions than any one model, so I would recommend treating the AVG values as the best projection.
Then you have a model that predicts the odds that a player will end up as a "bust," "bench" player, "starter," "stud," or "star" in the NBA (labels based on production only, not production in relation to expectations). Is the model/formula that gets you to these odds essentially based on the EWP, or is it rather different? If rather different, what's a simple way to understand the difference?..... How should we think about the sometimes significantly different results between these two models? Which of the several models do you put the most stock in?
Star% draws on the same basic data as EWP, but uses it in a different way. Rather than trying to project the exact expected value of a prospect, I split the historical players into bust/bnch/strt/stud/star based on benchmarks in their RAPM-wins and Win Shares (which is the thing I am trying to predict in all of my models). The model then gives the odds of landing at each of those levels. Honestly, I don't personally pay that much attention to this model, but people like it and it adds some sense of variance you don't get from the others. I think it should be viewed as a negative if a player struggles in the star% model even if he does well in the others, but I would give the EWP and HUM more weight.
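The bucketing step he describes — splitting historical players into tiers by their peak production — might look like the sketch below. The benchmark numbers are invented, since the interview doesn't give the real cutoffs:

```python
# Invented Win Peak cutoffs (wins) for each tier, highest first.
TIERS = [("star", 15.0), ("stud", 10.0), ("starter", 6.0), ("bench", 3.0)]

def tier(win_peak):
    """Label a historical player by the highest tier whose cutoff he clears."""
    for name, cutoff in TIERS:
        if win_peak >= cutoff:
            return name
    return "bust"
```

A classifier trained on these labels would then output bust/bench/starter/stud/star probabilities for each new prospect.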
How would you suggest using your player comparison model?
The comps are used to get a sense of the type of player you are looking at. The model also gives a value to help assess whether a player's NCAA #s are a "better or worse" version of the guys his numbers are most like. I recommend using this to help identify red flags or false flags. If a guy looks like a bunch of other players who looked good by the numbers but didn't work out, maybe the model fails on this type of player. Then if a guy looks bad, but in a similar way to previous successes, maybe you shouldn't be as concerned.
How do you deal with missing data, for example the prospects that do only some of the Combine testing, or don’t show up to the Combine at all?
I use an imputation approach... which basically means I make it up, but in a way that is informed by all of the data I do have.
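One common flavor of informed imputation is to predict the missing measurement from correlated measurements that were recorded. A toy example, filling a missing wingspan from height with a least-squares line (made-up numbers, and his actual procedure may well be more sophisticated):

```python
from statistics import mean

# Toy combine rows: (height_in, wingspan_in); None marks a skipped measurement.
rows = [(75, 79.0), (78, 83.0), (81, 86.0), (72, 76.0), (80, None)]

# Fit wingspan ~ height by least squares on the complete cases.
complete = [(h, w) for h, w in rows if w is not None]
mh = mean(h for h, _ in complete)
mw = mean(w for _, w in complete)
slope = (sum((h - mh) * (w - mw) for h, w in complete)
         / sum((h - mh) ** 2 for h, _ in complete))
intercept = mw - slope * mh

# Impute each missing wingspan from that player's height.
imputed = [(h, w if w is not None else slope * h + intercept) for h, w in rows]
```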
I’ve heard others (e.g., Ben Dowsett of Salt City Hoops) claim that your success rate is higher than the success rate of NBA GMs, though I’m not sure I’ve seen/heard you make that claim. If that’s your claim, can you tell us how you compare the success rates?
I haven't personally made this claim, because I don't think it is possible to do a perfect comparison, but I do have reason to believe it is true. One thing I have tried is retrodicting previous drafts and ranking players based on HUM and EWP outputs, then seeing whether those rankings or actual draft order correlates better with player success. Both models beat actual order. I won't go into the problems, but there are reasons why this approach is slightly unfair both for my models and for the GMs I am comparing them to.
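That retrodiction check can be sketched with a rank correlation: order the same players by model output and by actual pick, then see which ordering tracks realized success better. All the numbers below are invented for illustration:

```python
def spearman(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

model_rank  = [1, 2, 3, 4, 5]             # model's pre-draft ordering
actual_pick = [2, 1, 5, 3, 4]             # where those players actually went
win_peak    = [11.0, 9.5, 8.0, 6.0, 3.0]  # realized peaks (made up)

# Negate peaks so "best player first" lines up with "rank 1 first".
model_fit  = spearman(model_rank,  [-w for w in win_peak])
actual_fit = spearman(actual_pick, [-w for w in win_peak])
# In this toy case the model ordering correlates better with success.
```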
Do you have a theory about what kind of players NBA GMs typically overrate? Or underrate?
Scorers, and shooters in particular, tend to be overrated. Steals seem to be underrated at every position. Guys who put up stats in the areas you don't immediately think about for that position (e.g., Blks/Rebs for guards or Asts for bigs) tend to be underrated, and guys who struggle in those areas tend to be overrated. Especially more recently, older players tend to be underrated.
Of course all predictions are going to miss sometimes. Do you have any theories about why your models drastically miss on occasion? If I recall correctly, we're hoping that Rodney Hood is one of these cases, and you're hoping that Zach LaVine is. Are there patterns of tangibles or intangibles you think you see?
I've found some patterns. One of the easier ones is bulky college bigs who are too short to be NBA Cs but don't have much in terms of "skills". Montrezl Harrell would be an example from this class... except he actually doesn't score well. Past guys would be Sweetney and Blair. Another is that the model underrates physical and explosive PGs... Rose, Wall, Westbrook, Bledsoe, all looked goodish in the models, but easily surpassed expectations in the NBA. I wager this has something to do with increased spacing. Another isn't really something that is useful ahead of time, just a common excuse: scoring is really difficult to project. Some mediocre college scorers end up being much better in the NBA, while some really efficient NCAA scorers end up being miserable at it in the NBA... Finally (and this is the one for optimism re: Hood) most aspects of defense are missed by the available numbers (at least directly). That means I tend to give subjective weight towards prospects with a strong defensive rep, and vice versa. Hood is one of the guys I figured was actually worse than his numbers for this reason, but it seems his D has actually been a plus so far in the NBA.
As far as prospects for the Jazz at #12 go:
Stanley Johnson: This would be the ideal. I think Stanley is going to be really really good.
Kelly Oubre: He'd be a fine pick, but I'm not the biggest fan. Obviously has some talent, but also seems a bit lacking in the mental aspect, which is always a flag for me.
Frank Kaminsky: I am a huge Kaminsky fan. He'd be the perfect 3rd big in Utah. I need to take a look at it, but I think it has been a long time since a player has posted as good of a senior season as Kaminsky did in my models.
Devin Booker: I'm skeptical of Booker.
Kevon Looney: I don't like him as much as my numbers, but I still think he would be really good value late-lottery.
Bobby Portis: I'd take Looney first.
Trey Lyles: See Oubre.
Myles Turner: Fully-realized Turner could be the best player in this draft, but he needs to prove that he can actually shoot and play in the post offensively. Not a great fit with Utah, but could be great value.
Sam Dekker: I'm not a big fan. He is a solid player, but Utah is probably too early for him.
Jerian Grant: Too early. I prefer Delon by quite a bit. I would take him ahead of Tyus though... and I assume most Jazz fans agree after watching Burke struggle.
Who are your top four for the Wolves? Why? Are you as smitten by D’Angelo Russell as most of us are (I’m guessing yes from your models)?
My preferences at the top are pretty boring. I would be happy with any of Towns/D'jello/Winslow/Okafor/Mudiay, but Towns and Russell are my favorites. I have some mild fears about Okafor's ability to translate to the NBA, and Mudiay is a terrible fit with Rubio. Those factors drop them a bit below the other top guys.