GT: Jazz v 76ers | 12/28/15 | 7pm MST

The larger number of variables exists because those variables are needed to CALCULATE the number of possessions, as opposed to simply estimating them like NBA.com does.

To be clear, I'm not claiming that basketball-reference is perfect; just that it's far more accurate than NBA.com. In fact, I can prove it.

Let's look at Game 1 of the 2013 NBA Finals, since it's one of the examples that got me digging into the differences between the formulas in the first place:

According to basketball-reference:
SAS ortg 108.2
MIA ortg 103.5
SAS +4.7 efficiency differential

According to NBA.com:
SAS ortg 102.3
MIA ortg 102.9
SAS -0.6 efficiency differential

The Spurs won by 4 points, and yet NBA.com gave the Spurs a lower offensive rating than the Heat in that game.

The main reason NBA.com's numbers are way off (according to them, the Spurs lost Game 1!) is that they calculate offensive and defensive ratings from team pace rather than from game pace. Basketball-reference calculates team A's and team B's possessions, adds them, and divides by 2 to get game pace, and THEN calculates ortg/drtg from that.
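To make the team pace vs. game pace difference concrete, here is a minimal Python sketch. It assumes each team already has its own possession estimate; the function names, point totals, and possession counts are made up for illustration and are not the real Game 1 figures or either site's actual code.

```python
def rating(points, possessions):
    """Points scored per 100 possessions."""
    return 100.0 * points / possessions


def team_pace_ratings(pts_a, pts_b, poss_a, poss_b):
    """Each team's rating uses only its OWN possession estimate."""
    return rating(pts_a, poss_a), rating(pts_b, poss_b)


def game_pace_ratings(pts_a, pts_b, poss_a, poss_b):
    """Both ratings share one denominator: the average of the two estimates."""
    game_poss = (poss_a + poss_b) / 2.0
    return rating(pts_a, game_poss), rating(pts_b, game_poss)


# Illustrative inputs only (hypothetical, not taken from any box score).
pts_a, pts_b = 92, 88        # team A wins by 4
poss_a, poss_b = 90.0, 86.0  # the two estimates disagree by 4 possessions

own_a, own_b = team_pace_ratings(pts_a, pts_b, poss_a, poss_b)
shared_a, shared_b = game_pace_ratings(pts_a, pts_b, poss_a, poss_b)

print(f"team pace: A {own_a:.1f}, B {own_b:.1f}")
print(f"game pace: A {shared_a:.1f}, B {shared_b:.1f}")
```

With separate denominators the winner can end up with the lower rating, which is the kind of flip described above. With a shared denominator the team that scored more always gets the higher rating.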

It's a very good method. From this and several other examples, I've found the margin of error is very small when the estimated results are compared to the real ones (calculated manually from play-by-play).

As I stated, but you cut out, our drop in Assist % is much more telling to me. Our players have developed another year, Burks has been back, Burke has had a huge step up, so saying we are more efficient this year and saying our offensive system has improved is crazy.

Look at the teams with the top assist % from year to year. The top teams are generally the best in the league. The bottom teams suck. We need more balance and our system isn't giving it to us. Our system was so much better in 03 that a much less talented team played much better, and it was fun to watch. This **** is boring to watch, even when we win. Does anyone really like our system? If so, please tell me why.

I like Quin because I think he gives our players more confidence and helps them develop. This is a huge plus. But I hate his system. I think Sloan didn't adapt enough to the importance of the 3, but other than that he had a great system. I do think his rigidness caused some of our players to lose confidence.
 
This is absolutely true. He gets a very high number of no-calls.

He tries to get calls by throwing his head back. Refs don't call it much. He needs to do a better job of getting his arms tangled and hit on shots. I do agree refs let him get pretty banged up with no calls. It is probably the hair flopping around distracting the refs.
 
Yeah, thing is, we don't have a single PG worth speaking of on the active roster. None of our wings are particularly good at passing the ball. The lowest assist % is a reflection of the personnel, not the system.
 
It has been a LONG time since I took Advanced Linear Statistics, but it is clear to me that Basketball Reference's calculation method is prone to high variance and multicollinearity, due to the relatively large number of predictor variables.
1. Not really.

2. There's really no need to estimate possessions; they can be taken directly from the play-by-play data. To date, I've scraped the NBA play-by-play data from the 2013/14 and 2014/15 seasons. I've broken the data down into segments between substitutions and the start/end of quarters, and have almost finished getting lineups in these segments -- there are some quarters/periods in which teams have players who play from start to finish without showing up in the box score, which makes doing so a bit more difficult.

Fortunately, I only have 25 instances of a player missing from a lineup over those two seasons -- roughly 20,000 team periods -- and no team has two periods in a single game missing a player. As such, I can sum minutes over my play-by-play lineups (since I've also calculated the length of the segments) and compare to the box score minutes. The player off by 5 or 12 minutes (in 24 of the 25 instances, the missing player occurs in an overtime period) must be the missing player.

I'm busy finishing up some other work, but I should have all NBA lineups for the last 19 seasons (NBA play-by-play data goes back to the 1996/97 season) by the end of this coming weekend. I digress... The next step is getting possessions from the play-by-play data so that I can measure player/lineup efficacy in different ways. Accurate offensive/defensive efficiency should be done in fairly short order.
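The minutes check described above could look something like the sketch below. It is only a guess at the approach, not the poster's actual script: the segment layout (a `seconds` length plus a `lineup` list), the function names, and the toy data are all hypothetical.

```python
from collections import defaultdict


def lineup_minutes(segments):
    """Sum minutes per player over play-by-play segments."""
    totals = defaultdict(float)
    for seg in segments:
        for player in seg["lineup"]:
            totals[player] += seg["seconds"] / 60.0
    return dict(totals)


def find_missing_player(segments, box_minutes):
    """Return the player whose box score minutes most exceed the
    play-by-play total, along with the size of that gap in minutes."""
    pbp = lineup_minutes(segments)
    gaps = {p: m - pbp.get(p, 0.0) for p, m in box_minutes.items()}
    player = max(gaps, key=gaps.get)
    return player, gaps[player]


# Toy data (hypothetical): one 12-minute quarter with a full lineup, then a
# 5-minute overtime where player "E" never appears in the play-by-play.
segments = [
    {"seconds": 720, "lineup": ["A", "B", "C", "D", "E"]},
    {"seconds": 300, "lineup": ["A", "B", "C", "D"]},
]
box_minutes = {"A": 17.0, "B": 17.0, "C": 17.0, "D": 17.0, "E": 17.0}

missing, gap = find_missing_player(segments, box_minutes)
print(missing, gap)
```

A gap of about 5 minutes points at a full overtime period and a gap of about 12 at a full quarter, which matches the "off by 5 or 12 minutes" rule in the post.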
 
Wow, I thought I liked to dig into data. I hope you will share.
 
I want to get a bit further before sharing what's needed to replicate what I'm doing. I may share some of the **** I find though.

It's worth mentioning that NBA data is in an incredibly user-friendly format. I just started learning Python in October, and I've already scraped a ****load of data and saved it in a more usable/versatile format (.csv). Most of my scripts to scrape the data and put it into dataframes (using the requests and pandas modules in Python) are <20 lines of code. I have all of the raw player tracking data (half formatted), a bunch of player tracking stats, and shot logs and other player stat logs from the last two seasons. I may scrape the box scores, which will probably take a few minutes to code and very little time to scrape -- I'd guess scraping, formatting and saving every NBA box score available could be done overnight.
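For a rough idea of why these scripts stay short, here is a minimal sketch of the "flatten the JSON and save it as .csv" step. It is not the poster's script. It assumes the response JSON has a `resultSets` list whose entries carry `headers` and `rowSet` keys, and the sample payload and column names are invented. It also sticks to the standard library so it runs without a network call; with pandas, the same step would be roughly `pd.DataFrame(rows, columns=headers).to_csv(path, index=False)`.

```python
import csv
import io


def result_set_to_rows(payload, index=0):
    """Turn one headers/rowSet result set into a list of dicts.
    The payload shape is an assumption, not a documented contract."""
    result = payload["resultSets"][index]
    headers = result["headers"]
    return [dict(zip(headers, row)) for row in result["rowSet"]]


def write_csv(rows, handle):
    """Write a list of dicts to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


# Invented sample standing in for a downloaded response.
sample_payload = {
    "resultSets": [
        {
            "headers": ["PLAYER_NAME", "MIN"],
            "rowSet": [["Player A", 34], ["Player B", 28]],
        }
    ]
}

rows = result_set_to_rows(sample_payload)
buffer = io.StringIO()  # stands in for open("box_score.csv", "w", newline="")
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real script the payload would come from an HTTP request (for example `requests.get(url, params=...).json()`), with the URL and parameters depending on which stat is being pulled.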

I'm a little reluctant to share how to do this -- I'm a bit of a lazy dick, and don't want to help people get ahead (of me) -- but a few Google searches are likely enough to figure much of this out. The script to get lineups from the play-by-play data (not quite done) is a little more complicated, but getting the raw data and saving it in .csv format is pretty straightforward.

The NBA has some other data that I don't know how to get. In particular, they have some player tracking passing stats that can be viewed -- and easily scraped -- right on NBA.com. That these stats exist means they must also have the player tracking passing logs used to calculate them. I'd really like to get my hands on those, but since they can't be viewed on the website, I have no idea what the 'resource' is where those logs are kept. This 'resource' is likely the only part of the URL needed to find them. I've tried a few shots in the dark that lined up with 'resource' names of similar stats, to no avail. I'll likely shoot them an email requesting a full stats.nba.com site/resource map. Fingers crossed.
 