Hi all,
I’ve seen a lot of people call the preseason irrelevant to how a team actually performs during the regular season, so I wanted to see whether a correlation exists. This project was pretty simple, and the hardest part was just getting the data (there is very little preseason data, and most of it requires copying and pasting from website tables).
Methodology
I was looking at correlation purely from a “Win %” perspective, so I just gathered data on the last ~10 regular seasons and preseasons and had them in separate tables. I then merged the tables together based on both the year of the season and the team itself. With my final data frame, I created a scatterplot that plotted preseason winning percentages against regular season winning percentages. I also built a simple linear regression model and found the correlation between the two.
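For anyone who wants to reproduce the steps above, here is a minimal sketch in plain Python. The rows are made-up placeholders (not the data behind this post), and the names `preseason`, `regular`, `merge_on_year_team` and `fit_line` are hypothetical, chosen just for illustration.

```python
from math import sqrt

# Hypothetical placeholder rows: (season year, team) -> win %. Not real data.
preseason = {
    (2022, "AAA"): 0.75, (2022, "BBB"): 0.25,
    (2023, "AAA"): 0.50, (2023, "BBB"): 0.40,
}
regular = {
    (2022, "AAA"): 0.60, (2022, "BBB"): 0.45,
    (2023, "AAA"): 0.55, (2023, "BBB"): 0.40,
    (2023, "CCC"): 0.50,  # no preseason row, so the merge drops it
}


def merge_on_year_team(pre, reg):
    """Inner join: keep only (year, team) keys present in both tables."""
    keys = sorted(pre.keys() & reg.keys())
    return [(pre[k], reg[k]) for k in keys]


def fit_line(pairs):
    """Least-squares fit of y = intercept + slope * x, plus Pearson r."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


pairs = merge_on_year_team(preseason, regular)
intercept, slope, r = fit_line(pairs)
print(f"intercept={intercept:.3f} slope={slope:.3f} r={r:.3f} R^2={r * r:.3f}")
```

With one predictor, R^2 is just the square of the correlation, so the regression and the correlation are two views of the same relationship. The (preseason, regular season) pairs in `pairs` are also what you would feed to a scatterplot.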
Conclusions
In terms of the linear regression model, the equation for the line of best fit was calculated to be (Predicted Regular Season Win %) = .405 + .178(Preseason Win %), which indicates that a 1 percentage point increase in preseason winning percentage corresponds to a 0.178 percentage point increase in predicted regular season winning percentage. The coefficient of preseason winning percentage was found to be statistically significant, which indicates that, at least to some degree, preseason performance CAN be used to predict regular season performance. The R^2, however, was only .088, meaning that less than 9% of the variability in regular season winning percentage is explained by preseason winning percentage.
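To make those numbers concrete, here is a small sketch that plugs a few preseason records into the fitted line. The intercept, slope and R^2 are the ones reported in this post; the function name `predict_regular_season` is just a label I made up.

```python
from math import sqrt

INTERCEPT = 0.405   # from the fitted line in this post
SLOPE = 0.178       # from the fitted line in this post
R_SQUARED = 0.088   # from the model in this post


def predict_regular_season(preseason_win_pct):
    """Predicted regular season win % (as a fraction) from preseason win %."""
    return INTERCEPT + SLOPE * preseason_win_pct


for pre in (0.0, 0.5, 1.0):
    print(f"preseason {pre:.3f} -> predicted regular season {predict_regular_season(pre):.3f}")

# With a single predictor, the correlation is the square root of R^2,
# taking the sign of the slope (positive here).
r = sqrt(R_SQUARED)
print(f"implied correlation r = {r:.3f}")
```

So a winless preseason maps to a predicted .405 regular season and a perfect preseason to .583, a spread of under 18 percentage points across the entire range, and the implied correlation is roughly 0.30.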
The graph shows results similar to what the models predict, with the data being scattered all over the place. The graph can be accessed through this link.
Next Steps
This project was really simple, but I think there are some other applications. For one, you could try looking at whether preseason statistics are indicative of regular season statistics (e.g. FG%, 3P%, etc.) for both teams and players. You could also look at the correlation between preseason and regular season for extremely good preseason performances and extremely poor preseason performances, as there may be stronger correlations there. I think a lot of it boils down to the preseason being a place for teams to test what they’ve worked on in the offseason instead of treating it like the actual league.
absolutely not
Simple but interesting analysis. As we know, utilisation of key players is significantly lower during pre-season games. It would be interesting to account for the structure of playing time. E.g. PSI (population stability index) could be computed for each team and season combination to measure the difference between each player's share of minutes in pre-season and in regular season games. This could be utilised as an additional variable that could somewhat capture this structural change. Obviously, this would still be a pretty naive analysis, as there is a multitude of other factors to account for.
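A rough sketch of what that PSI could look like for one team-season, assuming you have total minutes per player for both periods. The player names and minutes are made up, `minute_shares` and `psi` are hypothetical helper names, and the small floor on shares is my own assumption to avoid taking the log of zero when someone plays in only one of the two periods.

```python
from math import log


def minute_shares(minutes):
    """Convert {player: minutes} into {player: share of team minutes}."""
    total = sum(minutes.values())  # assumes the team logged some minutes
    return {player: m / total for player, m in minutes.items()}


def psi(expected_minutes, actual_minutes, floor=1e-4):
    """PSI = sum over players of (a - e) * ln(a / e), on minute shares.

    Shares are floored at `floor` (an assumption) so that players who
    appear in only one period do not cause log(0) or division by zero.
    """
    e_shares = minute_shares(expected_minutes)
    a_shares = minute_shares(actual_minutes)
    total = 0.0
    for player in sorted(set(e_shares) | set(a_shares)):
        e = max(e_shares.get(player, 0.0), floor)
        a = max(a_shares.get(player, 0.0), floor)
        total += (a - e) * log(a / e)
    return total


# Made-up minutes for one hypothetical team-season.
regular_minutes = {"Starter1": 2600, "Starter2": 2400, "Bench1": 1200, "Bench2": 300}
preseason_minutes = {"Starter1": 60, "Starter2": 55, "Bench1": 90, "Bench2": 85, "CampInvite": 70}

print(f"PSI = {psi(regular_minutes, preseason_minutes):.3f}")
```

Every term in the sum is non-negative, so the PSI is 0 when the minute distribution is identical and grows as the preseason rotation drifts away from the regular season one. This form is also symmetric, so it does not matter which period is treated as the baseline.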
Wins don’t correlate but I think player performance does.
Wins don’t correlate bc coaches will use preseason to give guys who otherwise wouldn’t get mins a chance to show what they got. It’s effectively extended garbage time for decent chunks of the game.
That being said, how a player looks (starter or otherwise) in that time I think is meaningful - just like any other game film (Summer League, International play etc).
A lot of teams haven’t been playing starters after halftime.
sure. doesn’t mean the first half film is not meaningful. and even evaluating non-starter talent in the 2nd half is appropriate and meaningful as well.
Great work, and it also (perhaps unintentionally) highlights how stats always need to be contextualized lol.
Nice work! If you’re curious, Kostya Medvedovsky (creator of the DARKO model) has studied this extensively. I talked to him about it here.
Great work. I agree with u/AnkitPancakes. I don’t think wins/losses correlate but performance does, and when teams/players don’t perform well in preseason, that can foreshadow at least a rough start. Teams develop habits in preseason, and bad habits or a player's loss of confidence after a rough preseason can carry over.
I would try looking only at minutes when at least 4 rotation guys are on the floor, which requires some subjective pruning of the data set, but I just don’t think you can consider minutes when 2 out-of-rotation guys are out there. So like for the Raptors I’d only look at minutes where 4 of Barnes, Siakam, OG, Poeltl, Trent, Schroeder, Boucher and McDaniels are out there.
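Something like this is what I have in mind, as a sketch only. It assumes you can get lineup stints (who was on the floor, for how long, and the point differential), which is not something the original data set has. The stints below are made up, "PlayerV" through "PlayerZ" stand in for out-of-rotation guys, and `rotation_stints` is a hypothetical helper name.

```python
ROTATION = {"Barnes", "Siakam", "OG", "Poeltl", "Trent", "Schroeder", "Boucher", "McDaniels"}

# Made-up stints: (five players on the floor, minutes, point differential).
stints = [
    ({"Barnes", "Siakam", "OG", "Poeltl", "Trent"}, 6.0, 4),
    ({"Barnes", "Siakam", "OG", "Poeltl", "PlayerX"}, 3.5, -1),
    ({"Schroeder", "Boucher", "McDaniels", "PlayerX", "PlayerY"}, 5.0, -6),
    ({"PlayerV", "PlayerW", "PlayerX", "PlayerY", "PlayerZ"}, 9.5, 2),
]


def rotation_stints(stints, rotation, min_rotation=4):
    """Keep stints where at least `min_rotation` rotation players are on the floor."""
    return [s for s in stints if len(s[0] & rotation) >= min_rotation]


kept = rotation_stints(stints, ROTATION)
minutes = sum(m for _, m, _ in kept)
net_points = sum(d for _, _, d in kept)
print(f"{len(kept)} stints kept, {minutes} minutes, net {net_points:+d}")
```

The subjective part is the `ROTATION` set itself; the filter is mechanical once you have decided who counts.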
TLDR: no