Hi everybody,

Michael here, and in today’s post, I thought I’d try something a little familiar. You may recall that last October, I released a pair of posts (Python, Linear Regression & An NBA Season Opening Day Special Post and Python, Linear Regression & the 2024-25 NBA season) attempting to predict each NBA team’s win total and conference seeding based on their performance from the previous 10 seasons.

All in all, after seeing how the season played out, I managed to get only 3 of 30 teams in the correct seeding. So what will I do this year?

I’ll give my NBA machine learning predictions another go, again using data from the previous 10 seasons (2015-16 to 2024-25). You may be wondering why I’m trying to predict the outcomes of the upcoming NBA season once more, given how off last year’s predictions were. The reason I’m giving the whole “Michael’s NBA crystal ball” thing another go is twofold: I’m interested in how my predictions change from one season to the next, and I plan to use a slightly different model than I did last year (it’ll still be good old linear regression, however) so I can analyze how different factors might play a role in a team’s record and, ultimately, their conference seeding.
So, without further ado, let’s jump right into Michael’s Linear Regression NBA Season Predictions II!
Reading the data
Before we dive into our juicy predictions, the first thing we need to do is read the data into the IDE. Here’s the file:
Now let’s import the necessary packages and read in the data!
import io
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Upload the Excel file through Colab's file picker
from google.colab import files
uploaded = files.upload()

NBA = pd.read_excel(io.BytesIO(uploaded['NBA analysis 2025-26.xlsx']))
NBA.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 31 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Season 300 non-null object
1 Team 300 non-null object
2 W 300 non-null int64
3 L 300 non-null int64
4 Finish 300 non-null int64
5 Age 300 non-null float64
6 Ht. 300 non-null object
7 Wt. 300 non-null int64
8 G 300 non-null int64
9 MP 300 non-null int64
10 FG 300 non-null int64
11 FGA 300 non-null int64
12 FG% 300 non-null float64
13 3P 300 non-null int64
14 3PA 300 non-null int64
15 3P% 300 non-null float64
16 2P 300 non-null int64
17 2PA 300 non-null int64
18 2P% 300 non-null float64
19 FT 300 non-null int64
20 FTA 300 non-null int64
21 FT% 300 non-null float64
22 ORB 300 non-null int64
23 DRB 300 non-null int64
24 TRB 300 non-null int64
25 AST 300 non-null int64
26 STL 300 non-null int64
27 BLK 300 non-null int64
28 TOV 300 non-null int64
29 PF 300 non-null int64
30 PTS 300 non-null int64
dtypes: float64(5), int64(23), object(3)
memory usage: 72.8+ KB
As you can see, we’ve still got all 31 features that we had in last year’s dataset; the only difference between this dataset and last year’s is the timeframe covered (this dataset starts with the 2015-16 season and ends with the 2024-25 season).
- Just like last year, this year’s edition of the predictions draws its data from http://basketball-reference.com, where you can look up plenty of juicy statistics from both the NBA and WNBA. Also, just like last year, the only thing I changed in the data from Basketball Reference is the Finish variable, which represents a team’s conference finish (seeding-wise) as opposed to divisional finish (since divisional finishes are largely irrelevant to a team’s playoff standing).
- If you want a better explanation of these terms, please feel free to refer to last year’s edition of my predictions post, Python, Linear Regression & An NBA Season Opening Day Special Post.
Now that we’ve read our file into the IDE, let’s create our model!
Creating the model
You may recall that last year, before we created the model, we used the Select-K-Best algorithm to help us pick the optimal model features. For a refresher, here’s what Select-K-Best chose for us:
['L', 'Finish', 'Age', 'FG%', '3P%']
Those are the five features Select-K-Best scored highest for our model. However, we’re not going to use its suggestions this year, as there are other factors I’d like to analyze when making predictions for the upcoming NBA season.
Granted, I’ll keep the Finish, FG%, and 3P% as I feel they provide some value to the model’s predictions, but I’ll also add a few more features of my own choosing:
X = NBA[['FG%', '3P%', '2P%', 'Finish', 'TRB', 'AST', 'STL', 'BLK', 'TOV']]
y = NBA['W']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
Along with the features I chose from last year’s model, I’ll also add the following other scoring categories:
- 2P%: the percentage of a team’s successful 2-pointers in a given season
- TRB: a team’s total rebounds in a season
- AST: a team’s total assists in a season
- STL: a team’s total steals in a season
- BLK: a team’s total blocks in a season
- TOV: a team’s total turnovers in a season
The y-variable will still be W, as we’re still trying to predict an NBA team’s win total for the upcoming season based on all our x-variables.
Now, let’s create a linear regression model object and run our predictions through that model object:
NBAMODEL = LinearRegression()
NBAMODEL.fit(X_train, y_train)
yPredictions = NBAMODEL.predict(X_test)
yPredictions
array([43.2515066 , 36.4291265 , 55.14626364, 46.01164579, 24.18679591,
35.59131124, 35.59836527, 49.98114132, 48.57869061, 50.65733101,
21.296126 , 49.94020238, 31.98306604, 41.89217714, 45.65373458,
50.57831266, 32.76923727, 45.6898562 , 20.4393901 , 55.28944034,
52.79027154, 21.81113366, 50.79142468, 50.95798684, 53.23802534,
50.00199063, 48.4639119 , 49.1671417 , 51.12760913, 31.20606334,
45.3090483 , 25.02488097, 43.67955061, 48.47484838, 33.74041157,
41.7463038 , 36.10796911, 40.5399278 , 35.30656175, 16.92677689,
49.77947698, 39.2160337 , 22.08871355, 31.83549487, 15.2675987 ,
18.24486804, 21.71657476, 42.21505537, 22.84745758, 25.56862333,
43.6212702 , 20.28339646, 44.60289296, 49.20316062, 53.69182149,
29.48304908, 44.60789347, 42.44466633, 55.93637972, 54.89728291])
Just as with last year’s model, the predictions are run on the test dataset, which consists of 60 of the dataset’s 300 total records (a randomly shuffled 20%, since train_test_split shuffles by default rather than literally taking the last 60 rows).
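One caveat worth a quick demo: with its default shuffle=True, train_test_split hands back a random 20% of the rows, not the tail of the dataset. A tiny sketch (the 300-row toy array is just a stand-in for our 300 team-seasons):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(300).reshape(-1, 1)  # row i simply holds the value i
y = np.arange(300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print(len(X_test))  # 60 rows land in the test set
# The test rows are shuffled picks, not the last 60 entries
print(np.array_equal(np.sort(y_test), np.arange(240, 300)))
```

Using random_state=0 keeps the shuffle reproducible, so the same 60 rows come back every run.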
And now for the equation…
Now that we’ve generated predictions for our test dataset, let’s find out all of the coefficients and the intercept for the equation I will use to make this year’s NBA predictions:
NBAMODEL.coef_
array([ 6.48260593e+01, 1.13945178e+02, 1.54195451e+01, -1.94822281e+00,
1.10428617e-02, -3.46015457e-03, 2.15326621e-02, 6.63810730e-03,
-9.70593407e-03])
NBAMODEL.intercept_
np.float64(-60.37720744829896)
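Before writing the equation out, it’s worth confirming what coef_ and intercept_ actually mean: a fitted LinearRegression’s predict() is just the dot product of the features with coef_, plus intercept_. A small self-contained sketch (toy data, not the NBA file):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 3))             # three toy features
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0  # a known linear relationship

model = LinearRegression().fit(X, y)

# predict() is exactly the dot product with coef_ plus intercept_
manual = X @ model.coef_ + model.intercept_
print(np.allclose(manual, model.predict(X)))  # True
```

That identity is all the equation below is: each coefficient multiplied by its feature, summed, plus the intercept.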
Now that we know what our coefficients are, let’s see what this year’s equation looks like:

W = 64.83(A) + 113.95(B) + 15.42(C) − 1.95(D) + 0.01(E) − 0.00(F) + 0.02(G) + 0.01(H) − 0.01(I) − 60.38
Although it’s much more of a mouthful than last year’s equation, it follows the same logic in that it uses the features of this year’s model in the order that I listed them:
['FG%', '3P%', '2P%', 'Finish', 'TRB', 'AST', 'STL', 'BLK', 'TOV']
A is FG%, B is 3P%, and so on until you get to I (which represents TOV).
- Since all the coefficients are listed in scientific notation, I rounded them to two decimal places before converting them for this equation. Same thing for the intercept.
- In case you’re wondering: no, you can’t just add all the coefficients together, as each coefficient multiplies its own feature in the overall equation. Just like last year, we’re going to do the weighted-averages thing to generate projected win totals. Keep your eyes peeled for the next post, which covers the juicy predictions.
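As a rough illustration of the weighted-averages idea, here’s how one input (say FG%) might be blended across a team’s recent seasons before being plugged into the equation. The numbers and the 1-2-3 weighting are my own assumptions for the sketch, not the actual scheme from the predictions post:

```python
import numpy as np

# Hypothetical FG% for one team over its last three seasons, oldest first
fg_pct = np.array([0.465, 0.478, 0.491])
weights = np.array([1, 2, 3])  # assumed weights: the most recent season counts most

projected_fg = np.average(fg_pct, weights=weights)
print(round(projected_fg, 4))  # 0.4823
```

The projected value lands closer to the most recent season’s figure than a plain mean would, which is the whole point of weighting.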
…and the accuracy test!
So now that we’ve got our 2025-26 NBA predictions model, let’s see how accurate it is:
from sklearn.metrics import mean_absolute_percentage_error
mean_absolute_percentage_error(y_test,yPredictions)
0.09573425883736708
Using MAPE (mean absolute percentage error) from sklearn, as we did in last year’s analysis, we see that the model’s margin of error is roughly 9.57%. I’ll round that up to 10%, which means that despite not choosing the model’s features with a prebuilt algorithm, the overall accuracy of the model is still about 90%.
Now, whether the model’s accuracy and my predictions hold up is something I’ll certainly revisit in 8 months’ time for another end-of-season reflection. After all, last season I only got 3 of the 30 teams in the correct seeding, though I did do better at predicting which teams would miss the playoffs.
- Recall that to find the model’s accuracy from the MAPE, subtract (MAPE × 100) from 100. Since the MAPE of 0.0957 works out to roughly 10 as a percentage, 100 − 10 gives us an accuracy of about 90%.
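If you want to convince yourself the sklearn number isn’t magic, MAPE is easy to compute by hand. A sketch with toy win totals (my numbers, chosen so the arithmetic is clean):

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# Toy actual vs. predicted win totals
y_true = np.array([50.0, 40.0, 25.0, 60.0])
y_pred = np.array([45.0, 42.0, 30.0, 57.0])

# MAPE = mean of |actual - predicted| / |actual|
mape = np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
print(mape)  # ≈ 0.1, i.e. a 10% average error
print(np.isclose(mape, mean_absolute_percentage_error(y_true, y_pred)))
```

Multiply by 100 for a percentage, subtract from 100, and you have the “accuracy” figure used above.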
Last but not least, it’s prediction visualization time!
Before we go, the last thing I want to cover is how to visualize this year’s predictions. Just like last year, we’re going to use the PYPLOT module from MATPLOTLIB:
import matplotlib.pyplot as plt
plt.scatter(y_test, yPredictions, color="red")
plt.xlabel('Actual values', size=15)
plt.ylabel('Predicted values', size=15)
plt.title('Actual vs Predicted values', size=15)
plt.show()

As you can see, the points form a rough diagonal line, which is consistent with the model’s roughly 90% accuracy.
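One small tweak I find handy (my own addition, not part of the original notebook): drawing the y = x line makes it easier to judge how tightly the points hug the diagonal. A self-contained sketch with toy stand-ins for y_test and yPredictions:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Toy stand-ins for y_test and yPredictions
y_test = np.array([20, 35, 45, 55])
yPredictions = np.array([22, 33, 47, 53])

plt.scatter(y_test, yPredictions, color="red")
lims = [y_test.min(), y_test.max()]
plt.plot(lims, lims, linestyle="--", color="gray")  # the perfect-prediction line
plt.xlabel('Actual values', size=15)
plt.ylabel('Predicted values', size=15)
plt.title('Actual vs Predicted values', size=15)
plt.savefig('actual_vs_predicted.png')
```

Points above the dashed line are teams the model overrated; points below it are teams it underrated.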
Also, just for comparison’s sake, here’s what my predictions looked like on last year’s model (the one where I used Select-K-Best to choose the model features):

This also looks like a diagonal-line shape, and last year’s model had a 91% accuracy rate.
Here’s the link to the Colab notebook on my GitHub: https://github.com/mfletcher2021/blogcode/blob/main/NBA_25_26_predictions.ipynb
Thanks for reading, and keep an eye out for my 2025-26 season predictions,
Michael