Python, Linear Regression & the 2024-25 NBA season

Hello everybody!

Michael here, and in today’s post, we’ll continue where we left off in the previous post, Python, Linear Regression & An NBA Season Opening Day Special Post. As I mentioned there, we’ll use the linear regression equation we obtained to see if we can generate predictions for the current 2024-25 NBA season.

Disclaimer

Yes, I know I’m trying to predict the various juicy outcomes of the 2024-25 NBA season, but these predictions are purely meant for educational purposes to display the methodology of the predictions, not for game-day parlays and/or your fantasy NBA team. After all, I am your friendly neighborhood coding blogger, but I am not your friendly neighborhood sportsbook. If you do decide to bet on anything during the NBA season, please bet responsibly :-).

Previously on Michael’s Programming Bytes…

In the previous post, we used data from the last 10 NBA seasons for each of the 30 teams to predict season record results, which in turn gave us this linear regression equation that I will use to predict team-by-team results and standings for the 2024-25 NBA season:

Just to recap, here’s the equation:

Wins = -0.47(Losses) - 1.31(Finish) + 0.4(Age) + 34.13(FG%) - 22.12(3P%) + 50.95

And here’s what each term represents:

  • -0.47 (coefficient for a team’s losses in a given season)
  • -1.31 (coefficient for a team’s conference finish from 1-15 in a given season)
  • 0.4 (coefficient for the average age of a team’s roster)
  • 34.13 (coefficient for % of field goals made)
  • -22.12 (coefficient for % of 3-pointers made)
  • 50.95 (linear regression model intercept)

Our predictions generated in the previous post came back with 91% accuracy/9% mean absolute percentage error, so I can tell we’re gonna get some good predictions here.

And now, for the predictions…

Yes, here comes the fun part, the predictions. For the predictions, I gathered the weighted averages of the five features we used in our model (losses, conference finish, average roster age, % of field goals made and % of 3-pointers made) and placed them into this spreadsheet:

Now, how did I calculate the weighted averages of these five features for each team? Well, I simply assigned different weights for different seasons like so:

  • 2021-22 to 2023-24 seasons: 0.2 weight (higher weight for the three most recent seasons)
  • 2018-19 to 2020-21 seasons: 0.1 weight (they’re a little further back, plus I factored in COVID impacts to the 2019-20 and 2020-21 seasons)
  • 2014-15 to 2017-18 seasons: 0.025 weight (smaller weight since these are the furthest in the past, plus many players in the league during this time have since retired)

After assigning these weights, I multiplied each season’s value by its weight and summed the results for each feature. Since the weights add up to 1.0 (3 × 0.2 + 3 × 0.1 + 4 × 0.025), that weighted sum is the weighted average itself.
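Here’s a minimal sketch of that weighted-average calculation for a single feature, using hypothetical loss totals (the weights are the ones listed above and sum to 1.0):

```python
# Sketch of the weighted average for one feature (hypothetical loss totals);
# since the weights sum to 1.0, the weighted sum is already the weighted
# average (no extra division needed)
recent = [31, 24, 33]        # 2021-22 to 2023-24 losses, weight 0.2 each
middle = [33, 40, 36]        # 2018-19 to 2020-21 losses, weight 0.1 each
oldest = [44, 39, 38, 41]    # 2014-15 to 2017-18 losses, weight 0.025 each

weighted_avg = (sum(0.2 * x for x in recent)
                + sum(0.1 * x for x in middle)
                + sum(0.025 * x for x in oldest))
print(round(weighted_avg, 2))  # 32.55
```

Notice how the three most recent seasons dominate the result, which is exactly the point of the weighting.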

Here’s the basic Python code I used to calculate projected wins for all 30 NBA teams:



import pandas as pd

NBAAVG = pd.read_csv(r'C:\Users\mof39\OneDrive\Documents\NBA weighted averages.csv')

# Apply the regression equation to every team's weighted averages at once
# (pandas broadcasts the arithmetic across all 30 rows, so no loop is needed)
projectedWins = (-0.47*NBAAVG['L'] - 1.31*NBAAVG['Finish']
                 + 0.4*NBAAVG['Age'] + 34.13*NBAAVG['FG%']
                 - 22.12*NBAAVG['3P%'] + 50.95)
print(projectedWins)

And here are the projected win totals for each team using this equation:

0     37.911934
1     52.863761
2     40.819851
3     31.742252
4     37.524958
5     40.441851
6     42.540851
7     51.103223
8     24.263654
9     45.852691
10    33.197160
11    38.736829
12    47.364055
13    41.338946
14    41.297291
15    45.202762
16    53.722600
17    39.185063
18    37.443009
19    38.462010
20    39.284500
21    32.296571
22    47.795819
23    45.063567
24    33.312626
25    37.493793
26    32.519145
27    42.072515
28    41.920285
29    31.773436

Granted, you don’t actually see the team names in this output, but since the teams are organized alphabetically in the dataset, you can tell which team corresponds to which projected win total. For clarity, though, I’ll spell those totals out below:

Atlanta Hawks: 37.911934 wins (38-44)
Boston Celtics: 52.863761 wins (53-29)
Brooklyn Nets: 40.819851 wins (41-41)
Charlotte Hornets: 31.742252 wins (32-50)
Chicago Bulls: 37.524958 wins (38-44)
Cleveland Cavaliers: 40.441851 wins (40-42)
Dallas Mavericks: 42.540851 wins (43-39)
Denver Nuggets: 51.103223 wins (51-31)
Detroit Pistons: 24.263654 wins (24-58)
Golden State Warriors: 45.852691 wins (46-36)
Houston Rockets: 33.197160 wins (33-49)
Indiana Pacers: 38.736829 wins (39-43)
LA Clippers: 47.364055 wins (47-35)
LA Lakers: 41.338946 wins (41-41)
Memphis Grizzlies: 41.297291 wins (41-41)
Miami Heat: 45.202762 wins (45-37)
Milwaukee Bucks: 53.722600 wins (54-28)
Minnesota Timberwolves: 39.185063 wins (39-43)
New Orleans Pelicans: 37.443009 wins (37-45)
New York Knicks: 38.462010 wins (38-44)
Oklahoma City Thunder: 39.284500 wins (39-43)
Orlando Magic: 32.296571 wins (32-50)
Philadelphia 76ers: 47.795819 wins (48-34)
Phoenix Suns: 45.063567 wins (45-37)
Portland Trailblazers: 33.312626 wins (33-49)
Sacramento Kings: 37.493793 wins (37-45)
San Antonio Spurs: 32.519145 wins (33-49)
Toronto Raptors: 42.072515 wins (42-40)
Utah Jazz: 41.920285 wins (42-40)
Washington Wizards: 31.773436 wins (32-50)

As you can see above, I have managed to predict the records for each team for the 2024-25 NBA season. A few things to note about my predictions:

  • Since NBA records are only counted in whole numbers, I rounded each team’s projected win total up or down to the nearest whole number. For instance, for the Milwaukee Bucks, since their projected win total was 53.722600, I rounded that up to 54 wins (and a 54-28 record).
  • According to my model, all teams’ projected win totals fall between 24 and 54 wins. This makes sense since in a given NBA season, a majority of teams’ win totals fall in that 24-54 range. In the last NBA season (2023-24), 21 teams fell within it.
  • In 2023-24, four teams finished with more than 54 wins (Celtics with 64, Thunder and Nuggets with 57, and Timberwolves with 56), while five teams finished with fewer than 24 wins (Spurs with 22, Hornets and Trailblazers with 21, Wizards with 15, and Pistons with 14).
  • One thing to note about my predictions is that while I rounded up or down to the nearest whole number to get a projected record total, I’ll still factor in the entire decimal (e.g. 45.202762 for the Heat) when deciding how to seed teams, as teams with a higher decimal will be seeded higher in their respective conference.
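As a quick sketch of that rounding step, here it is applied to three of the projected totals above:

```python
# Hypothetical subset of the projected totals above; round each to the
# nearest whole number of wins, with losses being the remaining games
# of an 82-game season
projected = {'Bucks': 53.722600, 'Heat': 45.202762, 'Hornets': 31.742252}
records = {team: (round(w), 82 - round(w)) for team, w in projected.items()}
print(records)  # {'Bucks': (54, 28), 'Heat': (45, 37), 'Hornets': (32, 50)}
```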

Michael’s Magnificently Way-Too-Early Playoff Picture

Yes, now that we have projected record totals for each of the 30 teams, the next thing we’ll do is predict each team’s seeding.

How will we seed the teams? Well, for one, I’ll rank the teams with the higher projected records higher in their respective conference. For instance, since the Bucks have a higher projected record than the Celtics, I’ll rank the Bucks higher than the Celtics.

However, what if two teams have a really, really close margin between them? For instance, the Minnesota Timberwolves and Oklahoma City Thunder’s projected records of 39.185063 wins and 39.284500 wins respectively are very close to each other. However, since OKC has a slightly higher projected win total, I’ll rank them higher than the Timberwolves.
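That tie-break is just a descending sort on the full decimal values; here’s a sketch with a hypothetical three-team subset of the West:

```python
# Seeding is a descending sort on the full (unrounded) projected win totals;
# the close Thunder/Timberwolves margin resolves in OKC's favor
west = {'Thunder': 39.284500, 'Timberwolves': 39.185063, 'Lakers': 41.338946}
seeding = sorted(west, key=west.get, reverse=True)
print(seeding)  # ['Lakers', 'Thunder', 'Timberwolves']
```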

So without further ado, here’s Michael’s Magnificently Way-Too-Early Playoff Picture!

Eastern Conference

INTO THE PLAYOFFS
1. Milwaukee Bucks
2. Boston Celtics
3. Philadelphia 76ers
4. Miami Heat
5. Toronto Raptors
6. Brooklyn Nets

INTO THE PLAY-IN
7. Cleveland Cavaliers
8. Indiana Pacers
9. Atlanta Hawks
10. New York Knicks

OUT OF PLAYOFF RUNNING
11. Chicago Bulls
12. Washington Wizards
13. Charlotte Hornets
14. Orlando Magic
15. Detroit Pistons

Western Conference

INTO THE PLAYOFFS
1. Denver Nuggets
2. LA Clippers
3. Golden State Warriors
4. Phoenix Suns
5. Dallas Mavericks
6. Utah Jazz

INTO THE PLAY-IN
7. LA Lakers
8. Memphis Grizzlies
9. Oklahoma City Thunder
10. Minnesota Timberwolves

OUT OF PLAYOFF RUNNING
11. Sacramento Kings
12. New Orleans Pelicans
13. San Antonio Spurs
14. Portland Trailblazers
15. Houston Rockets

And now, for some insights

Now that we have our predictions for each team’s projected win total and conference seeding, let’s see if we can gather some insights into what the 2024-25 NBA season might bring for all 30 teams. Without further ado, here are insights across the NBA that I think will be interesting to see play out over the course of the season:

Will the Celtics repeat as champs?

For those who don’t know, the Boston Celtics came out on top as the champions of the 2023-24 NBA season, beating the Dallas Mavericks in 5 games in the 2024 NBA Finals.

Question is, can they do it again? There’s a good chance that can happen, even with the projected 2-seed in the Eastern Conference. After all, the Celtics have kept many of their key playmakers from their championship squad such as Al Horford, Derrick White, Jaylen Brown and of course, Jayson Tatum.

Interestingly, we’ve had SIX different teams win the NBA championship in the last six seasons:

  • 2019-Raptors
  • 2020-Lakers
  • 2021-Bucks
  • 2022-Warriors
  • 2023-Nuggets
  • 2024-Celtics

Could we have a repeat champ for the first time since those seemingly endless Warriors-Cavs finals (remember those)? I’ll reiterate that it’s certainly possible, especially with Tatum in his prime.

Warriors for a deep playoff run?

Yes, I know they’ve had their ups and downs recently, but the Golden State Warriors have won 4 championships over the last 10 years, so I have reason to believe they’ll go on another deep playoff run.

Will the loss of Klay Thompson hurt? Yes. Stephen Curry is also on the back-nine of his career (he turns 37 in March), but he did put up the most points per game of anyone on the Warriors’ roster last season (26.4). Curry also had the highest 3-point percentage of anyone on the Warriors’ roster last season (40.8%)-recall that successful 3-point percentage was one of the five features I used in the linear regression model. Plus, Draymond Green will be returning to the Warriors this season; he proved to be one of the Warriors’ strongest 3-point shooters and rebounders last season (though he too is in the later stage of his career, as he turns 35 in March).

Interestingly, this model has the Warriors going 46-36 as the 3-seed in the Western Conference. Funny enough, the Warriors finished 46-36 last season but ended up as the 10-seed in the Western Conference and failed to make it past the play-in.

This brings me to my next point…

Will the West be close again?

Last season, the Western Conference was incredibly close when it came to win totals and playoff seeding. After all, the 6-seed in the West last year (Phoenix Suns) still finished with a 49-33 record…and were promptly swept in the Western Conference first round (though that’s neither here nor there).

Another way to put the closeness of last year’s Western Conference playoff race into perspective: the Warriors finished 46-36 yet only notched the 10-seed, and the Houston Rockets finished with an even 41-41 record but missed the postseason entirely (they got the 11-seed).

Which brings me to my next point…

Will the East be far apart?

While last year’s Western Conference was quite competitive, the Eastern Conference was, well, another story:

Image from Wikipedia: https://en.wikipedia.org/wiki/2023%E2%80%9324_NBA_season.

Yes, the Celtics not only got the 1-seed in the East but also finished FOURTEEN games ahead of the 2-seed New York Knicks (yes, the Knicks finished 50-32 and still got the 2-seed). Two teams that had very up-and-down seasons-the Bulls and Hawks-both finished with under 40 wins yet still qualified for the play-in as the 9- and 10-seeds in the East, respectively.

Miami Heat to the play…offs?

Throughout the last 10 years, the Miami Heat have had a great deal of success, making it to the Finals twice in that span (’20 and ’23) and making it to the playoffs 7 of the last 10 seasons (exceptions being ’15, ’17 and ’19).

However, while they did make the playoffs in each of the last two seasons, they first had to fight through the play-in; both times they entered as the 8-seed (meaning they had to win two play-in games just to grab a playoff slot).

In this model however, the Miami Heat will earn the 4-seed and make the actual playoffs, not the play-in. What could possibly work to their advantage? Here are a few factors:

  • While their successful field goal percentage was in the bottom half of the league last season, they came in 12th amongst all teams in successful 3-pointer percentage, which should help their case.
  • After losing Jimmy Butler and Terry Rozier before play-offs last season, both are now (as of this writing) healthy and ready to play.
  • Their 42.3 rebounds per game (offensive and defensive combined) last year look pretty good.
  • Tyler Herro, Bam Adebayo and Jimmy Butler were the Heat’s top three scorers in both points per game and field goals last year…those stats certainly matter for big games. Plus, Herro is 24 and Adebayo is 27, so both are still in the primes of their careers (though Jimmy Butler, at 35, still plays like he’s in his prime, in my opinion).

Will the Heat win an NBA championship or make another Finals appearance? TBD. However, it looks like (according to this model) they will at least make the playoffs without needing to go through the play-in first (though their 8-seed run to the Finals in 2023 was certainly memorable).

And now, for the bottom of the conference

Most of my insights discussed more successful teams and (potential) deep playoff runs. However, I wanted to offer one more insight concerning the two teams at the (projected) bottom of their conferences-the Pistons in the East and Rockets in the West.

First off: the Detroit Pistons, who, according to my model, are projected to be the 15-seed again (they were the 15-seed last season). Will they manage to improve this season? My guess is yes, at least in terms of total wins (they had only 14 last year), but I don’t think they’ll make a strong playoff run; a 28-game losing streak that dropped them to 2-30 at one point last season certainly didn’t help their postseason case. However, give the Pistons credit for changing their coach (now J.B. Bickerstaff) and GM (now Trajan Langdon) and adding some solid free agents like Tobias Harris (a 48.7% field goal percentage last season-not too shabby). Again, I doubt they’ll make a strong playoff run, but they could very well finish higher than the 15-seed.

As for the Houston Rockets (projected 15-seed in the West), they finished as the 11-seed last year in a competitive Western Conference with an even 41-41 record. Judging from last year’s stats-coming in 9th on defense but 20th on offense-they have some work to do to make a deep playoff run. However, with a good mix of young players like Tari Eason and veterans like Fred VanVleet (who was on the championship 2019 Toronto Raptors), the Rockets could make it past the play-in.

Just for fun…Michael’s Play-In Predictions

Now, for an added bonus for my loyal readers, here are my educated-guess, just-for-fun play-in predictions for both the Eastern and Western conferences. Granted, while the model I made did help predict regular-season seeding in each conference, it didn’t predict who would make it past the play-in to grab the 7- and 8-seeds in each conference. So without further ado, here are my play-in predictions based on what I saw from these teams last season:

Eastern Conference

Predictions: Pacers 7-seed, Cavaliers 8-seed

Western Conference

Predictions: Lakers 7-seed, Timberwolves 8-seed

Thanks for reading, and I hope you learned something new from this post! Enjoy the NBA season, and I will follow up with a Part 3 post on this topic sometime in April, or at least some time after the conclusion of the regular season. It will be interesting to see how accurate (or off) my predictions were.

Michael

Python, Linear Regression & An NBA Season Opening Day Special Post

Hello readers,

Michael here, and in today’s lesson, we’re gonna try something special! For one, we’re going back to this blog’s statistical roots with a linear regression post; I covered linear regression with R in the way, way back of 2018 (R Lesson 6: Linear Regression) on this blog, so I thought I’d show you how to work the linear regression process in Python. Two, I’m going to try something I don’t normally do, which is predict the future. In this case, the future being the results of the just-beginning 2024-25 NBA season. Why try to predict NBA results you might ask? Well, for one, I wanted to try something new on this blog (hey, gotta keep things fresh six years in), and for two, I enjoy following along with the NBA season. Plus, I enjoyed writing my post on the 2020 NBA playoffs-R Analysis 10: Linear Regression, K-Means Clustering, & the 2020 NBA Playoffs.

Let’s load our data and import our packages!

Before we get started on the analysis, let’s first load our data into our IDE and import all necessary packages:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

You’re likely quite familiar with pandas, but for those of you who don’t know, sklearn is an open-source Python library commonly used for machine learning projects (like the linear regression we’re about to do)! Note that train_test_split takes a random_state parameter directly, so there’s no separate import needed for it.

A note about uploading files via Google Colab

Once we import our necessary packages, the next thing we should do is upload the data-frame we’ll be using for this analysis.

This is the file we’ll be using; it contains team statistics such as turnovers (team total) and wins for all 30 NBA teams for the last 10 seasons (2014-15 to 2023-24). The data was retrieved from basketball-reference.com, which is a great place to go if you’re looking for juicy basketball data to analyze. This site comes from https://www.sports-reference.com/, which contains statistics on various sports from NBA to NFL to the other football (soccer for Americans), among other sports.

Now, since I used Google Colab for this analysis, I’ll show you how to upload Excel files into Colab (a different process from uploading Excel files into other IDEs):

To import local files into Google Colab, you’ll need to include the lines from google.colab import files and uploaded = files.upload() in the notebook since, for some odd reason, Google Colab won’t let you upload local files directly into your notebook. Once you run these two lines of code, you’ll need to select a file from the browser tool that you want to upload to Colab.

Next (and ideally in a separate cell), you’ll need to add the lines import io and dataframe = pd.read_csv(io.BytesIO(uploaded['dataframe name'])) to the notebook and run the code. This will officially upload your data-frame to your Colab notebook.

  • Yes, I know it’s annoying, but that’s just how Colab works. If you’re not using Colab to follow along with me, feel free to skip this section as a simple pd.read_csv() will do the trick to upload your data-frame onto the IDE.
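If you want to see what that second step does without being in Colab, here’s a small local simulation: files.upload() simply returns a dict mapping filenames to raw bytes, which io.BytesIO wraps into a file-like object for pd.read_csv (the filename and CSV contents here are made up for illustration):

```python
import io
import pandas as pd

# files.upload() in Colab returns a dict of {filename: raw bytes};
# this dict is a hand-built stand-in for that return value
uploaded = {'NBA.csv': b'Team,W,L\nCeltics,64,18\nBucks,49,33\n'}

# io.BytesIO wraps the raw bytes in a file-like object pd.read_csv accepts
NBA = pd.read_csv(io.BytesIO(uploaded['NBA.csv']))
print(NBA.shape)  # (2, 3)
```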

Let’s learn about our data-frame!

Now that we’ve uploaded our data-frame into the IDE, let’s learn more about it!

NBA.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 31 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Season  300 non-null    object 
 1   Team    300 non-null    object 
 2   W       300 non-null    int64  
 3   L       300 non-null    int64  
 4   Finish  300 non-null    int64  
 5   Age     300 non-null    float64
 6   Ht.     300 non-null    object 
 7   Wt.     300 non-null    int64  
 8   G       300 non-null    int64  
 9   MP      300 non-null    int64  
 10  FG      300 non-null    int64  
 11  FGA     300 non-null    int64  
 12  FG%     300 non-null    float64
 13  3P      300 non-null    int64  
 14  3PA     300 non-null    int64  
 15  3P%     300 non-null    float64
 16  2P      300 non-null    int64  
 17  2PA     300 non-null    int64  
 18  2P%     300 non-null    float64
 19  FT      300 non-null    int64  
 20  FTA     300 non-null    int64  
 21  FT%     300 non-null    float64
 22  ORB     300 non-null    int64  
 23  DRB     300 non-null    int64  
 24  TRB     300 non-null    int64  
 25  AST     300 non-null    int64  
 26  STL     300 non-null    int64  
 27  BLK     300 non-null    int64  
 28  TOV     300 non-null    int64  
 29  PF      300 non-null    int64  
 30  PTS     300 non-null    int64  
dtypes: float64(5), int64(23), object(3)
memory usage: 72.8+ KB

Running the NBA.info() command lets us see basic information about all 31 columns in our data-frame (such as column names, the number of non-null records, and each column’s data type).

In case you’re wondering about all the abbreviations, here’s an explanation for each abbreviation:

  • Season-The specific season represented by the data (e.g. 2014-15)
  • Team-The team name
  • W-A team’s wins in a given season
  • L-A team’s losses in a given season
  • Finish-The seed a team finished in during a given season in their conference (e.g. Detroit Pistons finishing 15th seed in the East last season)
  • Age-The average age of a team’s roster as of February 1 of a given season (e.g. February 1, 2024 for the 2023-24 season)
  • Ht.-The average height of the team’s roster in a given season (e.g. 6’6)
  • Wt.-The average weight (in lbs.) of the team’s roster in a given season
  • G-Total amount of games played by the team in a given season
  • MP-Total minutes played as a team in a given season
  • FG-Field goals scored by the team in a given season
  • FGA-Field goal attempts made by the team in a given season
  • FG%-Percent of successful field goals made by team in a given season
  • 3P-3-point field goals scored by the team in a given season
  • 3PA-3-point field goal attempts made by the team in a given season
  • 3P%-Percent of successful 3-point field goals made by the team in a given season
  • 2P-2-point field goals scored by the team in a given season
  • 2PA-2-point field goal attempts made by the team in a given season
  • 2P%-Percent of successful 2-point field goals made by the team in a given season
  • FT-Free throws scored by the team in a given season
  • FTA-Free throw attempts made by the team in a given season
  • FT%-Percent of successful free throw attempts made by the team in a given season
  • ORB-Team’s total offensive rebounds in a given season
  • DRB-Team’s total defensive rebounds in a given season
  • TRB-Team’s total rebounds (both offensive and defensive) in a given season
  • AST-Team’s total assists in a given season
  • STL-Team’s total steals in a given season
  • BLK-Team’s total blocks in a given season
  • TOV-Team’s total turnovers in a given season
  • PF-Team’s total personal fouls in a given season
  • PTS-Team’s total points scored in a given season

Wow, that’s a lot of variables! Now that we understand the data we’re working with a bit better, let’s see how we can make a simple linear regression model!

The K-Best Way To Set Up Your Model

Before we start the juicy analysis, let’s first pick the features we will use for the model. In this post, we’ll explore the Select K-Best algorithm, which is an algorithm commonly used in linear regression to help select the best features for a particular model:

X = NBA.drop(['Season', 'Team', 'W', 'Ht.'], axis=1)
y = NBA['W']

from sklearn.feature_selection import SelectKBest, f_regression
features = SelectKBest(score_func=f_regression, k=5)
features.fit(X, y)

selectedFeatures = X.columns[features.get_support()]
print(selectedFeatures)

Index(['L', 'Finish', 'Age', 'FG%', '3P%'], dtype='object')

According to the Select K-Best algorithm, the five best features to use in the linear regression are L, Finish, Age, FG% and 3P%. In other words, a team’s total losses, end-of-season conference finish, average roster age, and percentages of successful field goals and 3-pointers are the five most important features for predicting a team’s win total.

How did the model arrive at these conclusions? First, I set the X and y variables-this is important, as the Select K-Best algorithm needs to know what the dependent variable is and which independent variables it can choose from. In this example, the dependent (or y) variable is W (team wins), while X includes all other dataset columns except W, Team, Season, and Ht.-W because it’s the y variable, and the other three because they’re categorical (non-numerical) variables, so they really won’t work in our analysis.

Next, we import SelectKBest and f_regression from the sklearn.feature_selection module. Why these two? Well, SelectKBest implements the Select K-Best algorithm itself, while f_regression is the scoring function it uses under the hood: it computes a univariate F-statistic between each candidate feature and the target, and SelectKBest keeps the k highest-scoring features (I used k=5 for this model).

After setting up the Select K-Best algorithm, we then fit both the X and y variables to the algorithm and then print out our top five selectedFeatures.
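To see the algorithm pick out informative features on its own, here’s a small synthetic sketch (the data and column count are made up; only columns 0 and 2 actually drive y):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic data: 100 rows, 4 candidate features, but only
# columns 0 and 2 actually drive y (plus a little noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.1, size=100)

# Ask for the 2 best features; f_regression scores each column against y
selector = SelectKBest(score_func=f_regression, k=2)
selector.fit(X, y)
print(selector.get_support())  # [ True False  True False]
```

The boolean mask from get_support() flags exactly the two informative columns, which is the same mechanism that surfaced L, Finish, Age, FG% and 3P% above.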

Train, test…split!

Once we have our top five features for the model, it’s time for the train, test, splitting of the model! What is train, test, split, you ask? Well, our data will be split into two parts-training data (the data we use to fit the model) and testing data (the data we use to evaluate it). Here’s how we can utilize the train, test, split for this model:

X = NBA[['L', 'Finish', 'Age', 'FG%', '3P%']]
y = NBA['W']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

How does the train, test, split work? Using sklearn’s train_test_split method, we pass in four parameters-our independent variables (X), our dependent variable (y), the size of the test data (a decimal between 0 and 1), and the random state (a seed that makes the shuffle reproducible; I kept it at 0, but any number works-42 is another common choice). In this model, I will utilize an 80/20 train, test, split, which means 80% of the data will be used for training while the other 20% will be used for testing.

Other common train, test, splits are 70/30, 85/15, and 67/33, but I opted for 80/20 because our dataset is only 300 rows long. I would utilize these other train, test, splits for larger datasets.

  • Something worth noting: What we’re doing here is called multiple linear regression since we’re using five X variables to predict a Y variable. Simple linear regression would only use one X variable to predict a Y variable. Just thought I’d throw in this quick factoid!
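Here’s a quick sketch confirming the 80/20 split sizes on a synthetic 300-row stand-in for our dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for our 300-row, 5-feature dataset
X = np.arange(300 * 5).reshape(300, 5)
y = np.arange(300)

# test_size=0.2 holds out 60 rows for testing and leaves 240 for training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)
print(X_train.shape, X_test.shape)  # (240, 5) (60, 5)
```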

And now, for the model-making

Now that we’ve done all the steps to set up our model, the next thing we’ll need to do is actually create the model!

Here’s how we can get started:

NBAMODEL = LinearRegression()
NBAMODEL.fit(X_train, y_train)

LinearRegression()

In this example, we create a LinearRegression() object (NBAMODEL) and fit it to both the X_train and y_train data.

Predictions, predictions

Once we’ve created our model, next comes the fun part-generating the predictions!

yPredictions = NBAMODEL.predict(X_test)

yPredictions

array([53.20097648, 28.89541793, 52.26551381, 53.22220829, 35.90676716,
       32.15874993, 47.72090936, 48.32896277, 39.4193884 , 40.1548429 ,
       19.62678175, 48.3263792 , 32.13473281, 43.50887634, 43.85260484,
       52.79795145, 27.35822648, 40.23392095, 18.85423981, 61.69624816,
       51.59650403, 23.86311747, 56.18087097, 54.15867678, 49.75211403,
       46.90177259, 31.80109001, 46.82531833, 37.50563942, 32.19863141,
       52.41205133, 25.09011881, 48.94542256, 38.80244997, 24.80146638,
       42.50107728, 43.27320835, 37.45199938, 46.7795962 , 28.11289951,
       57.64388881, 29.35812466, 18.3222965 , 36.26677012, 20.56912227,
       22.15266241, 19.9955299 , 44.84930613, 45.14740453, 23.19471644,
       53.940611  , 26.0780373 , 27.88093669, 61.23347337, 52.99948229,
       34.66653881, 30.04421016, 27.21669768, 48.55215233, 47.11060905])

The yPredictions are obtained by calling the predict method on the X_test data, which in this case consists of 60 of the 300 records (the 20% we held out for testing).

Evaluating the model’s accuracy

Once we’ve created the model and made our predictions on the test data, it’s time to evaluate the model’s accuracy. Here’s how to do so:

from sklearn.metrics import mean_absolute_percentage_error

mean_absolute_percentage_error(y_test,yPredictions)

0.09147159762376074

There are several ways you can evaluate the accuracy of a linear regression model. One good method, shown here, is mean_absolute_percentage_error (imported from the sklearn.metrics module). The mean absolute percentage error measures how far off the model’s predictions are, on average, as a percentage of the actual values. In this model, the mean absolute percentage error is 0.09147159762376074, indicating that the model’s predictions are off by roughly 9%-or, put another way, roughly 91% accurate overall. Not too shabby for this model!

  • Interestingly, the two COVID impacted NBA seasons in the dataset (2019-20 and 2020-21) didn’t throw off the model’s accuracy much.
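To make the metric concrete, here’s MAPE computed by hand on toy numbers (made up for illustration):

```python
import numpy as np

# Toy numbers: each prediction is off by exactly 10% of the actual value
y_true = np.array([50.0, 40.0, 30.0])
y_pred = np.array([45.0, 44.0, 33.0])

# MAPE = average of |actual - predicted| / |actual|
mape = np.mean(np.abs((y_true - y_pred) / y_true))
print(round(mape, 3))  # 0.1
```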

Don’t forget about the equation!

Evaluating the model’s accuracy isn’t the only thing you should do when analyzing the model. You should also grab the model’s coefficients and intercept-they will be important in the next post!

NBAMODEL.coef_

array([ -0.4663858 ,  -1.30716212,   0.39700734,  34.1325687 ,
       -22.12258585])
NBAMODEL.intercept_

50.945769772855854

All linear regression models will have a coefficient and an intercept, which form the linear regression equation. Since our model had five X variables, there are five coefficients.
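Just as a sketch, you could assemble the equation string from those printed values (hard-coded here from the output above rather than read off the fitted model):

```python
# Assemble the regression equation from the coefficients and intercept
# printed above; the names match the five selected features
coefs = [-0.4663858, -1.30716212, 0.39700734, 34.1325687, -22.12258585]
names = ['L', 'Finish', 'Age', 'FG%', '3P%']
intercept = 50.945769772855854

# Format each coefficient with its sign, rounded to two decimal places
terms = [f"{c:+.2f}*{n}" for c, n in zip(coefs, names)]
equation = "W = " + " ".join(terms) + f" {intercept:+.2f}"
print(equation)  # W = -0.47*L -1.31*Finish +0.40*Age +34.13*FG% -22.12*3P% +50.95
```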

Now, what would our equation look like?

Here it is in all its messy glory (coefficients rounded to two decimal places):

W = -0.47(L) - 1.31(Finish) + 0.4(Age) + 34.13(FG%) - 22.12(3P%) + 50.95

We’re going to be using this equation in the next post.

Linear regression plotting

For the visual learners among my readers, I thought it would be nice to include a simple scatterplot to visualize the accuracy of our linear regression model. Here’s how to create that plot:

import matplotlib.pyplot as plt
plt.scatter(y_test, yPredictions, color="red")
plt.xlabel('Actual values', size=15)
plt.ylabel('Predicted values', size=15)
plt.title('Actual vs Predicted values', size=15)
plt.show()

First, I imported the matplotlib.pyplot module. Then, I ran the plt.scatter() method to create a scatterplot. I used three parameters for this method: the y_test values, the yPredictions values, and the color="red" parameter (this just indicated that I wanted red scatterplot dots). I then used the plt.xlabel(), plt.ylabel(), and plt.title() methods to give the scatterplot an x-label title, y-label title, and title, respectively. Lastly, I used the plt.show() method to display the scatterplot in all of its red-dotted glory.

As you can see from this plot, the predicted values match the actual values fairly closely, hence the 91% accuracy/9% error.

Thanks for reading, enjoy the upcoming NBA season action, and stay tuned for my next post where I reveal my predicted records and standings for each team, East and West! It will be interesting to see how my predictions pan out over the course of the season-after all, it’s certainly something different I’m trying on this blog!

And yes, perfect timing for this blog to come out on NBA season opening day! Serendipity am I right?

Also, here’s a link to the notebook in GitHub-https://github.com/mfletcher2021/DevopsBasics/blob/master/NBA_24_25_predictions.ipynb.