R Analysis 8: Linear Regression & the Top 200 Albums of the 2010s

Advertisements

Hello everybody,

It’s Michael, and welcome to the roaring (20)20s! With that said, it’s fitting that my first post of the 2020s will be about the 2010s-more specifically, Billboard’s Top 200 Albums of the 2010s, which I will analyze using linear regression.

Here’s the dataset-Billboard 200.

As always, let’s open up R, upload the file, and learn more about our dataset:

Screen Shot 2020-01-27 at 10.23.53 PM

This dataset shows all of the album that ended up on Billboard’s Top 200 Decade-End Chart along with information about each album (such as number of tracks).

In total, there are 200 observations of 7 variables:

  • Rank-Where the album ranked on the Billboard’s Top 200 Decade-End Chart (anywhere from 1 to 200)
  • Album-The name of the album
  • Artist-The singer/group (or distributor in the case of some movie soundtrack) who created the album
  • Genre-The main genre of the album (note that albums can fit into several sub-genres, which I will explore in this analysis)
  • Tracks-How many tracks are on the album
  • Metacritic-The album’s Metacritic score
    • For those that don’t know, Metacritic is a movie/TV show/music review site, much like Rotten Tomatoes (except RT doesn’t review music)
  • Release Date-The album’s release date
    • Even though this is a 2010s Decade-End Chart, there are interestingly a handful of albums from the late ’00s. Guess they still held up into the ’10s.

Now, let’s check to see if there’s missing data (remember to install the Amelia package):

  • Also remember to type the command missmap(file) (or whatever variable you called your file) to see the missing-ness map.

As you can see, 97% of observations are present while 3% are missing. All of the missing observations are in the Metacritic column-this is because not all albums have a Metacritic score (there are plenty of other music review sites such as HipHop DX, but I only went off of Metacritic reviews to maintain consistency in the dataset).

Now, I don’t know if I’ve mentioned this before, but when there are missing values in a column in R, there are three things you can do:

  • Don’t use the column in analysis
  • Fill in missing column values with the mean of the column (meaning the mean you get from all non-NA values inn the column)
  • Fill in the missing column values with an arbitrary fixed value

The first option sometimes works, but not for this dataset, as I want to use the Metacritic column in some way in this analysis. The second option might work, but I won’t use it since I feel that imputing the mean Metacritic score for any NAs in the column wouldn’t make much sense (plus this option won’t work with non-numeric columns). In this case, the third option is my best best; I will fill in any missing values in the Metacritic column using the number 0. You could pick any number-I just chose 0 since doing so gives me an easy way to spot the albums without Metacritic scores. Plus 0 won’t impact the mean of the Metacritic column in any significant way.

Here’s the line of code to make the magic happen:

file$Metacritic[is.na(file$Metacritic)] <- 0

Once we run this line of code, here’s what the Metacritic column looks like now:

All the NAs are filled with zeroes, which, in my opinion, makes the column a lot neater looking.

Now, let’s do some linear regression. Here’s a simple model with one independent and one dependent variable:

> model1 <- lm(file$Metacritic~file$Genre)
> summary(model1)

Call:
lm(formula = file$Metacritic ~ file$Genre)

Residuals:
Min 1Q Median 3Q Max
-65.345 -7.384 2.661 13.667 55.692

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.222e-14 1.546e+01 0.000 1.000000
file$GenreChristmas 2.433e+01 2.187e+01 1.113 0.267277
file$GenreCountry 3.581e+01 1.619e+01 2.211 0.028262 *
file$GenreElectronic 4.700e+01 2.046e+01 2.298 0.022708 *
file$GenreFolk 6.650e+01 2.445e+01 2.720 0.007156 **
file$GenreJazz 3.750e+01 2.445e+01 1.534 0.126801
file$GenreMovie Soundtrack 2.231e+01 1.715e+01 1.300 0.195099
file$GenreMusical 8.500e+01 3.093e+01 2.748 0.006584 **
file$GenreOpera 5.400e+01 3.093e+01 1.746 0.082465 .
file$GenrePop 6.534e+01 1.586e+01 4.121 5.71e-05 ***
file$GenreR&B 4.925e+01 1.729e+01 2.849 0.004889 **
file$GenreRap 6.133e+01 1.591e+01 3.855 0.000160 ***
file$GenreReggae -6.545e-14 3.093e+01 0.000 1.000000
file$GenreRock 6.908e+01 1.729e+01 3.996 9.31e-05 ***
file$GenreSoul 7.550e+01 2.046e+01 3.691 0.000294 ***
file$GenreVarious -1.459e-13 2.445e+01 0.000 1.000000

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 26.78 on 184 degrees of freedom
Multiple R-squared: 0.314, Adjusted R-squared: 0.2581
F-statistic: 5.614 on 15 and 184 DF, p-value: 2.224e-09

In this model, I used Metacritic as the dependent variable (I did say I was going to use it in this analysis) and Genre as the independent variable. I chose these two variables because I wanted to analyze whether certain genres tend to get higher/lower Metacritic scores.

What does all of the output mean? Since my last linear regression post was over a year ago, let me give you guys a refresher:

  • The residual standard error refers to the amount that the dependent variable (Metacritic) deviates from the true regression line. In this case, the RSE is 26.78, meaning the Metacritic score deviates from the true regression line by 27 (rounded to the nearest whole number). Since Metacritic scores only go up to 100, 27 is quite a large RSE.
  • The R-squared is the measure of a model’s goodness-of-fit; the closer to 1 means the better the fit. The difference between the Multiple R-Squared and the Adjusted R-Squared is that the former isn’t dependent on the amount of variables in the model while the latter is. In this case, the Multiple R-Squared is 31.4% while the adjusted R-squared is 25.81%. This implies that there isn’t much of a correlation between an album’s genre and Metacritic score.
    • It’s not an official rule, but I’d say the Multiple R-Squared should be at 51% for there to be any correlation between a dependent variable and any independent variable(s). The Adjusted R-Squared can be slightly lower than 51%.
    • In the post R Analysis 2: Linear Regression & NFL Attendance, I mentioned the idea that “correlation does not imply causation”, which holds true here. In the context of this model, just because there is a slight correlation between an album’s genre and its Metacritic score, this doesn’t mean that certain album genres will tend to score higher/lower on Metacritic.
    • Disregard the F-statistic and corresponding p-value. However, if you want more context on both of these things, please check out the link in the previous bullet point for a more in-depth explanation.
  • Notice how the independent variable-Genre-is split up into several different sub-categories. These sub-categories represent all of the album genres listed in this dataset.
    • The asterisks right by the subcategories (after the Pr(>|t|) column) are significance codes, which in this case represent each individual genre’s significance to the album’s Metacritic score. In other words, the significance codes show which genres are likely to have an impact on an album’s Metacritic score. Any genre with two or three asterisks will significantly influence an album’s Metacritic score; such genres include:
      • Folk
      • Musical
      • Pop
      • Rap
      • R&B
      • Rock
      • Soul

Now, I want to try another model using Metacritic, but this time I will use two independent variables-Artist and Genre-and see if this improves the accuracy of the model:

model2 <- lm(file$Metacritic ~ file$Genre + file$Artist)

summary(model2)

Call:
lm(formula = file$Metacritic ~ file$Genre + file$Artist)

Residuals:
Min 1Q Median 3Q Max
-41.667 0.000 0.000 0.687 47.333

Coefficients: (6 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.167e+01 2.844e+01 -0.762 0.448674
file$GenreChristmas 2.167e+01 3.399e+01 0.637 0.525909
file$GenreCountry 1.502e+01 3.425e+01 0.438 0.662358
file$GenreElectronic -6.473e+01 3.785e+01 -1.710 0.091599 .
file$GenreFolk 1.977e+01 3.549e+01 0.557 0.579287
file$GenreJazz 5.917e+01 3.134e+01 1.888 0.063118 .
file$GenreMovie Soundtrack 2.167e+01 2.150e+01 1.008 0.316958
file$GenreMusical 1.067e+02 3.399e+01 3.138 0.002477 **
file$GenreOpera 7.567e+01 3.399e+01 2.226 0.029188 *
file$GenrePop 1.727e+01 2.719e+01 0.635 0.527498
file$GenreR&B 3.297e+01 2.963e+01 1.112 0.269683
file$GenreRap 2.167e+01 3.134e+01 0.691 0.491590
file$GenreReggae 2.167e+01 3.399e+01 0.637 0.525909
file$GenreRock 9.467e+01 3.399e+01 2.785 0.006858 **
file$GenreSoul 1.427e+01 3.549e+01 0.402 0.688886
file$GenreVarious 2.167e+01 3.876e+01 0.559 0.577891
file$Artist21 Pilots 7.000e+00 2.633e+01 0.266 0.791120
file$Artist21 Savage 8.100e+01 2.280e+01 3.552 0.000683 ***
file$ArtistA Boogie wit da Hoodie -1.858e-13 2.280e+01 0.000 1.000000
file$ArtistAdele 7.940e+01 3.115e+01 2.549 0.012978 *
file$ArtistAlicia Keys -1.130e+01 3.331e+01 -0.339 0.735394
file$ArtistAriana Grande 8.115e+01 2.666e+01 3.044 0.003271 **
file$ArtistBarbara Streisand 5.940e+01 3.115e+01 1.907 0.060612 .
file$ArtistBeyonce 7.720e+01 2.428e+01 3.180 0.002182 **
file$ArtistBillie Eilish 8.640e+01 3.115e+01 2.773 0.007083 **
file$ArtistBlake Shelton 7.065e+01 3.747e+01 1.886 0.063440 .
file$ArtistBob Marley NA NA NA NA
file$ArtistBrantley Gilbert 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistBruno Mars 7.140e+01 2.719e+01 2.626 0.010585 *
file$ArtistBryson Tiller -1.130e+01 3.331e+01 -0.339 0.735394
file$ArtistCamila Cabello 8.240e+01 3.115e+01 2.645 0.010052 *
file$ArtistCardi B 8.400e+01 2.280e+01 3.684 0.000445 ***
file$ArtistCarrie Underwood 6.865e+01 3.508e+01 1.957 0.054280 .
file$ArtistCast of Hamilton NA NA NA NA
file$ArtistChris Brown -1.130e+01 3.331e+01 -0.339 0.735394
file$ArtistChris Stapleton 9.165e+01 3.747e+01 2.446 0.016922 *
file$ArtistColdplay 6.940e+01 3.115e+01 2.228 0.029075 *
file$ArtistDaBaby 1.931e-13 2.280e+01 0.000 1.000000
file$ArtistDaft Punk 1.734e+02 4.079e+01 4.251 6.37e-05 ***
file$ArtistDisney -8.179e-14 2.150e+01 0.000 1.000000
file$ArtistDJ Khaled 6.100e+01 2.280e+01 2.675 0.009266 **
file$ArtistDrake 7.500e+01 1.493e+01 5.024 3.63e-06 ***
file$ArtistDrake & Future 7.000e+01 2.280e+01 3.070 0.003033 **
file$ArtistDreamWorks 8.081e-14 2.633e+01 0.000 1.000000
file$ArtistDuck Dynasty -7.019e-13 2.633e+01 0.000 1.000000
file$ArtistEd Sheeran 6.890e+01 2.824e+01 2.440 0.017179 *
file$ArtistEminem 6.567e+01 1.700e+01 3.864 0.000244 ***
file$ArtistEric Church 8.615e+01 3.508e+01 2.456 0.016503 *
file$ArtistFall Out Boy -1.000e+00 2.633e+01 -0.038 0.969811
file$ArtistFetty Wap 6.800e+01 2.280e+01 2.982 0.003920 **
file$ArtistFlorida Georgia Line 6.650e+00 3.508e+01 0.190 0.850186
file$Artistfun 6.440e+01 3.115e+01 2.067 0.042367 *
file$ArtistFuture 7.350e+01 1.862e+01 3.948 0.000184 ***
file$ArtistG-Eazy 7.400e+01 2.280e+01 3.245 0.001791 **
file$ArtistGoyte -4.000e+00 2.633e+01 -0.152 0.879682
file$ArtistHozier 8.640e+01 3.861e+01 2.238 0.028365 *
file$ArtistHunter Hayes 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistImagine Dragons -2.300e+01 2.280e+01 -1.009 0.316560
file$ArtistJ Cole 7.033e+01 1.700e+01 4.138 9.49e-05 ***
file$ArtistJason Aldean 3.715e+01 3.382e+01 1.098 0.275734
file$ArtistJay-Z 6.000e+01 2.280e+01 2.631 0.010426 *
file$ArtistJohn Legend 6.070e+01 3.331e+01 1.823 0.072583 .
file$ArtistJuice WRLD 3.050e+01 1.862e+01 1.638 0.105806
file$ArtistJustin Bieber 7.040e+01 2.666e+01 2.641 0.010160 *
file$ArtistJustin Timberlake 7.190e+01 2.824e+01 2.546 0.013054 *
file$ArtistKane Brown 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistKanye West 7.500e+01 2.280e+01 3.289 0.001566 **
file$ArtistKanye West & Jay-Z 7.600e+01 2.280e+01 3.333 0.001367 **
file$ArtistKaty Perry 6.090e+01 2.824e+01 2.157 0.034405 *
file$ArtistKelly Clarkson 7.300e+01 2.633e+01 2.773 0.007099 **
file$ArtistKendrick Lamar 9.300e+01 1.862e+01 4.995 4.06e-06 ***
file$ArtistKesha 1.404e+02 4.079e+01 3.442 0.000972 ***
file$ArtistKevin Gates 8.100e+01 2.280e+01 3.552 0.000683 ***
file$ArtistKhalid 1.770e+01 3.059e+01 0.579 0.564710
file$ArtistLady Antebellum 6.965e+01 3.508e+01 1.986 0.050951 .
file$ArtistLady Gaga 7.780e+01 2.428e+01 3.205 0.002025 **
file$ArtistLana Del Rey 6.640e+01 3.115e+01 2.131 0.036524 *
file$ArtistLil Baby 2.779e-14 2.280e+01 0.000 1.000000
file$ArtistLil Baby & Gunna 7.600e+01 2.280e+01 3.333 0.001367 **
file$ArtistLil Uzi Vert 7.500e+01 2.280e+01 3.289 0.001566 **
file$ArtistLil Wayne 6.633e+01 1.700e+01 3.903 0.000214 ***
file$ArtistLionel Richie 7.840e+01 3.115e+01 2.517 0.014113 *
file$ArtistLittle Big Town 8.165e+01 3.747e+01 2.179 0.032638 *
file$ArtistLizzo 8.400e+01 2.280e+01 3.684 0.000445 ***
file$ArtistLMFAO 1.334e+02 4.079e+01 3.270 0.001659 **
file$ArtistLorde 8.340e+01 3.115e+01 2.677 0.009219 **
file$ArtistLuke Bryan 4.832e+01 3.425e+01 1.411 0.162647
file$ArtistLuke Combs 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistMacklemore & Ryan Lewis 7.400e+01 2.280e+01 3.245 0.001791 **
file$ArtistMaroon 5 6.007e+01 2.719e+01 2.209 0.030413 *
file$ArtistMeek Mill 7.700e+01 2.280e+01 3.377 0.001193 **
file$ArtistMeghan Trainor 6.340e+01 3.115e+01 2.035 0.045580 *
file$ArtistMetalica -8.437e-14 2.633e+01 0.000 1.000000
file$ArtistMichael Buble NA NA NA NA
file$ArtistMigos 7.400e+01 1.862e+01 3.975 0.000167 ***
file$ArtistMiley Cyrus 6.540e+01 3.115e+01 2.099 0.039351 *
file$ArtistMiranda Lambert 9.265e+01 3.747e+01 2.473 0.015804 *
file$ArtistMumford & Sons -7.500e+00 2.280e+01 -0.329 0.743190
file$ArtistNicki Minaj 6.900e+01 1.862e+01 3.706 0.000414 ***
file$ArtistOf Monsters and Men 6.790e+01 3.861e+01 1.759 0.082931 .
file$ArtistOne Direction 6.840e+01 2.666e+01 2.566 0.012402 *
file$ArtistOneRepublic -8.000e+00 2.633e+01 -0.304 0.762141
file$ArtistPanic! At the Disco 7.440e+01 3.115e+01 2.388 0.019597 *
file$ArtistPentatonix 2.167e+01 3.134e+01 0.691 0.491590
file$ArtistPharrell 7.140e+01 3.115e+01 2.292 0.024884 *
file$ArtistPhillip Phillips 6.540e+01 3.115e+01 2.099 0.039351 *
file$ArtistPink 8.140e+01 3.115e+01 2.613 0.010953 *
file$ArtistPost Malone 2.550e+01 1.862e+01 1.370 0.175117
file$ArtistQueen 7.000e+01 2.633e+01 2.659 0.009690 **
file$ArtistRihanna 7.440e+01 2.824e+01 2.635 0.010324 *
file$ArtistRIhanna 6.690e+01 2.824e+01 2.369 0.020542 *
file$ArtistRobin Thicke 4.400e+00 3.115e+01 0.141 0.888085
file$ArtistSade 8.640e+01 3.861e+01 2.238 0.028365 *
file$ArtistSam Hunt 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistSam Smith 7.140e+01 2.824e+01 2.529 0.013672 *
file$ArtistScotty McCreery 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistShawn Mendes 4.090e+01 2.824e+01 1.449 0.151874
file$ArtistSia 7.140e+01 3.115e+01 2.292 0.024884 *
file$ArtistSony Pictures -8.989e-14 2.633e+01 0.000 1.000000
file$ArtistSusan Boyle NA NA NA NA
file$ArtistSZA 7.470e+01 3.331e+01 2.243 0.028026 *
file$ArtistTaylor Swift 7.965e+01 2.666e+01 2.988 0.003854 **
file$ArtistThe Band Perry 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistThe Black Eyed Peas 6.440e+01 3.115e+01 2.067 0.042367 *
file$ArtistThe Black Keys 1.000e+01 2.280e+01 0.439 0.662319
file$ArtistThe Lumineers NA NA NA NA
file$ArtistThe Weeknd 5.920e+01 3.059e+01 1.935 0.056961 .
file$ArtistThomas Rhett 6.650e+00 3.747e+01 0.177 0.859634
file$ArtistTravis Scott 7.450e+01 1.862e+01 4.001 0.000153 ***
file$ArtistUniversal Studios 2.167e+01 2.150e+01 1.008 0.316958
file$ArtistUsher 4.570e+01 3.331e+01 1.372 0.174332
file$ArtistVarious 3.005e-14 2.280e+01 0.000 1.000000
file$ArtistWarner Bros. -1.194e-13 2.633e+01 0.000 1.000000
file$ArtistXXXTentacion NA NA NA NA
file$ArtistZac Brown Band 3.032e+01 3.425e+01 0.885 0.379001

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 18.62 on 71 degrees of freedom
Multiple R-squared: 0.8721, Adjusted R-squared: 0.6415
F-statistic: 3.782 on 128 and 71 DF, p-value: 3.626e-09

So how does this model differ from the previous model. Let’s analyze:

  • The residual standard error is smaller than that of the previous model by nearly 8 (assuming you round model 1’s RSE to 27 and model 2’s RSE to 19). In the context of Metacritic scores, 19 is a much smaller RSE than 27, so this helps improve the model’s accuracy.
  • This model’s Multiple R-Squared and Adjusted R-Squared are considerably larger than those of the previous model (87.21% and 64.15%, respectively, while the previous model’s Multiple R-Squared and Adjusted R-Squared were 31.4% and 25.81% respectively). This model’s R-Squared (both multiple and adjusted) imply that there is a strong correlation between an album’s genre and Metacritic score if the album’s artist is factored in to the analysis.
  • Just as with the previous model, the independent variables are divided into sub-categories which encompass all the possible values for that variable. All of the genres listed in the dataset are present, as are all of the artists listed in the dataset. The significance codes are also present here, but this time they are present for both the genre and artist subcategories. Any genre that has either two or three asterisks besides it does significantly affect an album’s Metacritic score-same logic applies for any artists with two or three asterisks beside their name.
    • So, what are the genres that most significantly impact an album’s Metacritic score (this is different from the previous model):
      • Musical
      • Rock
    • Now, which artists are most likely to have a significant impact on an album’s Metacritic score?:
      1. 21 Savage
      2. Ariana Grande
      3. Beyonce
      4. Billie Eilish
      5. Cardi B
      6. Daft Punk
      7. DJ Khaled
      8. Drake
      9. Drake & Future (they did an album together so I listed them both as the artist)
      10. Eminem
      11. Fetty Wap
      12. Future
      13. G-Eazy
      14. J Cole
      15. Kanye West
      16. Kanye West & Jay-Z
      17. Kelly Clarkson
      18. Kendrick Lamar
      19. Kesha
      20. Kevin Gates
      21. Lady Gaga
      22. Lil Baby & Gunna
      23. Lil Uzi Vert
      24. Lil Wayne
      25. Lizzo
      26. LMFAO
      27. Lorde
      28. Macklemore & Ryan Lewis
      29. Meek Mill
      30. Migos
      31. Nicki Minaj
      32. Queen
      33. Taylor Swift
      34. Travis Scott
    • Yes, 34 of the 120 artists listed have a significant impact on an album’s Metacritic score. And 23 of them are rappers (though keep in mind that if two artists made an album/mixtape together, I listed them on the same bullet point).
    • Personally, I think it’s interesting that Queen is on this list. But that could be just because of the 2018 Queen movie Bohemian Rhapsody.
  • Remember how I said to disregard the F-statistic and corresponding p-value when I was analyzing the previous model. Since this linear regression model has two independent variables, the F-statistic and corresponding P-value are important. The f-statistic is a numerical measure of the relationship (or lack thereof) between the dependent variable and any independent variables. However, the F-statistic must be analyzed in conjunction with the P-value in order to get a sense of the independent variables’ relationship with the dependent variable.
    • The concepts of null and alternative hypotheses are important here.
      • The null hypothesis states that the independent variables DON’T have a significant impact on the dependent variable while the alternative hypotheses states the opposite.
    • If the P-value is less than 0.001, you can safely reject the null hypotheses. Since the P-value in this model is much less than 0.001, you can reject the null hypothesis. This means that the combination of an album’s genre and respective artist does impact the album’s Metacritic score.

So, which model is better? After analyzing each model, I’d say model #2 is much better than model #1-and not just because model #2 has two independent variables (though that certainly helps make the model more accurate). The fact that model #2 has a lower RSE, higher Multiple/Adjusted R-Squared, 36 statistically significant subcategories (2 for genre and 34 for artist), and a P-value low enough to reject the null hypothesis all help make model #2 the better model.

Thanks for reading and here’s to lots of great content in 2020,

Michael

MySQL Lesson 10: Creating Views

Advertisements

Hello everybody,

This is Michael, and today’s post will be about creating views. This will be the last post in this series of MySQL lessons, but I’ll likely have more MySQL lessons and/or analyses in the future, so keep following my posts.

Anyway, views are basically virtual tables created from queries that allow you to query the data within them. Here’s an example:

Here, I am building a view from the query, which consists of the block of code between the parentheses. In the query, I selected the foreign key Album_idAlbum and the fields Ceremony and Won/Nominated from the Awards table. I narrowed down the criteria so that Album_idAlbum with keys greater than 30 and less than 58 will be displayed. In other words, I only want the query-and in turn, the view-to include awards considerations for albums released in the 2010s, hence the view’s name 2010s Album Awards.

  • The syntax for views is ALWAYS:
    • CREATE VIEW (name of view) AS (query)
  • Parentheses aren’t necessary when creating views. I just use them to improve readability of the code.

If you press the refresh button on the “Schemas” tab, the view will appear.

You can even query the data in the view. Here’s an example:

Using every field from the view, I asked the query to only display the Album_idAlbum of any album that was considered (whether they won or were just nominated) for a Teen Choice Award, which only applied to one album in this case (Usher’s 2010 album Raymond v. Raymond).

Let’s try another example. First we’ll create a view consisting of any debut albums listed on the database (WHERE Album Number = 1 refers to debut albums), along with the corresponding [Album] Release Date and Singer Name (referring to which singer/group created the album):

  • JOINs are perfectly fine to include in views-honestly anything you can include in a normal query you can include in a view.

Then we hit the refresh button on the “Schema” tab to see if the view was created (which it was).

And finally, let’s query this view:

In this query, I selected the fields [Album] Name and Singer Name from the view and asked the query to only display albums WHERE MONTH(Release Date) = 1 OR MONTH(Release Date) = 2, which refers to albums released in January or February. As you can see, two albums fit the criteria.

Thanks for reading,

Michael

 

MySQL Lesson 9: More Joins

Advertisements

Hello everybody,

My name is Michael, and today’s post will be a continuation of the previous post’s lesson on JOINs. As I mentioned in the previous post, JOINs are queries that join columns from different tables together based on a common column in each table (or in the case of SELF-JOINs, connected columns from the same table-more on that later in this post).

The first type of JOIN I will cover in this post is FULL JOIN, which returns all records from both the left-linked and right-linked tables and fills in NULL for missing matches from either table. Here’s an example:

I selected the fields Name and Release Date from the Album table and the fields Ceremony and Won/Nominated from the Awards table (which I FULL JOINed to the Album table). I then linked the two tables using the idAlbum/Album_idAlbum primary/foreign key. Had there been an album that wasn’t in consideration for an award (for instance Grace VanderWaal’s Just The Beginning), then the Ceremony and Won/Nominated fields corresponding to that album would NULL (Name and Release Dat will still be displayed).

With FULL JOINs, keep in mind that if you don’t filter out results, you could end up with a large result set (this query returned 216 rows of data). Here’s the same FULL JOIN query except with a WHERE clause to filter out results:

This query is similar to the previous one, except that it only shows the rap albums that were considered for awards (and as a result, show only 41 rows of data as opposed to the 216 rows returned from the previous query).

The next type of join I will cover is the SELF JOIN, which basically joins a table to itself. Here’s an example:

I selected the fields Singer Name, Age, and Member of from the Singer table, but there’s a catch. See, because I’m joining a table to itself, I need to give aliases to each of the instances of the table in this query. I went with S* for the Singer table before the JOIN clause and B** for the Singer table after the JOIN clause. As you can see, I had to include the aliases before each field name (S for Singer Name and Age and B for Member of). In my WHERE clause, notice how I mention Member of twice in the line WHERE B.Member of IS NOT NULL AND S.Member of IS NOT NULL. I did this so that only the members of a band are listed alongside the corresponding band/group; when I only included WHERE B.Member of IS NOT NULL, the band name would display alongside itself (e.g. Imagine Dragons was listed as a member of Imagine Dragons).

*for Singer

**for Band (or Group)

Now let’s filter the results further:

It’s the same query as above, except with the addition of the AND S.Death IS NOT NULL, which asks the query to display any deceased singers part of groups. As you can see, only 1 singer in the database meets this criteria (Linkin Park’s Chester Bennington, who died on July 20, 2017).

Let’s try a different filter to the query:

This is similar to the first SELF JOIN query I did, except the filter asks the query to only display band singers under 40 years old (hence the S.Age < 40 line), of which there are 10.

The last JOIN I will cover in this post is the UNION query, which, even though it does not have JOIN in its name, still serves a function similar to JOINs (except entire queries are being joined together, not just tables).

Here’s an example of a UNION query:

In each individual query, I selected the fields Track Name and Track Duration from the Tracks table and joined the two queries together with a UNION clause. The one difference between the two queries is their WHERE clauses; the top query retrieves the name of the longest song (Empire of the Clouds by Iron Maiden at a whopping 18 minutes and 1 second) and the bottom query retrieves the name of the shortest song (Intro by 50 Cent at a mere 6 seconds).

Some things to keep in mind with UNION queries include:

  • UNIONs only work with the same number of columns in both queries (I selected two columns in both the top and bottom queries)
  • They work best when the columns selected in each query-along with any aliases- and the order in which the columns/aliases are listed are identical. Though there should be at least one difference between the two queries, as I have in my query (the WHERE clause has MAX for the top query and MIN for the bottom one).

Here’s another slightly more complex UNION query

In each query, I selected the foreign key Album_idAlbum and the aggregate function COUNT(Album_idAlbum)-using the alias Awards Considerations-from the Awards table. I also remember to GROUP BY the same thing in each table (the Album_idAlbum foreign key). The one difference between each query is the WHERE clause, as I selected Album_idAlbum 34 for the top query and Album_idAlbum 48 for the bottom one (both are rock albums, with 34 being Mylo Xyloto by Coldplay and 48 being Smoke+Mirrors by Imagine Dragons). As you can see, Mylo Xyloto was considered for 8 awards while Smoke+Mirrors was only considered for 1 award.

Before I go, here’s a spreadsheet with the modified Singer table (couldn’t save SQL files on WordPress)-Singer table modified.

Thanks for reading,

Michael

MySQL Lesson 8: Joins

Advertisements

Hello everybody,

It’s Michael, and this post (and the next) will be covering joins in MySQL. Joins are essentially queries that are meant to retrieve data from several tables rather than just a single table. I’ll take two posts to cover joins because there are several different types of joins.

But first, here’s a slight modification to the database, which I felt would be appropriate when learning about joins:

The two new fields included are:

  • Singer_idSinger-Ok, this is more of a foreign key, but it indicates the band/group the singer belongs to (e.g. Dan Reynolds is a part of the band Imagine Dragons). For individual singers, the value in this field will be the same as the value in the idSinger primary key.
  • Member of-What band (if any) a singer is part of. If the singer isn’t part of a band, then this field will be NULL.

The arrow on top of the singer table indicates a unary one-to-many relationship. Unary basically means the table relates to itself (which is perfectly fine by MySQL standards); the one-to-many in this case means that a band must have MANY (at least 2) singers while each singer must belong to only ONE band.

This type of relationship works best for SELF JOIN queries, which are queries that join a table to itself. I’ll cover them more in the next post.

The first type of join I will focus on is an INNER JOIN. Here’s a query using INNER JOIN:

INNER JOIN queries basically join columns from different tables using values common to both tables (which are usually foreign/primary keys); results are returned only if a row in both tables matches the JOIN condition. In this case; the common column is Singer_idSinger-the Singer and Album tables both have this column.

  • You don’t have to display the common column in the query results; just remember to include it in the ON clause. This goes for all JOIN queries.

I first selected the field Singer Name from the Singer table and the fields Name and Release Date from the Album table (which is INNER JOINed to the Singer table). I then linked both tables through the Singer_idSinger primary/foreign key (which is the common column among both tables). I finally narrowed down the criteria to only include albums WHERE Release Date > Death so that only posthumous albums (along with corresponding singer name and release dates) are displayed.

  • INNER JOIN queries will work the same without the word INNER, as seen below:

The next type of JOIN I will cover is a LEFT JOIN, which basically goes:

  • SELECT (columns) FROM (table A) LEFT JOIN (table B) ON (table A.column) = (table B.column)

Like INNER JOIN queries, LEFT JOIN queries link tables based on common columns. The difference between these two types of queries are that INNER JOIN queries retrieve results that have matching values in both tables, while LEFT JOIN queries retrieve all results from the left-linked table (table A) and any matching results from the right-linked table (table B).  If there are no matching results from the right-linked table, NULL will be displayed on any rows connected to the right-linked table (this is not the case with INNER JOINS).

Here’s an example of a LEFT JOIN query:

In this query, I am selecting the fields Track Name from the Tracks table and Featured Artist Name from the Featured Artist table (which is LEFT JOINed to the Tracks table). I filtered out the results so that only tracks WHERE Album_idAlbum = 44  (44 is the primary key for Linkin Park’s The Hunting Party if you were wondering) are displayed. The query will display all the records from the left-linked table (in other words, every track from The Hunting Party) along with matching records from the right-linked table (meaning if there are any guest artists on a track). For tracks without guest artists, NULL is displayed on the right side rows.

The last of the joins I will cover is the RIGHT JOIN, which is basically the opposite of LEFT JOIN because the query retrieves all results from the right-linked table (table B) and any matching results from the left-linked table (table A). Should there be no matching results in the left-linked table, NULL will be displayed on any rows connected to the left-linked table (so basically the opposite of what happens in a LEFT JOIN query).

Here’s an example of a RIGHT JOIN query:

In this query, I selected the field Name from the Album table and the field Won/Nominated from the Awards table (which is RIGHT JOINed to the Album table). I then used the idAlbum/Album_idAlbum (they’re the same thing) primary/foreign keys to link the two tables. I also narrowed down the criteria so as to only display albums WHERE Won/Nominated = "Won" AND Ceremony = "Grammys" (referring to album that won Grammys). The query will display all results from the right-linked table (any Grammy wins) and any matching records from the left-linked table (the names of albums that won Grammys).

  • For every result from the right-linked table (Awards), there are matching results from the left-linked table (Album). Had this not been the case, you would be seeing NULL in rows linked to the Name column.

Thanks for reading, and be sure to check out the next post with more JOINs.

Michael

 

 

MySQL Lesson 7: Subqueries

Advertisements

Hello everybody,

It’s Michael, and today’s post will be about subqueries. Subqueries are basically queries-within-queries; think mini-queries within a larger query, and you’ve got the idea. The purpose of subqueries is to retrieve data for the main query that will be used as a condition to filter the query output. You can usually find subqueries in three types of clauses (SELECT, FROM, and WHERE) and inside other subqueries.

Here’s a simple example with a WHERE clause:

In this query, I selected the fields Singer Name and Death (referring to a singer’s date of death) from the Singer table. The line WHERE Death = (SELECT MAX(Death) FROM mydb.Singer) asks the query to select and display the maximum, or most recent, date of death listed in the database (along with displaying the corresponding singer name). As you can see, the most recent date of death in the database is June 18, 2018, which corresponds to the rapper XXXTentacion.

  • Even though MAX is in the subquery, GROUP BY is not necessary. Don’t worry about GROUP BY when including an aggregate function (COUNT, MAX, MIN, etc.) in your subquery.

Here’s a slightly more complicated example:

In this query, I selected the Singer Name, Age, and Gender fields from the Singer table. The subquery in Line 2 asks the query to only display records for singers within a certain age range (18-36, essentially any of the millennials). Line 3 simply orders the results from oldest to youngest singer.

Let’s try another subquery, except this time using the SELECT clause:

The subquery in the select clause-SELECT MAX(Track Duration) FROM mydb.Tracks-tells the main query to display the length of the longest song listed on this table as a column; the result is 18 minutes and 1 second. For those wondering, the song “Empire of the Clouds” from Iron Maiden’s The Book of Souls is the longest song listed on this table.

  • SELECT DISTINCT ensures that the maximum track duration only appears once, otherwise it will appear 800 times (this happened when I first ran the query without the DISTINCT keyword).

Here’s a subquery with the FROM clause:

In this query, I am trying to find out the average song length for songs on each of the 57 albums listed on the database. The subquery `SELECT Album_idAlbum, SEC_TO_TIME(AVG(Track Duration)) AS Average Song Length FROM mydb.Tracks
GROUP BY Album_idAlbum ORDER BY Average Song Length ASC` basically functions as the whole query, as SELECT * asks the query to select everything that matches the criteria listed in the FROM clause subquery.

In the subquery, I selected the Album_idAlbum foreign key and the (average) Track Duration field from the Tracks Table. I then group the results by Album and order the results by Average Song Length in ascending order (from albums with the shortest average track length to albums with the longest average track length).

  • SEC_TO_TIME displays the average track for each album in hour:minute:second form, because without this function the time displays as a decimal.

One last thing I wanted to mention with regards to subqueries is that it’s possible to have subqueries-within-subqueries. In these cases, the innermost subquery will run first, followed by the outer subquery, and finally the main query. Here’s an example:

In this query, I am trying to find out which rap album has the longest duration. I first select the Name, Release Date, and Duration fields from the Album table. The inner subquery-SELECT Genre FROM mydb.Album WHERE Genre IN (RAP)-retrieves all the rap albums from the database. The outer subquery-SELECT MAX(Duration) FROM mydb.Album WHERE Genre IN (inner subquery)-retrieves the rap album with the longest duration. WHERE Duration = (outer subquery(inner subquery)) tells the query to display the rap album (and corresponding release date) with the longest duration (which is Tupac’s album Until The End of Time).

Thanks for reading,

Michael

 

 

 

 

 

MySQL Lesson 6: The WHERE Clause

Advertisements

Hello everybody,

It’s Michael, and today’s post will focus on the WHERE clause in MySQL. For those that don’t know, the WHERE clause basically acts like an IF statement in the sense that it tells the query to only selects rows WHERE certain criteria are met.

OK, so I know I mentioned in MySQL Lesson 4 that the HAVING clause also acts like an if statement. Well, here are some differences between HAVING and WHERE:

  • HAVING is used to filter grouped records (think of the queries I posted in MySQL Lesson 4) while WHERE is used to filter individual records
  • HAVING always goes with GROUP BY while WHERE never goes with GROUP BY; this is because WHERE does not filter aggregate records while HAVING does.

Now, let’s try out a simple WHERE query

In this query, I selected the fields Singer Name, Age, and Gender from the Singer table. The WHERE Age < 30 line asks the query to select only those singers who are under 30 years old, of which there are 5 (Taylor Swift, Soulja Boy, Lorde, Ariana Grande, and Grace VanderWaal).

Let’s try another WHERE query, this time with a logical operator (more on that below)

In this query, I selected the Ceremony, Won/Nominated fields and the Album_idAlbum foreign key from the Awards table. The WHERE Album_idAlbum = 29 OR Album_idAlbum = 53 line asks the query to only display the results corresponding to awards that albums FK29* OR FK53* were considered for  (both FK29 and FK53 are Jay-Z albums, with FK29 being The Blueprint 3 and FK53 being 4:44). As you can see, The Blueprint 3 has two wins out of 3 nominations while 4:44 has 7 nominations.

*FK means having the foreign key, so FK29 means “album with foreign key 29” and FK53 means “album with foreign key 53”

For those wondering what the point of logical operators is, they basically supplement the WHERE clause by setting specific criteria to filter out records. Here are the three most common:

  • AND-select records matching (condition 1) AND (condition 2)
  • OR-select records matching either (condition 1) OR (condition 2)
  • NOT-do not select records matching (condition 1)

You can use all three in the same WHERE statement, just remember that, barring parentheses, NOT will be the first thing read by MySQL, then AND, followed by OR. Portions of the WHERE statement in parentheses are always read first, similar to the order of operations (remember PEMDAS?)

Let’s do another WHERE query, this time working with dates

In this query, I selected the Name and Release Date fields from the Album table. The WHERE MONTH(Release Date) = 12 OR MONTH(Release Date) = 1 OR MONTH(Release Date) = 2 line asks the query to only display the names of albums that were released in the winter months of December, January, or February (represented by 12, 1, and 2, respectively). As you can see, 6 albums meet the criteria

When dealing with dates in a WHERE query, you can either include the whole date (eg. 2011-01-01) or just the YEAR, MONTH, or DAY portion, The correct syntax would be YEAR(field), MONTH(field) or DAY(field), depending on what portion of the date you want to include in your query.

  • If you’re dealing with MONTH, remember that months are always represented by the numbers 1-12 (January through December, respectively). Typing in the name of the month usually won’t work for querying.

Sometimes, including the whole date or part of the date won’t make a difference in the results displayed, as shown by these two queries

Each query selects the Name and Release Date fields from the Album table and sorts the results by release date in descending order (starting from the most recently album and then going further back in time). But as you can see, the WHERE line is different for each query. The first query displays WHERE Release Date > 2009-12-31 while the second query displays WHERE YEAR(Release Date) >= 2010. These queries are examples off how including part of the date or the whole date doesn’t make a difference in the output, as both queries select albums released in the 2010s.

  • Keep in mind that there are some times that selecting the whole date or part of the date does make a difference in query output.

Now let’s try another WHERE query, this time using string patterns.

In this query, I selected the fields Singer Name and Birthplace from the Singer table. The WHERE Birthplace LIKE '%US' line asks the query only to display the names (and birthplaces) of singers who were born in the US.

You might be wondering what the ‘%US’ means. It basically tells the query to select rows in the Birthplace field that end in ‘US’. Some possible variations of this string pattern include:

  • ‘US%’-select all birthplaces that start with ‘US’
  • ‘%US%’-select all birthplaces that have ‘US’ anywhere in the name
  • ‘_US%’-select all birthplaces that have the letters U and S in the second and third positions, respectively
  • ‘US_%_%_%’-select all birthplaces that start with ‘US’ and are at least 4 characters long
  • ‘U%S’-select all birthplaces that start with U and end with S

Remember to always use single quotes (‘ ‘) not back-ticks (“) when dealing with string patterns.

One more thing I wanted to note that I mentioned earlier in the differences between the WHERE and HAVING clauses. Now I know I said that the WHERE clause deals with individual records while the HAVING clause deals with grouped records. One thing I did not mention is that the WHERE and HAVING clauses can be used in the same query. Here’s an example:

In this query I selected the field Featured Artist Name from the Featured Artist table. I also added COUNT(Featured Artist Name) in order to display the amount of times each singer appears on the featured artist table. The WHERE Featured Artist Age < 40 line asks the query only to display the names of featured artists under the age of 40. Lines 3-5 of the query group the results by Featured Artist Name and add the additional criteria of only displaying the names of featured artists that appear at least twice on the table (in addition to being under 40 years old); the results are then ordered based on number of appearances (from most to least).

Here are some things you should know about joint WHERE/HAVING queries:

  • WHERE always comes before HAVING in a query; this is because WHERE deals with data before it has been aggregated and HAVING deals with data after it has been aggregated
  • On a similar note, do not forget the GROUP BY clause in the query, as that must always be included before the HAVING clause
    • This is because the data has to be grouped before it can be filtered further

Thanks for reading,

Michael

MySQL Lesson 4: GROUP BY, HAVING & COUNT

Advertisements

Hello everybody,

Now that I’ve run through the basics of querying, I figured the next functions to work with should be the GROUP BY, HAVING, and COUNT functions. I will cover all three of these functions in this post because I feel they perfectly compliment one another (and you can use all three in the same query).

I’ll begin with a simple GROUP BY query.

As you can see here, I selected the Genre column from the Album table and asked the query to group by Genre. What the query does is group each of the rows by genre; multiple rows with the same genre are grouped together.

Here’s another GROUP BY query, this time using the Singer table.

The results of this query are grouped by Gender (F for female, M for male, and NULL for bands). Multiple rows with the same gender are grouped together.

Now let’s run the first query again with the addition of the COUNT function.

This basically does the same thing as the first query, except it displays how many times each genre appears in the table (that’s the point of including COUNT(Genre)). As you can see, Pop appears the most times (16 of the 57 albums), followed by Rap (15 albums), and Rock (10 albums).

Here’s the second query, also with the addition of the COUNT function

Just like the second query, the results are grouped by gender, except this time the query also counts how many times each gender appears in the table. In other words, this query shows how many female and male singers are listed on this table (17 and 21 respectively). As for why 0 is listed for NULL, MySQL didn’t count the NULL rows in the Gender column (in case you were wondering, there are 10 NULL columns-or 10 bands-listed on this table).

Now let’s run the third query again, this time with the addition of the HAVING function.

The HAVING function is essentially an if statement; it tells the query only select those results that meet certain criteria. In this case, the query will only select genres with COUNT(Genre) greater than 5, meaning that the genre appears more than 5 times on the table. In this case, pop, rap, and rock meet the criteria (appearing 16, 15, and 10 times respectively)

  • Syntax tip to keep in mind-HAVING always comes after GROUP BY.

Let’s run the fourth query again, this time with the addition of the COUNT function.

Just like the query above, only results that meet a certain criteria are selected. In this case, we only want the results where Gender = ‘F’ (meaning how many female singers are listed on this table). As you can see, there are 17 female singers listed.

  • HAVING doesn’t just need to include numerical criteria (such as COUNT(column_name)) but can also include non-numerical criteria as well (as can be seen above)

By the way, here’s the Excel workbook with all of the spreadsheets that contain the information used in this database-MySQL database (blog). I tried uploading the SQL file but was unable to do so.

Thanks for reading,

Michael

 

MySQL Lesson 3: Intro to Querying

Advertisements

Hello everybody,

The MySQL database I wrote for MySQL Lesson 1 & MySQL Lesson 2 is finally ready for querying (and this series of querying posts). Querying, for those that don’t know, is basically asking the database a question and using SQL to find the answer.

First off, let’s start with the most basic query-SELECT * FROM (table). The asterisk means “everything”, as in “Select everything from a certain table”.

In this case, I wrote SELECT * FROM mydb.Singer, which means I would select every record and every column from the Singer table (remember mydb is the name of the schema).

Let’s try that same query again, except with the album table.

Just as with the previous query, every record and every column are selected from this table.

Now, once again using the album table, here’s a query where only certain columns are selected.

For this query, I selected the columns name, duration, and release date, which means my query will only display every record for these three columns.

  • At first my query kept giving me a syntax error message for the Release Date field, so I just put the name of the field in back-ticks; these errors usually happen with fields that have spaces in their names (like Release Date) because MySQL isn’t sure whether the word Date is part of the field name or a separate data type. Back-ticks usually solve this issue.

As you can see, I selected the Singer Name, Age, and Gender columns from the Singer table; thus, only the records for those three columns will be displayed.

Thanks for reading,

Michael

MySQL Lesson 1: Building an ER Diagram

Advertisements

Hello everybody,

This is Michael, and as I mentioned in the last post, I will start building the database that I will be using for this series of posts.

The database will store information about 54 albums (3 for each year from 2000 to 2018) such as track listings, artists, featured artists on certain tracks, genre, release year, duration (of album and individual track), etc.

In the previous post, I did mention that MySQL is meant for query-based analysis. However, before beginning to do queries, we must create our database. As the title explains, this post will focus on the creation of an ER (entity-relationship) diagram. An ER diagram is a graphical representation of items in a database (in this case, albums, track listings, artists, etc.) and how they are related to each other (like how albums can have several track listings).

So, without further hesitation, here is the ER diagram for the database I will be using.

Now you may be confused by all of the arrows and tables in the diagram. Here’s an explanation.

  • This ER diagram will represent the relationship between albums and singers, songs on the album, featured artists, as well as any awards the album either won or received nominations.
  • How do all of these tables relate to each other? Here’s how.
    • Each album must have several tracks, while each track belongs to one and only one album (the thing that looks like a three-pointed arrow means “many” while the thing with two vertical lines represents “one”)
    • Each track can have several, one, or no feature artists (that’s why you see a circle) but each featured artist must belong to one and only track.
    • Each singer can appear more than once (if they have several albums in the database) or just once but each album must correspond to one and only one singer.
    • Each album can be nominated for one or several awards (eg. Grammys, MTV VMAs, etc.) but each award must correspond to only one album.
      • You’ll notice that this is the only dotted line in the diagram. This is because the relationship between album and awards is non-identifying, meaning that you can identify the award based on idAwards field alone, without needing the album field for identification.
      • As for the rest of the relationships (which are known as identifying relationships), each table is dependent on the other table for identification
        • For example, you can’t identify a featured artist without knowing what track they appear on. Likewise, you can’t identify a song without knowing what album it is a part of. Nor can you identify an album without knowing which singer/group created it.

Now what about the attributes in each table (those are the things with diamonds right by them)? Here’s what each of them mean.

  • The album field contains the attributes
    • Name-the name of the album
    • Duration-how long the album is (given in hours:minutes:seconds)
    • Release Year-the year the album came out
    • Album number-how many albums has the artist made up through that point; in other words, is this the artist’s 1st album? 4th? 5th?
    • Genre-the genre of the album
  • The singer field contains the attributes
    • Singer Name-the singer’s (or group’s) name
    • Age-the singer’s age as of August 1, 2018 (if they are still living)
    • Birthplace-the singer’s birthplace (or where the group was formed)
    • Date of Death-the date the artist died
      • You’ll notice the attributes age and date of death have white diamonds right by them; this is because each of them can be null (have no value). For instance, the date of death field can stay blank for living artists. For the other attributes that have blue diamonds besides them, they have to have some sort of value (can’t be null in other words).
  • The tracks field contains the attributes
    • Track Name-the song’s name
    • Track Duration-the length of the song (given in hours:minutes:seconds)
  • The featured artist field contains the attribute
    • Featured artist name-the name of any artist who appears on a particular track
  • The awards field contains the attributes
    • Ceremony-the ceremony where the album either was nominated for or won an award (Grammys, Billboard Music Awards, etc.)
    • Ceremony Year-the year of the ceremony where the album either got nominated for or won an award
    • Won/Nominated-whether an album won or was nominated for a particular award

But wait, what are those keys right by some of the attributes? Those are called primary keys, which are one or more columns with data used to uniquely identify each row in the table. Primary keys are usually stored in auto-incrementing indexes (starting with 1, then 2, then 3, and so on) to ensure uniqueness. For example, in the album table, 1 would be the primary key for the first album in the database, then 2, then 3, all the way to 54.

Take this part of the diagram:

 

idSinger and idAlbum are both primary keys in their respective tables. But wait, why does Singer_idSinger1 appear in the album table? That is because Singer_idSinger1 is a foreign key, which is a column or set of columns in a table that refers to the primary key in another table-which would be the primary key for the singer table. Foreign keys basically serve as a means to connect the referencing table (album) with the referenced table (singer).

If you want to know how primary keys and foreign keys differ from each other, here’s a handy table (Source-https://www.essentialsql.com/what-is-the-difference-between-a-primary-key-and-a-foreign-key/)

That’s all for now. Thanks for reading,

Michael