MySQL Lesson 7: Subqueries

Advertisements

Hello everybody,

It’s Michael, and today’s post will be about subqueries. Subqueries are basically queries-within-queries; think mini-queries within a larger query, and you’ve got the idea. The purpose of subqueries is to retrieve data for the main query that will be used as a condition to filter the query output. You can usually find subqueries in three types of clauses (SELECT, FROM, and WHERE) and inside other subqueries.

Here’s a simple example with a WHERE clause:

22Sep capture

In this query, I selected the fields Singer Name and Death (referring to a singer’s date of death) from the Singer table. The line WHERE Death = (SELECT MAX(Death) FROM mydb.Singer) asks the query to select and display the maximum, or most recent, date of death listed in the database (along with displaying the corresponding singer name). As you can see, the most recent date of death in the database is June 18, 2018, which corresponds to the rapper XXXTentacion.

  • Even though MAX is in the subquery, GROUP BY is not necessary. Don’t worry about GROUP BY when including an aggregate function (COUNT, MAX, MIN, etc.) in your subquery.

Here’s a slightly more complicated example:

In this query, I selected the Singer Name, Age, and Gender fields from the Singer table. The subquery in Line 2 asks the query to only display records for singers within a certain age range (18-36, essentially any of the millennials). Line 3 simply orders the results from oldest to youngest singer.

Let’s try another subquery, except this time using the SELECT clause:

The subquery in the select clause-SELECT MAX(Track Duration) FROM mydb.Tracks-tells the main query to display the length of the longest song listed on this table as a column; the result is 18 minutes and 1 second. For those wondering, the song “Empire of the Clouds” from Iron Maiden’s The Book of Souls is the longest song listed on this table.

  • SELECT DISTINCT ensures that the maximum track duration only appears once, otherwise it will appear 800 times (this happened when I first ran the query without the DISTINCT keyword).

Here’s a subquery with the FROM clause:

In this query, I am trying to find out the average song length for songs on each of the 57 albums listed on the database. The subquery `SELECT Album_idAlbum, SEC_TO_TIME(AVG(Track Duration)) AS Average Song Length FROM mydb.Tracks
GROUP BY Album_idAlbum ORDER BY Average Song Length ASC` basically functions as the whole query, as SELECT * asks the query to select everything that matches the criteria listed in the FROM clause subquery.

In the subquery, I selected the Album_idAlbum foreign key and the (average) Track Duration field from the Tracks Table. I then group the results by Album and order the results by Average Song Length in ascending order (from albums with the shortest average track length to albums with the longest average track length).

  • SEC_TO_TIME displays the average track for each album in hour:minute:second form, because without this function the time displays as a decimal.

One last thing I wanted to mention with regards to subqueries is that it’s possible to have subqueries-within-subqueries. In these cases, the innermost subquery will run first, followed by the outer subquery, and finally the main query. Here’s an example:

In this query, I am trying to find out which rap album has the longest duration. I first select the Name, Release Date, and Duration fields from the Album table. The inner subquery-SELECT Genre FROM mydb.Album WHERE Genre IN (RAP)-retrieves all the rap albums from the database. The outer subquery-SELECT MAX(Duration) FROM mydb.Album WHERE Genre IN (inner subquery)-retrieves the rap album with the longest duration. WHERE Duration = (outer subquery(inner subquery)) tells the query to display the rap album (and corresponding release date) with the longest duration (which is Tupac’s album Until The End of Time).

Thanks for reading,

Michael

 

 

 

 

 

MySQL Lesson 6: The WHERE Clause

Advertisements

Hello everybody,

It’s Michael, and today’s post will focus on the WHERE clause in MySQL. For those that don’t know, the WHERE clause basically acts like an IF statement in the sense that it tells the query to only selects rows WHERE certain criteria are met.

OK, so I know I mentioned in MySQL Lesson 4 that the HAVING clause also acts like an if statement. Well, here are some differences between HAVING and WHERE:

  • HAVING is used to filter grouped records (think of the queries I posted in MySQL Lesson 4) while WHERE is used to filter individual records
  • HAVING always goes with GROUP BY while WHERE never goes with GROUP BY; this is because WHERE does not filter aggregate records while HAVING does.

Now, let’s try out a simple WHERE query

In this query, I selected the fields Singer Name, Age, and Gender from the Singer table. The WHERE Age < 30 line asks the query to select only those singers who are under 30 years old, of which there are 5 (Taylor Swift, Soulja Boy, Lorde, Ariana Grande, and Grace VanderWaal).

Let’s try another WHERE query, this time with a logical operator (more on that below)

In this query, I selected the Ceremony, Won/Nominated fields and the Album_idAlbum foreign key from the Awards table. The WHERE Album_idAlbum = 29 OR Album_idAlbum = 53 line asks the query to only display the results corresponding to awards that albums FK29* OR FK53* were considered for  (both FK29 and FK53 are Jay-Z albums, with FK29 being The Blueprint 3 and FK53 being 4:44). As you can see, The Blueprint 3 has two wins out of 3 nominations while 4:44 has 7 nominations.

*FK means having the foreign key, so FK29 means “album with foreign key 29” and FK53 means “album with foreign key 53”

For those wondering what the point of logical operators is, they basically supplement the WHERE clause by setting specific criteria to filter out records. Here are the three most common:

  • AND-select records matching (condition 1) AND (condition 2)
  • OR-select records matching either (condition 1) OR (condition 2)
  • NOT-do not select records matching (condition 1)

You can use all three in the same WHERE statement, just remember that, barring parentheses, NOT will be the first thing read by MySQL, then AND, followed by OR. Portions of the WHERE statement in parentheses are always read first, similar to the order of operations (remember PEMDAS?)

Let’s do another WHERE query, this time working with dates

In this query, I selected the Name and Release Date fields from the Album table. The WHERE MONTH(Release Date) = 12 OR MONTH(Release Date) = 1 OR MONTH(Release Date) = 2 line asks the query to only display the names of albums that were released in the winter months of December, January, or February (represented by 12, 1, and 2, respectively). As you can see, 6 albums meet the criteria

When dealing with dates in a WHERE query, you can either include the whole date (eg. 2011-01-01) or just the YEAR, MONTH, or DAY portion, The correct syntax would be YEAR(field), MONTH(field) or DAY(field), depending on what portion of the date you want to include in your query.

  • If you’re dealing with MONTH, remember that months are always represented by the numbers 1-12 (January through December, respectively). Typing in the name of the month usually won’t work for querying.

Sometimes, including the whole date or part of the date won’t make a difference in the results displayed, as shown by these two queries

Each query selects the Name and Release Date fields from the Album table and sorts the results by release date in descending order (starting from the most recently album and then going further back in time). But as you can see, the WHERE line is different for each query. The first query displays WHERE Release Date > 2009-12-31 while the second query displays WHERE YEAR(Release Date) >= 2010. These queries are examples off how including part of the date or the whole date doesn’t make a difference in the output, as both queries select albums released in the 2010s.

  • Keep in mind that there are some times that selecting the whole date or part of the date does make a difference in query output.

Now let’s try another WHERE query, this time using string patterns.

In this query, I selected the fields Singer Name and Birthplace from the Singer table. The WHERE Birthplace LIKE '%US' line asks the query only to display the names (and birthplaces) of singers who were born in the US.

You might be wondering what the ‘%US’ means. It basically tells the query to select rows in the Birthplace field that end in ‘US’. Some possible variations of this string pattern include:

  • ‘US%’-select all birthplaces that start with ‘US’
  • ‘%US%’-select all birthplaces that have ‘US’ anywhere in the name
  • ‘_US%’-select all birthplaces that have the letters U and S in the second and third positions, respectively
  • ‘US_%_%_%’-select all birthplaces that start with ‘US’ and are at least 4 characters long
  • ‘U%S’-select all birthplaces that start with U and end with S

Remember to always use single quotes (‘ ‘) not back-ticks (“) when dealing with string patterns.

One more thing I wanted to note that I mentioned earlier in the differences between the WHERE and HAVING clauses. Now I know I said that the WHERE clause deals with individual records while the HAVING clause deals with grouped records. One thing I did not mention is that the WHERE and HAVING clauses can be used in the same query. Here’s an example:

In this query I selected the field Featured Artist Name from the Featured Artist table. I also added COUNT(Featured Artist Name) in order to display the amount of times each singer appears on the featured artist table. The WHERE Featured Artist Age < 40 line asks the query only to display the names of featured artists under the age of 40. Lines 3-5 of the query group the results by Featured Artist Name and add the additional criteria of only displaying the names of featured artists that appear at least twice on the table (in addition to being under 40 years old); the results are then ordered based on number of appearances (from most to least).

Here are some things you should know about joint WHERE/HAVING queries:

  • WHERE always comes before HAVING in a query; this is because WHERE deals with data before it has been aggregated and HAVING deals with data after it has been aggregated
  • On a similar note, do not forget the GROUP BY clause in the query, as that must always be included before the HAVING clause
    • This is because the data has to be grouped before it can be filtered further

Thanks for reading,

Michael

MySQL Lesson 5: More Aggregate Functions, Aliases & Sorting Query Results

Advertisements

Hello everybody,

It’s Michael, and today’s post will be covering more aggregate functions along with aliases and sorting query results (using the same database).

I already covered one aggregate function-COUNT-in the previous post. This time I’ll be covering a few more aggregate functions-AVG, SUM, MAX, MIN. For those unfamiliar, aggregate functions perform a calculation on a set of values and return a single value.

Let’s start with a query using the AVG function.

In this query, I selected the fields Gender and Age from the Singer table and grouped my results by Gender. The AVG(Age) basically calculates the average age of the group of male singers as well as the average age of female singers (results for each gender are displayed separately). As you can see, the average age of female singers on this database is 35.75 years, while the average age for male singers is 42.2143. The AVG function will ignore any null results, so bands/groups and deceased singers didn’t factor into this query, as the Age field is null in those cases.

Let’s try a similar query (by selecting both average age and gender) except using the Featured Artist table this time.

Similar to the previous query, the results are grouped by gender and the average age is calculated for each gender group. In this case, the average age for female featured artists in 38.35 years and for male featured artists it is 41.9304 years. Also, similar to the previous query, null results are ignored, so deceased artists and bands/groups did not factor into the calculation of the results.

However, you may have noticed something different at this portion of the query

  • AVG(Featured Artist Age) AS Average Age

The AS keyword signifies an alias, which basically gives the column a different name in order to improve the query’s readability. Aliases are beneficial because to those unfamiliar with MySQL jargon, column names like AVG(Featured Artist Age) aren’t easy to understand due to their technical nature. Thus, column aliases like Average Age make query results understandable for anybody.

  • You can also use aliases for table names too.

Now let’s introduce another aggregate function-MAX.

In this query, I selected Genre and Release Date from the Album table. The part of the query with MAX(Release Date) AS Most Recent Album will display the maximum release date (which means the date of the most recently released album) for each genre listed in the table.

However, you may have noticed another new concept in this query-the ORDER BY clause.ORDER BY basically sorts the results in either ascending or descending order (indicated by ASC and DESC respectively). In this query, the dates (and in turn, the genres) are sorted in descending order, which goes from the maximum release date (June 29, 2018) to the farthest back date (September 29, 2009).

  • The ORDER BY clause works with both numerical and non-numerical data. If you’re dealing with non-numerical data such as a list of names, ASC and DESC will sort your data in alphabetical and reverse alphabetical order respectively.

Oh, and in case you were wondering which albums correspond to which dates, here’s a rundown

  • September 29, 2009-Barbara Streisand-Love is the Answer
  • May 23, 2011-Lady Gaga-Born This Way
  • November 19, 2012-Rihanna-Unapologetic
  • September 13, 2013-Avicii-True
  • June 13, 2014-Linkin Park-The Hunting Party
  • January 8, 2016-David Bowie-Blackstar
  • April 29, 2016-Martina McBride-Reckless
  • November 3, 2017-Grace VanderWaal-Just The Beginning
  • June 15, 2018-Mike Shinoda-Post Traumatic
  • June 29, 2018-Gorrillaz-The Now Now

Now here’s the next aggregate function I will cover-MIN.

In this query, I selected the Album_idAlbum foreign key and Track Duration field from the Tracks table. TheMIN(Track Duration) AS Shortest Song portion selects the duration of the shortest song in each album (only the ID number for the album is displayed). If you’re wondering which duration corresponds to which song, look at the spreadsheet with the Tracks information I provided in MySQL Lesson 4.

Here’s a query with the last of the four aggregate functions I will cover-SUM.

This query is similar to the first one because I chose the Gender and Age fields from the Album table and grouped the results by Gender, except this time I used the SUM function instead of the AVG function. In this query, the sum function displays the total age for male singers and female singers in separate rows (572 for female singers and 591 for male singers). Remember that null records (representing bands and deceased singers) are not factored into the query’s calculations.

Thanks for reading,

Michael

MySQL Lesson 4: GROUP BY, HAVING & COUNT

Advertisements

Hello everybody,

Now that I’ve run through the basics of querying, I figured the next functions to work with should be the GROUP BY, HAVING, and COUNT functions. I will cover all three of these functions in this post because I feel they perfectly compliment one another (and you can use all three in the same query).

I’ll begin with a simple GROUP BY query.

As you can see here, I selected the Genre column from the Album table and asked the query to group by Genre. What the query does is group each of the rows by genre; multiple rows with the same genre are grouped together.

Here’s another GROUP BY query, this time using the Singer table.

The results of this query are grouped by Gender (F for female, M for male, and NULL for bands). Multiple rows with the same gender are grouped together.

Now let’s run the first query again with the addition of the COUNT function.

This basically does the same thing as the first query, except it displays how many times each genre appears in the table (that’s the point of including COUNT(Genre)). As you can see, Pop appears the most times (16 of the 57 albums), followed by Rap (15 albums), and Rock (10 albums).

Here’s the second query, also with the addition of the COUNT function

Just like the second query, the results are grouped by gender, except this time the query also counts how many times each gender appears in the table. In other words, this query shows how many female and male singers are listed on this table (17 and 21 respectively). As for why 0 is listed for NULL, MySQL didn’t count the NULL rows in the Gender column (in case you were wondering, there are 10 NULL columns-or 10 bands-listed on this table).

Now let’s run the third query again, this time with the addition of the HAVING function.

The HAVING function is essentially an if statement; it tells the query only select those results that meet certain criteria. In this case, the query will only select genres with COUNT(Genre) greater than 5, meaning that the genre appears more than 5 times on the table. In this case, pop, rap, and rock meet the criteria (appearing 16, 15, and 10 times respectively)

  • Syntax tip to keep in mind-HAVING always comes after GROUP BY.

Let’s run the fourth query again, this time with the addition of the COUNT function.

Just like the query above, only results that meet a certain criteria are selected. In this case, we only want the results where Gender = ‘F’ (meaning how many female singers are listed on this table). As you can see, there are 17 female singers listed.

  • HAVING doesn’t just need to include numerical criteria (such as COUNT(column_name)) but can also include non-numerical criteria as well (as can be seen above)

By the way, here’s the Excel workbook with all of the spreadsheets that contain the information used in this database-MySQL database (blog). I tried uploading the SQL file but was unable to do so.

Thanks for reading,

Michael