Python Lesson 29: More Things You Can Do With MATPLOTLIB Bar Charts (MATPLOTLIB pt. 2)

Hello everybody,

Michael here, and today’s lesson will cover more neat things you can do with MATPLOTLIB bar-charts.

In the previous post, I introduced you all to Python’s MATPLOTLIB package and showed you how you can use this package to create good-looking bar-charts. Now, we’re going to explore more MATPLOTLIB bar-chart functionalities.

Before we begin, remember to run these imports:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#Also include the %matplotlib inline line in your notebook.

Also remember to run this code:

tokyo21medals = pd.read_csv('C:/Users/mof39/OneDrive/Documents/Tokyo Medals 2021.csv')

This code creates a data-frame that stores the Tokyo 2021 medals data. The link to this dataset can be found in the Python Lesson 27: Creating Pandas Visualizations (pandas pt. 4) post.

Now that we’ve done all the necessary imports, let’s start exploring more cool things you can do with a MATPLOTLIB bar-chart.

Let’s say you wanted to add some grid lines to your bar-chart. Here’s the code to do so (using the gold bar vertical bar-chart example from Python Lesson 28: Intro to MATPLOTLIB and Creating Bar-Charts (MATPLOTLIB pt. 1)):

tokyo21medals.plot(x='Country', y='Total', kind='bar', figsize=(20,11), legend=None)
plt.title('Tokyo 2021 Medals', size=15)
plt.ylabel('Medal Tally', size=15)
plt.xlabel('Country', size=15)
xValues = np.array(tokyo21medals['Country'])
yValues = np.array(tokyo21medals['Total'])
plt.bar(xValues, yValues, color = 'gold')
plt.grid()

Pretty neat, right? All you needed to do was add a plt.grid() call to your code to get neat-looking grid lines. However, in this bar-chart, it isn’t ideal to have grid lines along both axes.

Let’s say you only wanted grid lines along the y-axis. Here’s the slight change in the code you’ll need to make:

plt.grid(axis='y')

To display grid lines along only one axis, pass an axis parameter to the plt.grid() function and set it to 'x' or 'y' (the default, 'both', draws grid lines along both axes). In this case, I set axis to 'y' since I want the gridlines to mark the y-axis values.

Here’s the new graph with the gridlines on just the y-axis:

Honestly, I think this looks much neater!
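
If you’d like to try the y-axis-only grid without loading the medals file, here’s a minimal sketch. The country names and totals are placeholder values I made up for illustration (not the real medal table), and the set_axisbelow() call is an optional extra that isn’t in the code above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real Tokyo 2021 data.
countries = ["A", "B", "C"]
totals = [10, 7, 4]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(countries, totals, color="gold")
ax.grid(True, axis="y")   # grid lines for the y-axis values only
ax.set_axisbelow(True)    # optional: draw the grid behind the bars

# Check which axes actually have visible grid lines.
y_grid_on = all(line.get_visible() for line in ax.yaxis.get_gridlines())
x_grid_on = any(line.get_visible() for line in ax.xaxis.get_gridlines())
```

The same axis parameter works with plt.grid(), which is what the code above uses.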

Now, what if you wanted to plot a bar-chart with several differently-colored bars side-by-side? In the context of this dataset, let’s say we wanted to plot each country’s bronze medal, silver medal, and gold medal count side-by-side. Here’s the code we’d need to use:

tokyo21medalssubset = tokyo21medals[0:10]

plt.figure(figsize=(20,11))
X = tokyo21medalssubset['Country']
bronze = tokyo21medalssubset['Bronze Medal']
silver = tokyo21medalssubset['Silver Medal']
gold = tokyo21medalssubset['Gold Medal']
Xaxis = np.arange(len(X))
plt.bar(Xaxis - 0.2, bronze, 0.3, label='Bronze medals', color='#cd7f32')
plt.bar(Xaxis, silver, 0.3, label='Silver medals', color='#c0c0c0')
plt.bar(Xaxis + 0.2, gold, 0.3, label='Gold medals', color='#ffd700')
plt.xticks(Xaxis, X)
plt.xlabel('Country', size=15)
plt.ylabel('Total medals won', size=15)
plt.title('Tokyo 2021 Olympic medal tallies', size=15)
plt.legend()
plt.show()

So, how does all of the code work? Well, before I actually started creating the code that would create the bar-chart, I first created a subset of the tokyo21medals data-frame aptly named tokyo21medalssubset that contains only the first 10 rows of the tokyo21medals data-frame. The reason I did this was because the bar-chart would look rather cramped if I tried to include all countries.

After creating the subset data-frame, I then ran the plt.figure function with the figsize tuple to set the size of the plot to (20,11).

The variable X grabs the x-axis values I want to use from the data-frame; in this case, I’m grabbing the Country values. However, X doesn’t position anything on the x-axis; that’s the work of the aptly-named Xaxis variable. Xaxis holds the evenly-spaced positions (0, 1, 2, and so on, one per country) that the bar groups on the above bar-chart are centered on; it does so by calling the np.arange() function with len(X) as the parameter.

As for the bronze, silver, and gold variables, they store all of the Bronze Medal, Silver Medal, and Gold Medal values from the tokyo21medalssubset data-frame.

After creating the Xaxis variable, I then ran the plt.bar() function three times, once for each column of the data-frame I used. Each plt.bar() call has five parameters: the x-positions of the bars, shifted relative to the “center bar” (represented with Xaxis +/- 0.2), the variable representing the column that the bar will use (bronze, silver, or gold), the width of the bar (0.3 in this case), the label you want to use for the bar (which will be used for the bar-chart’s legend), and the color you want to use for the bar (I used the hex codes for bronze, silver, and gold). Both the 0.2 offset and the 0.3 width are measured in x-axis data units, not inches; since neighboring countries sit 1 unit apart, each bar spans 0.3 of that spacing. Because the offset (0.2) is smaller than the width (0.3), neighboring bars within a group overlap slightly.

  • By “center bar”, I mean the middle bar in a group of bars on the bar-chart. In this bar-chart, the “center bar” is always the silver (grey) bar, as it is always between the bronze and gold bars in all of the bar groups.
  • Don’t worry, I’ll cover color hex codes in greater detail in a future post.

After creating the bronze, silver, and gold bars, I then used the plt.xticks() function and passed in the Xaxis and X variables to place a tick mark at each position in Xaxis and label it with the matching country name from X. Once the x-axis tick marks were set, I used the plt.xlabel(), plt.ylabel(), and plt.title() functions to set the labels (and display sizes) for the chart’s x-axis, y-axis, and title, respectively.

Lastly, I ran the plt.legend() and plt.show() functions to create the chart’s legend and display the chart, respectively. Remember the label parameter that I used in each of the plt.bar() calls? Each of these values was used to create the bar-chart’s legend, complete with the appropriate color-coding!
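
If you’d rather not pick the offsets and widths by hand, here’s a rough sketch of one way to compute them so the bars in each group sit edge to edge. The country names and medal counts are placeholders I made up, and group_width is a name I’m introducing here (0.8 is just a value that leaves a gap between groups); none of this comes from the medals dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real Tokyo 2021 data.
countries = ["A", "B", "C"]
medals = {
    "Bronze medals": [3, 2, 1],
    "Silver medals": [4, 2, 2],
    "Gold medals": [5, 3, 1],
}
colors = ["#cd7f32", "#c0c0c0", "#ffd700"]

positions = np.arange(len(countries))   # one group center per country: 0, 1, 2
group_width = 0.8                       # assumed share of each 1-unit slot used by a group
bar_width = group_width / len(medals)   # each bar gets an equal slice of the group

fig, ax = plt.subplots(figsize=(8, 4))
offsets = []
for i, ((label, counts), color) in enumerate(zip(medals.items(), colors)):
    # Shift each series so the group as a whole stays centered on its tick.
    offset = (i - (len(medals) - 1) / 2) * bar_width
    offsets.append(offset)
    ax.bar(positions + offset, counts, bar_width, label=label, color=color)

ax.set_xticks(positions)
ax.set_xticklabels(countries)
ax.legend()
```

With three series, this gives offsets of roughly -0.27, 0, and +0.27 and a bar width of roughly 0.27, so the bars touch but don’t overlap.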

Now, what if instead of plotting the bronze, silver, and gold bars side-by-side, you wanted to plot them stacked on top of each other? Here’s the code we’d use to do so:

plt.figure(figsize=(20,11))
X = tokyo21medalssubset['Country']
bronze = tokyo21medalssubset['Bronze Medal']
silver = tokyo21medalssubset['Silver Medal']
gold = tokyo21medalssubset['Gold Medal']
Xaxis = np.arange(len(X))
plt.bar(Xaxis, bronze, 0.3, label='Bronze medals', color='#cd7f32')
plt.bar(Xaxis, silver, 0.3, label='Silver medals', color='#c0c0c0', bottom=bronze)
plt.bar(Xaxis, gold, 0.3, label='Gold medals', color='#ffd700', bottom=bronze + silver)
plt.xticks(Xaxis, X)
plt.xlabel('Country', size=15)
plt.ylabel('Total medals won', size=15)
plt.title('Tokyo 2021 Olympic medal tallies', size=15)
plt.legend()
plt.show()

Now, this code is similar to the code I used to create the bar-chart with the side-by-side bars. However, there are some differences in the plt.bar() calls between these two charts, which include:

  • There’s no +/- 0.2 offset on Xaxis, as I’m stacking bars on top of each other rather than plotting them side-by-side
  • For the second and third plt.bar() calls, I included a bottom parameter, which sets the height at which each bar starts.
    • OK, that may sound confusing, but to clarify, when I’m plotting the silver bar, I set bottom equal to bronze so that the silver bar starts where the bronze bar ends. Likewise, when I plot the gold bar, I set bottom equal to bronze + silver so that the gold bar starts on top of both. Setting bottom equal to silver alone would start the gold bar at the silver count, so it would overlap the silver bar whenever the bronze count is greater than zero.

Honestly, this looks much neater than the side-by-side bar-chart we made.

Aside from the differences in plt.bar() functions between this chart and the chart above, the rest of the code is the same between the two charts.
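
For stacks with more than two layers, it can help to keep a running total for bottom instead of writing each sum by hand. Here’s a minimal sketch of that idea; the country names and medal counts are placeholders I made up (not the real medal table), and the layers list is just a structure I’m assuming for the example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real Tokyo 2021 data.
countries = ["A", "B", "C"]
layers = [
    ("Bronze medals", np.array([3, 2, 1]), "#cd7f32"),
    ("Silver medals", np.array([4, 2, 2]), "#c0c0c0"),
    ("Gold medals", np.array([5, 3, 1]), "#ffd700"),
]

positions = np.arange(len(countries))
bottom = np.zeros(len(countries))  # height at which the next layer starts

fig, ax = plt.subplots(figsize=(8, 4))
for label, counts, color in layers:
    ax.bar(positions, counts, 0.3, bottom=bottom, label=label, color=color)
    bottom = bottom + counts       # the next layer starts on top of everything drawn so far

ax.set_xticks(positions)
ax.set_xticklabels(countries)
ax.legend()

stack_tops = bottom  # after the loop, this holds each country's total height
```

After the loop, stack_tops holds each stack’s full height, which should match each country’s total medal count.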

Thanks for reading,

Michael

Python Lesson 28: Intro to MATPLOTLIB and Creating Bar-Charts (MATPLOTLIB pt. 1)

Hello everybody,

Michael here, and today’s lesson will serve as an intro to Python’s MATPLOTLIB package; this is part 1 in my MATPLOTLIB series. I will also cover bar-chart manipulation with MATPLOTLIB.

Now, as I mentioned in my previous post (Pandas Lesson 27: Creating Pandas Visualizations (pandas pt. 4)), MATPLOTLIB is another Python visualization package. Just like pandas, it can create charts, but MATPLOTLIB has more functionalities (such as adding interactive components to visualizations); in fact, pandas’ plotting functions use MATPLOTLIB behind the scenes by default.

Now, to work with the MATPLOTLIB package, be sure to run this command to install the package: pip install matplotlib (or run the pip list command to check if you already have it).

For this post, we’ll be working with the same Tokyo 2021 dataset we used for the previous post (click the Pandas Lesson 27 link to find and download that dataset).

Once you’ve installed the MATPLOTLIB package, run this code in your IDE:

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

tokyo21medals = pd.read_csv('C:/Users/mof39/OneDrive/Documents/Tokyo Medals 2021.csv')

  • Since I haven’t discussed PYPLOT yet, I’ll do so right here. PYPLOT is a MATPLOTLIB module (matplotlib.pyplot) that provides most of the plotting functions we’ll use, such as plt.title() and plt.bar(); this is why we usually import PYPLOT rather than just the top-level MATPLOTLIB package.

You’ll probably recognize all of this code from the previous post. That’s because I used some MATPLOTLIB in the previous post and included the %matplotlib inline line. You’ll also need to import pandas and create a pandas data-frame that stores the Tokyo Medals 2021 dataset into the IDE (just for consistency’s sake, I’ll call this data-frame tokyo21medals).

Now, before we get into more MATPLOTLIB specifics, let’s review the little bit of MATPLOTLIB I covered in the previous lesson.

So, just to recap, here’s the MATPLOTLIB code I used to create the bar-chart in the previous lesson:

tokyo21medals.plot(x='Country', y='Total', kind='bar', figsize=(20,11))
plt.title('Tokyo 2021 Medals', size=15)
plt.ylabel('Medal Tally', size=15)
plt.xlabel('Country', size=15)

And here’s the bar-chart that was generated:

Now, how exactly did I generate this bar-chart? First of all, I used pandas’ plot() function (remember to import pandas) and filled it with four parameters: the column I want to use for the x-axis, the column I want to use for the y-axis, the type of visual I want to create, and the display size I want for said visual.

After creating the blueprint of the visual with pandas’ plot() function, I then used MATPLOTLIB’s plt.title() function to set a title for the bar-chart (I also passed in a size parameter to set the display size of the title). Next, I used MATPLOTLIB’s plt.ylabel() function to set a label for the chart’s y-axis and, just as I did with the plt.title() function, passed in a size parameter to set the display size for the y-axis label. Lastly, I used the plt.xlabel() function to set the bar-chart’s x-axis label and, just as I did for the plt.title() and plt.ylabel() functions, added a size parameter to set the display size for the x-axis label.

When you first create the bar-chart, you’ll notice that a default x-axis label has already been set: Country, which is the name of the column I chose for the x-axis. In this case, I didn’t change the label name, just the label display size. However, plt.xlabel() requires the label text as its first parameter, so in order to change the label display size, you’ll still need to pass in the x-axis label you’d like to use.

  • Why do all of these functions start with plt? Remember the import matplotlib.pyplot as plt import you did.
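
As a side note, the same title and labels can also be set on the chart’s Axes object directly. Here’s a small sketch of that alternative style; it isn’t what the code above does, and the country names and totals are placeholder values I made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real Tokyo 2021 data.
countries = ["A", "B", "C"]
totals = [10, 7, 4]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(countries, totals)

# These mirror plt.title(), plt.ylabel(), and plt.xlabel(); the label text
# comes first, and fontsize plays the role of the size parameter used above.
ax.set_title("Tokyo 2021 Medals", fontsize=15)
ax.set_ylabel("Medal Tally", fontsize=15)
ax.set_xlabel("Country", fontsize=15)
```

Either style works; the plt functions simply act on whichever chart is currently active.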

Now, MATPLOTLIB bars are blue by default. What if you wanted to change their color? Let’s say we wanted to go with the theme of this dataset and change all the bars to gold (this dataset covers Tokyo 2021 Olympic medal tallies, after all). Here’s the code to do so:

tokyo21medals.plot(x='Country', y='Total', kind='bar', figsize=(20,11))
plt.title('Tokyo 2021 Medals', size=15)
plt.ylabel('Medal Tally', size=15)
plt.xlabel('Country', size=15)
xValues = np.array(tokyo21medals['Country'])
yValues = np.array(tokyo21medals['Total'])
plt.bar(xValues, yValues, color = 'gold')

So, how did I get the gold color on all of these bars? Well, before I discuss that, let me remind you that you’ll need to import NumPy here (import numpy as np, in case you forgot; install it first with pip install numpy if you don’t have it). I’ll explain why shortly.

After you create the outline for the bar-chart (with pandas’ plot() function) and set labels for the bar-chart’s x-axis, y-axis, and title, you’ll need to store the values for the x-axis and y-axis in NumPy arrays (this is where the NumPy package comes in). For both the x-axis and y-axis, use the np.array() function and pass in the data-frame columns you used for the x-axis and y-axis, respectively. After creating the NumPy arrays, write this line of code: plt.bar(xValues, yValues, color = 'gold'). The plt.bar() function takes three parameters here: the two NumPy arrays you created for your x-axis and y-axis, and the color parameter, which sets the color of the bars (I set the bars to gold in this case). This draws gold bars on top of the blue bars that plot() already drew.

  • Hex codes will work for the color as well.

Looks pretty good! But wait, the legend is still blue!

In this case, let’s remove the legend altogether. Here’s the code to do so:

tokyo21medals.plot(x='Country', y='Total', kind='bar', figsize=(20,11), legend=None)
plt.title('Tokyo 2021 Medals', size=15)
plt.ylabel('Medal Tally', size=15)
plt.xlabel('Country', size=15)
xValues = np.array(tokyo21medals['Country'])
yValues = np.array(tokyo21medals['Total'])
plt.bar(xValues, yValues, color = 'gold')

And here’s the bar-chart without the legend:

In order to remove the legend from the bar-chart, all you needed to do was add the argument legend=None to the tokyo21medals.plot() function call (legend=False works as well). This argument removes the legend from the bar-chart.
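
If you only need one bar color, another option is to pass the color straight to pandas’ plot() function, which skips the extra plt.bar() call. Here’s a minimal sketch of that shortcut, using a tiny placeholder data-frame I made up (not the real medals file); treat it as an alternative to try rather than the method used in this lesson.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# Placeholder data-frame for illustration only, not the real Tokyo 2021 data.
df = pd.DataFrame({"Country": ["A", "B", "C"], "Total": [10, 7, 4]})

# color sets the bar color and legend=False leaves the legend out.
ax = df.plot(x="Country", y="Total", kind="bar", figsize=(6, 4),
             color="gold", legend=False)
plt.title("Tokyo 2021 Medals", size=15)
plt.ylabel("Medal Tally", size=15)
plt.xlabel("Country", size=15)

bar_colors = [bar.get_facecolor() for bar in ax.patches]
```

Since the bars are only drawn once here, there are no blue bars hiding underneath the gold ones.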

Last but not least, let’s explore how to display the bars horizontally rather than vertically.

Assuming we keep the gold coloring on the bars, here’s the code you’d need to display the bars horizontally:

plt.figure(figsize=(25,25))
plt.title('Tokyo 2021 Medals', size=15)
plt.ylabel('Country', size=15)
plt.xlabel('Medal Tally', size=15)
xValues = np.array(tokyo21medals['Country'])
yValues = np.array(tokyo21medals['Total'])
plt.barh(xValues, yValues, color='gold')

And here’s the new bar-chart with the horizontal bars (well, part of it; the bar-chart was too big to fit in one picture):

As you can see, the code I used to create this horizontal bar-chart is different from the code I used to create the vertical bar-chart. Here are some of those code differences:

  • I didn’t use pandas’ plot() function at all; to create the horizontal bar-chart, PYPLOT functions alone did the trick.
  • Unlike the code I used for the vertical bar-charts, I called PYPLOT’s figure() function first in this code block. I passed in a two-element tuple as this function’s figsize parameter in order to set the size of the bar-chart (in this case, I set the bar-chart’s size to 25×25 inches).
    • Just a suggestion, but if you’re using MATPLOTLIB to create your visual, you should set the size of the visual in the first line of code you use to create your visual.
  • The xValues array still holds Country and the yValues array still holds Total, but barh() places its first argument along the y-axis and uses the second as the bar lengths along the x-axis. So Country now runs along the y-axis while Total runs along the x-axis.
  • To plot the bar chart, I used PYPLOT’s barh() function rather than the bar() function. I still passed in a color parameter to the barh() function, though.

Even with all these differences, I didn’t change the plot title. I did swap the axis labels, though: Country now labels the y-axis and Medal Tally labels the x-axis.
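
Here’s a small sketch of the horizontal version that you can run without the medals file. The country names and totals are placeholders I made up, and the invert_yaxis() call is an optional extra (not in the code above) that puts the first row at the top of the chart instead of the bottom.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt

# Placeholder values for illustration only, not the real Tokyo 2021 data.
countries = ["A", "B", "C"]
totals = [10, 7, 4]

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(countries, totals, color="gold")  # first argument goes on the y-axis
ax.invert_yaxis()                         # optional: list the first country at the top
ax.set_title("Tokyo 2021 Medals", fontsize=15)
ax.set_ylabel("Country", fontsize=15)
ax.set_xlabel("Medal Tally", fontsize=15)

# For horizontal bars, the data value is each bar's width rather than its height.
bar_lengths = [bar.get_width() for bar in ax.patches]
```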

Thanks for reading,

Michael

EXCITING BLOG UPDATE

Hello everybody,

So, this won’t be another post trying to teach you guys an interesting programming lesson (though I’ve got plenty more of those coming). Rather, I just want to use this post to share an exciting blog update with you.

For the last three years, you’ve all followed me as I shared my programming & data science content on this blog (and I am very grateful to all of you who’ve read and/or shared my content during this blog’s run).

Now for the update. My blog posts will have ANOTHER home. Yes, I figured that, with three years and 107 posts under my belt, it was time to grow my little blog (and in turn, grow this little blog’s following). And so, without further ado, I’d like to announce that my future posts (starting with Python Lesson 27: Creating Pandas Visualizations (pandas pt. 4), posted on September 22, 2021) will also be published on Medium (another great blogging platform, for those unaware). Here’s the Medium link to the Python Lesson 27 post: https://medium.com/@michael71314/pandas-lesson-27-creating-pandas-visualizations-pandas-pt-4-ab3e49e838ca.

  • In case you’re wondering, this post will not be published on Medium.

Does this mean you’ll no longer find my great content on WordPress? No! This news just means that my blog will now have TWO homes. For those who love following me on WordPress, you’ll still see all my new content here too.

Now, you’re probably wondering if you’ll see my whole archive of posts on Medium. So far, I’m not planning to add the whole catalog to Medium, but I’ll let you all know if I change my mind.

Thanks again for reading and/or sharing my posts,

Michael

P.S. If you want to connect with me on Medium, look for Michael Orozco-Fletcher; the name should be hard to miss 😉 Looking forward to growing my little blog and teaching more people around the world the joys of coding, programming, and data analytics!