R Lesson 22: US Mapmaking with R (pt. 1)

Hello everybody,

Michael here, and today’s post (which is my last post for 2020) will be a lesson on basic US mapmaking with R. We’ll stick to US maps today, but don’t worry, I intend to do a post on global mapmaking with R later on.

Before we get started mapmaking, install these two R packages: ggplot2 and usmap. Once you have both packages installed and loaded, run this code and see the output:

plot_usmap(regions = "states") + labs(title="US States") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

As you can see, we have created a basic map of the US (with Alaska and Hawaii), complete with a nice blue ocean.
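If the line above fails with a "could not find function" error, the packages are probably not installed or loaded yet. Here is a minimal setup sketch, assuming you have an internet connection and a default CRAN mirror; the loop is just my way of skipping packages that are already installed:

```r
# Install any missing packages (a one-time step), then load them.
for (pkg in c("ggplot2", "usmap")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}

library(ggplot2)  # provides labs(), theme(), and element_rect()
library(usmap)    # provides plot_usmap() and fips()
```

You only need to install a package once, but you need to call library() in every new R session.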

However, this isn’t the only way you can plot a basic map of the US. The regions parameter has four different options for plotting the US map: states (which I just plotted), state, counties, and county.

Let’s see what happens when we use the state option:

plot_usmap(regions = "state", include=c("TN")) + labs(title="Tennessee") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

In this example, I used the state option for the regions parameter to create a plot of the state of Tennessee (but left everything else unaltered).

How did I manage to get a plot of a single state? The plot_usmap function has several optional parameters, one of which is include. To plot the state of Tennessee, I passed a vector to the include parameter that consisted of a single element, "TN".

  • Whenever you want to plot a single state (or several states), don’t type in the state’s full name; use the state’s two-letter postal code instead.
  • Another parameter is exclude, which allows you to exclude certain states from a multi-state map plot (I’ll discuss multi-state map plots later).
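As a quick illustration of exclude, here is a sketch that plots the lower 48 states by leaving out Alaska and Hawaii. The choice of states and the title are my own; the rest follows the same pattern as the other plots in this post.

```r
library(ggplot2)
library(usmap)

# Plot every state except Alaska and Hawaii.
lower_48 <- plot_usmap(regions = "states", exclude = c("AK", "HI")) +
  labs(title = "Lower 48 States") +
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

print(lower_48)
```

Like include, exclude takes a vector, so you can drop as many states as you like.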

Awesome! Now let’s plot our map with the counties option:

plot_usmap(regions = "counties") + labs(title="US Counties") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

This map looks just like the first map, except it shows all of the county lines in each state.

Last but not least, let’s plot our map with the county option:

plot_usmap(regions = "county", include=c("Davidson County")) + labs(title="Davidson County") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

In this example, I attempted to create a plot of Davidson County, TN, but that didn’t work out. Even though I told R to include Davidson County in the plot, R didn’t know which state Davidson County was in, as there are two counties named Davidson in the US: one in Tennessee and another in North Carolina.

This shows you that the county name alone won’t work with the county option of the regions parameter, since counties in different states often share the same name. The most common county name in the US is Washington County, which is shared by 31 states.

So, how do I correctly create a county plot in R? First, I would need to retrieve the county’s FIPS code.

To give you some background, FIPS stands for Federal Information Processing Standards, and FIPS codes are 2- or 5-digit codes that uniquely identify states or counties. State FIPS codes have 2 digits and county FIPS codes have 5 digits; the first two digits of a county FIPS code are the corresponding state’s FIPS code. Here’s an example of county FIPS codes using the two Davidson Counties I discussed earlier:

> fips(state="TN", county="Davidson")
[1] "47037"
> fips(state="NC", county="Davidson")
[1] "37057"

In this example, I printed out the county FIPS codes for the two Davidson Counties. The FIPS code for Davidson County, TN is 47037, which starts with 47 because Tennessee’s FIPS code is 47. Similarly, the FIPS code for Davidson County, NC is 37057, which starts with 37 because North Carolina’s FIPS code is 37.
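To make the structure of a county FIPS code concrete, here is a small base R sketch that splits the two codes printed above into their state and county parts. The variable names are my own, and the codes are simply copied from the fips() output.

```r
# County FIPS codes copied from the fips() output above.
county_fips <- c(davidson_tn = "47037", davidson_nc = "37057")

# The first two digits identify the state ("47" and "37").
state_part <- substr(county_fips, 1, 2)

# The last three digits identify the county within that state ("037" and "057").
county_part <- substr(county_fips, 3, 5)

print(state_part)
print(county_part)
```

Notice that the codes are stored as text, not numbers, so leading zeros are kept.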

Now that we know the FIPS code for Davidson County, TN, we can create a plot for the county. Here’s the code to do so:

plot_usmap(regions = "county", include=c(fips(state="TN", county="Davidson"))) + labs(title="Davidson County, TN") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

When I create a map plot of an individual US county, I get the shape of the county.

  • A tidier way to write the code for this plot would be to store the FIPS code in a variable first, davidson_fips <- fips(state="TN", county="Davidson"), and then pass that variable to the plot: plot_usmap(regions = "county", include=c(davidson_fips)) + labs(title="Davidson County, TN") + theme(panel.background = element_rect(color="black", fill = "lightblue")).

So how can we get a map of several US counties, or rather, a state map broken down by counties? Here’s the code to do so:

plot_usmap(regions = "counties", include=c("TN")) + labs(title="Tennessee's 95 counties") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

To create a state map broken down by counties, set regions to counties and set the include parameter to include the state you want to plot (TN in this case). As you can see, I have created a map plot of the state of Tennessee that shows all 95 county boundaries in the state.

What if you wanted to plot several states at once? Well, the usmap package has built-in region variables for certain US regions (as defined by the US Census Bureau), each of which consists of several states. The regions you can plot include:

  • .east_north_central: Illinois, Indiana, Michigan, Ohio, and Wisconsin
  • .east_south_central: Alabama, Kentucky, Mississippi, and Tennessee
  • .midwest_region: any state in the East North Central and the West North Central regions
  • .mid_atlantic: New Jersey, New York, and Pennsylvania
  • .mountain: Arizona, Colorado, Idaho, Montana, Nevada, New Mexico, Utah, and Wyoming
  • .new_england: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont
  • .northeast_region: any state in the New England or Mid-Atlantic regions
  • .north_central_region: any state in the East and West North Central regions
  • .pacific: Alaska, California, Hawaii, Oregon, and Washington
  • .south_atlantic: Delaware, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, Washington DC, and West Virginia
  • .south_region: any state in the South Atlantic, East South Central, and West South Central regions
  • .west_north_central: Iowa, Kansas, Minnesota, Missouri, Nebraska, North Dakota, and South Dakota
  • .west_region: any state in the Mountain and Pacific regions
  • .west_south_central: Arkansas, Oklahoma, Louisiana, and Texas

Let’s plot out a simple region map for the .east_south_central region. Here’s the code to do so:

 plot_usmap(include = .east_south_central) +  labs(title="East South Central US", size=10) + theme(panel.background = element_rect(color="black", fill = "lightblue"))

Simple enough, right? All I did was set the include parameter to .east_south_central.

  • Remember to always include the dot in front of the region name, since the dot is part of the name of the built-in region variable in usmap. Without the dot, R looks for an object that doesn’t exist (or, if you put the name in quotes, reads it as a simple string), which will generate errors in your code.
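Since each built-in region is just a character vector of state postal codes, you can also mix and match them. Here is a sketch that combines two regions and then drops one state; the combination and the variable name south_central are my own choices for illustration.

```r
library(ggplot2)
library(usmap)

# Each region variable is a character vector of postal codes,
# so two regions can be joined with c().
south_central <- c(.east_south_central, .west_south_central)
print(south_central)

# Plot the combined region, leaving out Texas.
south_central_map <- plot_usmap(include = south_central, exclude = c("TX")) +
  labs(title = "South Central US (without Texas)") +
  theme(panel.background = element_rect(color = "black", fill = "lightblue"))

print(south_central_map)
```

You can also skip the built-in regions entirely and pass your own vector of postal codes, such as include = c("TN", "KY", "VA").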

Now let’s break up the region map by counties. Here’s the code to do so:

plot_usmap(regions = "counties", include = .east_south_central) +  labs(title="East South Central US", size=10) + theme(panel.background = element_rect(color="black", fill = "lightblue"))

To show all of the county lines in a specific region, simply set the regions parameter to counties. Also (and you probably noticed this already), if you don’t set a value for the regions parameter, regions defaults to states.

OK, so I’ve covered the basics of US map plotting with the usmap package. But did you know you can display state and county names on the map plot? Here’s the code to add state name labels to a map plot of the whole US:

 plot_usmap(regions = "states", labels=TRUE) + labs(title="US States") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

The code I used to create this map plot is almost identical to the code I used to create the first map plot, with one major exception: I included a labels parameter and set it to TRUE. If the labels parameter is set to TRUE, a label will be displayed on each state (the label being the state’s 2-letter postal code).

Now let’s display county names on a map, using the state map of Tennessee. Here’s the code to do so:

 plot_usmap(regions = "counties", labels=TRUE, include=c("TN")) + labs(title="Tennessee's 95 counties") + theme(panel.background = element_rect(color="black", fill = "lightblue"))

As you can see, by setting labels to TRUE, I was able to include all of Tennessee’s county names on the map (and most of them fit quite well, though there are a few overlaps).

Thanks for reading,

Michael

Also, since this is my last post of 2020, thank you all for reading my content this year. I know it’s been a crazy year, but I hope you all learned something from my blog in the process. Have a happy, healthy, and safe holiday season, and I’ll see you all in 2021 with brand new programming content (including a part 2 to this lesson)!

R Lesson 21: Calendar Plots

Hello everybody,

Michael here, and today’s post will be an R lesson on calendar plots in R. Now I know I already did a lesson on dates and times in R, but this lesson will focus on creating cool calendar plots in R with the calendR package.

Before we start making calendar plots, let’s install the calendR package.

Once we install the package, let’s start by building a simple calendar plot with this line of code: calendR(). Here’s the output we get:

As you can see, calendR() prints out a calendar for the current year (2020). If no parameter is specified, calendR() prints the current year’s calendar by default.

If you want to print the calendar for a different year, specify the year in the parameter. Let’s say you wanted to print last year’s calendar (2019). To print last year’s calendar, use this line of code: calendR(year=2019). Here’s the output from that line of code:

OK, now let’s try some other neat calendar tricks. First, let’s set up our calendar plot to have weeks start on Monday (weeks start on Sunday by default). Here’s the code to do so:

calendR(start="M")

Since I didn’t specify a year in the parameter, calendR will print out a calendar of the current year by default.

Awesome! Now let’s add some color to certain special days. Here’s the code to do so:

calendR(special.days = c(61,91,121,331,360,366), special.col="lightgreen", low.col="white")

When adding color to certain special days, here are some things to keep in mind:

  • The calendR function has its own parameter to denote special days, aptly named special.days. The special.col parameter specifies a color for the special days, while low.col specifies a color for all other days.
  • To choose which days to highlight, specify the corresponding day of the year for each day you want to highlight (e.g. 1 corresponds to January 1, 2 to January 2, and so on). Remember that if the calendar plot is for a leap year (like the plot above), day numbers from March 1 onward will be one higher than they would be in a non-leap year (e.g. the day number for March 2 in a non-leap year would be 61, while in a leap year it would be 62).
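If you don’t want to count days by hand, base R can work out the day of the year for you. This is a small sketch using format() with the "%j" code; the helper name day_of_year is my own.

```r
# Return the day of the year (1 to 366) for a date written as "YYYY-MM-DD".
day_of_year <- function(date_string) {
  as.integer(format(as.Date(date_string), "%j"))
}

day_of_year("2020-03-01")  # 61, because 2020 is a leap year
day_of_year("2021-03-02")  # 61, because 2021 is not a leap year
day_of_year("2020-12-25")  # 360
```

The numbers this returns can go straight into the special.days vector.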

Now what if we wanted to highlight all of the weekends? Let’s see how we can do that. Use this line of code to highlight the weekends in 2020:

calendR(special.days="weekend")

Now what if we wanted to add several color-coded events to the calendar? Here’s the code to do that:

events <- rep(NA, 366) 
events[1] <- "Holidays"
events[45] <- "Holidays"
events[146] <- "Holidays"
events[186] <- "Holidays"
events[251] <- "Holidays"
events[305] <- "Holidays"
events[331] <- "Holidays"
events[360] <- "Holidays"
events[366] <- "Holidays"
events[61] <- "My Birthday"
calendR(special.days=events, special.col=c("green", "orange"), legend.pos="right") 

How did I create this calendar plot? Let me give you a step-by-step breakdown:

  • In order to add events to the calendar plot, you would first need to create a vector of NA values. My vector is called events, but the name of the vector doesn’t matter. Just remember to make the vector the same length as the number of days in the corresponding year.
    • Hint: the vector will have a length of 366 for leap years (such as 2020) and 365 for non-leap years (such as 2021).
  • Next, add all of the events you want on the calendar to the corresponding day of the year on the vector. For example, two days I wanted to highlight were January 1 and May 25 (New Year’s Day and Memorial Day 2020, respectively), so I added these two days to positions 1 and 146 of the events vector, as January 1 and May 25 were the 1st and 146th days of 2020, respectively.
    • If you want multiple dates under the same event category (e.g. “Holidays” in this example), you can list the corresponding days of the year one by one like I did, or use a range such as 24:25 when the dates are consecutive (e.g. an event that goes on for 7 consecutive days). A bare comma-separated list inside the brackets, such as events[1, 45, 146], doesn’t work in R (I know because I tried something like this and it didn’t work); to assign several non-consecutive days at once, wrap the positions in c(), as in events[c(1, 45, 146)] <- "Holidays".
  • Once you have added all of the events to the vector, use the calendR function to create the color-coded calendar plot with events. The relevant parameters are:
    • year: the year for the calendar plot; if a year isn’t specified, calendR will create a plot of the current year by default (as it does in this example)
    • special.days: the vector of events you created (events in this example)
    • special.col: the colors to represent each of the events
      • Only use as many colors as you have event categories. In this example, I only have two event categories (“My Birthday” and “Holidays”) and thus only use two colors for this function. Also keep in mind that the first color listed will be used for the first event category, the second color for the second event category, and so on.
    • legend.pos: the positioning of the legend on the plot; in this example, legend.pos equals right, which means that the legend will be displayed on the right of the graph
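Looking up every day number by hand is easy to get wrong, so here is a sketch of another way to build the same events vector in base R, starting from the actual dates. The variable names year_days and holidays are my own, and the dates are the ones behind the day numbers used above.

```r
# Every day of 2020, in order; position 1 is January 1.
year_days <- seq(as.Date("2020-01-01"), as.Date("2020-12-31"), by = "day")

# Start with no events on any day.
events <- rep(NA_character_, length(year_days))

# Mark the holidays by date instead of by day number.
holidays <- as.Date(c("2020-01-01", "2020-02-14", "2020-05-25",
                      "2020-07-04", "2020-09-07", "2020-10-31",
                      "2020-11-26", "2020-12-25", "2020-12-31"))
events[year_days %in% holidays] <- "Holidays"
events[year_days == as.Date("2020-03-01")] <- "My Birthday"

# Check which positions were filled in.
which(events == "Holidays")
which(events == "My Birthday")

# The vector can then be passed to calendR exactly as before:
# calendR(special.days = events, special.col = c("green", "orange"), legend.pos = "right")
```

Because the length of the vector comes from the date sequence, this approach also handles leap years without any extra work.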

Now, what if you wanted to make a calendar plot for just a single month? Using the current month as an example, here’s how to do so:

calendR(month = 12)

By simply using the code calendR(year = X, month = 1-12) you can create a calendar plot for any month of any year. Remember that if you don’t specify the year, calendR will create a plot for the chosen month of the current year (2020) by default.

Great, now let’s add some color to certain days! Here’s the code to do that (along with the corresponding output):

calendR(month = 12, special.days=c(9, 16, 25, 27, 31), special.col="yellow", low.col="white")

OK, what if we wanted to add several color-coded events to the calendar? Let’s see how we can do this:

events2 <- rep(NA, 31)
events2[24:25] <- "Holidays"
events2[31] <- "Holidays"
events2[1:4] <- "Workweek"
events2[16] <- "Birthdays"
events2[27] <- "Birthdays"
calendR(month = 12, special.days=events2, special.col = c("red", "orange", "lightblue"), low.col="white", legend.pos="top") 

The process for adding color-coded events for a month-long calendar plot is the same as it is for a year-long calendar plot, except you would need to make the events vector the length of the days in the month rather than the length of the days in the year. In this case the length of the events vector (called events2 to avoid confusion with the other events vector) would be 31, as there are 31 days in December.

  • Just so you know, low.col in the last two examples refers to the color of the boxes that don’t contain color-coded events.

Great, now let’s say we wanted to add some text to the days. How would we go about this? Here’s how:

calendR(month = 12, text=c("Battlebots", "Birthday", "Christmas", "Birthday", "New Year's Eve"), text.pos=c(3, 16, 25, 27, 31), text.size=4.5, text.col="green4")

Here’s a step-by-step breakdown of how I created this plot:

  • First, I indicated the month I wanted to use for my calendar plot. Since I didn’t specify a year, calendR will automatically plot the chosen month of the current year (December 2020).
  • Next, I created a vector of text I wanted to add to the calendar plot and used text.pos to indicate which days should contain which text (for instance, the first element in the text vector corresponds with the first element of the text.pos vector).
    • Remember to keep the text vector the same size as the text.pos vector.
    • I know this sounds obvious, but know how many days the month on your calendar plot has, or else the calendar plot won’t work. For instance, trying to place text on day 32 in this plot won’t work because December doesn’t have a 32nd.
  • The text.size and text.col parameters specify the font size and color of the text, respectively.

Now let’s try something different: creating a lunar calendar, which is a month-long calendar plot that shows the moon phase on each day of the month. Just out of curiosity, let’s create a lunar calendar for next month. Here’s the code to do so:

calendR(year = 2021, month = 1, lunar = TRUE, lunar.col="gray60", lunar.size=8)

Here’s a step-by-step breakdown of how I created this lunar calendar:

  • Since I’m creating this lunar calendar for next month, I had to specify both the year (2021) and the month (1), as next month is January 2021 (crazy, huh).
  • I had to set lunar to TRUE to indicate that this is a lunar calendar.
  • The lunar.col parameter signifies the color of the non-visible area of the moons.
  • The lunar.size parameter signifies the size of the moons on the calendar plot.
  • In case you’re wondering what each of the moon phases are, here’s a handy little graphic (feel free to Google this information if you want to learn more about moon phases):

Last but not least, let me show you how we can create a calendar plot with custom start and end dates. A perfect example of this type of calendar plot would be an academic calendar, which I will show you how to create below, using the 2020-21 Metro Nashville Public Schools calendar as an example:

events3 <- rep(NA, 304)
events3[3] <- "Day Off"
events3[6] <- "Day Off"
events3[38] <- "Day Off"
events3[63] <- "Day Off"
events3[66:70] <- "Day Off"
events3[73] <- "Day Off"
events3[84] <- "Day Off"
events3[95] <- "Day Off"
events3[103] <- "Day Off"
events3[117:119] <- "Day Off"
events3[140] <- "Day Off"
events3[143:147] <- "Day Off"
events3[150:154] <- "Day Off"
events3[157:159] <- "Day Off"
events3[171] <- "Day Off"
events3[199] <- "Day Off"
events3[227:231] <- "Day Off"
events3[238] <- "Day Off"
events3[245] <- "Day Off"
events3[299] <- "Day Off"
events3[4] <- "First & Last Days of School"
events3[298] <- "First & Last Days of School"
events3[136:139] <- "Half Days"
events3[293:294] <- "Half Days"
events3[297] <- "Half Days" 

calendR(start_date = "2020-08-01", end_date = "2021-05-31", special.days=events3, special.col=c("salmon", "seagreen1", "gold"), legend.pos="right", title = "Metro Nashville Public Schools 2020-21 Calendar")

So, how did I create this color-coded calendar plot with a custom start and end date? Here’s a step-by-step breakdown of the process:

  • Since I knew the start and end dates I wanted to use for this calendar plot, I created an events vector (named events3 to avoid confusion with the other two events vectors) of length 304. My start and end dates are August 1, 2020 and May 31, 2021, respectively, which cover a 304 day period (including the end date).
  • I then added all of the events to the corresponding days on the vector. Since this isn’t a standard calendar year vector, knowing which element corresponds to which day is trickier. To keep track, I created an Excel table with the dates in my date range in one column and the corresponding day number in the other column.
  • After inserting all of the events to the corresponding days in the vector, I then used the calendR function to indicate my start_date and end_date.
  • For the special.days argument, I used the events3 vector I created.
  • I then indicated the three colors I wanted to use for the three event categories as the argument for special.col.
    • For special.col, remember to include the same number of colors as event categories in this vector. Since I had 3 event categories, I used 3 colors in this vector.
  • I then set legend.pos to right, which tells calendR to place the legend to the right of the calendar plot.
  • Lastly, I set the title of the plot to Metro Nashville Public Schools 2020-21 Calendar.
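If you would rather not keep a separate lookup table, the position of any date in a custom range can be worked out with base R date arithmetic. This is only a sketch; the helper name day_index is my own, and the start and end dates are the ones from this example.

```r
# Start and end of the custom calendar range.
start_date <- as.Date("2020-08-01")
end_date <- as.Date("2021-05-31")

# Number of days in the range, counting both the start and the end date.
n_days <- as.integer(end_date - start_date) + 1
n_days  # 304

# Position of a given date within the range (the start date is position 1).
day_index <- function(date_string) {
  as.integer(as.Date(date_string) - start_date) + 1
}

day_index("2020-08-04")  # 4, the first day of school in the vector above
day_index("2021-05-25")  # 298, the last day of school in the vector above
```

You could then write something like events3[day_index("2020-08-04")] <- "First & Last Days of School" instead of looking up the number 4 yourself.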

Thanks for reading,

Michael