R Lesson 5: Graphing Logistic Regression Models

Hello everybody,

It’s Michael, and today I’ll be discussing graphing with logistic regression. This will serve as a continuation of R Lesson 4: Logistic Regression Models (I’ll be using the dataset and the models from that post).

Let’s start by graphing the second model from R Lesson 4. That’s the one that includes season count and premiere year (I feel this would be more appropriate to graph as it is the more quantitative of the two models).

Here’s the formula for the model if you’re interested (as well as the output):

[Screenshot: 20Jul capture1]

Before we plot the model, let’s remember to install and load the ggplot2 package.

Next we have to figure out the probabilities that each show will be renewed (or not).
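Here’s a minimal sketch of this step, using a small simulated data frame in place of the dataset from R Lesson 4. The column names (seasons, premiere_year, renewed) and all of the values are assumptions for illustration, not the actual data or model:

```r
# Simulated stand-in for the TV show data; column names are hypothetical.
set.seed(1)
shows <- data.frame(
  seasons = sample(1:30, 60, replace = TRUE),
  premiere_year = sample(1975:2017, 60, replace = TRUE)
)

# Simulate a renewed (1) / not renewed (0) outcome so the example runs on its own.
true_p <- plogis(-0.5 + 0.05 * shows$seasons + 0.04 * (shows$premiere_year - 1996))
shows$renewed <- rbinom(60, 1, true_p)

# Fit a logistic regression on season count and premiere year.
model <- glm(renewed ~ seasons + premiere_year, data = shows, family = binomial)

# type = "response" returns probabilities instead of log-odds.
shows$prob <- predict(model, newdata = shows, type = "response")
head(shows)
```

Each row now has a prob value between 0 and 1, which is the model’s estimated chance that the show is renewed.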

And finally, let’s plot the model.
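Here’s one way the plot could be drawn with ggplot2. This is a sketch on simulated data, not the original code: the column names, the choice of axes, and the simulated values are all assumptions.

```r
library(ggplot2)  # run install.packages("ggplot2") first if it is not installed

# Simulated stand-in for the TV show data; column names are hypothetical.
set.seed(1)
shows <- data.frame(
  seasons = sample(1:30, 60, replace = TRUE),
  premiere_year = sample(1975:2017, 60, replace = TRUE)
)
true_p <- plogis(-0.5 + 0.05 * shows$seasons + 0.04 * (shows$premiere_year - 1996))
shows$renewed <- rbinom(60, 1, true_p)

# Fit the model and attach the predicted renewal probabilities.
model <- glm(renewed ~ seasons + premiere_year, data = shows, family = binomial)
shows$prob <- predict(model, type = "response")

# One point per show, colored by predicted probability of renewal.
p <- ggplot(shows, aes(x = premiere_year, y = seasons, color = prob)) +
  geom_point(size = 3) +
  labs(x = "Premiere year", y = "Seasons", color = "P(renewed)",
       title = "Predicted renewal probability")
print(p)
```

With the default continuous color scale, lighter points mean a higher predicted probability of renewal.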

What are some conclusions we can draw from the model?

  • According to the model, shows with fewer than 25 seasons that premiered between 1975 and the early 90s (such as Roseanne, which had 10 seasons and premiered in 1988) had essentially no chance at renewal.
  • For shows with fewer than 25 seasons, the more recently the show premiered, the more likely it was to be renewed (as shown by the progressively brighter colors).
  • The few outlier shows with more than 25 seasons had a nearly 100% chance at renewal, regardless of when they premiered.
    • The two notable examples would be The Simpsons (at 29 seasons) and SNL (at 43 seasons).

Thanks for reading,

Michael

R Lesson 3: Basic graphing with R

Hello everybody,

This is Michael, and today’s post will be on basic graphing with R. I’ll be using a different dataset for this post, murder_2015_final, which details the change in homicide rates from 2014 to 2015 as well as the individual homicide rates for 2014 and 2015 in 83 US cities (I felt this one was more quantitative than the dataset I used in my last two posts).

So let’s begin with a scatter plot.

  • If you can’t read this, here’s the code
    • plot(file$X2015_murders, file$change, pch=20, col="red", main="2014-2015 murder rate changes", xlab="2015 murders", ylab="Change from 2014 homicide rate")

As you can see, there are two outliers in the upper right-hand corner of the plot. If you want to find out what those cities might be, here’s how you would add labels to each of the points.

  • Remember not to close the window with the graph when typing this command!
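One way to add the labels is base R’s text() function, which draws on the plot that is already open. This is a sketch with made-up numbers and placeholder city names, and it assumes the data frame has a city column:

```r
# Toy stand-in for the murder data; values and city names are made up.
file <- data.frame(
  city = c("City A", "City B", "City C", "City D"),
  X2015_murders = c(40, 120, 350, 480),
  change = c(-5, 10, 130, 65)
)

# Draw the scatter plot first; text() needs an open plot to draw on.
plot(file$X2015_murders, file$change, pch = 20, col = "red",
     main = "2014-2015 murder rate changes",
     xlab = "2015 murders", ylab = "Change from 2014 homicide rate")

# Add one label per point; pos = 2 places it to the left of the point,
# and cex = 0.7 shrinks the text so the labels overlap less.
text(file$X2015_murders, file$change, labels = file$city, pos = 2, cex = 0.7)
```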

From this graph, we can see that the two outliers (or cities with the largest 2014-to-2015 rise in murder rates) are Chicago and Baltimore.

Let’s try a bar graph now. Here’s the command to make a basic bar chart.
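Here’s a minimal sketch of a bar chart that counts how many cities saw a drop, no change, or a rise. The grouping step and the toy change values are assumptions for illustration, not necessarily how the original chart was built:

```r
# Toy stand-in for the change column; values are made up.
change <- c(-5, 10, 0, 130, 65, -12)

# sign() gives -1, 0, or 1, which we turn into three labeled groups.
direction <- factor(sign(change), levels = c(-1, 0, 1),
                    labels = c("Drop", "No change", "Rise"))

# Count the cities in each group and draw one bar per group.
counts <- table(direction)
barplot(counts, col = "red",
        main = "2014-2015 change in murders",
        xlab = "Direction of change", ylab = "Number of cities")
```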

As you can see, 53 of the cities had a year-to-year rise in murder rates, 4 had no change in murder rates, and 26 had a year-to-year drop in murder rates (if you’re wondering what those cities are, check the spreadsheet attached to this post).

Let’s make another graph: the box plot. Here is the command:
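A box plot of murders grouped by state can be sketched with base R’s boxplot() function and a formula. The state column, the placeholder state names, and the numbers below are assumptions for illustration:

```r
# Toy stand-in for the murder data; states and values are made up.
file <- data.frame(
  state = c("State A", "State A", "State A", "State B", "State B", "State C"),
  X2015_murders = c(20, 35, 50, 10, 90, 40)
)

# One box per state; the formula reads "murders grouped by state".
bp <- boxplot(X2015_murders ~ state, data = file, col = "yellow",
              main = "2015 murders by state",
              xlab = "State", ylab = "2015 murders")

# bp$stats holds the five numbers drawn for each box
# (lower whisker, lower hinge, median, upper hinge, upper whisker).
bp$stats
```

A state that appears only once, like State C here, shows up as a single line because all five numbers are the same.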

Some things to know when reading a box plot

  • The bold dashes represent the median value for the murders in a certain state (or the only value if a state appears just once)
  • The top and bottom lines represent the highest and lowest values corresponding to a certain state, not counting outliers
  • The yellow bars denote the range of the middle 50% of values for a certain state
  • The dashed lines above and below the yellow bars stretch out to the highest and lowest values that fall outside the range denoted by the yellow bar
    • If there aren’t any dashed lines, then the yellow bars cover all of the values (other than outliers), not just the middle 50%
  • Any circles you see are outliers corresponding to a particular state.

One more thing: if you’re wondering where I got this data, here’s the link: https://github.com/fivethirtyeight/data/blob/master/murder_2016/murder_2015_final.csv. The data comes from FiveThirtyEight.com, which writes interesting data-driven articles, such as The LeBron James Decision-Making Machine. FiveThirtyEight then posts the code and data used in these articles on GitHub so anyone can perform statistical analyses on the data (it’s a good place to look for free datasets for your own data analysis project, and much more interesting than the free datasets that come with R, which contain data that is 40+ years old).

Thank you,

Michael