Hello everybody,
Michael here, and in today’s post, I’m going to revisit an old friend of ours: the language R. As you readers may recall, R was the first language I covered on this blog, and since we’re only a few posts away from the blog’s fifth anniversary, I thought it would be fun to revisit this blog’s roots as an analytics blog (remember the Michael’s Analytics Blog days, everyone?).
Today’s post will provide a basic introduction to doing calculus with R (including graphing). Why am I doing R calculus? Well, I wanted to do some more fun R posts leading up to the blog’s fifth anniversary, and I had so much fun writing the trigonometry portion of my previous post, Python Lesson 41: Word2Vec (NLP pt.7/AI pt.7), that I wanted to dive into more mathematical programming topics. With that said, let’s get started with some R calculus!
Setting ourselves up
In this lesson, we’ll be using this dataset:
This dataset contains the Rotten Tomatoes scores for all MCU (Marvel Cinematic Universe) movies from Iron Man (2008) to Guardians of the Galaxy Vol. 3 (2023). Both critic and audience Rotten Tomatoes scores are included for all MCU movies.
Now, let’s open up our R IDE and read in this CSV file:
MCU <- read.csv("C:/Users/mof39/OneDrive/Documents/MCU movies.csv", fileEncoding="UTF-8-BOM")
> MCU
    Movie                                        Year  RT.score  Audience.score
 1  Iron Man                                     2008      0.94            0.91
 2  Incredible Hulk                              2008      0.67            0.69
 3  Iron Man 2                                   2010      0.71            0.71
 4  Thor                                         2011      0.77            0.76
 5  Captain America: The First Avenger           2011      0.80            0.75
 6  The Avengers                                 2012      0.91            0.91
 7  Iron Man 3                                   2013      0.79            0.78
 8  Thor The Dark World                          2013      0.66            0.75
 9  Captain America: The Winter Soldier          2014      0.90            0.92
10  Guradians of the Galaxy                      2014      0.92            0.92
11  Avengers: Age of Ultron                      2015      0.76            0.82
12  Ant-Man                                      2015      0.83            0.85
13  Captain America: Civil War                   2016      0.90            0.89
14  Doctor Strange                               2016      0.89            0.86
15  Guardians of the Galaxy Vol 2                2017      0.85            0.87
16  Spider-Man: Homecoming                       2017      0.92            0.87
17  Thor: Ragnarok                               2017      0.93            0.87
18  Black Panther                                2018      0.96            0.79
19  Avengers: Infinity War                       2018      0.85            0.92
20  Ant-Man and the Wasp                         2018      0.87            0.80
21  Captain Marvel                               2019      0.79            0.45
22  Avengers: Endgame                            2019      0.94            0.90
23  Spider-Man: Far From Home                    2019      0.90            0.95
24  Black Widow                                  2021      0.79            0.91
25  Shang-Chi and the Legend of the Ten Rings    2021      0.91            0.98
26  Eternals                                     2021      0.47            0.77
27  Spider-Man: No Way Home                      2021      0.93            0.98
28  Doctor Strange in the Multiverse of Madness  2022      0.74            0.85
29  Thor: Love and Thunder                       2022      0.63            0.77
30  Black Panther: Wakanda Forever               2022      0.84            0.94
31  Ant-Man and the Wasp: Quantumania            2023      0.47            0.83
32  Guardians of the Galaxy Vol 3                2023      0.81            0.95
As you can see, we have read the data frame into R and displayed it in the IDE (all 32 rows are shown here).
Now, before we dive into the calculus of everything, let’s explore our dataset:
- Movie: the name of the movie
- Year: the movie’s release year
- RT.score: the movie’s Rotten Tomatoes critic score
- Audience.score: the movie’s audience score on Rotten Tomatoes
- R tip: when you are reading a CSV file into R, it might help to add the fileEncoding="UTF-8-BOM" parameter to the read.csv() function, as this parameter will remove the junk text (the byte order mark) that would otherwise appear in the name of the data frame’s first column.
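If you’d like to see that tip in action without touching the real dataset, here’s a minimal sketch. It assumes nothing about the MCU file: it writes a tiny throwaway CSV (the file contents and the names tmp, bom and demo are made up for illustration) that starts with a byte order mark, then reads it back with fileEncoding = "UTF-8-BOM".

```r
# Write a tiny CSV that starts with a UTF-8 byte order mark (bytes EF BB BF).
# This is a made-up demo file, not the MCU dataset.
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Movie,Year\nIron Man,2008\n")), tmp)

# Reading with fileEncoding = "UTF-8-BOM" strips the mark,
# so the first column name should come back clean.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)

# Clean up the temporary file
unlink(tmp)
```

If the first column of your own data frame shows up with strange characters in front of its name, a leftover byte order mark is a likely culprit.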
Calculus 101
Now, before we dive headfirst into the fun calculus stuff with R, let’s first discuss calculus and derivatives, which are the topics of this post.
What is calculus? Simply put, calculus is a branch of mathematics that deals with the study of change. Calculus is a great way to measure how things change over time, like the MCU movies’ Rotten Tomatoes scores over the course of the franchise’s 15-year, 32-movie run.
There are two main types of calculus: differential and integral calculus. Differential calculus focuses on finding the rate of change of, well, any given thing over a period of time. Integral calculus, on the other hand, focuses on the accumulation of any given thing over a certain period of time.
A good example of differential calculus would be modelling changes in a city’s population over a certain period of time; differential calculus would be used in this scenario to find the city’s rate of population change over time. A good example of integral calculus would be modelling the spread of a disease over time (e.g. COVID-19) in a certain geographic region to find the total number of infections that accumulated in that region over a certain time period.
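To make the difference between the two flavors a little more concrete, here’s a small sketch using base R’s cumsum() and diff() functions. The daily case counts are made-up numbers purely for illustration, and the variable names are my own.

```r
# Hypothetical daily new-infection counts for one region (made-up numbers)
new_cases <- c(5, 8, 12, 7)

# Accumulation (the discrete cousin of an integral): running total of cases
total_cases <- cumsum(new_cases)
total_cases        # 5 13 25 32

# Rate of change (the discrete cousin of a derivative): day-to-day change
# in the running total, which recovers the daily counts after day one
daily_change <- diff(total_cases)
daily_change       # 8 12 7
```

Notice that accumulating and then differencing gets us back to where we started, which is the discrete version of how integration and differentiation undo each other.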
Now, what is a derivative? In calculus, the derivative measures the rate of change at any given point in whatever you’re measuring. Since our data is a list of separate movies rather than a smooth curve, we’ll approximate the derivative with the difference between neighboring values. In this example, the derivatives (plural, since we’ll be computing them for two sets of scores) would be the changes in Rotten Tomatoes scores (both critic and audience) from one MCU movie to the next.
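For a function with a known formula, R can work out the derivative directly. The sketch below is a side example rather than part of the MCU analysis: it uses base R’s D() function on x^2 and compares the result with a simple "tiny step" approximation. The function f, the step size h and the other names are my own choices for illustration.

```r
# A simple function whose derivative we know from calculus class: f(x) = x^2
f <- function(x) x^2

# Symbolic derivative with base R's D(); the result is the expression 2 * x
f_prime <- D(expression(x^2), "x")

# Evaluate the derivative at x = 3
x <- 3
exact <- eval(f_prime)          # 2 * 3 = 6

# Numerical approximation: change in f over a very small step h
h <- 1e-6
approx <- (f(x + h) - f(x)) / h # very close to 6
```

With movie scores there’s no formula to differentiate, so the step can’t shrink below one movie, and the difference between neighbors is the best stand-in available.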
It’s R calculus time!
Now that I’ve explained the gist of calculus and derivatives to you all, it’s time to implement them in R! Here’s how to do so (and yes, we will be finding the derivatives of both critic and audience scores). First, let’s start with the critic score derivatives:
criticScores <- MCU$RT.score
criticDerivatives <- diff(criticScores)
criticDerivatives
[1] -0.27 0.04 0.06 0.03 0.11 -0.12 -0.13 0.24 0.02 -0.16 0.07 0.07 -0.01 -0.04 0.07 0.01 0.03 -0.11 0.02 -0.08 0.15 -0.04 -0.11 0.12 -0.44 0.46 -0.19 -0.11 0.21 -0.37
[31] 0.34
To calculate the derivatives for each critic score, I first placed all of the critics’ scores (stored in the column MCU$RT.score) into the vector criticScores. I then used R’s built-in diff() function to calculate the difference in critic scores from one MCU movie to the next and, voila, I have my 31 derivatives.
- Even though there are 32 MCU movies, there are only 31 differences to calculate and thus only 31 derivatives that appear.
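Here’s a small sketch of what diff() is doing under the hood, using just the first four critic scores from the table above. The manual version (the names scores, changes and manual are mine) subtracts each score from the one after it, which is why you always end up with one fewer value than you started with.

```r
# First four critic scores from the table (Iron Man through Thor)
scores <- c(0.94, 0.67, 0.71, 0.77)

# diff() subtracts each value from the one that follows it
changes <- diff(scores)
changes            # -0.27  0.04  0.06
length(changes)    # 3, one fewer than the 4 scores

# The same calculation by hand: drop the first score, drop the last score,
# and subtract the two shifted vectors
manual <- scores[-1] - scores[-length(scores)]
all.equal(changes, manual)   # TRUE
```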
Calculating the derivatives of the audience scores works exactly the same way, except you’ll just need to pull your data from the MCU$Audience.score column:
audienceScores <- MCU$Audience.score
audienceDerivatives <- diff(audienceScores)
audienceDerivatives
[1] -0.22 0.02 0.05 -0.01 0.16 -0.13 -0.03 0.17 0.00 -0.10 0.03 0.04 -0.03 0.01 0.00 0.00 -0.08 0.13 -0.12 -0.35 0.45 0.05 -0.04 0.07 -0.21 0.21 -0.13 -0.08 0.17 -0.11
[31] 0.12
Plotting our results
Now that we’ve calculated the derivatives of both the critic and audience scores, let’s plot them!
Here’s how we’d plot the critic scores:
plot(1:(length(criticScores)-1),criticDerivatives, type = "l", xlab = "MCU Movie Number", ylab = "Change in critic score")

In this example, I used R’s plot() function (which doesn’t require installation of the ggplot2 package) to plot the derivatives of the critic scores. The y-axis represents the change in critic scores, while the x-axis represents the index of a specific MCU movie (e.g. 1 would be Incredible Hulk while 31 would be Guardians of the Galaxy Vol. 3).
However, this visual doesn’t seem too helpful. Let’s see how we can fix it!
First, let’s create a vector of the MCU movies to use as labels for this plot:
movies <- MCU$Movie
Next, let’s remove Iron Man from this vector since it won’t have a derivative (after all, it’s the first MCU movie).
movies <- movies[! movies %in% c('Iron Man')]
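As an aside, here’s a sketch of an alternative way to build the label vector. Removing Iron Man by name works here because %in% only matches the exact title, so Iron Man 2 and Iron Man 3 survive. Dropping the first element by position is another option that doesn’t depend on the title at all. The four-movie vector below is a shortened stand-in for MCU$Movie, and the variable names are mine.

```r
# Shortened stand-in for MCU$Movie (first four titles from the table)
movies <- c("Iron Man", "Incredible Hulk", "Iron Man 2", "Thor")

# Approach used in this post: remove the exact title "Iron Man"
by_name <- movies[!movies %in% c("Iron Man")]

# Alternative: drop the first element by position
by_position <- movies[-1]

# Both approaches give the same labels for this data
identical(by_name, by_position)   # TRUE
```

The position-based version would also hold up if a dataset happened to list the same title twice, since it never looks at the names.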
Great! Now let’s revise our plot to first add a title:
plot(1:(length(criticScores)-1),criticDerivatives, type = "l", main="Changes in MCU movie critic reception", xlab = "MCU Movie Number", ylab = "Change in critic score")
You can see that the plot() function’s main parameter allows you to add a title to the graph.
Next let’s add some labels to our data points. Remember to only run this command AFTER you have the initial graph open!
text(1:(length(criticScores)-1),criticDerivatives, labels=movies, pos=3, cex=0.6)
Voila! With the text() function, we’re able to add labels to our data points so that we can tell which movie corresponds with which data point!
- Remember to pass the same X and Y values to the text() function as you did to the plot() function! In this case, the X values would be 1:(length(criticScores)-1) and the Y values would be criticDerivatives.
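One way to make sure plot() and text() always get the same values is to store them in variables first. The sketch below does that with the first five critic scores from the table. The abline() call, which adds a dashed line at zero so rises and drops are easier to tell apart, and the type = "b" setting are my own additions rather than part of the plots in this post.

```r
# First five critic scores and titles from the table
scores <- c(0.94, 0.67, 0.71, 0.77, 0.80)
movies <- c("Iron Man", "Incredible Hulk", "Iron Man 2", "Thor",
            "Captain America: The First Avenger")

# Store the X and Y values once so plot() and text() can't drift apart
changes <- diff(scores)        # Y values: 4 changes for 5 movies
x <- seq_along(changes)        # X values: 1, 2, 3, 4
labels <- movies[-1]           # every movie except the first

# Draw the line (with points), then add a zero line and the labels
plot(x, changes, type = "b",
     main = "Changes in MCU movie critic reception",
     xlab = "MCU Movie Number", ylab = "Change in critic score")
abline(h = 0, lty = 2)         # dashed reference line at zero change
text(x, changes, labels = labels, pos = 3, cex = 0.6)
```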
Now that we have a title and labelled data points in our graph, let’s gather some insights. From our graph, we can see that the critical reception for the MCU’s Phases 1 & 2 (the movies from Iron Man to Ant-Man) was up-and-down. The critical reception for the MCU’s Phase 3 slate (from Captain America: Civil War to Spider-Man: Far From Home) was its most consistent to date, as there are no major swings in either direction. The most interesting area of the graph is Phases 4 & 5 (from Black Widow onwards), as this era of the MCU has seen some sharp jumps in critical reception from movie to movie. The sharpest changes can be seen from Shang-Chi and the Legend of the Ten Rings to Eternals (a 44-point drop in critic score) and from Eternals to Spider-Man: No Way Home (a 46-point rise in critic score).
All in all, the main insight we can gain from this graph is that MCU Phase 3 was its most critically well-received stretch (and as some fans would say, the MCU’s prime), while the entries in Phases 4 & 5 have been hit-or-miss critically (ahem, Eternals).
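If you’d rather not eyeball the graph, here’s a sketch of how the biggest swings could be pulled out with base R’s which.min() and which.max(). The critic scores are typed in from the table above so the snippet runs on its own, and the variable names are mine. Since the first change belongs to movie 2, I add 1 to each position to get the movie’s row number.

```r
# All 32 critic scores from the table, in release order
critic <- c(0.94, 0.67, 0.71, 0.77, 0.80, 0.91, 0.79, 0.66, 0.90, 0.92, 0.76,
            0.83, 0.90, 0.89, 0.85, 0.92, 0.93, 0.96, 0.85, 0.87, 0.79, 0.94,
            0.90, 0.79, 0.91, 0.47, 0.93, 0.74, 0.63, 0.84, 0.47, 0.81)
changes <- diff(critic)

# Row number of the movie with the sharpest drop and the sharpest rise
biggest_drop <- which.min(changes) + 1   # 26, Eternals
biggest_rise <- which.max(changes) + 1   # 27, Spider-Man: No Way Home

# Largest swing (in either direction) within Phase 3, i.e. movies 13 to 23,
# whose changes sit in positions 12 to 22
phase3_max_swing <- max(abs(changes[12:22]))   # about 0.15
```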
Now that we’ve analyzed critic derivatives, let’s turn our attention to analyzing audience score derivatives. Here’s the plot we’ll use, and it’s pretty much the same code we used to create the updated critic score derivative plot (except that the word critic is replaced with the word audience in each variable name, the y-axis label, and the title):
plot(1:(length(audienceScores)-1),audienceDerivatives, type = "l", main="Changes in MCU movie audience reception", xlab = "MCU Movie Number", ylab = "Change in audience score")
text(1:(length(audienceScores)-1),audienceDerivatives, labels=movies, pos=3, cex=0.6)
The change in audience reception throughout the MCU’s 15-year, 32-movie run looks a little different than the change in critic reception over that same time period. For one, there are fewer sharp changes in audience score from movie to movie. Also interesting is that the audience scores for the MCU’s Phase 4 & 5 movies rise and fall in step with the critic scores (four rises and five drops in each case), but the audience swings are much smaller. I found this notable because many fans on MCU social media accounts that I follow have griped about the MCU’s quality post-Avengers: Endgame.
One more interesting insight is that the sharpest changes in audience reception came during the peak of Phase 3 (namely from Black Panther to Avengers: Endgame). As you can see from the graph, the audience score rises by 13 points from Black Panther to Avengers: Infinity War, then drops by 12 points from Avengers: Infinity War to Ant-Man and the Wasp. The audience score drops even further (by 35 points) from Ant-Man and the Wasp to Captain Marvel before sharply rising (by 45 points) from Captain Marvel to Avengers: Endgame. I personally found this insight interesting as some of my favorite MCU movies come from Phase 3 (like Black Panther with its 96% critic score on Rotten Tomatoes), though I do recall Captain Marvel wasn’t well liked when it came out in March 2019 (but boy oh boy was Avengers: Endgame one of the most hyped things of 2019).
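To double-check comparisons like these with numbers rather than by eye, here’s a sketch that counts the rises and drops and measures the average size of the swings for Phases 4 & 5. The scores run from Spider-Man: Far From Home (so that Black Widow has something to be compared against) through Guardians of the Galaxy Vol 3, typed in from the table above. The helper function summarize_changes() is a name I made up for this illustration.

```r
# Critic and audience scores from Spider-Man: Far From Home (row 23)
# through Guardians of the Galaxy Vol 3 (row 32), taken from the table
critic   <- c(0.90, 0.79, 0.91, 0.47, 0.93, 0.74, 0.63, 0.84, 0.47, 0.81)
audience <- c(0.95, 0.91, 0.98, 0.77, 0.98, 0.85, 0.77, 0.94, 0.83, 0.95)

# Hypothetical helper: count rises and drops, and average the swing size
summarize_changes <- function(scores) {
  changes <- diff(scores)
  c(rises = sum(changes > 0),
    drops = sum(changes < 0),
    mean_abs_change = mean(abs(changes)))
}

critic_summary   <- summarize_changes(critic)    # 4 rises, 5 drops, ~0.26
audience_summary <- summarize_changes(audience)  # 4 rises, 5 drops, ~0.13
```

The same helper could be pointed at any other stretch of the score vectors, such as the Phase 3 movies, to compare eras.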
Thanks for reading,
Michael