R Archives - Michael's Programming Bytes

R Lesson 35: Let’s Plot Some Inverse Trigonometric Functions

Hello everybody,

Michael here, and in today’s post, I’ll show you how to plot some inverse trigonometric functions with R!

In the previous post, we explored how to create R plots of the three basic trigonometric functions: sine, cosine and tangent. This time, we’ll explore how to create R plots of the three basic inverse trigonometric functions: arcsine, arccosine and arctangent. Let’s begin!

First off, the arcsine:

To start off our exploration of plotting inverse trigonometric functions, let’s explore how we can plot the arcsine:

> x <- seq(-3*pi, 3*pi, length.out=100)
> y <- asin(x)
Warning message:
In asin(x) : NaNs produced
> plot(x, y, type='l')

As you can see, we can’t quite use the same approach to plotting the arcsine function that we used to plot the sine function. Our sequence of 100 values from -3pi to 3pi yielded NaN for every value outside the interval from -1 to 1, which is most of them, so only a short sliver of the curve gets drawn. Let’s try a slightly different approach to plotting the arcsine function, shall we?

> x <- seq(-1, 1, length.out=100)
> y <- asin(x)
> plot(x, y, type='l')

The only modification I made from the previous example was to use a sequence from -1 to 1 (still maintaining 100 equally spaced values).

Why did I stick with the (-1, 1) sequence? Simply put, the arcsine function is only defined for inputs from -1 to 1 (endpoints included), because those are the only values that sine itself can produce. In other words, it’s not possible to calculate the arcsine of any value outside of this interval, and trying to do so will give you NaN (not a number) in R.
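If you want to see exactly which inputs trip the warning, here’s a quick sketch that reuses the wide sequence from the first attempt and counts the NaN values. The variable names wide_x and wide_y are just ones I made up for this illustration.

```r
# Reuse the wide sequence from the first attempt
wide_x <- seq(-3*pi, 3*pi, length.out=100)

# suppressWarnings() hides the "NaNs produced" warning we already saw
wide_y <- suppressWarnings(asin(wide_x))

# asin() returns NaN exactly where x falls outside the interval from -1 to 1
all(is.nan(wide_y) == (abs(wide_x) > 1))

# How many of the 100 values produced NaN
sum(is.nan(wide_y))

# The handful of x values that did give a valid arcsine
wide_x[!is.nan(wide_y)]
```

Only the few values that land between -1 and 1 survive, which is why the first plot looked so empty.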

Next up, the arccosine

And for our next R plot, let’s graph the arccosine function! Here’s the code to use for a sample arccosine function:

> x <- seq(-1, 1, length.out=100)
> y <- acos(x)
> plot(x, y, type='l')

Aside from using the acos() function, we used the same logic to create this plot that we used for the arcsine plot. Both the arcsine and arccosine functions are only defined for inputs from -1 to 1, meaning that you will get NaN in R if you try calculating the arcsine or arccosine for any value outside of this interval.

Now, you may have noticed that our arccosine plot looks like a vertical reflection of the arcsine plot. How could that be? The range of x-axis values is the same for both plots, but notice the difference in the range of y-axis values between the two plots. The arcsine plot’s y-axis values run from -pi/2 to pi/2 (roughly -1.57 to 1.57) while the arccosine plot’s y-axis values run from 0 to pi (roughly 0 to 3.14).

Why do the y-axes in both plots have different value ranges? An easy explanation would be that the arccosine plot is the vertical reflection of the arcsine plot shifted pi/2 radians (or 90 degrees) upward, hence why the arccosine’s y-axis value ranges are higher.
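The reflect-and-shift relationship can be checked directly in R. This is a small sketch using the identity acos(x) = pi/2 - asin(x); the variable name shifted is my own.

```r
x <- seq(-1, 1, length.out=100)

# Flip the arcsine vertically (the minus sign), then shift it up by pi/2
shifted <- pi/2 - asin(x)

# all.equal() compares the two vectors with a small tolerance
# for floating-point rounding
all.equal(acos(x), shifted)

# The y-axis ranges of the two plots
range(asin(x))
range(acos(x))
```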

Last but not least, the arctangent

Saving the best for last, let’s plot an arctangent function in R! Here’s the code for a sample arctangent plot:

> x <- seq(-30, 30, length.out=100)
> y <- atan(x)
> plot(x, y, type='l')

For creating the arctangent plot, we used similar logic (aside from the atan() function) that we used to create the arcsine and arccosine plots. However, notice that I didn’t use the (-1, 1) sequence range but rather the range of (-30, 30).

You might be thinking, wouldn’t using a sequence outside of the (-1, 1) range give you a bunch of NaNs? In the case of the arctangent function, no. This is because arctangent, unlike arcsine and arccosine, is defined for every real number from -infinity to +infinity. In other words, there is no restriction on the inputs, so you could use any sequence of values you want when creating an arctangent plot (I kept it simple with the -30, 30 range).

However, one interesting thing you’ll notice with the arctangent plot is that its y-axis only runs from about -1.5 to 1.5. How is that possible? Even though you could use a sequence between literally any two numbers, the arctangent values themselves always stay between -pi/2 and pi/2 (roughly -1.57 and 1.57).

Another interesting thing about the arctangent function is what happens at the edges of the plot. As x heads towards -infinity, the lower part of the curve flattens out and approaches -pi/2 without ever reaching it, and as x heads towards +infinity, the upper part of the curve approaches pi/2 without ever reaching it. Those two horizontal lines are the asymptotes of the arctangent function.
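Here’s a short sketch of that flattening-out behavior. It asks atan() for its values at -Inf and Inf, then redraws the plot with dashed guide lines at -pi/2 and pi/2. The ylim and lty settings are my own choices for readability.

```r
# atan() accepts infinite inputs and returns the limiting values
atan(c(-Inf, Inf))

# Those limits are -pi/2 and pi/2
c(-pi/2, pi/2)

# Redraw the arctangent plot with dashed lines at the two limits
x <- seq(-30, 30, length.out=100)
y <- atan(x)
plot(x, y, type='l', ylim=c(-pi/2, pi/2))
abline(h=c(-pi/2, pi/2), lty=2)
```

No matter how wide you make the sequence, the curve stays strictly between the two dashed lines.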

Thanks for reading,

Michael

R Lesson 34: Let’s Plot Some Trigonometric Functions

Hello everybody,

Michael here, and in the last two posts we discussed the basics of trigonometry, both with R and basic trig in general. This time, we’ll explore how to plot the three basic trigonometric functions (sine, cosine and tangent) with R!

A basic R sine plot

To start our lesson, we’ll create a basic R plot of a sine function. Here’s the code we’ll utilize to create our basic R sine plot:

> x <- seq(0,7*pi,length.out=100)
> y <- sin(x)
> plot(x, y, type='l')

As you can see, these three lines of R code gave us a very simple R sine plot. How did the code accomplish this graph creation? Let’s explain!

The seq() function used for the x variable takes 3 parameters: a starting point for the sequence (0), an ending point for the sequence (7*pi), and a value for length.out which indicates how many equally spaced values you want in the sequence (I opted for 100 in this case). The sequence itself will be represented on the x-axis.
The y variable takes the sine of all 100 values generated from the sequence in the x variable. These sine values are then plotted on the y-axis.
The plot() function takes both the x and y variables along with a type parameter, which indicates the style of graph you want to plot. In this case, I set type to 'l' (a lowercase L, short for lines), indicating I want to plot the sine function as a solid line connecting the points.

Now, what exactly does 7*pi mean here? In this case, it indicates that the sequence will end at 7*pi, or roughly 21.99. Something else to note whenever you use pi in trigonometric function plots: sine functions have these things called periods, which in plain English represent the distance along the x-axis after which the function starts repeating its values. Sine functions have a period of 2*pi, which means that they repeat their values every 2*pi (or roughly 6.28) units. Since the sequence starts at 0 and ends at 7*pi, there are 3 1/2 periods in this graph, which show up as four high points and three low points.
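To make the idea of a period a little more concrete, here’s a minimal sketch that counts the periods in the plot and checks that shifting x by 2*pi leaves the sine values unchanged. The variable name periods is mine.

```r
# Number of full periods between 0 and 7*pi
periods <- (7*pi) / (2*pi)
periods

# Shifting every x value by one full period gives the same sine values
# (all.equal() allows for tiny floating-point differences)
x <- seq(0, 7*pi, length.out=100)
all.equal(sin(x), sin(x + 2*pi))
```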

A basic R cosine plot

Now that we’ve explored sine plots with R, let’s turn our attention to cosine plots! Here’s the code to create a basic cosine plot in R:

> x <- seq(0, 10*pi, length.out=100)
> y <- cos(x)
> plot(x, y, type='l')

Conceptually, the cosine plot works the same way as the sine plot, as both plots have periods of 2*pi (represented by the peaks and valleys in this graph). Since the endpoint of the sequence is 10*pi, this cosine plot will have five periods. Both plots will also generate a sequence of X equally spaced values (X being the number specified in the length.out parameter).

The one difference between the cosine and sine plots? The former calculates out the cosine for all sequence values while the latter calculates the sine for all sequence values.

Last but not least, let’s explore tangent plots!

A basic R tangent plot:

Just as we explored basic R sine and cosine plots, let’s explore how to create a basic R tangent plot! Here’s the code for one such plot:

> x <- seq(-5*pi, 3*pi, length.out=100)
> y <- tan(x)
> plot(x, y, type='l')

Creating the tangent plot follows the same logic as creating the sine and cosine plots, with the exception that you’re looking for the tangent of all the equally spaced values in the sequence.

You may also be wondering why the tangent plot looks so different from the sine and cosine plots. One main reason for this is that tangent functions, unlike sine and cosine functions, have a little something called asymptotes.

What are asymptotes? To explain this concept, I feel it is important to mention that sine and cosine functions have a range of values between -1 and 1, which explains why we get smooth, wave-like plots as shown earlier in this post. However, tangent functions have a range of values between negative infinity and positive infinity. Asymptotes are straight imaginary lines that the curves on the tangent plot get closer and closer to but never fully meet. For the tangent function, they are vertical lines located at the odd multiples of pi/2 (such as -pi/2, pi/2 and 3*pi/2), which is where the cosine equals 0.

Still a little confused? Allow me to illustrate:

This is the tangent plot we just created. Pay attention to the two curves on this graph with the red line borders (aka the asymptotes). The first curve (the one that appears to have its lowest point at -60 on the y-axis) appears to be going down to negative infinity, yet it will never touch the asymptote no matter how far down it goes. Likewise, the second curve (the one that appears to have its highest point at 60) appears to be going up to positive infinity, but just like the first curve, it will never touch the asymptote no matter how high it goes. The -60 and 60 aren’t real limits, by the way. They’re just the most extreme values that happened to land among our 100 sample points.
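If you’d like to draw the asymptotes yourself, here’s one way to sketch it. I’m assuming the asymptotes sit at the odd multiples of pi/2 that fall between -5*pi and 3*pi; the asymptotes vector, the ylim values and the dashed red styling are all my own choices.

```r
x <- seq(-5*pi, 3*pi, length.out=100)
y <- tan(x)

# Odd multiples of pi/2 between -5*pi and 3*pi: -9*pi/2, -7*pi/2, ..., 5*pi/2
asymptotes <- seq(-9, 5, by=2) * pi/2

# ylim keeps the y-axis readable; lty=2 draws dashed lines
plot(x, y, type='l', ylim=c(-60, 60))
abline(v=asymptotes, col='red', lty=2)

# The cosine is (practically) 0 at each asymptote, which is why tan() blows up there
cos(asymptotes)
```

Keep in mind that type='l' still connects neighboring sample points, so the plot will show steep line segments crossing each asymptote even though the real tangent curve never does.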

Thanks for reading,

Michael

R Lesson 33: Inverse Trigonometric Ratios in R

Hello everybody,

Michael here, and in today’s post, we’ll be expanding our knowledge of R trigonometry by learning inverse trigonometric ratios in R!

In the previous post, we learned some basics of R trigonometry (and trigonometry in general). Now, let’s explore some more advanced R trigonometric concepts!

Inverse Trigonometric Functions

In the previous post R Lesson 32: Basic Trigonometry, R Style, we learned about the three basic trigonometric concepts-sine, cosine and tangent. Today, we’ll explore inverse trigonometric functions such as the arc-sine, arc-cosine and arc-tangent.

What do these trigonometric concepts represent? Well, let’s go back to our triangle illustration from the previous post:

This illustration gives a visual representation of the three most basic trigonometric concepts-sine, cosine, and tangent with the classic SOHCAHTOA mnemonic.

Now, I did mention that the sine of an angle in a right triangle is the ratio of the opposite side’s length to the hypotenuse’s length. Arcsine is the inverse function of sine, meaning it works in the opposite direction: it takes that ratio as its input and gives you back the angle. Be careful not to confuse the inverse with the reciprocal. Flipping the sine ratio over (hypotenuse over opposite) gives you a different ratio called the cosecant.

The same logic applies for arccosine and arctangent, as these functions are simply the inverses of the cosine and tangent functions, respectively. Arccosine takes the adjacent/hypotenuse ratio and returns the angle, while arctangent takes the opposite/adjacent ratio and returns the angle. The flipped-over versions of the cosine and tangent ratios are the secant (hypotenuse over adjacent) and the cotangent (adjacent over opposite).

How do the flipped-over (reciprocal) ratios compare to the regular ratios in this triangle? Let’s find out using the 38 degree angle as an example:

RATIO	NUMERIC FORM
sine	7/12.2~0.57
cosecant	12.2/7~1.74
cosine	10/12.2~0.82
secant	12.2/10=1.22
tangent	7/10=0.7
cotangent	10/7~1.43

As you can see here, the three regular trigonometric ratios yielded values less than 1 for this angle while the three reciprocal ratios yielded values greater than 1. Interesting, isn’t it? Just remember that none of the reciprocal ratios is an inverse trigonometric function. Arcsine, arccosine and arctangent return angles, not ratios of side lengths.

And now, let’s explore how to work with more advanced trigonometry in R!

Advanced Trigonometry, R style

Now that we’ve seen how to use the basic trigonometric ratios in R, let’s see how we can utilize these more advanced trigonometric ratios!

> asin(31)
[1] NaN
Warning message:
In asin(31) : NaNs produced
> acos(54)
[1] NaN
Warning message:
In acos(54) : NaNs produced
> atan(14)
[1] 1.499489

As I did when first testing out the sine, cosine, and tangent functions in R, I tested the three inverse trigonometric functions (arcsine, arccosine and arctangent) in R using whole numbers as parameters. However, you can see that for the arcsine and arccosine functions (asin() and acos() respectively), that didn’t quite work out. Interestingly, using a whole number for the atan() function worked just fine.

How can that be? Well, just like the regular trigonometric functions in R, these inverse trigonometric functions work in radians (more on those here: R Lesson 32: Basic Trigonometry, R Style). The difference is the direction: sin(), cos() and tan() take an angle and return a ratio, while asin(), acos() and atan() take a ratio and return an angle. The asin() and acos() functions only take input values ranging from -1 to 1 because sine and cosine never produce a value outside of that range. The angles they return are limited too: arcsine only returns angles ranging from -π/2 to π/2 radians (-90 to 90 degrees) while arccosine only returns angles ranging from 0 to π radians (or 0 to 180 degrees). Arctangent, on the other hand, can take any number as its input since the tangent ratio can be anything from negative infinity to positive infinity, although the angle it returns always falls between -π/2 and π/2 radians.

Let’s try executing our inverse trigonometric functions in R with the new inverse trigonometric ratio information that we learned!

> asin(0.5)
[1] 0.5235988
> acos(0.43)
[1] 1.126304
> atan(22)
[1] 1.525373

As you can see, with the rules we discussed above, we’re now able to obtain valid outputs for the asin(), acos() and atan() functions!
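Since these outputs are angles in radians, it can help to view them in degrees as well. Here’s a small sketch that multiplies each result by 180/pi and then does a round trip through sin() to confirm that asin() really is undoing the sine.

```r
# asin(), acos() and atan() return angles in radians;
# multiplying by 180/pi converts them to degrees
asin(0.5) * 180/pi
acos(0.43) * 180/pi
atan(22) * 180/pi

# Round trip: the sine of the angle that asin() returns is the original ratio
sin(asin(0.5))
```

The first result works out to 30 degrees, which matches the classic fact that the sine of a 30 degree angle is 1/2.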

Thanks for reading,

Michael

R Lesson 32: Basic Trigonometry, R Style

Hello everybody,

So far this year, we’ve explored the basics on making our own Python game with the pygame module. Pretty cool, right? After all, it was the first time this blog delved into game development.

However, building a good game takes time; this includes the BINGO game we’ve been developing. With that said, I’ll need to spend a little more time fine-tuning the BINGO game and planning out the game development series going forward. I’d hate to go too long without keeping fresh content on this blog, so with that in mind, in today’s post, we’ll explore something a little different: R trigonometry (you may recall I did a number of R mathematics posts last year).

Last year, we explored some basic R calculus-this year, we’ll explore basic R trigonometry. For those who don’t know what I’m talking about, trigonometry is a branch of mathematics that deals with the study of triangles and their angles.

Trigonometry basics

Before we get into the fun of exploring trig with R, let’s explore some basic trigonometry terms using this handy-dandy illustration:

Here we have a simple right triangle with a 90 degree angle, a 52 degree angle, and a 38 degree angle along with two sides of lengths 10 and 7 and a hypotenuse of length 12.2 (remember the Pythagorean theorem for right triangles: to find the length of the hypotenuse c from the other two sides a and b, use a squared + b squared = c squared).

If you’ve ever taken at least precalculus with basic trigonometry, you’ll certainly recognize the mnemonic on the upper-right hand corner of the screen-SOHCAHTOA. If you’re not familiar with this mnemonic, here’s what it means:

SOH: To find the sine of an angle in the triangle, take the ratio of the length of the side opposite the angle to the length of the hypotenuse
CAH: To find the cosine of an angle in the triangle, take the ratio of the length of the side adjacent to the angle to the length of the hypotenuse
TOA: To find the tangent of an angle in the triangle, take the ratio of the length of the side opposite the angle to the length of the side adjacent to the angle

Trigonometry, R style

Now that we’ve discussed the basics of trigonometry, let’s discuss trigonometry, R style. Here are some examples of the three basic trigonometric functions executed in R:

> cos(20)
[1] 0.4080821
> tan(20)
[1] 2.237161
> sin(20)
[1] 0.9129453

In this example, I used R’s built-in cos(), tan() and sin() functions to calculate the cosine, tangent, and sine of a given angle, respectively. Since these functions are built into R, there’s no need to install any extra packages to use them.

You may be wondering if these functions can calculate the sine/cosine/tangent of an angle given specific side lengths (as shown in the illustration above). The answer to that is no, but then again, you can easily calculate these trigonometric ratios using simple division. For instance, in the illustration above, the sine of the 52 degree angle in the triangle is ~0.82 (rounded to two decimal places) because the opposite/hypotenuse length ratio is 10/12.2.

So, how does R calculate trigonometric ratios of certain angles? It uses a little something called radians, which I’ll explain more right here.

Radians

What are radians, exactly? Well, when measuring angles in shapes, there are two metrics we use-degrees (which I’m sure you’re familiar with) and radians.

Let’s take this illustration of a circle:

As you see, the length of this circle’s radius is 8, and the arc of the circle that is formed also has a length of 8. Therefore, the angle that is formed at the circle’s center measures 1 radian.

Now, how do you convert between degrees and radians? Easy: 1 degree is equal to pi/180 radians, and going the other way, 1 radian is equal to 180/pi degrees (roughly 57.3 degrees).

Why are radians expressed in terms of pi? This is because, for one, the circumference of a circle is 2pi times the circle’s radius. For two, a full circle spans 360 degrees, or 2pi radians; similarly, a half-circle spans 180 degrees, or pi radians.

Now, let’s analyze how radians work in the context of our R examples. R’s sin(), cos() and tan() functions always treat their input as radians, so cos(20) in the earlier example is the cosine of 20 radians, not 20 degrees.

I mentioned earlier that 1 degree equals pi/180 radians, so 20 degrees would be 20*(pi/180) radians, which when converted to simplest form, equals pi/9 radians. To get the trigonometric ratios of a 20 degree angle in R, you’d pass that converted value to the function, as in cos(20*pi/180).
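Here’s a minimal sketch of that conversion. Note that deg2rad() is a helper I wrote for this illustration; it is not a function that comes with R.

```r
# Hypothetical helper (not built into R): convert degrees to radians
deg2rad <- function(degrees) degrees * pi/180

# 20 degrees is the same angle as pi/9 radians
all.equal(deg2rad(20), pi/9)

# Cosine of a 20 degree angle...
cos(deg2rad(20))

# ...versus the cosine of 20 radians, which is what cos(20) computes
cos(20)
```

The two cosines come out very different, which is a good reminder to convert first whenever your angle is in degrees.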

R’s trigonometric functions are so versatile that they can be used on any integer, positive or negative, with each one treated as an angle in radians. Check out some of these examples:

> cos(-3)
[1] -0.9899925
> sin(355)
[1] -3.014435e-05
> tan(15000)
[1] -1.98891
> cos(412)
[1] -0.8998537
> sin(-1333)
[1] -0.8218865
> tan(10191)
[1] -0.338695

Yes, I can use R’s basic trigonometric functions on a wide variety of integers and obtain valid results from each integer (although let’s be real, where will you ever find a 15000 radian angle?).

Thanks for reading,

Michael

R Lesson 31: Logarithmic Graphs

Hello everybody,

Michael here, and today’s lesson will be a sort-of continuation of my previous post on R logarithms (R Lesson 30: Logarithms). However, in this post, I’ll cover how to create logarithmic graphs in R.

Let’s begin!

Two types of log plots

There are two main types of logarithmic plots in R-logarithmic scale plots and log-log plots.

What do these log plots do, exactly? Well, in the case of the logarithmic scale plot, only one of the plot’s axes uses a logarithmic scale while the other maintains a linear scale. With the log-log plot, both axes use logarithmic scales.

When would you use each type of plot? In the case of logarithmic scale plots, you’d use them for analyses such as exponential growth/decay or percentage changes over a period of time. As for the log-log plots, they’re better suited for analyses such as comparative analyses (which involve comparing datasets with different scales or units) and data where both the x-axis and y-axis have a wide range of values.

And now for the logarithmic scale plots!

Just as the header says, let’s create a logarithmic scale plot in R!

Before we begin, let’s be sure we have the necessary data that we’ll use for this analysis: the bitcoin-halving dataset.

This dataset is simpler than most I’ve worked with on this blog, as it only contains 11 rows and two columns. Here’s what each column means:

Date-the date that the bitcoin mining reward will be halved
Reward-the amount of bitcoin you will receive after a successful mining as of the halving date.

For those unfamiliar with basic Bitcoin mining, the gist of the process is that when you mine Bitcoin you get a reward. However, roughly every four years, the reward for mining bitcoin is halved. For instance, when bitcoin mining first began in January 2009, you would be able to receive 50 bitcoin (BTC) for a successful haul. In 2012, the reward was halved to 25 BTC for a successful haul. The next halving is scheduled to occur in Spring 2024, where the reward will be halved to 3.125 BTC. Guess bitcoin mining isn’t as profitable as it was nearly 15 years ago?

There will likely be no more Bitcoin left to mine by the year 2140, so between now and then, the Bitcoin mining reward will get progressively smaller (a perfect example of exponential decay). In fact, the Bitcoin mining reward will be less than 1 BTC by the year 2032.

I mean, don’t take my word for it, but maybe the supply of mineable bitcoin won’t run out by 2140 and the reward instead would get progressively smaller until it’s well under 1/1,000,000,000th of 1 BTC. Just my theory though :-).

Now, enough about the Bitcoin mining-let’s see some logarithmic scale plots! Take a look at the code below.

First, we’ll read in our dataset:

bitcoin <- read.csv("C:/Users/mof39/OneDrive/Documents/bitcoin halving.csv", encoding="utf-8")

Next, we’ll create our logarithmic scale plot! Since there is only one axis that will use a log-scale (the y-axis in this case), let’s remember to specify that:

plot(bitcoin$Date, bitcoin$Reward, log = "y", pch = 16, col = "blue", xlab = "Year", ylab = "BTC Reward")

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion
2: In min(x) : no non-missing arguments to min; returning Inf
3: In max(x) : no non-missing arguments to max; returning -Inf

If we tried to use the plot() function along with all the parameters specified here, we’d get this error-Error in plot.window(...) : need finite 'xlim' values. Why does this occur?

The simple reason we got the error is that we used a column with non-numeric values for the x-axis. The Date column holds text, and plot() can’t turn that text into numbers, so every x value becomes NA. How do we fix this? Let’s create a numeric vector of the years in the Date column (with the exception of 2009, all of the years in the Date column are leap years until 2080) and use that vector as the x-axis:

years <- c(2009, 2012, 2016, 2020, 2024, 2028, 2032, 2036, 2040, 2044, 2048, 2052, 2056, 2060, 2064, 2068, 2072, 2076, 2080)

plot(years, bitcoin$Reward, log = "y", pch = 16, col = "red", xlab = "Year", ylab = "BTC Reward", main = "BTC Rewards 2009-2080")

Voila! As you can see here, we have a nice log-scale plot showing the exponential decay in bitcoin mining rewards from 2009 to 2080.
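If you don’t have the CSV handy, you can get a feel for why exponential decay looks like a straight line on a log scale with this sketch. It rebuilds the rewards from the halving rule itself (start at 50 BTC, halve at every date) rather than reading my dataset, so treat the reward vector as an illustration and not as the contents of the file.

```r
years <- c(2009, 2012, 2016, 2020, 2024, 2028, 2032, 2036, 2040, 2044,
           2048, 2052, 2056, 2060, 2064, 2068, 2072, 2076, 2080)

# Illustrative rewards built from the halving rule: 50, 25, 12.5, ...
reward <- 50 / 2^(seq_along(years) - 1)

# Each halving lowers log10(reward) by the same amount, log10(2),
# so the points are evenly spaced vertically on a log-scale y-axis
diff(log10(reward))

plot(years, reward, log = "y", pch = 16, col = "red",
     xlab = "Year", ylab = "BTC Reward")

# Rewards at the 2024 and 2032 halvings
reward[years == 2024]
reward[years == 2032]
```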

How did we create this nice-looking plot? Well, we set the value of the log parameter of the plot() function equal to y, as we are only creating a log-scale plot. If we wanted to create a log-log plot, we would set the value of the log parameter to xy, which would indicate that we would use a logarithmic scale for both the x and y axes.

As for the rest of the values of the parameters in the plot() function, keep them the same as you would for a normal, non-logarithmic R scatter plot (except of course adapting your x- and y-axes and title to fit the scatterplot).

Now, one thing I did want to address on this graph is the scale on the y-axis, which might look strange to you if you’re not familiar with log-scale plots. See, the plot() function’s log parameter in R uses base-10 logs by default, and in turn, the y-axis will use powers of 10 in the scale (the scientific notation makes the display a little neater). For instance, 1e+01 represents 10, 1e+00 represents 1, 1e-01 represents 0.1, and so on. Don’t worry, all the data points in the dataset were plotted correctly here.

And now, let’s create a log-log plot

Now that we’ve created a log-scale plot, it’s time to explore how to create a log-log plot in R!

loglog <- data.frame(x=c(2, 4, 8, 16, 32, 64), y=c(3, 9, 27, 81, 243, 729))

plot(loglog$x, loglog$y, log = "xy", pch = 16, col = "red", xlab = "Power of 2", ylab = "Power of 3", main = "Sample Log-Log Plot")

In this example, I created the dataframe loglog and filled both axes with powers of 2 and 3 to provide a simple way to demonstrate the creation of log-log plots in R.

As for the plot() function, I made sure to set the value of the log parameter to xy since we’re creating a log-log plot and thus need both axes to use a logarithmic scale. Aside from that, remember to change the plot’s axes, labels, and titles as appropriate for your plot.
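One nice property of log-log plots is that data like this falls on a straight line. Here’s a rough sketch that measures the slope of that line by fitting a linear model to the base-10 logs of both columns. Using lm() this way is my own addition, and the names fit and slope are made up for the example.

```r
loglog <- data.frame(x=c(2, 4, 8, 16, 32, 64), y=c(3, 9, 27, 81, 243, 729))

# Fit a straight line to log10(y) versus log10(x)
fit <- lm(log10(y) ~ log10(x), data=loglog)

# The slope of the line on the log-log plot
slope <- unname(coef(fit)[2])
slope

# Since x = 2^n and y = 3^n, the slope should match log(3)/log(2)
log(3) / log(2)
```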

Now, you might’ve noticed something about this graph. In R, both log-log and log-scale plots utilize base-10 logs for creating the plot. However, you likely noticed that the scale for the log-scale plot displays its values using scientific notation and powers of 10. The scale (or should I say scales) for the log-log plot doesn’t use scientific notation and powers of 10 to display its values. Rather, the log-log plot uses conventional scaling to display its values. In other words, the scale for the log-log plot bases its values on the range of values in both axes rather than a powers-of-10 system. My best guess is that this comes down to the size of the numbers: the values in this plot only run from 2 to 729, which are short enough to print in full, while the BTC rewards shrink to tiny fractions that are easier to read in scientific notation.

Of course, if you want to change the scale displays in either the log-log or log-scale plots, all you would need to do is utilize the axis() function in R after creating the plot. Doing so would allow you to customize your plot’s axis displays to your liking.
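As a rough example of what that could look like, the sketch below redraws the log-log plot with the default axes switched off and then adds custom axes with a tick at every data value. The choice of tick positions and the las=1 setting (horizontal labels) are simply my preferences.

```r
loglog <- data.frame(x=c(2, 4, 8, 16, 32, 64), y=c(3, 9, 27, 81, 243, 729))

# xaxt="n" and yaxt="n" suppress the default axes so we can draw our own
plot(loglog$x, loglog$y, log = "xy", pch = 16, col = "red",
     xaxt = "n", yaxt = "n",
     xlab = "Power of 2", ylab = "Power of 3", main = "Sample Log-Log Plot")

# Side 1 is the x-axis and side 2 is the y-axis;
# 'at' sets where the tick marks go (in the data's own units)
axis(1, at = loglog$x)
axis(2, at = loglog$y, las = 1)
```

If the labels overlap on your screen, try passing fewer values to the at parameter.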

Thanks for reading!

Michael

R Lesson 30: Logarithms

Hello everybody,

Michael here, and I hope you had a little fun with my two-part fifth anniversary post (particularly the puzzle). For the next several posts, we’ll be exploring more mathematics with R. Today we’ll discuss the wonderful world of R logarithms (and logarithms in general).

But first, a word on logarithms

As you may have noticed from my previous R mathematics posts, I always explain the concept in both the context of R and the context of manual calculation so you can have a basic understanding of the concept itself before exploring it in R. I plan to do the same thing here today.

So, what are logarithms exactly? They’re simply another type of mathematical operation. Take a look at this illustration below:

In this example, I used the simple example of 4^3=64. I also noted that the logarithmic form of this expression is log4(64)=3.

What does this mean? The base of the exponential expression (4^3=64), which is 4, serves as the base of the logarithm. The number in parentheses, 64, serves as the logarithm’s argument, which is the value whose logarithm we want to find. The result of the logarithm is the exponent, which is 3 in this case.

How do you read a logarithmic expression? In this example, you’d read the expression as log base-4 of 64 equals 3.

That’s how logarithms work. Now let’s explore how to manage them in R.

Logarithms, R style

So, how would you work with logarithms in R? Let’s take a look at the code below:

> log(32, base=2)
[1] 5

Calculating logs in R is as simple as using R’s built-in log() function and adding in two parameters-the argument and the base. With these two parameters and the log() function, R will return the logarithm’s exponent, which in this case is 5 since log base-2 of 32 equals 5.

However, what I discussed above was a very, very basic example of logarithms in R. Let’s look at a more specific scenario:

> log10(1000)
[1] 3

In this example, I’m showing what is known as a base-10 logarithm, which is a logarithm with a base of, well, 10. In this case, I’m showing you what log base-10 of 1000 equals-in this case, the exponent/result is 3. Notice how this expression uses the log10() function rather than the log() function.

NOTE: To calculate this base-10 logarithm you can also use the code log(1000, base=10), but I just wanted to point out the log10() function as another way to solve a logarithm.

Another specific logarithmic scenario is binary, or base-2 logarithms. Let’s take a look at those:

> log2(8)
[1] 3

In this example, I’m showing a base-2 logarithm, which is a logarithm with a base of, well, 2. In this case, the base-2 log of 8 is 3, since 2^3=8.

NOTE: Just like the case with the base-10 logarithm, you can also use the code log(8, base=2) to calculate this base-2 logarithm.
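To double-check that the shortcut functions and the general log() function agree, here’s a quick sketch. It also runs the 4^3=64 example from the illustration and shows that raising the base to the logarithm gets you back to the argument.

```r
# The shortcut functions match log() with an explicit base
all.equal(log10(1000), log(1000, base=10))
all.equal(log2(8), log(8, base=2))

# The example from the illustration: log base-4 of 64
log(64, base=4)

# A logarithm undoes an exponent, so 2 raised to log2(8) is 8 again
2^log2(8)
```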

Now let’s explore two rather unusual logarithmic scenarios: logarithms of complex numbers (numbers with an imaginary part) and logarithms with base e (e as in the mathematical constant). What does it all mean? Let me explain!

> log(11)
[1] 2.397895

> log(3+2i, base=4)
[1] 0.9251099+0.4241542i

The first example above shows the expression log base-e of 11 along with the result, 2.397895. The second example shows the expression log base-4 of 3+2i along with the result, 0.9251099+0.4241542i.

If you’re not familiar with the concept of imaginary numbers and the mathematical constant e, fear not, for I will explain both concepts right here!

A rational section about some irrational numbers

If you have even the most basic knowledge of numbers, you’ll be quite familiar with the numbers 0-9. After all, when you first learned basic counting, you very likely learned how to count to 10.

However, if you study more advanced math (like algebra and calculus), you’ll notice there are a few numbers that aren’t so neat (and no, I’m not talking about decimals, fractions or negative numbers). Some of these are known as irrational numbers, which are numbers that can’t be expressed as a fraction of two whole numbers and whose decimal digits go on forever without repeating.

We just explored calculating logarithms with two types of not-so-neat numbers: e, which is irrational, and imaginary numbers, which aren’t irrational at all but rather a separate kind of number that lives outside the ordinary number line. How do these types of numbers work?

In the case of e, e is a mathematical constant known as Euler’s number, named after Swiss mathematician Leonhard Euler. e is an irrational number whose decimal digits go on forever, but its approximate value is 2.71828. The number e is used in various exponential growth/decay functions, such as a city’s population growth/decline over a certain time period or a substance’s radioactive decay.

A log with a base e is also known as a natural logarithm (like the log(11) example I used earlier). In R, natural logarithms are denoted as log(number) with no base parameter specified. Here’s what natural logarithms look like:
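A quick sketch in R itself: exp(1) gives you the value of e, and the natural log undoes the exp() function.

```r
# exp(1) is e raised to the power of 1, in other words e itself
exp(1)

# The natural log of e is 1
log(exp(1))

# log() with no base is the same as log() with base e
all.equal(log(11), log(11, base=exp(1)))

# exp() undoes log(), so this gets us back to 11
exp(log(11))
```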

In the case of the imaginary numbers, they provide a way to handle mathematical situations that ordinary numbers can’t, such as the square roots of negative numbers (which will always be imaginary numbers). Imaginary numbers always consist of a real number multiplied by the imaginary unit i (such as 2i), where i is defined as the square root of -1. The number 3+2i is known as a complex number since it has a real part (3) and an imaginary part (2i). One well known application of complex numbers is fractal geometry, which you can find quite a bit of in nature (like in conch shells).
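R can work with these numbers directly. Here’s a small sketch that pulls a complex number apart with Re() and Im() and takes the square root of a negative number. The one catch is that sqrt() needs to be told the input is complex, which is what as.complex() does here.

```r
z <- 3+2i

# The real part and the imaginary part of z
Re(z)
Im(z)

# The imaginary unit multiplied by itself gives -1
1i * 1i

# Convert -4 to a complex number first, then take its square root
sqrt(as.complex(-4))
```

Calling sqrt(-4) without the conversion gives NaN and a warning, much like asin(31) did.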

Three more logarithmic scenarios that I think you should know

Before I go, I want to discuss three more logarithmic scenarios with you all, first with natural logs and then with logs that have a numerical base. Take a look at the code below:

> log(1)
[1] 0
> log(0)
[1] -Inf
> log(-12)
[1] NaN
Warning message:
In log(-12) : NaNs produced

In this example, I’m showing you three different natural logarithm scenarios that you should know: logs with arguments of 1, 0, and a negative number. Notice that a log with an argument of 1 yields 0, a log with an argument of 0 yields negative infinity and a log with a negative number yields NaN (indicating that logs with negative arguments aren’t valid). Why might this be the case?

In the case of log(1), you get 0 because raising a number to the power of 0 always yields 1.
In the case of log(0), you get -Inf (negative infinity) because there is no possible power you can raise e to obtain 0.
In the case of log(-12), you get NaN (not a number) because logs only work with positive number arguments.

Now here are three scenarios with the same arguments, but using a base of 2 instead of e:

> log(1, base=2)
[1] 0
> log(0, base=2)
[1] -Inf
> log(-12, base=2)
[1] NaN
Warning message:
NaNs produced

Notice how you get the same results here as you did for the natural logarithms.
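The reason the results match comes from the change-of-base rule: a log with any base is just the natural log of the argument divided by the natural log of the base. Here’s a short sketch of that rule (the variable names x and b are mine).

```r
x <- 32
b <- 2

# Change of base: log base-b of x equals log(x) divided by log(b)
all.equal(log(x, base=b), log(x) / log(b))

# Dividing 0 or -Inf by log(2) doesn't change them,
# so the special cases carry over to base 2
log(c(1, 0)) / log(2)
log(c(1, 0), base=2)
```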

Thanks for reading,

Michael

R Lesson 29: An Integral Part of R Calculus

Hello everybody,

Michael here, and in today’s post, I’ll be discussing an integral part of R calculus-integrals (see what I did there?).

The integral facts about integrals

What are integrals, exactly? Well, we did spend the last two posts discussing derivatives, which measure the rate at which one quantity changes with respect to another quantity (like the change in Rotten Tomatoes critic scores from one MCU movie to the next as we discussed in this post: R Lesson 27: Introductory R Calculus). Integrals, on the other hand, measure the accumulation of a certain quantity over time.

Simple enough, right? Well, is it possible to think of integrals as reverse derivatives? Yes it is! I did mention that derivatives measure change of a given quantity from point A to point B while integrals measure the accumulation of a quantity over a given time period (which could be from point A all the way to point XFD1048576). In the context of mathematical functions, derivatives break up a function into smaller pieces to measure the rate of change at each point while integrals put the pieces of the function back together to measure the total accumulation across the entire duration of the function (which obviously can be infinite).

Calculating integrals, the manual way, part 1

Before we dive into R calculations of integrals, let’s first see how to calculate integrals, the manual way, by hand.

Let’s take this polynomial as an example: 3x^3-2x^2+4x-7

Now, how would we calculate the integral of this polynomial? Take a look at the illustration below:

Just so you know, the polynomial in green is the integral of the original polynomial. With that said, how did we get the polynomial 3/4x^4-2/3x^3+2x^2-7x+C as the integral of the original polynomial?

First of all, just as we did with derivatives, we would need to calculate the integral of each term in the polynomial one-by-one. To do so, you’d need to add one to the exponent of each term, then divide that term by the new exponent. Still confused? Let me explain it another way:

The integral for 3x^3 would be 3/4x^4 since 3+1=4 and 3 divided by 4 is, well, 3/4.
The integral for -2x^2 would be -2/3x^3 since 2+1=3 and -2 divided by 3 is, well, -2/3.
The integral for 4x would be 2x^2 since 1+1=2 and 4 divided by 2 is 2.
The integral for -7 would be -7x since constants have a power of 0 and 0+1=1 (and anything divided by 1 equals itself)
You will likely have noticed an additional value in the integral that you may not be aware of-the constant C. I’ll explain more about that right now.
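If you’d like to see the add-one-then-divide rule as code, here’s a minimal sketch. It assumes we describe each term of the polynomial with a coefficient and an exponent (that representation, and the variable names, are my own choices for illustration):

```r
# Each term of 3x^3 - 2x^2 + 4x - 7 as a coefficient and an exponent
coefficients <- c(3, -2, 4, -7)
exponents <- c(3, 2, 1, 0)

# Power rule for integrals: add 1 to the exponent, divide by the new exponent
newExponents <- exponents + 1
newCoefficients <- coefficients / newExponents

# One row per term of the integral (the constant C is left out)
data.frame(coefficient = newCoefficients, exponent = newExponents)
```

The rows line up with 3/4x^4, -2/3x^3, 2x^2 and -7x.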

So, you may be wondering what’s up with the C at the end of the integral equation. Remember how earlier in this post I mentioned that you can think of integrals as reverse derivatives? You may recall that in our previous lesson-R Lesson 28: Another Way To Work With Derivatives in R-I discussed that during the process of calculating a derivative for a polynomial, the derivative of any constant in that polynomial is 0. This means that when finding the derivative of any polynomial, the constant disappears.

Now, since integrals can be considered as reverse derivatives, we should remember that when we integrate a polynomial, there was likely a constant that disappeared during differentiation (which I forgot to mention is the name of the process used to find the derivative of a polynomial). The C at the end of an integral represents the infinite number of possible constants that could be used for a given integral.
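To see the disappearing constant in action, here’s a small sketch using R’s built-in D() function. The two constants (0 and 10) and the sample x values are arbitrary picks of mine:

```r
# Two candidate integrals that differ only in the constant C (here 0 and 10)
integralA <- expression(3/4*x^4 - 2/3*x^3 + 2*x^2 - 7*x)
integralB <- expression(3/4*x^4 - 2/3*x^3 + 2*x^2 - 7*x + 10)

# Differentiate both with D(), then evaluate at a few sample x values
x <- c(-2, 0, 1, 5)
slopeA <- eval(D(integralA, "x"))
slopeB <- eval(D(integralB, "x"))

# Both match the original polynomial, so the constant has vanished
original <- 3*x^3 - 2*x^2 + 4*x - 7
all.equal(slopeA, original)
all.equal(slopeB, original)
```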

An integral illustration to this lesson

For my more visual learners, here is an illustration of how integrals work:

Just as a derivative would measure the change from one point to another in this curve, the integral would measure the area under the curve. The area in yellow represents the negative integral while the area in red represents the positive integral.

Still confused? Don’t worry-we’ll definitely go more in depth in this lesson!

Calculating integrals, the R way, part 2

OK, now that we’ve discussed how to calculate integrals the manual way, let’s explore how to calculate integrals the R way. You’ll notice that R won’t just spit out the integral of a given polynomial but rather calculate the integral using an upper and lower limit. Don’t worry-I’ll explain this more later, but for now, let’s see how the magic is done:

integral <- function(x) { 3*x^3-2*x^2+4*x-7 }
result <- integrate(integral, lower=1, upper=2)
result
5.583333 with absolute error < 6.6e-14

In this example, I’m showing you how R does definite integration. What is definite integration? Let me explain it like this.

In the previous section of this post, all we were trying to do was to calculate the integral polynomial of a certain expression. This is known as indefinite integration since we were simply trying to find the integration function of a given polynomial with the arbitrary constant C. As I mentioned in the previous section, the constant C could represent nearly anything, which means there are infinite possible integrals for any given polynomial.

However, with definite integration (like I did above), you’ll be calculating the integral between a lower and an upper limit-this is certainly helpful if you’re looking for the integral over a specific range in the polynomial function rather than just a general integral, which can stretch for infinity. In R, to calculate the integral of a function over a given range, specify values for the lower and upper parameters (in this case I used 1 and 2). As you can see from the result I obtained, I got ~5.58 with an absolute error of less than 6.6e-14, which indicates a very, very, very small margin of error for the integral calculation. In other words, R does a great job with definite integration.

Keep in mind that the calculation above uses a finite range of integration (e.g. lower=1, upper=2). R’s integrate() function will also accept -Inf and Inf as limits, but this polynomial grows without bound, so its integral over an infinite range (e.g. from negative infinity to positive infinity) doesn’t settle on a finite number and you won’t get a meaningful result.

Plotting an integration function

Now that we know how to calculate integrals, the next thing we’ll explore is plotting the integration function. Here’s how we’d do so, using the polynomial from the first section and an integration range of (0,50):

integral <- function(x) { 3*x^3-2*x^2+4*x-7 }
# Use x as the upper limit so each x-value gets its own integral from 0 to x
integrated <- function(x) { integrate(integral, lower=0, upper=x)$value }
vectorIntegral <- Vectorize(integrated)
x <- seq(0, 50, 1)
plot(x,vectorIntegral(x), xlim=c(0,50), xlab="X-values", ylab="Y-values", main="Definite Integration Example", col="blue", pch=16)

So, how did I manage to create this plot? Let me give you a step-by-step explanation:

I first stored the function I wish to integrate in the integral variable.
I then wrote a second function, integrated, which integrates that polynomial from 0 up to whatever x-value is passed in and grabs the $value of the result (the number itself, without the error estimate).
I then vectorized the integrated function with Vectorize() and stored the vectorized version in the vectorIntegral variable. This is needed because integrate() expects a single number for each limit.
I then created an x-axis sequence to use in my plot that contained the parameters (0, 50, 1) which represent the lower limit, upper limit, and x-axis increment for my plot, respectively. This sequence is stored in the x variable.
Last but not least, I used the plot() function to plot the integral of the polynomial 3x^3-2x^2+4x-7. One thing you may be wondering about is the x, vectorIntegral(x) parameter in the function. The x parameter gathers all the x values for the plot (in this case the integers 0 to 50) while the vectorIntegral(x) parameter calculates all of the corresponding y-values for each possible x-value and gathers them into a vector, or array, for the plot.
- Why choose vectorization to calculate the corresponding y-values? Well, it’s easier than writing out a loop over each possible x-value in the integral range to get the corresponding y-values, since the vectorized function simply takes in all possible x-values (0-50 in this case) as the input array and returns an output array containing the y-value for each x-value (which in this case run from 0 at x=0 up to roughly 4.6 million at x=50).
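If you’re curious how Vectorize() compares with looping yourself, here’s a minimal sketch. It assumes a helper I’m calling areaUpTo (a made-up name for this illustration) that integrates the polynomial from 0 up to one upper limit, and it checks that Vectorize() and sapply() hand back the same y-values:

```r
integral <- function(x) { 3*x^3-2*x^2+4*x-7 }

# Integral from 0 up to a single upper limit (hypothetical helper)
areaUpTo <- function(upperLimit) { integrate(integral, lower=0, upper=upperLimit)$value }

upperLimits <- c(0, 10, 25, 50)

# Option 1: Vectorize() wraps the function so it accepts a whole vector
viaVectorize <- Vectorize(areaUpTo)(upperLimits)

# Option 2: sapply() loops over the vector explicitly
viaSapply <- sapply(upperLimits, areaUpTo)

all.equal(viaVectorize, viaSapply)
```

Either option works; Vectorize() is handy when you want a reusable function to hand straight to plot().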

Calculating integrals, the manual way, part 3

So, now that I’ve shown you how to do definite integration the R way, let me show you how to do so the manual way. Let’s examine this illustration:

So in this illustration, I’m trying to calculate the integral for the polynomial 3x^3-2x^2+4x-7 using the (7,10) range. How do I do so? Well, first I perform some indefinite integration by finding the integral of the given polynomial-only thing here is that I don’t need the constant C. Next, since my integration range is (7,10), I evaluate the integral function for x=10, evaluate it again for x=7, and subtract the x=7 result from the x=10 result. After all my calculations are complete, I get 5342.25 as the value of my integral (rounded to two decimal places) at the integration range of (7,10).

If you’re wondering what that weird-looking S means, that’s just a standard integral writing notation.
To calculate the integral of any given expression for a given range, always remember to first find the integral of the polynomial and then evaluate that integral at both the upper and lower limits. Subtract the result of the lower limit evaluation from the result of the upper limit evaluation. And remember that, as we saw in our R integral calculations, R’s numerical answer comes with a very, very, very small margin of error, so it may differ ever so slightly from the by-hand answer.
In calculus function notation, the capital F represents the f(x) of the integral while the lowercase f represents the f(x) of that integral’s derivative.
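Here’s a short sketch that checks the by-hand (7,10) calculation against R’s integrate() function. The antiderivative function is simply the hand-computed integral typed into R, and the variable names are my own:

```r
polynomial <- function(x) { 3*x^3 - 2*x^2 + 4*x - 7 }

# F(x): the integral found by hand (no constant C needed)
antiderivative <- function(x) { 3/4*x^4 - 2/3*x^3 + 2*x^2 - 7*x }

# Definite integral = F(upper limit) - F(lower limit)
byHand <- antiderivative(10) - antiderivative(7)
byHand

# R's numerical answer for the same range
byR <- integrate(polynomial, lower=7, upper=10)$value
all.equal(byHand, byR)
```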

Thanks for reading!

Michael

R Lesson 28: Another Way To Work With Derivatives in R

Hello everybody,

Michael here, and in today’s lesson, I’ll show you another cool way to work with derivatives in R.

In the previous post-R Lesson 27: Introductory R Calculus-I discussed how to work with derivatives in R. However, the approach I discussed in that post doesn’t cover the built-in methods R has to calculate derivatives (there are two ways of approaching this). What might those methods look like? Well, let’s dive in!

Calculating derivatives, the built-in R way, part 1

Now, how can we calculate derivatives with the built-in R way? First, we’ll explore the deriv() function. Let’s take a look at this code below, which contains a simple equation for a parabolic curve:

curve <- expression(3*x^2+4*x+5)
print(deriv(curve, "x"))

expression({
    .value <- 3 * x^2 + 4 * x + 5
    .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
    .grad[, "x"] <- 3 * (2 * x) + 4
    attr(.value, "gradient") <- .grad
    .value
})

In this example, I’m using the polynomial 3x^2+4x+5 to represent our hypothetical parabolic curve. To find the derivative function of this polynomial, I ran R’s built-in deriv() function and passed in both the curve expression and "x" (yes, in double quotes) to find the derivative expression-as the derivative always relates to x (or whatever character you used to represent an unknown quantity). Now, as you see from the output given, things don’t look too understandable. However, pay attention to the second .grad line (in this case, 3 * (2 * x) + 4), as this will provide the derivative function of the parabolic curve equation-6x+4. We would use this equation to evaluate the rate of change at any point on the parabolic curve.

Let’s say we wanted to evaluate the derivative polynomial-6x+4-at x=5. If we use this x value, then the derivative of the parabola at x=5 would be 34.

The output reads 3 * (2 * x) + 4 because R doesn’t simplify the expression for you. Remember to multiply the 3 by the 2x to get 6x (6x+4 to be exact).
When you’re writing a polynomial in R, remember to include all the multiplication signs (*)-yea, I know it’s super annoying.
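Rather than plugging x=5 into 6x+4 by hand, you can let R do it. Here’s a minimal sketch that evaluates the deriv() output with eval(). Supplying x through a list is just one way to do it, and the variable names are my own:

```r
curve <- expression(3*x^2+4*x+5)
derivative <- deriv(curve, "x")

# deriv() returns an expression; eval() runs it using the x we supply
result <- eval(derivative, list(x=5))

# The value of the curve itself at x=5
as.numeric(result)

# The derivative (6x+4) at x=5 is stored in the "gradient" attribute
attr(result, "gradient")
```

The gradient comes back as a one-cell matrix with a column named x.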

In case you wanted a visual representation of this parabola, here it is:

I created this illustration of a parabola using a free online tool called DESMOS, which allows you to quickly create a visual representation of a line, parabola, or other curve. Here’s the link to DESMOS-https://www.desmos.com/calculator/dz0kvw0qjg.

Calculating derivatives, the built-in R way, part 2

Now let’s explore R’s other built-in way to calculate derivatives-this time using the D() function. For this example, let’s use the same parabolic curve equation we used for the deriv() example:

curve <- expression(3*x^2+4*x+5)
print(D(curve, "x"))

3 * (2 * x) + 4

As you can see, we passed in the same parameters for the D() function that we used for the deriv() function and got a much simpler version of the output we got for the deriv() function-simpler as in the derivative expression itself was the only thing that was returned.

The derivative expression returned from the D() function is the same as the .grad expression returned from the deriv() function, so it simplifies to the same 6x+4.
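Because D() hands back a plain expression, it’s easy to evaluate. Here’s a small sketch (the sample x values are arbitrary) that evaluates the derivative at several points at once and then applies D() a second time to get the second derivative:

```r
curve <- expression(3*x^2+4*x+5)
firstDerivative <- D(curve, "x")

# Evaluate the derivative at several x values at once
slopes <- eval(firstDerivative, list(x=c(0, 1, 5)))
slopes

# D() can be applied to its own output to get the second derivative
secondDerivative <- D(firstDerivative, "x")
curvature <- eval(secondDerivative, list(x=0))
curvature
```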

Calculating derivatives, the manual way, part 3

Yes, I know I was mostly going to focus on the two built-in R functions used to calculate derivatives-deriv() and D()-but I thought I’d include this bonus section for those of you (especially those who enjoy exploring calculus) who were wondering how to manually calculate derivatives.

Let’s take the same polynomial we were working with for the previous two examples: 3x^2+4x+5

How would we arrive at the derivative expression of 6x+4? Check out this illustration below:

From this picture, here are some things to keep in mind when calculating derivatives of polynomials (and it’s not the same as calculating derivatives of regular numbers like we did in the previous post R Lesson 27: Introductory R Calculus):

Go term-by-term when calculating derivatives of polynomials. In this example, you’d calculate the derivative of 3x^2, then the derivative of 4x and lastly the derivative of 5.
How would you calculate the derivatives of each term? Here’s how (and it’s quite easy).
The derivative of 3x^2 would be 6x, as you would multiply the number (3) by the power (2) to get 6. You would then reduce the power of the x^2 by 1 to get x (any variable in a polynomial without an exponent is raised to the power of 1). Thus, the derivative of 3x^2 would be 6x.
The derivative of 4x would simply be 4. Just as we did with 3x^2, we’d multiply the number (4) by the power (1) in this case to get 4. We would also reduce the power of x by 1 to simply get 4, since x has a power of 1 and 1-1=0. In a polynomial, constants (numbers without variables next to them) have a power of 0.
I mean, we could’ve written the derivative polynomial as 6x+4x^0, but 6x+4 looks a lot nicer.
As for the constant in this polynomial-5-it has a derivative of 0 since the derivative of a constant is always 0 (after all, any constant in a polynomial has a power of 0, so this makes perfect sense). Thus, the derivative of 5 isn’t included in the derivative polynomial of 6x+4.
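The term-by-term steps above can be sketched in a few lines of R. As an assumption for this illustration, I’m describing each term with a coefficient and an exponent; the variable names are my own:

```r
# Terms of 3x^2 + 4x + 5 as coefficients and exponents
coefficients <- c(3, 4, 5)
exponents <- c(2, 1, 0)

# Power rule: multiply the coefficient by the power, then lower the power by 1
newCoefficients <- coefficients * exponents
newExponents <- pmax(exponents - 1, 0)

# The constant's new coefficient is 5 * 0 = 0, so that term drops out
keep <- newCoefficients != 0
data.frame(coefficient = newCoefficients[keep], exponent = newExponents[keep])
```

The two rows that remain correspond to 6x and 4.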

Thanks for reading,

Michael

R Lesson 27: Introductory R Calculus

Hello everybody,

Michael here, and in today’s post, I’m going to revisit an old friend of ours-the language R. As you readers may recall, R was the first language I covered on this blog, and since we’re only a few posts away from the blog’s fifth anniversary, I thought it would be fun to revisit this blog’s roots as an analytics blog (remember the Michael’s Analytics Blog days, everyone?).

Today’s post will provide a basic introduction on doing calculus with R (including graphing). Why am I doing R calculus? Well, I wanted to do some more fun R posts leading up to the blog’s fifth anniversary and I did have fun writing the trigonometry portion of my previous post-Python Lesson 41: Word2Vec (NLP pt.7/AI pt.7)-that I wanted to dive into more mathematical programming topics. With that said, let’s get started with some R calculus!

Setting ourselves up

In this lesson, we’ll be using this dataset-

MCU-movies Download

This dataset contains the Rotten Tomatoes scores for all MCU (Marvel Cinematic Universe) movies from Iron Man (2008) to Guardians of the Galaxy Vol. 3 (2023). Both critic and audience Rotten Tomatoes scores are included for all MCU movies.

Now, let’s open up our R IDE and read in this CSV file:

> MCU <- read.csv("C:/Users/mof39/OneDrive/Documents/MCU movies.csv", fileEncoding="UTF-8-BOM")
> MCU
                                         Movie Year RT.score Audience.score
1                                     Iron Man 2008     0.94           0.91
2                              Incredible Hulk 2008     0.67           0.69
3                                   Iron Man 2 2010     0.71           0.71
4                                         Thor 2011     0.77           0.76
5           Captain America: The First Avenger 2011     0.80           0.75
6                                 The Avengers 2012     0.91           0.91
7                                   Iron Man 3 2013     0.79           0.78
8                          Thor The Dark World 2013     0.66           0.75
9          Captain America: The Winter Soldier 2014     0.90           0.92
10                     Guradians of the Galaxy 2014     0.92           0.92
11                     Avengers: Age of Ultron 2015     0.76           0.82
12                                     Ant-Man 2015     0.83           0.85
13                  Captain America: Civil War 2016     0.90           0.89
14                              Doctor Strange 2016     0.89           0.86
15               Guardians of the Galaxy Vol 2 2017     0.85           0.87
16                      Spider-Man: Homecoming 2017     0.92           0.87
17                              Thor: Ragnarok 2017     0.93           0.87
18                               Black Panther 2018     0.96           0.79
19                      Avengers: Infinity War 2018     0.85           0.92
20                        Ant-Man and the Wasp 2018     0.87           0.80
21                              Captain Marvel 2019     0.79           0.45
22                           Avengers: Endgame 2019     0.94           0.90
23                   Spider-Man: Far From Home 2019     0.90           0.95
24                                 Black Widow 2021     0.79           0.91
25   Shang-Chi and the Legend of the Ten Rings 2021     0.91           0.98
26                                    Eternals 2021     0.47           0.77
27                     Spider-Man: No Way Home 2021     0.93           0.98
28 Doctor Strange in the Multiverse of Madness 2022     0.74           0.85
29                      Thor: Love and Thunder 2022     0.63           0.77
30              Black Panther: Wakanda Forever 2022     0.84           0.94
31           Ant-Man and the Wasp: Quantumania 2023     0.47           0.83
32               Guardians of the Galaxy Vol 3 2023     0.81           0.95

As you can see, we have read the data-frame into R and displayed it in the IDE (there are only 32 rows here).

Now, before we dive into the calculus of everything, let’s explore our dataset:

Movie-the name of the movie
Year-the movie’s release year
RT.score-the movie’s Rotten Tomatoes score
Audience.score-the movie’s audience score on Rotten Tomatoes
R tip-when you are reading a CSV file into R, it might help to add the fileEncoding="UTF-8-BOM" parameter to the read.csv() function, as this parameter will remove the junk text (the file’s byte-order mark) that can appear in the name of the dataframe’s first column.
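Here’s a small sketch of that tip that you can run without the MCU file. It builds a throwaway two-column CSV in a temporary file (the file and its contents are made up for this demo), puts a byte-order mark at the front, and reads it back:

```r
# Build a tiny CSV that starts with the UTF-8 byte-order mark (EF BB BF)
csvPath <- tempfile(fileext=".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("Movie,Year\nIron Man,2008\n")), csvPath)

# With fileEncoding="UTF-8-BOM", R strips the mark before parsing the header
withBom <- read.csv(csvPath, fileEncoding="UTF-8-BOM")
names(withBom)
```

The first column name comes back clean, with no junk characters in front of Movie.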

Calculus 101

Now, before we dive headfirst into the fun calculus stuff with R, let’s first discuss calculus and derivatives, which is the topic of this post.

What is calculus? Simply put, calculus is a branch of mathematics that deals with the study of change. Calculus is a great way to measure how things change over time, like MCU movies’ Rotten Tomatoes scores over the course of its 15-year, 32-movie run.

There are two main types of calculus-differential and integral calculus. Differential calculus focuses on finding the rate of change of, well, any given thing over a period of time. Integral calculus, on the other hand, focuses on the accumulation of any given thing over a certain period of time.

A good example of differential calculus would be modelling changes in a city’s population over a certain period of time; differential calculus would be used in this scenario to find the city’s population change rate over time. A good example of integral calculus would be modelling the spread of a disease over time (e.g. COVID-19) in a certain geographic region to find the total number of infections that accumulated over a certain time period.

Now, what is a derivative? In calculus, the derivative is the metric used to measure the rate of change at any given point in the measured example. In this example, the derivative (or rather derivatives since we’ll be using two derivatives) would be the change in Rotten Tomatoes scores (both critic and audience) from one MCU movie to the next.

It’s R calculus time!

Now that I’ve explained the gist of calculus and derivatives to you all, it’s time to implement them into R! Here’s how to do so (and yes, we will be finding the derivatives of both critic and audience scores). First, let’s start with the critic scores derivatives:

criticScores <- MCU$RT.score
criticDerivatives <- diff(criticScores)
criticDerivatives

[1] -0.27  0.04  0.06  0.03  0.11 -0.12 -0.13  0.24  0.02 -0.16  0.07  0.07 -0.01 -0.04  0.07  0.01  0.03 -0.11  0.02 -0.08  0.15 -0.04 -0.11  0.12 -0.44  0.46 -0.19 -0.11  0.21 -0.37
[31]  0.34

To calculate the derivatives for each critic score, I first placed all of the critics’ scores (stored in the column MCU$RT.score) into the vector criticScores. I then used R’s built-in diff() function to calculate the difference in critic scores from one MCU movie to the next and-voila!-I have my 31 derivatives.

Even though there are 32 MCU movies, there are only 31 differences to calculate and thus only 31 derivatives that appear.
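To make the diff() logic concrete, here’s a minimal sketch using just the first four critic scores from the table above (Iron Man through Thor). The manual version is only there to show what diff() does under the hood:

```r
# First four critic scores from the table above
scores <- c(0.94, 0.67, 0.71, 0.77)

# diff() subtracts each value from the one after it
changes <- diff(scores)
changes

# The same thing written out by hand: drop the first value, drop the last, subtract
manual <- scores[-1] - scores[-length(scores)]
all.equal(changes, manual)

# n values always give n - 1 differences
length(scores) - length(changes)
```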

Calculating the derivatives of the audience scores works exactly the same way, except you’ll just need to pull your data from the MCU$Audience.score column:

audienceScores <- MCU$Audience.score
audienceDerivatives <- diff(audienceScores)
audienceDerivatives
 [1] -0.22  0.02  0.05 -0.01  0.16 -0.13 -0.03  0.17  0.00 -0.10  0.03  0.04 -0.03  0.01  0.00  0.00 -0.08  0.13 -0.12 -0.35  0.45  0.05 -0.04  0.07 -0.21  0.21 -0.13 -0.08  0.17 -0.11
[31]  0.12

Plotting our results

Now that we’ve calculated the derivatives of both the critic and audience scores, let’s plot them!

Here’s how we’d plot the critic scores:

plot(1:(length(criticScores)-1),criticDerivatives, type = "l", xlab = "MCU Movie Number", ylab = "Change in critic score")

In this example, I used R’s plot() function (which doesn’t require installation of the ggplot2 package) to plot the derivatives of the critic scores. The y-axis represents the change in critic scores, while the x-axis represents the index for a specific MCU movie (e.g. 1 would be Incredible Hulk while 31 would be Guardians of the Galaxy Vol.3).

However, this visual doesn’t seem too helpful. Let’s see how we can fix it!

First, let’s create a vector of the MCU movies to use as labels for this plot:

movies <- MCU$Movie

Next, let’s remove Iron Man from this vector since it won’t have a derivative (after all, it’s the first MCU movie).

movies <- movies[! movies %in% c('Iron Man')]

Great! Now let’s revise our plot to first add a title:

plot(1:(length(criticScores)-1),criticDerivatives, type = "l", main="Changes in MCU movie critic reception", xlab = "MCU Movie Number", ylab = "Change in critic score")

You can see that the plot() function’s main parameter allows you to add a title to the graph.

Next let’s add some labels to our data points-remember to only run this command AFTER you have the initial graph open!

text(1:(length(criticScores)-1),criticDerivatives, labels=movies, pos=3, cex=0.6)

Voila! With the text() function, we’re able to add labels to our data points so that we can tell which movie corresponds with which data point!

Remember to include the same X and Y axes in the text() function as you did in the plot() function! In this case, the X axis would be 1:(length(criticScores)-1) and the Y axis would be criticDerivatives.
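Here’s a compact, self-contained sketch of the same plot-then-label pattern, using only the first five movies so you can run it without the CSV file. Dropping the first title by position (titles[-1]), the length check, and the dashed zero line are my own additions, not part of the code above:

```r
# First five movies and critic scores from the dataset
titles <- c("Iron Man", "Incredible Hulk", "Iron Man 2", "Thor", "Captain America: The First Avenger")
scores <- c(0.94, 0.67, 0.71, 0.77, 0.80)

changes <- diff(scores)
labels <- titles[-1]   # drop the first movie by position instead of by name
positions <- seq_along(changes)

# plot() and text() must share the same x and y vectors
stopifnot(length(labels) == length(changes))
plot(positions, changes, type="b", main="Changes in critic score", xlab="MCU Movie Number", ylab="Change in critic score")
text(positions, changes, labels=labels, pos=3, cex=0.6)
abline(h=0, lty=2)   # dashed line at zero separates rises from drops
```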

Now that we have a title and labelled data points in our graph, let’s gather some insights. From our graph, we can see that the critical reception for the MCU’s Phases 1 & 2 was up-and-down (these include movies from Iron Man to Ant-Man). The critical reception for MCU’s Phase 3 slate (from Captain America: Civil War to Spider-Man: Far From Home) was its most solid to date, as there are no major swings in either direction. The most interesting area of the graph is Phases 4 & 5 (from Black Widow onwards), as this era of the MCU has seen some sharp jumps in critical reception from movie to movie. Some of the sharpest changes can be seen from Shang-Chi and the Legend of the Ten Rings to Eternals (a 44-point drop in critic score) and from Eternals to Spider-Man: No Way Home (a 46-point rise in critic score).

All in all, one insight we can gain from this graph is that MCU Phase 3 was its most critically well-received (and as some fans would say, the MCU’s prime) while the entries in Phase 4 & 5 have been hit-or-miss critically (ahem, Eternals).
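If you’d rather not hunt for the sharpest swings by eye, which.min() and which.max() can find them for you. Here’s a minimal sketch using four Phase 4 movies and their critic scores from the dataset. Naming each change after the movie it leads to is my own convention for this example:

```r
# A stretch of Phase 4 critic scores from the dataset
titles <- c("Black Widow", "Shang-Chi and the Legend of the Ten Rings", "Eternals", "Spider-Man: No Way Home")
scores <- c(0.79, 0.91, 0.47, 0.93)

changes <- diff(scores)
names(changes) <- titles[-1]   # label each change with the movie it leads to

# which.min() and which.max() return the position of the sharpest drop and rise
changes[which.min(changes)]
changes[which.max(changes)]
```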

Now that we’ve analyzed critic derivatives, let’s turn our attention to analyzing audience score derivatives. Here’s the plot we’ll use-and it’s pretty much the same code we used to create the updated critic score derivative plot (except replace the word critic with the word audience in each axis variable and in the title):

plot(1:(length(audienceScores)-1),audienceDerivatives, type = "l", main="Changes in MCU movie audience reception", xlab = "MCU Movie Number", ylab = "Change in audience score")

text(1:(length(audienceScores)-1),audienceDerivatives, labels=movies, pos=3, cex=0.6)

The change in audience reception throughout the MCU’s 15-year, 32-movie run looks a little different than the change in critic reception over that same time period. For one, there are fewer sharp changes in audience score from movie to movie. Also interesting is how much milder the swings in audience score are for the MCU’s Phase 4 & 5 movies: the largest movie-to-movie change in audience score in that stretch is 21 points, while the critic score moved by more than 40 points twice (this is also interesting because many fans on MCU social media accounts that I follow have griped about the MCU’s quality post-Avengers Endgame). One more interesting insight is that the sharpest changes in audience reception came during the peak of Phase 3 (namely from Black Panther to Avengers: Endgame). As you can see from the graph above, the audience score rises from Black Panther to Avengers: Infinity War then drops from Avengers: Infinity War to Ant-Man and the Wasp. The audience score drops even further from Ant-Man and the Wasp to Captain Marvel before sharply rising from Captain Marvel to Avengers: Endgame. I personally found this insight interesting as some of my favorite MCU movies come from Phase 3 (like Black Panther with its 96% on Rotten Tomatoes-critic score), though I do recall Captain Marvel wasn’t well liked when it came out in March 2019 (but boy oh boy was Avengers: Endgame one of the most hyped things of 2019).

Thanks for reading,

Michael

R Lesson 26: Creating PowerPoint Presentations in R

Hello everybody,

Michael here, and today’s post will cover how to create PowerPoint presentations in R.

You’ll recall that in the previous post I discussed how you could create a Word document in R. To create a PowerPoint presentation in R, you’d also use the officer package-however, no need to use the dplyr package here.

After installing the officer package and loading it with library(officer), let’s create a PowerPoint presentation using the read_pptx() function (this is the same idea as using the read_docx() function to create a new Word document):

slideshow <- read_pptx()

Also, just as you would with the read_docx() function, remember to save the output of the read_pptx() function as a variable (I used slideshow as the variable here).

Now, how would you add a slide to the presentation? More specifically, let’s say we wanted to add a title slide to the presentation. Here’s how to do so:

slideshow <- add_slide(slideshow, layout="Title Slide", master="Office Theme")

To add a title slide to the presentation, use the add_slide() function and pass in these three parameters: the name of the PowerPoint object (slideshow in this case), the layout you’d like to use for the title slide (I used Title Slide but Title Only would work here too), and Office Theme for the master parameter.

The master parameter lets you decide the theme you’d like to use for your PowerPoint, but the only theme that will work here is Office Theme. That’s because calling read_pptx() with no arguments starts from the officer package’s built-in default template, and Office Theme is the only master that template contains. To see which layouts and masters are available in your presentation object, run layout_summary(slideshow).

Now, here’s how you would add text to the title slide:

slideshow <- ph_with(slideshow, value = "Making a PowerPoint presentation with R", location = ph_location_type(type = "ctrTitle"))

To add text to the title slide, use the ph_with() function and pass in three parameters-the PowerPoint object you want to add the text to, the text you want to add to the presentation as the value parameter, and the location of the text as the location parameter.

When setting the location of the text, you’ll need to use the ph_location_type() function and pass in a single parameter-the location of the text in the slideshow as the type parameter. Now, you may be wondering why I didn’t pass in title as the location type. The reason for this is because title wasn’t going to work with the title slide layout I chose-Title Slide. Had I used a Title Only layout, I could’ve passed in title as the location type. Here’s a visual explanation as to why I chose ctrTitle as the location type:

Structure of the Title Only slide layout:

Structure of the Title Slide layout:

The ph_location_type function has several location types you can choose from, which include:

ctrTitle
subTitle
dt
ftr
sldNum
title
body
pic
chart
tbl
dgm
media
clipArt

For now, I’m just going to explain title, ctrTitle and subTitle. The Title Only slide layout only has 1 location type-title-therefore title works as the location type here. However, you’ll see that the Title Slide layout has two location types-ctrTitle and subTitle; ctrTitle is the main title and subTitle is, well, the subtitle. Therefore, for the main title in the Title Slide layout, ctrTitle is the appropriate location type.
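You can also ask officer which location types a layout has instead of reading them off a picture. Here’s a hedged sketch that uses officer’s layout_properties() function. It assumes the officer package is installed and that you’re working from the default template; placeholderTypes is a helper name I made up for this example:

```r
# Hypothetical helper: list the placeholder (location) types in one layout
placeholderTypes <- function(layoutName) {
  slideshow <- officer::read_pptx()
  properties <- officer::layout_properties(slideshow, layout=layoutName, master="Office Theme")
  unique(properties$type)
}

# The guard skips the demo if officer isn't installed
if (requireNamespace("officer", quietly=TRUE)) {
  print(placeholderTypes("Title Slide"))
  print(placeholderTypes("Title Only"))
}
```

Any type that shows up for a layout is one you can pass to ph_location_type() when adding content to a slide that uses that layout.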

Since there’s a subtitle in the Title Slide, we’ll need to run the ph_with() function (along with the nested ph_location_type() function) again. The only differences are the text we’ll use along with the location type needed (subTitle rather than ctrTitle). Here’s the syntax needed:

slideshow <- ph_with(slideshow, value = "Created by Michael the Blogger", location = ph_location_type(type = "subTitle"))

Nice! Now let’s add another slide, this time with a Title and Content layout:

slideshow <- add_slide(slideshow, layout = "Title and Content", master = "Office Theme")

And with this slide, let’s add some text. Here’s the title text:

slideshow <- ph_with(slideshow, value = "About Me", location = ph_location_type(type = "title"))

Now, why would I use title as the location type here? Take a look at the structure of the Title and Content layout:

The Title and Content layout has two parts-the title and the body.

Now, let’s add some content to this slide:

slideshow <- ph_with(slideshow, value = "My name is Michael and I launched this blog on June 13, 2018", location = ph_location_type(type = "body"))

Great! Now let’s add another Title and Content slide-but this time, let’s have a bullet-point list in the body. But first, let’s add the title:

slideshow <- add_slide(slideshow, layout = "Title and Content", master = "Office Theme")

slideshow <- ph_with(slideshow, value = "Some facts about this blog", location = ph_location_type(type = "title"))

Now, let’s add the bullet-point list to the body:

slideshow <- ph_with(slideshow, value = c("This is the 103rd entry of this blog", "This blog covers 5 different programming tools", "I created an entry on Git & GitHub for my 100th post"), location = ph_location_type(type = "body"))

To add a bullet-point list to a slide, pass in a vector c() containing all the elements you want to include in your bullet-point list; separate each element with a comma and enclose each element in double quotes.
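
A plain vector gives you a flat list. If you ever need indented sub-bullets, officer also has an unordered_list() function that takes the text and an indent level for each bullet. I didn’t use it in this slideshow, so treat the following as a rough sketch with made-up text and object names (demoDeck, nestedBullets):

```r
library(officer)

# Hypothetical deck with a single Title and Content slide
demoDeck <- read_pptx()
demoDeck <- add_slide(demoDeck, layout = "Title and Content", master = "Office Theme")

# str_list holds the bullet text; level_list gives each bullet's indent level
nestedBullets <- unordered_list(
  str_list = c("Programming tools covered", "R", "Python"),
  level_list = c(1, 2, 2)
)

demoDeck <- ph_with(demoDeck, value = nestedBullets, location = ph_location_type(type = "body"))
```

Here, level 1 is a top-level bullet and level 2 is nested one step beneath it.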

Next, let’s say we wanted to add a third Title and Content slide, but this time, we’ll add an image as part of the content. Also, let’s change the font of the title:

slideshow <- add_slide(slideshow, layout = "Title and Content", master = "Office Theme")

First off, here’s how to modify the font of the title. We’d start by creating a text object like this:

fontObject <- fp_text(font.size = 20, color = "blue", font.family = "Century Gothic")

In this case, the purpose of the fontObject is to set the font formatting (which includes the coloring, size, and font family to use).
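
The fp_text() function accepts a few other styling options too, such as bold and italic. As a quick sketch (emphasisFont is a name I made up, and it isn’t used anywhere else in this post), here’s the same formatting as fontObject with bold and italic switched on:

```r
library(officer)

# Same settings as fontObject, plus bold and italic
emphasisFont <- fp_text(font.size = 20, color = "blue", font.family = "Century Gothic",
                        bold = TRUE, italic = TRUE)

# The settings are stored by name, so you can inspect them individually
emphasisFont$bold
emphasisFont$font.size
```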

To see a list of font families that you can use, go to PowerPoint and click on the circled dropdown to see a list of available font families (NOTE: different PowerPoint versions will likely have different font families available for your use):

Now that we’ve got a font object set, let’s put it to use on the slide we just added. Since that slide already uses the Title and Content layout, there’s no need to call add_slide() again (doing so would leave an empty slide in the deck).

Now, before we add text to the slide, we need to store that text in a new object; doing so lets us add the text to the slide with the fontObject properties we set earlier. Here’s how to create the text object:

textObject <- fpar(ftext("Here is what I look like (from December 2020)", fontObject))

To create a new text object, use the fpar function and pass in a call to the ftext function. Inside the nested ftext function, pass in the text you’d like to add along with the fontObject that holds the styling. Any other styling options (e.g. bold) are set in fp_text when you create the fontObject.

If you wanted to add two different lines of text, you’d need to use multiple fpar functions, each with a nested ftext function. You’d also need to wrap all of those fpar calls inside a single block_list function, which allows you to write multiple lines of text in the title (or content) of a slide.
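
Here’s a minimal sketch of the two-line approach. The text, the deck and the object names (demoDeck, demoFont, twoLines) are placeholders I made up, and I’m putting the lines in the body of a throwaway slide rather than in our slideshow:

```r
library(officer)

# Hypothetical deck with a single Title and Content slide
demoDeck <- read_pptx()
demoDeck <- add_slide(demoDeck, layout = "Title and Content", master = "Office Theme")

demoFont <- fp_text(font.size = 20, color = "blue", font.family = "Century Gothic")

# Each fpar() is one paragraph (one line); block_list() bundles them together
twoLines <- block_list(
  fpar(ftext("First line of text", demoFont)),
  fpar(ftext("Second line of text", demoFont))
)

demoDeck <- ph_with(demoDeck, value = twoLines, location = ph_location_type(type = "body"))
```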

Great! Now that we have the text object set, let’s add it to the title of the slide:

slideshow <- ph_with(slideshow, value = textObject, location = ph_location_type(type = "title"))

To add the content of a text object to a slide, use the ph_with() function and set the value to the name of your text object (in this case, I kept it simple and just used textObject).

Now, let’s add an image to the body of this slide! Here’s how to do so:

slideshow <- ph_with(slideshow, external_img("C:/Users/mof39/OneDrive/Pictures/ball.png", width=6, height=6), location=ph_location_type(type="body"))

To add an image to the slide, you’d first need to use the ph_with() function (much like you’d do when adding text to a slide). Inside the ph_with() function, you’d need to include three parameters: the slideshow that you’ll be adding the content to, the external_img() function with three parameters of its own, and a location parameter using the ph_location_type() function that indicates where on the slide you would like to add your content.

Inside the external_img() function, you’ll need to include three parameters: the location on your computer where the image you’d like to add is stored (and yes, you’ll need to include the full path), plus the width and the height of the image (both in inches by default, not pixels).
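
One thing to watch: as far as I can tell from the officer documentation, an image placed with ph_location_type() is stretched to fill the placeholder unless you also pass use_loc_size = FALSE to ph_with(). Here’s a hedged sketch of that. It assumes the R logo that ships with most R installations is available as a stand-in picture, and imgPath and demoDeck are names I made up:

```r
library(officer)

# Assumption: the R logo bundled with most R installations stands in for your own image
imgPath <- file.path(R.home("doc"), "html", "logo.jpg")

# Hypothetical deck with a single Title and Content slide
demoDeck <- read_pptx()
demoDeck <- add_slide(demoDeck, layout = "Title and Content", master = "Office Theme")

if (file.exists(imgPath)) {
  # width and height are in inches; use_loc_size = FALSE asks officer to
  # use them instead of resizing the image to fill the placeholder
  demoDeck <- ph_with(demoDeck,
                      value = external_img(imgPath, width = 2, height = 1.5),
                      location = ph_location_type(type = "body"),
                      use_loc_size = FALSE)
} else {
  message("Image not found: ", imgPath)
}
```

The file.exists() check is just a safety net so that a mistyped path gives you a readable message.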

Last but not least, let’s save this slideshow to your computer. Here’s how to do so:

print(slideshow, "C:/Users/mof39/OneDrive/Documents/Blog Slideshow.pptx")

To save your slideshow to your computer, use the print function and pass in two parameters-the slideshow you created and the location on your computer where you want to save the slideshow (remember to save it with the PPTX extension).
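
If you’d like to confirm the save worked without opening PowerPoint, you can read the file back into R. This is just a sketch: it writes a one-slide deck to a temporary file rather than to a real folder, and demoDeck, outFile and savedDeck are made-up names:

```r
library(officer)

# Hypothetical deck with a single slide
demoDeck <- read_pptx()
demoDeck <- add_slide(demoDeck, layout = "Title and Content", master = "Office Theme")
demoDeck <- ph_with(demoDeck, value = "Saved from R", location = ph_location_type(type = "title"))

# Hypothetical output location: a temporary file with the PPTX extension
outFile <- tempfile(fileext = ".pptx")
print(demoDeck, target = outFile)

# Confirm the file was written, then read it back and count the slides
file.exists(outFile)
savedDeck <- read_pptx(outFile)
length(savedDeck)
```

For our slideshow, running length() on the saved file should give you the number of slides we added.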

Now, let’s open up the slideshow and see how R’s officer package worked its magic:

Upon opening the slideshow, you can see that the officer package worked its magic here! All four slides were successfully created.

Take a look at slide 4:

The image was successfully retrieved from my computer and displayed on this slide. The title text also successfully adopted the font style I had specified in fontObject (size 20 blue Century Gothic).

Thanks for reading,

Michael