Hello everybody,
It’s Michael, and today I’ll be discussing machine learning, both supervised and unsupervised, in R. I won’t be doing a full analysis here; I’m using this post to give you guys background on the topics I’ll be covering in the next few posts, which will contain some interesting analyses.
Anyway, what is machine learning? It’s basically an automated way to create analytical models. See, the whole idea of machine learning is that R (or any programming tool for that matter) is capable of learning from data, finding patterns, and making decisions with little human guidance. A good example of machine learning would be the logistic and linear regression models I covered in earlier R lessons.
Now, there are two main types of machine learning: supervised and unsupervised. Supervised machine learning occurs when you have clearly defined input and output variables. Think of the function y=f(x). You know what x (the input) will be, and from examples where y (the output) is already known, you can figure out a pattern for what y will be depending on the value of x. That function is similar to the whole idea of supervised machine learning: the analyst provides a template, in the form of examples with known outputs, for R (or any other analytical tool) to draw conclusions and create analytical models.
Supervised machine learning problems can be divided into two main categories, classification and regression, depending on the output variable. A classification problem has a category as the output, such as “male” or “female”, or “puppy” or “kitty”. A regression problem has a real (numeric) value as the output; a person’s age would be a good example.
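If you’d like to see the two categories side by side, here is a minimal sketch in base R. It assumes the built-in `mtcars` data set, and the choice of `wt` (car weight) as the input and the 0.5 cutoff are just assumptions for illustration, not a recommended analysis.

```r
# Regression: the output (mpg) is a real number
reg_model <- lm(mpg ~ wt, data = mtcars)
predicted_mpg <- predict(reg_model, newdata = data.frame(wt = 3))

# Classification: the output (am) is a category (0 = automatic, 1 = manual)
clf_model <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(clf_model, newdata = data.frame(wt = 3),
                       type = "response")

# Turn the predicted probability into a category using an assumed 0.5 cutoff
predicted_class <- ifelse(prob_manual > 0.5, "manual", "automatic")

predicted_mpg     # a number
predicted_class   # a label
```

In both cases the model is trained on rows where the output is already known, which is what makes these supervised problems.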
Unsupervised learning, on the other hand, has defined input variables, but no clearly defined output variables. Thus, there is no template to create analytical models, which is the whole point of unsupervised machine learning. See, where supervised and unsupervised machine learning differ is that in unsupervised machine learning, R can discern patterns, draw conclusions, and create models without a pre-written template; this is not the case for supervised machine learning.
However, one thing supervised and unsupervised machine learning have in common is that each methodology has two main categories. In unsupervised machine learning, those categories are clustering and association problems (unlike in supervised machine learning, however, these categories don’t depend on your output values). In clustering problems, you are trying to group data based on certain similarities (like demographics for toy sales). In association problems, you are trying to discover rules that describe your data (like girls who buy toy X will usually buy toy Y as well).
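Here is a rough sketch of both categories in base R. The clustering part assumes the built-in `iris` data set and an arbitrary choice of three groups; the `purchases` table is made up purely for illustration, and the association part only computes one simple share by hand rather than running a full association-rule algorithm.

```r
# Clustering: group flowers using only their measurements (no labels given)
set.seed(42)                    # k-means starts from random centers
measurements <- iris[, 1:4]     # leave out the Species column on purpose
clusters <- kmeans(measurements, centers = 3, nstart = 10)
table(clusters$cluster)         # how many flowers landed in each group

# Association: made-up purchase data, one row per customer
purchases <- data.frame(
  toy_x = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  toy_y = c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Of the customers who bought toy X, what share also bought toy Y?
confidence_x_to_y <- mean(purchases$toy_y[purchases$toy_x])
confidence_x_to_y
```

Notice that neither part is told a “right answer” ahead of time: the groups and the buying pattern both come from the inputs alone.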
Now before I go, here’s a visual example of the concepts I just discussed:

This is a photo of puppies and kittens, which I will use to illustrate the concepts of supervised and unsupervised machine learning.
In supervised machine learning, the machine is trained on photos that are already labeled as puppies or kittens, so it learns to identify each based on certain common traits (e.g. dogs have longer noses than cats).
Let’s say I give the machine a new photo and ask it to identify whether the animal shown is a puppy or kitten:
Since the machine has learned that puppies have longer noses than kittens, and this animal has a longer nose, the machine will identify this animal as a puppy based on what it has learned from previous data.
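To make the nose example concrete, here is a tiny sketch in base R. The nose lengths are made up, and `classify_animal` is a hypothetical helper that uses a very simple rule (pick the label whose average nose length is closest); real image classifiers are far more involved.

```r
# Made-up training data: nose length in cm, with the correct label for each animal
training <- data.frame(
  nose_cm = c(6.5, 7.0, 5.8, 7.4, 2.1, 1.8, 2.5, 2.0),
  animal  = c("puppy", "puppy", "puppy", "puppy",
              "kitten", "kitten", "kitten", "kitten")
)

# "Training": learn the average nose length for each label
avg_nose <- tapply(training$nose_cm, training$animal, mean)

# "Prediction": pick the label whose average is closest to the new animal
classify_animal <- function(nose_cm) {
  names(which.min(abs(avg_nose - nose_cm)))
}

classify_animal(6.9)   # a long nose, so it lands on the puppy side
```

The labels in the `animal` column are the template the analyst provides; without them, the machine would have no way of knowing what a “puppy” is.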
Now, let’s say we still want the machine to tell puppies and kittens apart, but this time using unsupervised machine learning. Unlike in supervised machine learning, the machine isn’t told which animals are puppies and which are kittens, so it has to analyze the animals’ traits on its own in order to find out which traits set the two groups apart.
Let’s use the photo of the puppy above as an example. Using unsupervised machine learning, the machine will automatically look for traits that separate the animals into groups based on whatever information is provided. Based on those findings, the animal will be placed in one group or the other; since the machine was never given labels, it’s up to the analyst to recognize one group as dogs and the other as cats.
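As a rough sketch of the unsupervised version, here is the same kind of made-up nose-length data in base R, this time without any labels. The two-group setting for `kmeans` is an assumption I’m making because we expect two kinds of animals.

```r
# Made-up nose lengths in cm; this time the machine gets no labels at all
nose_cm <- c(6.5, 7.0, 5.8, 7.4, 2.1, 1.8, 2.5, 2.2)

set.seed(1)                                  # k-means starts from random centers
groups <- kmeans(nose_cm, centers = 2, nstart = 10)

groups$cluster   # group number (1 or 2) assigned to each animal
groups$centers   # average nose length within each group
```

The machine separates the long noses from the short ones, but the groups are only numbered 1 and 2. Deciding which group is the puppies is still up to you.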
Thanks for reading,
Michael