It’s Michael, and today’s Java post will be a little different from the previous six because I am not giving you guys a lesson today. See, when I launched this blog in June 2018, I said I was going to post lessons and analyses with various programs. However, in the case of Java, I feel it is better to give you guys program demos that cover the concepts I’ve discussed so far (loops, if-else statements, etc.). In this post, I will show you guys two programs that cover all the Java concepts I’ve discussed so far.
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Please choose a number: ");
        int num = sc.nextInt();
        int start = 0;
        int t2 = 1;
        if (num >= 0)
        {
            // Print each Fibonacci number that does not exceed num
            while (start <= num)
            {
                System.out.print(start + " + ");
                int sum = start + t2;
                start = t2;
                t2 = sum;
            }
        }
        else
        {
            System.out.println("Need a 0 or a positive number");
        }
    }
}
This program prints every number in the Fibonacci sequence up to a certain point. If you’re wondering, the Fibonacci sequence is a pattern of numbers that starts with 0 and 1 and calculates each subsequent element by finding the sum of the two previous numbers.
Here are the first 11 numbers of the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55. Notice how every element from the 3rd onward is the sum of the previous two numbers (e.g. 13 is found by calculating 5+8, the two numbers that directly precede 13 in the series).
To calculate the Fibonacci sequence, I first ask the user to type in a number. That number will serve as the endpoint for my while loop where I calculate each element in the Fibonacci sequence; in other words, my while loop will print every element in the sequence that doesn’t exceed the user’s input. For instance, if I wanted to find every number in the Fibonacci sequence stopping at 600, the program would stop at the largest Fibonacci number that is 600 or less, which is 377.
I decided to add an endpoint because the Fibonacci sequence is infinite, and had I decided not to have an endpoint, I would’ve created an infinite loop.
Notice the if-else statement in my program. The loop will only run IF the user inputs 0 or a positive number. Otherwise, the user will just see a message that says Need a 0 or a positive number. I decided to include the if-else statement since the Fibonacci sequence doesn’t have any negative numbers.
run:
Please choose a number:
-5
Need a 0 or a positive number
BUILD SUCCESSFUL (total time: 3 seconds)
As you can see, my first input (112) displays the Fibonacci sequence up to 112, while my second input (-5) displays the message that is shown when a negative number is chosen.
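If you want to check the results without typing numbers in by hand, the same loop can be wrapped in a method. This is only a minimal sketch of the idea, not the program from above; the names FibonacciSketch and fibonacciUpTo are ones I made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FibonacciSketch
{
    // Returns every Fibonacci number that is less than or equal to limit.
    // (Hypothetical helper, not part of the original program.)
    static List<Long> fibonacciUpTo(long limit)
    {
        List<Long> result = new ArrayList<>();
        long current = 0;
        long next = 1;
        while (current <= limit)
        {
            result.add(current);
            long sum = current + next;
            current = next;
            next = sum;
        }
        return result;
    }

    public static void main(String[] args)
    {
        // The last number in this list is 377, since the next one (610) is past 600
        System.out.println(fibonacciUpTo(600));
    }
}
```

A negative limit simply gives back an empty list, since the loop condition is false from the start.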
Here is the code for my next program (video in link)-Factorial program:
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Please choose a number: ");
        int num = sc.nextInt();
        long factorial = 1;
        if (num > 0)
        {
            for (int i = 1; i <= num; ++i)
            {
                factorial *= i;
            }
            System.out.println("The factorial of " + num + " is " + factorial);
        }
        else if (num == 0)
        {
            // 0! is defined as 1
            System.out.println("The factorial of 0 is 1");
        }
        else
        {
            System.out.println("Number is too low");
        }
    }
}
This program calculates the factorial of a number. For those wondering, a factorial is the product of a positive whole number and all positive whole numbers below it (stopping at 1). For instance, 3 factorial is 6, since 3*2*1=6. Just so you guys know, factorials are denoted by the exclamation point (!), so 3 factorial is 3!.
I have three conditional statements in this program that account for the three possible scenarios of user input (>0,0, and <0). If the user enters a number greater than 0, then the for loop will execute and the factorial of the number will be calculated. If the user enters 0, then the message The factorial of 0 is 1 will be displayed, since 0!=1. If the user enters a negative number, then the message Number is too low will be displayed, since factorials only involve positive numbers.
If you are wondering why I used long for the factorial variable, I thought it would be more appropriate since factorials can get pretty long (13! is over 6 billion), and long has a wider range than int (9 quintillion as opposed to 2 billion).
Now, let’s see some sample output
run:
Please choose a number:
11
The factorial of 11 is 39916800
BUILD SUCCESSFUL (total time: 12 seconds)
run:
Please choose a number:
0
The factorial of 0 is 1
BUILD SUCCESSFUL (total time: 3 seconds)
run:
Please choose a number:
-14
Number is too low
BUILD SUCCESSFUL (total time: 3 seconds)
run:
Please choose a number:
344
The factorial of 344 is 0
BUILD SUCCESSFUL (total time: 3 seconds)
In the first output sample, I chose 11 and got 39,916,800 as the factorial (11!=39,916,800). In the second output sample, I chose 0 and got 1 as the factorial (remember that 0! is 1). In the third output sample, I chose -14 and got the Number is too low message since factorials only involve non-negative integers.
Now I know I mentioned that there were three possible scenarios for this program. There are actually 4. See, if you type in a number like 700 or 344 (as I did above), you will get 0 as the factorial, even though the actual factorial is considerably larger. This happens because long can’t hold anything past roughly 9 quintillion. Once the true factorial gets bigger than that (starting at 21!), the value overflows and wraps around, so the program prints a wrong answer; for inputs of 66 and up, the wrapped value comes out as exactly 0.
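One way around the size limit is Java’s BigInteger class, which can grow as large as it needs to. Here is a minimal sketch under that assumption; it is not the program from the video, and the names BigFactorialSketch and factorial are made up for this example.

```java
import java.math.BigInteger;

class BigFactorialSketch
{
    // Computes n! exactly using BigInteger, which has no fixed upper limit.
    // (Hypothetical helper; assumes n is 0 or greater.)
    static BigInteger factorial(int n)
    {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++)
        {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args)
    {
        System.out.println(factorial(20)); // 2432902008176640000, still fits in a long
        System.out.println(factorial(21)); // 51090942171709440000, too big for a long
    }
}
```

The trade-off is that BigInteger values use methods like multiply instead of the * operator, so the code is a bit wordier than the long version.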
This concludes my current series of Java posts, but don’t worry, there will be more Java lessons soon! In the meantime, you can look forward to more R posts.
It’s Michael, and today’s lesson will be on Java loops. For those who do not know, loops repeat groups of statements in Java by executing the same actions over and over until a certain condition is met. The block of code inside the loop is called the body, while each repeat of the loop is referred to as an iteration of the loop.
Loops are important because Java programs often need to repeat actions. For instance, if you were trying to make a step-counter program, you would need a loop to increase the step-count by 1 every time a step is taken. You would ideally want to keep increasing by 1 until a certain count is reached-let’s say 10,000 steps (though you could also make your step-counter loop infinite).
There are three types of loops-while, do-while, and for-which each have their own distinct functions.
First off, I’ll discuss the for loop. Here’s a program demonstrating the use of a for loop (source code below)-For loops demo.
package javalessons;
public class JavaLessons
{
    public static void main(String[] args)
    {
        for (int i = 2; i <= 400; i *= 2)
        {
            System.out.println(i);
        }
    }
}
The basic structure of the for loop goes like this:
for (variable assignment; condition to keep the loop running; updating action), followed by the body
ALWAYS remember to write your loop in this order; the compiler will report errors if you try to do otherwise!
In this program, I am using a for loop to calculate successive powers of 2 and display the results from top to bottom. My variable assignment, int i = 2, goes first so that the loop knows the starting point (that I am starting off with 2). The condition, i <= 400, goes next so that the loop knows when to stop: the body keeps running as long as i is 400 or less. The updating action, i *= 2, tells the loop how to change i after each iteration (keep multiplying i by 2 until the condition is no longer true).
By the way, you can put i = i * 2 instead of i *= 2. Either will be accepted by the compiler; i *= 2 is just convenient shorthand. The same logic goes for addition, subtraction, and division (+=, -=, and /=).
The program will display all the powers of 2 that are no greater than 400, starting with 2 and ending with the largest power of 2 below 400, which is 256.
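To see where the loop stops without reading through the whole printout, you can keep track of the last value of i. This is just a sketch; the class name PowersOfTwoSketch and the method lastPowerOfTwo are made up for illustration.

```java
class PowersOfTwoSketch
{
    // Returns the last value the loop variable takes while i <= limit,
    // starting at 2 and doubling each time (assumes a small limit of at least 2).
    static int lastPowerOfTwo(int limit)
    {
        int last = 0;
        for (int i = 2; i <= limit; i *= 2)
        {
            last = i;
        }
        return last;
    }

    public static void main(String[] args)
    {
        System.out.println(lastPowerOfTwo(400)); // 256, since the next power (512) is past 400
    }
}
```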
Next I’ll discuss the while loop, with a demo program (source code below)-
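package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Pick an even numbered year: ");
        int year = sc.nextInt();
        int baseYear = 2100;
        while (year < baseYear)
        {
            if (year % 4 != 0)
            {
                System.out.println(year);
                System.out.println("Winter Olympics Year");
            }
            // The listing was cut off here; the rest is filled in from the
            // description and sample output in this post.
            else
            {
                System.out.println(year);
                System.out.println("Summer Olympics Year");
            }
            year += 4;
        }
    }
}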
The basic structure of the while loop goes like this:
while (condition to keep while loop going), followed by the body.
The variable used in the condition has to be declared before the while loop begins; ideally just above the line where the loop starts.
In this program, I enter an even numbered year and the program tells me whether it’s a Winter or Summer Olympics year. It then lists the following Winter or Summer Olympics years, increasing the year by 4 each time, for as long as the year is below 2100 (my while condition).
If you’re wondering what the % means in Java, it gives the remainder of a division. The statement if (year % 4 != 0) runs its body if dividing the year by 4 leaves a remainder (in other words, the year isn’t divisible by 4), since Winter Olympics years are even numbered years that aren’t divisible by 4 (2006, 2010, 2014, and so on). If the year entered is divisible by 4, then Summer Olympics years are displayed, since they occur during leap years such as 2012, 2016, and 2020.
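Here is a small sketch of the % check on its own, separate from the loop. The class name RemainderSketch and the method olympicsType are made up for this example, and like the program above it assumes the year is even.

```java
class RemainderSketch
{
    // Labels an even numbered year using the same remainder test as the loop.
    static String olympicsType(int year)
    {
        if (year % 4 != 0)
        {
            return "Winter Olympics Year";
        }
        return "Summer Olympics Year";
    }

    public static void main(String[] args)
    {
        System.out.println(2014 % 4);           // 2, so 2014 is not divisible by 4
        System.out.println(2020 % 4);           // 0, so 2020 is divisible by 4
        System.out.println(olympicsType(2014)); // Winter Olympics Year
        System.out.println(olympicsType(2020)); // Summer Olympics Year
    }
}
```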
Let’s try some sample output:
Output 1:
run:
Pick an even numbered year:
2014
2014
Winter Olympics Year
2018
Winter Olympics Year
2022
Winter Olympics Year
2026
Winter Olympics Year
2030
Winter Olympics Year
2034
Winter Olympics Year
2038
Winter Olympics Year
2042
Winter Olympics Year
2046
Winter Olympics Year
2050
Winter Olympics Year
2054
Winter Olympics Year
2058
Winter Olympics Year
2062
Winter Olympics Year
2066
Winter Olympics Year
2070
Winter Olympics Year
2074
Winter Olympics Year
2078
Winter Olympics Year
2082
Winter Olympics Year
2086
Winter Olympics Year
2090
Winter Olympics Year
2094
Winter Olympics Year
2098
Winter Olympics Year
BUILD SUCCESSFUL (total time: 10 minutes 55 seconds)
I chose an even-numbered non-leap year, 2014, and the program printed out all Winter Olympics years from 2014 until the Games happening right before the year 2100 (the 2098 Winter Games).
Output 2:
run:
Pick an even numbered year:
2020
2020
Summer Olympics Year
2024
Summer Olympics Year
2028
Summer Olympics Year
2032
Summer Olympics Year
2036
Summer Olympics Year
2040
Summer Olympics Year
2044
Summer Olympics Year
2048
Summer Olympics Year
2052
Summer Olympics Year
2056
Summer Olympics Year
2060
Summer Olympics Year
2064
Summer Olympics Year
2068
Summer Olympics Year
2072
Summer Olympics Year
2076
Summer Olympics Year
2080
Summer Olympics Year
2084
Summer Olympics Year
2088
Summer Olympics Year
2092
Summer Olympics Year
2096
Summer Olympics Year
BUILD SUCCESSFUL (total time: 3 seconds)
I chose an even-numbered leap-year, 2020, and the program printed out all Summer Olympics years from the year 2020 until the Games right before the year 2100 (which would be the 2096 Summer Games).
Output 3:
run:
Pick an even numbered year:
2110
BUILD SUCCESSFUL (total time: 3 seconds)
I chose a year greater than 2100 (2110), and the loop doesn’t execute at all, since my input exceeded the base year.
The main difference between the while and for loops is where the bookkeeping goes. With while, variable declarations happen before the loop and updates (like incrementing variables) happen in the body of the loop; with for, both of those things are handled in the header of the loop.
Finally, the last loop I will discuss is the do-while loop, which is very similar to the while loop, except that the body of a do-while loop always runs at least once, while the body of a while loop might not run at all. This is because the body of the do-while loop comes before the condition, not after.
Here’s the basic structure of the do-while loop:
do {body of loop} while (condition to keep loop running)
After each iteration, the condition is checked to see if it still holds true. As long as it does, the loop keeps running.
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Pick a number: ");
        double number = sc.nextDouble();
        double base = 1000;
        do
        {
            number *= 1.05;
            System.out.print(number + " , ");
        } while (number <= base);
    }
}
In this program, I enter a number and the do-while loop keeps applying 5% growth for as long as the number is no more than the base of 1000. I could’ve written the program using a while loop, but then there would’ve been a chance that the loop might not run at all. The do-while loop will run at least once, which I wanted (plus I felt it was important to discuss).
I typed in 456, and the program kept displaying 5% increases, starting with the first increase (478.8), until it printed the first number greater than 1000.
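To make the difference between while and do-while concrete, here is a minimal sketch that counts how many times each body runs. The class and method names (LoopComparisonSketch, whileRuns, doWhileRuns) are made up for illustration.

```java
class LoopComparisonSketch
{
    // Counts how many times a while loop body runs.
    static int whileRuns(int start, int limit)
    {
        int runs = 0;
        int n = start;
        while (n < limit)
        {
            runs++;
            n++;
        }
        return runs;
    }

    // Counts how many times a do-while loop body runs with the same condition.
    static int doWhileRuns(int start, int limit)
    {
        int runs = 0;
        int n = start;
        do
        {
            runs++;
            n++;
        } while (n < limit);
        return runs;
    }

    public static void main(String[] args)
    {
        // The condition 10 < 5 is false from the very beginning
        System.out.println(whileRuns(10, 5));   // 0
        System.out.println(doWhileRuns(10, 5)); // 1
    }
}
```

When the condition starts out true, both loops run the same number of times; the difference only shows up when the condition is false at the start.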
Now, when dealing with Java loops, one thing you should watch out for when writing your code is the infinite loop, which is a loop that never ends.
Infinite loops can be caused by something as simple as a sign. Here’s an infinite loop made from the do-while example in this post (code first, then output):
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Pick a number: ");
        double number = sc.nextDouble();
        double base = 1000;
        do
        {
            number *= 1.05;
            System.out.print(number + " , ");
        } while (number >= base);
    }
}
As you can see, I changed the sign in the do-while loop from “less than or equal to” to “greater than or equal to”. If you type a number greater than the base of 1000, then your loop can go on and on, as you can see here (of course, you can always hit the stop button).
It’s Michael, and today’s lesson will be a little different. There won’t be actual coding involved here; rather, I will discuss a topic which is important in Java: numbering systems.
What’s the point of numbering systems? Think of it this way: we communicate with each other with words and numbers, but since computers only understand numbers, numbers are how computers communicate with each other. When we enter data in our programs, that data is converted into electronic pulses, which are then interpreted as numbers using ASCII (an acronym for American Standard Code for Information Interchange). ASCII is the system used for encoding characters (whether 0-9, A-Z lowercase or uppercase, or some special characters like the backslash and the semicolon), and its values can be written in any of the four main numbering systems (binary, octal, hexadecimal, and decimal). Each of the 128 ASCII characters has its own unique value; the capital letter A, for instance, is 65 in decimal, 1000001 in binary, 101 in octal, and 41 in hexadecimal.
Now, let’s explain each of these four numbering systems:
Binary is a base-2 number system, as there are only two possible values-0 and 1. 0 represents absence of electronic pulse while 1 represents presence of electronic pulse. A group of four bits, like 1100, is called a nibble while a group of eight bits, like 01010011, is called a byte. The position of each digit in a group of bits represents a certain power of 2.
To find out the decimal form of 01010011, multiply each 0 and 1 by a power of 2. You can determine which power of two to use based on the position of the 0 or 1 in the group. For instance, the 1 at the end of the group would be multiplied by 2^0, since you would always start with the power of 0. Then work your way left until you reach the last bit in the group (and increase the exponent by 1 each time).
The number in decimal form is 83, which can be found through this expression: 2^0+2^1+2^4+2^6 = 1+2+16+64. Granted, you don’t need to worry about the 0 bits, since 0 times anything is 0. And for the 1 bits, you just need to calculate the powers of two and add them together; don’t worry about multiplying each power by 1, since 1 times anything equals itself.
And remember to ALWAYS start with the power of 0 and work your way left, increasing the exponent by 1 each time.
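Even though this post isn’t about coding, the steps above translate into a short loop. Here is a minimal sketch; the class name BinaryToDecimalSketch and the method toDecimal are made up for illustration, and the method assumes the text contains only 0s and 1s.

```java
class BinaryToDecimalSketch
{
    // Adds up the powers of 2 for every 1 bit, starting with 2^0 at the
    // rightmost bit and working left.
    static int toDecimal(String bits)
    {
        int total = 0;
        int power = 1; // 2^0
        for (int i = bits.length() - 1; i >= 0; i--)
        {
            if (bits.charAt(i) == '1')
            {
                total += power;
            }
            power *= 2; // move on to the next power of 2
        }
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(toDecimal("01010011")); // 83
        System.out.println(toDecimal("1100"));     // 12
    }
}
```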
Octal is very similar to binary, except it is a base-8 system, so there are 8 possible values-0 to 7 (Java tends to like to start with 0).
Octal-to-decimal conversions are very much like binary-to-decimal conversions, except you’re dealing with successive powers of 8 as opposed to powers of 2.
Take the number 145. What would this be in decimal form?
If you said 101, you’re right. You can find this out using the expression (1*8^2+4*8^1+5*8^0). Remember to multiply each digit by their respective power of 8, then find the sum of all the products.
And ALWAYS remember to start with the power of 0.
Decimal is the number system we use every day when talking about numbers; it is a base-10 system, so there are ten possible values (0 to 9). You are probably very familiar with this system, so there’s not much explaining to do here.
Hexadecimal is an interesting system for a few reasons. First of all, it is a base-16 system, so there are 16 possible values. Secondly, this is the only alphanumeric system of the four I discussed, as this system uses the digits 0-9 and the letters A-F which represent the numbers 10-15.
Hexadecimal-to-decimal conversions follow the same process as octal-to-decimal and binary-to-decimal conversions, except you’re dealing with successive powers of 16.
What do you think 45ACB could be in decimal form?
If you answered 285,387, you’re right. Here’s the expression: (4*16^4+5*16^3+A*16^2+C*16^1+B*16^0). Remember that A is 10, B is 11, C is 12, D is 13, E is 14, and F is 15.
Remember that these four systems aren’t the only possible number systems, as there are other number systems for several different bases. For instance, a base-4 system would use the numbers 0-3 and utilize successive powers of 4 in conversions. Here are some things to know about other number systems:
Base-X means that there are X possible values (so base-9 would have 9 possible values)
The starting point for number systems is always ZERO; the endpoint is X-1 (so 5 would be the endpoint for a base-6 system)
You would convert base-X numbers to decimal form utilizing successive powers of X (so converting a base-5 number to decimal form would utilize successive powers of 5: 5^0, 5^1, and so on), multiplying each digit by its power of X, and finding the sum of all the products.
When the base is greater than 10, use letters of the alphabet for values above 9. So for base-19, use the letters A-I for the values 10-18.
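The rules above work for any base, so they can be written as one general method. This is a sketch under a couple of assumptions: the names BaseConversionSketch and toDecimal are made up, and the input is assumed to contain only valid digits for the chosen base. It relies on Character.digit, a built-in Java method that turns a character like 'A' into its value (10) for a given base.

```java
class BaseConversionSketch
{
    // Multiplies each digit by its power of the base, starting with the
    // power of 0 at the rightmost digit, and adds up the products.
    static int toDecimal(String digits, int base)
    {
        int total = 0;
        int power = 1; // base^0
        for (int i = digits.length() - 1; i >= 0; i--)
        {
            int value = Character.digit(digits.charAt(i), base);
            total += value * power;
            power *= base;
        }
        return total;
    }

    public static void main(String[] args)
    {
        System.out.println(toDecimal("145", 8));    // 101
        System.out.println(toDecimal("45ACB", 16)); // 285387
        System.out.println(toDecimal("I", 19));     // 18
        // Java's built-in Integer.parseInt does the same job
        System.out.println(Integer.parseInt("45ACB", 16)); // 285387
    }
}
```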
Now, here are some conversion problems for you. Can you solve them? I will post the answers in the next post:
It’s Michael, and today’s post will be on if-else statements and logical/comparison operators in Java. If-else statements allow your program to choose between two or more alternative actions.
Why are if-else statements so important? Well, with Java programs, as in everyday life, things can go several different ways. Let’s say you just took a final exam for a college class. Depending on how you did on that exam, your grade can come out one of five ways (A, B, C, D, or F). The same sort of thinking applies to programming, as there can be several different actions a program can perform depending on certain scenarios (like if an int is greater than/less than/equal to a certain amount).
Let’s demonstrate using a program example (I had to copy and paste the code since you can’t see all of it from a screenshot):
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Please enter a number: ");
        double temperature = sc.nextDouble();
        System.out.println("C or F: ");
        String unit = sc.next();
        double temp;
        if ("C".equals(unit))
        {
            temp = (1.8 * temperature) + 32;
            System.out.println("The temperature is " + temp + " degrees Fahrenheit");
        }
        else if ("F".equals(unit))
        {
            // 5.0 / 9.0 is the exact factor; 0.56 is only a rounded version of it
            temp = (temperature - 32) * 5.0 / 9.0;
            System.out.println("The temperature is " + temp + " degrees Celsius");
        }
        else
            System.out.println("Not a valid answer");
    }
}
This program is a temperature unit converter; the user enters a number and a “C” or “F” (for Celsius or Fahrenheit) and, depending on which letter was entered, the program converts the temperature from Celsius to Fahrenheit or vice versa.
I want to clarify that there is a difference between else if and else. Else if acts as another if statement; you would use else if when your program needs to test two or more conditions, such as whether a “C” or “F” was typed. The else statement comes at the very end of the chain, since it is the statement that executes should none of the if/else if conditions be true; the output I used for the else statement only displays if neither a “C” nor an “F” was typed.
Always use .equals for String variables such as these. Don’t use ==, which checks whether two variables point to the same object rather than whether they contain the same text.
You need to use braces if your if or else if statements contain more than one line of code. If you only need one line of code, you don’t need the braces.
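Here is a quick sketch of why .equals matters for Strings. The class name StringCompareSketch and the method sameText are made up for illustration, and new String("C") is only standing in for text that came from a Scanner.

```java
class StringCompareSketch
{
    // Compares the characters in two Strings.
    static boolean sameText(String a, String b)
    {
        return a.equals(b);
    }

    public static void main(String[] args)
    {
        String typed = new String("C"); // stands in for what the user typed
        System.out.println(typed == "C");         // false: two different objects
        System.out.println(sameText("C", typed)); // true: same characters
    }
}
```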
Let’s try inputting a number:
run:
Please enter a number:
19
C or F:
C
The temperature is 66.2 degrees Fahrenheit
BUILD SUCCESSFUL (total time: 17 seconds)
I entered 19 and C, meaning I wanted to find out what 19 degrees Celsius was in Fahrenheit. I got 66.2, which means that 19 degrees Celsius is equal to 66.2 degrees Fahrenheit.
Now let’s see what happens when I don’t type a “C” or an “F”:
run:
Please enter a number:
45
C or F:
W
Not a valid answer
BUILD SUCCESSFUL (total time: 8 seconds)
I typed 45 and W and as a result, “Not a valid answer” displayed (which is the output I had set to display if neither “C” nor “F” was typed).
Now let me introduce some new concepts that perfectly relate to if and else if statements-comparison and logical operators.
Comparison operators are pretty self-explanatory, as they are used to compare two things (usually testing a variable and a value). If you know basic algebra, you’d be familiar with most of these:
== is “equal to” (remember to use .equals for Strings)
!= is “not equal to”
> is “greater than”
>= is “greater than or equal to”
< is “less than”
<= is “less than or equal to”
Logical operators are similar to comparison operators since both compare statements. However, comparison operators are only used for single boolean statements, such as (temperature > 70) while logical operators are used for groups of boolean statements, such as (temperature >= 60) && (temperature <= 80).
Here are the three main logical operators:
&&-AND
||-OR
!-NOT
OK, so even though I said logical operators are used for groups of boolean statements, the NOT operator is more for single boolean statements, such as !(temperature > 70), meaning that you’re looking for temperatures that aren’t greater than 70 degrees. You can, however, also use the NOT operator for groups of boolean statements, such as !((temperature >= 0) && (temperature <= 50)), meaning that you’re looking for temperatures that aren’t between 0 and 50 degrees.
Here’s the one takeaway with the NOT operator-it negates boolean statements (whether an individual statement or groups of statements). In other words, it essentially contradicts whatever is in the statement(s).
Now, one more thing I want to cover regarding logical operators-the truth table. The truth table doesn’t involve any coding, rather it’s a helpful guide with regards to boolean statements in your program. The truth table compares two boolean statements, looks at the values of each statement (whether they are true or false), and based on the values of each individual statement, returns true or false for that pair of statements. Here’s the table below:
When dealing with AND, both statements in a pair have to be true in order for the statement pair to be true. If one or both statements are false, then the statement pair returns false. With OR, however, only one statement has to be true for the pair to be true; the pair is false only if both statements are false. And as I mentioned earlier, NOT contradicts boolean statement(s), so if you put NOT right in front of a boolean statement pair that is true, then the result is false (and vice versa).
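The rules in the paragraph above can be printed out by Java itself. This is a minimal sketch; the class name TruthTableSketch and the method row are made up for illustration.

```java
class TruthTableSketch
{
    // Builds one row of the table: A, B, A AND B, A OR B, NOT A
    static String row(boolean a, boolean b)
    {
        return a + "\t" + b + "\t" + (a && b) + "\t" + (a || b) + "\t" + (!a);
    }

    public static void main(String[] args)
    {
        System.out.println("A\tB\tA && B\tA || B\t!A");
        System.out.println(row(true, true));
        System.out.println(row(true, false));
        System.out.println(row(false, true));
        System.out.println(row(false, false));
    }
}
```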
Let’s test this logic with an example program:
In this program I have two boolean statements, (5 > 7) and (12 < 16), stored as greaterThan and lessThan. I then use System.out.println to test whether the statement pair greaterThan && lessThan is true or false. Since I used AND and one of the statements (greaterThan) is false, the pair as a whole is false. Remember, with AND, if one of the statements is false, then the whole pair is false.
Now let’s apply our knowledge of truth tables to more than two boolean statements:
When dealing with more than two boolean statements, && is applied before ||, and operators of the same kind are applied from left to right. Java first evaluates the first pair of statements, greaterThan && lessThan, to determine whether that statement pair is true or false (it’s false). It then takes the false result and compares it with the subtraction statement, which is true (thus statements 1-3 are collectively true). Finally, it takes the collective true result from statements 1-3 and compares it with the multiplication statement, which is false. But, since OR is being used to compare the statements, you only need 1 true statement to result in true, which is the case here. After analyzing all 4 statements, the 4 statements together are collectively true.
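Since the two programs described above aren’t reproduced in the text, here is a rough sketch of what they might look like. Only greaterThan (5 > 7) and lessThan (12 < 16) come from the description; the subtraction and multiplication statements, the way the four statements are joined, and the names LogicalChainSketch, pair, and chain are all assumptions I made up to match it.

```java
class LogicalChainSketch
{
    // Two statements joined with AND
    static boolean pair(boolean first, boolean second)
    {
        return first && second;
    }

    // Four statements: one AND followed by two ORs (assumed arrangement)
    static boolean chain(boolean first, boolean second, boolean third, boolean fourth)
    {
        return first && second || third || fourth;
    }

    public static void main(String[] args)
    {
        boolean greaterThan = 5 > 7;          // false
        boolean lessThan = 12 < 16;           // true
        boolean subtraction = 10 - 4 == 6;    // true (made-up statement)
        boolean multiplication = 3 * 3 == 10; // false (made-up statement)
        System.out.println(pair(greaterThan, lessThan));                               // false
        System.out.println(chain(greaterThan, lessThan, subtraction, multiplication)); // true
    }
}
```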
Now, let’s analyze a program with several if options. In this program, you would type in a year, and the program will tell you what generation you belong to; keep in mind that the dates provided are just approximations, as different sources use different dates to mark the beginning and end of a generation. Here is the code:
package javalessons;
import java.util.Scanner;
public class JavaLessons
{
    public static void main(String[] args)
    {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter your birth year: ");
        int year = sc.nextInt();
        if (year >= 1900 && year <= 1928)
        {
            System.out.println("You are part of the Greatest Generation");
        }
        else if (year >= 1929 && year <= 1945)
        {
            System.out.println("You are part of the Silent Generation");
        }
        else if (year >= 1946 && year <= 1965)
        {
            System.out.println("You are a Baby Boomer");
        }
        else if (year >= 1966 && year <= 1981)
        {
            System.out.println("You are a Gen-Xer");
        }
        else if (year >= 1982 && year <= 2000)
        {
            System.out.println("You are a Millennial");
        }
        else if (year >= 2001 && year <= 2019)
        {
            System.out.println("You are a Gen-Zer");
        }
        else
        {
            System.out.println("Invalid input");
        }
    }
}
Please note that you will get a certain output (i.e. “You are a Millennial”) if both conditions are true; “You are a Millennial” will only print out if the year you typed is between 1982 and 2000.
Now let’s use the year 1946 (President Trump’s birth year):
run:
Enter your birth year:
1946
You are a Baby Boomer
BUILD SUCCESSFUL (total time: 3 seconds)
As you can see, 1946 corresponds to Baby Boomer, meaning President Trump is a Baby Boomer.
Now let’s try 1996 (my birth year):
run:
Enter your birth year:
1996
You are a Millennial
BUILD SUCCESSFUL (total time: 3 seconds)
According to the program, I am a millennial.
For those who want a video demo of the program, check this out-Generation program.
Now, a concept you should know that ties in with the material in this post is precedence. Precedence is similar to order of operations in the sense that both concepts involve doing things in a certain order. However, while order of operations just deals with arithmetic and exponents, precedence deals with arithmetic, comparison, and logical operators. With precedence, there is a certain order which the compiler evaluates statements with multiple arithmetic/comparison/logical operators. Here’s a handy guide:
First-the operators +,-,++,--, and !
The ++ and -- operators represent incrementing and decrementing, respectively. These operators increase or decrease a number by 1; you’ll see them more when I cover loops.
Second-the operators *, / and %
Third-the operators + and -
The plus and minus signs that take highest precedence refer to positive and negative numbers, while the ones that are third in precedence order refer to addition and subtraction.
Fourth-the operators <, >, <= and >=
Fifth-the operators == and !=
Sixth-the operator &
Seventh-the operator |
Eighth- the operator &&
Ninth-the operator ||
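The difference between the single & and | and the double && and || is that the former two are known as bitwise operators, which work on data of types int, long, short, char, and byte bit by bit. (On boolean values, & and | act as logical operators that always evaluate both sides, whereas && and || stop as soon as the result is known.)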
Check out this forum to learn more about bitwise operators-Bitwise Operators.
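As a quick illustration of bit by bit, here is a minimal sketch using two small numbers. The class name BitwiseSketch and its two helper methods are made up for this example.

```java
class BitwiseSketch
{
    // A result bit is 1 only where both numbers have a 1
    static int bitwiseAnd(int a, int b)
    {
        return a & b;
    }

    // A result bit is 1 where either number has a 1
    static int bitwiseOr(int a, int b)
    {
        return a | b;
    }

    public static void main(String[] args)
    {
        int a = 12; // 1100 in binary
        int b = 10; // 1010 in binary
        System.out.println(bitwiseAnd(a, b)); // 8, which is 1000 in binary
        System.out.println(bitwiseOr(a, b));  // 14, which is 1110 in binary
        System.out.println(a > 5 && b > 5);   // true: && works on boolean statements
    }
}
```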
Keep in mind that parentheses ALWAYS take highest precedence (yes even higher than the operators listed as highest precedence). Any statements inside the parentheses are evaluated in order of operator precedence; if you’re dealing with parentheses-within-parentheses, the expression in the innermost set of parentheses is evaluated first, after which the compiler will work its way outward.
Here’s a series of statements; can you guess the result (true or false) based on precedence:
temperature = 44;
temperature - 2 < 32 || (temperature + 5 != 55) && temperature / 2 < 30
If you said true, you are right. Let me break it down:
The statement temperature + 5 != 55 is grouped first, as it is in parentheses. It is true.
The next statement to consider is temperature / 2 < 30, as it is right by the && sign, which takes higher precedence than ||. That statement is also true, so together with the true statement in parentheses, it makes for a true statement pair.
The last statement to consider is temperature - 2 < 32, since it’s right by the || sign. Even though this statement is false, the true statement pair to its right makes the whole group of statements collectively true (remember with OR, only one true makes for a collective true). Strictly speaking, precedence decides how the statements are grouped rather than the order Java runs them in; when the program runs, Java still works from left to right, so temperature - 2 < 32 is actually checked first. The result is the same either way.
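You can confirm the answer by letting Java evaluate the expression. This is a minimal sketch; the class name PrecedenceSketch and the methods evaluate and evaluateGrouped are made up for illustration. The second method adds parentheses that spell out the grouping precedence gives you for free.

```java
class PrecedenceSketch
{
    // The expression exactly as written in the puzzle
    static boolean evaluate(int temperature)
    {
        return temperature - 2 < 32 || (temperature + 5 != 55) && temperature / 2 < 30;
    }

    // The same expression with the && pair grouped explicitly
    static boolean evaluateGrouped(int temperature)
    {
        return (temperature - 2 < 32) || ((temperature + 5 != 55) && (temperature / 2 < 30));
    }

    public static void main(String[] args)
    {
        System.out.println(evaluate(44));        // true
        System.out.println(evaluateGrouped(44)); // true
    }
}
```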
It’s Michael, and today’s post will be on user input, importing classes, and doing math in Java. I know these may seem like three totally unrelated topics, but I think they relate well to one another, which is why I’m going to cover all of them in this post.
First off, let’s discuss how to manage user input in your Java program. If you weren’t already aware, Java output doesn’t always have to be predefined-you can write your code so that you can choose the output.
The first thing you would need to do is import the Scanner class from Java, which is the class that handles all text input (don’t think about using it for graphics-heavy programs). Importing the Scanner class (or any class for that matter) allows you to use all its methods.
Here’s an example of a Scanner program:
The first thing I did was to import the Scanner class. I then created s, which is a variable of type Scanner. Remember to create a Scanner variable whenever you want to use the Scanner class in your program! And to read what the user types on the keyboard, set the parameter to System.in.
I then used System.out.println to write the command asking the user for their input. The user’s input is then stored in a variable, in this case int year. You need to store your input in a variable if you want to use it later, such as displaying it on the screen. What variable type you choose depends on what kind of input you are seeking. In this case, since I am seeking integers, I want to use type int. The value you assign to the variable would be (name of Scanner variable).next(type of variable your input is)(). Since my Scanner variable is named s and my input variable is of type int, the value I assigned my variable was s.nextInt(). The next______ method you use depends on the input variable’s type, so if I had a double variable I would use s.nextDouble(). Please note that there is no nextString or nextChar method; for String variables use next or nextLine, and for a single char you can use s.next().charAt(0).
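Since the program described above isn’t reproduced in the text, here is a rough sketch of what a Scanner program along those lines could look like. The variable names s and year come from the description; the class name ScannerSketch, the helper methods, the wording of the prompt, and the hasNextInt check are my own assumptions.

```java
import java.util.Scanner;

class ScannerSketch
{
    // Reads one whole number from the given Scanner.
    static int readYear(Scanner s)
    {
        return s.nextInt();
    }

    // There is no nextChar method, so take the first character of the next word.
    static char readFirstChar(Scanner s)
    {
        return s.next().charAt(0);
    }

    public static void main(String[] args)
    {
        Scanner s = new Scanner(System.in);
        System.out.println("Please enter a year: ");
        // hasNextInt checks that a whole number was actually typed
        if (s.hasNextInt())
        {
            int year = readYear(s);
            System.out.println("You entered " + year);
        }
        else
        {
            System.out.println("That was not a whole number");
        }
    }
}
```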
For those who would like a video demonstration of the Scanner class, check out this program-Scanner demo.
Now, Scanner isn’t the only class you can import from Java. In fact there are plenty of other classes (and methods) you can use and import for a variety of purposes, such as the class Calendar for calendar programs. The best part is that you don’t need to memorize all of the classes available on Java, as there is a handy online dictionary where you can look up classes and methods that you need for your program. This dictionary does a very good job of explaining how you can use each available class.
Now at the beginning of this post, I said I was going to cover how to do math in Java. Well, there is a certain class that I will show you how to use to do just that-it’s the Math class. You don’t need an import statement to access this class, because Math lives in the java.lang package, which Java imports automatically (the line import java.lang.Math; is allowed but redundant). Here are three programs utilizing the Math class and its methods.
This program asks the user for a negative number, then finds the absolute value of that number. Since I typed -33, the program calculated 33 as the absolute value.
This program asks the user for two numbers, then will calculate the value of the first number raised to the power of the second number. Since my first number was 9 and my second number was 3, the program printed out 729, or 9 cubed (which is a more colloquial way of saying “to the third power”)
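This last program asks the user for a number, which in this context is an angle measurement for a right triangle. The program will then display the cosine (recall SOHCAHTOA) for that angle measurement. One catch: Math.cos expects the angle in radians, not degrees, so typing 49 gives 0.3006 (rounded to four decimal places), which is the cosine of 49 radians. To get the cosine of a 49 degree angle, convert first with Math.toRadians(49); the result is about 0.6561.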
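As a rough sketch, the three Math programs boil down to the calls below. The class name MathSketch and the helper cosDegrees are my own placeholders, and I’ve hard-coded the numbers instead of reading them with a Scanner to keep it short.

```java
class MathSketch {
    // Math.cos works in radians, so convert degrees first.
    static double cosDegrees(double degrees) {
        return Math.cos(Math.toRadians(degrees));
    }

    public static void main(String[] args) {
        // Absolute value of a negative number.
        System.out.println(Math.abs(-33));

        // 9 raised to the power of 3; Math.pow always returns a double.
        System.out.println(Math.pow(9, 3));

        // Cosine of 49 treated as radians (roughly 0.3006).
        System.out.println(Math.cos(49));

        // Cosine of a 49 degree angle (roughly 0.6561).
        System.out.println(cosDegrees(49));
    }
}
```

Notice there is no import line at the top, since Math comes from java.lang.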
However, one thing to note is that the Math class is more geared towards complex mathematical operations (like finding logs, trigonometric ratios, and square roots). You don’t need the Math class for basic arithmetic.
In case you’re wondering, here are two programs showing how to do basic arithmetic in Java:
This program demonstrates simple division; the user is asked for a number divisible by 3, then the program will print out the quotient of that number divided by 3. Since I used 612, the quotient displayed was 204.
This program utilizes the order of operations (think PEMDAS); the user is asked to type in a number and the program will print out the answer to the problem (pemdas/2)*3+8-4. Remember pemdas is the name of the variable where you will store the number you choose. I chose to store the user input as a double because I wanted a precise answer. With an input of 5 (the input that produces 11.5), an int variable would have made Java do integer division, cutting 5/2 down to 2, and I would’ve gotten 10 instead of the more precise answer I wanted-11.5.
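To see the int versus double difference side by side, here is a small sketch. The class name ArithmeticSketch and the two method names are made up for this example, and the input 5 is my assumption based on the 11.5 result.

```java
class ArithmeticSketch {
    // With a double, 5 / 2 is 2.5, so nothing is lost.
    static double withDouble(double pemdas) {
        return (pemdas / 2) * 3 + 8 - 4;
    }

    // With an int, 5 / 2 is cut down to 2 before the rest is calculated.
    static int withInt(int pemdas) {
        return (pemdas / 2) * 3 + 8 - 4;
    }

    public static void main(String[] args) {
        System.out.println(withDouble(5)); // prints 11.5
        System.out.println(withInt(5));    // prints 10
        System.out.println(612 / 3);       // prints 204
    }
}
```

The parentheses, multiplication, and division are handled before the addition and subtraction, exactly as PEMDAS says.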
Something I should have pointed out earlier-ALWAYS remember to put semicolons after each statement! To be clear, this only means the lines of code inside each method or class along with any import statements. This doesn’t mean semicolons are needed after class/method declarations themselves!
This isn’t totally necessary when coding, but the // symbol right before a line of code is a way to take notes within your program. The compiler won’t read any line of code that begins with //.
Now before I go, here’s a video demonstration on doing math in Java (both with the Math class and without)-Java Math.
It’s Michael here, and today’s post will be on variables and data types in Java. Variables are pieces of memory in Java that store information necessary for your program to run. Take a look at this statement:
int gamesWon = 2;
This assignment statement (which is what you call the line of code where you create a variable and give it a value) displays the variable’s name (known in Java jargon as an identifier), value, and data type (int). Data types, such as int, indicate what kind of value the variable is. In the above example, the data type int indicates that the value is an integer.
When naming your variables, remember that you can only use letters, the numbers 0-9, the underscore character, and the dollar sign (which by convention is best left alone). Also, the first character of your variable name can’t be a number.
The variable names 4twenty, go-Browns, and Linkin....Park won’t work, and the compiler will complain if you try to use any of them.
One thing I didn’t mention in the previous Java post is that Java is case-sensitive, so that variables like southPark, Southpark, and SouthPark would be treated as 3 different variables rather than the same variable. Having 3 same-named but differently-capitalized variables is perfectly acceptable in Java, though it will get confusing for you the programmer.
A Java naming convention (known as camel case) indicates that, with multi-word variable names like gamesWon, the first word (in this case games) should start with a lowercase letter and any subsequent words should start with an uppercase letter. It’s just good practice though; after all, the Java compiler won’t display error messages if you use GamesWon or gameswon instead of gamesWon.
There are two types of variables in Java-primitive types and class types. Class type variables belong to a certain class (which can either be a built-in class such as String or Scanner or a class you create in your program). On the other hand, primitive type variables don’t belong to a certain class but are rather the eight basic types of variables in Java, which consist of:
byte
short
int
long
float
double
char
boolean
The first four variable types mentioned (byte, short, int, long) are all integer types, meaning they store whole numbers. The only difference between them is the integer ranges they represent. Byte has the smallest range, spanning from only -128 to 127, while short goes from approximately -32,000 to 32,000, int from roughly negative 2 billion to positive 2 billion, and long from roughly negative 9 quintillion to positive 9 quintillion.
Float and double are floating-point numbers, which is a more technical way of saying that float and double are decimals. Numbers like 3.5, -9.99, and 1.042 would qualify as either float or double. The differences between float and double are their size and precision: float uses 32 bits and holds about 7 significant digits, while double uses 64 bits and holds about 15, so double also covers the wider range of decimals.
The last two variable types, char and boolean, don’t store numbers. Rather, char stores single characters, which can consist of any of the letters of the alphabet, the numbers 0-9, and symbols like the pound sign (or hashtag for the millennial crowd) or percent sign. Char can only be 1 character, so the values g^ksf and dipppk can’t be of type char since they have multiple characters (String would be a more appropriate type). And yes, even though I said char doesn’t store numbers, it can hold a digit: a variable like char digit = '0' stores the character 0, which is treated as a symbol rather than as the number zero. Boolean stores the values true and false and is used when dealing with logic statements (more on those in a future post). For instance the statement (41 > 55) would qualify as a logic statement by Java and thus would be boolean. If you’re wondering, this statement would return false (which can be figured out easily because 41 is less than 55).
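Here’s a short sketch that declares one variable of each primitive type. The class name PrimitiveSketch, the variable names, and the helper isGreater are placeholders of mine; the MIN_VALUE and MAX_VALUE constants come from Java’s built-in wrapper classes and show the exact ranges.

```java
class PrimitiveSketch {
    // A comparison like (41 > 55) produces a boolean.
    static boolean isGreater(int left, int right) {
        return left > right;
    }

    public static void main(String[] args) {
        byte smallNumber = 127;
        short mediumNumber = 32000;
        int gamesWon = 2;
        long hugeNumber = 9000000000L; // the L marks a long literal
        float price = 3.5f;            // the f marks a float literal
        double precise = 1.042;
        char digit = '0';              // the character 0, not the number 0
        boolean result = isGreater(41, 55);

        System.out.println(smallNumber + " " + mediumNumber + " " + gamesWon + " " + hugeNumber);
        System.out.println(price + " " + precise + " " + digit + " " + result);

        // Exact ranges of the four integer types.
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println(Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println(Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println(Long.MIN_VALUE + " to " + Long.MAX_VALUE);
    }
}
```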
Now, if you’d like to see how variables work, here’s a little demonstration:
Inside the main method, I declared an int variable-currentYear-and gave it a value-2019 (well, because that’s the current year). Remember to put the assignment statement inside the main method, because the compiler will complain if you try to do otherwise. I then asked the program to print out the currentYear, which it did.
One thing to know with System.out.println and variable names is that you can simply put the name of your variable in the parentheses and the value will print out.
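In plain text, a demonstration along these lines might look like the sketch below. The class name VariableSketch and the helper yearMessage are my own placeholders; the second println shows that you can also join text and a variable with the + sign.

```java
class VariableSketch {
    // Joins a label and the variable's value into one String.
    static String yearMessage(int currentYear) {
        return "The current year is " + currentYear;
    }

    public static void main(String[] args) {
        // Declare the variable inside the main method and give it a value.
        int currentYear = 2019;

        // Putting the variable name in the parentheses prints its value.
        System.out.println(currentYear);
        System.out.println(yearMessage(currentYear));
    }
}
```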
For those that would like to see a video demonstration of how variables work, click this link-Variables Demo Java.
This is Michael, and first of all, Happy New Year! Can’t believe it’s already 2019! I hope you all had a wonderful holiday season. I also wanted to say thank you for reading my posts in 2018, and I hope you learned some new skills along the way. Get ready for some exciting programming and analytical content in 2019!
Now, since this is the first post of 2019, I thought I’d try discussing something new-Java. Some of you might be wondering “What is Java?” Java is basically a general-purpose programming language that is used for Android apps (whether on the phone or tablet), server apps for financial services companies (think large firms like Goldman Sachs), software tools (like NetBeans-more on that later), and games like Minecraft.
One thing to keep in mind with Java is that it is more for building applications and programs. Java isn’t really useful for statistical analyses like R is, or database building/querying like MySQL.
Now to understand Java, I will discuss a concept called IDEs, or Integrated Development Environments. IDEs are basically the software systems you will use to write, test, and debug your programs. My personal favorite IDE is NetBeans, which is what I will use for all of my Java posts (there are other IDEs to choose from like JCreator, which I’ve used but didn’t like as much as NetBeans). NetBeans, and most other Java IDEs, are free to download; another perk is that they will point out and explain errors in your code (whether syntax or otherwise) by displaying a red line underneath the line(s) of code with the error(s). You might also see yellow lines underneath certain portions of code, but that is usually displayed to suggest how to make your code neater rather than to point out errors in your code.
So let’s get started and open up NetBeans:
This screen is what you will see every time you open up NetBeans. The other tabs contain other Java projects I’ve done in my spare time.
How do we create a new Java project? Here’s a video showing how to create a new Java project-Java Lesson 1.
As you can see, you must first create a project, then create a class. You can have several classes in your project, but for now let’s stick with a single main class.
Once you finish creating your project and class, you should see your interface, which looks like this:
The interface is where the magic of Java programming happens; here, you write, run, and debug the code you will use for your program.
Now let’s demonstrate the programming process using the most famous example for beginner coders-the “Hello World” program. Here’s a video demonstration for anyone interested-Hello World demo.
Some of you are probably wondering “What does all this code mean?” Let me explain.
A package is a mechanism for grouping classes. The package javalessons gets its name from our main class-JavaLessons. If we had several classes in our project, then they would all belong to the javalessons package.
There are two types of packages-user defined (like the javalessons package) and built-in (which come with Java). Some built-in packages include:
java.util– contains utility classes which implement data structures such as Dictionary; commonly used for Date/Time operations
java.awt– commonly used for graphical interfaces purposes (such as creating buttons or drop-down menus)
Two concepts that you should know are objects and classes, which are the two most fundamental concepts in Java.
Java is an object-oriented programming language, which means that Java programs consist of objects that can either act alone or interact with one another.
Think of it this way-the world is full of objects (people, trees, dogs, food, electronics, houses, etc.). Each of these objects can perform several actions, and each action affects some of the other objects in the world. For instance, people buy and eat food, while dogs pee on trees.
Simple enough, right? Let’s say we had a program that analyzes the inflow and outflow of animals at an animal shelter. The program would have an object to represent each dog or cat that enters or exits the shelter as well as objects for cages, people that surrender/adopt animals, and so on.
Every object has characteristics, or attributes. Using the aforementioned shelter example, animals have names, ages, genders, breeds, medical conditions, and so on.
The values of an object’s attributes give the object a state. For instance, a dog can be named Spots, be 10 years old, male, and a Chocolate Labrador. The things an object can do are its behaviors; using the example I just mentioned, dogs can bark, growl, roll over, eat, sleep, poop , and so on.
In Java, each object’s behavior is defined by a method (more on that later)
Similar objects of similar data types belong to the same class, which is essentially a blueprint for creating objects. Using the dog example, all Chocolate Labradors that enter or exit the shelter would belong to the class ChocolateLabrador.
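To make the blueprint idea concrete, here is a minimal sketch of a ChocolateLabrador class. The attributes, the two behaviors, and the second dog named Cocoa are made up for illustration; a real shelter program would need much more.

```java
class ChocolateLabrador {
    // Attributes: their values make up the object's state.
    private final String name;
    private final int age;

    // The constructor builds a new object from the blueprint.
    ChocolateLabrador(String name, int age) {
        this.name = name;
        this.age = age;
    }

    // Behaviors are written as methods.
    String bark() {
        return name + " says Woof!";
    }

    String describe() {
        return name + " is " + age + " years old";
    }

    public static void main(String[] args) {
        // Two objects of the same class, each with its own state.
        ChocolateLabrador spots = new ChocolateLabrador("Spots", 10);
        ChocolateLabrador cocoa = new ChocolateLabrador("Cocoa", 3);

        System.out.println(spots.describe());
        System.out.println(spots.bark());
        System.out.println(cocoa.describe());
    }
}
```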
You’ll notice that the words package, public class, and public static void are all in blue. That’s because they are reserved words, which means you can’t use them as the names of objects or variables because these words are already reserved for the syntax of Java programming. Here’s a handy glossary of reserved words-Reserved Words in Java.
One of the first methods programmers learn is the public static void main (String [] args) method; this method is essentially the entry point for Java programs. This method contains the block of code that will perform a certain action-in this case, display our output. Let’s break it down word by word:
public-meaning any object (whether in the same class or different classes) can use the main method
static-means the method belongs to its class and not to any particular object
void-meaning the method doesn’t return any value
main-this is just the name of the method
String [] args-meaning the method takes one parameter, an array of Strings (the command-line arguments); arrays are just linear arrangements of things. I’ll cover arrays more in a future lesson.
public class means the class can be called from anywhere. For instance, if you were trying to create a game with several different character classes, then provided that your game class is public, the class (and any public methods) can be called by any of the character classes.
System.out.println(string) is one of the first lines of code beginner Java programmers learn. This line of code prints out whatever is in the parentheses (which is usually a string). In this case, we’ll be printing out “Hello World” (I know it’s cliche, but it’s a great first Java lesson). Now let’s break down this line of code word-by-word:
System-a built-in class that contains useful methods and variables
out-a static variable in the System class that represents the output of your program
println-the method that prints a line of text and then moves to the next line
Now, before I go, I want to mention a little caveat when it comes to the println method-the difference between it and the print method. See, since println and print are so similarly named, you might think you can use the two methods interchangeably and get the same result. Not true-check out the pictures below for proof.
Granted, I’m trying to do the same thing in both examples (print “Hello World” twice). But in the first picture, since I used println both times, the two instances of “Hello World” are displayed one on top of the other. In the second picture, where I used print both times, the two instances of “Hello World” are displayed right next to each other without a space.
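For anyone who can’t see the pictures, the two versions can be sketched like this. The class name PrintSketch and the two helper methods are placeholders I made up; they take a PrintStream so the same code works with System.out.

```java
import java.io.PrintStream;

class PrintSketch {
    // println moves to a new line after each message.
    static void withPrintln(PrintStream out) {
        out.println("Hello World");
        out.println("Hello World");
    }

    // print stays on the same line, so the messages run together.
    static void withPrint(PrintStream out) {
        out.print("Hello World");
        out.print("Hello World");
    }

    public static void main(String[] args) {
        withPrintln(System.out);
        withPrint(System.out);
        System.out.println(); // finish the last line
    }
}
```

If you want a space between the two messages with print, you have to add it yourself, for example with "Hello World ".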
Thanks for reading, and here’s to expanding our analytical and programming knowledge in 2019!
It’s Michael, and today’s post will be on time series analysis, which analyzes time-dependent data (such as the weather in Miami, Florida over the course of 2018 or the Cleveland Browns’ records over the last decade or the price of bitcoin in the last 2 years, just to give some examples) over a certain period of time.
In the post, I will be utilizing search data from Google Trends to analyze how often certain famous people are searched. For those that don’t know, Google Trends is a fascinating tool that allows you to see how often something (whether a food, person, event, animal, etc.) is searched on Google over a certain timeframe (all the way back to January 1, 2004). Google Trends also has several fascinating analyses, like The Year in Search (which details the most popular worldwide Google searches in a given year).
Here’s the spreadsheet-Google Trends-I swapped out <1s for 0s so that R would read everything as an int and not a factor.
Anyway, I’ll be making several graphs, analyzing two people at a time in each graph that have something in common.
Now, let’s load our file and try to understand the data:
This data basically consists of 52 dates (shown by the Week variable), and the search popularity in the US for 22 people over the last year (from the week of December 17, 2017 to December 9, 2018). The numbers 0-100 are used as a metric to determine how often a certain person’s name was searched in a given week; 0 means there either wasn’t enough data or that person’s name wasn’t searched at all while 100 means the person’s name was searched for (presumably) millions of times that week. All the dates listed are Sundays (12/17/17, 12/24/17, etc.), meaning in this case, a week is measured from Sunday-Saturday.
Now before we start graphing, we need to be sure the strings in the Week variable are converted to dates for the purpose of the graph, which is what this line does (more specifically, dates are converted into month/day/year format-exactly the way they are listed in the spreadsheet)
Now time to graph (remember to install the ggplot2 package). I will be looking at two people (who have something in common) at a time and doing a comparative analysis.
I’ll start by analyzing Jared Fogle and Bill Cosby-two celebrities who had very public falls from grace and are both currently incarcerated.
As you can see, Bill Cosby was more popular than Jared Fogle in American Google Searches. This is likely because Fogle has been incarcerated for his crimes since November 2015, while Cosby was re-tried, convicted and ultimately sent to prison in the span of five months (April-September 2018). Cosby had plenty of legal drama this year, which could explain the greater fluctuation in his graph (compared to Fogle’s). Cosby also has two major peaks in his search history graph for the weeks of April 22 and September 23-the weeks he was convicted and sent to prison, respectively.
The peaks aren’t the only things you should be analyzing. Check the numbers on the y-axis to get an idea of the maximum search history metric. For instance, Jared Fogle’s highest search metric is 1, while Bill Cosby’s is 100. This indicates that more Americans searched Cosby’s name than Fogle’s (Cosby was also the more newsworthy of the two this past year)
Now let’s analyze US search history trends for Kyle Kulinski and Ana Kasparian-two famous left-wing commentators. Kulinski hosts the Secular Talk YouTube channel while Kasparian is a member of progressive YouTube news channel The Young Turks.
As you can see, even though Kasparian’s graph has more fluctuation than Kulinski’s, searches for Kulinski’s name were more popular than searches for Kasparian because the search history metric for Kulinski goes above 75 two times, while the metric for Kasparian doesn’t exceed 25. The discrepancy between Kulinski’s search history metric and Kasparian’s could be because more people subscribe to Secular Talk than The Young Turks (I’m just theorizing here).
Now let’s analyze the search metric history for Mike Shinoda and Chester Bennington-two members of Linkin Park.
As you can see, Chester Bennington’s highest metric is 100, while Mike Shinoda’s highest metric is 44. I’m guessing the reason Bennington’s metric is higher is that many people still enjoy listening to Linkin Park’s music-and hear his voice-after his death. Also worth noting is that Bennington’s metric peaked on the week of July 15, which was around the one-year anniversary of his death on 7-20-17. Shinoda’s search history metric peaked on the week of June 17, which was when his solo album Post Traumatic was released in its entirety (and which he created after Bennington’s death).
Now time to compare the American search metric history for JaMarcus Russell and Ryan Leaf-two of the biggest NFL busts of all time (and both quarterbacks).
As you can see, Leaf’s graph has more fluctuation than Russell’s, but Russell’s graph peaks at 100 while Leaf’s only peaks at 26. Then again, the search history metric average for Russell is 5.7 and for Leaf is only 5.3, meaning neither individual’s name widely pops up in US Google Searches. However, the one thing that can explain Russell’s peak of 100 on the week of November 4 could be this article with an interesting story about Russell-https://bleacherreport.com/articles/2804453-david-diehl-raiders-gave-jamarcus-russell-blank-tapes-to-see-if-qb-watched-film.
Now let’s compare the search history metrics of Dwyane Wade and Hassan Whiteside, two current Miami Heat players.
As you can see, Wade’s graph peaks higher than Whiteside’s (100 to Whiteside’s 20). This is likely because Wade had a more eventful year than Whiteside, as he returned to the Heat (week of February 4), announced his retirement (week of September 16), welcomed another baby (week of November 4), and played in his 1000th career game (week of December 9).
Now time to analyze the search history metrics for Samuel J Comroe and Shin Lim-two contestants on AGT Season 13. Samuel J Comroe was a stand-up comedian who finished in 4th place, while Shin Lim was a close-up magician who finished as the season’s winner.
As you can see, Shin Lim’s peak is much higher than Samuel J Comroe’s (100 to 8, respectively). Neither contestant has much fluctuation in their graphs, but both peak on the week of September 16 (this was the week of the AGT Finals, which both Comroe and Lim competed in and finished in the Top 5).
Now let’s analyze the search history metrics for Tom Brady and Nick Foles-the two starting quarterbacks for Super Bowl LII.
As you can see, neither QB’s graph fluctuates much. Both graphs hit their peaks on the weeks of January 21 (AFC/NFC Championships) and February 4 (Super Bowl LII). Interestingly enough, Brady’s graph has the higher peak (100 to Foles’s 54), even though Foles and the Eagles won the Super Bowl. I guess this means that Brady is still the more popular of the two QBs (after all, Foles was a backup after the Eagles lost their main QB Carson Wentz).
Now time to analyze Alexandria Ocasio-Cortez and Rick Scott, two politicians who got elected to Congress during the 2018 midterm elections. Ocasio-Cortez (D-NY) got elected to the House and Scott (R-FL) got elected to the Senate.
Both Scott’s and Ocasio-Cortez’s graphs have relatively high peaks (100 for Scott and 61 for Ocasio-Cortez) since both had quite eventful elections. Ocasio-Cortez’s graph peaks on the weeks of June 24 and November 4, which were the week of her stunning primary upset against 10-term Democrat Joe Crowley and the week of her eventual election to the House. Scott’s graph also peaks on the week of November 4, which was the week he got elected to the Senate (this was right before the tense recount between him and incumbent Bill Nelson, after which Scott was confirmed the winner). One reason I think Scott’s graph has the higher peak is because his name is the more recognized of the two; after all, Scott was governor of Florida when he got elected to the Senate while Ocasio-Cortez was a relatively unknown bartender when she won the primaries and eventually, the House.
Now time to analyze Meghan Markle and Kate Middleton, two women who had very public (and televised) royal weddings (Markle’s being this year while Middleton’s was in 2011). The women’s husbands also happened to be siblings-Prince William (Middleton’s husband) and Prince Harry (Markle’s husband).
Markle’s graph has a much higher peak than Middleton’s (100 to Middleton’s 17), most likely because her royal wedding was this year, while Middleton’s was in 2011. Unsurprisingly, Markle’s graph peaks on the week of May 13, which was the week of her royal wedding. Some other reasons why Markle’s graph peaks higher than Middleton’s could be because Markle is one of the few Americans to marry into British royalty (Wallis Simpson, who married the former King Edward VIII in 1937 after his abdication, is another notable example), she’s also one of the first biracial royal fiancees, she’s older than Prince Harry (most royal grooms are older than the brides), and she was quite famous in the US having had an extensive acting career on shows like Suits.
The next analysis will be comparing Fred Guttenberg and Andrew Pollack, two Parkland parent-activists who lost their daughters in the Stoneman Douglas shooting.
Both individuals have high peaks (Pollack at 100, Guttenberg at 61) likely because both parents have appeared on several media outlets (CNN, Fox News, etc.) plenty of times since the shooting. One reason I think Pollack’s graph peaks higher than Guttenberg’s is because unlike many of the Parkland students and parents, he isn’t campaigning for tighter gun laws. A photo of Pollack in a Trump 2020 shirt also got considerable attention during the few days after the shooting-this could also explain the higher peak.
Finally, let’s analyze Mikaela Shiffrin and Maia Shibutani, two female participants of this year’s Winter Olympics in PyeongChang. Shiffrin is an alpine skier specializing in slalom skiing while Shibutani is a figure skater who competes with her older brother Alex.
Both graphs are pretty stagnant, save for a single peak (Shiffrin’s occurring on the week of February 11 and Shibutani’s occurring on the week of February 18, both during the 2018 Winter Olympics). Shiffrin’s peak is much higher though (100 compared to Shibutani’s 7), likely because Shiffrin won golds and silvers while Shibutani only won bronzes.
Now, before I go, remember that just because a graph fluctuates a lot doesn’t mean the search history metric is always going to be very high. R adjusts the scales on the graphs based on the highest number in a column.
It’s Michael, and today’s lesson will be about predictions for both linear and logistic regression models. I will be using the same dataset that I used for R Analysis 2: Linear Regression & NFL Attendance, except I added some variables so I could create both linear and logistic regression models from the data. Here is the modified dataset-NFL attendance 2014-18
Now, as always, let’s first try to understand our variables:
I described most of these variables in R Analysis 2, but here are what the two new ones mean (I’m referring to the two bottommost variables):
Playoffs-whether or not a team made the playoffs. Teams that made playoffs are represented by a 1, while teams that didn’t make playoffs are represented by a 0. Recall that teams who finished 1st-6th in their respective conferences made playoffs, while teams that finished 7th-16th did not.
Division-What division a team belongs to, of which there are 8:
1-AFC East (Patriots, Jets, Dolphins, Bills)
2-AFC North (Browns, Steelers, Ravens, Bengals)
3-AFC South (Colts, Jaguars, Texans, Titans)
4-AFC West (Chargers, Broncos, Chiefs, Raiders)
5-NFC East (Cowboys, Eagles, Giants, Redskins)
6-NFC North (Packers, Bears, Vikings, Lions)
7-NFC South (Falcons, Saints, Panthers, Buccaneers)
8-NFC West (Seahawks, 49ers, Cardinals, Rams)
I added these two variables so that I could create logistic regression models from the data. In both cases, I used dummy variables (remember those?).
Another function I think will help you in your analyses is sapply. Here’s how it works:
As you can see, you can do two things with sapply-find out if there are any missing values (as seen on the top function) or find out how many unique values there are for a certain variable (as seen on the bottom function). According to the output, there are no missing values for any variables (in other words, there are no blank spots in any column of the spreadsheet). Also, on the bottom function, you can see how many distinct values correspond to a certain variable (e.g. Conference Standing has 16 distinct values).
Before I get into analysis of the models, I want to introduce two new concepts-training data and testing data:
The difference between training and testing data is that training data are used as guidelines for how a model (whether linear or logistic) should make decisions while testing data just gives us an idea as to how well the model is performing. When splitting up your data, a good rule of thumb is 80-20, meaning that 80% of the data should be for training while 20% of the data should be for testing (It doesn’t have to be 80-20, but it should always be majority of the data for training and the minority of the data for testing). In this model, observations 1-128 are part of the training dataset while observations 129-160 are part of the testing dataset.
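The 80-20 arithmetic is simple enough to sketch in Java, the language from my Java posts (this is not the R code behind this analysis; the class name SplitSketch and the method trainCount are made up for illustration).

```java
class SplitSketch {
    // Number of observations that go to training for a given percentage.
    static int trainCount(int totalObservations, int trainPercent) {
        return totalObservations * trainPercent / 100;
    }

    public static void main(String[] args) {
        int total = 160;
        int train = trainCount(total, 80);

        // Observations are numbered starting at 1.
        System.out.println("Training: 1-" + train);
        System.out.println("Testing: " + (train + 1) + "-" + total);
    }
}
```

With 160 observations this gives 1-128 for training and 129-160 for testing, matching the split described above.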
I will post four models in total-two using linear regression and two using logistic regression. I will start with the logistic regression:
In this model, I chose playoffs as the binary dependent variable and Division and Win Total as the independent variables. As you can see, intercept (referring to Playoffs) and Win Total are statistically significant variables, while Division is not statistically significant. Also, notice the data = train line, which indicates that the training dataset will be used for this analysis (you should always use the training dataset to create the model)
Now let’s create some predictions using our test dataset:
The fitted.results variable holds the model’s predictions, which are predicted probabilities of making the playoffs, while the ifelse function turns the probability for each observation in our test dataset (observations 129-160) into a yes/no prediction. A 1 under an observation number indicates that the model gives that team at least a 50% chance of making the playoffs, while a 0 indicates that the chance is less than 50%.
If we wanted to figure out how often those predictions match what actually happened (in other words, the overall accuracy of the model), here’s how:
The misClasificError basically indicates the model’s margin of error using the fitted.results derived from the test dataset. The accuracy is calculated by subtracting the misClasificError from 1, which turns out to be 87%, indicating very good accuracy (and indicating that the model’s margin of error is 13%).
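The same calculation can be sketched in Java (again, not the R code behind the output described here). I’m assuming a 0.5 cutoff for turning a probability into a 1 or 0; the class name AccuracySketch and the four sample teams are invented for the example.

```java
class AccuracySketch {
    // Turns probabilities into 0/1 predictions, then compares them with what actually happened.
    static double accuracy(double[] probabilities, int[] actual) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            int predicted = probabilities[i] > 0.5 ? 1 : 0;
            if (predicted != actual[i]) {
                wrong++;
            }
        }
        // Share of wrong predictions, a value between 0 and 1.
        double misClasificError = (double) wrong / actual.length;
        return 1 - misClasificError;
    }

    public static void main(String[] args) {
        double[] probabilities = {0.9, 0.2, 0.6, 0.4}; // made-up playoff probabilities
        int[] actual = {1, 0, 0, 0};                   // made-up results (1 = made playoffs)
        System.out.println(accuracy(probabilities, actual)); // prints 0.75
    }
}
```

Because the error is a share of the observations, the accuracy always lands between 0 and 1 (0% to 100%).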
Finally, let’s plot the model:
We can also predict various what-if scenarios using the model and the predict function. Here’s an example:
Using the AFC South as an example, I calculated the predicted values for a team in that division to make the playoffs based on various possible win totals. Since some of these values are negative, they are log-odds rather than probabilities: a positive value means a better-than-even chance of making the playoffs, and a negative value means a worse-than-even chance. As you can see, an AFC South team with 10 or 14 wins is likely to make the playoffs, as the values for both of those win totals are greater than 1. However, AFC South teams with only 2 or 8 wins aren’t likely to go to playoffs because the values for both of those win totals are negative (however 8 wins will fare better than 2).
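If a logistic regression prediction comes out as log-odds, it can be turned into a probability with the formula 1 / (1 + e^(-x)). Here is a small Java sketch of that conversion; the class name LogOddsSketch and the sample values are mine, not numbers from the model.

```java
class LogOddsSketch {
    // Converts log-odds into a probability between 0 and 1.
    static double toProbability(double logOdds) {
        return 1.0 / (1.0 + Math.exp(-logOdds));
    }

    public static void main(String[] args) {
        System.out.println(toProbability(0));    // prints 0.5, an even chance
        System.out.println(toProbability(1.5));  // above 0.5
        System.out.println(toProbability(-1.5)); // below 0.5
    }
}
```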
Let’s try another example, this time examining the effects of 9 wins across all 8 divisions (I chose 9 because 9 wins sometimes results in playoff berths, sometimes it doesn’t):
As you can see, 9 wins will most likely earn a playoff berth for AFC East teams (55.6% chance) and least likely to earn a playoff spot in the NFC West (35.7% chance)
I know it looks like all the lines are squished into one big line, but you can infer that the more wins a team has, the greater its chances are at making the playoffs. The pink line that appears to be the most visible represents the NFC West (Rams, Seahawks, 49ers, Cardinals). Unsurprisingly, the teams likeliest to make the playoffs were the teams with 9 or more wins (except for the 2017 Seahawks, who finished 9-7 and missed the playoffs).
Now let’s create another logistic regression model that is similar to the last one except with the addition of the Total Attendance variable
The summary output looks similar to that of the previous model (I also use the training dataset for this model), except that this time, none of the variables have asterisks right by them, meaning none of them are statistically significant (which happens when the p-value is above 0.1). Nevertheless, I’ll still analyze this model to see if it is better than my first logistic regression model.
Now let’s create some predictions using the test dataset:
Like our previous model, this model also has a nice mix of 0s and 1s, except this model only has 11 1s, while the previous model had 14 1s.
And now let’s find the overall accuracy of the model:
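Ok, so I know 379% seems like crazy accuracy for a logistic regression model, and it is: a true accuracy can never be above 100%, so treat this number as a quirk of the calculation rather than a real accuracy. Here’s how it was calculated: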
R took the sum of these numbers and divided that sum by 32 to find the average of the fitted results. R then subtracted 1 from the average to get the accuracy measure.
Just as we did with the first model, we can also create what-if scenarios. Here’s an example:
Using the AFC North as an example, I analyzed the effect of win total on a team’s playoff chances while keeping total attendance the same (1,400,000). Unsurprisingly (if total attendance is roughly 1.4 million fans in a given season), teams with a losing record (7-8-1 or lower) are less likely to make the playoffs than teams with a split or winning record (8-8 or higher). Given both record and a total attendance of 1,400,000 fans, the threshold for clinching a playoff berth appears to be 12 or 13 wins (though barring attendance, most AFC North teams fare well with 10, 9, or even 8 wins).
Now here’s another example, this time using the NFC East (and changing both win totals and total attendance):
So given increasing win totals and total attendance, an NFC East team’s playoff chances increase. The playoff threshold here, just as it has been with most of my predictions, is 9 or 10 wins.
Now let’s see what happens when win totals increase but attendance goes down (also using the NFC East):
Ultimately (with regard to the NFC East), it’s not total attendance that matters, but a team’s win totals. As you can see, regardless of total attendance, playoff clinching odds increase with higher win totals (the win threshold remains at 9 or 10).
And here’s our model plotted:
Now, I know this graph is just about as easy-to-read as the last graph (not very, but that’s how R works), but just like with the last graph, you can draw some conclusions. Since this graph factors in Total Attendance and Win Total (even though only Total Attendance is displayed), you can tell that even though a team’s fanbase may love coming to their games, if the wins are low, so are the playoff chances.
Now, before we start the linear regression models, let’s compare the logistic regression models to see which is the better of the two by analyzing various criteria:
Difference between null & residual deviance
Model 1-73.25 with a decrease of two degrees of freedom
Model 2-115.82 with a decrease of three degrees of freedom
Better model-Model 1
AIC
Model 1-101.86
Model 2-60.483
Better model-Model 2 (41.377 difference)
Number of Fisher Scoring Iterations
Model 1-5
Model 2-7
Better model-Model 1 (fewer Fisher iterations)
Overall Accuracy
Model 1-87%
Model 2-379%
Better model-Model 1 (379% sounds too good to be true)
Overall better model: Model 1
Now here’s the first linear regression model:
This model has Win Total as the dependent variable and Total Attendance and Conference Standing as the independent variables. This will also be my first model created with multiple linear regression, which is basically linear regression with more than one independent variable.
And finally, let’s plot the model:
In cases of multiple linear regression such as this, I had to graph each independent variable separately; graphing Total Attendance and Conference Standing separately allows us to examine the effects each independent variable has on our dependent variable (Win Total). As you can see, Total Attendance increases with an increasing Win Total while Conference Standing decreases with a decreasing Win Total. Both graphs make lots of sense, as fans are more tempted to come to a team’s games when the team has a high win total and conference standings tend to decrease with lower win totals (an interesting exception is the 2014 Carolina Panthers, who finished 4th in the NFC despite a 7-8-1 record).
In case you are wondering what the layout function does, it basically allowed two graphs to be displayed side by side. I can also alter the function depending on how many independent variables I use; if for instance I used 4 independent variables, I could change c to 2,2 to display the graphs in a 2 by 2 matrix.
Multiple linear regression equations are quite similar to those of simple linear regression, except for an added variable. In this case, the equation would be:
Win Total = 6.366e-6(Total Attendance)-5.756e-1(Conference Standing)+5.917
Now, using the predict function that I showed you for my logistic regression models won’t be very efficient here, so we can go the old-fashioned way by plugging numbers into the equation. Here’s an example:
Regardless of what conference a team is part of, a total attendance of at least 750,000 fans and a bottom seed in the conference should at least bring the team a 1-15 record. For teams with a total attendance of at least 1.1 million fans who fall just short of the playoffs with a 7th seed, a 9-7 record would be likely. Top of the conference teams with an attendance of at least 1.45 million should net a 14-2 record.
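Here’s a small Python sketch of the plug-in approach (not the R code from this post) that runs two of the scenarios above through the equation. The function name is just something I made up for illustration; the coefficients come straight from the equation, and the projected win total is rounded to the nearest whole number:

```python
def linear1_wins(total_attendance, conference_standing):
    """Projected Win Total from the first multiple linear regression equation."""
    return 6.366e-6 * total_attendance - 5.756e-1 * conference_standing + 5.917

# (total attendance, conference standing) pairs from two of the scenarios above
scenarios = [(750_000, 16), (1_100_000, 7)]
for attendance, standing in scenarios:
    wins = linear1_wins(attendance, standing)
    # Round to a whole number of wins in a 16-game season
    print(attendance, standing, round(wins, 4), "->", f"{round(wins)}-{16 - round(wins)}")
```

The two rows work out to roughly 1.48 wins (a 1-15 record) and roughly 8.89 wins (a 9-7 record).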
Now, let’s see what happens when conference standing improves, but attendance decreases:
According to my predictions, bottom-seeded teams with a total attendance of at least 1.5 million fans should net at least a 6-10 record. However, as conference standings improve and total attendance decreases, predicted records stagnate at either 9-7 or 8-8.
Now here’s my second linear model:
In this model, I used two different independent variables-Home Attendance and Average Age of Roster-but I still used Win Total as my dependent variable.
The equation goes like this:
Win Total = 1.051e-5(Home Attendance)+5.534e-1(Average Age of Roster)-1.229e+1
Now just like I did with both of my logistic regression models and the linear regression model, let’s create some what-if scenarios:
In this scenario, home attendance is increasing along with the average age of roster. Win total also increases with a higher average age of roster. For instance, teams with a home attendance of at least 350,000 fans and an average roster age of 24 (meaning the team is full of rookies and other fairly-fresh faces) should expect at least a 5-11 record. On the other hand, teams with a roster full of veterans (yes, 28.5 is old for an average roster age) and a home attendance of at least 1.2 million fans should expect a perfect 16-0 season.
Now let’s try a scenario where home attendance decreases but average age of roster increases:
In this scenario, when home attendance decreases but average age of roster increases, a team’s projected win total also goes down. For teams full of fresh-faces and breakout stars (average age 24) and a home attendance of at least 1.1 million fans, a 13-3 record seems likely. On the other hand, for teams full of veterans (average age 28.5) and a home attendance of at least 300,000 fans, a 7-9 record appears in reach.
One thing to keep in mind with my linear regression predictions is that I rounded projected win totals to the nearest whole number. So I got the 13-3 record projection from the 12.5526 output.
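To make the rounding concrete, here’s a small Python sketch (not the R code from this post) that plugs the four scenarios above into the second equation. The function name is made up for illustration; the coefficients are the ones from the equation:

```python
def linear2_wins(home_attendance, average_roster_age):
    """Projected Win Total from the second multiple linear regression equation."""
    return 1.051e-5 * home_attendance + 5.534e-1 * average_roster_age - 1.229e+1

# (home attendance, average age of roster) pairs from the scenarios above
scenarios = [(350_000, 24), (1_200_000, 28.5), (1_100_000, 24), (300_000, 28.5)]
for attendance, age in scenarios:
    wins = linear2_wins(attendance, age)
    # Show the raw output next to the rounded win total
    print(attendance, age, round(wins, 4), "->", round(wins), "wins")
```

The third row is the 12.5526 output mentioned above, which rounds up to 13 wins.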
Now let’s plot the model:
Just as I did with linear1, I graphed the two independent variables separately, not only because it’s the easiest way to graph multiple linear regression but also because we can see each variable’s effect on Win Total. As you can see, Home Attendance and Average Age of Roster both increase with an increasing win total, though the increase in Average Age of Roster is smaller than that of Home Attendance. Each scenario makes sense, as teams are likelier to have a higher win total if they have more supportive fans in attendance (particularly in their 7 or 8 home games per season) and having more recognizable veterans on a team (like the Saints with QB Drew Brees or the Broncos with LB Von Miller) will be better for the team’s overall record than having a team full of newbies (like the Browns with QB Baker Mayfield or the Giants with RB Saquon Barkley).
The Home Attendance numbers are displayed in scientific notation, which is how R displays large numbers. 1e+05 is 100,000, 3e+05 is 300,000, and so on.
Now, before I go, let’s compare the two linear models:
Residual Standard Error
Model 1-1.09 wins
Model 2-2.948 wins
Better Model-Model 1 (less deviation)
R-Squared (Multiple and Adjusted respectively)
Model 1-88.72% and 88.58%
Model 2-17.49% and 16.44%
Better Model-Model 1 (much higher than Model 2)
F-statistic & P-Value (since there are 2 degrees of freedom, this is an important metric)
Model 1-617.5 on 2 and 157 degrees of freedom; 2.79e-7
Model 2-16.64 on 2 and 157 degrees of freedom; 2.79e-7
Better Model-Model 1 (both result in the same p-value, but the f-statistic on Model 1 is much larger)
It’s Michael, and today’s post will be an R analysis post using the concept of linear regression. The dataset I will be using is NFL attendance 2014-18, which details NFL attendance for each team from the 2014-2018 NFL seasons along with other factors that might affect attendance (such as average roster age and win count).
First, as we should do for any analysis, we should read the file and understand our variables:
Team-The team name corresponding to a row of data; there are 32 NFL teams total
Home Attendance-How many fans attend a team’s home games (the NFL’s International games count towards this total)
Road Attendance-How many fans attend a team’s road games
Keep in mind that teams have 8 home games and 8 away games.
Total Attendance-The total number of fans who go see a team’s games in a particular season (attendance for home games + attendance for away games)
Win Total-how many wins a team had for a particular season
Win.. (meaning win percentage)-the percent of games won by a particular team (keep in mind that ties are counted as half-wins when calculating win percentages)
NFL Season-the season corresponding to the attendance totals (e.g. 2017 NFL season is referred to as simply 2017)
Conference Standing-Each team’s seeding in their respective conference (AFC or NFC), which ranges from 1 to 16, with 1 being the best and 16 being the worst. The teams that were seeded 1-6 in their conference made the playoffs that season while teams seeded 7-16 did not; teams seeded 1-4 won their respective divisions while teams seeded 5 and 6 made the playoffs as wildcards.
As the 2018 season is still in progress, these standings only reflect who is LIKELY to make the playoffs as of Week 11 of the NFL season. So far, no team has clinched a playoff spot yet.
Average Age of Roster-The average age of a team’s players once the final 53-man roster has been set (this is before Week 1 of the NFL regular season)
One thing to note is that I removed the thousands separators for the Home Attendance, Road Attendance, and Total Attendance variables so that they would read as ints and not factors. The file still has the separators though.
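The idea behind that cleanup step is simple: strip the commas out of the text before converting it to a number. Here’s a minimal Python sketch of the idea (not the R code from this post; the function name is made up for illustration):

```python
def parse_attendance(text):
    """Turn a string like '1,006,119' into the integer 1006119."""
    # Remove the thousands separators, then convert what is left to an int
    return int(text.replace(",", ""))

print(parse_attendance("985,804"))
```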
Now let’s set up our model (I’m going to be using three models in this post for comparison purposes):
In this model, I used Total Attendance as the dependent variable and Win Total as the independent variable. In other words, I am using this model to determine if there is any relationship between fans’ attendance at a team’s games and a team’s win total.
Remember how in R Lesson 7 I mentioned that you should pay close attention to the three bottom lines in the output? Here’s what they mean for this model:
As I mentioned earlier, the residual standard error refers to the amount that the response variable (total attendance) deviates from the true regression line. In this case, the RSE is 1,828,000, meaning the total attendance deviates from the true regression line by 1,828,000 fans.
I didn’t mention this in the previous post, but the way to find the percentage error is to divide the RSE by the average of the dependent variable (in this case, Total Attendance). The lower the percentage error, the better.
In this case, the percentage error is 185.43% (the mean for Total Attendance is 985,804 fans, rounded to the nearest whole number).
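Here’s that division as a small Python sketch (not the R code from this post), using the RSE and mean quoted above. The function name is made up for illustration:

```python
def percentage_error(residual_standard_error, mean_of_dependent_variable):
    """RSE as a percentage of the dependent variable's mean."""
    return residual_standard_error / mean_of_dependent_variable * 100

# RSE and mean Total Attendance as quoted above
print(round(percentage_error(1_828_000, 985_804), 2))
```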
The R-Squared is a measure of the goodness-of-fit of a model; the closer to 1, the better the fit. The difference between the Multiple R-Squared and the Adjusted R-Squared is that the former isn’t adjusted for the number of variables in the model while the latter is. In this model, the Multiple R-Squared is 20.87% while the Adjusted R-Squared is 20.37%, indicating a very slight correlation.
Remember the idea that “correlation does not imply causation”, which states that even though there may be a strong correlation between the dependent and independent variable, this doesn’t mean the latter causes the former.
In the context of this model, even though a team’s total attendance and win total have a very slight correlation, this doesn’t mean that a team’s win total causes higher/lower attendance.
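If you’re curious how the Adjusted R-Squared relates to the Multiple R-Squared, here’s a Python sketch (not the R code from this post) of the standard adjustment formula. I’m assuming 160 observations (32 teams times 5 seasons) and 1 independent variable, which is my own reading of the dataset rather than something copied from the R output:

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment of R-squared for the number of predictors."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)

# Multiple R-Squared of 20.87%, assumed 160 rows (32 teams x 5 seasons), 1 predictor
print(round(adjusted_r_squared(0.2087, 160, 1), 4))
```

Under those assumptions, the formula lands on roughly 20.37%, which lines up with the Adjusted R-Squared above.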
The F-statistic measures the relationship (or lack thereof) between independent and dependent variables. As I mentioned in the previous post, for models with only 1 degree of freedom, the F-statistic is basically the independent variable’s t-value squared (6.456²=41.68). The F-statistic (and resulting p-value) aren’t too significant in determining the accuracy of simple linear regression models such as this one, but are more significant when dealing with multiple linear regression models.
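The t-value squared check is quick to verify. Here’s a tiny Python sketch (not the R code from this post) using the t-value quoted above:

```python
# t-value for Win Total, as quoted above
t_value = 6.456

# With a single independent variable, squaring the t-value gives the F value
f_statistic = t_value ** 2
print(round(f_statistic, 2))
```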
Now let’s set up the equation for the line (note the coef function I mentioned in the previous post isn’t necessary):
Remember the syntax for the equation is just like the syntax of the slope-intercept equation (y=mx+b) you may remember from algebra class. The equation for the line is (rounded to the nearest whole number):
Total Attendance = 29022(Win Total)+773943
Let’s try the equation out using some scenarios:
“Perfect” Season (no wins): 29022(0)+773943=expected total attendance of 773,943
Split Season (eight wins): 29022(8)+773943=expected total attendance of 1,006,119
Actual Perfect Season (sixteen wins): 29022(16)+773943=expected total attendance of 1,238,295
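The three scenarios above can also be run as a small Python sketch (not the R code from this post). The function name is made up for illustration; the slope and intercept come from the equation:

```python
def lr1_total_attendance(win_total):
    """Expected Total Attendance from the lr1 equation (y = mx + b)."""
    return 29022 * win_total + 773943

# No wins, a split season, and a perfect season
for wins in (0, 8, 16):
    print(wins, "wins ->", f"{lr1_total_attendance(wins):,}", "fans")
```

Since the slope is 29022, each extra win adds about 29,000 fans to the expected total.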
And finally, let’s create the graph (and the regression line):
As seen in the graph above, few points touch the line (which explains the low Multiple R-Squared of 20.87%). According to the regression line, total attendance INCREASES with better win totals, which indicates a direct relationship. One possible reason for this is that fans of consistently well-performing teams (like the Patriots and Steelers) are more eager to attend games than are fans of consistently struggling teams (like the Browns and Jaguars). An interesting observation would be that the 2015 4-12 Dallas Cowboys had better total attendance than the 2015 15-1 Carolina Panthers had. The 2016 and 2017 Cleveland Browns fared pretty well for attendance-each of those seasons had a total attendance of at least 900,000 fans (the records were 1-15 and 0-16 respectively).
Let’s create another model, once again using Total Attendance as the dependent variable but choosing Conference Standing as the independent variable:
So, is this model better than lr1? Let’s find out:
The residual standard error is much smaller than that of the previous model (205,100 fans as opposed to 1,828,000). As a result, the percentage error is much smaller-20.81%-and there is less variation among the observation points around the regression line.
The Multiple R-Squared and Adjusted R-Squared (0.4% and -0.2% respectively) are much lower than the R-Squared amounts for lr1. Thus, there is even less of a correlation between Total Attendance and Conference Standing than there is between Total Attendance and Win Total (for a particular team).
Disregard the F-statistic and p-value.
Now let’s set up our equation:
From this information, we get the equation:
Total Attendance = -2815(Conference Standing)+1009732
Here are some scenarios using this equation:
Top of the conference (1st place): -2815(1)+1009732=expected total attendance of 1,006,917
Conference wildcard (5th place): -2815(5)+1009732=expected total attendance of 995,657
Bottom of the pack (16th place): -2815(16)+1009732=expected total attendance of 964,692
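Here’s a Python sketch of those scenarios (not the R code from this post), with one extra line showing how far apart the best and worst finishes are. The function name is made up for illustration:

```python
def lr2_total_attendance(conference_standing):
    """Expected Total Attendance from the lr2 equation."""
    return -2815 * conference_standing + 1009732

# 1st place, 5th place (wildcard), and 16th place
for standing in (1, 5, 16):
    print("Seed", standing, "->", f"{lr2_total_attendance(standing):,}", "fans")

# Gap between the top and the bottom of the conference
gap = lr2_total_attendance(1) - lr2_total_attendance(16)
print("Gap between 1st and 16th:", f"{gap:,}", "fans")
```

The gap between 1st and 16th place is only 42,225 fans, which fits with how weak this correlation is.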
Finally, let’s make a graph:
As seen in the graph, few points touch the line (fewer points touch this line than the line for lr1). The line itself has a negative slope, which implies that total attendance DECREASES with WORSE conference standings (or increases with better conference standings). Yes, I know the numbers under conference standing are increasing, but keep in mind that 1 is the best possible conference finish for a team, while 16 is the worst possible finish for a team. One possible reason that total attendance decreases with lower conference standings is that fans are possibly more enticed to come to games for consistently top conference teams and division winners (like the Patriots and Panthers) rather than teams who miss playoffs year after year (like the Jaguars, save for the 2017 squad that made it to the AFC Championship). Interestingly enough, the 2015 4-12 Dallas Cowboys rank second overall in total attendance (16th place in their conference), just behind the 2016 13-3 Dallas Cowboys (first in their conference).
Now let’s make one more model, this time using Average Age of Roster as the independent variable:
Is this model better than lr2? Let’s find out:
The residual standard error is the smallest one of the three (204,600 fans) and thus, the percentage error is the smallest of the three-20.75%.
The Multiple R-Squared and Adjusted R-Squared are smaller than those of lr1 but larger than those of lr2 (0.84% and 0.22% respectively). Thus, Average Age of Roster correlates better with Total Attendance than does Conference Standing, but Win Total correlates the best with Total Attendance.
Once again, disregard the F-Statistic & corresponding p-value.
Now let’s create the equation:
Total Attendance = 36556(Average Age of Roster)+33594
Here are some scenarios using this equation
Roster with mostly rookies and 2nd-years (an average age of 24)=36556(24)+33594=expected total attendance of 910,938
Roster with a mix of newbies and veterans (an average age of 26)=36556(26)+33594=expected total attendance of 984,050
Roster with mostly veterans (an average age of 28)=36556(28)+33594=expected total attendance of 1,057,162
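And here are the same scenarios as a Python sketch (not the R code from this post). The function name is made up for illustration:

```python
def lr3_total_attendance(average_roster_age):
    """Expected Total Attendance from the lr3 equation."""
    return 36556 * average_roster_age + 33594

# Mostly rookies, a mix of newbies and veterans, and mostly veterans
for age in (24, 26, 28):
    print("Average age", age, "->", f"{lr3_total_attendance(age):,}", "fans")
```

With a slope of 36556, each extra year of average roster age adds about 36,500 fans to the expected total.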
And finally, let’s create a graph:
Like the graph for lr2, few points touch the line. As for the line itself, the slope is positive, implying that Total Attendance INCREASES with an INCREASING Average Age of Roster. One possible reason for this is that fans are more interested in coming to games if the team has several veteran stars* (names like Philip Rivers, Tom Brady, Jordy Nelson, Antonio Gates, Rob Gronkowski, Richard Sherman, Julius Peppers, Marshawn Lynch and many more) rather than if the team is full of rookies and/or unknowns (Myles Garrett, Sam Darnold, Josh Rosen, Leighton Vander Esch, among others). Interestingly enough, the team with the oldest roster (the 2018 Oakland Raiders, with an average age of 27.4) has the second lowest total attendance, just ahead of the 2018 LA Chargers (with an average age of 25.8).
*I’ll use any player who has at least 6 seasons of NFL experience as an example of a “veteran star”.
So, which is the best model to use? I’d say lr1 would be the best model to use, because even though it has the highest RSE (1,828,000), it also has the best correlation between the independent and dependent variables (a Multiple R-Squared of 20.87%). All in all, according to my three analyses, a team’s Win Total has the greatest influence on how many fans go to their games (both home and away) during a particular season.
Thanks for reading, and happy Thanksgiving to you all. Enjoy your feasts (and those who are enjoying those feasts with you),