Hello everybody,
It’s Michael, and today’s post is about time series analysis, which looks at data collected over a period of time (for example, the weather in Miami, Florida over the course of 2018, the Cleveland Browns’ records over the last decade, or the price of bitcoin over the last 2 years).
In this post, I will use search data from Google Trends to analyze how often certain famous people are searched. For those who don’t know, Google Trends is a fascinating tool that shows how often something (whether a food, person, event, animal, etc.) is searched on Google over a certain timeframe, going all the way back to January 1, 2004. Google Trends also has several interesting analyses of its own, like The Year in Search (which details the most popular worldwide Google searches in a given year).
Here’s the spreadsheet: Google Trends. (I swapped out the <1s for 0s so that R would read the search columns as ints and not as factors.)
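If you’d rather not edit the spreadsheet by hand, the same <1-to-0 swap can be done in R. Here’s a minimal sketch. It assumes a CSV laid out like the one described in this post: a Week column followed by one column per person. The column names Person.A and Person.B and all the numbers are made-up toy values, not the real Google Trends data. One caveat: read.csv() only turns text columns into factors when stringsAsFactors = TRUE, which was the default before R 4.0.0. Newer versions read them as character instead, but either way a column containing "<1" won’t come in as integers.

```r
# Toy stand-in for the Google Trends CSV (made-up values, not the real data)
csv_text <- "Week,Person.A,Person.B
12/17/17,3,<1
12/24/17,<1,1
12/31/17,5,2"

# stringsAsFactors = TRUE mimics the pre-R-4.0 default that produced factors
trends <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Turn "<1" into "0", then convert to integer. as.character() comes first so a
# factor is converted by its labels rather than by its internal level codes.
lt1_to_zero <- function(x) {
  x <- as.character(x)
  x[x == "<1"] <- "0"
  as.integer(x)
}

# Apply to every column except Week (column 1)
trends[-1] <- lapply(trends[-1], lt1_to_zero)
str(trends)
```

Calling as.integer() directly on a factor would return the level codes instead of the numbers, which is why the helper converts to character first.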
Anyway, I’ll be making several graphs, each comparing two people who have something in common.
Now, let’s load our file and try to understand the data:

This data consists of 52 dates (shown by the Week variable) and the search popularity in the US for 22 people over the last year (from the week of December 17, 2017 to the week of December 9, 2018). The numbers 0-100 are Google Trends’ relative measure of how often a person’s name was searched in a given week. 100 marks the highest point on the chart (the peak week among the terms being compared), and 50 means half as many searches as that peak. 0 means there either wasn’t enough data or the person’s name wasn’t searched at all (and since I replaced the <1s, a 0 can also mean less than 1). All the dates listed are Sundays (12/17/17, 12/24/17, etc.), so in this case a week runs from Sunday to Saturday.
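You can sanity-check that date range in R. This sketch uses base R’s seq() to generate one date per week between the first and last Sundays in the spreadsheet. It doesn’t read the real file. It only confirms the 52-week count and that every date falls on a Sunday. On the real data, running dim() or str() on the loaded data frame should likewise show 52 rows.

```r
# One date per week, from the first Sunday in the data to the last
weeks <- seq(as.Date("2017-12-17"), as.Date("2018-12-09"), by = "week")

length(weeks)                     # 52
# POSIXlt's wday field is 0 for Sunday and doesn't depend on the locale
all(as.POSIXlt(weeks)$wday == 0)  # TRUE
```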
Now, before we start graphing, we need to convert the strings in the Week variable to dates, which is what this line does (more specifically, it reads the strings in month/day/year format, exactly the way they are listed in the spreadsheet).
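Here’s a sketch of that kind of conversion with base R’s as.Date(). The sample strings are hypothetical entries in the same 12/17/17 style as the spreadsheet. The key detail is the format string: %y matches a two-digit year and %Y a four-digit one. Which one the real file needs depends on how its dates are stored, so treat the choice here as an assumption.

```r
# A few hypothetical entries from the Week column, stored as text
week_strings <- c("12/17/17", "12/24/17", "12/31/17")

# %m/%d/%y means month/day/two-digit year; use %Y if the years have four digits
weeks <- as.Date(week_strings, format = "%m/%d/%y")
print(weeks)      # Dates print as year-month-day
class(weeks)      # "Date"

# Common pitfall: %Y with a two-digit year reads "17" as the year 17, not 2017
as.Date("12/17/17", format = "%m/%d/%Y")
```

The format string only tells R how to read the text. Once converted, R stores the values as Date objects and prints them as year-month-day.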
Now it’s time to graph (remember to install the ggplot2 package and load it with library(ggplot2)). I will look at two people who have something in common at a time and compare them.
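Here’s a minimal sketch of one way to draw two people on the same graph with ggplot2. This isn’t necessarily how the graphs in this post were made. The helper plot_pair() and the toy data frame are hypothetical, and the column names and numbers are made up. The code assumes a data frame with a Date column called Week, like the one described above. It uses ggplot2’s .data pronoun so the column names can be passed in as strings, and it only builds the plot if ggplot2 is installed.

```r
# Toy data in the same shape as the spreadsheet (made-up values)
trends <- data.frame(
  Week     = seq(as.Date("2017-12-17"), by = "week", length.out = 6),
  Person.A = c(2L, 3L, 100L, 20L, 5L, 4L),
  Person.B = c(1L, 1L, 1L, 0L, 1L, 1L)
)

# Hypothetical helper: one line per person, coloured and labelled by column name
plot_pair <- function(data, person_a, person_b) {
  ggplot2::ggplot(data, ggplot2::aes(x = Week)) +
    ggplot2::geom_line(ggplot2::aes(y = .data[[person_a]], colour = person_a)) +
    ggplot2::geom_line(ggplot2::aes(y = .data[[person_b]], colour = person_b)) +
    ggplot2::labs(x = "Week", y = "Search metric (0-100)", colour = NULL)
}

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- plot_pair(trends, "Person.A", "Person.B")
  if (interactive()) print(p)  # draw it when running R interactively
}
```

Mapping colour to the person’s name gives each line its own colour and a legend entry without reshaping the data into long format first.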
I’ll start by analyzing Jared Fogle and Bill Cosby, two celebrities who had very public falls from grace and are both currently incarcerated.
As you can see, Bill Cosby was searched more often than Jared Fogle in American Google searches. This is likely because Fogle has been incarcerated for his crimes since November 2015, while Cosby was retried, convicted and ultimately sent to prison within five months (April-September 2018). Cosby had plenty of legal drama this year, which could explain why his graph fluctuates more than Fogle’s. Cosby’s graph also has two major peaks, during the weeks of April 22 and September 23: the weeks he was convicted and sent to prison, respectively.
- The peaks aren’t the only things you should analyze. Check the numbers on the y-axis to get an idea of the maximum search metric. For instance, Jared Fogle’s highest search metric is 1, while Bill Cosby’s is 100. This indicates that more Americans searched Cosby’s name than Fogle’s (Cosby was also the more newsworthy of the two this past year).
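Reading the peak off the y-axis works, but you can also pull the same numbers out directly. This sketch assumes a data frame with a Week column of dates followed by one integer column per person, and the values below are toy numbers, not the real data. max() gives each person’s highest metric. which.max() gives the row of that peak, which is then looked up in Week. Note that which.max() returns the first row when there’s a tie, so a flat series like Person.B reports its first week.

```r
# Toy data: Week plus one column per person (made-up values)
trends <- data.frame(
  Week     = seq(as.Date("2017-12-17"), by = "week", length.out = 6),
  Person.A = c(2L, 3L, 100L, 20L, 5L, 4L),
  Person.B = c(1L, 1L, 1L, 0L, 1L, 1L)
)

people    <- trends[-1]                  # every column except Week
peak_rows <- sapply(people, which.max)   # row of each person's highest week (first one if tied)

peaks <- data.frame(
  person    = names(people),
  max_value = unname(sapply(people, max)),
  peak_week = trends$Week[peak_rows]
)
print(peaks)
```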
Now let’s analyze US search trends for Kyle Kulinski and Ana Kasparian, two famous left-wing commentators. Kulinski hosts the Secular Talk YouTube channel, while Kasparian is a member of the progressive YouTube news channel The Young Turks.
As you can see, even though Kasparian’s graph fluctuates more than Kulinski’s, Kulinski’s name was searched more often than Kasparian’s. His metric goes above 75 twice, while hers never exceeds 25. The gap between the two could be because more people subscribe to Secular Talk than to The Young Turks (I’m just theorizing here).
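Both comparisons in that paragraph can be put into numbers: how much a graph fluctuates, and how often it goes above a level like 75. This sketch uses made-up toy vectors rather than the real Kulinski and Kasparian columns. Using the standard deviation from sd() as the measure of fluctuation is an assumption, since it’s only one simple option. sum(x > 75) counts the weeks above 75, because R treats TRUE as 1 when summing.

```r
# Made-up weekly metrics: A is steady but high, B bounces around but stays low
person_a <- c(70L, 70L, 76L, 70L, 78L, 70L)
person_b <- c(2L, 24L, 1L, 23L, 3L, 25L)

# Standard deviation as a rough measure of how much each graph fluctuates
round(c(a = sd(person_a), b = sd(person_b)), 1)

# Number of weeks each person's metric went above 75
c(a = sum(person_a > 75), b = sum(person_b > 75))

# Highest metric for each person
c(a = max(person_a), b = max(person_b))
```

A graph can fluctuate more and still sit lower overall, which is the same pattern described for Kasparian and Kulinski.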
Now let’s analyze the search metric history for Mike Shinoda and Chester Bennington, two members of Linkin Park.
As you can see, Chester Bennington’s highest metric is 100, while Mike Shinoda’s highest metric is 44. I’m guessing Bennington’s metric is higher because many people still listen to Linkin Park’s music (and hear his voice) after his death. Also worth noting is that Bennington’s metric peaked during the week of July 15, around the one-year anniversary of his death on July 20, 2017. Shinoda’s metric peaked during the week of June 17, which was around when his solo album Post Traumatic (which he made after Bennington’s death) was released in its entirety.
Now it’s time to compare the American search metric history for JaMarcus Russell and Ryan Leaf, two of the biggest NFL busts of all time (and both quarterbacks).
As you can see, Leaf’s graph fluctuates more than Russell’s, but Russell’s graph peaks at 100 while Leaf’s only peaks at 26. Then again, Russell’s average search metric is only 5.7 and Leaf’s is 5.3, meaning neither name pops up much in US Google searches. One thing that could explain Russell’s peak of 100 during the week of November 4 is this article with an interesting story about him: https://bleacherreport.com/articles/2804453-david-diehl-raiders-gave-jamarcus-russell-blank-tapes-to-see-if-qb-watched-film.
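The averages above come from taking the mean of each person’s weekly column. This sketch uses made-up toy values, not Russell’s or Leaf’s real numbers, to show why a tall peak and a low average can go together. One spike to 100 raises the mean only a little when the other weeks sit near 1. The median ignores how tall the spike is and stays at 1. The median is an extra measure I’ve added for illustration, not part of the original analysis.

```r
# Made-up weekly metrics: one big spike vs. smaller, more frequent bumps
person_a <- c(1, 1, 1, 100, 1, 1, 1, 1, 1, 1)
person_b <- c(5, 10, 26, 8, 4, 6, 9, 12, 3, 7)

# Average weekly metric, rounded to one decimal place like in the post
round(c(a = mean(person_a), b = mean(person_b)), 1)   # a = 10.9, b = 9.0

# The median is barely affected by a single spike
c(a = median(person_a), b = median(person_b))         # a = 1, b = 7.5
```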
Now let’s compare the search metric histories of Dwyane Wade and Hassan Whiteside, two current Miami Heat players.
As you can see, Wade’s graph peaks much higher than Whiteside’s (100 to 20). This is likely because Wade had a more eventful year: he returned to the Heat (week of February 4), announced that the 2018-19 season would be his last (week of September 16), welcomed another baby (week of November 4), and played in his 1000th career game (week of December 9).
Now it’s time to analyze the search metric histories for Samuel J. Comroe and Shin Lim, two contestants on America’s Got Talent (AGT) Season 13. Comroe is a stand-up comedian who finished in 4th place, while Lim is a close-up magician who won the season.
As you can see, Shin Lim’s peak is much higher than Samuel J. Comroe’s (100 to 8). Neither contestant’s graph fluctuates much, but both peak during the week of September 16. That was the week of the AGT finals, where both Comroe and Lim finished in the Top 5.
Now let’s analyze the search metric histories for Tom Brady and Nick Foles, the two starting quarterbacks in Super Bowl LII.
As you can see, neither QB’s graph fluctuates much. Both graphs peak during the weeks of January 21 (AFC/NFC Championships) and February 4 (Super Bowl LII). Interestingly enough, Brady’s graph has the higher peak (100 to Foles’s 54), even though Foles and the Eagles won the Super Bowl. I guess this means Brady is still the more popular of the two QBs (after all, Foles was the backup who took over after the Eagles lost their starting QB, Carson Wentz, to injury).
Now it’s time to analyze Alexandria Ocasio-Cortez and Rick Scott, two politicians who were elected to Congress in the 2018 midterm elections. Ocasio-Cortez (D-NY) was elected to the House and Scott (R-FL) was elected to the Senate.
Both Scott’s and Ocasio-Cortez’s graphs have relatively high peaks (100 for Scott and 61 for Ocasio-Cortez), since both had quite eventful elections. Ocasio-Cortez’s graph peaks during two weeks. The week of June 24 was her stunning primary upset over 10-term Democrat Joe Crowley, and the week of November 4 was when she was elected to the House. Scott’s graph also peaks during the week of November 4, the week he was elected to the Senate. That was right before the tense recount between him and incumbent Bill Nelson, after which Scott was confirmed the winner. One reason I think Scott’s graph has the higher peak is that his name is the more recognized of the two. After all, Scott was governor of Florida when he was elected to the Senate, while Ocasio-Cortez was a relatively unknown bartender when she won her primary and, eventually, her House seat.
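You can also spot more than one peak in a graph in code, like the two peaks in Ocasio-Cortez’s graph. This minimal sketch uses a hypothetical helper called local_peaks() and made-up values. It flags any week that is strictly higher than the weeks on either side and is at least min_height, an assumed cutoff for ignoring small bumps. Because the comparison is strict, it misses flat-topped peaks, and it never flags the first or last week.

```r
# Hypothetical helper: indices of weeks higher than both neighbours
# and at least min_height (strict comparison, so plateaus are skipped)
local_peaks <- function(x, min_height = 0) {
  n <- length(x)
  if (n < 3) return(integer(0))
  i <- 2:(n - 1)
  i[x[i] > x[i - 1] & x[i] > x[i + 1] & x[i] >= min_height]
}

# Made-up weekly metrics with two separate spikes
weeks  <- seq(as.Date("2018-06-10"), by = "week", length.out = 9)
metric <- c(2, 3, 40, 5, 4, 6, 61, 10, 3)

peak_idx <- local_peaks(metric, min_height = 20)
weeks[peak_idx]    # dates of the two spikes
metric[peak_idx]   # 40 61
```

Unlike just taking the two largest values, this won’t report two neighbouring weeks from the same spike as separate peaks.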
Now it’s time to analyze Meghan Markle and Kate Middleton, two women who had very public (and televised) royal weddings (Markle’s this year and Middleton’s in 2011). Their husbands also happen to be brothers: Prince William (Middleton’s husband) and Prince Harry (Markle’s husband).
Markle’s graph has a much higher peak than Middleton’s (100 to 17), most likely because Markle’s royal wedding was this year. Unsurprisingly, Markle’s graph peaks during the week of May 13, the week of her royal wedding. There could be other reasons Markle’s graph peaks higher, too. She is one of the few Americans to marry into British royalty; Wallis Simpson, who married the former King Edward VIII in 1937 after his abdication, is another notable example. She is also one of the first biracial royal fiancées, and she is older than Prince Harry (royal grooms are usually older than their brides). Finally, she was already well known in the US from her acting career, including on Suits.
The next analysis compares Fred Guttenberg and Andrew Pollack, two Parkland parent-activists who lost their daughters in the Stoneman Douglas shooting.
Both parents have high peaks (Pollack at 100, Guttenberg at 61), likely because both have appeared on media outlets (CNN, Fox News, etc.) many times since the shooting. One reason I think Pollack’s graph peaks higher than Guttenberg’s is that, unlike many of the Parkland students and parents, he isn’t campaigning for tighter gun laws. A photo of Pollack in a Trump 2020 shirt also got considerable attention in the days after the shooting, which could also explain the higher peak.
Finally, let’s analyze Mikaela Shiffrin and Maia Shibutani, two athletes who competed in this year’s Winter Olympics in PyeongChang. Shiffrin is an alpine skier who specializes in slalom, while Shibutani is a figure skater who competes in ice dance with her older brother Alex.
Both graphs are pretty flat, except for a single peak each: Shiffrin’s during the week of February 11 and Shibutani’s during the week of February 18, both during the 2018 Winter Olympics. Shiffrin’s peak is much higher, though (100 compared to Shibutani’s 7), likely because Shiffrin won a gold and a silver while Shibutani won two bronzes.
Now, before I go, remember that just because a graph fluctuates a lot doesn’t mean its search metric is high. R (ggplot2, to be exact) sets each graph’s y-axis scale from the range of values being plotted, so a graph that never goes above 1 can look just as jumpy as one that reaches 100.
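If you want graphs that can be compared at a glance, one option is to pin every y-axis to the full 0-100 range instead of letting ggplot2 pick it from the data. This sketch assumes ggplot2 is installed, and the data frame and its Person.B column are made-up toy values. coord_cartesian(ylim = ...) zooms the view without dropping any data points. Setting limits in scale_y_continuous() instead would turn values outside the limits into NA.

```r
# Made-up weekly metric that never gets above 1
trends <- data.frame(
  Week     = seq(as.Date("2017-12-17"), by = "week", length.out = 5),
  Person.B = c(1L, 0L, 1L, 1L, 0L)
)

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(trends, ggplot2::aes(x = Week, y = Person.B)) +
    ggplot2::geom_line() +
    # Show the full 0-100 range so a tiny metric looks tiny
    ggplot2::coord_cartesian(ylim = c(0, 100)) +
    ggplot2::labs(x = "Week", y = "Search metric (0-100)")
  if (interactive()) print(p)
}
```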
Thanks for reading and happy holidays,
Michael