Python Program Demo 1: Using the Jupyter Notebook

Hello everybody,

It’s Michael, and here’s my last post for 2019. Today’s post will be a Python program demo on how to use the Jupyter notebook. I figured a Python program demo would be appropriate since I didn’t do one during my series of Python posts back in August, plus the Jupyter notebook is another great Python IDE (to refresh your memory, IDE stands for Integrated Development Environment, which is the tool you use to write and run code).

  • The word Jupyter is a portmanteau of the words Julia, Python, and R, which are the three programming languages that the Jupyter notebook was originally intended for; Jupyter notebooks now support many more languages. Just a little fun fact for you guys.
    • Also, I didn’t know there was a programming language called Julia. Apparently it’s another data science-oriented language.

Now, if you installed Anaconda, here’s where you would find the Jupyter notebook:

[Screenshot of the Anaconda home page, taken 2019-12-21]

If you want to launch the Jupyter notebook, you would first click on the icon that says Jupyter (not JupyterLab), which can be found on the Anaconda home page (the page shown in this screenshot).

However, keep in mind that you might not see the notebook right away when you click the launch button. In my case, the command prompt opened instead, because the Jupyter notebook wasn’t installed yet (the full Anaconda distribution usually bundles it, so you may be able to skip this step). If that happens to you, install the Jupyter notebook via the command prompt using three simple words: pip install jupyter. You can swap pip for conda if your Anaconda installation came with an Anaconda command prompt (mine didn’t, but that could be because I’m using a Mac).
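If you’re not sure whether the install worked, here’s a minimal sketch that checks from Python itself. It assumes the notebook is importable as a package named notebook (with jupyter_core alongside it), which is my understanding of what pip install jupyter pulls in; is_installed is just a helper name I made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Package names below are assumptions about what the install provides.
for name in ("notebook", "jupyter_core"):
    if is_installed(name):
        print(name, "is installed")
    else:
        print(name, "is missing; try: pip install jupyter")
```

If both lines say "is installed", you shouldn’t need the command prompt step at all.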

Once the installation is complete and the notebook is running, type this URL into your browser: http://localhost:8888/tree. Once you’ve done so, you should see the Jupyter notebook’s landing page:

  • Keep in mind that after you first install the Jupyter notebook, you will need to open it from the Anaconda home page whenever you want to use it in the future. You could also paste this URL (http://localhost:8888/tree) into your browser, but the notebook must already be running (for example, launched from Anaconda) for this to work.
  • Another interesting tidbit about the Jupyter notebook is that even though it runs in a web browser, you can use it without Wi-Fi, because localhost points to your own computer rather than to a website. This would come in handy if you want to work on some Python code while the internet is out at your place.
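To see why the URL only works while the notebook is running, here’s a small sketch that checks whether anything on your own computer is answering on port 8888. The port number comes from the URL above; notebook_url and server_is_listening are hypothetical helper names, and the check only tells you that something is listening there, not that it is definitely Jupyter.

```python
import socket


def notebook_url(host="localhost", port=8888):
    """Build the landing-page URL used in this post."""
    return f"http://{host}:{port}/tree"


def server_is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is running there.
        return False


if server_is_listening():
    print("Something is listening; try opening", notebook_url())
else:
    print("Nothing is listening on port 8888; launch the notebook first.")
```

Notice that nothing here touches the internet; the connection never leaves your machine.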

The home page for the Jupyter notebook simply shows all of your folders; your home page will look different depending on what folders you have in your computer.

Now, to create a new Jupyter notebook you would first click on the New button, then click on the Python 3 option in the dropdown. You could click any of the other three options (Text File, Folder, or Terminal) if you didn’t want a Python file, but for now, I’m going to focus on Python files.

  • Python 3 is the current version of Python (as of December 22, 2019), but depending on when you are reading this post, Python 3 could be long-deprecated by then.

Now, here’s what an empty Python file looks like:

The blank text box is called a cell; this is the place where you would write and run your code. To run your code, click the run button at the top of the page.

  • If you have an error in your code, an error message would be displayed.
  • You can change the Untitled header to a name of your choice.
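If you want something to type into that first cell, here’s a small sketch (not the code from my screenshots). The first half runs cleanly, and the second half triggers an error on purpose so you can see what an error message looks like; undefined_name is deliberately a variable that doesn’t exist.

```python
# A few lines you could paste into a cell and run.
numbers = [3, 1, 4, 1, 5]
total = sum(numbers)
print("Total:", total)

# Using a name that was never defined raises a NameError.
# Catching it here lets the rest of the cell keep running.
try:
    print(undefined_name)
except NameError as err:
    message = str(err)
    print("Error message:", message)
```

Without the try/except, the notebook would stop at the bad line and show the error right below the cell.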

Now, here’s what a Python file looks like with some code:

As you can see, I have some simple test Python code here. Everything ran just fine, but you might be wondering why the text Here is some test code isn’t in a cell. This is because I used a setting called Markdown, which you can see in the dropdown in the toolbar. There are four settings in the dropdown box (Code, Markdown, Raw NBConvert, and Heading), but the two I will use the most are Code and Markdown. Code formats the cells as regular, runnable Python code, while Markdown formats the cells as regular text. Markdown is good if you want to write notes alongside your code (OK, you can also put the hashtag/pound sign in front of a line to turn it into a comment, but Markdown makes for more readable notes).
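Behind the scenes, the difference between a Code cell and a Markdown cell is just a label stored with each cell. Here’s a rough sketch of that idea, assuming the version 4 notebook layout (a list of cells, each with a cell_type); the print statement in the code cell is a stand-in, not the code from my screenshot.

```python
import json

# A Markdown cell holds plain text for notes.
markdown_cell = {
    "cell_type": "markdown",
    "metadata": {},
    "source": ["Here is some test code"],
}

# A Code cell holds runnable Python, plus slots for its output.
code_cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {},
    "outputs": [],
    "source": ["print('Hello from a code cell')"],
}

notebook = {
    "cells": [markdown_cell, code_cell],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
print(json.dumps(markdown_cell, indent=1))
```

Switching the dropdown from Code to Markdown is, in effect, changing that one cell_type value.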

Now here’s what the Jupyter notebook homepage looks like with this test file:

Regarding this picture, here are two important things to know:

  • The green icon means that the notebook is currently running. To shut the notebook down, click on the checkbox to the left of the icon, then click the Shutdown button that appears.
    • For the most part, the Jupyter notebook will save your code automatically, but just in case, click the floppy disk icon before you exit the Python file to ensure that your code is saved (after all, technology isn’t perfect).
  • Test File is saved with a .ipynb extension, which is the extension that all Jupyter notebook files use. One thing to note is that even though .ipynb files will show up amongst all of your other files, you need the Jupyter notebook (or another program that understands the notebook format) to open them as notebooks.
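An .ipynb file is plain JSON text underneath, so Python can peek inside one even without the notebook open. The sketch below writes a tiny stand-in notebook to a temporary folder (so it runs anywhere) and then tallies the cells in every .ipynb file it finds there; count_cell_types is a made-up helper, and the stand-in contents are not my real Test File.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_cell_types(path):
    """Load an .ipynb file as JSON and tally its cells by type."""
    with open(path, encoding="utf-8") as handle:
        notebook = json.load(handle)
    return Counter(cell["cell_type"] for cell in notebook.get("cells", []))


# Write a tiny stand-in notebook so the example is self-contained.
folder = Path(tempfile.mkdtemp())
sample = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Notes"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print(1 + 1)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}
(folder / "Test File.ipynb").write_text(json.dumps(sample), encoding="utf-8")

# Inspect every notebook file in the folder.
for path in sorted(folder.glob("*.ipynb")):
    print(path.name, dict(count_cell_types(path)))
```

To try it on your own notebooks, point folder at the directory that holds them.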

Personally, I think the Jupyter notebook is a neat tool, but I also like Spyder. Jupyter notebooks will work especially well if you’re doing data analytics with Python (Python data analytics series coming in 2020). However, I’d stick to Spyder if I’m trying to make something like an app or a game.

Thank you all for reading and following this blog in 2019. Hope you all have a great holiday season, and I can’t wait to give you more amazing programming/data analytics content in 2020 (yes, just 9 days left in this decade. Crazy stuff).

Happy holidays and see you all in 2020,

Michael

R Analysis 7: Time-Series Data and Streaming Services

Hello everybody,

Michael here, and today’s post will be an R analysis on streaming services using time-series data (yes, it’s my first R post since July 27). For this analysis, I will be analyzing Google Trends data from the past year (12/7/18-12/6/19) for ten major streaming services, two of which haven’t launched yet.

  • Don’t worry guys, I wasn’t paid by any of these services for this post. I just thought this would be an interesting topic to analyze since there seem to be so many streaming services on the way.

As we should always do in R, let’s remember to read our csv of the data (Streaming Services) and understand our variables:

The logic here is the same as that of the time-series analysis I did in R Lesson 9: Time Series Data, except you would replace the names of people with the names of streaming services. The numbers still represent search metrics, and 100 is still the highest while 0 is still the lowest. Also, just as with the aforementioned R post, I replaced any instances of <1 with 0. The Week variable is still the same, and we still have to convert it to a date as follows:

  • The weeks also start on Sunday, so the first week listed is 12/9/2018 while the last week listed is 12/1/2019.
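Here’s a rough sketch of those two cleaning steps (turning <1 into 0 and converting Week to a date), written in Python rather than the R I used for the actual analysis. The three sample rows, the column names other than Week, and the YYYY-MM-DD date format are all assumptions for illustration; only the first and last week dates come from this post.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows standing in for the real CSV.
SAMPLE_CSV = """Week,HBO Max,Netflix
2018-12-09,<1,60
2018-12-16,0,58
2019-12-01,12,55
"""


def clean_value(raw):
    """Treat the '<1' placeholder as 0; otherwise parse the integer."""
    return 0 if raw.strip() == "<1" else int(raw)


rows = []
for record in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    # Convert the Week text into a real date object.
    week = datetime.strptime(record["Week"], "%Y-%m-%d").date()
    values = {name: clean_value(text)
              for name, text in record.items() if name != "Week"}
    rows.append((week, values))

for week, values in rows:
    # %A prints the weekday name, a quick check that weeks start on Sunday.
    print(week, week.strftime("%A"), values)
```

Both 12/9/2018 and 12/1/2019 come out as Sundays, which matches the note above.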

Now that we’ve explained our variables, let’s get ready to graph by first installing the ggplot2 package.

  • Also keep in mind that just because a graph fluctuates a lot, that doesn’t mean all graphs have the same maximum and/or minimum.

Next, let’s start by looking at the graph for the first service listed, HBO Max (this is one of the services that hasn’t launched yet, but will debut in May 2020):

  • Maximum-100
  • Mean-9.6
  • Minimum-0
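Each service below gets the same three numbers: maximum, mean, and minimum. As a minimal sketch of how that summary can be computed (in Python here, not the R from my analysis), with a short list of made-up weekly values rather than the real data:

```python
from statistics import mean


def summarize(weekly_values):
    """Return the maximum, mean (one decimal place), and minimum."""
    return {
        "maximum": max(weekly_values),
        "mean": round(mean(weekly_values), 1),
        "minimum": min(weekly_values),
    }


# Made-up numbers for illustration only, not the HBO Max data.
example_weeks = [0, 5, 100, 10, 3]
print(summarize(example_weeks))
```

A single spike to 100 barely lifts the mean when most weeks sit near zero, which is the pattern in the HBO Max numbers above.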

HBO Max is scheduled to launch in May 2020 (exact date TBA). It is expected to have a massive library of content consisting not only of all of HBO’s programming but also of content owned by WarnerMedia, HBO’s parent company. That includes content from Cartoon Network, TBS, CNN, and other networks.

Three popular shows that will be on HBO Max are Friends, The Big Bang Theory, and South Park. In fact, HBO Max had a search metric of 100 on the week of October 27, 2019, the week the service secured the rights to all 23 seasons of South Park, plus the rights to stream new episodes of South Park 24 hours after they air on Comedy Central.
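Finding the week where a service hits 100 is just a matter of looking up the largest value. Here’s a small Python sketch of that lookup; only the October 27 peak comes from this post, and the other three values are invented so the example has something to compare against.

```python
from datetime import date

# Week start date -> search metric.
weekly_interest = {
    date(2019, 10, 13): 8,    # made-up value
    date(2019, 10, 20): 11,   # made-up value
    date(2019, 10, 27): 100,  # peak week named in this post
    date(2019, 11, 3): 27,    # made-up value
}

# max() with key= returns the week whose metric is largest.
peak_week = max(weekly_interest, key=weekly_interest.get)
print(peak_week, weekly_interest[peak_week])
```

Once you have the peak week, you can go hunting for news from that week, which is what I do for each service in this post.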

  • Maximum-100
  • Mean-25.4
  • Minimum-20

The next service I will be analyzing is Disney+, which launched on November 12, 2019. Even though the average and minimum Disney+ search metrics are higher than those of HBO Max, the overall search metric remains surprisingly low until the week of November 10, 2019 (the week Disney+ launched). I thought Disney+ would’ve trended much higher on the weeks of March 17 and April 7, 2019, as those were the weeks of Disney’s FOX acquisition and Disney+ securing the rights to stream all three decades of The Simpsons, respectively.

  • Maximum-100
  • Mean-55.6
  • Minimum-47

The next service I will analyze is Netflix, which was one of the first services to stream original content. Granted, the service has been around since 1997, but Netflix didn’t start streaming their own content until 2013 (before then, they simply carried other media companies’ content). The average and minimum search metrics are higher than those of either Disney+ or HBO Max, which implies that public interest in Netflix has not waned, despite the rise of multiple other streaming services.

  • Maximum-100
  • Mean-58.2
  • Minimum-44

The next service I will analyze is Hulu, which has a higher average search metric than Netflix but a lower minimum metric. Interestingly, Hulu’s search metric hit 100 on the week of November 10, 2019, the same week Disney+’s metric hit 100. The only theory I can give as to why this is the case is that Disney offered Disney+, Hulu, and ESPN+ as a $12.99/month bundle upon the launch of Disney+ (remember that all three services are owned by Disney).

Another tidbit to note is that Hulu surprisingly didn’t peak on the week of May 12, 2019, when it was announced Disney would be taking full control of Hulu (Disney previously shared a stake in Hulu with other media companies such as Comcast and FOX).

  • Maximum-100
  • Mean-40.9
  • Minimum-20

The next service I will be analyzing is CBS All Access. This service isn’t as broad as the others I previously analyzed with regard to content, as CBS All Access consists of mostly CBS programming such as Criminal Minds and The Late Show with Stephen Colbert. To their credit, they do have original programming, such as Star Trek: Discovery, but they are about to lose streaming rights to The Big Bang Theory to HBO Max. Another thing to note is that the search metric for CBS All Access hit 100 on the week of February 3, 2019, which was the week of Super Bowl LIII (CBS had the rights to broadcast the Super Bowl in 2019, and CBS All Access broadcast the game live).

  • Maximum-100
  • Mean-38.9
  • Minimum-26

The next service I will analyze is Apple TV+, which is Apple’s streaming platform that launched on November 1, 2019, just 11 days before Disney+. Apple TV+ also happens to be the narrowest streaming service with regard to content, as there were only eight original series plus a documentary at launch; unlike the other streaming services analyzed here, Apple TV+ doesn’t carry content from other networks.

An interesting tidbit about Apple TV+ is that its search metric hit 100 on the week of November 10, 2019 (the week of the Disney+ launch), not the week of October 27, 2019, when Apple TV+ launched.
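The dates here are easy to double-check with a little date arithmetic. This Python sketch uses the two launch dates from this post and assumes weeks start on Sunday, as noted earlier; week_start is a made-up helper name.

```python
from datetime import date, timedelta


def week_start(day):
    """Return the Sunday that begins the week containing `day`."""
    # weekday() counts Monday as 0 and Sunday as 6, so shift by one.
    days_since_sunday = (day.weekday() + 1) % 7
    return day - timedelta(days=days_since_sunday)


apple_launch = date(2019, 11, 1)
disney_launch = date(2019, 11, 12)

print((disney_launch - apple_launch).days)  # 11
print(week_start(apple_launch))             # 2019-10-27
print(week_start(disney_launch))            # 2019-11-10
```

So the Apple TV+ launch does fall in the week of October 27, and the Disney+ launch in the week of November 10.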

  • Maximum-100
  • Mean-79
  • Minimum-66

The next service I will analyze is Amazon Prime Video, which is Amazon’s counterpart to Netflix. So far, Amazon Prime Video has the highest mean and minimum search metrics of all the streaming services I analyzed. I find this interesting considering it’s not a new service (it’s been around since 2006) and that, unlike services such as HBO Max, they haven’t acquired the streaming rights to any major shows such as The Big Bang Theory or South Park.

  • Maximum-100
  • Mean-8.4
  • Minimum-0

The next service I will analyze is Peacock, which, like HBO Max, has yet to launch (Peacock is scheduled for an April 2020 debut, exact date TBA). Peacock is NBCUniversal’s streaming service, so it will contain Universal movies (e.g. the Fast and Furious series) and NBC programming such as Parks and Recreation and The Office (remember that NBC and Universal are both part of NBCUniversal).

The mean search metric for Peacock is the lowest of all the streaming services I’ve analyzed so far, and its minimum of 0 ties HBO Max for the lowest. That isn’t surprising given the name and launch date weren’t even announced until the week of September 15, 2019 (when the search metric hit 100). A possible reason for the high search metric that week could be the news that, on top of the launch date and name announcement, the service will be free to pay-TV subscribers (although the free version will have ads).

  • Maximum-100
  • Mean-38.5
  • Minimum-15

The next service I will analyze is DC Universe, which is the most niche service I’ve analyzed so far, as DC Universe carries nothing but DC Comics content: both original programming starring DC Comics characters and DC Comics movies. Given this service’s very narrow scope of content, I’m surprised its mean and minimum search metrics are higher than those of Peacock.

  • Maximum-100
  • Mean-65.5
  • Minimum-45

The last service I will analyze is ESPN+, which, like DC Universe, has a narrow scope of content (albeit their focus is sports content rather than comic book content). However, its mean and minimum search metrics are higher than those for DC Universe.

The only major event for ESPN+ occurred on the week of August 4, 2019, when it was announced that Disney would offer ESPN+, Disney+, and Hulu as a three-service bundle for $12.99/month.
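To wrap up, here’s a quick way to line up all ten services side by side. This Python sketch (again, not the R I used) ranks them by mean search metric using the numbers reported in this post, assuming each Maximum/Mean/Minimum list belongs to the service discussed right after it.

```python
# Service -> (mean, minimum), as reported in this post.
reported = {
    "HBO Max": (9.6, 0),
    "Disney+": (25.4, 20),
    "Netflix": (55.6, 47),
    "Hulu": (58.2, 44),
    "CBS All Access": (40.9, 20),
    "Apple TV+": (38.9, 26),
    "Amazon Prime Video": (79.0, 66),
    "Peacock": (8.4, 0),
    "DC Universe": (38.5, 15),
    "ESPN+": (65.5, 45),
}

# Sort service names from highest mean to lowest.
by_mean = sorted(reported, key=lambda name: reported[name][0], reverse=True)

for rank, name in enumerate(by_mean, start=1):
    mean_value, minimum = reported[name]
    print(f"{rank:>2}. {name:<20} mean={mean_value:<5} min={minimum}")
```

Amazon Prime Video comes out on top and Peacock at the bottom, with the two services that haven’t launched yet taking the last two spots.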

As always, thanks for reading,

Michael