Automatically adding IMDb TV Shows information to a Notion database

Automatically adding IMDb TV Shows information to a Notion database

Netflix, Amazon Prime Video, Disney+ and more. We have never had so many options of TV shows to watch at our disposal. At the same time picking the best next candidate has become increasingly difficult, we want to get the best out of the time we invest in front of the TV but at the same time we don’t want to always allow Netflix to choose what we should watch next for us. If you find yourself in a similar situation, in this post we will check how to keep track of a TV Shows watch list and get information from IMDb so we can decide what is going to be our next marathon candidate.

Template

image

We searched for the best Notion database template to track our TV Shows and we think the free TV Show Tracker template by Noelle is one of the best. She wrote a quick blog post about it in her blog explaining her motivations and how she came up with the estructure. If you find the template useful consider buying her a coffee as a show of appreciation.

We will be making a couple of changes to the database to add additional information. We want to show the rating from IMDb and current popularity for each TV Show, so we need to create two additional properties of type number. Notion2Sheets requires the database to have both Created time and Last edited time, so go ahead and add those four additional columns to your database.

image

Synchronizing the database with Google Sheets

We plan to get the IMDb rating and popularity from IMDb and update our Notion database with that information. Doing this manually would take some time and it is a tedious task. To save time we will use a feature from Notion2Sheets called synced columns. Synced columns allow us to send data from Google Sheets to Notion, think of it as something similar to a formula property in Notion, where the data comes from Google Sheets.

If you don’t have Notion2Sheets installed yet, you can do it so from Google Workspace Marketplace and watch this 3 minutes YouTube video on how to use it.


Using IMPORTXML to get the information we want from IMDb

If you haven’t heard about =IMPORTXML formula before, it basically allows you to import data from any of various structured data types including XML, HTML and CSV among others. In our case we are going to use it to extract the rating and popularity from IMDb, here you can see where the information is for Prison Break.

image

IMPORTXML accepts two parameters, the first one a url pointing to the HTML page that you want to parse, and the second one is the xpath to query the HTML document and get the data that we want. It sounds complicated but with the help of Chrome we will be able to get the path easily and the only thing we will have to do is a copy and paste.

The way of achieving it is doing a right click on the element where the data is, in our case the IMDb rating, and then selecting Inspect which will open the development tools and allow us to see the HTML document. As you can see in the next screenshot, there is an element that contains the 8.3 value that we want to extract. We then do a right click -> Copy -> Copy XPath.

image

The result of this action will be the xpath that we will use with IMPORTXML to get the information, it looks like this: //*[@id="__next"]/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[1]/a/div/div/div[2]/div[1]/span[1]. We will make a small adjustment replacing the double quotes for single quotes and repeating the same process for the popularity. As a bonus we will do the same for episodes and seasons. If you are having troubles during this step, here you can see the four xpath that we will be using.

  1. IMDb rating: //*[@id='__next']/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[1]/a/div/div/div[2]/div[1]/span[1]
  2. Popularity: //*[@id='__next']/main/div/section[1]/section/div[3]/section/section/div[1]/div[2]/div/div[3]/a/div/div/div[2]/div[1]
  3. Episodes: //*[@id='__next']/main/div/section[1]/div/section/div/div[1]/section[1]/div[1]/a/h3/span
  4. Seasons: //*[@id='browse-episodes-season']

Getting the data in Google Sheets

We have our database in sync with Google Sheets and we now know how to get the data that we want. The only manual task that we will need for each TV Show is getting the IMDb link so we can extract the information. IMDb search functionality is great and super fast, it took me 2 minutes to get the links for all the US TV shows in the database.

To improve the clarity of our formulas, let’s create a sheet named XPaths and copy a paste each XPath in it so we can reference them in IMPORTXML easily. We can also use named ranges to give the cell a name.

image

If you have come this far congratulations, hopefully you have learnt some new tricks to add to your tools belt. Now comes the easy part, adding the formulas to our columns and let Google Sheets do the work for us. In this example G is the column where we have the urls to IMDb.

  1. IMDb rating: =IMPORTXML(G2, Rating)
  2. Popularity: =IMPORTXML(G2, Popularity)
  3. Episodes: =IMPORTXML(G2, Episodes)
  4. Seasons: =INDEX(IMPORTXML(G2, Seasons), 1, 2)

You can click to open the GIF full screen.

image

Sending the data to Notion

The last and most important step, we want to send all this information to Notion. For now Notion2Sheets only allows the creation of one synced column for free, but you can still send the data for all four columns!

The process will be simple, we will create a synced column first for IMDb rating and once we get the email confirmation that the sync has finished we will remove the synced column. We will repeat this process for the remaining three columns.

If you want to keep one of the columns in sync and updated automatically, for example Popularity (which is probably the one that changes more frequently) then create that one last and then don’t remove it. Notion2Sheets will update the values in Notion automatically hourly. You can check the process of creating the synced columns and how Notion gets updated in the next video.

And we are done 🙌 ! We have our Notion TV Shows Tracker database with number of seasons, episodes, IMDb rating and popularity populated.

Recap

We saw how is it possible to scrape data from a website using IMPORTXML formula from Google Sheets and how to get the xpath of what we want using Chrome Inspect feature. We used named ranges to give some clarity to our formulas and how we can send that data back into Notion for free using Notion2Sheets.

Do you want us to write about a specific use case? Let us know!



Subscribe to get amazing Notion and Sheets related content