Marketing Exploration

Questions

This is a list of hyphotetical marketing-related questions about the DVD rental store under study. In the next section I provide an answer for each of these questions by running queries against the Sakila database.

  1. Where are our most popular stores?
  2. What is the total revenue of all of our stores?
  3. Which stores make the most money? Which make the least?
  4. What are the most popular films that customers rent?
  5. What is the average rental duration?
  6. Do we make the most money from long or short rentals?
  7. We want to acquire more films. Which genres are most popular?
  8. We want to hire an actor to do ads for us. Which actor is in the most films?
  9. We want to hire an actor to do ads for us. Which actors/actresses are most popular given our rental history?
  10. Pick the top customers. Given the films they’ve rented, which new ones should we suggest to them?

Answers

First things first, how many stores are there?

SELECT COUNT(DISTINCT store_id)
FROM store
;

+--------------------------+
| COUNT(DISTINCT store_id) |
+--------------------------+
|                        2 |
+--------------------------+

There is a total of two DVD rental stores.

Assuming that the most popular store is the one that has the biggest number of customers, I aggregate the number of customers per store by running the following query:

SELECT
  store_id
  ,address
  ,city
  ,country
  ,COUNT(DISTINCT customer_id) AS customer_number
FROM store
  INNER JOIN address USING(address_id)
  INNER JOIN city USING(city_id)
  INNER JOIN country USING(country_id)
  INNER JOIN customer USING(store_id)
GROUP BY store_id
;
Store IDAddressCityCountryNumber of Customers
147 MySakila DriveLethbridgeCanada326
228 MySQL BoulevardWoodridgeAustralia273

The store located in Lethbridge, Canada, is the most popular store according to the number of customers.


2. What is the total revenue of all of our stores?

SELECT SUM(amount) 
FROM payment
;

+-------------+
| SUM(amount) |
+-------------+
|    67416.51 |
+-------------+

The total revenue from all stores is $67416.51


3. Which stores make the most money? Which make the least?

The following table breaks down the total revenue per store.

SELECT
  store_id
  ,SUM(amount) AS total_sales
FROM store
  INNER JOIN staff USING(store_id)
  INNER JOIN payment USING(staff_id)
GROUP BY store_id
;

+----------+-------------+
| store_id | total_sales |
+----------+-------------+
|        1 |    33489.47 |
|        2 |    33927.04 |
+----------+-------------+

And we can see that the store in Australia (store_id 2) makes the most money even when it’s not the store with the most number of customers.

The store that makes the least money is the store in Canada (store_id 1).


SELECT
    film_id
  ,film.title
  ,category.name AS category
  ,COUNT(*) AS times_rented
FROM inventory
  LEFT JOIN rental USING(inventory_id)
  LEFT JOIN film USING(film_id)
  LEFT JOIN film_category USING(film_id)
  LEFT JOIN category USING(category_id)
GROUP BY film_id
ORDER BY 4 DESC -- times rented
LIMIT 10
;

Here is a top 10 of the most popular films that customers rent:

Film IDFilm TitleCategoryTimes Rented
103BUCKET BROTHERHOODTravel34
738ROCKETEER MOTHERForeign33
489JUGGLER HARDLYAnimation32
382GRIT CLOCKWORKGames32
331FORWARD TEMPLEGames32
767SCALAWAG DUCKMusic32
730RIDGEMONT SUBMARINENew32
891TIMBERLAND SKYClassics31
621NETWORK PEAKFamily31
369GOODFELLAS SALUTESci-Fi31

5. What is the average rental duration?

SELECT AVG(DATEDIFF(return_date, rental_date)) AS avg_rental_time
FROM rental
;

+-----------------+
| avg_rental_time |
+-----------------+
|          5.0252 |
+-----------------+

The average rental duration is aproximately 5 days.

Note that DATEDIFF() expresses the difference as a value in days. If you’d like to have the result in hours, minutes or other unit of measure, you can use TIMESTAMPDIFF().


6. Do we make the most money from long or short rentals?

Since the average rental duration is 5 days, I will consider rentals that last less than 5 days as short, and rentals of 5 or more days as long rentals.

SELECT SUM(amount)
FROM payment
  INNER JOIN rental USING(rental_id)
WHERE DATEDIFF(return_date, rental_date) >= 5.0252
;

+-------------+
| SUM(amount) |
+-------------+
|    39780.39 |
+-------------+

The total revenue from long rentals is $39780.39

SELECT SUM(amount)
FROM payment
  INNER JOIN rental USING(rental_id)
WHERE DATEDIFF(return_date, rental_date) < 5.0252
;

+-------------+
| SUM(amount) |
+-------------+
|    27108.00 |
+-------------+

The total revenue from short rentals is $27108.00

The store makes more money from the long rentals.

We can get the same result writing one single query:

SELECT
    SUM(CASE
  WHEN DATEDIFF(return_date, rental_date) >= 5.0252 THEN amount
  ELSE 0
  END) AS long_rental_revenue
  , SUM(CASE
  WHEN DATEDIFF(return_date, rental_date) < 5.0252 THEN amount
  ELSE 0
  END) AS short_rental_revenue
FROM payment
INNER JOIN rental USING(rental_id)
;

+---------------------+----------------------+
| long_rental_revenue | short_rental_revenue |
+---------------------+----------------------+
|            39780.39 |             27108.00 |
+---------------------+----------------------+

Notice: If we add up the revenue from shorts and long rentals, the total value obtained is $66888.39. When I answered question 2, What is the total revenue of all of our stores?, I found that the total revenue from all stores was $67416.51. There is a difference of $528.12 between both values. Why?

I have one theory, and it’s that the the rental_id in the payment table can be NULL and it’s not matching all the rental_id values in the rental table (rental_id is not a foreign key in the payment table, so it can have NULL values).

SELECT SUM(amount)
FROM payment
WHERE rental_id IS NULL
;

+-------------+
| SUM(amount) |
+-------------+
|        9.95 |
+-------------+

The value I obtained from running that query is $9.95.

I’m still missing $518.17 to complete the $528.12 difference. I try this query:

SELECT SUM(amount)
FROM payment
  INNER JOIN rental USING(rental_id)
WHERE rental_date IS NULL OR return_date IS NULL
;

+---------------+
| SUM(amount) |
+---------------+
|        518.17 |
+---------------+

In this case I thought that maybe the rental_date or the return_date in the rental table could be NULL, and the value I obtained is $518.17, which is exactly the amount I was missing.

To display the NULL values I run:

SELECT
  amount
  ,rental_date
  ,return_date
FROM payment
  INNER JOIN rental USING(rental_id)
WHERE rental_date IS NULL OR return_date IS NULL
LIMIT 10;
PaymentRental DateReturn Date
0.992006-02-14 15:16:03\N
4.992006-02-14 15:16:03\N
0.992006-02-14 15:16:03\N
4.992006-02-14 15:16:03\N
3.982006-02-14 15:16:03\N
0.002006-02-14 15:16:03\N
2.992006-02-14 15:16:03\N
4.992006-02-14 15:16:03\N
2.992006-02-14 15:16:03\N
2.992006-02-14 15:16:03\N

And I can see that in all the cases the return date is NULL, which means that those films have not been returned yet but maybe the customer payed in advance.


SELECT
  category.name AS category
  ,COUNT(rental_id) AS times_rented
FROM rental
  INNER JOIN inventory USING(inventory_id)
  INNER JOIN film USING(film_id)
  INNER JOIN film_category USING(film_id)
  INNER JOIN category USING(category_id)
GROUP BY name
ORDER BY 2 DESC -- times_rented
;
CategoryTimes Rented
Sports1179
Animation1166
Action1112
Sci-Fi1101
Family1096
Drama1060
Documentary1050
Foreign1033
Games969
Children945
Comedy941
New940
Classics939
Horror846
Travel837
Music830

The table above lists the total number of rentals divided per film category. The three most popular categories are sports, animation and action; if the company wants to buy more films I would advise to look at those categories first, and then go down the list.


8. We want to hire an actor to do ads for us. Which actor is in the most films?

SELECT
  actor_id
  ,first_name
  ,last_name
  ,COUNT(DISTINCT film_id) AS number_of_films
FROM film_actor
  INNER JOIN actor USING(actor_id)
GROUP BY actor_id
ORDER BY 4 DESC  -- number_of_films
LIMIT 10
;
Actor IDFirst NameLast NameNumber of Films
107GINADEGENERES42
102WALTERTORN41
198MARYKEITEL40
181MATTHEWCARREY39
23SANDRAKILMER37
81SCARLETTDAMON36
106GROUCHODUNST35
60HENRYBERRY35
13UMAWOOD35
37VALBOLGER35

Gina Degeneres (actor_id 107), is the actress in most films, followed by Walter Torn (actor_id 102), and Mary Keitel (actor_id 198). If the store wants to hire an actor/actress to do ads, I would advise to reach out to those three actors first.


SELECT
    actor_id
  ,first_name
  ,last_name
  ,COUNT(rental_id) AS times_rented
FROM rental
  INNER JOIN inventory USING(inventory_id)
  INNER JOIN film USING(film_id)
  INNER JOIN film_actor USING(film_id)
  INNER JOIN actor USING(actor_id)
GROUP BY actor_id
ORDER BY 4 DESC -- times_rented
LIMIT 10
;
Actor IDFirst NameLast NameNumber of Times Rented
107GINADEGENERES753
181MATTHEWCARREY678
198MARYKEITEL674
144ANGELAWITHERSPOON654
102WALTERTORN640
60HENRYBERRY612
150JAYNENOLTE611
37VALBOLGER605
23SANDRAKILMER604
90SEANGUINESS599

Given the store rental history, the most popular actresses are Gina Degeneres (actor_id 107), Matthew Carrey (actor_id 181), and Mary Keitel (actor_id 198).

Note: Is this table correct? Let’s corroborate the data with a few separate queries.

This query gives me all the films in which the actress with actor_id 107 performs.

SELECT *
FROM film_actor
WHERE actor_id = 107
LIMIT 10
;

This query shows all the copies of the previous films in inventory in the different stores.

SELECT inventory_id
FROM inventory
WHERE film_id IN (SELECT * 
    FROM film_actor 
    WHERE actor_id = 107)
;

Now let’s count how many times those films where rented.

SELECT COUNT(*) 
FROM rental 
WHERE inventory_id IN (SELECT inventory_id 
    FROM inventory 
    WHERE film_id IN (SELECT film_id 
        FROM film_actor 
	WHERE actor_id = 107))
;

+----------+
| COUNT(*) |
+----------+
|      753 |
+----------+

The number obtained is 753 and it’s the same we can see in the summary table that we wanted to corroborate.


10. Pick the top customers. Given the films they’ve rented, which new ones should we suggest to them?

NOTE: I have to learn more SQL tricks to be able to answer this question appropriately - I’m working on it!

However, this is a first attempt…

First let’s pick our top customers.

SELECT COUNT(customer_id) 
FROM customer
;

+--------------------+
| COUNT(customer_id) |
+--------------------+
|                599 |
+--------------------+

There is a total of 599 customers considering all the rental stores.

The following query results in a table with the total number of films rented per customer. I’ll consider that the top customers are the ones that rented the most films.

SELECT
  customer_id
  ,count(rental_id) AS times_rented
FROM rental
GROUP BY customer_id
ORDER BY times_rented DESC
LIMIT 5
;
Customer IDTotal Number of Rentals
14846
52645
23642
14442
7541

The top customers are the ones with customer_id: 148, 526, 236, 144, and 75.

One approach would be to find the film categories that these customers usually acquire an suggest more films in those categories.

Let’s look at the categories that customer 148 prefers.

SELECT
  customer_id
  ,name
  ,COUNT(name) AS times_rented
FROM rental
  INNER JOIN inventory USING(inventory_id)
  INNER JOIN film USING(film_id)
  INNER JOIN film_category USING(film_id)
  INNER JOIN category USING(category_id)
WHERE customer_id = 148
GROUP BY name
ORDER BY times_rented DESC
;
Customer IDFilm CategoryTimes Rented
148Sci-Fi7
148Family6
148Travel5
148Classics4
148Horror4
148Foreign4
148Games3
148Documentary3
148Action3
148New2
148Comedy2
148Sports1
148Drama1
148Music1

In this case I would suggest more films in the Sci-Fi and Family categories.