Marketing Exploration
Questions
This is a list of hyphotetical marketing-related questions about the DVD rental store under study. In the next section I provide an answer for each of these questions by running queries against the Sakila database.
- Where are our most popular stores?
- What is the total revenue of all of our stores?
- Which stores make the most money? Which make the least?
- What are the most popular films that customers rent?
- What is the average rental duration?
- Do we make the most money from long or short rentals?
- We want to acquire more films. Which genres are most popular?
- We want to hire an actor to do ads for us. Which actor is in the most films?
- We want to hire an actor to do ads for us. Which actors/actresses are most popular given our rental history?
- Pick the top customers. Given the films they’ve rented, which new ones should we suggest to them?
Answers
1. Where are our most popular stores?
First things first, how many stores are there?
SELECT COUNT(DISTINCT store_id)
FROM store
;
+--------------------------+
| COUNT(DISTINCT store_id) |
+--------------------------+
| 2 |
+--------------------------+
There is a total of two DVD rental stores.
Assuming that the most popular store is the one that has the biggest number of customers, I aggregate the number of customers per store by running the following query:
SELECT
store_id
,address
,city
,country
,COUNT(DISTINCT customer_id) AS customer_number
FROM store
INNER JOIN address USING(address_id)
INNER JOIN city USING(city_id)
INNER JOIN country USING(country_id)
INNER JOIN customer USING(store_id)
GROUP BY store_id
;
Store ID | Address | City | Country | Number of Customers |
---|---|---|---|---|
1 | 47 MySakila Drive | Lethbridge | Canada | 326 |
2 | 28 MySQL Boulevard | Woodridge | Australia | 273 |
The store located in Lethbridge, Canada, is the most popular store according to the number of customers.
2. What is the total revenue of all of our stores?
SELECT SUM(amount)
FROM payment
;
+-------------+
| SUM(amount) |
+-------------+
| 67416.51 |
+-------------+
The total revenue from all stores is $67416.51
3. Which stores make the most money? Which make the least?
The following table breaks down the total revenue per store.
SELECT
store_id
,SUM(amount) AS total_sales
FROM store
INNER JOIN staff USING(store_id)
INNER JOIN payment USING(staff_id)
GROUP BY store_id
;
+----------+-------------+
| store_id | total_sales |
+----------+-------------+
| 1 | 33489.47 |
| 2 | 33927.04 |
+----------+-------------+
And we can see that the store in Australia (store_id 2
) makes the most money even when it’s not the store with the most number of customers.
The store that makes the least money is the store in Canada (store_id 1
).
4. What are the most popular films that customers rent?
SELECT
film_id
,film.title
,category.name AS category
,COUNT(*) AS times_rented
FROM inventory
LEFT JOIN rental USING(inventory_id)
LEFT JOIN film USING(film_id)
LEFT JOIN film_category USING(film_id)
LEFT JOIN category USING(category_id)
GROUP BY film_id
ORDER BY 4 DESC -- times rented
LIMIT 10
;
Here is a top 10 of the most popular films that customers rent:
Film ID | Film Title | Category | Times Rented |
---|---|---|---|
103 | BUCKET BROTHERHOOD | Travel | 34 |
738 | ROCKETEER MOTHER | Foreign | 33 |
489 | JUGGLER HARDLY | Animation | 32 |
382 | GRIT CLOCKWORK | Games | 32 |
331 | FORWARD TEMPLE | Games | 32 |
767 | SCALAWAG DUCK | Music | 32 |
730 | RIDGEMONT SUBMARINE | New | 32 |
891 | TIMBERLAND SKY | Classics | 31 |
621 | NETWORK PEAK | Family | 31 |
369 | GOODFELLAS SALUTE | Sci-Fi | 31 |
5. What is the average rental duration?
SELECT AVG(DATEDIFF(return_date, rental_date)) AS avg_rental_time
FROM rental
;
+-----------------+
| avg_rental_time |
+-----------------+
| 5.0252 |
+-----------------+
The average rental duration is aproximately 5 days.
Note that DATEDIFF() expresses the difference as a value in days. If you’d like to have the result in hours, minutes or other unit of measure, you can use TIMESTAMPDIFF()
.
6. Do we make the most money from long or short rentals?
Since the average rental duration is 5 days, I will consider rentals that last less than 5 days as short, and rentals of 5 or more days as long rentals.
SELECT SUM(amount)
FROM payment
INNER JOIN rental USING(rental_id)
WHERE DATEDIFF(return_date, rental_date) >= 5.0252
;
+-------------+
| SUM(amount) |
+-------------+
| 39780.39 |
+-------------+
The total revenue from long rentals is $39780.39
SELECT SUM(amount)
FROM payment
INNER JOIN rental USING(rental_id)
WHERE DATEDIFF(return_date, rental_date) < 5.0252
;
+-------------+
| SUM(amount) |
+-------------+
| 27108.00 |
+-------------+
The total revenue from short rentals is $27108.00
The store makes more money from the long rentals.
We can get the same result writing one single query:
SELECT
SUM(CASE
WHEN DATEDIFF(return_date, rental_date) >= 5.0252 THEN amount
ELSE 0
END) AS long_rental_revenue
, SUM(CASE
WHEN DATEDIFF(return_date, rental_date) < 5.0252 THEN amount
ELSE 0
END) AS short_rental_revenue
FROM payment
INNER JOIN rental USING(rental_id)
;
+---------------------+----------------------+
| long_rental_revenue | short_rental_revenue |
+---------------------+----------------------+
| 39780.39 | 27108.00 |
+---------------------+----------------------+
Notice: If we add up the revenue from shorts and long rentals, the total value obtained is $66888.39. When I answered question 2, What is the total revenue of all of our stores?, I found that the total revenue from all stores was $67416.51. There is a difference of $528.12 between both values. Why?
I have one theory, and it’s that the the rental_id in the payment
table can be NULL and it’s not matching all the rental_id
values in the rental
table (rental_id
is not a foreign key in the payment
table, so it can have NULL values).
SELECT SUM(amount)
FROM payment
WHERE rental_id IS NULL
;
+-------------+
| SUM(amount) |
+-------------+
| 9.95 |
+-------------+
The value I obtained from running that query is $9.95.
I’m still missing $518.17 to complete the $528.12 difference. I try this query:
SELECT SUM(amount)
FROM payment
INNER JOIN rental USING(rental_id)
WHERE rental_date IS NULL OR return_date IS NULL
;
+---------------+
| SUM(amount) |
+---------------+
| 518.17 |
+---------------+
In this case I thought that maybe the rental_date
or the return_date
in the rental
table could be NULL, and the value I obtained is $518.17, which is exactly the amount I was missing.
To display the NULL values I run:
SELECT
amount
,rental_date
,return_date
FROM payment
INNER JOIN rental USING(rental_id)
WHERE rental_date IS NULL OR return_date IS NULL
LIMIT 10;
Payment | Rental Date | Return Date |
---|---|---|
0.99 | 2006-02-14 15:16:03 | \N |
4.99 | 2006-02-14 15:16:03 | \N |
0.99 | 2006-02-14 15:16:03 | \N |
4.99 | 2006-02-14 15:16:03 | \N |
3.98 | 2006-02-14 15:16:03 | \N |
0.00 | 2006-02-14 15:16:03 | \N |
2.99 | 2006-02-14 15:16:03 | \N |
4.99 | 2006-02-14 15:16:03 | \N |
2.99 | 2006-02-14 15:16:03 | \N |
2.99 | 2006-02-14 15:16:03 | \N |
And I can see that in all the cases the return date is NULL, which means that those films have not been returned yet but maybe the customer payed in advance.
7. We want to acquire more films. Which genres are most popular?
SELECT
category.name AS category
,COUNT(rental_id) AS times_rented
FROM rental
INNER JOIN inventory USING(inventory_id)
INNER JOIN film USING(film_id)
INNER JOIN film_category USING(film_id)
INNER JOIN category USING(category_id)
GROUP BY name
ORDER BY 2 DESC -- times_rented
;
Category | Times Rented |
---|---|
Sports | 1179 |
Animation | 1166 |
Action | 1112 |
Sci-Fi | 1101 |
Family | 1096 |
Drama | 1060 |
Documentary | 1050 |
Foreign | 1033 |
Games | 969 |
Children | 945 |
Comedy | 941 |
New | 940 |
Classics | 939 |
Horror | 846 |
Travel | 837 |
Music | 830 |
The table above lists the total number of rentals divided per film category. The three most popular categories are sports, animation and action; if the company wants to buy more films I would advise to look at those categories first, and then go down the list.
8. We want to hire an actor to do ads for us. Which actor is in the most films?
SELECT
actor_id
,first_name
,last_name
,COUNT(DISTINCT film_id) AS number_of_films
FROM film_actor
INNER JOIN actor USING(actor_id)
GROUP BY actor_id
ORDER BY 4 DESC -- number_of_films
LIMIT 10
;
Actor ID | First Name | Last Name | Number of Films |
---|---|---|---|
107 | GINA | DEGENERES | 42 |
102 | WALTER | TORN | 41 |
198 | MARY | KEITEL | 40 |
181 | MATTHEW | CARREY | 39 |
23 | SANDRA | KILMER | 37 |
81 | SCARLETT | DAMON | 36 |
106 | GROUCHO | DUNST | 35 |
60 | HENRY | BERRY | 35 |
13 | UMA | WOOD | 35 |
37 | VAL | BOLGER | 35 |
Gina Degeneres (actor_id 107
), is the actress in most films, followed by Walter Torn (actor_id 102
), and Mary Keitel (actor_id 198
). If the store wants to hire an actor/actress to do ads, I would advise to reach out to those three actors first.
9. We want to hire an actor to do ads for us. Which actors/actresses are most popular given our rental history?
SELECT
actor_id
,first_name
,last_name
,COUNT(rental_id) AS times_rented
FROM rental
INNER JOIN inventory USING(inventory_id)
INNER JOIN film USING(film_id)
INNER JOIN film_actor USING(film_id)
INNER JOIN actor USING(actor_id)
GROUP BY actor_id
ORDER BY 4 DESC -- times_rented
LIMIT 10
;
Actor ID | First Name | Last Name | Number of Times Rented |
---|---|---|---|
107 | GINA | DEGENERES | 753 |
181 | MATTHEW | CARREY | 678 |
198 | MARY | KEITEL | 674 |
144 | ANGELA | WITHERSPOON | 654 |
102 | WALTER | TORN | 640 |
60 | HENRY | BERRY | 612 |
150 | JAYNE | NOLTE | 611 |
37 | VAL | BOLGER | 605 |
23 | SANDRA | KILMER | 604 |
90 | SEAN | GUINESS | 599 |
Given the store rental history, the most popular actresses are Gina Degeneres (actor_id 107
), Matthew Carrey (actor_id 181
), and Mary Keitel (actor_id 198
).
Note: Is this table correct? Let’s corroborate the data with a few separate queries.
This query gives me all the films in which the actress with actor_id 107
performs.
SELECT *
FROM film_actor
WHERE actor_id = 107
LIMIT 10
;
This query shows all the copies of the previous films in inventory in the different stores.
SELECT inventory_id
FROM inventory
WHERE film_id IN (SELECT *
FROM film_actor
WHERE actor_id = 107)
;
Now let’s count how many times those films where rented.
SELECT COUNT(*)
FROM rental
WHERE inventory_id IN (SELECT inventory_id
FROM inventory
WHERE film_id IN (SELECT film_id
FROM film_actor
WHERE actor_id = 107))
;
+----------+
| COUNT(*) |
+----------+
| 753 |
+----------+
The number obtained is 753 and it’s the same we can see in the summary table that we wanted to corroborate.
10. Pick the top customers. Given the films they’ve rented, which new ones should we suggest to them?
NOTE: I have to learn more SQL tricks to be able to answer this question appropriately - I’m working on it!
However, this is a first attempt…
First let’s pick our top customers.
SELECT COUNT(customer_id)
FROM customer
;
+--------------------+
| COUNT(customer_id) |
+--------------------+
| 599 |
+--------------------+
There is a total of 599 customers considering all the rental stores.
The following query results in a table with the total number of films rented per customer. I’ll consider that the top customers are the ones that rented the most films.
SELECT
customer_id
,count(rental_id) AS times_rented
FROM rental
GROUP BY customer_id
ORDER BY times_rented DESC
LIMIT 5
;
Customer ID | Total Number of Rentals |
---|---|
148 | 46 |
526 | 45 |
236 | 42 |
144 | 42 |
75 | 41 |
The top customers are the ones with customer_id
: 148, 526, 236, 144, and 75.
One approach would be to find the film categories that these customers usually acquire an suggest more films in those categories.
Let’s look at the categories that customer 148 prefers.
SELECT
customer_id
,name
,COUNT(name) AS times_rented
FROM rental
INNER JOIN inventory USING(inventory_id)
INNER JOIN film USING(film_id)
INNER JOIN film_category USING(film_id)
INNER JOIN category USING(category_id)
WHERE customer_id = 148
GROUP BY name
ORDER BY times_rented DESC
;
Customer ID | Film Category | Times Rented |
---|---|---|
148 | Sci-Fi | 7 |
148 | Family | 6 |
148 | Travel | 5 |
148 | Classics | 4 |
148 | Horror | 4 |
148 | Foreign | 4 |
148 | Games | 3 |
148 | Documentary | 3 |
148 | Action | 3 |
148 | New | 2 |
148 | Comedy | 2 |
148 | Sports | 1 |
148 | Drama | 1 |
148 | Music | 1 |
In this case I would suggest more films in the Sci-Fi and Family categories.