Tables and Views
Previous to working on this project I audited the course Managing Big Data with MySQL which is an introduction to how to use relational databases in business analysis.
The course focuses mainly in understanding data query language (DQL), that is, the part of SQL that allows you to retrieve data from a database and display it in a table or tables.
“After all, if you are becoming a data analyst for the business world, somehow you will have to figure out how to get data out of a database in order to do any kind of analysis.”
- Jana Schaich Borg, Assistant Research Professor, Social Science Research Institute.
Having a sound knowledge of the SQL SELECT
statement (the most commonly used data query language command), I started exploring Sakila, and I created the following tables.
Note: In order to display the datasets as tables I exported them from MySQL to CSV files.
The Customer List Table
The customer list table provides the first name, last name, address, postal code, city, country and phone number for each customer of the company. It incorporates data from the customer
, address
, city
, and country
tables.
SELECT
first_name
,last_name
,address
,postal_code
,city
,country
,phone
FROM customer
INNER JOIN address USING(address_id)
INNER JOIN city USING(city_id)
INNER JOIN country USING(country_id)
ORDER BY first_name
LIMIT 10
;
First Name | Last Name | Address | Postal Code | City | Country | Phone Number |
---|---|---|---|---|---|---|
AARON | SELBY | 1519 Santiago de los Caballeros Loop | 22025 | Mwene-Ditu | Congo, The Democratic Republic of the | 409315295763 |
ADAM | GOOCH | 230 Urawa Drive | 2738 | Adoni | India | 166898395731 |
ADRIAN | CLARY | 1986 Sivas Place | 95775 | Udine | Italy | 182059202712 |
AGNES | BISHOP | 866 Shivapuri Manor | 22474 | Sambhal | India | 778502731092 |
ALAN | KAHN | 753 Ilorin Avenue | 3656 | Emeishan | China | 464511145118 |
ALBERT | CROUSE | 1641 Changhwa Place | 37636 | Bamenda | Cameroon | 256546485220 |
ALBERTO | HENNING | 502 Mandi Bahauddin Parkway | 15992 | Barcelona | Venezuela | 618156722572 |
ALEX | GRESHAM | 251 Florencia Drive | 16119 | Uruapan | Mexico | 118011831565 |
ALEXANDER | FENNELL | 231 Kaliningrad Place | 57833 | Bergamo | Italy | 575081026569 |
ALFRED | CASILLAS | 1727 Matamoros Place | 78813 | Sawhaj | Egypt | 129673677866 |
The Film List Table
This list contains information about the films that the store has in inventory, it displays the film ID, film title, film category and the total number of times that a particular film was rented.
It combines data from the tables inventory
, rental
, film
, film_category
, and category
.
SELECT
film_id
,title
,category.name AS category,
COUNT(rental_id) AS times_rented
FROM inventory
LEFT JOIN rental USING(inventory_id)
LEFT JOIN film USING(film_id)
LEFT JOIN film_category USING(film_id)
LEFT JOIN category USING(category_id)
GROUP BY film_id
LIMIT 10
;
Film ID | Film Title | Category | Times Rented |
---|---|---|---|
1 | ACADEMY DINOSAUR | Documentary | 23 |
2 | ACE GOLDFINGER | Horror | 7 |
3 | ADAPTATION HOLES | Documentary | 12 |
4 | AFFAIR PREJUDICE | Horror | 23 |
5 | AFRICAN EGG | Family | 12 |
6 | AGENT TRUMAN | Foreign | 21 |
7 | AIRPLANE SIERRA | Comedy | 15 |
8 | AIRPORT POLLOCK | Horror | 18 |
9 | ALABAMA DEVIL | Horror | 12 |
10 | ALADDIN CALENDAR | Sports | 23 |
The Sales by Store View
Something that got my attention when studying the schema was the presence of views.
Views are stored queries that when invoked produce a result set. A database view acts as a virtual table.
sales_by_store
is a view at Sakila. It provides a list of total sales, broken down by store. It incoporates data from the city
, country
, payment
, rental
, inventory
, store
, address
and staff
tables.
This query displays all the information in the view sales_by_store
.
SELECT *
FROM sales_by_store
;
We can get the same result by joining all the correspondent tables with this query:
SELECT
CONCAT(city, ',', country) AS store,
CONCAT(first_name,' ', last_name) AS manager,
SUM(amount) AS total_sales
FROM country
INNER JOIN city USING(country_id)
INNER JOIN address USING(city_id)
INNER JOIN store USING(address_id)
INNER JOIN staff USING(store_id)
INNER JOIN payment USING(staff_id)
GROUP BY store.store_id
;
In both cases, this is the information displayed:
Store | Manager | Total Sales |
---|---|---|
Woodridge,Australia | Jon Stephens | 33726.77 |
Lethbridge,Canada | Mike Hillyer | 33679.79 |
Generating CSV Files in MySQL
CSV files were generated by MySQL/MariaDB with a query similar to this one.
SELECT *
FROM sales_by_store
INTO OUTFILE '/tmp/sales-by-store.csv'
FIELDS ENCLOSED BY '"'
TERMINATED BY ','
LINES TERMINATED BY '\n'
;