Data Warehousing Assignment #1
Analytical SQL Fall 2020
By: Fullwin Liang
Question 3:
Reverse engineer the Classic Models schema into a physical level diagram using SQL Developer’s Data
Modeler -> Import -> Data Dictionary function. Use your own database connection, select the
classicmodels schema, and the following tables: CUSTOMERS, ORDERS, ORDERDETAILS, OFFICES,
EMPLOYEES, PAYMENTS, PRODUCTS, and PRODUCTLINES
Physical level diagram:
Question 4:
Write a SQL statement to show each customer’s name, state and total dollar amount ordered on the product
‘1962 Volkswagen Microbus' but only for customers with higher than average Credit Limits.
SQL:
Select Customers.Customername, Customers.State,
Quantityordered*Priceeach As Total_Price
From Customers
Join Orders On Customers.Customernumber = Orders.Customernumber
Join Orderdetails On Orders.Ordernumber = Orderdetails.Ordernumber
Where Creditlimit > (Select Avg(Creditlimit) From Customers) And
Productcode = 'S24_2300';
Results:
Question 5:
Write a SQL statement to show a Ranking of the total sales (dollar amount) summarized by country and each
month of each Year (YYYY-MM). The largest sales by country by month should be ranked #1. Include the
total sales along with the rank
SQL:
SELECT customers.country, to_char(orderdate, 'YYYY-MONTH') AS MONTH,
SUM(orderdetails.quantityordered*(orderdetails.priceeach)) AS
total_sale, RANK() OVER (ORDER BY
SUM(orderdetails.quantityordered*(orderdetails.priceeach)) DESC)
price_rank
FROM customers
JOIN orders ON customers.customernumber = orders.customernumber
JOIN orderdetails ON orders.ordernumber = orderdetails.ordernumber
GROUP BY customers.country, to_char(orderdate, 'YYYY-MONTH');
Result:
7) Assume the BUYPRICE is the price the company pays for the product and PRICEEACH is the price the
customer paid to buy it. What was the most profitable month in 2019? Justify your rationale and show your
SQL query.
SQL:
SELECT To_Char(Orderdate, 'YYYY-MM') As Month,
SUM(orderdetails.quantityordered*(orderdetails.priceeach)) AS
total_sales, SUM(orderdetails.quantityordered*products.buyprice) as
Cost_of_Goods,
SUM(orderdetails.quantityordered*(orderdetails.priceeach)) -
SUM(orderdetails.quantityordered*products.buyprice) AS PROFIT
FROM orderdetails
JOIN Orders ON orderdetails.ordernumber = orders.ordernumber
JOIN Products ON products.productcode = orderdetails.productcode
WHERE To_Char(Orderdate, 'YYYY-MM')LIKE '2019%'
GROUP BY To_Char(Orderdate, 'YYYY-MM')
order BY SUM(orderdetails.quantityordered*(orderdetails.priceeach)) -
SUM(orderdetails.quantityordered*products.buyprice) DESC,
To_Char(Orderdate, 'YYYY-MM');
Result:
Findings:
The most profitable month in 2019 would be November of 392370.92. First, I wanted to group the date into
month and years. Then I wanted to aggregate the sum of total sales for each month to find the revenue and the
sum of the cost of goods sold as expenses. I then created a calculated column of Profit = Revenue Expenses
to find the profit of each month. Which in our case is, “Profit = total_sales – cost_of_goods”. Lastly, I sorted
the rows from highest profit to least profit and concluded that the month of November had the highest profit
in 2019.
8) Who is the “best” customer? Justify your rationale and back it up with a query showing the data. You may also
wish to graph your data to support your justification. DO NOT just total up sales. Look at other tables and analysis
as well (CLV and RFM for example).
SQL:
SELECT customers.customername,COUNT(orders.ordernumber)/98 AS
purchase_frequency, SUM(quantityordered*priceeach)/COUNT(orders.ordernumber)
AS average_order_value, (COUNT(orders.ordernumber)/98) *
(SUM(quantityordered*priceeach)/COUNT(orders.ordernumber)) AS clv FROM
customers
JOIN orders ON customers.customernumber = orders.customernumber
JOIN orderdetails ON orders.ordernumber = orderdetails.ordernumber
GROUP BY customers.customername
ORDER BY clv DESC;
Result:
Finding: First I wanted to find out the total sales per unique customer to later calculate customer lifetime value.
The formula I used to calculate customer lifetime value was Customer Value = Average Order Value * Purchase
Frequency. First, I decided to create a calculated field for purchase frequency using the formula Purchase
Frequency = Total Orders / Total Customers. Then I created another calculated column for Average Order Value
using the formula Average Order Value = Total Sales / Order Count. Finally, I multiplied Purchase Frequency and
Average Order Value resulting in the CLV or customer lifetime value. Then, I sorted the rows by a descending
order from greatest to least according to CLV and then total sales. I can see that customer lifetime value correlates
positively and is directly proportional to total sales. This is because we can see that, as CLV increases (decreases)
, total sale also increases (decreases). According to my query, I believe Mini Gifts Distributors Ltd. Is the best
customer.
9) Create a VIEW in your own schema that joins together all of the columns in the CUSTOMERS,
ORDERS, ORDERDETAILS, EMPLOYEES and PRODUCTS tables (do not include PAYMENTS).
SQL:
create view dataset AS
select
c.*,
e.LASTNAME,
e.FIRSTNAME,
e.EXTENSION,
e.EMAIL,
e.OFFICECODE,
e.REPORTSTO,
e.JOBTITLE,
o.ORDERNUMBER,
o.ORDERDATE,
o.REQUIREDDATE,
o.SHIPPEDDATE,
o.STATUS,
o.COMMENTS,
od.PRODUCTCODE,
od.QUANTITYORDERED,
od.PRICEEACH,
od.ORDERLINENUMBER,
p.PRODUCTNAME,
p.PRODUCTLINE,
p.PRODUCTSCALE,
p.PRODUCTVENDOR,
p.PRODUCTDESCRIPTION,
p.QUANTITYINSTOCK,
p.BUYPRICE,
p.MSRP
from customers c
JOIN EMPLOYEES e on c.SALESREPEMPLOYEENUMBER = e.EMPLOYEENUMBER
join orders o on c.CUSTOMERNUMBER = o.CUSTOMERNUMBER
join orderdetails od on o.ORDERNUMBER = od.ORDERNUMBER
join products p on od.PRODUCTCODE = p.PRODUCTCODE;
select count(*) from dataset;
Result:
10) Import all of the data from your view into Microsoft Excel. Create a pivot table from the resulting data set and
then summarize the data according to total dollars ordered by productline and customer country.
11) Import all of the data from your view into Tableau. Create an appropriate visualization from the resulting
data set that summarizes the data according to total sales by productline and customer credit limit over time.
11) Import all of the data from your view into Tableau. Create an appropriate visualization from the resulting data set
that summarizes the data according to total sales by productline and customer credit limit over time.
Result:
First, I imported the csv into Tableau and created a calculated field for total sales. I then wanted to show total sales by
product line and customer credit limit over time separately to get an idea of the variable values. I then changed the
order date variable to a continuous variable to show credit limit over time because it is a time series value. Finally, I
added all the variables into one plot and retrieved my final visualization. From the graph, we can see trends such as, as
total credit limit increases, total sale will also increase. The two variables total sales and credit limit have a positive
correlation because when one increases, the other variable also increases.