A SQL guide for comparing each department's monthly average salary against the company's monthly average salary.
Problem Restatement
We are given two tables: Salary and Employee.
The Salary table stores each employee’s salary for a month.
| Column | Type | Meaning |
|---|---|---|
id | int | Salary record id |
employee_id | int | Employee id |
amount | int | Salary amount |
pay_date | date | Salary payment date |
The Employee table stores each employee’s department.
| Column | Type | Meaning |
|---|---|---|
employee_id | int | Employee id |
department_id | int | Department id |
We need to compare each department’s average salary with the company’s average salary for the same month.
Return:
| Column | Meaning |
|---|---|
pay_month | Month in YYYY-MM format |
department_id | Department id |
comparison | higher, lower, or same |
The problem asks for the comparison result of each department’s average salary against the company average salary.
Input and Output
Input tables:
Salary
EmployeeOutput columns:
pay_month, department_id, comparisonThe comparison is:
| Condition | Output |
|---|---|
| Department average > company average | higher |
| Department average < company average | lower |
| Department average = company average | same |
Example
Salary table:
| id | employee_id | amount | pay_date |
|---|---|---|---|
| 1 | 1 | 9000 | 2017-03-31 |
| 2 | 2 | 6000 | 2017-03-31 |
| 3 | 3 | 10000 | 2017-03-31 |
| 4 | 1 | 7000 | 2017-02-28 |
| 5 | 2 | 6000 | 2017-02-28 |
| 6 | 3 | 8000 | 2017-02-28 |
Employee table:
| employee_id | department_id |
|---|---|
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |
For March 2017, the company average is:
(9000 + 6000 + 10000) / 3 = 8333.33Department 1 average:
9000So department 1 is higher.
Department 2 average:
(6000 + 10000) / 2 = 8000So department 2 is lower.
For February 2017, the company average is:
(7000 + 6000 + 8000) / 3 = 7000Department 1 average is 7000.
Department 2 average is also:
(6000 + 8000) / 2 = 7000So both are same.
Output:
| pay_month | department_id | comparison |
|---|---|---|
| 2017-02 | 1 | same |
| 2017-03 | 1 | higher |
| 2017-02 | 2 | same |
| 2017-03 | 2 | lower |
First Thought: Compute Two Averages Separately
We need two values for each month and department:
| Average | Grouping |
|---|---|
| Company average | Month |
| Department average | Month and department |
A direct approach is to build one query for company averages and another query for department averages, then join them.
That works, but it is more verbose.
A window function can compute both averages from the joined salary and employee rows.
Key Insight
Once we join Salary to Employee, each salary row knows its department.
Then we can compute:
AVG(amount) OVER (PARTITION BY pay_date)for the company average in that month.
And:
AVG(amount) OVER (PARTITION BY pay_date, department_id)for the department average in that month.
Because pay_date is a full date but the output needs month format, we also convert it with:
DATE_FORMAT(pay_date, '%Y-%m')Algorithm
Join Salary with Employee using employee_id.
For each joined row:
- Compute
pay_month. - Compute company monthly average.
- Compute department monthly average.
- Compare the two averages with
CASE. - Use
DISTINCTbecause each employee row in the same department-month would otherwise repeat the same result.
SQL Solution
WITH averages AS (
SELECT
DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month,
e.department_id,
AVG(s.amount) OVER (
PARTITION BY s.pay_date
) AS company_avg,
AVG(s.amount) OVER (
PARTITION BY s.pay_date, e.department_id
) AS department_avg
FROM Salary AS s
JOIN Employee AS e
ON s.employee_id = e.employee_id
)
SELECT DISTINCT
pay_month,
department_id,
CASE
WHEN department_avg > company_avg THEN 'higher'
WHEN department_avg < company_avg THEN 'lower'
ELSE 'same'
END AS comparison
FROM averages;Code Explanation
First, join salaries to departments:
FROM Salary AS s
JOIN Employee AS e
ON s.employee_id = e.employee_idNow every salary row has a department_id.
The output month is created with:
DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_monthThe company average is partitioned only by month:
AVG(s.amount) OVER (
PARTITION BY s.pay_date
) AS company_avgThis groups all employees paid in the same month.
The department average is partitioned by both month and department:
AVG(s.amount) OVER (
PARTITION BY s.pay_date, e.department_id
) AS department_avgThis groups employees from the same department in the same month.
Finally, compare both averages:
CASE
WHEN department_avg > company_avg THEN 'higher'
WHEN department_avg < company_avg THEN 'lower'
ELSE 'same'
END AS comparisonWe use SELECT DISTINCT because the CTE has one row per employee salary record, but the answer needs one row per month and department.
Correctness
After the join, each salary amount is associated with exactly one department.
For each month, company_avg is computed over all salary rows in that month, so it equals the company-wide average salary for that month.
For each month and department, department_avg is computed over exactly the salary rows belonging to that department in that month, so it equals that department’s average salary for that month.
The CASE expression returns higher, lower, or same according to the comparison between these two averages.
Since DISTINCT removes duplicate rows for employees in the same department-month, the final result contains exactly one comparison for each department in each month.
Therefore, the query returns the required comparison result.
Complexity
Let n be the number of salary rows.
| Metric | Value | Why |
|---|---|---|
| Time | O(n log n) | Window partitions may require sorting by month and department |
| Space | O(n) | The intermediate joined rows and window results may be stored |
With indexes on Salary.employee_id, Employee.employee_id, and Salary.pay_date, the join and partitioning can be optimized.
Alternative: Group Then Join
We can compute department and company averages separately, then join them.
WITH department_average AS (
SELECT
DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month,
e.department_id,
AVG(s.amount) AS department_avg
FROM Salary AS s
JOIN Employee AS e
ON s.employee_id = e.employee_id
GROUP BY
DATE_FORMAT(s.pay_date, '%Y-%m'),
e.department_id
),
company_average AS (
SELECT
DATE_FORMAT(pay_date, '%Y-%m') AS pay_month,
AVG(amount) AS company_avg
FROM Salary
GROUP BY DATE_FORMAT(pay_date, '%Y-%m')
)
SELECT
d.pay_month,
d.department_id,
CASE
WHEN d.department_avg > c.company_avg THEN 'higher'
WHEN d.department_avg < c.company_avg THEN 'lower'
ELSE 'same'
END AS comparison
FROM department_average AS d
JOIN company_average AS c
ON d.pay_month = c.pay_month;This version is slightly longer, but it avoids DISTINCT because each CTE already has the desired granularity.
Testing
Sample data:
CREATE TABLE Salary (
id INT,
employee_id INT,
amount INT,
pay_date DATE
);
CREATE TABLE Employee (
employee_id INT,
department_id INT
);
INSERT INTO Salary (id, employee_id, amount, pay_date) VALUES
(1, 1, 9000, '2017-03-31'),
(2, 2, 6000, '2017-03-31'),
(3, 3, 10000, '2017-03-31'),
(4, 1, 7000, '2017-02-28'),
(5, 2, 6000, '2017-02-28'),
(6, 3, 8000, '2017-02-28');
INSERT INTO Employee (employee_id, department_id) VALUES
(1, 1),
(2, 2),
(3, 2);Expected output:
| pay_month | department_id | comparison |
|---|---|---|
| 2017-02 | 1 | same |
| 2017-02 | 2 | same |
| 2017-03 | 1 | higher |
| 2017-03 | 2 | lower |