# LeetCode 615: Average Salary: Departments VS Company

## Problem Restatement

We are given two tables: `Salary` and `Employee`.

The `Salary` table stores each employee's salary for a month.

| Column | Type | Meaning |
|---|---|---|
| `id` | int | Salary record id |
| `employee_id` | int | Employee id |
| `amount` | int | Salary amount |
| `pay_date` | date | Salary payment date |

The `Employee` table stores each employee's department.

| Column | Type | Meaning |
|---|---|---|
| `employee_id` | int | Employee id |
| `department_id` | int | Department id |

We need to compare each department's average salary with the company's average salary for the same month.

Return:

| Column | Meaning |
|---|---|
| `pay_month` | Month in `YYYY-MM` format |
| `department_id` | Department id |
| `comparison` | `higher`, `lower`, or `same` |

The problem asks for the comparison result of each department's average salary against the company average salary.

## Input and Output

Input tables:

```sql
Salary
Employee
```

Output columns:

```sql
pay_month, department_id, comparison
```

The comparison is:

| Condition | Output |
|---|---|
| Department average > company average | `higher` |
| Department average < company average | `lower` |
| Department average = company average | `same` |

## Example

`Salary` table:

| id | employee_id | amount | pay_date |
|---:|---:|---:|---|
| 1 | 1 | 9000 | 2017-03-31 |
| 2 | 2 | 6000 | 2017-03-31 |
| 3 | 3 | 10000 | 2017-03-31 |
| 4 | 1 | 7000 | 2017-02-28 |
| 5 | 2 | 6000 | 2017-02-28 |
| 6 | 3 | 8000 | 2017-02-28 |

`Employee` table:

| employee_id | department_id |
|---:|---:|
| 1 | 1 |
| 2 | 2 |
| 3 | 2 |

For March 2017, the company average is:

```text
(9000 + 6000 + 10000) / 3 = 8333.33
```

Department `1` average:

```text
9000
```

So department `1` is `higher`.

Department `2` average:

```text
(6000 + 10000) / 2 = 8000
```

So department `2` is `lower`.

For February 2017, the company average is:

```text
(7000 + 6000 + 8000) / 3 = 7000
```

Department `1` average is `7000`.

Department `2` average is also:

```text
(6000 + 8000) / 2 = 7000
```

So both are `same`.

Output:

| pay_month | department_id | comparison |
|---|---:|---|
| 2017-02 | 1 | same |
| 2017-03 | 1 | higher |
| 2017-02 | 2 | same |
| 2017-03 | 2 | lower |

## First Thought: Compute Two Averages Separately

We need two values for each month and department:

| Average | Grouping |
|---|---|
| Company average | Month |
| Department average | Month and department |

A direct approach is to build one query for company averages and another query for department averages, then join them.

That works, but it is more verbose.

A window function can compute both averages from the joined salary and employee rows.

## Key Insight

Once we join `Salary` to `Employee`, each salary row knows its department.

Then we can compute:

```sql
AVG(amount) OVER (PARTITION BY pay_date)
```

for the company average in that month.

And:

```sql
AVG(amount) OVER (PARTITION BY pay_date, department_id)
```

for the department average in that month.

Because `pay_date` is a full date but the output needs month format, we also convert it with:

```sql
DATE_FORMAT(pay_date, '%Y-%m')
```

## Algorithm

Join `Salary` with `Employee` using `employee_id`.

For each joined row:

1. Compute `pay_month`.
2. Compute company monthly average.
3. Compute department monthly average.
4. Compare the two averages with `CASE`.
5. Use `DISTINCT` because each employee row in the same department-month would otherwise repeat the same result.

## SQL Solution

```sql
WITH averages AS (
    SELECT
        DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month,
        e.department_id,
        AVG(s.amount) OVER (
            PARTITION BY s.pay_date
        ) AS company_avg,
        AVG(s.amount) OVER (
            PARTITION BY s.pay_date, e.department_id
        ) AS department_avg
    FROM Salary AS s
    JOIN Employee AS e
        ON s.employee_id = e.employee_id
)
SELECT DISTINCT
    pay_month,
    department_id,
    CASE
        WHEN department_avg > company_avg THEN 'higher'
        WHEN department_avg < company_avg THEN 'lower'
        ELSE 'same'
    END AS comparison
FROM averages;
```

## Code Explanation

First, join salaries to departments:

```sql
FROM Salary AS s
JOIN Employee AS e
    ON s.employee_id = e.employee_id
```

Now every salary row has a `department_id`.

The output month is created with:

```sql
DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month
```

The company average is partitioned only by month:

```sql
AVG(s.amount) OVER (
    PARTITION BY s.pay_date
) AS company_avg
```

This groups all employees paid in the same month.

The department average is partitioned by both month and department:

```sql
AVG(s.amount) OVER (
    PARTITION BY s.pay_date, e.department_id
) AS department_avg
```

This groups employees from the same department in the same month.

Finally, compare both averages:

```sql
CASE
    WHEN department_avg > company_avg THEN 'higher'
    WHEN department_avg < company_avg THEN 'lower'
    ELSE 'same'
END AS comparison
```

We use `SELECT DISTINCT` because the CTE has one row per employee salary record, but the answer needs one row per month and department.

## Correctness

After the join, each salary amount is associated with exactly one department.

For each month, `company_avg` is computed over all salary rows in that month, so it equals the company-wide average salary for that month.

For each month and department, `department_avg` is computed over exactly the salary rows belonging to that department in that month, so it equals that department's average salary for that month.

The `CASE` expression returns `higher`, `lower`, or `same` according to the comparison between these two averages.

Since `DISTINCT` removes duplicate rows for employees in the same department-month, the final result contains exactly one comparison for each department in each month.

Therefore, the query returns the required comparison result.

## Complexity

Let `n` be the number of salary rows.

| Metric | Value | Why |
|---|---:|---|
| Time | `O(n log n)` | Window partitions may require sorting by month and department |
| Space | `O(n)` | The intermediate joined rows and window results may be stored |

With indexes on `Salary.employee_id`, `Employee.employee_id`, and `Salary.pay_date`, the join and partitioning can be optimized.

## Alternative: Group Then Join

We can compute department and company averages separately, then join them.

```sql
WITH department_average AS (
    SELECT
        DATE_FORMAT(s.pay_date, '%Y-%m') AS pay_month,
        e.department_id,
        AVG(s.amount) AS department_avg
    FROM Salary AS s
    JOIN Employee AS e
        ON s.employee_id = e.employee_id
    GROUP BY
        DATE_FORMAT(s.pay_date, '%Y-%m'),
        e.department_id
),
company_average AS (
    SELECT
        DATE_FORMAT(pay_date, '%Y-%m') AS pay_month,
        AVG(amount) AS company_avg
    FROM Salary
    GROUP BY DATE_FORMAT(pay_date, '%Y-%m')
)
SELECT
    d.pay_month,
    d.department_id,
    CASE
        WHEN d.department_avg > c.company_avg THEN 'higher'
        WHEN d.department_avg < c.company_avg THEN 'lower'
        ELSE 'same'
    END AS comparison
FROM department_average AS d
JOIN company_average AS c
    ON d.pay_month = c.pay_month;
```

This version is slightly longer, but it avoids `DISTINCT` because each CTE already has the desired granularity.

## Testing

Sample data:

```sql
CREATE TABLE Salary (
    id INT,
    employee_id INT,
    amount INT,
    pay_date DATE
);

CREATE TABLE Employee (
    employee_id INT,
    department_id INT
);

INSERT INTO Salary (id, employee_id, amount, pay_date) VALUES
    (1, 1, 9000, '2017-03-31'),
    (2, 2, 6000, '2017-03-31'),
    (3, 3, 10000, '2017-03-31'),
    (4, 1, 7000, '2017-02-28'),
    (5, 2, 6000, '2017-02-28'),
    (6, 3, 8000, '2017-02-28');

INSERT INTO Employee (employee_id, department_id) VALUES
    (1, 1),
    (2, 2),
    (3, 2);
```

Expected output:

| pay_month | department_id | comparison |
|---|---:|---|
| 2017-02 | 1 | same |
| 2017-02 | 2 | same |
| 2017-03 | 1 | higher |
| 2017-03 | 2 | lower |