A clear explanation of Managers with at Least 5 Direct Reports using grouping and a self join.
Problem Restatement
We are given an Employee table.
| Column | Type | Meaning |
|---|---|---|
id | int | Employee id |
name | varchar | Employee name |
department | varchar | Department name |
managerId | int | Id of this employee’s manager |
If managerId is NULL, the employee does not report to anyone.
We need to find managers who have at least five direct reports.
A direct report means an employee whose managerId is equal to that manager’s id.
Return the manager names in any order. The official prompt asks to find managers with at least five direct reports and return the result table in any order.
Input and Output
| Item | Meaning |
|---|---|
| Input | Employee table |
| Output | Names of managers with at least five direct reports |
| Direct report | A row where managerId = manager.id |
| Required column | name |
| Order | Any order |
Expected output column:
nameExample
Input:
| id | name | department | managerId |
|---|---|---|---|
| 101 | John | A | NULL |
| 102 | Dan | A | 101 |
| 103 | James | A | 101 |
| 104 | Amy | A | 101 |
| 105 | Anne | A | 101 |
| 106 | Ron | B | 101 |
Employees 102, 103, 104, 105, and 106 all have:
managerId = 101So John has five direct reports.
Output:
| name |
|---|
| John |
First Thought: Count Reports Per Manager
The core task is counting how many employees report to each manager.
This is a grouping problem.
If we group rows by managerId, then each group represents all employees who report to the same manager.
For example:
GROUP BY managerIdThen:
COUNT(*)gives the number of direct reports for that manager.
We only keep groups with at least five rows:
HAVING COUNT(*) >= 5That gives manager ids. Then we need the manager names.
Key Insight
The table contains both employees and managers.
A manager is also an employee row.
So after finding manager ids with at least five reports, we can join those ids back to the Employee table to get the corresponding manager names.
There are two common ways to write this:
- Group first, then join.
- Use a self join and group by manager.
Both are valid.
Algorithm
- Group employees by
managerId. - Keep only manager ids where the group size is at least
5. - Join those manager ids back to
Employee.id. - Select the manager’s
name.
Correctness
For each employee row, managerId tells us the id of that employee’s direct manager.
Therefore, all rows with the same non-null managerId are exactly the direct reports of that manager.
The grouped subquery computes the number of direct reports for each manager id.
The HAVING COUNT(*) >= 5 condition keeps exactly those manager ids with at least five direct reports.
Joining the kept manager ids back to Employee.id retrieves the employee row for each manager. Selecting name returns exactly the required manager names.
Complexity
Let n be the number of rows in Employee.
| Metric | Value | Why |
|---|---|---|
| Time | O(n) average | The database scans and groups employee rows |
| Space | O(n) | The grouping step may store counts for manager ids |
The exact cost depends on the database engine and available indexes.
An index on managerId can help the grouping step.
Implementation
SELECT e.name
FROM Employee AS e
JOIN (
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5
) AS reports
ON e.id = reports.managerId;Code Explanation
The subquery counts direct reports:
SELECT managerId
FROM Employee
GROUP BY managerId
HAVING COUNT(*) >= 5Each group contains employees who report to the same manager.
The HAVING clause filters grouped results. We use HAVING, not WHERE, because COUNT(*) is an aggregate value.
Then we join the result back to the employee table:
ON e.id = reports.managerIdThis finds the manager’s own row.
Finally, we return the manager name:
SELECT e.nameSelf-Join Version
We can also solve it with a self join.
SELECT manager.name
FROM Employee AS employee
JOIN Employee AS manager
ON employee.managerId = manager.id
GROUP BY manager.id, manager.name
HAVING COUNT(*) >= 5;Here, employee represents a direct report.
manager represents that report’s manager.
After joining, each manager row appears once per direct report. Grouping by manager and counting rows gives the direct report count.
Testing
Create sample data:
CREATE TABLE Employee (
id INT PRIMARY KEY,
name VARCHAR(255),
department VARCHAR(255),
managerId INT
);
INSERT INTO Employee (id, name, department, managerId) VALUES
(101, 'John', 'A', NULL),
(102, 'Dan', 'A', 101),
(103, 'James', 'A', 101),
(104, 'Amy', 'A', 101),
(105, 'Anne', 'A', 101),
(106, 'Ron', 'B', 101);Expected result:
| name |
|---|
| John |
Additional cases:
| Case | Expected behavior |
|---|---|
| Manager has exactly five reports | Include the manager |
| Manager has four reports | Exclude the manager |
| Multiple managers have at least five reports | Return all of them |
Employee has managerId = NULL | Does not count as a report to any manager |
| Manager appears as a normal employee row | Join retrieves their name |