Optimizing MySQL Queries for Finding Top Earners by Department (leetcode : 184. Department Highest Salary)
Optimizing MySQL Queries for Finding Top Earners by Department
In this blog post, we’ll explore how to write and optimize a MySQL query to find employees with the highest salary in each department. This is a common problem in database querying, often encountered in interviews and real-world applications. We’ll compare two solutions: one using a correlated subquery and an optimized version using a Common Table Expression (CTE). We’ll explain the approach, logic, time complexity, and provide the complete SQL code for both.
Problem Statement
Given two tables:
- Employee: Contains employee details with columns
id,name,salary, anddepartmentId. - Department: Contains department details with columns
idandname.
We need to find each employee who earns the highest salary in their department, returning the department name, employee name, and salary. If multiple employees share the highest salary in a department, all should be included.
Solution 1: Using a Correlated Subquery
Approach
The first solution uses a correlated subquery to identify the maximum salary for each employee’s department and filters for employees whose salaries match this maximum.
- Join Tables: Join
EmployeeandDepartmentondepartmentId = idto associate employees with department names. - Correlated Subquery: For each employee, compute the maximum salary in their department using a subquery referencing the employee’s
departmentId. - Filter: Select employees whose salary equals their department’s maximum salary.
SQL Query
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
FROM Employee e
JOIN Department d
ON e.departmentId = d.id
WHERE e.salary = (
SELECT MAX(salary)
FROM Employee
WHERE departmentId = e.departmentId
);
Explanation
- Join: The
JOINlinks each employee to their department usinge.departmentId = d.id. - Subquery: For each row in
Employee, the subquerySELECT MAX(salary) FROM Employee WHERE departmentId = e.departmentIdfinds the maximum salary for that department. It’s correlated because it usese.departmentIdfrom the outer query. - Filter: The
WHERE e.salary = (...)ensures only employees with the maximum salary are selected. - Output: Returns
d.name(department name),e.name(employee name), ande.salary(salary).
Time Complexity
Let’s define:
- n: Number of rows in
Employee. - m: Number of rows in
Department. - k: Average number of employees per department (n ≈ k * m).
- Join: With indexes on
Employee.departmentIdandDepartment.id(primary key), a hash join costs O(n + m). A nested loop join with an index ond.idcosts O(n log m). - Correlated Subquery: Runs for each of n rows in
Employee.- Filter
departmentId = e.departmentId: With an index, costs O(log n + k) (index lookup plus scanning k rows). - Compute
MAX(salary)over k rows: O(k). - Total per subquery: O(log n + k).
- Across n rows: O(n * (log n + k)).
- Filter
- Filtering/Output: Comparing salaries and selecting columns is O(n).
Total: O(n * (log n + k)).
- If k is small, approximates O(n log n).
- If k is large (k ≈ n/m), becomes O(n²/m).
- Without an index on
departmentId, subquery costs O(n) each, leading to O(n²).
Drawbacks
- Performance: The subquery executes n times, inefficient for large
Employeetables. - Scalability: Performance degrades as n or k grows.
- Readability: Nested subquery is harder to follow for complex scenarios.
Solution 2: Using a Common Table Expression (CTE)
Approach
To improve performance, we precompute the maximum salary per department using a CTE and join it with Employee and Department.
- Precompute Maximum Salaries: Use a CTE to calculate the maximum salary for each department with
GROUP BY. - Join with Employee: Join the CTE with
Employeeto find employees matching the maximum salary. - Join with Department: Join with
Departmentfor department names. - Output: Select the required columns.
SQL Query
WITH MaxSalaries AS (
SELECT departmentId, MAX(salary) AS max_salary
FROM Employee
GROUP BY departmentId
)
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
FROM Employee e
JOIN MaxSalaries ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary
JOIN Department d ON e.departmentId = d.id;
Explanation
- CTE (
MaxSalaries): TheWITH MaxSalaries AS (...)creates a temporary result set with columnsdepartmentIdandmax_salary. The subquery computes the maximum salary per department, producing one row per department. - Joins:
JOIN MaxSalaries ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary: Matches employees to their department’s maximum salary.JOIN Department d ON e.departmentId = d.id: Links employees to department names.
- Output: Returns
d.name,e.name, ande.salaryfor employees with the maximum salary. - Handling Ties: Includes all employees whose salaries match the maximum.
Time Complexity
- CTE (
MaxSalaries):- Scan
Employee: O(n). GROUP BY departmentId: With an index, streaming costs O(n); sorting costs O(n log n).- Compute
MAX(salary)for k rows per department, across m departments: O(m * k) = O(n). - Total: O(n) (streaming) or O(n log n) (sorting).
- Output: m rows.
- Scan
- Joins:
- Employee JOIN MaxSalaries: With indexes on
Employee(departmentId, salary), hash join costs O(n + m). Index-based join costs O(m log n) for r matches per department (r ≤ k). - Join with Department: For n’ result rows (n’ ≤ n), lookup in
Departmentcosts O(n’ log m). - Total: O(n + m) (hash join) or O(m log n + n’ log m).
- Employee JOIN MaxSalaries: With indexes on
- Output: O(n’), where n’ is small.
Total: O(n log n) (sorting) or O(n) (streaming with hash joins).
Advantages
- Performance: Computes maximum salaries once, avoiding n subquery executions.
- Readability: CTE separates logic, improving clarity.
- Scalability: Scales better with large datasets.
- Flexibility: CTE can be extended for additional aggregates.
Requirements
CTEs require MySQL 8.0+. For older versions, use a derived table:
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary
FROM Employee e
JOIN (
SELECT departmentId, MAX(salary) AS max_salary
FROM Employee
GROUP BY departmentId
) ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary
JOIN Department d ON e.departmentId = d.id;
Comparison
| Aspect | Solution 1 (Correlated Subquery) | Solution 2 (CTE) |
|---|---|---|
| Time Complexity | O(n * (log n + k)) | O(n log n) or O(n) |
| Readability | Moderate (nested subquery) | High (clear, modular CTE) |
| Performance | Poor for large n or k | Efficient, single scan |
| Scalability | Limited | Scales well |
| MySQL Version | Any | 8.0+ (or derived table) |
Optimization Tips
- Indexes: Create a composite index on
Employee(departmentId, salary):CREATE INDEX idx_emp_dept_salary ON Employee(departmentId, salary);
- Execution Plan: Use
EXPLAINto verify index usage:EXPLAIN SELECT ...;
- Test with Data: Compare performance with large datasets.
- Edge Cases:
- Empty tables: Returns no rows.
- No employees in a department: Excludes such departments.
- Ties: Includes all maximum-salary employees.
Conclusion
The correlated subquery is intuitive but inefficient for large datasets due to repeated executions. The CTE-based solution precomputes maximum salaries, reducing scans and improving performance. It’s also more readable and maintainable. With proper indexes and MySQL 8.0+, the CTE approach is ideal. Run these queries, use EXPLAIN, and optimize for your data. Happy querying!
Additional Resources
For another perspective on solving this problem, check out this LeetCode solution by Bikash. It provides an easy-to-understand approach that complements the solutions discussed here.
Comments
Post a Comment