Optimizing MySQL Queries for Finding Top Earners by Department (leetcode : 184. Department Highest Salary)
Optimizing MySQL Queries for Finding Top Earners by Department
In this blog post, we’ll explore how to write and optimize a MySQL query to find employees with the highest salary in each department. This is a common problem in database querying, often encountered in interviews and real-world applications. We’ll compare two solutions: one using a correlated subquery and an optimized version using a Common Table Expression (CTE). We’ll explain the approach, logic, time complexity, and provide the complete SQL code for both.
Problem Statement
Given two tables:
- Employee: Contains employee details with columns
id
,name
,salary
, anddepartmentId
. - Department: Contains department details with columns
id
andname
.
We need to find each employee who earns the highest salary in their department, returning the department name, employee name, and salary. If multiple employees share the highest salary in a department, all should be included.
Solution 1: Using a Correlated Subquery
Approach
The first solution uses a correlated subquery to identify the maximum salary for each employee’s department and filters for employees whose salaries match this maximum.
- Join Tables: Join
Employee
andDepartment
ondepartmentId = id
to associate employees with department names. - Correlated Subquery: For each employee, compute the maximum salary in their department using a subquery referencing the employee’s
departmentId
. - Filter: Select employees whose salary equals their department’s maximum salary.
SQL Query
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary FROM Employee e JOIN Department d ON e.departmentId = d.id WHERE e.salary = ( SELECT MAX(salary) FROM Employee WHERE departmentId = e.departmentId );
Explanation
- Join: The
JOIN
links each employee to their department usinge.departmentId = d.id
. - Subquery: For each row in
Employee
, the subquerySELECT MAX(salary) FROM Employee WHERE departmentId = e.departmentId
finds the maximum salary for that department. It’s correlated because it usese.departmentId
from the outer query. - Filter: The
WHERE e.salary = (...)
ensures only employees with the maximum salary are selected. - Output: Returns
d.name
(department name),e.name
(employee name), ande.salary
(salary).
Time Complexity
Let’s define:
- n: Number of rows in
Employee
. - m: Number of rows in
Department
. - k: Average number of employees per department (n ≈ k * m).
- Join: With indexes on
Employee.departmentId
andDepartment.id
(primary key), a hash join costs O(n + m). A nested loop join with an index ond.id
costs O(n log m). - Correlated Subquery: Runs for each of n rows in
Employee
.- Filter
departmentId = e.departmentId
: With an index, costs O(log n + k) (index lookup plus scanning k rows). - Compute
MAX(salary)
over k rows: O(k). - Total per subquery: O(log n + k).
- Across n rows: O(n * (log n + k)).
- Filter
- Filtering/Output: Comparing salaries and selecting columns is O(n).
Total: O(n * (log n + k)).
- If k is small, approximates O(n log n).
- If k is large (k ≈ n/m), becomes O(n²/m).
- Without an index on
departmentId
, subquery costs O(n) each, leading to O(n²).
Drawbacks
- Performance: The subquery executes n times, inefficient for large
Employee
tables. - Scalability: Performance degrades as n or k grows.
- Readability: Nested subquery is harder to follow for complex scenarios.
Solution 2: Using a Common Table Expression (CTE)
Approach
To improve performance, we precompute the maximum salary per department using a CTE and join it with Employee
and Department
.
- Precompute Maximum Salaries: Use a CTE to calculate the maximum salary for each department with
GROUP BY
. - Join with Employee: Join the CTE with
Employee
to find employees matching the maximum salary. - Join with Department: Join with
Department
for department names. - Output: Select the required columns.
SQL Query
WITH MaxSalaries AS ( SELECT departmentId, MAX(salary) AS max_salary FROM Employee GROUP BY departmentId ) SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary FROM Employee e JOIN MaxSalaries ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary JOIN Department d ON e.departmentId = d.id;
Explanation
- CTE (
MaxSalaries
): TheWITH MaxSalaries AS (...)
creates a temporary result set with columnsdepartmentId
andmax_salary
. The subquery computes the maximum salary per department, producing one row per department. - Joins:
JOIN MaxSalaries ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary
: Matches employees to their department’s maximum salary.JOIN Department d ON e.departmentId = d.id
: Links employees to department names.
- Output: Returns
d.name
,e.name
, ande.salary
for employees with the maximum salary. - Handling Ties: Includes all employees whose salaries match the maximum.
Time Complexity
- CTE (
MaxSalaries
):- Scan
Employee
: O(n). GROUP BY departmentId
: With an index, streaming costs O(n); sorting costs O(n log n).- Compute
MAX(salary)
for k rows per department, across m departments: O(m * k) = O(n). - Total: O(n) (streaming) or O(n log n) (sorting).
- Output: m rows.
- Scan
- Joins:
- Employee JOIN MaxSalaries: With indexes on
Employee(departmentId, salary)
, hash join costs O(n + m). Index-based join costs O(m log n) for r matches per department (r ≤ k). - Join with Department: For n’ result rows (n’ ≤ n), lookup in
Department
costs O(n’ log m). - Total: O(n + m) (hash join) or O(m log n + n’ log m).
- Employee JOIN MaxSalaries: With indexes on
- Output: O(n’), where n’ is small.
Total: O(n log n) (sorting) or O(n) (streaming with hash joins).
Advantages
- Performance: Computes maximum salaries once, avoiding n subquery executions.
- Readability: CTE separates logic, improving clarity.
- Scalability: Scales better with large datasets.
- Flexibility: CTE can be extended for additional aggregates.
Requirements
CTEs require MySQL 8.0+. For older versions, use a derived table:
SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary FROM Employee e JOIN ( SELECT departmentId, MAX(salary) AS max_salary FROM Employee GROUP BY departmentId ) ms ON e.departmentId = ms.departmentId AND e.salary = ms.max_salary JOIN Department d ON e.departmentId = d.id;
Comparison
Aspect | Solution 1 (Correlated Subquery) | Solution 2 (CTE) |
---|---|---|
Time Complexity | O(n * (log n + k)) | O(n log n) or O(n) |
Readability | Moderate (nested subquery) | High (clear, modular CTE) |
Performance | Poor for large n or k | Efficient, single scan |
Scalability | Limited | Scales well |
MySQL Version | Any | 8.0+ (or derived table) |
Optimization Tips
- Indexes: Create a composite index on
Employee(departmentId, salary)
:CREATE INDEX idx_emp_dept_salary ON Employee(departmentId, salary);
- Execution Plan: Use
EXPLAIN
to verify index usage:EXPLAIN SELECT ...;
- Test with Data: Compare performance with large datasets.
- Edge Cases:
- Empty tables: Returns no rows.
- No employees in a department: Excludes such departments.
- Ties: Includes all maximum-salary employees.
Conclusion
The correlated subquery is intuitive but inefficient for large datasets due to repeated executions. The CTE-based solution precomputes maximum salaries, reducing scans and improving performance. It’s also more readable and maintainable. With proper indexes and MySQL 8.0+, the CTE approach is ideal. Run these queries, use EXPLAIN
, and optimize for your data. Happy querying!
Additional Resources
For another perspective on solving this problem, check out this LeetCode solution by Bikash. It provides an easy-to-understand approach that complements the solutions discussed here.
Comments
Post a Comment