MOHAMMAD SHAHBAZ ALAM

Posted on Jun 2, 2023

SQL Case Study 2: Human Resources

#sql #join #cte #data

In the world of data management and analysis, SQL (Structured Query Language) plays a vital role in handling and querying vast amounts of information stored in databases. In this case study, we will explore the application of SQL in the context of Human Resources (HR). By leveraging SQL, we can effectively manage employee data, analyze departmental information, and gain valuable insights into the workforce. Let’s dive into the case study and see how SQL can be utilized to tackle HR-related challenges.

Overview of the Data:

To begin with, we have three tables: ‘departments’, ‘employees’, and ‘projects’.The ‘departments’ table contains information about different departments, including the department’s ID, name, and manager’s ID. The ‘employees’ table stores data on individual employees, such as their ID, name, hire date, job title, and the department they belong to. Lastly, the ‘projects’ table contains details about various projects, including the project’s ID, name, start date, end date, and associated department.

The challenge for today is provided by Data in Motion and you can find the challenge here.

Let's create a database:

CREATE DATABASE departmentsDB;

Let’s create the table in our database:

-- Create 'departments' table
CREATE TABLE departments (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    manager_id INT
);

-- Create 'employees' table
CREATE TABLE employees (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    hire_date DATE,
    job_title VARCHAR(50),
    department_id INT REFERENCES departments(id)
);

-- Create 'projects' table
CREATE TABLE projects (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50),
    start_date DATE,
    end_date DATE,
    department_id INT REFERENCES departments(id)
);

Now, let’s enter data into tables:

-- Insert data into 'departments'
INSERT INTO departments (name, manager_id)
VALUES ('HR', 1), ('IT', 2), ('Sales', 3);

-- Insert data into 'employees'
INSERT INTO employees (name, hire_date, job_title, department_id)
VALUES ('John Doe', '2018-06-20', 'HR Manager', 1),
       ('Jane Smith', '2019-07-15', 'IT Manager', 2),
       ('Alice Johnson', '2020-01-10', 'Sales Manager', 3),
       ('Bob Miller', '2021-04-30', 'HR Associate', 1),
       ('Charlie Brown', '2022-10-01', 'IT Associate', 2),
       ('Dave Davis', '2023-03-15', 'Sales Associate', 3);

-- Insert data into 'projects'
INSERT INTO projects (name, start_date, end_date, department_id)
VALUES ('HR Project 1', '2023-01-01', '2023-06-30', 1),
       ('IT Project 1', '2023-02-01', '2023-07-31', 2),
       ('Sales Project 1', '2023-03-01', '2023-08-31', 3);

Update ‘manager_id’ in the ‘departments’ table:

-- Update 'manager_id' in 'departments'
UPDATE departments
SET manager_id = (SELECT id FROM employees WHERE name = 'John Doe')
WHERE name = 'HR';

UPDATE departments
SET manager_id = (SELECT id FROM employees WHERE name = 'Jane Smith')
WHERE name = 'IT';

UPDATE departments
SET manager_id = (SELECT id FROM employees WHERE name = 'Alice Johnson')
WHERE name = 'Sales';

Now, let’s try to answer the questions with the help of the above data.

Questions:

Find the longest ongoing project for each department.

SELECT dept.name, proj.name, MAX(end_date - start_date) AS duration
 FROM projects proj
JOIN departments dept ON proj.department_id = dept.id
GROUP BY dept.name, proj.name
ORDER BY duration DESC;

The query retrieves the department name, project name, and the maximum duration (in days) for each project in each department. The results are ordered in descending order of duration, allowing us to identify the project with the longest ongoing duration in each department.

Find all employees who are not managers.

SELECT *
FROM employees
WHERE id NOT IN(SELECT manager_id FROM departments);

This query selects all rows from the ‘employees’ table where the ‘id’ of the employee is not present in the ‘manager_id’ column of the ‘departments’ table.

** Find all employees who have been hired after the start of a project in their department.**

SELECT emp.name, proj.start_date, emp.hire_date
FROM employees emp
JOIN projects proj ON emp.department_id = proj.department_id
WHERE emp.hire_date > proj.start_date;

This query joins the ‘employees’ and ‘projects’ tables based on the department_id column. It selects the employee name, hire date, department ID, project name, and project start date. The WHERE clause filters the results to include only employees who were hired after the start date of a project in their department.

4.** Rank employees within each department based on their hire date (the earliest hire gets the highest rank).**

SELECT emp.name AS employee_name, emp.hire_date, emp.department_id,
    RANK() OVER(PARTITION BY emp.department_id ORDER BY emp.hire_date ASC) AS rank
FROM employees emp
ORDER BY emp.department_id, rank;

This query uses the RANK() function along with the OVER clause to rank employees within each department based on their hire date. The PARTITION BY clause divides the employees into partitions based on the department_id, and the ORDER BY clause orders the employees within each partition by their hire_date in ascending order.

Find the duration between the hire date of each employee and the hire date of the next employee hired in the same department.

WITH ranked_employees AS (
    SELECT emp.name AS employee_name, emp.hire_date, emp.department_id,
        RANK() OVER(PARTITION BY emp.department_id ORDER BY emp.hire_date ASC) AS rank_num
    FROM employees emp
),
lead_ranked AS (
    SELECT *,
        LEAD(hire_date) OVER(PARTITION BY department_id ORDER BY hire_date ASC) AS next_hire_date
    FROM ranked_employees
)

SELECT re1.department_id, re1.employee_name AS employee, re2.employee_name AS next_hired_employee,
    re2.hire_date - re1.hire_date AS duration
FROM lead_ranked re1
JOIN lead_ranked re2 ON re1.rank_num = re2.rank_num - 1 AND re1.department_id = re2.department_id;

This query calculates the duration between the hire date of each employee and the hire date of the next employee hired in the same department. The results include the department ID, employee name, next hired employee name, and the duration between their hire dates.

Conclusions:

SQL proves to be a powerful tool for managing HR-related data. In this case study, we explored various applications of SQL, including identifying the longest ongoing project per department, identifying non-manager employees, ranking employees by hire date, and calculating durations between hire dates. By harnessing the capabilities of SQL, HR professionals can optimize workforce management, gain valuable insights, and make data-driven decisions that contribute to the overall success of the organization.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

DEV Community

SQL Case Study 2: Human Resources

Top comments (0)

Read next

Data Science Simplified: Tips for Aspiring Data Scientists in 2025

Planning and creating a database before you start working on your backend can save you time

Data Traceability: Key Concepts and Best Practices

Major Technologies Worth Learning in 2025 for Data Professionals