I'd like to share an example of a complex SQL query, which will return all possible data from Database.
DB Schema looks like this:
The task was "to displays all employees and their related info even if some info is missing. Get as much information as you can about the employees".
My final SQL query was like this:
SELECT e.employee_id AS "Employee #" , e.first_name || ' ' || e.last_name AS "Name" , e.email AS "Email" , e.phone_number AS "Phone" , TO_CHAR(e.hire_date, 'MM/DD/YYYY') AS "Hire Date" , TO_CHAR(e.salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Salary" , e.commission_pct AS "Comission %" , 'works as ' || j.job_title || ' in ' || d.department_name || ' department (manager: ' || dm.first_name || ' ' || dm.last_name || ') and immediate supervisor: ' || m.first_name || ' ' || m.last_name AS "Current Job" , TO_CHAR(j.min_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') || ' - ' || TO_CHAR(j.max_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Current Salary" , l.street_address || ', ' || l.postal_code || ', ' || l.city || ', ' || l.state_province || ', ' || c.country_name || ' (' || r.region_name || ')' AS "Location" , jh.job_id AS "History Job ID" , 'worked from ' || TO_CHAR(jh.start_date, 'MM/DD/YYYY') || ' to ' || TO_CHAR(jh.end_date, 'MM/DD/YYYY') || ' as ' || jj.job_title || ' in ' || dd.department_name || ' department' AS "History Job Title" FROM employees e -- to get title of current job_id JOIN jobs j ON e.job_id = j.job_id -- to get name of current manager_id LEFT JOIN employees m ON e.manager_id = m.employee_id -- to get name of current department_id LEFT JOIN departments d ON d.department_id = e.department_id -- to get name of manager of current department -- (not equal to current manager and can be equal to the employee itself) LEFT JOIN employees dm ON d.manager_id = dm.employee_id -- to get name of location LEFT JOIN locations l ON d.location_id = l.location_id LEFT JOIN countries c ON l.country_id = c.country_id LEFT JOIN regions r ON c.region_id = r.region_id -- to get job history of employee LEFT JOIN job_history jh ON e.employee_id = jh.employee_id -- to get title of job history job_id LEFT JOIN jobs jj ON jj.job_id = jh.job_id -- to get namee of department from job history LEFT JOIN departments dd ON dd.department_id = jh.department_id ORDER BY e.employee_id;
Let's go through it step by step:
Before creating our final Select statement, let's see which tables require LEFT joins and which do not. We need this check because LEFT JOINS are slower than INNER JOINS, so for efficiency we will be using (INNER) JOINS were possible.
To check that every employee has a job_id. If there are results (id's of employees), so we need LEFT JOIN, because there are employees who do not have (IS NULL == true) job_id. Otherwise (IS NULL == false) we use only JOIN:
SELECT employee_id FROM employees WHERE job_id IS NULL;
Result: empty. Conclusion: we use JOIN for 'jobs' table.
The same check as above, this time checking department_id:
SELECT employee_id FROM employees WHERE department_id IS NULL;
Result: 178. Conclusion: we use LEFT JOIN for 'departments' table.
Because there is one employee without department_id, the whole chain of future joins (departments-loations-countries-regions) should be also LEFT Joined, despite the fact that these tables can be INNER joined between each other in a separate query.
The same check as above, this time checking manager_id:
SELECT employee_id FROM employees WHERE manager_id IS NULL;
Result: 100. Conclusion: we use LEFT JOIN (self join) for 'employees' table. It means, that one employee doesn't have a manager.
And some general comments on this final query:
- we use it to define alias, column names in the result table look more human-readable.
- we use this to concatenate strings
- we enclose any text into single quotes
- in this way we can set format of the date
TO_CHAR(e.salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''')
- in this way we can output currency with separators. My detailed post about it can be found here.
JOIN jobs j ...
- this 'j' means alias to a table name. The same table can be joined to itself (self join) and for this reason this table alias for the same table can and should be different.
The query returned 110 results, though there are only 107 employees. Because out of 107 employees, 3 have two job history entries. The first couple of rows of the result look like this:
We could definitely go further and get all information about departments, where according to job history entries a couple of employees were working, but I decided that it would be too much.
Originally posted on my website.
I hope this example will be useful for newbies. And to those of you, who are experienced, I'd like to address some questions:
- Is it really important which JOIN to apply: LEFT or INNER? I mean is there any real gain in productivity?
- Is it a good practice to pack all possible information into one query? Or to split it into small queries is a better approach?
- What mistakes were made in this query, if any?