DEV Community

Discussion on: Welcome to SQL 3: Building Relationships and Combining Data Sets

Collapse
 
dominicbisset profile image
Dominic Bisset

That's a solid introduction to joins and unions. I particularly appreciated the animation - it's a great way of representing the filtering effect of joins, and certainly clearer than a Venn diagram.

I think you've missed some of the nuance of the full outer join though. The way you describe it as producing the Cartesian product of the two incoming tables - the rows in one table multiplied by the rows in the other - is more true of the CROSS JOIN (which you suggested is an alias of the INNER JOIN).

As it happens your Pandas example will produce the same output if you join the two tables on column B regardless of whether you do an FULL OUTER, LEFT or even INNER JOIN. The problem lies not in the type of join but that you're trying to join two tables with tangentially related data. There's not enough commonality between the sets to make sensible decisions about what should match to what. If you wanted to reduce the number of rows you'd have to use data sets with more columns so you have composite keys that overlap more.

Rather than showing rows that have multiple matches, a better example for FULL OUTER joins considers what happens when rows don't have a match. You end up with some complete rows, some rows with NULLs at the end and data at the start (assuming a select *), and others with NULLs at the start with data towards the end.

They are niche. There aren't many cases where you'd use one over a left join because largely you have got a primary driving data source that you're trying to add supplementary info to. A use case I came across was when I had two conceptually-similar tables from different sources - reels of paper tracked by different factory data collection systems. Both tables had some common columns, and some extra info that wasn't available in the other system. Both sources knew about some of the reels of paper but not all of them, and a LEFT (or RIGHT) JOIN would filter out the rows from the "other" system that the driving system didn't know about. By FULL OUTER JOINing them together and COALESCING the resultant NULLs to sensible defaults I was able to get the most detailed picture of the system as a whole.