Pandas borrows a core concept from SQL:
the Join
But there are so many different types of joining two DataFrames
Let's make this easy and go through Left, Right, Inner and Outer joins
Knowing these is great in interviews in addition to your usual DataFrame shenanigans
Joins always happen on two DataFrames.
This explanation will use: ๐ฉ๐ชโฌ
- ๐ฉ: The left DataFrame
- ๐ช: The right DataFrame
- โฌ: The result
Don't get the word "join" wrong though, you can actually end up with a smaller DataFrame โฌ than either or ๐ฉ๐ช
โฌ ๏ธ The Left Join is selfish
This one takes the complete left DataFrame ๐ฉ and only checks for overlaps from the right ๐ช
No ๐ช from outside of the bounds of ๐ฉ will make it into โฌ
โก๏ธ The Right Join is almost the same as Left
Only take everything in ๐ช and overlapping ๐ฉ
โคต๏ธ The Inner Join
This one is tricky.
Almost always โฌ will be smaller than ๐ฉ&๐ช.
For the Inner join, you only look at the parts of ๐ฉ and ๐ช that overlap.
Nothing is included in โฌ that exists outside of this common area.
โ๏ธ The Outer Join
Is possibly the simples one.
It is exactly what we would expect from a "join".
Take all of ๐ฉ and all of ๐ช and combine it into โฌ.
All of the data is in our result.
TL;DR
- Pandas borrows from SQL using Joins
- Left and Right join maintain the original and whatever overlaps in the other
- Inner is only the common ground
- Outer uses all the data in both DataFrames
Top comments (0)