SQL Case
The SQL CASE
function is one of my favorite. The command basically works like if-then command. If you are familiar with if-then commands, then feel free to skip this next bit.
If-Then
One of the reasons we have the amazing devices we do today is because a computer is capable of reasoning. A computer can compare two things and decide which one it likes.
Now, this may sound simple, but it's actually a subtle miracle. Anyone who has been stuck on the toothpaste isle trying to decide between the 45 kinds of toothpaste probably understands making decisions is difficult. Of course, human decision making and computer decision making are not even on the same level. Humans can make comparisons of all 45 products at once(sort of). Computers, they have to make a decision between two objects, then, two objects, then two objects, so forth, until it has made it through all 45. Fortunately, computers can make these decisions blazing fast.
In computer programming we call this computer decision making process control flow. But let's write some pseudocode for a little better understanding:
If (Computer Likes Toothpaste 1) then buy Toothpaste 1
Pretty simple, right? The only thing a computer can't do is decide if it likes Toothpaste 1 on its own. We have to program it to do that.
Well, this sentence makes sense if a computer is trying to decide to buy toothpaste or no toothpaste, but what if there are more than two toothpaste options? We just create another if-then statement.
If (Computer Likes Toothpaste 1 Best) then buy Toothpaste 1
If (Computer Likes Toothpaste 2 Best) then buy Toothpaste 2
Because the computer makes decisions in order it read them, then if it buys Toothpaste 1 then it will not buy Toothpaste 2. However, if he doesn't like Toothpaste 1 the best, then if he thinks Toothpaste 2 is the best he'll buy it. Otherwise, he will not buy any toothpaste--which makes sense, computers don't have teeth.
This is almost everything we need to know about if-then
, two more little catches.
First, what do we do if the computer doesn't like any of the Toothpaste and don't want him to just give up? We need a way to say, "Look computer, if you don't like any toothpaste the best then go ask for help."
In programming this is known as if-then-else
statements. They are similar to if-then
but with a contingency clause if something goes wrong.
Let's take a look:
if (Computer Likes Toothpaste 1 Best) then buy Toothpaste 1
if (Computer Likes Toothpaste 2 Best) then buy Toothpaste 2
else Go Ask a Computer Dentist what to buy
Ok, that's it. Now let's apply it to SQL.
SQL CASE WHEN
SQL applies if-then logic in several ways. We've already looked at the WHERE
statement, which basicaly works like an if-then
.
SELECT * FROM data WHERE Name = 'Bob'
See how this could be written as
SELECT * FROM data IF Name = 'Bob'
But the most likely SQL statement used for if-then-else
logic is the CASE WHEN
statement.
Here's an example to be run in R.
library(sqldf)
################### Data DO NOT CHANGE ###########################
peopleDf <- data.frame(PersonalID=c("ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7", "IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV", "LASDU89NRABVJWW779W4JGGAN90IQ5B2"),
FirstName=c("Timmy", "Fela", "Sarah"),
LastName=c("Tesa", "Falla", "Kerrigan"),
DOB=c("2010-01-01", "1999-1-1", "1992-04-01"))
##################################################################
peopleDf1 <- sqldf("SELECT *,
CASE WHEN DOB > '2000-1-1' THEN 'Yes' ELSE 'No' END As 'Millennial'
FROM peopleDf")
Here is the output:
PersonalID | FirstName | LastName | DOB | Gender | Millennial |
---|---|---|---|---|---|
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 | Timmy | Tesa | 2010-01-01 | Male | Yes |
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV | Fela | Falla | 1999-1-1 | Female | No |
LASDU89NRABVJWW779W4JGGAN90IQ5B2 | Sarah | Kerrigan | 1992-04-01 | Female | No |
The SQL query, specifically the CASE WHEN
statement created a column called Millennial
, it then went through every person's date of birth, comparing it. When the query found a person who was born after 2000-01-01 it inserted a 'Yes' in the Millennial column. If they were not born after 2000-01-01 then it set the Millennial
column to 'No.' Nifty, right?
Notice, the ELSE
is required to get the 'No'. Otherwise, the query would leave everyone else blank.
Here's a few more examples of using CASE WHEN for powerful results.
Using OR with CASE WHEN
peopleDf2 <- sqldf("SELECT *,
CASE WHEN DOB > '2000-1-1' OR FirstName = 'Sarah' THEN 'PersonIsCool' ELSE 'NotHip' END As 'Cool?'
FROM peopleDf")
PersonalID | FirstName | LastName | DOB | Gender | Cool |
---|---|---|---|---|---|
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 | Timmy | Tesa | 2010-01-01 | Male | PersonIsCool |
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV | Fela | Falla | 1999-1-1 | Female | NotHip |
LASDU89NRABVJWW779W4JGGAN90IQ5B2 | Sarah | Kerrigan | 1992-04-01 | Female | PersonIsCool |
Using AND with CASE WHEN
peopleDf3 <- sqldf("SELECT *,
CASE WHEN FirstName = 'Sarah' AND LastName = 'Kerrigan' THEN 'Yes' ELSE ''
END As 'Queen of Blades'
FROM peopleDf")
PersonalID | FirstName | LastName | DOB | Gender | Queen of Blades |
---|---|---|---|---|---|
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 | Timmy | Tesa | 2010-01-01 | Male | |
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV | Fela | Falla | 1999-1-1 | Female | |
LASDU89NRABVJWW779W4JGGAN90IQ5B2 | Sarah | Kerrigan | 1992-04-01 | Female | Yes |
Using SUM with CASE WHEN
Using CASE WHEN
in combination with SUM
is a great way to get counts of different discrete data. Below is an example of getting total counts of males and females within the peopleDf
count1 <- sqldf("SELECT
SUM(
CASE WHEN Gender = 'Female' THEN 1 ELSE 0 END
) As 'NumberOfFemales',
SUM(
CASE WHEN Gender = 'Male' THEN 1 ELSE 0 END
) As 'NumberOfMales'
FROM peopleDf")
NumberOfFemales | NumberOfMales |
---|---|
2 | 1 |
Using Multiple CASES
So far, we've only covered one if-then
statement, but in our example with the toothpaste we could string them together. The same can be done with CASE WHEN
.
peopleDf4 <- sqldf("SELECT *, CASE WHEN DOB >= '1980-01-01' AND DOB < '1990-01-01' THEN 'X'
WHEN DOB >= '1990-01-01' AND DOB < '2000-01-01' THEN 'Y'
WHEN DOB >= '2000-01-01' AND DOB < '2010-01-01' THEN 'Millennial'
WHEN DOB >= '2010-01-01' AND DOB < '2020-01-01' THEN 'NotYetDefined'
END As 'Generation'
FROM peopleDf")
PersonalID | FirstName | LastName | DOB | Gender | Generation |
---|---|---|---|---|---|
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 | Timmy | Tesa | 2010-01-01 | Male | NotYetDefined |
IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV | Fela | Falla | 1999-1-1 | Female | Y |
LASDU89NRABVJWW779W4JGGAN90IQ5B2 | Sarah | Kerrigan | 1992-04-01 | Female | Y |
Paste
The paste()
in R is meant for manipulating strings of text. You pass it strings as parameters and it returns one string containing all the strings passed into it. Let's take a look.
greeting <- paste("Hello how are you,", "Bob?")
After running this line the greeting
variable contains the following string Hello how are you, Bob?
. This can be used by printing the contents of the variable using the print()
print(greeting)
Side note, print()
will actually print out anything you pass it to the console. This can be useful when trying to debug code.
Back to our combined strings, notice whenever the greeting prints out there is a space inserted between 'you,' and 'Bob?', this is done automatically by paste. It will insert a space between every string you pass it, unless you pass the additional parameter sep
. This parameter will take whatever you set it as and insert it between the two strings.
greeting <- paste("Hello how are you,", "Bob?", sep = "!!")
print(greeting)
This time print()
will display "Hello how are you,!!Bob?" in the console. But, inserting exclamation marks is probably not what we want. Most of the time we will not want paste to insert anything and we can tell it to insert nothing.
greeting <- paste("Hello how are you,", "Bob?", sep = "")
print(greeting)
Print will spit out "Hello how are you,Bob?". Notice, there is no longer any character between "you," and "Bob?".
Paste is a pretty straightforward function, the one last trick is knowing you can pass in multiple strings.
greeting <- paste("Hello", " how are you,", " Bob?", sep = "")
print(greeting)
This will produce the string "Hello how are you, Bob?". Notice the spaces were inserted manually so the end string is readable to humans.
Dynamic SQL with Paste()
Prepare to have your mind blown. One of the powers of the paste()
is building a sqldf
string. Remember using SQLdf like this?
library(sqldf)
################### Data DO NOT CHANGE ###########################
peopleDf <- data.frame(PersonalID=c("ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7", "IA26X38HOTOIBHYIRV8CKR5RDS8KNGHV", "LASDU89NRABVJWW779W4JGGAN90IQ5B2"),
FirstName=c("Timmy", "Fela", "Sarah"),
LastName=c("Tesa", "Falla", "Kerrigan"),
DOB=c("2010-01-01", "1999-1-1", "1992-04-01"))
##################################################################
peopleDf1 <- sqldf("SELECT * FROM peopleDf WHERE DOB > '2001-01-01'")
This creates the table
PersonalID | FirstName | LastName | DOB |
---|---|---|---|
ZP1U3EPU2FKAWI6K5US5LDV50KRI1LN7 | Timmy | Tesa | 2010-01-01 |
This is a dataframe of everyone who was born after January 1st, 2001. This method of filtering data works for a static date. But let's say you wanted to easily change out the 2001-01-01
with other dates. You could replace the date with a different date, but when that date is in multiple SQL calls it can be easy to miss one. A better way to do it is using the paste()
. And remember, everything inside the sqldf()
parentheses is a string.
targetDate <- "2001-01-01"
sqlString <- paste("SELECT * FROM peopleDf WHERE DOB > '", targetDate, "'", sep = "")
peopleDf5 <- sqldf(sqlString)
Ok, let's take this slow, there's a lot going on. First, we create a variable called targetDate
and assign it the string 2001-01-01
. Next, we create a complex string using the paste()
which looks a lot like a SQLdf string, but instead of hardcoding the date, we insert the targetDate
variable. This creates the following string:
"SELECT * FROM peopleDf WHERE DOB > '2001-01-01'"
Which is then inserted into the variable sqlString
, which is a string.
Lastly, we pass the sqlString
variable into the sqldf()
which executes the fancy SQL query. Awesome, right?
Now, if we want to look at those born after a different date, we simply change the targetDate
variable and re-run the script.
targetDate <- "1980-01-01"
sqlString <- paste("SELECT * FROM peopleDf WHERE DOB > '", targetDate, "'", sep = "")
peopleDf5 <- sqldf(sqlString)
Top comments (0)