PyProDev

Posted on • Originally published at linkedin.com

Mastering Logical Comparison, Control Flow, Filtering on Numpy Array and Pandas DataFrame

In this article, we will learn about different comparison operators, how to combine them with Boolean operators, and how to use Boolean outcomes in control structures. Boolean logic is the foundation of decision-making in Python programs. We'll also learn to filter data in pandas DataFrames using logic, a skill every data scientist needs.

Comparison Operators

Comparison operators are operators that can tell how two values relate, and result in a boolean.

Numeric comparisons

In the simplest case, we can use these operators on numbers. For example, to check whether 2 is smaller than 3, we type 2 < 3.

print(2 < 3)
output:
True

Because 2 is less than 3, we get True. We can also check whether two values are equal with a double equals sign. The following call shows that 5 == 6 gives us False.

print(5 == 6)
output:
False

It makes sense because 5 is not equal to 6. We can also make a combination of equality and smaller than. Have a look at this command that checks if 5 is smaller than or equal to 6.

print(5 <= 6)
output:
True

The result is True. Note that 6 <= 6 is also True, because the operator accepts equality as well.

print(6 <= 6)
output:
True

Of course, we can also use comparison operators directly on variables that represent these integers.

x = 5
y = 6
print(x < y)
output:
True

Comparison between strings

All these operators also work for strings. Let's check if "abc" is smaller than "acd".

print("abc" < "acd")
output:
True

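Python compares strings character by character using their Unicode code points. For letters of the same case this matches alphabetical order, so "abc" comes before "acd" and the result is True.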
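Because the comparison is based on character codes, mixing upper and lower case can give surprising results. The following is a small sketch of that effect; the example words are made up for illustration.

```python
# Uppercase letters have smaller code points than lowercase letters.
print(ord("Z"), ord("a"))  # 90 97

# "Z" (90) comes before "a" (97), so "Zebra" sorts before "apple".
uppercase_first = "Zebra" < "apple"
print(uppercase_first)  # True

# For a case-insensitive comparison, normalise both sides first.
case_insensitive = "Zebra".lower() < "apple".lower()
print(case_insensitive)  # False
```

Normalising with lower() before comparing is one common way to get the ordering most people expect from "alphabetical".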

Comparison between integer and string

Let's find out whether comparing a string and an integer works. Here we check if the integer 2 is smaller than the string "abc".

print(2 < "abc")
output:
TypeError: '<' not supported between instances of 'int' and 'str'

We get a TypeError. In general, Python can't tell how two objects of different types relate.

Comparison between integer and float

Different numeric types, such as floats and integers, are exceptions.

print(3 < 4.12)
output:
True

No error this time, because Python can order integers and floats relative to each other. In general, make sure that the objects we compare are of the same type.
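When a number arrives as text, one way to follow this advice is to convert it explicitly before comparing. A minimal sketch, where value and limit are hypothetical names chosen for the illustration:

```python
value = "42"   # hypothetical input that arrived as a string
limit = 100

# Comparing the raw string with an integer raises TypeError.
try:
    print(value < limit)
except TypeError as err:
    print("Cannot compare:", err)

# Converting to int first makes the comparison well defined.
is_below = int(value) < limit
print(is_below)  # True
```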

Comparison on NumPy arrays

Another exception arises when we compare a NumPy array, lengths, with an integer, 22. This works perfectly.

import numpy as np
lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])
print(type(lengths))
print(lengths > 22)
output:
<class 'numpy.ndarray'>
[False False False  True False]

NumPy figures out that we want to compare every element in lengths with 22, and returns the corresponding booleans. Conceptually, NumPy treats the scalar 22 as if it were an array of the same size filled with 22 (this is called broadcasting) and then performs an element-wise comparison. This is concise, very efficient code, which data scientists love!
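To make the "array filled with 22" picture concrete, here is a small sketch that builds such an array explicitly with np.full_like and checks that both comparisons agree. The variable names are chosen for this illustration only.

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# Comparison against a scalar.
scalar_result = lengths > 22

# The same comparison against an explicit array of 22s with the same shape.
explicit_result = lengths > np.full_like(lengths, 22)

print(scalar_result)
print(np.array_equal(scalar_result, explicit_result))  # True
```

In practice we would always write the scalar form; the explicit array is only there to show what the comparison means.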

We can also compare two NumPy arrays element-wise. house1 and house2 contain the areas for the kitchen, living room, bedroom and bathroom in the same order. To find which areas in house1 are smaller than the corresponding ones in house2, we can write:

house1 = np.array([18.0, 20.0, 10.75, 9.50])
house2 = np.array([14.0, 24.0, 14.25, 9.0])
print(house1 < house2)
output:
[False  True  True False]

It appears that the living room and bedroom in house1 are smaller than the corresponding areas in house2.

Comparators

Here is the table that summarizes all comparison operators.

Comparator   Meaning
<            less than
<=           less than or equal to
>            greater than
>=           greater than or equal to
==           equal to
!=           not equal to

We are already familiar with some of these. They're all pretty straightforward, except for the not equal !=. The exclamation mark followed by an equals sign stands for inequality. It's the opposite of equality.

Equality

To check if two Python values, or variables, are equal you can use ==. To check for inequality, you need !=. Have a look at the following examples that all result in True.

print(2 == (1 + 1))
print("PYTHON" != "python")
print(True != False)
print("Python" != "python")
output:
True
True
True
True

Write code to check whether True equals False.

# Comparison of booleans
print(True == False)
output:
False

Write Python code to check if -3 * 15 is not equal to 45.

# Comparison of integers
print(( -3 * 15 ) != 45)
output:
True

Ask Python whether the strings "python" and "Python" are equal.

# Comparison of strings
print("python" == "Python")
output:
False

Note that strings are case-sensitive. What happens if you compare booleans and integers? Write code to see if True and 1 are equal.

# Compare a boolean with an integer
print(True == 1)
print(True == 2)
output:
True
False

A boolean is a special kind of integer: True corresponds to 1 and False corresponds to 0.
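A short sketch of what this means in practice: bool is a subclass of int, so booleans can take part in arithmetic. The flags list is a made-up example.

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(isinstance(True, int))  # True
print(True + True)            # 2

# A handy consequence: summing booleans counts how many are True.
flags = [True, False, True, True]
true_count = sum(flags)
print(true_count)  # 3
```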

Greater and less than

We also talked about the less than and greater than signs, < and >, in Python. We can combine them with an equals sign to get <= and >=. Note that =< and => are not valid. For example:

print(3 < 4)
print(3 <= 4)
print("alpha" <= "beta")
output:
True
True
True

Remember that for string comparison, Python determines the relationship character by character, which for same-case letters amounts to alphabetical order.

Check if x is greater than or equal to -13.

# Comparison of integers
x = -4 * 3
print(x >= -13)
output:
True

Check if True is greater than False.

# Comparison of booleans
print(True > False)
output:
True

Remember that True is 1 and False is 0 in value.

Boolean Operators

We can produce booleans by performing comparison operations. The next step is combining these booleans. We can use boolean operators for this. The three most common ones are

  • and,

  • or, and

  • not.

and

The and operator works just as we would expect. It takes two booleans and returns True only if both the booleans themselves are True.

Case1   Case2   Case1 and Case2
True    True    True
True    False   False
False   True    False
False   False   False

print(True and True)
print(True and False)
print(False and True)
print(False and False)
output:
True
False
False
False

Instead of using booleans, we can also use the results of comparisons. Suppose we have a variable x, equal to 8. To check if this variable is greater than 5 but less than 15, we can use x > 5 and x < 15.

x = 8
print(x > 5 and x < 15)
output:
True

As we already learned, the first part will evaluate to True. The second part will also evaluate to True. So the result of this expression, True and True, is True. This makes sense, because 8 lies between 5 and 15.
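Python also lets us chain comparisons, which expresses "lies between" more directly. This is a small sketch showing that the chained form gives the same answer as the and form for this example.

```python
x = 8

# Explicit form with the and operator.
explicit = x > 5 and x < 15

# Chained comparison: reads like the mathematical notation 5 < x < 15.
chained = 5 < x < 15

print(explicit, chained)  # True True
```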

or

The or operator works similarly, but the difference is that only one of the booleans needs to be True.

Case1   Case2   Case1 or Case2
True    True    True
True    False   True
False   True    True
False   False   False

print(True or True)
print(True or False)
print(False or True)
print(False or False)
output:
True
True
True
False

Also here we can make combinations with variables, like this example that checks if a variable y, which is equal to 3, is less than 5 or above 10.

y = 3
print(y < 5 or y > 10)
output:
True

3 < 5 is True and 3 > 10 is False. The or operation thus returns True.

not

Finally, let's look at the not operator. It simply negates the boolean value we use it on. not True is False, not False is True. The not operation is typically useful if we're combining different boolean operations and then want to negate that result.

print(not True)
print(not False)
output:
False
True

Nested Boolean operators

Let's take the boolean operators to another level.

Note that not has a higher priority than and and or, so it is evaluated first.

x = 8
y = 9
print(not(not(x < 3) and not(y < 8 or y > 14)))
output:
False

x < 3 is False, and y < 8 or y > 14 is False as well. If you continue working like this, simplifying from the inside outward, you'll end up with False.
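The inside-out simplification can be written as separate steps. In this sketch the intermediate names (left, right, combined) are introduced only for illustration.

```python
x = 8
y = 9

left = not (x < 3)               # x < 3 is False, so left is True
right = not (y < 8 or y > 14)    # False or False is False, so right is True
combined = left and right        # True and True is True
result = not combined            # negating True gives False

print(left, right, combined, result)  # True True True False
```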

Filtering on NumPy arrays

Now, for NumPy arrays, things are different. Retaking the lengths example, we can try to find out which lengths are higher than 21, but lower than 22. The output of lengths greater than 21 is easily found, so is the one for the lengths lower than 22.

print(lengths)
print(lengths > 21)
print(lengths < 22)
output:
[21.85 20.97 21.75 24.74 21.44]
[ True False  True  True  True]
[ True  True  True False  True]

Let's now try to combine those with the and operator we just learned.

print(lengths > 21 and lengths < 22)
output:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Oops, Python raises a ValueError: the truth value of an array with more than one element is ambiguous. The and operator needs a single truth value for each operand, and an array of booleans does not have one.
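The error message suggests a.any() and a.all(). These reduce a boolean array to a single truth value, which is useful when the question is about the array as a whole rather than about each element. A brief sketch with the lengths array from before:

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])
above_21 = lengths > 21  # [ True False  True  True  True]

# any(): is at least one element True?
any_above = above_21.any()

# all(): is every element True?
all_above = above_21.all()

print(any_above, all_above)  # True False
```

Neither method answers the element-wise question we are after here, which is why a different tool is needed.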

NumPy provides "array equivalents" of the and, or and not operators as functions:

  • logical_and,

  • logical_or and

  • logical_not.

To find out which lengths are between 21 and 22, we will use these functions. Again, as we expect from NumPy, the and operation is performed element-wise.

print(np.logical_and(lengths > 21, lengths < 22))
output:
[ True False  True False  True]

To select only the lengths that are between 21 and 22, we can use the resulting array of booleans in square brackets.

print(lengths[np.logical_and(lengths > 21, lengths < 22)])
output:
[21.85 21.75 21.44]
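As an aside that goes slightly beyond the functions used in this article, NumPy boolean arrays also support the operators &, | and ~ as element-wise and, or and not. A minimal sketch comparing both spellings; note that the parentheses are needed because & binds more tightly than the comparison operators.

```python
import numpy as np

lengths = np.array([21.85, 20.97, 21.75, 24.74, 21.44])

# Function form, as used in the article.
with_function = np.logical_and(lengths > 21, lengths < 22)

# Operator form: parentheses around each comparison are required.
with_operator = (lengths > 21) & (lengths < 22)

# ~ negates a boolean array element-wise, like np.logical_not.
outside = ~with_operator

print(np.array_equal(with_function, with_operator))  # True
print(lengths[with_operator])
```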

Again, NumPy wins when it comes to writing short yet very expressive Python code. How about this on Pandas DataFrames, the de facto standard for dataset manipulation?

Boolean operators on NumPy arrays

Before, the comparison operators like < and >= worked with NumPy arrays out of the box. Unfortunately, this is not true for the boolean operators and, or, and not.

To use these operators with NumPy, we will need np.logical_and(), np.logical_or() and np.logical_not(). Here's an example on the house1 and house2 arrays.

Generate boolean arrays that answer the following questions:

  • Which areas in house1 are greater than 18.5 or smaller than 10?
# house1 greater than 18.5 or smaller than 10
print(np.logical_or(house1 > 18.5, house1 < 10))
output:
[False  True False  True]
  • Which areas are smaller than 11 in both house1 and house2?
# Both house1 and house2 smaller than 11
print(np.logical_and(house1 < 11, house2 < 11))
output:
[False False False  True]
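The third function, np.logical_not, has not appeared in an example yet. Here is a small sketch on house1 that selects the rooms that are not larger than 18.5; the threshold is reused from the question above and the variable names are illustrative.

```python
import numpy as np

house1 = np.array([18.0, 20.0, 10.75, 9.50])

large_rooms = house1 > 18.5               # [False  True False False]
not_large = np.logical_not(large_rooms)   # [ True False  True  True]

print(not_large)
print(house1[not_large])  # areas of the rooms that are not large
```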

Filtering on pandas DataFrames

NumPy arrays let us do comparison operations and boolean operations on an element-wise basis. Let's now use this knowledge on a Pandas DataFrame. The examples below use a countries.csv file whose contents are shown in the output. First, let's import the countries dataset from the CSV file using pandas.

import pandas as pd
countries = pd.read_csv('countries.csv', index_col=0)
print(countries)
output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000
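For readers who do not have the CSV file at hand, the same table can be rebuilt in memory. This is a sketch that assumes the values printed above; the dictionary layout is an illustration and not the original file.

```python
import pandas as pd

# Rebuild the countries table from the values shown in the printed output.
countries = pd.DataFrame(
    {
        "country": ["India", "Myanmar", "Thailand", "Singapore", "China"],
        "capital": ["New Delhi", "Yangon", "Bangkok", "Singapore", "Beijing"],
        "population": [1393409030, 54806010, 69950840, 5453570, 1412360000],
    },
    index=["IND", "MMR", "THA", "SGP", "CHN"],
)

print(countries)
```

With this DataFrame in place, the filtering steps in the rest of the article can be followed without the file.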

Suppose you now want to keep the countries for which the population is greater than 100,000,000. There are three steps to this.

  1. First of all, we want to get the population column from countries.

  2. Next, we perform the comparison on this column and store its result.

  3. Finally, we should use this result to do the appropriate selection on the DataFrame.

Step 1: Get the column

The first step is getting the population column from countries. There are many different ways to do this. What's important here is that we ideally get a Pandas Series, not a Pandas DataFrame. Let's do this with square brackets, like this.

print(type(countries['population']))
print(countries['population'])
output:
<class 'pandas.core.series.Series'>
IND    1393409030
MMR      54806010
THA      69950840
SGP       5453570
CHN    1412360000
Name: population, dtype: int64

The loc and iloc alternatives below would also work perfectly fine.

print(countries.loc[:, 'population'])
output:
IND    1393409030
MMR      54806010
THA      69950840
SGP       5453570
CHN    1412360000
Name: population, dtype: int64

With iloc, the column is selected by its integer position:

print(countries.iloc[:, 2])
output:
IND    1393409030
MMR      54806010
THA      69950840
SGP       5453570
CHN    1412360000
Name: population, dtype: int64
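The difference between a Series and a DataFrame comes down to the brackets. The sketch below uses a two-row stand-in table (an assumption made to keep the example self-contained) to show that single brackets give a Series while double brackets give a one-column DataFrame.

```python
import pandas as pd

# Small stand-in for the countries table, for illustration only.
countries = pd.DataFrame(
    {"country": ["India", "Singapore"], "population": [1393409030, 5453570]},
    index=["IND", "SGP"],
)

as_series = countries["population"]    # single brackets: Series
as_frame = countries[["population"]]   # double brackets: DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```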

Step 2: Compare

Next, we perform the comparison. To see which rows have a population greater than 100,000,000, we simply append > 100000000 to the code from before, like this.

print(countries['population'] > 100000000)
output:
IND     True
MMR    False
THA    False
SGP    False
CHN     True
Name: population, dtype: bool

Now we get a Series containing booleans. If you compare it to the population values, you can see that rows with a population over 100000000 correspond to True, and rows with a population under 100000000 correspond to False. Let's store this Boolean Series as is_huge.

is_huge = countries['population'] > 100000000
print(is_huge)
output:
IND     True
MMR    False
THA    False
SGP    False
CHN     True
Name: population, dtype: bool

Step 3: Subset the DataFrame

The final step is using this boolean Series is_huge to subset the Pandas DataFrame. To do this, we put is_huge inside square brackets.

print(countries[is_huge])
output:
    country    capital  population
IND   India  New Delhi  1393409030
CHN   China    Beijing  1412360000

The result is exactly what we want: only the countries with a population greater than 100000000, namely India and China.

Summary

So let's summarize this: we selected the population column, performed a comparison on it, and stored the result as is_huge so that we can use it to index the countries DataFrame. These separate commands do the trick. However, we can also write this in one line: simply put the code that defines is_huge directly in the square brackets.

print(countries[countries['population'] > 100000000])
output:
    country    capital  population
IND   India  New Delhi  1393409030
CHN   China    Beijing  1412360000

Great! Pandas makes a data scientist's life much easier.
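A boolean Series is useful beyond subsetting whole rows. As a hedged sketch, the code below counts the matching rows with sum() and uses loc to keep only one column of the matches. The table is rebuilt inline from the values shown earlier so the example runs on its own.

```python
import pandas as pd

countries = pd.DataFrame(
    {
        "country": ["India", "Myanmar", "Thailand", "Singapore", "China"],
        "population": [1393409030, 54806010, 69950840, 5453570, 1412360000],
    },
    index=["IND", "MMR", "THA", "SGP", "CHN"],
)

is_huge = countries["population"] > 100000000

# True counts as 1, so the sum is the number of matching rows.
huge_count = int(is_huge.sum())

# loc accepts the boolean Series for rows and a label for the column.
huge_names = countries.loc[is_huge, "country"]

print(huge_count)  # 2
print(huge_names)
```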

Boolean operators on Pandas DataFrame

So far we haven't used boolean operators on DataFrames. Remember that we used the logical_and function from the NumPy package to do an element-wise boolean operation on NumPy arrays? Because Pandas is built on NumPy, we can also use that function here. Let's write code that keeps the observations that have a population between 10,000,000 and 90,000,000.

print(countries)
output:
       country    capital  population
IND      India  New Delhi  1393409030
MMR    Myanmar     Yangon    54806010
THA   Thailand    Bangkok    69950840
SGP  Singapore  Singapore     5453570
CHN      China    Beijing  1412360000

print(np.logical_and(countries['population'] > 10000000, countries['population'] < 90000000))
output:
IND    False
MMR     True
THA     True
SGP    False
CHN    False
Name: population, dtype: bool

The only thing left to do is placing this code inside square brackets to subset countries appropriately. This time, only Myanmar and Thailand are included. Look how easy it is to filter DataFrames to get interesting results.

print(countries[np.logical_and(countries['population'] > 10000000, countries['population'] < 90000000)])
output:
      country  capital  population
MMR   Myanmar   Yangon    54806010
THA  Thailand  Bangkok    69950840
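Pandas Series also support the & operator for element-wise and, which many people find easier to read than the function call. The following sketch is an alternative spelling and not the approach used above; it rebuilds the table inline and checks that both forms select the same rows.

```python
import numpy as np
import pandas as pd

countries = pd.DataFrame(
    {
        "country": ["India", "Myanmar", "Thailand", "Singapore", "China"],
        "population": [1393409030, 54806010, 69950840, 5453570, 1412360000],
    },
    index=["IND", "MMR", "THA", "SGP", "CHN"],
)

population = countries["population"]

# Function form, as in the article.
with_function = np.logical_and(population > 10000000, population < 90000000)

# Operator form: each comparison needs its own parentheses.
with_operator = (population > 10000000) & (population < 90000000)

mid_sized = countries[with_operator]
print(mid_sized)
```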

Now we know about comparison operators such as

  • <

  • <=

  • >

  • >=

  • ==

  • !=

and we also know how to combine the boolean results, using boolean operators such as

  • and,

  • or and

  • not.

Control Flow

Things get interesting when we can use these concepts to change how our program behaves. Depending on the outcome of our comparisons, we might want our Python code to behave differently. We can do this with conditional statements in Python:

  • if,

  • else and

  • elif.

if

Suppose we have a variable x, equal to 4. If the value is even, we want to print out: "x is even".

x = 4
if x % 2 == 0:
    print('x is even.')
output:
x is even.

The modulo operator % with 2 will return 0 if x is even. Python checks if the condition holds. It's true, so the corresponding code is executed: "x is even" gets printed out.

Let's compare this to the general recipe for an if statement. It reads as follows: if the condition is True, execute the indented code.

Notice the colon at the end, and the fact that we simply have to indent the Python code with four spaces (or a tab) to tell Python what to do in case the condition holds. To exit the if statement, simply continue with some Python code without indentation, and Python will know that it's not part of the if statement. It's perfectly possible to have more lines inside the if statement, like this for example.

x = 4
if x % 2 == 0:
    print('Checking if x (', x, ') is divisible by 2...')
    print('x is even.')
output:
Checking if x ( 4 ) is divisible by 2...
x is even.

The script now prints out two lines if we run it. If the condition does not hold, the indented code is not executed. You can see this if we change x to be 3 and rerun the code.

x = 3
if x % 2 == 0:
    print('Checking if x (', x, ') is divisible by 2...')
    print('x is even.')
output:

There's no output. Suppose now that we want to print out "x is odd" in this case. How to do this?

else

Well, we can simply use an else statement, like this.

x = 3
if x % 2 == 0:
    print('x is even.')
else:
    print('x is odd.')
output:
x is odd.

If we run it with x equal to 3, the condition is not true, so the code under the else statement is executed and its message gets printed out. The general recipe looks like this: for the else statement, we don't need to specify a condition. The code under else runs only if the condition of the if statement does not hold.
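The if/else recipe can be wrapped in a function so it can be tried on several values at once. This is a minimal sketch; the function name parity and the test values are made up for the illustration.

```python
def parity(n):
    """Return 'even' or 'odd' for an integer n."""
    if n % 2 == 0:
        return "even"
    else:
        # Reached only when the if condition does not hold.
        return "odd"

labels = [parity(n) for n in [3, 4, 0, -7]]
print(labels)  # ['odd', 'even', 'even', 'odd']
```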

elif

We can think of cases where even more customized behavior is necessary. Say we want different printouts for numbers that are divisible by 2 and for numbers that are divisible by 3. We can use elif to get the job done. Here is an example.

x = 3
if x % 2 == 0: # False
    print('x is divisible by 2.')
elif x % 3 == 0: # True
    print('x is divisible by 3.')
else:
    print('x is divisible by neither 2 nor 3.')
output:
x is divisible by 3.

If x equals 3, the first condition is False, so it goes over to check the next condition. This condition holds True so the corresponding print statement is executed.

Suppose now that x equals 6. Both the if and elif conditions hold True in this case. Will two printouts occur?

x = 6
if x % 2 == 0: # True
    print('x is divisible by 2.')
elif x % 3 == 0: # never reached
    print('x is divisible by 3.')
else:
    print('x is divisible by neither 2 nor 3.')
output:
x is divisible by 2.

Nope. As soon as Python finds a true condition, it executes the corresponding code and then leaves the whole control structure. This means the second condition, which belongs to the elif, is never checked, so there's no corresponding printout. Control flow can be extremely powerful when we're writing Python scripts.
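If we do want every matching branch to run, one option is to use separate if statements instead of an if/elif chain. The sketch below contrasts the two; the function names first_match and all_matches are hypothetical.

```python
def first_match(x):
    """if/elif chain: only the first true branch runs."""
    if x % 2 == 0:
        return ["divisible by 2"]
    elif x % 3 == 0:
        return ["divisible by 3"]
    else:
        return ["divisible by neither 2 nor 3"]

def all_matches(x):
    """Independent if statements: every true condition is handled."""
    found = []
    if x % 2 == 0:
        found.append("divisible by 2")
    if x % 3 == 0:
        found.append("divisible by 3")
    return found

print(first_match(6))  # ['divisible by 2']
print(all_matches(6))  # ['divisible by 2', 'divisible by 3']
```

Which form is appropriate depends on whether the cases are meant to be mutually exclusive.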

In this article, we learned logical comparison, control flow, and filtering on Numpy Array and Pandas DataFrame.



