## DEV Community

Mohamed M El-Kalioby

Posted on

# Work on CSV files with Leopards

Filtering a CSV file is one of the first steps when trying to interpolate data for data science. This article will show how to use Leopards to filter a CSV file.

We will use this csv file to do our filtrations.

### Install leopards

``````pip install leopards
``````

### Open csv file

We will use the build-in Python module `csv` to load CSV

``````import csv
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
``````

## Example: Count How many of each gender is in the set

Let's see how many males/females are in the dataset

``````from  leopards import Count
res = Count(rows,['Sex'])
print(res)
``````

output:

``````dict_values([{'Sex': 'Male', 'count': 47}, {'Sex': 'Female', 'count': 53}])
``````

Lets run the same code again.

``````res = Count(rows,['Sex'])
print(res)
``````

output:

``````dict_values([])
``````

### Explanation

Why did it return result in the first run but not the second run?
DictReader is an iterator so with the first run, we were on the first item, so leopards was able to iterate over it but with the second run, the iterator was already consumed so it is like an empty list. so to solve the issue either load the rows in memory like below `rows=[r for r in rows]` or recreate the iterator by `rows = csv.DictReader(open('people-100.csv'), delimiter=",")`

## Example: Find people whose email is '@example.com'

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
res = Q(rows,Email__icontains='example.com')
for item in res:
print(item)
``````

output:

``````{'Index': '2', 'User Id': 'f90cD3E76f1A9b9', 'First Name': 'Phillip', 'Last Name': 'Summers', 'Sex': 'Female', 'Email': 'bethany14@example.com', 'Phone': '214.112.6044x4913', 'Date of birth': '1910-03-24', 'Job Title': 'Phytotherapist'}
{'Index': '3', 'User Id': 'DbeAb8CcdfeFC2c', 'First Name': 'Kristine', 'Last Name': 'Travis', 'Sex': 'Male', 'Email': 'bthompson@example.com', 'Phone': '277.609.7938', 'Date of birth': '1992-07-02', 'Job Title': 'Homeopath'}
{'Index': '4', 'User Id': 'A31Bee3c201ef58', 'First Name': 'Yesenia', 'Last Name': 'Martinez', 'Sex': 'Male', 'Email': 'kaitlinkaiser@example.com', 'Phone': '584.094.6111', 'Date of birth': '2017-08-03', 'Job Title': 'Market researcher'}
``````

23 more records

## Example: from the people with Email '@example.com' Return how many are 'Male'

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
res = Q(rows,Email__icontains='example.com',Sex='Male')
print(len(list(res)))
``````

output:

``````10
``````

The above code is eqivalent to this

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male'}
res = Q(rows,**kwargs)
print(len(list(res)))
``````

### Example: from the people with email '@example.com' Return how many are 'Male' and not born in 1992.

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male', "Date of birth__nstartswith":"1992"}
res = Q(rows,**kwargs)
print(len(list(res)))
``````

output:

``````9
``````

Note: Index '3' is removed.

This can be written as follows using `NOT`:

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs={"Email__icontains":'example.com' ,"Sex":'Male', "NOT": {"Date of birth__startswith":"1992"}}
res = Q(rows,**kwargs)
print(len(list(res)))
``````

### Example: Find the Males who emails are '@example.com' OR '@example.net'

``````from  leopards import Q
rows = csv.DictReader(open('people-100.csv'), delimiter=",")
kwargs = kwargs = {"Sex": "Male", "OR":[{"Email__icontains":'example.com'},{"Email__icontains":"example.net"}]}
res = Q(rows,**kwargs)
print(list(res)[:3])
``````

output:

``````[{'Index': '1', 'User Id': '88F7B33d2bcf9f5', 'First Name': 'Shelby', 'Last Name': 'Terrell', 'Sex': 'Male', 'Email': 'elijah57@example.net', 'Phone': '001-084-906-7849x73518', 'Date of birth': '1945-10-26', 'Job Title': 'Games developer'},
{'Index': '3', 'User Id': 'DbeAb8CcdfeFC2c', 'First Name': 'Kristine', 'Last Name': 'Travis', 'Sex': 'Male', 'Email': 'bthompson@example.com', 'Phone': '277.609.7938', 'Date of birth': '1992-07-02', 'Job Title': 'Homeopath'},
{'Index': '4', 'User Id': 'A31Bee3c201ef58', 'First Name': 'Yesenia', 'Last Name': 'Martinez', 'Sex': 'Male', 'Email': 'kaitlinkaiser@example.com', 'Phone': '584.094.6111', 'Date of birth': '2017-08-03', 'Job Title': 'Market researcher'}]
``````

## Conculsion

This tutorial showed how easy it is to filter CSV and do some aggregations quickly using Leopards.

If you have any question, you can ask me.