Paige Niedringhaus

Posted on • Originally published at paigeniedringhaus.com on

Filter, Merge, and Update Python Lists Based on Object Attributes

Introduction

Last year, I wrote a web scraping program to collect data from one of the NFT collections on the NFTrade site. My friend wanted the following data included in a CSV: all the NFTs currently for sale in the collection, the total price of each NFT in US dollars based on the current market price of the BNB cryptocurrency that the NFT is for sale in, and the price in USD per rarity point (a value randomly assigned to each NFT in the collection).

The NFTrade website does not have a public API so instead of writing a Node.js script to fetch the data via HTTP calls, I built a small site scraping script to go to the website and actually "scrape" the data from it.

I had never written a web scraper before, and I chose to write the program in Python. As I built the scraper, the project requirements got a bit more complex, and I learned a bunch of useful Python techniques along the way, which I'm sharing in a series of posts.

After choosing the Selenium Python package to use Selenium WebDriver to scrape the data from NFTrade and extract the details from each NFT that I wanted (the NFT's ID and price in BNB), I needed to update my new list of NFT data in several ways:

  • I needed to filter out any NFTs that weren't currently for sale (some that were scraped off the site weren't actually for sale),
  • I needed to match all the NFTs for sale with their "rarity scores" (as defined in a separate JSON list) and include those scores along with the rest of the NFT data,
  • I needed to compute the total cost and cost per rarity point for each NFT in USD based on the current market price of BNB and add those prices to each NFT in the list as well.

I know this sounds quite complicated, but I broke each of these requirements down into separate methods inside my Python script and learned a lot about working with lists in Python along the way.

Today, I'll show you how to filter lists by whether an attribute exists in an object, how to merge two lists of items together based on matching attributes, and even how to add new object properties to the objects within a list in Python.

NOTE: I am not normally a Python developer so my code examples may not be the most efficient or elegant Python code ever written, but they get the job done.


Sample Python Data

Before I dive into the specifics of my list manipulations in Python, let me give you a little background on the data I was working with. Here's a small sample of the list of NFT data before I started mutating it.

Sample NFT data scraped from the NFTrade site

[
    {'id': 6774, 'nft_price': '0.22'},
    {'id': 5710, 'nft_price': '0.16'},
    {'id': 3187, 'nft_price': '0.8'},
    {'id': 6482, 'nft_price': '1.1'},
    {'id': 7689, 'nft_price': '0.5'},
    {'id': 335, 'nft_price': '4'},
    {'id': 7025, 'nft_price': '1.057'},
    {'id': 597, 'nft_price': '5'},
    {'id': 3936, 'nft_price': '3.1'},
    {'id': 2834, 'nft_price': '0.649'},
    {'id': 763, 'nft_price': '1.65'},
    {'id': 7683, 'nft_price': None},
    {'id': 7914, 'nft_price': None}
 ]

As you can see from the output above, the original data I started with was pretty sparse: the ID number for each NFT and the price in BNB (if it existed) were the only pieces of data present in each object from the info scraped off the NFTrade site. I had my work cut out for me to clean this list up and add more useful data to it, so let's move on to how I did so in the next section.

NOTE: If you'd like to see more about how to scrape the browser data and gather just the necessary bits, read my first couple of blog posts here and here.

Filter objects in a list on whether an attribute exists or not

As I mentioned in the introduction, the first thing I needed to do was clean this list up by removing any NFTs that didn't have a price.

Due to how I had to lazily load and scrape the data from the NFTrade website initially, there was a good chance there were a handful of NFTs I gathered up that weren't for sale, and therefore didn't have prices, so I needed to weed them out first.

Technically every NFT in my list had an nft_price attribute, but if there was no price listed in the card's scraped data, the nft_price attribute was assigned None, which proved very useful.

Inside the __main__ block of my Python script, I'd already scraped the data from the webpage with the get_cards() method, then looped through the NFT data to grab just the bits of relevant data with the get_nft_data() method. Now I wanted to filter down the cards to only include ones listed for sale.

Here's the __main__ block code first:

for_sale_scraper.py

if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)

    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)

And here's the method I came up with to filter down to just the NFTs for sale: filter_priced_cards().

def filter_priced_cards(self, card_list):
    """Filter card list to only cards with NFT cost."""

    # keep only the cards in the list whose NFT price is not None
    cards_for_sale = list(filter(lambda card: card['nft_price'] is not None, card_list))
    return cards_for_sale

Let's break down what's happening in the second line of the filter_priced_cards() function.

I used Python's built-in filter() function to iterate over the card_list passed to the function to create a new list named cards_for_sale. The anonymous lambda function inside of filter() takes each card in the card_list and returns True if the nft_price attribute of the card is not None, and False if it is - this is how it filters out all the cards that don't have a price.

The list() function that wraps the filter() converts the result back to a list, because filter() returns a filter object which is an iterator, not a list.

And finally, the new cards_for_sale list is returned.
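As a hedged aside, the same filtering can also be written as a list comprehension, which many Python developers find easier to read than filter() with a lambda. The sketch below uses a few rows from the sample data above. The standalone function name filter_priced_cards_comprehension is hypothetical and isn't part of the original script.

```python
# a few rows from the sample data above
card_list = [
    {'id': 6774, 'nft_price': '0.22'},
    {'id': 5710, 'nft_price': '0.16'},
    {'id': 7683, 'nft_price': None},
    {'id': 7914, 'nft_price': None},
]

def filter_priced_cards_comprehension(cards):
    """Keep only the cards whose nft_price is not None (hypothetical helper)."""
    return [card for card in cards if card['nft_price'] is not None]

# same idea as the filter() + lambda version, for comparison
with_filter = list(filter(lambda card: card['nft_price'] is not None, card_list))
with_comprehension = filter_priced_cards_comprehension(card_list)

print(with_comprehension)
# [{'id': 6774, 'nft_price': '0.22'}, {'id': 5710, 'nft_price': '0.16'}]
```

Both versions return a new list and leave card_list unchanged.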

Merge two lists together by matching object keys

Once the NFTs not for sale have been filtered out, the next step is to add the rarity score to each NFT based on its ID.

For this particular set of NFTs, each NFT had a "rarity score" that had been randomly assigned to it. The rarity scores for each NFT were listed in a separate JSON file in the project and they look like this.

id_rs_list.json

[
  {"id": 1, "rs": 18},
  {"id": 2, "rs": 13},
  {"id": 3, "rs": 14},
  {"id": 4, "rs": 10},
  {"id": 5, "rs": 22},
  {"id": 6, "rs": 13},
  {"id": 7, "rs": 10},
  {"id": 8, "rs": 13},
  {"id": 9, "rs": 13},
  {"id": 10, "rs": 9},
  // more ids and rarity scores ("rs") below
]

I needed to combine my list of cards_for_sale with the rarity scores in the JSON file by matching up the id attribute in each list of objects. For this task, I came up with the following function: get_cards_rarity_score().

# imports needed at the top of for_sale_scraper.py
import json
from collections import ChainMap
from itertools import groupby
from operator import itemgetter

def get_cards_rarity_score(self, card_list):
    """Combine rarity scores with card list by ID."""

    # get rs data for each card from json file
    with open("id_rs_list.json") as file:
        id_rs_list = json.load(file)

    # merge together cards with id_rs_list by their matching ID numbers
    match_cards_with_rs_list = groupby(sorted(card_list + id_rs_list, key=itemgetter("id")), itemgetter("id"))
    combined_cards = [dict(ChainMap(*g)) for k, g in match_cards_with_rs_list]

    # filter out all the items in the merged list without an "nft_price" key
    filtered_combined_cards = []
    for card in combined_cards:
        if 'nft_price' in card:
            filtered_combined_cards.append(card)

    return filtered_combined_cards

To combine the rarity score with any of the NFT objects contained in the card_list list, the first thing that had to happen was to read the data from the id_rs_list.json file and assign it to a variable.

    # get rs data for each card from json file 
    with open("id_rs_list.json") as file:
        id_rs_list = json.load(file)

Once the JSON list was extracted from the file, the card_list and id_rs_list needed to be merged together based on their matching IDs.

The combined list is first sorted by id, because groupby() only groups items that sit next to each other. The groupby() function then groups the dictionaries that share an ID, and ChainMap() merges each group into a single Python dictionary (object). The result was a list of dictionaries (combined_cards) where each dictionary represented a card with combined information from both lists.

    # merge together cards with id_rs_list by their matching ID numbers 
    match_cards_with_rs_list = groupby(sorted(card_list + id_rs_list, key=itemgetter("id")), itemgetter("id"))
    combined_cards = [dict(ChainMap(*g)) for k, g in match_cards_with_rs_list]

One thing to note: the combined_cards list has every NFT listed from the id_rs_list, not just the ones whose IDs match the IDs in the card_list. So the combined_cards list looks like the data below - but for every item in id_rs_list.

[ 
 {'id': 1, 'rs': 4},
 {'id': 2, 'nft_price': '3', 'rs': 6},
 {'id': 3, 'rs': 22},
 {'id': 4, 'rs': 4},
 {'id': 5, 'rs': 10},
 {'id': 6, 'nft_price': '5', 'rs': 1},
 {'id': 7, 'rs': 1},
 {'id': 8, 'nft_price': '0.1', 'rs': 14},
 {'id': 9, 'nft_price': '1.5', 'rs': 5},
 {'id': 10, 'rs': 1},
 # more IDs and NFT data 
]

Since the combined_cards list had every single NFT in it (not just ones for sale), once more I had to filter the list down so that every item without an "nft_price" was omitted.

    # filter out all the items in the merged list without an "nft_price" key
    filtered_combined_cards = []
    for card in combined_cards:
        if 'nft_price' in card:
            filtered_combined_cards.append(card)

In this case, since there's a (very likely) chance the NFT data in the combined_cards list did not have the "nft_price" attribute, I checked if each card had the key "nft_price" and if so, the card was added to the new filtered_combined_cards list.
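
The two filters in this post check slightly different things: the first one looks for a None value, while this one looks for a missing key. Here's a small sketch with invented rows that contrasts a membership test with dict.get(), which handles both cases at once.

```python
# invented rows covering the three cases
cards = [
    {'id': 1, 'rs': 4},                        # no nft_price key at all
    {'id': 2, 'nft_price': None, 'rs': 6},     # key present, value is None
    {'id': 3, 'nft_price': '0.1', 'rs': 14},   # key present with a price
]

# membership test: keeps any card that has the key, even if its value is None
has_key = [card for card in cards if 'nft_price' in card]

# dict.get() returns None for a missing key, so this covers both cases at once
has_price = [card for card in cards if card.get('nft_price') is not None]

print([card['id'] for card in has_key])    # [2, 3]
print([card['id'] for card in has_price])  # [3]
```

In the scraper, cards with a None price were already removed by filter_priced_cards(), so the membership test alone is enough at this step.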

The filtered_combined_cards list ended up looking like the code snippet below.

[
 {'id': 4, 'nft_price': '0.8', 'rs': 10},
 {'id': 42, 'nft_price': '1.1', 'rs': 5},
 {'id': 174, 'nft_price': '1.4', 'rs': 5},
 {'id': 184, 'nft_price': '1.6', 'rs': 19},
 {'id': 256, 'nft_price': '2', 'rs': 15},
 {'id': 335, 'nft_price': '4', 'rs': 2},
 {'id': 562, 'nft_price': '1.2', 'rs': 2},
 {'id': 584, 'nft_price': '5', 'rs': 14},
 {'id': 597, 'nft_price': '5', 'rs': 17},
 # more NFT data here
]

Once all this data manipulation and list combining was done, the function returned the final list of cards (filtered_combined_cards) that had both rarity score information and an "nft_price" attribute included.

return filtered_combined_cards
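
As an alternative sketch (not the approach used in the script above), a similar result can be reached by building a lookup dictionary from the rarity list and adding rs only to the cards that are for sale. This avoids the sort and the second filtering pass. The function name merge_rarity_scores and the sample values are assumptions for illustration.

```python
def merge_rarity_scores(card_list, id_rs_list):
    """Return new card dicts with an 'rs' value looked up by id (hypothetical helper)."""
    # map each id to its rarity score for fast lookups
    rs_by_id = {item['id']: item['rs'] for item in id_rs_list}
    # build a new dict per card; cards without a matching id are skipped
    return [
        dict(card, rs=rs_by_id[card['id']])
        for card in card_list
        if card['id'] in rs_by_id
    ]

# made-up sample data for illustration
cards_for_sale = [{'id': 4, 'nft_price': '0.8'}, {'id': 42, 'nft_price': '1.1'}]
id_rs_list = [{'id': 4, 'rs': 10}, {'id': 5, 'rs': 22}, {'id': 42, 'rs': 5}]

merged_cards = merge_rarity_scores(cards_for_sale, id_rs_list)
print(merged_cards)
# [{'id': 4, 'nft_price': '0.8', 'rs': 10}, {'id': 42, 'nft_price': '1.1', 'rs': 5}]
```

One difference worth noting: this sketch drops a for-sale card that has no entry in the rarity list, whereas the groupby() version keeps it without an rs value.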

For reference, here's the __main__ block in the Python script, which calls get_cards_rarity_score().

for_sale_scraper.py

if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)

    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)

    # add rarity scores to all cards in the list by merging them with the id_rs_list
    cards_with_rs = scraper.get_cards_rarity_score(cards_for_sale)

Add new object properties to each object in a list

All right, here's the last Python list manipulation tip I'll be sharing in this post: how to add new properties to each object in a list.

After filtering the NFTs to just the ones for sale, and adding the rarity scores from the id_rs_list JSON file, I needed to fetch the current price of 1 BNB compared to US dollars, calculate the current price of each NFT in USD, and calculate the cost per rarity point for each NFT.

Fortunately, the cryptocurrency data aggregation site CoinGecko has a REST API that I could use to get the current market price of BNB cryptocurrency in US dollars, and then calculate the rest of the required data based on the info in my NFT card list.

Here is the add_pricing_to_cards() function I came up with to calculate the prices.

# requires `import requests` at the top of for_sale_scraper.py
def add_pricing_to_cards(self, card_list):
    """Get current price of BNB and compute cost per rarity point."""

    URL = "https://api.coingecko.com/api/v3/simple/price?ids=binancecoin&vs_currencies=USD"
    response = requests.get(URL).json()
    bnb = response['binancecoin']['usd']

    # add the current value of bnb to the card_list
    cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]

    # compute the current price in usd for each card based on its bnb price
    cards_with_usd_price = [dict(card, price_usd=round(float(card['nft_price']) * card['bnb'], 2)) for card in cards_bnb_price]

    # compute the current cost in usd of each rarity score point
    cards_with_rs_prices = [dict(card, cost_per_rs=round(card['price_usd'] / card['rs'], 2)) for card in cards_with_usd_price]

    return cards_with_rs_prices

In the function, the first thing I did was call the CoinGecko price API to get the current price of BNB in USD.

    URL = "https://api.coingecko.com/api/v3/simple/price?ids=binancecoin&vs_currencies=USD"
    response = requests.get(URL).json()
    bnb = response['binancecoin']['usd']
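
A network call like this can fail or return an unexpected payload, so here's a hedged sketch of a slightly more defensive version. It assumes the response has the shape the code above relies on, a 'binancecoin' object containing a 'usd' number. The function names and the 10-second timeout are assumptions for illustration, and the sketch uses the standard library's urllib instead of requests so it runs without extra packages.

```python
import json
from urllib.request import urlopen

URL = "https://api.coingecko.com/api/v3/simple/price?ids=binancecoin&vs_currencies=USD"

def extract_usd_price(payload, coin_id="binancecoin"):
    """Pull the USD price out of a parsed response (hypothetical helper)."""
    try:
        return float(payload[coin_id]["usd"])
    except (KeyError, TypeError, ValueError) as err:
        raise ValueError(f"unexpected price payload: {payload!r}") from err

def fetch_bnb_price(url=URL, timeout=10):
    """Fetch the current BNB price in USD (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return extract_usd_price(json.load(response))

# parsing a payload shaped like the one the code above expects
print(extract_usd_price({"binancecoin": {"usd": 352.44}}))  # 352.44
```

Keeping the parsing separate from the network call makes the price extraction easy to check without hitting the API.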

Next, I added the bnb to each object in the input card_list and created a new list named cards_bnb_price.

    # add the current value of bnb to the card_list 
    cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]
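
The dict(card, bnb=bnb) call builds a new dictionary from the existing card plus the extra key, so the original cards are left untouched. Here's a small sketch showing that, along with the equivalent dictionary-unpacking spelling. The BNB price is the example value from the sample output later in this post.

```python
bnb = 352.44  # example price of 1 BNB in USD
card_list = [{'id': 4, 'nft_price': '0.8', 'rs': 10}]

# dict(mapping, key=value) copies the mapping and adds (or overrides) the key
cards_bnb_price = [dict(card, bnb=bnb) for card in card_list]

# dictionary unpacking does the same thing
cards_unpacked = [{**card, 'bnb': bnb} for card in card_list]

print(cards_bnb_price)  # [{'id': 4, 'nft_price': '0.8', 'rs': 10, 'bnb': 352.44}]
print(card_list)        # [{'id': 4, 'nft_price': '0.8', 'rs': 10}]
```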

After including the current BNB price in USD, I was able to compute the total price in USD for each NFT in the list by multiplying the card's original price in BNB by the current price of BNB in USD.

    # compute the current price in usd for each card based on its bnb price
    cards_with_usd_price = [dict(card, price_usd=round(float(card['nft_price']) * card['bnb'], 2)) for card in cards_bnb_price]

I also calculated the price in USD per rarity score point by dividing the card's total price in USD by its rarity score (rs).

    # compute the current cost in usd of each rarity score point
    cards_with_rs_prices = [dict(card, cost_per_rs=round(card['price_usd'] / card['rs'], 2)) for card in cards_with_usd_price]
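
To make the arithmetic concrete, here's the calculation for a single card, using a BNB price of 352.44 USD and the id 42 row from the sample data in this post.

```python
bnb = 352.44  # price of 1 BNB in USD from the sample output
card = {'id': 42, 'nft_price': '1.1', 'rs': 5}

# nft_price is scraped as a string, so convert it before multiplying
price_usd = round(float(card['nft_price']) * bnb, 2)   # 1.1 * 352.44 = 387.684
cost_per_rs = round(price_usd / card['rs'], 2)         # 387.68 / 5 = 77.536

print(price_usd)    # 387.68
print(cost_per_rs)  # 77.54
```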

The function then returned the list of cards with the added pricing info, including BNB price, USD price, and USD cost per rarity score point. The final list data looked like this.

[
    {'bnb': 352.44, 'cost_per_rs': 28.2, 'id': 4, 'nft_price': '0.8', 'price_usd': 281.95, 'rs': 10},
    {'bnb': 352.44, 'cost_per_rs': 77.54, 'id': 42, 'nft_price': '1.1', 'price_usd': 387.68, 'rs': 5},
    {'bnb': 352.44, 'cost_per_rs': 98.68, 'id': 174, 'nft_price': '1.4', 'price_usd': 493.42, 'rs': 5}, 
    {'bnb': 352.44, 'cost_per_rs': 29.68, 'id': 184, 'nft_price': '1.6', 'price_usd': 563.9, 'rs': 19},
    {'bnb': 352.44, 'cost_per_rs': 46.99, 'id': 256, 'nft_price': '2', 'price_usd': 704.88, 'rs': 15},
    # more NFT data
]

The add_pricing_to_cards() function is called from the __main__ block like so:

for_sale_scraper.py

if __name__ == '__main__':
    scraper = ForSaleNFTScraper()
    cards = scraper.get_cards(max_card_count=200)
    card_data = []
    for card in cards:
        info = scraper.get_nft_data(card)
        card_data.append(info)

    # filter out any extra cards that aren't for sale
    cards_for_sale = scraper.filter_priced_cards(card_data)

    # add rarity scores to all cards in the list by merging them with the id_rs_list
    cards_with_rs = scraper.get_cards_rarity_score(cards_for_sale)

    # add BNB and USD pricing to each card (the result variable name is illustrative)
    cards_with_prices = scraper.add_pricing_to_cards(cards_with_rs)

And now that I had all the data that my friend requested for each NFT in the collection for sale on NFTrade, all that was left to do was turn the whole list into a downloadable CSV that would be easy to sort and manipulate. I'll save that for a future post.


Conclusion

When I had to use Python to build a website scraper to get NFT data off of the NFTrade site, I learned a lot of useful new coding tricks along the way.

After I'd managed to scrape the data with the help of Selenium Python, and extract the initial data I needed from each NFT by using WebDriver's XPath, my job was far from complete.

I needed to take the little data I had and combine those NFTs with "rarity scores" in a JSON file, fetch the current market price for BNB cryptocurrency in US dollars, and then compute the total cost of each NFT and cost per rarity point. As I completed these tasks, I learned a heck of a lot about how to work with lists of complex objects in various new ways, and I feel confident these new techniques will help me out in any future Python endeavors I might undertake.

Check back in a few weeks — I’ll be writing more blogs about the problems I had to solve while building this Python website scraper in addition to other topics on JavaScript or something else related to web development.

If you’d like to make sure you never miss an article I write, sign up for my newsletter here: https://paigeniedringhaus.substack.com

Thanks for reading. I hope learning to filter, merge, and alter objects within lists in Python proves helpful for you in your own projects.

