Background
I recently wrote a Python Script that extracted data from US Department of Agriculture website, formatted into the desired format and saved it into a CSV file. Since I did some research online while working on the script, I thought it would only be fair that I give back to the community by writing down the steps I followed plus some example code in the hope that someone out there will benefit.
Problem statement
The FoodData Central database has amino acid levels for many foods. Extract data from the SR 'Legacy Foods' category and list the food and amino acid level in CSV format with
rows for each food and fields are the various amino acids (18 amino acids)
Solution
FoodData Central API provides access to the FoodData Central (FDC), to gain access you have to register for an API Key through this link https://fdc.nal.usda.gov/api-key-signup.html.
For the purpose of this article, I created the Get query on the API web interface, specified all the parameters then copied the URL with parameters and API Key into my script, normally you would specify the parameters in your code. Below is the full 'GET' query URL
endpoint = 'https://api.nal.usda.gov/fdc/v1/foods/list?dataType=SR%20Legacy&pageSize=20&pageNumber=3&sortBy=fdcId&sortOrder=asc&api_key=XXXXXXXXX'
Next step is I want to query the API endpoint using the python 'requests'HTTP Library
response = requests.get(endpoint).json()
The response variable above contains the data from the FoodData Central Database, it's time to format the data in the desired format. For each food item, I want to have name of the food item (Description) and the quantity of each amino acid that particular food item contains. Therefore my CSV file will have a total of 19 fields. Next Step is to specify the keys/headers list and an empty dictionary that I will populate with data
# keys global variable will be used to generate keys for the response_dict dictionary and for fieldnames for the csv file
keys = ['Description','Alanine', 'Arginine', 'Aspartic','Cysteine','Glutamic','Glycine','Histidine','Isoleucine','Leucine','Lysine','Methionine','Phenylalanine','Proline','Serine', 'Threonine','Tryptophan','Tyrosine','Valine']
The next block of code will loop through the response data, match the response data to the respective dictionary key then append that data to the Food_list variable declared above
Below is the complete function
def get_data():
# The API endpoint to query data from preformated with the desired parameters
endpoint = 'https://api.nal.usda.gov/fdc/v1/foods/list?dataType=SR%20Legacy&pageSize=20&pageNumber=3&sortBy=fdcId&sortOrder=asc&api_key=3VzJQvEyn9UBBwFgK3pdli3x82rCnpy9sVJFNt3Z'
# The python HTTP requests library queries the endpoint and returns the data in JSON format
response = requests.get(endpoint).json()
food_list = []
for data in response:
#empty dictionary initialized with keys from the 'keys' list variable declared globally
response_dict = {key: 0 for key in keys}
response_dict['Description'] = data['description']
for nutrient in data['foodNutrients']:
if nutrient['name'] in keys:
response_dict[nutrient["name"]] = nutrient['amount']
food_list.append(response_dict)
return food_list
Finally let's write our food_list data into a CSV file
filename = 'FoodDatacsv.csv'
with open(filename, 'w', newline='') as myfile:
writer = csv.DictWriter(myfile, fieldnames=keys)
writer.writeheader()
writer.writerows(get_data())
Below is a sample of the extracted data.
And that is it for today.
P.S.
If you would like some data extraction done, click Here
Top comments (2)
Great!!
Thank you Pato...