Introduction
This article focuses on building an equal-weight portfolio allocation strategy with Python. If you had $10,000 to invest in the fifty top-performing companies in the S&P 500 index, how would you allocate capital across these stocks? In this article, you will rank the S&P 500 stocks by their 1-year % return, tabulate the ticker and current trading price of the top 50, and calculate the number of shares of each one to buy.
Creating a new Jupyter Notebook
Google Colab is a cloud-based Jupyter notebook environment that allows you to write and execute Python code on the web.
Importing relevant Tickers and libraries.
Follow this link to download the S&P500 ticker symbols. This will make it easy for you to extract the data associated with each ticker.
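As a minimal sketch of what loading that file can look like, assuming the CSV has a single Ticker column (the column name the later code reads): the hypothetical load_tickers helper below strips whitespace, removes duplicates, and swaps dots for hyphens, since Yahoo Finance writes share classes as BRK-B rather than BRK.B. The inline CSV text is a made-up stand-in for the downloaded file.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded CSV; the real file is assumed to have a 'Ticker' column
csv_text = "Ticker\nAAPL\nBRK.B\n msft \nAAPL\n"


def load_tickers(source):
    """Hypothetical helper: read, clean, and de-duplicate ticker symbols."""
    tickers = pd.read_csv(source)['Ticker'].astype(str).str.strip().str.upper()
    # Yahoo Finance uses a hyphen for share classes (BRK-B, not BRK.B)
    tickers = tickers.str.replace('.', '-', regex=False)
    return tickers.drop_duplicates().tolist()


tickers = load_tickers(io.StringIO(csv_text))
print(tickers)
```

Passing a file path instead of the StringIO object works the same way, because pd.read_csv accepts both.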
Installing Yahoo Finance and importing libraries
Use the following command to install the yfinance package, which downloads data from Yahoo Finance.
!pip install yfinance
When you are done, import the following libraries.
import pandas as pd
import numpy as np
import os
import yfinance as yf
from datetime import datetime, timedelta
Pandas presents your data as DataFrames and Series, allowing you to clean, manipulate, and analyze it with built-in functionality. NumPy is a library for working with arrays and general mathematical functions. The os module helps you manipulate the file paths that will be used later in the project. The yfinance package downloads price data from Yahoo Finance. Finally, the datetime class helps you work with dates and times, and timedelta, as the name implies, represents the duration between two points in time.
Extracting the 1-year, 6-month, 3-month, and monthly return on each stock in the S&P 500 index
The function below computes the return of each stock over each of these periods. Comments are added to each chunk of code to explain what is happening.
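Before reading the full function, it may help to see the core idea on a toy series: keep the last close of each period, then take the percentage change between consecutive periods. This is only a sketch on made-up prices, and it groups by `to_period('M')` rather than the `pd.Grouper` call the article's function uses.

```python
import numpy as np
import pandas as pd

# Made-up daily closing prices for Q1 2023: 100, 101, 102, ...
dates = pd.date_range('2023-01-01', '2023-03-31', freq='D')
prices = pd.DataFrame({'Close': 100.0 + np.arange(len(dates))}, index=dates)

# Keep the last available close of each calendar month
month_end = prices.groupby(prices.index.to_period('M')).tail(1)

# Percentage change from one month-end close to the next
monthly_return = (month_end['Close'] / month_end['Close'].shift() - 1) * 100
print(monthly_return)
```

The first value is NaN because there is no earlier month-end to compare against. The same thing happens to any period that has no preceding period in the downloaded data.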
def get_first_last_trading_days(stocks_file, years):
    # Initialize an empty dictionary to store data
    data = {}
    # Read the stock tickers from the CSV file
    stocks = pd.read_csv(stocks_file)['Ticker'].tolist()
    # Create the output directory if it does not exist yet
    os.makedirs('stockss_dfs', exist_ok=True)

    def rating(df, startdate, enddate, freq):
        # Define the look-back offset based on the time frequency
        if freq == 'Y':
            offset = '366 days'
        elif freq == 'M':
            offset = '31 days'
        elif freq == '3M':
            offset = '93 days'
        elif freq == '6M':
            offset = '183 days'
        else:
            raise ValueError("Frequency not supported. Use 'Y', 'M', '3M', or '6M'.")
        # Filter the dataframe, keep the last close of each period,
        # and calculate the % change between consecutive periods
        dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
        dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
        ratio = (dfy['Close'] / dfy['Close'].shift() - 1) * 100
        return ratio

    # For the sake of scalability, the years are passed in as a parameter instead of being hardcoded
    for year in years:
        # Start and end dates for the year
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"
        # Loop through each stock ticker
        for stock in stocks:
            # Download the data for each S&P 500 stock and create a file for it if one is not already available
            file_path = f'stockss_dfs/{stock}_{year}.csv'
            if not os.path.exists(file_path):
                try:
                    df = yf.download(stock, start=start_date, end=end_date)
                    # Newer yfinance versions may return MultiIndex columns; keep only the price-field level
                    if isinstance(df.columns, pd.MultiIndex):
                        df.columns = df.columns.get_level_values(0)
                    df.index = pd.to_datetime(df.index)
                    df.index = df.index.tz_localize(None)
                    if not df.empty:
                        # Calculate the returns for each period
                        ratings = {
                            'Yearly': rating(df, start_date, end_date, freq='Y'),
                            'Monthly': rating(df, start_date, end_date, freq='M'),
                            '3 Months': rating(df, start_date, end_date, freq='3M'),
                            '6 Months': rating(df, start_date, end_date, freq='6M')
                        }
                        # Store the ratings in the data dictionary
                        data[stock] = ratings
                        # Save the results to CSV
                        df_results = pd.DataFrame(ratings)
                        df_results.to_csv(file_path, index=True)
                except Exception as e:
                    print(f"Error processing {stock} for year {year}: {e}")
    return data

# Build the list of years to process (here, only the previous year)
current_year = datetime.now().year
years = [current_year - i for i in range(1, 2)]
# Retrieve the data
stocks_file = '/content/sp_500_stocks.csv'
data = get_first_last_trading_days(stocks_file, years)
When you run the code, you will get the following:
In the image above, you will observe that the 1-year column is empty. This is because a yearly return needs the closing price at the end of two consecutive years, and only one year of data was downloaded. As a workaround, we can add up the monthly returns over the 12-month cycle, which approximates the 1-year return. A simple sum ignores compounding, so it will not match the true return exactly. The extract_sum_of_1_year_return function does this for every saved file.
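As a quick check of this shortcut, the sketch below compares the summed monthly returns with the compounded return on made-up month-end prices. The numbers are illustrative only and are not taken from the article's data.

```python
import pandas as pd

# Made-up month-end closing prices for one stock
closes = pd.Series([100.0, 110.0, 99.0, 108.9])

# Monthly % returns: roughly +10%, -10%, +10%
monthly_returns = closes.pct_change().dropna() * 100

# The shortcut: add the monthly returns together
summed = monthly_returns.sum()

# Compounding the monthly returns reproduces the actual start-to-end return
compounded = ((1 + monthly_returns / 100).prod() - 1) * 100
actual = (closes.iloc[-1] / closes.iloc[0] - 1) * 100

print(f"summed: {summed:.2f}%  compounded: {compounded:.2f}%  actual: {actual:.2f}%")
```

Here the sum comes to about 10% while the actual return is about 8.9%. The sum is a reasonable ranking signal, but the gap grows with more volatile stocks.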
def extract_sum_of_1_year_return(directory):
    all_instruments = []
    # Iterate over the files in the directory
    for file in os.listdir(directory):
        file_path = os.path.join(directory, file)
        if os.path.isfile(file_path):
            df = pd.read_csv(file_path)
            df_sum = df['Monthly'].sum()
            # File names look like '<ticker>_<year>.csv', so the ticker is the part before the underscore
            ticker_name = file.split('_')[0]
            all_instruments.append({
                "ticker": ticker_name,
                "yearly_sum": df_sum
            })
    return all_instruments

# Directory containing the CSV files
directory = "/content/stockss_dfs"
# Call the function and store the result
results = extract_sum_of_1_year_return(directory)
# Read the stock tickers from the CSV file
# stocks = pd.read_csv('/content/sp_500_stocks.csv')['Ticker'].tolist()
def get_stocks(results):
    # Initialize a list to hold the stock data that was successfully processed
    successful_stocks = []
    # Loop through every stock in the results
    for stock in results:
        try:
            ticker = yf.Ticker(stock['ticker'])
            stock_instrument = ticker.info
            current_price = stock_instrument.get('currentPrice', None)
            # Only add to successful_stocks if a current price is available
            if current_price is not None:
                successful_stocks.append({
                    'ticker': stock['ticker'],
                    'current_price': current_price,
                    'yearly_sum': stock['yearly_sum']
                })
        except Exception:
            # Skip tickers whose data could not be retrieved
            continue
    return successful_stocks

final_stocks = get_stocks(results)
Output :
Selecting the top 50 performing stocks
Before calculating how many shares of each stock you can buy with a given amount of capital, select the 50 stocks with the highest return over the one-year time frame.
# Convert the list of stock dictionaries into a dataframe
final_stocks_df = pd.DataFrame(final_stocks)
# Ensure the 'yearly_sum' column is numeric
final_stocks_df['yearly_sum'] = pd.to_numeric(final_stocks_df['yearly_sum'], errors='coerce')
# Drop rows with NaN values in 'yearly_sum'
final_stocks_df.dropna(subset=['yearly_sum'], inplace=True)
# Sort the dataframe by 'yearly_sum' in descending order
final_stocks_df.sort_values('yearly_sum', ascending=False, inplace=True)
# Select the top 50 rows and renumber the index from 0
final_stocks_df = final_stocks_df.head(50).reset_index(drop=True)
# Display the dataframe
final_stocks_df
Output :
Calculating portfolio amount
Here, you choose a starting balance for your portfolio. This amount will be split in equal weights across all the selected stocks.
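One way to make the balance check reusable is to separate parsing from prompting. The sketch below is a hypothetical helper (parse_portfolio_size is not part of the original tutorial) that accepts input such as "$10,000" and rejects anything that is not a positive, finite number.

```python
import math


def parse_portfolio_size(text):
    """Hypothetical helper: convert input such as '$10,000' into a positive float."""
    cleaned = text.strip().replace('$', '').replace(',', '')
    value = float(cleaned)  # raises ValueError for non-numeric input
    if not math.isfinite(value) or value <= 0:
        raise ValueError('Portfolio size must be a positive number.')
    return value


print(parse_portfolio_size('$10,000'))
```

Because the helper raises ValueError for every kind of bad input, a prompt loop only needs a single except clause to ask again.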
def portfolio_input():
    global portfolio_size
    # Keep asking until the input can be converted to a number
    while True:
        portfolio_size = input('Enter the value of your portfolio: ')
        try:
            float(portfolio_size)
            break
        except ValueError:
            print("That's not a number! \nPlease try again:")

portfolio_input()
print(portfolio_size)
Output:
Calculating the number of shares to buy
Divide the portfolio size by the number of selected stocks (50 here) to get the amount of capital to invest in each one. Then divide that amount by the stock's current trading price and round down to get the number of shares to buy.
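As a worked example of this arithmetic, here is a sketch with a $10,000 portfolio and four made-up stocks. The tickers and prices are purely illustrative, and the invested and leftover figures are extras that the article's own code does not compute.

```python
import numpy as np
import pandas as pd

portfolio_size = 10_000.0

# Made-up tickers and prices, for illustration only
stocks = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'current_price': [100.0, 333.0, 48.5, 2600.0],
})

# Equal weight: every stock gets the same slice of capital
position_size = portfolio_size / len(stocks)

# Whole shares only, so round down
stocks['shares'] = np.floor(position_size / stocks['current_price']).astype(int)
stocks['invested'] = stocks['shares'] * stocks['current_price']

# Cash that rounding down leaves uninvested
leftover_cash = portfolio_size - stocks['invested'].sum()

print(stocks)
print(f"Position size: {position_size:.2f}, leftover cash: {leftover_cash:.2f}")
```

Two things stand out. A stock priced above the position size (DDD here) gets zero shares, and rounding down leaves part of the portfolio uninvested.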
# Split the portfolio equally across the selected stocks.
position_size = float(portfolio_size) / len(final_stocks_df.index)
# Store position size / current price, rounded down, in the 'Number of Shares to Buy' column.
final_stocks_df['Number of Shares to Buy'] = np.floor(position_size / final_stocks_df['current_price']).astype(int)
final_stocks_df
Output:
Conclusion
In this article, you learned how to allocate capital equally among the top 50 performing stocks in the S&P 500. You cleaned the data to drop NaN (not a number) values that would have distorted the results. This article was inspired by freeCodeCamp's tutorial (https://www.youtube.com/watch?v=xfzGZB4HhEE), but since much original thought went into writing the code, I decided to write and publish it. I hope you learned a thing or two. See you next time.