Davide Santangelo

Stock prediction algorithm in Python

To create a stock prediction algorithm in Python, you will need to follow these steps:

  1. Collect historical data for the stock you want to predict. You can use a financial API or web scraping to get this data. Make sure to get data for multiple years, as it will be used to train the prediction model.

  2. Preprocess the data by cleaning and organizing it. This may include removing missing values, handling outliers, and converting the data into a format that is suitable for modeling.

  3. Split the data into training and testing sets. The training set will be used to train the prediction model, while the testing set will be used to evaluate the model's performance.

  4. Choose a prediction model and train it on the training data. There are many different models you can use for stock prediction, such as linear regression, decision trees, and support vector machines.

  5. Test the model on the testing data and evaluate its performance. You can use metrics such as mean absolute error (MAE) and root mean squared error (RMSE) to measure the model's accuracy.

  6. Fine-tune the model by adjusting its hyperparameters and/or using different model architectures.

  7. Use the trained model to make predictions on unseen data, such as future stock prices.
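The two error metrics named in step 5 are simple enough to compute by hand, which can help when interpreting them. The following is a minimal sketch using NumPy only; the function names `mae` and `rmse` and the sample prices are made up for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors, in price units."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error: like MAE, but large errors weigh more."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up closing prices and predictions, for illustration only
actual = [150.0, 151.0, 152.0, 153.0]
predicted = [149.0, 151.5, 152.0, 155.0]

print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"RMSE: {rmse(actual, predicted):.3f}")
```

Because the errors are squared before averaging, RMSE is never smaller than MAE, and a large gap between the two suggests that a few predictions miss by a wide margin.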

It's worth noting that stock prediction is a challenging task, and it's difficult to achieve high accuracy. There are many factors that can influence stock prices, and it's hard to account for all of them in a predictive model. As such, it's important to be cautious when interpreting the results of your predictions.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Load the data, parsing Date so that rows can be compared chronologically
df = pd.read_csv('stock_data.csv', parse_dates=['Date'])

# Preprocess the data
df = df.dropna()  # Remove rows with missing values
df = df[df['Close'] > 0]  # Remove rows with invalid close price
df = df.sort_values('Date')  # Keep rows in chronological order

# Split the data into training and testing sets by date
train_data = df[df['Date'] < '2020-01-01']
test_data = df[df['Date'] >= '2020-01-01']

# Choose a prediction model
model = LinearRegression()

# Train the model on the training data
X_train = train_data[['Open', 'High', 'Low', 'Volume']]
y_train = train_data['Close']
model.fit(X_train, y_train)

# Test the model on the testing data
X_test = test_data[['Open', 'High', 'Low', 'Volume']]
y_test = test_data['Close']
predictions = model.predict(X_test)

# Evaluate the model's performance
mae = mean_absolute_error(y_test, predictions)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f'MAE: {mae:.2f}')
print(f'RMSE: {rmse:.2f}')

# Fine-tune the model (optional)
# ...

# Make predictions on unseen data
# ...

This code assumes that you have a stock_data.csv file containing the historical stock data, with columns for the date, open price, high price, low price, close price, and volume. It preprocesses the data by removing rows with missing values or invalid close prices, and then splits the rows by date: everything before 2020-01-01 is used for training and everything from that date onward for testing. It then trains a scikit-learn linear regression model on the training data, runs it on the testing data, and evaluates it using the mean absolute error (MAE) and root mean squared error (RMSE). The last two steps, fine-tuning the model and making predictions on unseen data, are left as placeholders. Note that the features are the same day's open, high, low, and volume, so the model estimates a day's close from that day's other values rather than forecasting a future price.
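One possible way to approach the fine-tuning step is a grid search with time-ordered cross-validation. The sketch below is an assumption and not part of the script above: it swaps in `Ridge` (plain linear regression has few settings to tune), scales the features first, and runs on a synthetic random walk rather than real market data. The column names match the CSV layout used in this post; the grid of `alpha` values is arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for stock_data.csv: a random walk, NOT real market data
rng = np.random.default_rng(0)
n = 300
close = 150 + np.cumsum(rng.normal(0, 1, n))
open_ = close + rng.normal(0, 0.5, n)
high = np.maximum(open_, close) + rng.uniform(0, 1, n)
low = np.minimum(open_, close) - rng.uniform(0, 1, n)
volume = rng.integers(15_000_000, 30_000_000, n)
df = pd.DataFrame(
    {"Open": open_, "High": high, "Low": low, "Volume": volume, "Close": close}
)

X = df[["Open", "High", "Low", "Volume"]]
y = df["Close"]

# Scale the features, then fit a ridge regression (linear regression with an L2 penalty)
pipeline = make_pipeline(StandardScaler(), Ridge())

# TimeSeriesSplit keeps every validation fold later in time than its training fold
search = GridSearchCV(
    estimator=pipeline,
    param_grid={"ridge__alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_mean_absolute_error",
    cv=TimeSeriesSplit(n_splits=5),
)
search.fit(X, y)

print("Best alpha:", search.best_params_["ridge__alpha"])
print(f"Cross-validated MAE: {-search.best_score_:.2f}")
```

A time-ordered split is used instead of shuffled folds so that the model is never validated on days that come before the ones it was trained on.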

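Here is an example of what a few rows of the stock_data.csv file could look like. The full file would also need rows dated before 2020-01-01, since the code above uses those for training: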

Date,Open,High,Low,Close,Volume
2020-01-02,148.25,150.62,146.87,150.06,20768456
2020-01-03,150.01,151.44,149.56,150.47,19819854
2020-01-06,150.72,152.43,149.57,151.5,23793456
2020-01-07,151.5,152.44,150.49,151.74,26989857
2020-01-08,151.7,152.92,150.9,152.09,22369456
2020-01-09,152.31,153.72,152.01,153.61,23445678
...

This file contains daily stock data for a single company, with one row per day. The columns are:

Date: the date of the stock data
Open: the open price of the stock on that day
High: the highest price of the stock on that day
Low: the lowest price of the stock on that day
Close: the close price of the stock on that day
Volume: the number of shares traded on that day

You may also have additional columns, depending on the data you have available and the needs of your prediction model.
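As an illustration of such additional columns, the sketch below loads the six sample rows shown above and derives a few extra ones with pandas. The column names `PrevClose`, `MA3`, and `NextClose` are hypothetical and chosen only for this example; whether they help depends on your model.

```python
import io
import pandas as pd

# The six sample rows shown above, inlined so the sketch runs on its own
csv_text = """Date,Open,High,Low,Close,Volume
2020-01-02,148.25,150.62,146.87,150.06,20768456
2020-01-03,150.01,151.44,149.56,150.47,19819854
2020-01-06,150.72,152.43,149.57,151.5,23793456
2020-01-07,151.5,152.44,150.49,151.74,26989857
2020-01-08,151.7,152.92,150.9,152.09,22369456
2020-01-09,152.31,153.72,152.01,153.61,23445678
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"]).sort_values("Date")

# Hypothetical extra columns (names are illustrative)
df["PrevClose"] = df["Close"].shift(1)            # previous trading day's close
df["MA3"] = df["Close"].rolling(window=3).mean()  # 3-day moving average of the close
df["NextClose"] = df["Close"].shift(-1)           # next trading day's close

# Rows at the edges have no previous or next day, so they contain NaN
features = df.dropna().reset_index(drop=True)
print(features[["Date", "Close", "PrevClose", "MA3", "NextClose"]])
```

Using `NextClose` as the target and only columns known by the end of the current day as inputs would turn the setup into a next-day forecast, at the cost of losing the first and last rows to the shifts.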

# The model was fitted on a DataFrame, so new inputs are passed with the same
# column names and order to avoid scikit-learn's feature-name warning
feature_columns = X_test.columns

# Test the model on a single example
example_input = pd.DataFrame(
    [[150.01, 151.44, 149.56, 19819854]], columns=feature_columns
)
prediction = model.predict(example_input)[0]
print(f'Prediction for input {example_input.values.tolist()}: {prediction:.2f}')

# Test the model on multiple examples
test_inputs = pd.DataFrame([
    [148.25, 150.62, 146.87, 20768456],
    [152.31, 153.72, 152.01, 23445678],
    [149.06, 149.40, 148.46, 15423456]
], columns=feature_columns)
predictions = model.predict(test_inputs)
print(f'Predictions for inputs {test_inputs.values.tolist()}: {predictions}')

# Test the model on the entire testing set
predictions = model.predict(X_test)
print(f'Predictions for entire test set: {predictions}')

This code shows how you can test your model on a single example, multiple examples, and the entire testing set. For each test, the code prints the predictions made by the model. You can then compare the predictions to the actual stock prices to see how well the model is performing.
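One way to make that comparison is to put the actual and predicted prices side by side in a table. The sketch below is a minimal example under stated assumptions: `y_test` and `predictions` are stand-ins for the variables in the script above, the actual prices are the first four closes from the sample CSV, and the predicted values are made up.

```python
import numpy as np
import pandas as pd

# Stand-ins for y_test and predictions; the predicted values are made up
y_test = pd.Series([150.06, 150.47, 151.50, 151.74], name="Close")
predictions = np.array([149.80, 150.90, 151.20, 152.10])

comparison = pd.DataFrame({
    "Actual": y_test.to_numpy(),
    "Predicted": predictions,
})
# Signed error: positive means the model predicted too high
comparison["Error"] = comparison["Predicted"] - comparison["Actual"]
# Absolute error as a percentage of the actual price
comparison["AbsPctError"] = comparison["Error"].abs() / comparison["Actual"] * 100

print(comparison.round(2))
print(f"Largest miss: {comparison['Error'].abs().max():.2f}")
print(f"Mean absolute percentage error: {comparison['AbsPctError'].mean():.2f}%")
```

A per-row table like this shows whether the errors are spread evenly or concentrated on a few days, which a single summary number hides.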
