Ruthvik Raja M.V

Stock Price Prediction using Supervised Learning


Stock prices are influenced by many factors, which makes predicting them a complex and time-consuming endeavour. The task is hard because price series are non-stationary and depend on many external signals such as news headlines, tweets, historical trends and social media activity. In this work, Machine Learning algorithms and Neural Networks are applied to the stocks of various companies, such as Apple, Amazon, Pfizer and Walmart Stores, to address these difficulties and predict stock prices more accurately. Algorithms such as Random Forest, XGBoost (Extreme Gradient Boosting), LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) are implemented, and their RMSE (Root Mean Square Error) values are compared. The dataset is an open-source time series dataset containing around 5 years of stock prices for 88 companies across 9 sectors.

About the Dataset:-

The dataset is time series data consisting of the stock prices of 88 companies, including Apple, Amazon, Chevron Corporation, Sanofi, Duke Energy Corporation, Visa and Alphabet. These companies fall under 9 sectors: Basic Materials, Consumer Goods, Healthcare, Services, Utilities, Conglomerates, Financial, Industrial Goods and Technology. The dataset contains 88 files, one per company, and each file has the following features: Date, Open (the price of the stock when the market opened on a given day), High and Low (the highest and lowest prices within the period), Volume (the volume of shares) and Adjusted Close. The output, or predicted variable, is the closing price of the stock on a given day.


To train a Machine Learning algorithm such as the Random Forest Regressor, all the CSV files are first loaded and converted into data frames. Each data frame is then scaled so that every feature is mapped to a given range. A MinMax scaler can be used for this normalisation: the features in the dataset are on different scales, so it is important to scale each one before it is passed to the model for training. The dataset also contains a date feature, which the algorithm cannot interpret directly, so the date column is split into three separate columns (Year, Month and Day) during pre-processing. Finally, the dataset is split into training and testing data.
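The steps described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the reported results: the column names, the synthetic prices standing in for one company's CSV file, the chronological 80/20 split and the Random Forest settings are all assumptions. It also fits the scaler on the training rows only, which differs from scaling the whole data frame before splitting.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for one company's CSV file (made-up values).
# With the real dataset, this frame would come from pd.read_csv(...).
rng = np.random.default_rng(0)
n_days = 500
close = 100 + np.cumsum(rng.normal(0, 1, n_days))
df = pd.DataFrame({
    "Date": pd.bdate_range("2015-01-01", periods=n_days),
    "Open": close + rng.normal(0, 0.5, n_days),
    "High": close + 1.0,
    "Low": close - 1.0,
    "Volume": rng.integers(1_000_000, 5_000_000, n_days),
    "Adj Close": close * 0.98,
    "Close": close,
})


def preprocess(frame):
    """Split Date into Year/Month/Day and return (features, target)."""
    frame = frame.copy()
    frame["Date"] = pd.to_datetime(frame["Date"])
    frame["Year"] = frame["Date"].dt.year
    frame["Month"] = frame["Date"].dt.month
    frame["Day"] = frame["Date"].dt.day
    features = frame.drop(columns=["Date", "Close"])
    target = frame["Close"]
    return features, target


X, y = preprocess(df)

# Chronological 80/20 split (assumed; the article does not state the ratio).
split = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

# Map every feature to [0, 1] using the ranges seen in the training rows.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Hyperparameters are illustrative defaults, not the article's settings.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train_scaled, y_train)

predictions = model.predict(X_test_scaled)
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
print(f"Test RMSE: {rmse:.3f}")
```

Fitting the scaler on the training rows and reusing it for the test rows keeps the test set's value ranges out of training. Note that only the features are scaled here, so the RMSE is in the same units as the closing price.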


The Random Forest Regressor performed well for 79 of the 88 companies, with RMSE values ranging from 0 to 1 for those 79 companies. The RMSE values on the test data for the 79 best-performing companies are as follows:


For 8 companies, the Random Forest Regressor did not perform as expected, with RMSE values ranging from 0 to 10, while for 1 company the algorithm performed poorly, with RMSE values ranging from 750 to 830. The better- and worst-performing companies are shown as follows:

Better RMSE

Worst RMSE

Overall, the RMSE values for all the companies are shown as follows:


From the above figure, it is clear that Machine Learning did not perform well on one company, BRK-A, so Deep Learning was applied to the worst-performing companies to achieve better RMSE values. The results are as follows:


The Python code, datasets and all other files can be found at the following link:


In this article, Artificial Intelligence is used to predict stock market prices. A stock market is a place where shares of companies are bought and sold. The data used in this work is a time series dataset consisting of the stock prices of 88 companies, as described in the section About the Dataset. While the time component adds information, it also makes time series problems harder to handle than many other prediction tasks. Two approaches were proposed in this study: Machine Learning-based and Deep Learning-based. Different AI-based algorithms, including ensemble learning, were used to make the predictions, and the results of the different methodologies were compared. GRU performed better than the other Deep Learning-based methods in terms of both accuracy and processing time. The Machine Learning-based methods performed well for most companies but failed on stocks with large price values, so Deep Learning methods were applied to those stocks, and the results were far better. In future, different AI algorithms can be implemented to further decrease the error.
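When comparing RMSE values across companies, it may help to recall that RMSE is expressed in the same units as the closing price, so the same relative error produces a larger RMSE on a higher-priced stock. The sketch below illustrates this with made-up prices; the `rmse` helper and all numbers are hypothetical and are not taken from the dataset.

```python
import numpy as np


def rmse(actual, predicted):
    """Root Mean Square Error between two equally sized sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up closing prices for a low-priced stock.
low_priced = np.array([50.0, 52.0, 51.0, 53.0])
# Same price pattern at 5000 times the price level (also made up).
high_priced = low_priced * 5000

# Both sets of "predictions" are off by exactly 1% on every day.
low_rmse = rmse(low_priced, low_priced * 1.01)
high_rmse = rmse(high_priced, high_priced * 1.01)

# The RMSE grows in proportion to the price level.
ratio = high_rmse / low_rmse
print(f"low: {low_rmse:.4f}, high: {high_rmse:.4f}, ratio: {ratio:.1f}")
```

A relative measure, such as RMSE divided by the mean closing price, is one possible way to put companies with very different price levels on a comparable footing.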
