Link to Code

bhatshravan / AeroStocks

AeroStocks

A stock market prediction platform that parses news articles and uses machine learning to predict stock market index prices.

Folder Structure Overview

documentation - Some of our presentations and various notes.
python - All the code.
website - The code for our website.

requirements.txt will be uploaded soon.

Running the project

You need to download all the news articles by manually specifying parameters in python/download/news/threads.py. It uses multithreading to download in parallel, which saves time.
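Below is a minimal sketch of the parallel-download idea, not the contents of threads.py. It builds one URL per date from a hypothetical archive URL template and fetches the pages on a thread pool from the standard library's concurrent.futures. The template and the names `archive_urls`, `fetch_url` and `download_all` are illustrative only.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List


def archive_urls(start: date, end: date, template: str) -> List[str]:
    """One URL per day from start to end inclusive.

    `template` is a hypothetical archive pattern such as
    "https://example.com/archive/{day:%Y-%m-%d}".
    """
    days = (end - start).days + 1
    return [template.format(day=start + timedelta(days=i)) for i in range(days)]


def fetch_url(url: str, timeout: float = 30.0) -> str:
    """Download one page and decode it as UTF-8 (undecodable bytes are replaced)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def download_all(urls: Iterable[str], fetch: Callable[[str], str] = fetch_url,
                 max_workers: int = 8) -> Dict[str, str]:
    """Fetch every URL on a thread pool and return {url: page_body}."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its own body.
        return dict(zip(url_list, pool.map(fetch, url_list)))
```

Threads fit this workload because each request spends most of its time waiting on the network, so Python's global interpreter lock is not the bottleneck.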

You have to manually uncomment each line and check the dates. This downloads the news to python/data/news/[newspaper]/lists/[various-file]
Then run merge.py, specifying its parameters, to merge all the news files.
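The following is a minimal sketch of the merge step, not the repository's merge.py, assuming every downloaded file is a CSV with the same header row. `merge_csv_files` is a hypothetical helper, and dropping exact duplicate rows (which can appear when download date ranges overlap) is an added assumption that can be switched off with `dedupe=False`.

```python
import csv
from typing import Iterable


def merge_csv_files(paths: Iterable[str], out_path: str, dedupe: bool = True) -> int:
    """Concatenate CSV files that share a header into out_path; return data rows written."""
    header = None
    seen = set()
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"{path}: header {file_header} differs from {header}")
                for row in reader:
                    key = tuple(row)
                    if dedupe and key in seen:
                        continue  # exact duplicate of a row already written
                    seen.add(key)
                    writer.writerow(row)
                    written += 1
    return written
```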

Run the NLP classifier.

Run python/nlp/classify.py, setting the input file in the in_csv variable. Specify the output file location in output_file inside the makeKeyWordList function, and make sure the write_to_file variable is set to 1.
This writes multiple files to the output location, which need to be merged again with merge.py.

Do…


How I built it

Built using Python, Flask, beautifulsoup4, TensorFlow, scikit-learn (sklearn), and the VADER NLP library.

Our first step was to build a dataset for the stock market index we wanted to predict.

We decided to build it on the NIFTY 50, an Indian stock market index made up of 50 stocks, which gave us a large enough dataset.

We downloaded the last 8 years of stock market data from Alpha Vantage and Yahoo Finance to use in our project.
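For the price data, a request to Alpha Vantage's daily time-series endpoint can be built and parsed roughly as below. This sketch assumes the TIME_SERIES_DAILY JSON layout (a "Time Series (Daily)" object keyed by date, with fields such as "4. close"). The ticker symbol and API key are placeholders, since the symbol used for NIFTY 50 data is not given here; check the current Alpha Vantage documentation for which symbols and output sizes your plan supports.

```python
from typing import Dict
from urllib.parse import urlencode

ALPHA_VANTAGE_QUERY = "https://www.alphavantage.co/query"


def daily_series_url(symbol: str, api_key: str, full: bool = True) -> str:
    """Build a TIME_SERIES_DAILY request URL.

    outputsize=full asks for the whole history; "compact" returns only
    the latest 100 data points.
    """
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,    # placeholder: the ticker you need
        "outputsize": "full" if full else "compact",
        "apikey": api_key,   # placeholder: your own key
    }
    return f"{ALPHA_VANTAGE_QUERY}?{urlencode(params)}"


def closing_prices(payload: dict) -> Dict[str, float]:
    """Map each date ("YYYY-MM-DD") to its closing price, oldest first."""
    series = payload["Time Series (Daily)"]
    return {day: float(bar["4. close"]) for day, bar in sorted(series.items())}
```

In use, the URL would be fetched (for example with urllib.request.urlopen) and the body decoded with json.loads before being passed to closing_prices.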

Then we wrote HTML parsers in Python with beautifulsoup4 to mine 20 lakh (2 million) news articles published over those 8 years from 4 news sources.
(Image: News data)
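The parsers themselves depend on each site's markup, which is not described here. As a dependency-free illustration of the idea, the sketch below uses the standard library's html.parser instead of beautifulsoup4 to pull article links and headline text out of an archive page. The "/articles/" URL filter and the sample markup are hypothetical. With beautifulsoup4, the same lookup would typically start from `soup.find_all("a", href=True)`.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) for <a> tags whose href contains `must_contain`."""

    def __init__(self, must_contain: str = "") -> None:
        super().__init__()
        self.must_contain = must_contain
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.must_contain in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside a matching link

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the headline text.
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def extract_links(html: str, must_contain: str = "") -> List[Tuple[str, str]]:
    parser = LinkCollector(must_contain)
    parser.feed(html)
    parser.close()
    return parser.links
```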

The articles were then given as input to VADER, an NLP library for Python, which produces a sentiment score measuring how positive or negative each article is. Because VADER gave some false positives, we had to add extra keywords, and the final output was a score from -1.0 to 1.0 indicating sentiment.
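The kind of scoring VADER performs can be illustrated with a deliberately simplified, lexicon-only scorer: sum the valence of each known word, then squash the total into [-1, 1] with x / sqrt(x^2 + alpha), alpha = 15, which is the normalization VADER applies to its compound score. Real VADER also handles negation, intensifiers, capitalization and punctuation, so this is a toy, not the library. The word valences below are made up, and `FINANCE_KEYWORDS` is a hypothetical stand-in for the extra keywords mentioned above. With the vaderSentiment package, the real score comes from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, and extra words can be added by updating the analyzer's `lexicon` dictionary.

```python
import math
import re
from typing import Dict

# Made-up valences for illustration only.
BASE_LEXICON = {"gain": 2.0, "loss": -2.0, "good": 1.9, "bad": -2.5}
# Hypothetical finance-specific additions.
FINANCE_KEYWORDS = {"bullish": 2.0, "bearish": -2.0, "downgrade": -1.5, "rally": 1.8}


def normalize(score: float, alpha: float = 15.0) -> float:
    """VADER-style normalization: map an unbounded sum into [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def toy_sentiment(text: str, lexicon: Dict[str, float]) -> float:
    """Sum word valences from `lexicon`, then normalize. No negation or booster rules."""
    words = re.findall(r"[a-z']+", text.lower())
    return normalize(sum(lexicon.get(w, 0.0) for w in words))
```

Without the finance keywords, a headline such as "Markets rally on bullish outlook" scores 0.0, because none of its words are in the base lexicon; adding them gives it a positive score.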

For each day, we took each news article and extracted any NIFTY 50 companies mentioned in it. We also took the mean sentiment of that day's articles about companies in the same sector, since stocks in the same sector are affected by each other's news as well. This gave us a dataset of 78,000 data points.
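The per-day feature extraction described above can be sketched as follows. The company aliases and sector assignments are a hypothetical three-stock subset of NIFTY 50, and the aggregation is one reading of the description: a stock's score is the mean VADER score of that day's articles mentioning it, and the sector mean averages every article that day mentioning any company in the stock's sector, counting each article once per sector.

```python
import re
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical subset; the real index has 50 constituents.
ALIASES: Dict[str, List[str]] = {
    "INFY": ["infosys"],
    "TCS": ["tcs", "tata consultancy services"],
    "HDFCBANK": ["hdfc bank"],
}
SECTOR: Dict[str, str] = {"INFY": "IT", "TCS": "IT", "HDFCBANK": "Banking"}

# One case-insensitive, whole-word pattern per stock symbol.
PATTERNS = {
    sym: re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b", re.IGNORECASE)
    for sym, names in ALIASES.items()
}


def companies_in(text: str) -> List[str]:
    """Symbols of every company whose name appears in the text."""
    return [sym for sym, pat in PATTERNS.items() if pat.search(text)]


def daily_features(articles: Iterable[Tuple[str, str, float]]) -> List[Tuple[str, str, float, float]]:
    """articles: (date, text, vader_score) -> rows of (date, stock, stock_score, sector_mean)."""
    by_stock = defaultdict(list)   # (date, stock) -> scores of articles mentioning it
    by_sector = defaultdict(list)  # (date, sector) -> scores of articles touching that sector
    for day, text, score in articles:
        syms = companies_in(text)
        for sym in syms:
            by_stock[(day, sym)].append(score)
        for sector in {SECTOR[s] for s in syms}:
            by_sector[(day, sector)].append(score)
    return [
        (day, sym, mean(scores), mean(by_sector[(day, SECTOR[sym])]))
        for (day, sym), scores in sorted(by_stock.items())
    ]
```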

Thus we had a CSV file with the following columns, which was used as the input to the machine learning step:
date, stock, vader score, sector mean score, stock market price
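Assembling that CSV amounts to joining the sentiment rows with the daily prices. Below is a minimal sketch, assuming prices are keyed by (date, stock) and that rows with no matching price (for example, news published on a non-trading day) are simply dropped. The snake_case column names are illustrative versions of the ones listed above.

```python
import csv
from typing import Dict, Iterable, TextIO, Tuple

FIELDS = ["date", "stock", "vader_score", "sector_mean_score", "price"]


def write_dataset(rows: Iterable[Tuple[str, str, float, float]],
                  prices: Dict[Tuple[str, str], float],
                  out: TextIO) -> int:
    """Write one CSV line per sentiment row that has a matching price; return lines written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for day, stock, vader, sector_mean in rows:
        price = prices.get((day, stock))
        if price is None:
            continue  # assumption: skip days with no price, e.g. weekends and market holidays
        writer.writerow({"date": day, "stock": stock, "vader_score": vader,
                         "sector_mean_score": sector_mean, "price": price})
        written += 1
    return written
```

When writing to a real file, open it with newline="" as the csv module recommends.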