Automated news-following trading strategy using sentiment analysis

#nlp #python #trading #programming

In this project we’ll build a sentiment analysis strategy that autonomously trades based on news headlines. We show you how to:

scrape headlines from a financial website using the Python requests library and BeautifulSoup,
determine sentiment using VADER,
arrive at trade decisions and
place trades using the lemon.markets trading API. 🍋

We’re excited to show you how lemon.markets is the perfect tool for a project like this.

If you want to get started developing straight away, you can check out our GitHub repository for this project here. Otherwise, keep reading to learn more about the strategy.

Collecting your data 📊

For this project, our goal is to place trades automatically based on the news. The first step is to decide how we want to gather our data, and especially from which source. We went for MarketWatch because the data is presented in an easily digestible format: for each headline, we are given its date and the ticker(s) corresponding to the headline, see the example below.

To collect these headlines, we use a simple GET request against the desired URL. Using the requests package, this looks as follows:

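import requests
page = requests.get("https://marketwatch.com/investing/technology", timeout=10)
page.raise_for_status()  # stop early on 4xx/5xx responses, e.g. if the site blocks the request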

And to parse this data, we use BeautifulSoup, which is a Python package that can extract data from HTML documents.

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(page.content, "html.parser")
article_contents = soup.find_all("div", class_="article__content")
headlines = []  # collects [headline, ticker] pairs
for article in article_contents:
    headline = article.find("a", class_="link").text.strip()
    # not every article lists a ticker, so guard against a missing <span>
    ticker_tag = article.find("span", class_="ticker__symbol")
    ticker = ticker_tag.text.strip() if ticker_tag else None
    headlines.append([headline, ticker])

columns = ["headline", "US_ticker"]
headlines_df = pd.DataFrame(headlines, columns=columns)

Keep in mind that this code won’t work for just any website; see this article for help with web scraping. You’ll notice that we look for the article content in a “div” tag with the class “article__content”. You’ll need to adjust these names on a website-by-website basis, which requires some inspection of the page you are on. In Chrome, you can do this by right-clicking anywhere on the page and selecting ‘Inspect’ (if you use another browser, use these steps instead). You’ll be met with a jumble of HTML. The easiest way to figure out where the headlines are ‘hiding’ is to press Ctrl-F (or Command-F on macOS) and search for a particular headline. You can also click the ‘Select an element in the page to inspect it’ button on the top-left of the Developer console to pinpoint where to find your desired data.

Once you’ve found the tags corresponding to the right element(s), you can paste their names into the code snippet above to retrieve their contents. We suggest printing your output frequently to check whether you are collecting the desired information and whether it needs to be pre-processed. For example, in line 8 we call .strip() to remove the whitespace surrounding the headline. When you’re happy with your output, you can collect all relevant information in a Pandas DataFrame.
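MarketWatch can attach more than one ticker to a headline, while the loop above only keeps the first ticker `span` it finds. A minimal sketch, assuming the same `article__content`, `link` and `ticker__symbol` class names (which may have changed since writing), collects every ticker with `find_all` and then uses pandas `explode` to give each ticker its own row. `parse_headlines` is a hypothetical helper and the HTML string is a made-up stand-in for a real page, so you can test the parsing offline.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up HTML mimicking the structure the scraper above expects
SAMPLE_HTML = """
<div class="article__content">
  <a class="link"> Chip stocks rally as demand surges </a>
  <span class="ticker__symbol">NVDA</span>
  <span class="ticker__symbol">AMD</span>
</div>
<div class="article__content">
  <a class="link"> Markets wait for the Fed </a>
</div>
"""


def parse_headlines(html: str) -> pd.DataFrame:
    """Return one row per (headline, ticker) pair found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("div", class_="article__content"):
        link = article.find("a", class_="link")
        if link is None:
            continue  # skip blocks that have no headline link
        tickers = [t.text.strip()
                   for t in article.find_all("span", class_="ticker__symbol")]
        rows.append({"headline": link.text.strip(), "US_ticker": tickers})
    df = pd.DataFrame(rows, columns=["headline", "US_ticker"])
    # explode turns each list into separate rows; an empty list becomes NaN
    return df.explode("US_ticker", ignore_index=True)


print(parse_headlines(SAMPLE_HTML))
```

Headlines without any ticker end up with a missing `US_ticker`, which makes them easy to drop in the pre-processing step.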

Pre-processing your data 👩‍🏭

At this stage, it’s likely your data needs some (additional) pre-processing before it’s ready for sentiment analysis and trading.

In the GitHub repository, you’ll notice that we removed any headlines without tickers and headlines with tickers that we know are not tradable on lemon.markets (to make the dataset smaller). Additionally, to trade on lemon.markets, we need to obtain the instrument’s ISIN. Because we trade on a German exchange, querying for a US ticker will not (always) result in the correct instrument. Therefore, to ensure that there are no compatibility issues, we suggest mapping a ticker to its ISIN before trading. We’ve published an article that’ll help you do just that.
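The two clean-up steps described above can be sketched in pandas as follows. This is an illustration, not the code from our repository: `preprocess` is a hypothetical helper, and `TICKER_TO_ISIN` is a stand-in for whatever ticker-to-ISIN mapping you build (the two entries are only examples, so double-check any ISIN before trading). Unmapped tickers get the sentinel `"No ISIN found"`, which the trading rule further down filters out.

```python
import pandas as pd

# Stand-in lookup table; replace with your own ticker-to-ISIN mapping
TICKER_TO_ISIN = {
    "AAPL": "US0378331005",
    "NFLX": "US64110L1061",
}


def preprocess(df: pd.DataFrame, ticker_to_isin: dict) -> pd.DataFrame:
    """Drop headlines without a ticker and attach an ISIN to the rest."""
    # headlines that came without a ticker can't be traded on
    df = df.dropna(subset=["US_ticker"]).copy()
    # unknown tickers get a sentinel value instead of NaN
    df["isin"] = df["US_ticker"].map(ticker_to_isin).fillna("No ISIN found")
    return df.reset_index(drop=True)


raw = pd.DataFrame({"headline": ["a", "b", "c"],
                    "US_ticker": ["AAPL", None, "XYZ"]})
print(preprocess(raw, TICKER_TO_ISIN))
```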

Performing the sentiment analysis 😃😢

Once we’ve collected our headlines and tickers (or ISINs), we need to decide whether each headline reports positive or negative news. This is where our sentiment analysis tool, VADER, comes in. It’s a lexicon- and rule-based model that scores text by polarity (positive/negative) and intensity of emotion. Its compound score, normalised to lie between -1 and 1, indicates whether a text is positive (>0), neutral (0), or negative (<0). In the headlines above, it can determine that ‘“Squid Game” is worth nearly $900 million to Netflix’ has a somewhat positive sentiment, as the word ‘worth’ is likely part of the positive sentiment lexicon. If you’d like to read more about how VADER works, check out this article. There are also alternatives out there, like TextBlob or Flair. You might want to try out all three to determine which one works best on your dataset.
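To make the compound score a bit more concrete: VADER sums the valences of the words it recognises (after applying its rules for negation, intensifiers and so on) and squashes that sum into the range -1 to 1. The sketch below reproduces only that final normalisation step, using the `alpha=15` constant from the reference vaderSentiment implementation, plus the ±0.05 cut-offs the VADER authors suggest for labelling text as positive, neutral or negative. The input value 1.9 is a made-up valence sum, not real VADER output.

```python
import math


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1], as VADER's compound score does."""
    norm_score = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm_score))  # clamp, as the reference code does


def label(compound: float, threshold: float = 0.05) -> str:
    """Label a compound score using the +/-0.05 cut-offs suggested by VADER's authors."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


compound = normalize(1.9)
print(round(compound, 4), label(compound))
```

Because of the square root, the score saturates: doubling the valence sum does not double the compound score, so even strongly worded headlines rarely get very close to ±1.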

For our use-case (determining sentiment scores of online newspaper headlines), the implementation is really simple:

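import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # one-time download of the word list VADER needs

vader = SentimentIntensityAnalyzer()

scores = []

for headline in headlines_df.loc[:, "headline"]:
    # the compound score summarises the headline's sentiment in [-1, 1]
    score = vader.polarity_scores(headline).get("compound")
    scores.append(score)

headlines_df["score"] = scores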

If we have more than one headline (and scores) for a particular ticker, we have to aggregate them into a single score:

# average the compound scores per ticker (the "isin" column comes from the pre-processing step)
headlines_df = headlines_df.groupby(["US_ticker", "isin"], as_index=False)["score"].mean()

We’ve chosen to combine scores by taking the simple average, but there are several other measures you might opt to use. For example, a time-weighted average can penalise older headlines, as they are probably less representative of current (or future) market movements.
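A time-weighted average could look like the sketch below, which uses exponential decay: a headline’s weight halves every `half_life_hours`. It assumes a hypothetical `published` timestamp column; MarketWatch shows a date for each headline, but the scraper above doesn’t collect it, so you’d need to add that. The half-life of 24 hours is an arbitrary example value, not a recommendation.

```python
import pandas as pd


def time_weighted_scores(df: pd.DataFrame, now: pd.Timestamp,
                         half_life_hours: float = 24.0) -> pd.DataFrame:
    """Average scores per ISIN, halving a headline's weight every half_life_hours."""
    age_hours = (now - df["published"]).dt.total_seconds() / 3600
    weights = 0.5 ** (age_hours / half_life_hours)
    weighted = df.assign(w=weights, ws=df["score"] * weights)
    sums = weighted.groupby("isin")[["ws", "w"]].sum()
    # weighted mean = sum(weight * score) / sum(weight)
    return (sums["ws"] / sums["w"]).rename("score").reset_index()


now = pd.Timestamp("2021-10-20 12:00")
df = pd.DataFrame({
    "isin": ["A", "A", "B"],
    "score": [0.8, -0.4, 0.3],
    "published": pd.to_datetime(["2021-10-20 12:00",
                                 "2021-10-19 12:00",
                                 "2021-10-20 06:00"]),
})
print(time_weighted_scores(df, now))
```

Here the day-old negative headline for `A` only counts half as much as the fresh positive one, so `A` ends up at 0.4 instead of the simple average of 0.2.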

Placing your trades 📈

Once you’ve obtained the compound scores for the tickers, it’s time to place trades. However, you first need to decide on a trading strategy — what kind of score justifies a buy order? What about a sell order? And how much are you trading? There are several components to keep in mind here, such as your total balance, your current portfolio, the ‘trust’ you have in your strategy and others.

Our base project works with a very simple trade rule: buy any instrument with a score above 0.5 and sell any instrument with a score below -0.5 (see if you can come up with something a bit more complex 😉):

buy = []
sell = []

for index, row in headlines_df.iterrows():
    if row['score'] > 0.5 and row['isin'] != 'No ISIN found':
        buy.append(row['isin'])
    if row['score'] < -0.5 and row['isin'] != 'No ISIN found':
        sell.append(row['isin'])
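The same rule can also be written without `iterrows` by using boolean masks, which keeps the thresholds in one place. The sketch below adds one portfolio-aware tweak that is not part of our base project: it only sells instruments that appear in a `holdings` set, so the rule never tries to sell something you don’t own. `select_trades` and `holdings` are hypothetical names, and how you fetch your current positions isn’t shown here.

```python
import pandas as pd


def select_trades(df: pd.DataFrame, holdings: set,
                  buy_above: float = 0.5, sell_below: float = -0.5):
    """Return (buy, sell) ISIN lists using the same +/-0.5 rule as the loop above."""
    tradable = df[df["isin"] != "No ISIN found"]
    buy = tradable.loc[tradable["score"] > buy_above, "isin"].tolist()
    # only sell what is already in the portfolio
    sell = tradable.loc[(tradable["score"] < sell_below)
                        & tradable["isin"].isin(holdings), "isin"].tolist()
    return buy, sell


scores = pd.DataFrame({"isin": ["A", "B", "C", "No ISIN found"],
                       "score": [0.7, -0.8, -0.6, 0.9]})
print(select_trades(scores, holdings={"B"}))
```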

We can then feed this list of ISINs to the lemon.markets API (if you’re not signed up yet, do that here) to place and activate our trades:

import requests

API_KEY = "<YOUR-API-KEY>"    # replace with your own paper-trading API key
SPACE_ID = "<YOUR-SPACE-ID>"  # replace with the ID of the space you trade in
BASE_URL = "https://paper-trading.lemon.markets/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

orders = []

# place buy and sell orders (quantity 1 each)
for side, isins in (("buy", buy), ("sell", sell)):
    for isin in isins:
        order = requests.post(
            f"{BASE_URL}/orders/",
            data={"isin": isin,
                  "expires_at": "p0d",
                  "side": side,
                  "quantity": 1,
                  "venue": "XMUN",
                  "space_id": SPACE_ID},
            headers=HEADERS).json()
        orders.append(order)

# activate orders
for order in orders:
    results = order.get("results")
    if not results:
        print(f"Order could not be placed: {order}")
        continue
    requests.post(f"{BASE_URL}/orders/{results.get('id')}/activate/", headers=HEADERS)
    print(f'Activated {results.get("isin")}')
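Before letting the script loose, it can help to see exactly what would be sent. A small sketch using requests’ `Request(...).prepare()` builds the same form-encoded order without sending it. `prepare_order` is a hypothetical helper, and the payload fields are copied from the snippet above rather than re-checked against the current lemon.markets API documentation.

```python
import requests

BASE_URL = "https://paper-trading.lemon.markets/v1"


def prepare_order(isin: str, side: str, space_id: str, api_key: str,
                  quantity: int = 1) -> requests.PreparedRequest:
    """Build (but don't send) an order request for inspection."""
    return requests.Request(
        "POST",
        f"{BASE_URL}/orders/",
        data={"isin": isin,
              "expires_at": "p0d",
              "side": side,
              "quantity": quantity,
              "venue": "XMUN",
              "space_id": space_id},
        headers={"Authorization": f"Bearer {api_key}"},
    ).prepare()


req = prepare_order("US0378331005", "buy", "<YOUR-SPACE-ID>", "<YOUR-API-KEY>")
print(req.method, req.url)
print(req.body)  # the form-encoded payload that would be posted
```

Once the output looks right, you can send a prepared request with `requests.Session().send(req)`.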

For demonstration purposes, our trades are all of size 1, but depending on your capital, you might want to increase this parameter (or even make it dynamic depending on the sentiment score). Besides this, there are lots of other ways you can make this project even more extensive!
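One way to make the quantity depend on the sentiment score is to scale it linearly between a minimum and a maximum. The sketch below is a hypothetical rule, not part of our base project: `order_quantity`, `base_quantity` and `max_quantity` are made-up names and values, and in practice you’d also cap each order by your available cash.

```python
import math


def order_quantity(score: float, base_quantity: int = 1, max_quantity: int = 5,
                   threshold: float = 0.5) -> int:
    """Scale quantity from base_quantity just above the threshold to max_quantity at |score| = 1."""
    strength = abs(score)
    if strength <= threshold:
        return 0  # not strong enough to trade (same strict rule as above)
    fraction = (strength - threshold) / (1 - threshold)
    return min(max_quantity,
               base_quantity + math.floor(fraction * (max_quantity - base_quantity)))


for s in (0.4, 0.6, -0.75, 1.0):
    print(s, order_quantity(s))
```

Using `abs(score)` means the same sizing applies to buy and sell signals; the sign still decides the side.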

Further extensions 🤓

This project is only a starting point for your own sentiment trading strategy. You can make your trade decisions more robust by collecting news from several sources. Or you can conduct more extensive sentiment analysis by, for example, applying VADER to the whole article rather than just the headline (#clickbait 🎣). Perhaps you want to use a different sentiment analysis tool, like TextBlob. Or maybe you even want to create your own sentiment score library based on investment-specific jargon.
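As a first step towards an investment-specific lexicon, here is a toy scorer: it sums the valences of known finance terms and squashes the total into [-1, 1] with the same kind of normalisation VADER uses. The words and valences in `FINANCE_LEXICON` are hypothetical placeholders on VADER’s -4 to +4 scale, and unlike VADER the sketch ignores negation, intensifiers and punctuation.

```python
import math
import re

# Hypothetical finance terms and valences; illustrative values only
FINANCE_LEXICON = {
    "beats": 2.0,
    "upgrade": 1.5,
    "surges": 2.0,
    "misses": -2.0,
    "downgrade": -1.5,
    "lawsuit": -1.5,
}


def finance_score(headline: str, lexicon: dict = FINANCE_LEXICON,
                  alpha: float = 15.0) -> float:
    """Sum the valences of known words and squash the total into [-1, 1]."""
    words = re.findall(r"[a-z']+", headline.lower())
    total = sum(lexicon.get(word, 0.0) for word in words)
    return total / math.sqrt(total * total + alpha)


print(finance_score("Company beats estimates"))
print(finance_score("Analyst downgrade after lawsuit"))
```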

We suggest you begin by collecting data from a news source you trust and tweaking the trading decision rule. Let your imagination go wild!

You’re now set up to use BeautifulSoup, VADER and lemon.markets in your sentiment analysis project. See our GitHub repository for the entire script. And, if you come up with an interesting extension, feel free to make a PR! We look forward to seeing your ideas.

Joanne from lemon.markets 🍋