How to build an NLP Pipeline to find the right stocks to buy

There are generally two approaches to stock prediction. One is the quantitative approach, which relies on mathematical algorithms to predict stock prices, as I mentioned here. The second is the qualitative approach, where we predict stock prices based on changes in the available related information.

In this article, I will walk you through the steps I took to build a machine learning pipeline that helps me make my own investment decisions. In this pipeline, I wrote a natural language processing algorithm to determine whether the financial news about a given stock on a given day is negative or positive. Combining that result with the stock's financial data, we can then predict the stock price based on how financial news has historically affected it.

1. Data collection

The first step in any machine learning analysis is data collection, which begins once we have identified the types of data the analysis requires. For my analysis, I need three types of data. In this step, we collect financial news on stocks and stock data from a public API source. Then we join them into one dataframe through ETL in preparation for the analysis.
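As a minimal sketch of that join, assuming pandas and made-up sample rows in place of real API responses, the news can be grouped per ticker and day and then merged onto the price table. The column names (`ticker`, `published`, `headline`, `date`, `close`), the tickers, and the helper `build_dataset` are all hypothetical; the article does not specify the actual schema or API.

```python
import pandas as pd

# Made-up sample rows standing in for the news and price API responses.
news = pd.DataFrame({
    "ticker": ["ACME", "ACME", "GLOBEX"],
    "published": ["2021-03-01 09:15", "2021-03-01 14:40", "2021-03-02 11:05"],
    "headline": [
        "Acme beats earnings estimates",
        "Acme faces new lawsuit",
        "Globex announces buyback",
    ],
})
prices = pd.DataFrame({
    "ticker": ["ACME", "ACME", "GLOBEX"],
    "date": ["2021-03-01", "2021-03-02", "2021-03-02"],
    "close": [100.0, 101.5, 50.0],
})


def build_dataset(news, prices):
    """Join each day's headlines onto that day's price row (hypothetical helper)."""
    news = news.copy()
    prices = prices.copy()
    # Reduce publication timestamps to calendar dates so they match the price rows.
    news["date"] = pd.to_datetime(news["published"]).dt.normalize()
    prices["date"] = pd.to_datetime(prices["date"])
    # Collect all headlines for one ticker and day into a single list.
    daily_news = news.groupby(["ticker", "date"])["headline"].agg(list).reset_index()
    # Left join keeps every trading day, including days without any news.
    return prices.merge(daily_news, on=["ticker", "date"], how="left")


dataset = build_dataset(news, prices)
print(dataset)
```

The left join is a deliberate choice here: days with no news stay in the table with a missing `headline` value, so the price series remains continuous.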

2. NLP Analysis

The financial news we collect about a stock is unclassified. For our model to work, we need to classify each news item as either negative or positive. To do that, we will employ a natural language processing model to classify the news for each stock.
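The article does not name the NLP model used, so the sketch below swaps in a deliberately simple word-list classifier just to show the shape of the step: a headline goes in, a label comes out, and the labels for one day are averaged into a single score. The word lists, the function names, and the extra "neutral" label for ties are all assumptions made for illustration.

```python
import re

# Tiny hand-made word lists, assumed for illustration only.
POSITIVE = {"beats", "surges", "upgrade", "record", "growth", "profit"}
NEGATIVE = {"misses", "lawsuit", "downgrade", "recall", "loss", "fraud"}


def classify_headline(headline):
    """Label a headline by counting positive and negative words."""
    words = re.findall(r"[a-z']+", headline.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # tie or no known words


def daily_sentiment(headlines):
    """Average the labels of one day's headlines into a score in [-1, 1]."""
    values = {"positive": 1, "negative": -1, "neutral": 0}
    if not headlines:
        return 0.0
    return sum(values[classify_headline(h)] for h in headlines) / len(headlines)


print(classify_headline("Acme beats estimates with record profit"))
print(daily_sentiment(["Acme beats estimates", "Acme faces lawsuit"]))
```

A trained classifier would replace `classify_headline` in practice; the daily aggregation around it can stay the same.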

3. Machine Learning Model

After combining the NLP results for the news with the stock's financial information, we can finally employ a machine learning algorithm, XGBoost, to analyze stock price patterns based on the negative or positive news a stock receives.
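One way this step could look is sketched below. It makes several assumptions the article does not confirm: the task is simplified to predicting whether the next day's close is higher (a classification, not a price forecast), the features are just the day's sentiment score and daily return, and the hyperparameters are arbitrary. `make_training_rows` and `train_model` are hypothetical helpers; `XGBClassifier` is XGBoost's scikit-learn style classifier.

```python
def make_training_rows(days):
    """Build features and labels from one ticker's rows, ordered by date.

    Each row is a dict with 'close' and 'sentiment' (assumed field names).
    """
    features, labels = [], []
    for prev, today, nxt in zip(days, days[1:], days[2:]):
        daily_return = today["close"] / prev["close"] - 1.0
        features.append([today["sentiment"], daily_return])
        # Label is 1 when the next close is higher than today's close.
        labels.append(1 if nxt["close"] > today["close"] else 0)
    return features, labels


def train_model(features, labels):
    """Fit an XGBoost classifier on the prepared rows."""
    # Imported here so the feature code above runs without these packages.
    import numpy as np
    from xgboost import XGBClassifier

    model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(np.array(features), np.array(labels))
    return model


# Made-up sample data for a single ticker.
days = [
    {"close": 100.0, "sentiment": 0.0},
    {"close": 110.0, "sentiment": 1.0},
    {"close": 99.0, "sentiment": -1.0},
    {"close": 108.9, "sentiment": 0.5},
]
features, labels = make_training_rows(days)
print(features, labels)
# With enough history: model = train_model(features, labels)
```

Building the label from the following day's close keeps future information out of the features, which matters when the model is later used on live data.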

4. Automatic Quantitative Trading

Using the quantitative trading bot we built before, we can simply fit the model into the pipeline, and it will be executed automatically.
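The trading bot itself is not shown in this article, so the following is only a rough sketch of the glue between a model and a bot: a rule that turns a predicted probability of a price rise into an action. The function name `decide_action` and the thresholds 0.6 and 0.4 are hypothetical and would need tuning against real results.

```python
def decide_action(prob_up, holding, buy_above=0.6, sell_below=0.4):
    """Map the predicted probability of a price rise to a trading action."""
    if prob_up >= buy_above and not holding:
        return "buy"
    if prob_up <= sell_below and holding:
        return "sell"
    return "hold"


print(decide_action(0.72, holding=False))
print(decide_action(0.31, holding=True))
print(decide_action(0.50, holding=True))
```

Leaving a gap between the two thresholds keeps the bot from trading on predictions that are close to a coin flip.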