How to Start Using Big Data in your API Trading Strategy

As the world becomes more connected and digitized, individuals, businesses and countries produce data at an exponential rate. It has been estimated that the amount of data generated will grow from 33 zettabytes in 2018 to 175 zettabytes by 2025. Big data has been shown to be valuable across industries, and an increasing number of firms are investing in systems that use it to identify potential opportunities and eliminate inefficiencies within their business to increase profits. Traders have the opportunity to do the same with their trading strategies. Most traders are accustomed to looking at price data to generate signals; however, non-traditional sources of data can also be used to identify opportunities and eliminate inefficiencies.

What kind of data can be useful for API trading?

Historical price data is necessary for strategy development and backtesting, and a stable connection to live price data is necessary for forward-testing and live trading. The historical data must be complete, accurate, and granular enough for the strategy being developed. Traditionally, traders have used historical prices to identify areas of support and resistance with technical analysis (which is covered in more detail in this post). API traders have the benefit of using advanced tools, such as those designed for machine learning, which analyze historical price data to identify patterns or relationships that may not be visible to the human eye. Accessing historical and live price data is simple using FXCM’s REST API and the fxcmpy Python wrapper: a few lines of code are enough to create a pandas DataFrame containing price data and begin your analysis.

In addition to price data, traders of most asset classes commonly use volume data. Volume is the amount of buying and selling of an instrument over a given time period, and traders use this information to gauge the strength of an existing trend or identify a reversal. Generally, volume tends to increase as a trend continues, and begins to decrease when a trend starts to slow down and reverse. Traders may use volume as both a predictor of price action and a confirmation signal in conjunction with other forms of analysis. FXCM offers 32 different volume observations per minute per instrument, each illustrating a different aspect of trading volume during that minute. Live and historical volume data subscriptions are available, and a free sample is available on FXCM’s GitHub.
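As a rough illustration of volume used as a confirmation signal, the sketch below flags price moves that occur on above-average volume. The data is made up, and the function name, the three-bar window and the rule itself are assumptions chosen for illustration, not an FXCM method.

```python
import pandas as pd

def volume_confirmed_moves(close, volume, window=3):
    """Return +1/-1 for up/down bars that trade on above-average volume, else 0."""
    change = close.diff()
    # Average volume of the previous `window` bars (the current bar is excluded).
    avg_volume = volume.rolling(window).mean().shift(1)
    high_volume = volume > avg_volume
    direction = change.apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
    # Keep the direction only where volume confirms the move.
    return direction.where(high_volume, 0).astype(int)

# Hypothetical closing prices and volume observations.
close = pd.Series([1.10, 1.11, 1.12, 1.13, 1.15, 1.14])
volume = pd.Series([100, 110, 105, 100, 200, 90])

signals = volume_confirmed_moves(close, volume)
print(signals.tolist())
```

Only the fifth bar is flagged here, because it is the only move that occurs on volume above the average of the preceding three bars.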

In what is known as sentiment analysis, the behavior of individuals is used to anticipate the potential behavior of the price of an asset. For example, traders may use sentiment expressed in news announcements or social media updates regarding a particular issue or company, going long when sentiment is positive and short when it is negative. In a more direct form of analysis, FXCM’s Speculative Sentiment Index provides a minute-by-minute comparison of long traders to short traders for a particular asset, providing insight into the current positioning among a large subset of retail traders. Sentiment can be used as a contrarian indicator, which means traders may want to trade in the opposite direction of the current market sentiment. To download a free sample of this sentiment data, please visit FXCM’s GitHub.
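The contrarian idea can be sketched in a few lines. The example below assumes sentiment has already been expressed as the share of traders who are long (a value between 0 and 1); that representation, the 60%/40% thresholds and the function name are all hypothetical and do not reflect the actual format of FXCM’s sentiment data.

```python
import pandas as pd

def contrarian_signal(long_share, upper=0.60, lower=0.40):
    """Return -1 (short) when the crowd is heavily long, +1 (long) when heavily short, else 0."""
    signal = pd.Series(0, index=long_share.index)
    signal.loc[long_share >= upper] = -1  # crowd is long, so fade it
    signal.loc[long_share <= lower] = 1   # crowd is short, so fade it
    return signal

# Hypothetical minute-by-minute share of traders who are long.
minutes = pd.date_range("2020-01-01 09:00", periods=4, freq="min")
long_share = pd.Series([0.72, 0.55, 0.35, 0.60], index=minutes)

signal = contrarian_signal(long_share)
print(signal.tolist())
```

In practice the thresholds would need to be tested against historical data rather than chosen by hand.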

An inside look into the order book for a particular market is valuable because it lets the trader see elements of both sentiment and volume: what is being traded, when it is being traded, and how much is being traded. FXCM streams its executed transactions data via its FIX API to subscribers of this data package. This allows subscribers to receive, in real time, the price, quantity, symbol, transaction time, and direction (long/short) of each trade. Traders can incorporate this data directly into their algorithm using FXCM’s FIX API for price streaming and order execution. As an example of the value of this data, quantitative trader Ernest Chan* did a study using FXCM’s 2017 order flow data and found that next-day returns increased with the previous day’s order flow, and created a trading strategy based on his finding. To learn more about subscription options, click here.
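To make the order flow idea concrete, the sketch below aggregates a handful of made-up trades into a daily net order flow and lines it up against the following day's return. The column names mirror the fields listed above but are assumptions about layout, and this is not a reproduction of the study mentioned; it only shows the general shape of such a calculation.

```python
import pandas as pd

# Hypothetical executed trades with the fields described above.
trades = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-01-02 09:00", "2017-01-02 15:30",
        "2017-01-03 10:00", "2017-01-03 11:00",
        "2017-01-04 12:00",
    ]),
    "symbol": ["EUR/USD"] * 5,
    "price": [1.0500, 1.0510, 1.0490, 1.0480, 1.0520],
    "quantity": [100, 50, 80, 40, 60],
    "direction": ["long", "short", "short", "short", "long"],
})

def daily_order_flow(trades):
    """Sum signed quantity per day: long trades count as positive, short as negative."""
    sign = trades["direction"].map({"long": 1, "short": -1})
    signed_quantity = trades["quantity"] * sign
    return signed_quantity.groupby(trades["time"].dt.date).sum()

flow = daily_order_flow(trades)

# Use the last traded price of each day as a stand-in for the daily close.
daily_close = trades.groupby(trades["time"].dt.date)["price"].last()
next_day_return = daily_close.pct_change().shift(-1)

table = pd.DataFrame({"order_flow": flow, "next_day_return": next_day_return})
print(table)
```

The last row has no next-day return because there is no following day in the sample.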

Finding Alpha from Big Data

Once you have identified some data you think is interesting, how do you determine whether it has any value for price prediction? A good starting point with any dataset is a preliminary inspection through a process known as exploratory data analysis. Even the most interesting or exclusive dataset is not valuable if it is incomplete, disorganized or inconsistent, so it is important to confirm that the data is complete, clean, and organized before investing time in testing and analyzing it. Exploratory data analysis (or EDA) typically involves calculating descriptive statistics such as the mean, standard deviation, and min/max values, and creating visualizations of the data such as scatterplots or histograms.
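A first pass at exploratory data analysis might look like the following sketch, which uses pandas on a small synthetic dataset. The column names and the deliberately injected gaps are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for a real dataset.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=10, freq="D")
data = pd.DataFrame({
    "close": 100 + rng.normal(0, 1, 10).cumsum(),
    "volume": rng.integers(100, 200, 10).astype(float),
}, index=index)

# Inject two typical problems: a missing value and a missing row.
data.loc[index[3], "volume"] = np.nan
data = data.drop(index[6])

# Descriptive statistics: count, mean, std, min, quartiles, max.
summary = data.describe()

# Completeness checks: missing values, missing dates, duplicated timestamps.
missing_values = data.isna().sum()
expected_dates = pd.date_range(data.index.min(), data.index.max(), freq="D")
missing_dates = expected_dates.difference(data.index)
duplicate_rows = int(data.index.duplicated().sum())

print(summary)
print(missing_values)
print(missing_dates)
```

With matplotlib installed, `data.hist()` or `data.plot.scatter(x="volume", y="close")` would add the visual checks mentioned above.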

Correlation Matrix

A correlation matrix is a quick way to determine the relationship (if any) between multiple variables. The correlation coefficient shows the strength of the relationship and varies between -1 and 1, with -1 indicating the variables are perfectly negatively correlated, 1 indicating they are perfectly positively correlated, and 0 indicating there is no linear correlation. The closer the coefficient is to 0, the weaker the correlation between the variables. There are many ways to calculate correlations, but one of the most frequently used statistics is the Pearson correlation, which measures the strength of a linear relationship between variables. It should be noted that the Pearson coefficient is sensitive to outliers, and that the usual significance tests for it assume the variables are normally distributed, which is not always the case. Correlation does not necessarily indicate causation, so it is important to check whether your results make intuitive sense. For example, if we find that the returns of the S&P 500 are correlated with the weather in a particular city, we may have found a statistical coincidence, not a trading edge. However, the correlation matrix can help us identify which relationships may be worth further exploration.
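In pandas, a Pearson correlation matrix is a single call to `DataFrame.corr`. The sketch below uses synthetic data in which returns are constructed to depend partly on a volume variable and not at all on a "weather" variable; the column names and the data-generating process are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic variables: returns depend partly on volume_change, not on weather.
volume_change = rng.normal(size=n)
returns = 0.5 * volume_change + rng.normal(size=n)
weather = rng.normal(size=n)

df = pd.DataFrame({
    "returns": returns,
    "volume_change": volume_change,
    "weather": weather,
})

# Pairwise Pearson correlation coefficients between all columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

The matrix is symmetric with ones on the diagonal. With real data, a small non-zero coefficient such as the one for the weather column can still appear by chance, which is why the sanity check described above matters.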

Regression Analysis

You may remember working with regressions from your university statistics class. In a regression analysis, the dependent variable is the variable you are trying to predict and the independent variable is the variable that you believe may affect your dependent variable. A simple linear regression finds the slope and the intercept of the line that best fits the relationship between a dependent variable, such as the returns of an asset, and an independent variable, such as trading volume. If two variables have a linear relationship, a one-unit change in the independent variable is associated with a constant change (the slope) in the dependent variable; as with correlation, this does not by itself show that one causes the other. You can see how this could be useful for determining the relationship between a particular dataset and price.
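A simple linear regression can be fitted with NumPy's `polyfit`. The numbers below are made up for illustration, with volume as the independent variable and returns as the dependent variable.

```python
import numpy as np

# Hypothetical observations: independent variable (volume) and dependent variable (returns).
volume = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
returns = np.array([0.3, 0.5, 0.6, 0.9, 1.2])

# Least-squares fit of a degree-1 polynomial: returns ~ slope * volume + intercept.
slope, intercept = np.polyfit(volume, returns, deg=1)

# R-squared: the share of the variance in returns explained by the fitted line.
predicted = slope * volume + intercept
residual_ss = np.sum((returns - predicted) ** 2)
total_ss = np.sum((returns - returns.mean()) ** 2)
r_squared = 1 - residual_ss / total_ss

print(f"slope={slope:.3f}, intercept={intercept:.3f}, r_squared={r_squared:.3f}")
```

For this data the fitted slope is 0.22 and the intercept is 0.04, so each additional unit of volume is associated with a 0.22 increase in the fitted return.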

Machine Learning

A machine learning algorithm learns from a dataset with a given number of samples and tries to predict properties of unseen data based on its analysis of the known data. A good place to start for data mining and analysis is the free Python machine learning package scikit-learn. Scikit-learn is built on NumPy, SciPy and matplotlib and has excellent documentation for those who may be new to machine learning. Machine learning can be used with price data by splitting the dataset into training data, which is the data the algorithm will use to learn, and testing data, which is the data the algorithm will test its findings on. For an example of using machine learning to determine whether the closing price of an asset today will be higher or lower than it was yesterday using FXCM’s historical price data, click here.
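The train/test workflow can be sketched with scikit-learn as follows. This is a minimal illustration on a synthetic random-walk price series, not the linked example: the choice of logistic regression, the two lagged-return features and the 80/20 chronological split are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic closing prices standing in for historical price data.
rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())
returns = close.pct_change()

# Features: the previous two days' returns. Target: 1 if today's close is above yesterday's.
dataset = pd.DataFrame({
    "lag_1": returns.shift(1),
    "lag_2": returns.shift(2),
    "target": (returns > 0).astype(int),
}).dropna()

# Chronological split so the model is tested only on data that comes after the training data.
split = int(len(dataset) * 0.8)
train, test = dataset.iloc[:split], dataset.iloc[split:]

model = LogisticRegression()
model.fit(train[["lag_1", "lag_2"]], train["target"])

predictions = model.predict(test[["lag_1", "lag_2"]])
accuracy = accuracy_score(test["target"], predictions)
print(f"Out-of-sample accuracy: {accuracy:.2f}")
```

Because the synthetic series is a random walk, accuracy should hover around chance. The split is chronological rather than shuffled so that no information from the test period leaks into training.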

This is just the beginning of the many powerful ways to analyze data and use it to create a trading strategy. Ready to try it for yourself? Connect to FXCM’s REST API to access live and historical price data in real time. Click this link to find out how to get started.

Risk Warning:
CFDs are complex instruments and come with a high risk of losing money rapidly due to leverage.

76.88% of retail investor accounts lose money when trading CFDs with this provider.

You should consider whether you understand how CFDs work and whether you can afford to take the high risk of losing your money.

* FXCM is an independent legal entity and is not affiliated with Ernest Chan. FXCM does not endorse any product or service of Ernest Chan.

Links to third-party sites are provided for your convenience and for informational purposes only. FXCM bears no liability for the accuracy, content, or any other matter related to the external site or for that of subsequent links, and accepts no liability whatsoever for any loss or damage arising from the use of this or any other content. Such sites are not within our control and may not follow the same privacy, security, or accessibility standards as ours. Please read the linked websites' terms and conditions.
