# Machine Learning

Way back in November 2007, literally weeks after SPX put in its pre-GFC all-time high, Friedman, Hastie and Tibshirani published their Graphical Lasso algorithm for estimation of the sparse inverse covariance matrix. Are you suggesting that Friedman and his titans of statistical learning somehow caused the GFC by publishing their Graphical Lasso algorithm? Not at all. I'm just setting you up to demonstrate the fallacy of mistaking correlation for causation (thanks for playing along). Seeing patterns where there are none is part of what it means to be human. Of course, Friedman and his gang of statisticians didn't cause the GFC. But they did help us deal with our psychological flaws by providing us with a powerful tool for detecting spurious correlations. Their tool allows one to figure out whether variables are correlated with one another directly, or whether any measured connection is merely due to a common connection to something else. Confusing? Let's look at an example. Consider the two stocks ABB, a multinational industrial automation company, and PUK, a multinational life insurance company. Over the period 2015 to...
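
To make the "direct versus common connection" idea concrete, here is a minimal sketch using a three-variable partial correlation. This is not the Graphical Lasso itself, only the simpler textbook formula that captures the same intuition. The data are synthetic, and the names `market`, `stock_a` and `stock_b` are hypothetical stand-ins rather than anything from the original post.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)

# Assumption: a single common factor drives both (synthetic) return series.
market = [rng.gauss(0, 1) for _ in range(5000)]
stock_a = [m + rng.gauss(0, 1) for m in market]
stock_b = [m + rng.gauss(0, 1) for m in market]

raw = pearson(stock_a, stock_b)
direct = partial_corr(stock_a, stock_b, market)
print(f"raw correlation:     {raw:.2f}")
print(f"partial correlation: {direct:.2f}")
```

In this toy setup the raw correlation should come out around 0.5, while the partial correlation should sit close to zero, because the two series are linked only through the shared factor.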

This article is a departure from the quantitative research that usually appears on the Robot Wealth blog. Until recently, I was working as a machine learning consultant to financial services organizations and trading firms in Australia and the Asia Pacific region. A few months ago, I left that world behind to join an ex-client's proprietary trading firm. I thought I'd jot down a few thoughts about what I saw during my consulting time, because I witnessed some striking changes in the industry over a relatively short period, and I think you might find them interesting too. Enjoy! Perceptions around Artificial Intelligence (AI) in the finance industry have changed significantly, as scepticism gives way to a rising Fear of Missing Out (FOMO) among asset managers and trading houses. Big Data and AI Strategies – Machine Learning and Alternative Data Approaches to Investing, JP Morgan's 280-page report on the future of machine learning in the finance industry, paints a picture of a future in which alpha is generated from data sources such as social media, satellite imagery, and machine-classified company filings and...

This article is adapted from one of the units of Advanced Algorithmic Trading. If you like what you see, check out the entire curriculum here. Find out what Robot Wealth is all about here. If you're interested in using artificial neural networks (ANNs) for algorithmic trading, but don't know where to start, then this article is for you. Normally, if you want to learn about neural networks, you need to be reasonably well versed in matrix and vector operations - the world of linear algebra. This article is different. I've attempted to provide a starting point that doesn't involve any linear algebra and have deliberately left out all references to vectors and matrices. If you're not strong on linear algebra, but are curious about neural networks, then I think you'll enjoy this introduction. In addition, if you decide to take your study of neural networks further, when you do inevitably start using linear algebra, it will probably make a lot more sense as you'll have something of a head start. The best place to start learning about neural networks is the...
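
In the same no-linear-algebra spirit, the following is a minimal sketch of a single artificial neuron trained with the classic perceptron update rule, written with plain loops only. The choice of example (learning a logical AND) and all function names are my own assumptions for illustration, not taken from the course material.

```python
def step(value):
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if value > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    total = bias
    for w, x in zip(weights, inputs):
        total += w * x
    return step(total)


def train(samples, learning_rate=0.1, epochs=20):
    """Perceptron rule: nudge each weight in proportion to the prediction error."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            for i, x in enumerate(inputs):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


# Toy training set: the logical AND of two binary inputs.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train(and_gate)
outputs = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(outputs)
```

Every operation here is a multiplication, an addition or a comparison on individual numbers. The vector and matrix notation used elsewhere is essentially a compact way of writing these same loops.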

It would be great if machine learning were as simple as just feeding data to an out-of-the-box implementation of some learning algorithm, then standing back and admiring the predictive utility of the output. As anyone who has dabbled in this area will confirm, it is never that simple. We have features to engineer and transform (no trivial task - see here and here for an exploration with applications for finance), not to mention the vagaries of dealing with data that is not Independent and Identically Distributed (non-IID). In my experience, landing on a model that fits the data acceptably at the outset of a modelling exercise is unlikely; a little (or a lot!) of effort usually has to go into tuning and debugging the algorithm to achieve acceptable performance. In the case of non-IID time series data, we also face the dilemma of how much data to use in training a predictive model. Given the non-stationarity of asset prices, if we use too much data, we run the risk of training our model on data that...
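
One common way to explore the "how much training data" question is to re-fit a model on rolling windows of different lengths and compare out-of-sample results. The sketch below only generates the index windows. The function name and parameters are hypothetical, and this is not necessarily the approach the full article takes.

```python
def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train, test) index ranges for rolling walk-forward windows.

    Each test window immediately follows its training window, and the
    whole arrangement slides forward by test_size observations at a time.
    """
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


# Toy example: 10 observations, train on 4, test on the next 2.
splits = list(walk_forward_splits(n_obs=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), list(test))
```

Varying `train_size` while holding `test_size` fixed gives a direct, if crude, view of how sensitive out-of-sample performance is to the length of the training history.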

Last night it was my pleasure to present at the Tyro Fintech Hub in Sydney on the topic of using machine learning in algorithmic trading systems. Here you can download the presentation. Many thanks to all who attended and particularly for the engaging questions. I thoroughly enjoyed myself! In particular, thanks to Andrien Juric for organising the event and Sharon Lu from Tyro for making available such a great space!!

Introduction: My first post on using machine learning for financial prediction took an in-depth look at various feature selection methods as a data pre-processing step in the quest to mine financial data for profitable patterns. I looked at various methods to identify predictive features including Maximal Information Coefficient (MIC), Recursive Feature Elimination (RFE), algorithms with built-in feature selection, selection via exhaustive search of possible generalized linear models, and the Boruta feature selection algorithm. I personally found the Boruta algorithm to be the most intuitive and elegant approach, but regardless of the method chosen, the same features seemed to keep on turning up in the results. In this post, I will take this analysis further and use these features to build predictive models that could form the basis of autonomous trading systems. Firstly, I'll provide an overview of the algorithms that I have found to generally perform well on this type of machine learning problem as well as those algorithms recommended by David Aronson (2013) in Statistically Sound Machine Learning for Algorithmic Trading of Financial Instruments (SSML). I'll also discuss a framework for measuring the...

Updates: 2019: In this first Machine Learning for Trading post, we’ve added a section on feature selection using the Boruta package, equity curves of a simple trading system, and some Lite-C code that generates the training data. 2020: I've updated the original post with some new thinking about data mining, refreshed the code, updated the data and plots, and added the code and data to our GitHub repository. My thinking about data mining has evolved. Back when I originally wrote this article, there was a commonly held idea that a newly-hyped approach to predictive modeling known as machine learning could discern predictive patterns in market data. A quick search on SSRN will turn up dozens of examples of heroic attempts at this very thing, many of which have been downloaded thousands of times. Personally, I spent more hours than I care to count on this approach. And while I learned an absolute ton, I can also say that nothing that I trade today emerged from such a data-mining exercise. Over the years since I first wrote this article, a realisation slowly dawned on...

Important preface: This post is in no way intended to showcase a particular trading strategy. It is purely to share and demonstrate the use of the framework I've put together to speed the research and development process for a particular type of trading strategy. Comments and critiques regarding the framework and the methodology used are most welcome. Backtest results presented are for illustrating the methodology and describing the outputs only. That done, on to the interesting stuff. My last two posts (Part 1 here and Part 2 here) explored applying the k-means clustering algorithm for unsupervised discovery of candlestick patterns. The results were interesting enough (to me at least) to justify further research in this domain, but nothing presented thus far would be of much use in a standalone trading system. There are many possible directions in which this research could go. Some ideas that could be worth pursuing include: Providing the clustering algorithm with other data, such as trend or volatility information; Extending the search to include two- and three-day patterns; Varying the number of clusters; Searching across markets and asset...

In the last article, I described an application of the k-means clustering algorithm for classifying candlesticks based on the relative position of their open, high, low and close. This was a simple enough exercise, but now I tackle something more challenging: isolating information that is both useful and practical for real trading. I'll initially try two approaches: investigating whether there are any statistically significant patterns in certain clusters following others; and investigating the distribution of next-day returns following the appearance of a candle from each cluster. The insights gained from this analysis will hopefully inform the next direction of this research. Data preliminaries: In the last article, I classified twelve months of daily candles (June 2014 - July 2015) into eight clusters. To simplify the analysis and ensure that enough instances of each cluster are observed, I'll reduce the number of clusters to four and extend the history to cover 2008-2015. I'll exclude my 2015 data for now in case I need a final, unseen test set at some point in the future. Here's a subset of the candles over the entire price history (2008-2014, 2015...
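
The two approaches described above can be sketched as follows, assuming each day has already been assigned a cluster label. The labels and returns below are made-up numbers for illustration only, and the function names are hypothetical. The sketch covers only the counting and grouping, not a significance test.

```python
from collections import Counter, defaultdict


def transition_counts(labels):
    """Count how often a candle from cluster A is followed by one from cluster B."""
    return Counter(zip(labels, labels[1:]))


def mean_next_day_return(labels, returns):
    """Average next-day return following each cluster.

    returns[i] is the return on day i, so the return that follows the
    candle on day i is returns[i + 1].
    """
    grouped = defaultdict(list)
    for today_label, tomorrow_return in zip(labels, returns[1:]):
        grouped[today_label].append(tomorrow_return)
    return {label: sum(vals) / len(vals) for label, vals in grouped.items()}


# Made-up cluster labels (four clusters) and daily returns.
labels = [0, 1, 1, 2, 0, 1, 3, 0]
returns = [0.001, -0.002, 0.004, 0.000, 0.003, -0.001, 0.002, -0.003]

transitions = transition_counts(labels)
means = mean_next_day_return(labels, returns)
print(transitions.most_common(3))
print(means)
```

With real data, the transition counts could feed a contingency-table test, and the grouped returns could be compared as full distributions rather than just means.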

Candlestick patterns were used to trade the rice market in Japan back in the 1800s. Steve Nison popularised the idea in the western world and claims that the technique, which is based on the premise that the appearance of certain patterns portends the future direction of the market, is applicable to modern financial markets. Today, he has a fancy website where he sells trading courses. Strange that he doesn't keep this hugely profitable system to himself and make tons of money. Since you're reading a blog about quantitative trading, it's unlikely that I need to convince you that patterns like "two crows" and "dark cloud cover" are not statistically significant predictors of the future (but I'd be happy to do a post about this if there is any interest - let me know in the comments). If only profitable trading were that easy! So if these well-known patterns don't have predictive power, are there any patterns that do? And if so, how could they be discovered? Unsupervised machine learning techniques offer one such potential solution. An unsupervised learner is simply one that...
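
As a rough illustration of unsupervised pattern discovery, here is a bare-bones k-means sketch that groups candles by where the open and close sit within the day's range. The feature definition, the toy prices and the choice to initialise centroids from the first `k` points (for reproducibility) are all assumptions of this sketch. Practical implementations typically use random or k-means++ initialisation and richer features.

```python
def candle_features(o, h, l, c):
    """Position of the open and close within the day's range, each in [0, 1].

    Assumes high > low; a zero-range candle would need special handling.
    """
    span = h - l
    return ((o - l) / span, (c - l) / span)


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # deterministic initialisation (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(point, centroids[j]))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Made-up (open, high, low, close) candles: alternating up days and down days.
candles = [
    (100.0, 110.0, 99.0, 109.0),
    (109.0, 110.0, 99.0, 100.0),
    (50.0, 56.0, 49.5, 55.5),
    (55.5, 56.0, 49.5, 50.0),
    (20.0, 22.0, 20.0, 21.8),
    (21.8, 22.0, 20.0, 20.0),
]

points = [candle_features(*candle) for candle in candles]
labels, centroids = kmeans(points, k=2)
print(labels)
```

No labels such as "bullish" or "bearish" are supplied anywhere. The algorithm separates the two shapes purely from the geometry of the features, which is the sense in which the learning is unsupervised.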