# BLOG

Explore our step-by-step research in systematic trading, machine learning, coding and more

Important preface: This post is in no way intended to showcase a particular trading strategy. It is purely to share and demonstrate the use of the framework I've put together to speed the research and development process for a particular type of trading strategy. Comments and critiques regarding the framework and the methodology used are most welcome. Backtest results presented are for illustrating the methodology and describing the outputs only. That done, on to the interesting stuff My last two posts (Part 1 here and Part 2 here) explored applying the k-means clustering algorithm for unsupervised discovery of candlestick patterns. The results were interesting enough (to me at least) to justify further research in this domain, but nothing presented thus far would be of much use in a standalone trading system. There are many possible directions in which this research could go. Some ideas that could be worth pursuing include: Providing the clustering algorithm with other data, such as trend or volatility information; Extending the search to include two- and three-day patterns; Varying the number of clusters; Searching across markets and asset...

In the last article, I described an application of the k-means clustering algorithm for classifying candlesticks based on the relative position of their open, high, low and close. This was a simple enough exercise, but now I tackle something more challenging: isolating information that is both useful and practical to real trading. I'll initially try two approaches: Investigate whether there are any statistically significant patterns in certain clusters following others Investigate the distribution of next day returns following the appearance of a candle from each cluster The insights gained from this analysis will hopefully inform the next direction of this research. Data preliminaries In the last article, I classified twelve months of daily candles (June 2014 - July 2015) into eight clusters. To simplify the analysis and ensure that enough instances of each cluster are observed, I'll reduce the number of clusters to four and extend the history to cover 2008-2015. I'll exclude my 2015 data for now in case I need a final, unseen test set at some point in the future. Here's a subset of the candles over the entire price history (2008-2014, 2015...

Candlestick patterns were used to trade the rice market in Japan back in the 1800's. Steve Nison popularised the idea in the western world and claims that the technique, which is based on the premise that the appearance of certain patterns portend the future direction of the market, is applicable to modern financial markets. Today, he has a fancy website where he sells trading courses. Strange that he doesn't keep this hugely profitable system to himself and make tons of money. Since you're reading a blog about quantitative trading, its unlikely that I need to convince you that patterns like "two crows" and "dark cloud cover" are not statistically significant predictors of the future (but I'd be happy to do a post about this if there is any interest - let me know in the comments). If only profitable trading were that easy! So if these well-known patterns don't have predictive power, are there any patterns that do? And if so, how could they be discovered? Unsupervised machine learning techniques offer one such potential solution. An unsupervised learner is simply one that...

This post builds on work done by jcl over at his blog, The Financial Hacker. He proposes the Cold Blood Index as a means of objectively deciding whether to continue trading a system through a drawdown. I was recently looking for a solution like this and actually settled on a modification of jcl's second example, where an allowance is made for the drawdown to grow with time. The modification I made was to use the confidence intervals for the maximum drawdown calculated by Zorro’s Monte Carlo engine rather than the maximum drawdown of the backtest. The limitation is that the confidence intervals for the maximum drawdown length are unknown – only those for the maximum drawdown depth are known. I used the maximum drawdown length calculated for the backtest and considered where the backtest drawdown depth lay in relation to the confidence intervals calculated via Monte Carlo to get a feel for whether it was a reasonable value. Below is a chart of the minimum profit for a strategy I recently took live plotted out to the end of 2015, created using the method...

In the first part of this article, I described a procedure for empirically testing whether a trading strategy has predictive power by comparing its performance to the distribution of the performance of a large number of random strategies with similar trade distributions. In this post, I will present the results of the simple example described by the code in the previous post in order to illustrate how susceptible trading strategies are to the vagaries of randomness. I will also illustrate by way of example my thought process when it comes to deciding whether to include a particular component in my live portfolio or discard it. I tested one particular trading system on a number of markets separately in both directions. I picked out three instances where the out of sample performance was good as candidates for live trading. The markets, trade directions and profit factors obtained from the out of sample backtest are as follows: USD/CAD - Short - Profit Factor = 1.79 GBP/USD - Long - Profit Factor = 1.20 GBP/JPY - Long - Profit Factor = 1.31 Next, I estimated the performance of...

Picture this: A developer has coded up a brilliant strategy, taking great care not to over-optimize. There is no look-ahead bias and the developer has accounted for data-mining bias. The out of sample backtest looks great. Is it time to go live? I would've said yes, until I read Ernie Chan's Algorithmic Trading and realised that I hadn't adequately accounted for randomness. Whenever we compute a performance metric from a backtest, we face the problem of a finite sample size. We can't know the true value of the performance metric, and the value we computed may or may not be representative of this true value. We may have been simply fooled by randomness into thinking we had a profitable strategy. Put another way, was the strategy's performance simply due to being in the market at the right time? There are a number of empirical methods that can be used to address this issue. Chan describes three in his book mentioned above, and there are probably others. I am going to implement the approach described by Lo, Mamaysky and Wang (2000), who simulated...