# k-means clustering

Important preface: This post is in no way intended to showcase a particular trading strategy. It is purely to share and demonstrate the use of the framework I've put together to speed the research and development process for a particular type of trading strategy. Comments and critiques regarding the framework and the methodology used are most welcome. Backtest results presented are for illustrating the methodology and describing the outputs only. That done, on to the interesting stuff My last two posts (Part 1 here and Part 2 here) explored applying the k-means clustering algorithm for unsupervised discovery of candlestick patterns. The results were interesting enough (to me at least) to justify further research in this domain, but nothing presented thus far would be of much use in a standalone trading system. There are many possible directions in which this research could go. Some ideas that could be worth pursuing include: Providing the clustering algorithm with other data, such as trend or volatility information; Extending the search to include two- and three-day patterns; Varying the number of clusters; Searching across markets and asset...

In the last article, I described an application of the k-means clustering algorithm for classifying candlesticks based on the relative position of their open, high, low and close. This was a simple enough exercise, but now I tackle something more challenging: isolating information that is both useful and practical to real trading. I'll initially try two approaches: Investigate whether there are any statistically significant patterns in certain clusters following others Investigate the distribution of next day returns following the appearance of a candle from each cluster The insights gained from this analysis will hopefully inform the next direction of this research. Data preliminaries In the last article, I classified twelve months of daily candles (June 2014 - July 2015) into eight clusters. To simplify the analysis and ensure that enough instances of each cluster are observed, I'll reduce the number of clusters to four and extend the history to cover 2008-2015. I'll exclude my 2015 data for now in case I need a final, unseen test set at some point in the future. Here's a subset of the candles over the entire price history (2008-2014, 2015...