# statistical learning

In the last article, I described an application of the k-means clustering algorithm for classifying candlesticks based on the relative position of their open, high, low and close. This was a simple enough exercise, but now I tackle something more challenging: isolating information that is both useful and practical to real trading. I'll initially try two approaches: Investigate whether there are any statistically significant patterns in certain clusters following others Investigate the distribution of next day returns following the appearance of a candle from each cluster The insights gained from this analysis will hopefully inform the next direction of this research. Data preliminaries In the last article, I classified twelve months of daily candles (June 2014 - July 2015) into eight clusters. To simplify the analysis and ensure that enough instances of each cluster are observed, I'll reduce the number of clusters to four and extend the history to cover 2008-2015. I'll exclude my 2015 data for now in case I need a final, unseen test set at some point in the future. Here's a subset of the candles over the entire price history (2008-2014, 2015...