The post Exporting Zorro Data to CSV appeared first on Robot Wealth.

Earlier versions of Zorro used to ship with a script for converting market data in Zorro binary format to CSV. That script seems to have disappeared from recent versions of Zorro, so I thought I’d post it here.

When you run this script by selecting it and pressing [Test] on the Zorro interface, you are asked to select a Zorro market data file to convert to CSV format. Zorro then does the conversion for you and writes the data to Zorro/History/Export.csv.

```c
// convert a .t6 or .t1 file to .csv
//#define ASCENDING
#define MAX_RECORDS 10000

string Target = "History\\Export.csv";

int point(int Counter)
{
	if(0 == Counter % 10000) {
		if(!wait(0)) return 0;
		printf(".");
	}
	return 1;
}

function main()
{
	file_delete(Target);
	string Source = file_select("History","T1,T6\0*.t1;*.t6\0\0");
	if(strstr(Source,".t6")) {
		T6 *Ticks = file_content(Source);
		int Records = file_length(Source)/sizeof(T6);
		printf("\n%d records..",Records);
#ifdef MAX_RECORDS
		Records = min(Records,MAX_RECORDS);
#endif
#ifdef ASCENDING
		int nTicks = Records;
		while(--nTicks)
#else
		int nTicks = -1;
		while(++nTicks < Records)
#endif
		{
			T6 *t = Ticks+nTicks;
			file_append(Target,strf("%s,%.5f,%.5f,%.5f,%.5f\n",
				strdate("%Y-%m-%d %H:%M:%S",t->time),
				(var)t->fOpen,(var)t->fHigh,(var)t->fLow,(var)t->fClose));
			if(!point(nTicks)) return;
		}
	}
	else if(strstr(Source,".t1")) {
		T1 *Ticks = file_content(Source);
		int Records = file_length(Source)/sizeof(T1);
		printf("\n%d records..",Records);
#ifdef MAX_RECORDS
		Records = min(Records,MAX_RECORDS);
#endif
#ifdef ASCENDING
		int nTicks = Records;
		while(--nTicks)
#else
		int nTicks = -1;
		while(++nTicks < Records)
#endif
		{
			T1 *t = Ticks+nTicks;
			file_append(Target,strf("%s,%.5f\n",
				strdate("%Y-%m-%d %H:%M:%S",t->time),(var)t->fVal));
			if(!point(nTicks)) return;
		}
	}
	printf("\nDone!");
}
```

By default, the data is written in descending order (newest data first). If you want ascending order instead, uncomment the `#define ASCENDING` line at the top of the script.

As you can see, this script works with Zorro version 2.30:

This script is useful if you want to convert a single market data file. But it’s a little cumbersome if you want to convert the entire market data history of a ticker since Zorro splits that data into separate files by year (except for end-of-day data – that all goes into a single file).

Here’s a script for converting the entire history of a ticker from Zorro format to CSV:

```c
// Export selected asset history to CSV
function run()
{
	StartDate = 20060101;
	LookBack = 0;
	BarPeriod = 1;

	string Format = ifelse(assetType(Asset) == FOREX,
		"\n%04i-%02i-%02i %02i:%02i, %.5f, %.5f, %.5f, %.5f",
		"\n%04i-%02i-%02i %02i:%02i, %.1f, %.1f, %.1f, %.1f");

	char FileName[40];
	sprintf(FileName,"History\\%s.csv",strx(Asset,"/","")); // remove slash from forex pairs

	if(is(INITRUN))
		file_write(FileName,"Date,Open,High,Low,Close",0);
	else
		file_append(FileName,strf(Format,
			year(),month(),day(),hour(),minute(),
			round(priceOpen(),0.1*PIP),
			round(priceHigh(),0.1*PIP),
			round(priceLow(),0.1*PIP),
			round(priceClose(),0.1*PIP)));
}
```

This one takes the ticker selected in Zorro’s asset dropdown box and writes its entire history to Zorro/History/ticker.csv. Again, you can see it works with Zorro 2.30:

If you want to import that data into R as an `xts` object, the following snippet will do the trick:

```r
Data <- xts(read.zoo("ticker.csv", tz = "UTC", format = "%Y-%m-%d %H:%M", sep = ",", header = TRUE))
```


The post Evolving Thoughts on Data Mining appeared first on Robot Wealth.

Several years ago, I wrote about some experimentation I’d done with data mining for predictive features from financial data. The article has had several tens of thousands of views and nearly 100 comments.

I *think* the popularity of the article lay in its demonstration of various tools and modeling frameworks for doing data mining in R (it didn’t generate any alpha, so it can’t have been that). To that end, I’ve updated the data, code, and output, and added it to our GitHub repository. You can view the updated article here and find the code and data here.

Re-reading the article, it was apparent that my thinking had moved on quite significantly in just a few short years.

Back when I originally wrote this article, there was a commonly held idea that a newly-hyped approach to predictive modeling known as *machine learning* could discern predictive patterns in market data. A quick search on SSRN will turn up dozens of examples of heroic attempts at this very thing, many of which have been downloaded thousands of times.

Personally, I spent more hours than I care to count on this approach. And while I learned an absolute ton, I can also say that *nothing* I trade today emerged from such a data-mining exercise. A large-scale data-mining exercise *contributed* to *one* of our strategies, but it was supported by a ton of careful analysis.

Over the years since I first wrote the article, a realisation has dawned on me:

Trading is very hard, and these techniques don’t really help that much with the hardest part.

I think, in general, the trading and investment community has had a similar awakening.

*OK, so what’s the “hardest part” of trading?*

Operational issues of running a trading business aside, the hardest part of trading is maximising the probability that the edges you trade continue to pay off in the future.

Of course, we can never be entirely sure about anything in the markets. They change. Edges come and go. There’s always anxiety that an edge isn’t really an edge at all, that it’s simply a statistical mirage. There is uncertainty everywhere.

Perhaps the most honest goal of the quantitative researcher is to **reduce this uncertainty as far as reasonably possible.**

Unfortunately (or perhaps fortunately, if you take the view that if it were easy, everyone would do it), reducing this uncertainty takes a lot of work and more than a little market nous.

In the practical world of our own trading, we do this in a number of ways centred on detailed and careful analysis. Through data analysis, we try to answer questions like:

- Does the edge make sense from a structural, economic, financial, or behavioural perspective?
- Is there a reason for it to exist that I can explain in terms of taking on risk or operational overhead that others don’t want, or providing a service?
- Is it stable through time?
- Does it show up in the assets that I’d expect it to, given my explanation for why it exists?
- What else could explain it? Have I isolated the effect from things we already know about?
- What other edges can I trade with this one to diversify my risk?

In the world of machine learning and data mining, “reducing uncertainty” involves accounting for data-mining bias (the tendency to eventually find things that look good if you look at enough combinations). There are statistical tests for data-mining bias which, to be generous, offer plausible-sounding tools for validating data-mining efforts. However, I’m not here to be generous to myself, and I can admit that the appeal of such tools, at least for me, lay in the promise of avoiding the really hard work of careful analysis. *I don’t need to do the analysis, because a statistical test can tell me how certain my edge is!*

But what a double-edged sword such avoidance turns out to be.

If you’ve ever tried to trade a data-mined strategy, regardless of what your statistical test for data-mining bias told you, you know that it’s a constant battle with your anxiety and uncertainty. Because you haven’t done the work to understand the edge, it’s impossible to just leave it alone. You’re constantly adjusting, wondering, and looking for answers *after the fact*. It turns into an endless cycle – and I’ve *personally* seen it play out at all levels, from beginner independent traders through to relatively sophisticated and mature professional trading firms.

The real tragedy about being on this endless cycle is that it short-circuits the one thing that is most effective at reducing uncertainty, at least at the level of your overall portfolio – finding new edges to trade.

This reality leads me to an approach for adding a new trade to our portfolio:

- Do the work to reduce the uncertainty to the extent possible. You don’t want to trade just *anything*; you want to trade high-probability edges that you understand deeply.
- Trade it at a size that can’t hurt you at the portfolio level if you’re wrong – and we will all be wrong from time to time.
- Leave it alone and go look for something else to trade.

The third point is infinitely more palatable if you’ve done the work and understand the things you’re already trading.

Having said all that, I’m not about to abandon machine learning and other statistical tools. They absolutely have their place, but it’s worth thinking about the relative importance of what to concentrate on and what we spend our time on.

At one extreme, we might think that market insight and quantitative analysis (what we’d call “feature engineering” in machine learning speak) is the most important thing and that we should spend all our time there.

However, the problem with this approach is that there are effective and well-understood techniques (PCA and lasso regression, for example) that genuinely help with modeling and analysis. Understanding these tools well enough to know what they are and when they might help greatly enhances your effectiveness as a quantitative researcher.

On the other extreme, we might think that spending all our time on machine learning, data mining and statistical tests is appropriate. This is akin to owning a top-notch toolkit for servicing a car, but not knowing anything about cars, and leads to the endless cycle of patching things up mentioned above.


The post Are Cheap Stocks Expensive? A Simple Equity Factor Analysis Walkthrough appeared first on Robot Wealth.

I have been sharing examples of simple real-time trading research on my Twitter account.

I do this kind of thing a lot in the training program of our trading group – and I’m sharing in the hope that it might also help a wider audience.

Here’s a piece of analysis I did recently on a really simple factor that appeared to be predictive of relative equity returns.

Shall we do some analysis on a *really dumb* factor which might predict relative returns in stocks?

“Are cheap stocks expensive?”

A research thread

— Robot James (@therobotjames) November 12, 2020

Options on stocks with a low share price tend to be overpriced.

Equity options (at 100 shares a pop) are quite big for a small retail trader. So we might say there is excess retail demand for options on cheap stocks – which would result in them being overpriced.

The AMZN share price is $3k+. There are Robinhooders who can’t afford a single stock.

Do we see the same effect in stocks as we do in the options?

I’m going to analyze this in R – using datasets from the Robot Wealth research lab.

My raw price data looks like this:

I have daily OHLC for every stock that appeared in the Russell 1000 index over the last 20 years (whether or not it still exists).

- The OHLC prices are adjusted for splits and dividends.
- The `unadjusted_close` price is the price the stock actually closed at on that day.
- If the stock didn’t trade that day, we still have a row with `volume = 0`.
- If the stock was not in the index that day, then `is_universe = 0`.

I don’t need daily data. I’m trying to answer a pretty dumb question here. So let’s keep things simple.

I’m going to just get snapshots of the prices on the last day of each calendar year.

Always keep stuff simple for yourself. At the start of a piece of analysis, you’re just trying to quickly disprove an idea. Most ideas are bad and the market is super-efficient. So make life easy, move fast, and disprove fast.
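The snapshot step amounts to keeping one row per stock per year. Here's a minimal Python sketch of the idea (the original analysis is in R; `rows` and `year_end_snapshots` are made-up names, and the data is toy data):

```python
# Hypothetical daily rows: (date, ticker, close). Take the last observation
# per (year, ticker) as the year-end snapshot.
from datetime import date

rows = [
    (date(2019, 12, 30), "AAA", 10.0),
    (date(2019, 12, 31), "AAA", 10.5),   # year-end snapshot for AAA/2019
    (date(2020, 12, 31), "AAA", 12.0),
    (date(2019, 12, 31), "BBB", 50.0),
]

def year_end_snapshots(rows):
    """Keep only the last traded close per (year, ticker)."""
    snaps = {}
    for d, ticker, close in sorted(rows):
        snaps[(d.year, ticker)] = close  # later dates overwrite earlier ones
    return snaps
```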

Borrowing the language of machine learning for my trivial analysis (because it’s precise) I now need to prepare:

- the target (the thing I am trying to predict)
- the feature (the thing I hope is predictive)

My (raw) target is going to be the log return of the stock over the next year.

My (raw) feature is going to be the unadjusted close price of the stock at the end of the previous year.

Here I calculate the feature and target and align them in a single data set:
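As a self-contained illustration of that alignment (a Python sketch with hypothetical data, not the R code used in the analysis): the feature is the prior year-end unadjusted close, and the target is the log return over the following year.

```python
# Align feature (last year's unadjusted close) with target (next year's
# log return), per ticker. `snapshots` maps (year, ticker) -> year-end close.
import math

snapshots = {
    (2018, "AAA"): 8.0, (2019, "AAA"): 10.0, (2020, "AAA"): 12.0,
}

def feature_target(snapshots):
    out = []
    for (year, ticker), close in snapshots.items():
        nxt = snapshots.get((year + 1, ticker))
        if nxt is not None:
            out.append({
                "ticker": ticker,
                "year": year + 1,
                "feature": close,                 # prior year-end price
                "target": math.log(nxt / close),  # next year's log return
            })
    return out
```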

I’ve calculated those for all the stocks, including on days when those stocks weren’t in the Russell 1000 index.

So now I filter out the days when the stock wasn’t in the index, and the days when a given stock didn’t trade (due to a trading halt or similar).

Always assume you’ve screwed it up. Check. Here I spot check on TSLA.

Looks good.

- The feature is just the unadjusted close at the end of the year.
- The target is the log returns over the next year.

Now we’re ready to do some scaling and sorting.

We don’t want to work with the raw feature. We’re looking to answer a very broad question here, and large numbers are our friends.

So we want to sort and group our data so we can aggregate it effectively.

We’ll scale our feature by sorting each stock into one of 10 buckets each year:

- Bucket 1 will contain the stocks in the index with the lowest (unadjusted) share price that year
- Bucket 10 will contain the stocks in the index with the highest (unadjusted) share price that year

We call these deciles if we are fancy. I usually call them buckets.

To understand what that has done, we plot a histogram of all our feature observations and colour it by the bucket each observation ended up in.

Now we’ve reduced our raw feature to 10 buckets. That will be helpful.

Our next task is to think about scaling the target.

We’re really more interested in *relative* (rather than absolute) returns:

So we “de-mean” the target by subtracting the mean returns of all stocks that year from the yearly returns for each stock in our universe.
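De-meaning is just subtracting the yearly cross-sectional mean. A small Python sketch of the operation (function and field names are my own, not from the original R code):

```python
# De-mean yearly returns cross-sectionally: subtract each year's mean return
# across the universe from every stock's return that year.
from collections import defaultdict

def demean_by_year(obs):
    """obs: list of dicts with 'year' and 'target'. Adds 'demeaned_target'."""
    sums, counts = defaultdict(float), defaultdict(int)
    for o in obs:
        sums[o["year"]] += o["target"]
        counts[o["year"]] += 1
    for o in obs:
        o["demeaned_target"] = o["target"] - sums[o["year"]] / counts[o["year"]]
    return obs
```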

The data now look like this. I’ve highlighted the scaled feature (`bucket`) and target (`demeaned_target`).

Now we want to see if the really low priced stuff that ended up in bucket 1 had lower returns than the really high priced stuff that ended up in bucket 10.

So we take our observations, group by bucket, and plot the mean of next year’s (de-meaned) returns for each bucket.
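The bucket-and-average step can be sketched in a few lines of Python (the original uses R/dplyr; `bucket_means` is a hypothetical helper that sorts by rank into equal-count deciles, with ties broken arbitrarily):

```python
# Sort one year's stocks into n_buckets by the feature, then average the
# de-meaned target per bucket.
import numpy as np

def bucket_means(features, targets, n_buckets=10):
    ranks = np.argsort(np.argsort(features))          # ranks 0..n-1
    buckets = ranks * n_buckets // len(features) + 1  # bucket labels 1..n_buckets
    return {b: float(np.mean(targets[buckets == b])) for b in range(1, n_buckets + 1)}
```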

Interestingly… we do appear to see – at least over the whole sample – that the annual return of the cheap stocks is significantly (6–7%) lower than the return of the more expensive stocks.

*I haven’t lost interest yet…*

So let’s create one of those plots for each of the 22 years in the sample.

We want to get a feel for whether we see this pattern consistently.

Looks quite consistent… The relative shapes of 2000/2001 and 2007/2008 are interesting and point to this effect likely having some explanatory variable we already know about such as a beta / size / reversion effect…

Let’s not worry about that yet… move fast and loop back.

To make this thing more readable, I’m going to group the years into batches of about three and plot them.

Woah there 2017–2019… makes me wonder whether I’ve introduced some bias *(or whether my unadjusted close is incorrect)*. Something to look at.

To complete our superficial analysis, let’s plot a cumulative return time series of a strategy that goes long the 10% of stocks with the highest share price and short the 10% with the lowest share price.

Long bucket 10, short bucket 1

Very interesting… I’m quite surprised at the results of this… I thought I would find nothing. Or, at least, an incredibly noisy effect.

Two main things to do now:

1. Get more acquainted with the unadjusted close data to ensure it’s correct and I’m not introducing helpful bias
2. Try to isolate this from any other causal factors we already know about (size, reversion from big moves, beta effect)

There is a *suggestion* that cheaper stocks are expensive.

You think you’ve identified a new, useful predictive factor for trading…

But is it really new? Or just another way of looking at something you already know about?

How might you tell? Here are some simple ways…

A research thread https://t.co/q9ga3S6zbd

— Robot James (@therobotjames) November 27, 2020

We know that a low stock price doesn’t actually *cause* future returns to be lower. That would be silly.

But we thought that stocks with very low share prices may be attractive to a low-capitalised seeker of stock returns – whose marginal demand may bid up these stocks.

This is *plausible*. And we *would like* it to be true (cos then we’ve found a new effect we might harness). But we can’t always get what we want.

So we must ask: **“Is this something we already know about?”**

Economic intuition comes before statistics.

*What do we already know about that might be causing this?*

Well, it is well known that high-volatility assets tend to have very poor long-run returns. You can read about this in Antti Ilmanen’s *Expected Returns*.

We suspect high vol assets are attractive to those who like lottery-like “YOLO” payoffs, or who dislike leverage or can’t access it easily. This creates excess demand for highly volatile stocks, which makes them more expensive, which makes their future expected returns lower.

So do stocks with lower share prices tend to underperform simply because they tend to be more volatile stocks? Or is there something else going on? Let’s look…

First, we calculate a volatility factor – which will just be the annualised volatility of the stock over the last 252 trading days (1 year).
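That calculation is just the sample standard deviation of daily log returns, scaled by the square root of 252. A Python sketch (my own helper name; the analysis itself is in R):

```python
# Annualised volatility proxy: stdev of daily log returns over the last
# 252 trading days, scaled by sqrt(252).
import numpy as np

def annualised_vol(closes, window=252):
    closes = np.asarray(closes[-(window + 1):], dtype=float)
    rets = np.diff(np.log(closes))
    return float(rets.std(ddof=1) * np.sqrt(252))
```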

I’m using the same dataset as in the linked thread at the top…

Now we want to answer the question:

Do stocks with low share price also tend to be high volatility stocks (and vice versa)?

A scatterplot is a useful tool for this. For each yearly stock observation plot its past volatility on the y-axis and the log share price on the x-axis.

It’s quite clear that stocks with low share prices tend to be higher volatility stocks. This suggests what we are seeing could well be a high-volatility effect.

Now we want to see if our share price effect goes away if we control for the high volatility effect.

First, let’s look at the volatility effect itself. We sort all our annual stock observations into deciles by rising volatility and plot the mean of their log returns the following year. It’s pretty clear the high vol stuff tends to have crappy returns.

What if we filter out the highly volatile stuff from our analysis?

If we only look at the stuff that appears in volatility deciles 1-8, do we still see any “signal” in our share price factor? Do lower volatility stocks with low share prices still have worse returns?

So we:

- filter out all the high vol stocks (keeping only `vol_bucket <= 8`)
- plot the mean log return for each share price bucket for the remaining low and moderate volatility stocks.
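The two steps above can be sketched in Python (hypothetical precomputed arrays `price_bucket`, `vol_bucket`, `target`; the original is R/dplyr):

```python
# Control for volatility by dropping the top volatility deciles, then
# re-check the share-price buckets on what's left.
import numpy as np

def mean_target_by_price_bucket(price_bucket, vol_bucket, target, max_vol_bucket=8):
    keep = vol_bucket <= max_vol_bucket        # discard vol deciles 9 and 10
    pb, t = price_bucket[keep], target[keep]
    return {b: float(t[pb == b].mean()) for b in np.unique(pb)}
```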

**And it no longer looks interesting!**

Once we’ve controlled for the high volatility effect, the share price doesn’t seem to have anything interesting to add.

As often happens in the markets, it’s unlikely we’re going to get what we want here. **It’s likely we just found another proxy for volatility.**

Now, let’s try isolating the high vol stuff. Does the share price allow us to discriminate between high vol stuff with better (less bad?) returns?

Nope. Doesn’t seem to be anything there either!

At this point, I think we can say that it’s very likely the share price effect we saw isn’t that interesting by itself – it’s really just a proxy for the volatility / “betting-against-beta” effect we already knew about.

Such is the way it goes!

The good news is we understand the effect better now. The less good news is it’s likely just another crude way of looking at something we already knew about.

Here’s a simple recipe of sorts for doing this kind of thing:

- Use economic intuition to identify what else might explain the effect
- Proxy that other thing as a factor
- Look at the relationship between the two factors (scatterplot is good)
- Control for one effect (as best you can) and see if the other factor still explains returns

Economic intuition and simple exploratory data analysis should always be your first port of call.

Don’t rush into running regressions or the like without asking some good simple questions of the data first. You’ll get much more insight this way.

Stocks with low share prices tend to underperform those with higher share prices.

Unfortunately, this doesn’t look to be a unique factor. It appears to be almost entirely explained by the fact that stocks with low share prices tend to be higher volatility stocks (a known “betting-against-beta” factor.)


The post Trading FX using Autoregressive Models appeared first on Robot Wealth.

I’m a big fan of Ernie Chan’s quant trading books: *Quantitative Trading*, *Algorithmic Trading*, and *Machine Trading*. There are some great insights in there, but the thing I like most is the simple but thorough treatment of various edges and the quant tools you might use to research and trade them. Ernie explicitly states that the examples in the books won’t be tradable, but they’ve certainly provided fertile ground for ideas.

In *Machine Trading*, there is an FX strategy based on an autoregressive model of intraday price data. It has a remarkably attractive pre-cost equity curve, and since I am attracted to shiny objects, I thought I’d take a closer look.

An autoregressive (AR) model is a time-series multiple regression where:

- the predictors are past values of the time series
- the target is the next realisation of the time series

If we used a single prior value as the only predictor, the AR model would be called an AR(1) and it would look like:

y_t = \beta_0 + \beta_1 y_{t-1} + \epsilon_t (the \beta's are the regression coefficients)

If we used two prior values, it would be called an AR(2) model and would look like:

y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \epsilon_t

You get the picture.
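To make the definition concrete, here's a small Python sketch (my own, not from the book): simulate an AR(2) process with known coefficients, then recover them with ordinary least squares on the two lagged values.

```python
# Simulate an AR(2) and recover beta_1, beta_2 by OLS on the lags.
import numpy as np

rng = np.random.default_rng(0)
b1, b2 = 0.6, 0.3
n = 20000
y = np.zeros(n)
for t in range(2, n):
    y[t] = b1 * y[t - 1] + b2 * y[t - 2] + rng.normal()

# design matrix: intercept, y_{t-1}, y_{t-2}; target: y_t
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
beta = np.linalg.lstsq(X, y[2:], rcond=None)[0]
print(beta)  # approximately [0, 0.6, 0.3]
```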

Ernie says that

“Time-series techniques are most useful in markets where fundamental information and intuition are either lacking or not particularly useful for short-term predictions. Currencies … fit this bill.”

He built an AR(p) model for AUD/USD minutely prices (spot, I assume), using the Bayesian Information Criterion to find the optimal p. He used data from July 24, 2007 to August 12, 2014 to find p and the model coefficients:

He used an out-of-sample data set from August 12 2014 to August 3 2015 to generate the following (pre-cost) backtest:

If we were to use an AR(p) model of prices to predict future prices, what are we assuming?

Essentially, that past prices are correlated with future prices, to the extent that they contain tradable information.

The first part of that seems perfectly reasonable. After all, a very good place to start predicting tomorrow’s price would be today’s price. The latter part, however, is fairly heroic. Such a naive forecast is obviously not going to help much when it comes to making profitable trades. If we could forecast returns, on the other hand… now that would be a useful thing!

Most quants will tell you that price levels can’t contain information that is predictive in a useful way, and that instead, you need to focus on the process that manifests those prices – namely, returns. **Building a time series model on prices feels a bit like doing technical analysis… albeit with a more interesting tool than a trend line.**

Anyway, let’s put that aside for now and dig in.

If we’re going to use an AR model, then a reasonable place to start would be figuring out if we have an AR process. For that, we can use the `acf` function in R:

Our minutely price series is indeed very strongly autocorrelated, as we’d expect. We have a lot of data (I used minutely data from 2009–2020), and of course, the price from one minute ago looks a lot like the price right now.

We can use the partial autocorrelation function (`pacf`) to get a handle on how many lags we might need to build an AR model.

A partial autocorrelation is the correlation of a variable and a lagged version of itself *that isn’t explained by correlations of previous lags.* Essentially, it prevents information explained by prior lags from leaking into subsequent lags.

That makes it a useful way to identify the number of lags to use in your AR model – if there is a significant partial correlation at a lag, then that lag has some explanatory power over your variable and should be included. Indeed, you find that the partial correlation values correspond to the coefficients of the AR model.
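That identity can be checked numerically. Here's a Python sketch (my own code): the partial autocorrelation at lag k is estimated as the last coefficient of an AR(k) model fitted by least squares (R's `pacf` uses a different algorithm, but the estimates agree closely on long series).

```python
# Estimate partial autocorrelations by fitting AR(k) models via OLS.
import numpy as np

def pacf_via_ar(y, max_lag):
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = len(y)
    partials = []
    for k in range(1, max_lag + 1):
        # columns are y_{t-1} ... y_{t-k}; target is y_t
        X = np.column_stack([y[k - 1 - j : n - 1 - j] for j in range(k)])
        beta = np.linalg.lstsq(X, y[k:], rcond=None)[0]
        partials.append(float(beta[-1]))  # coefficient on the deepest lag
    return partials

# sanity check on a simulated AR(1): only the first partial should be large
rng = np.random.default_rng(0)
ar1 = np.zeros(20000)
for t in range(1, len(ar1)):
    ar1[t] = 0.7 * ar1[t - 1] + rng.normal()
p = pacf_via_ar(ar1, 3)
```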

Here are the partial autocorrelations with the lag-0 correlation removed:

This is interesting. If our price data were a random walk, we’d expect no lags of the PACF plot to be significant. Here, we have many lags having significant autocorrelation – is our price series weakly stationary, or does it drift?

Here’s a random walk simulation and the resulting PACF plot for comparison:

```r
# random walk for comparison
n <- 10000
random_walk <- cumsum(c(100, rnorm(n)))

data.frame(x = 1:(n+1), y = random_walk) %>%
  ggplot(aes(x = x, y = y)) +
  geom_line() +
  theme_bw()

p <- pacf(random_walk, plot = FALSE)
plot(p[2:20])
```

Here’s a plot of the random walk:

And the PACF plot (again, with the lag-0 correlation removed):

Returning to the PACF plot of our price data, Ernie’s choice of 10 lags for his AR model looks reasonable (those partial correlation values translate to the coefficients of the AR model), but the data also suggests that going as far back as 15 lags is OK too.

It would be interesting to see how that PACF plot changes through time. Here it is separately for each year in our data set:

```r
# annual pacfs
annual_partials <- audusd %>%
  mutate(year = year(timestamp)) %>%
  group_by(year) %>%
  # create a column of date-close dataframes called data
  nest(-year) %>%
  mutate(
    # calculate pacf for each date-close dataframe in data column
    pacf_res = purrr::map(data, ~ pacf(.x$close, plot = F)),
    # extract pacf values from pacf_res object and drop redundant dimensions of returned lists
    pacf_vals = purrr::map(pacf_res, ~ drop(.x$acf))
  ) %>%
  # promote the column of lists such that we have one row per lag per year
  unnest(pacf_vals) %>%
  group_by(year) %>%
  mutate(lag = seq(0, n() - 1))

signif <- function(x) {
  qnorm((1 + 0.95)/2)/sqrt(sum(!is.na(x)))
}

signif_levels <- audusd %>%
  mutate(year = year(timestamp)) %>%
  group_by(year) %>%
  summarise(significance = signif(close))

annual_partials %>%
  filter(lag > 0, lag <= 20) %>%
  ggplot(aes(x = lag, y = pacf_vals)) +
  geom_segment(aes(xend = lag, yend = 0)) +
  geom_point(size = 1, colour = "steelblue") +
  geom_hline(yintercept = 0) +
  facet_wrap(~year, ncol = 3) +
  geom_hline(data = signif_levels, aes(yintercept = significance), colour = 'red', linetype = 'dashed')
```

Those partial correlations do look quite stable… But remember, we’re not seeing any information about returns here – we’re only seeing that recent prices are correlated with past prices.

My gut feel is that this represents the noisy mean reversion you tend to see in FX at short time scales. Take a look at this ACF plot of minutely *returns* (not prices):

```r
ret_acf <- acf(
  audusd %>%
    mutate(returns = (close - dplyr::lag(close))/dplyr::lag(close)) %>%
    select(returns) %>%
    na.omit() %>%
    pull(),
  lag.max = 20,
  plot = FALSE
)
plot(ret_acf[2:20], main = 'ACF of minutely returns')
```

There are clearly some significant negative autocorrelations when we view things through the lens of returns. Any method of trading that negative autocorrelation would show results like Ernie’s backtest – including AR models and dumber-seeming technical analysis approaches. At least, in a world without transaction costs.

I think we can make the following assumptions:

- There is nothing special about the last ten minutely prices
- This is not going to be something we can trade, particularly under retail spot FX trading conditions.

But let’s not get caught up with inconvenient assumptions and press on with some simulations…

Here’s the game plan:

- Fit an AR(10) model on a big chunk of data
- Simulate a trading strategy that uses that model for its predictions on unseen data

I’ll use R to fit the AR(10) model. I’ll use Zorro to simulate the profit and loss of a strategy that traded on the basis of that model’s predictions. In the simulation, I’ll use Zorro’s R bridge to execute an R function that returns the step-ahead prediction given the last ten prices. Here’s a tutorial for setting up Zorro’s R bridge if you’d like to follow along.

First, here’s how to fit the AR(10) model in R (I have my AUD/USD prices in a dataframe indexed by `timestamp`):

```r
# fit an AR model
ar <- arima(
  audusd %>%
    filter(timestamp < "2014-01-01") %>%
    select(close) %>%
    pull(),
  order = c(10, 0, 0)
)
```

Here we use the `arima` function from the `stats` package and specify an order of `(10, 0, 0)`. Those numbers correspond to the number of autoregressive terms, the degree of differencing, and the number of moving average terms, respectively. Specifying zero for the latter two results in a pure AR model.

Here are the model coefficients:

```r
ar$coef
#          ar1          ar2          ar3          ar4          ar5          ar6          ar7          ar8          ar9         ar10
# 0.9741941564 0.0228922865 0.0019821879 -0.0073977641 0.0045880720 0.0072364966 -0.0047513598 0.0003852733 -0.0048944003 0.0057283039
#    intercept
# 0.6692288336
```

Next, here’s an R function for generating predictions from an AR model:

```r
# fit an AR model and return the step-ahead prediction
# can fit a new model, or return predictions given an existing set of coeffs and new data
# params:
#   series: data to use to fit the model or to predict on
#   order:  number of autoregressive terms
#   fixed:  either NA or a vector of coeffs of the same length as the number of model parameters
# usage:
#   fit a new model and return the next prediction (series is the data to be fitted, fixed is NA):
#     fit_ar_predict(audusd$close[1:100000], order = 10)
#   predict using an existing set of coeffs (series should be the same length as order):
#     fit_ar_predict(audusd$close[100001:100010], order = 10, fixed = ar$coef)
fit_ar_predict <- function(series, order = 10, fixed = NA) {
  if(sum(is.na(fixed)) == 0) {
    # make predictions using static coefficients
    predict(arima(series, order = c(order, 0, 0), fixed = fixed), 1)$pred[1]
  } else {
    # fit a new model
    predict(arima(series, order = c(order, 0, 0)), 1)$pred[1]
  }
}
```

If you supply the `fixed` parameter (corresponding to the model coefficients), the function returns the step-ahead prediction given the values in `series`. The length of `series` needs to be the same as `order`, and the length of `fixed` needs to be `order + 1` to account for the intercept term.

If you don’t supply the `fixed` parameter, the function will fit an AR(`order`) model on the data in `series` and return the step-ahead prediction.

Save this file as `ar.R` in Zorro’s Strategy folder.

Finally, here’s the Zorro code for running the simulation given our model parameters derived above (no transaction costs):

```c
#include <r.h>

function run()
{
	set(PLOTNOW);
	setf(PlotMode, PL_FINE);
	StartDate = 2014;
	EndDate = 2015;
	BarPeriod = 10;
	LookBack = 10;
	MaxLong = MaxShort = 1;
	MonteCarlo = 0;

	if(is(INITRUN)) {
		// start R and source the AR prediction function
		if(!Rstart("ar.R", 2)) {
			print("Error - can't start R session!");
			quit();
		}
	}

	asset("AUD/USD");
	Spread = Commission = RollLong = RollShort = Slippage = 0;

	// generate reverse price series (the order of Zorro series is opposite what you'd expect)
	vars closes = rev(series(priceClose()));

	// model parameters
	int order = 10;
	var coeffs[11] = {0.9793975109, 0.0095665978, 0.0025503174, 0.0013394797, 0.0060263045,
		-0.0023060104, -0.0022220192, 0.0006940781, 0.0011942208, 0.0037558386,
		0.9509437891}; // note 1 extra coeff - the intercept

	if(!is(LOOKBACK)) {
		// send function argument values to R
		Rset("order", order);
		Rset("series", closes, order);
		Rset("fixed", coeffs, order+1);

		// compute AR prediction and trade
		var pred = Rd("fit_ar_predict(series = series, order = order, fixed = fixed)");
		printf("\nCurrent: %.5f\nPrediction: %.5f", priceClose(), pred);
		if(pred > priceClose()) enterLong();
		else if(pred < priceClose()) enterShort();
	}
}
```

Now, since we’re calling out to R once every minute for a new prediction, this simulation is going to take a while. Let’s just run it for a couple of years out of sample while we go make a cocktail…

Here’s the result:

This is quite consistent with Ernie’s pre-cost backtest, allowing for differences due to slightly different historical data periods and modeling software. Note also that Zorro’s percent return calculation assumes you invest the minimum amount needed to avoid a margin call in the backtest – which is why it looks astronomically large in this case.

Unfortunately, this is a massively hyperactive strategy that is going to get killed by costs. You can see in the Zorro backtest that the average pre-cost profit per trade is only 0.1 pip. That’s going to make your FX broker extremely pleased.

Transaction costs are a major problem, so let’s start there.

If we saw evidence of partial autocorrelation at longer time horizons, we could potentially slow the strategy down such that it traded less frequently and held on to positions for longer.

Here are some rough PACF charts of what that might look like. First, using ten-minutely data:

```r
# 10-minute partials
partial <- pacf(
  audusd %>%
    mutate(minute = minute(timestamp)) %>%
    filter(minute %% 10 == 0) %>%
    select(close) %>%
    pull(),
  lag.max = 20,
  plot = FALSE
)
plot(partial[2:20], main = 'Ten minutely PACF')
```

Next, hourly:

```r
# hourly partials
partial <- pacf(
  audusd %>%
    mutate(minute = minute(timestamp)) %>%
    filter(minute == 0) %>%
    select(close) %>%
    pull(),
  lag.max = 20,
  plot = FALSE
)
plot(partial[2:20], main = 'Hourly PACF')
```

Finally, at the daily resolution:

```r
# daily partials
partial <- pacf(
  audusd %>%
    mutate(hour = hour(timestamp), minute = minute(timestamp)) %>%
    filter(hour == 0, minute == 0) %>%  # keep one observation per day (midnight)
    select(close) %>%
    pull(),
  lag.max = 20,
  plot = FALSE
)
plot(partial[2:20], main = 'Daily PACF')
```

There’s a pattern emerging there, with fewer and fewer significant partial correlations as we move to lower and lower sampling frequencies.

We can try an AR(10) model using ten-minutely data. Follow the same procedure above, where we calculate the coefficients in R, then hard-code them in our Zorro script. This gives the following result:

We managed to triple our average profit per trade, but it’s still not going to come close to covering costs.

At this point, it’s becoming quite clear (if it wasn’t already) that this is a very marginal trade, no matter how you cut it. However, some other things that might be worth considering include:

- If we assumed that our predictions were both useful in terms of magnitude as well as direction, we could implement a prediction threshold such that we would trade only when the prediction is a certain distance from the current price.
- It would be reasonable to think that when volatility is higher, price tends to move further. To the extent that an edge exists, it will be larger as a percent of costs the larger the volatility. Since volatility is somewhat predictable (at least in a noisy sense), we might be able to improve the average profit per trade by simply not trading when we think volatility is going to be low.
- Finally, we may want to try re-fitting the model at regular intervals. To check if that’s a useful thing to do, you could look for evidence of persistence of model coefficients. That is, are the model coefficients estimated over one window similar to those fitted over the next window? You can backtest this approach using the Zorro and R scripts above – simply don’t pass the `fixed` parameter from Zorro, and think about how much data you want in your fitting window.
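As a rough illustration of the first idea above, a prediction threshold might look like the following sketch (in Python rather than the post’s lite-C; the function name and threshold value are made up – in the Zorro script it would just be a couple of extra conditions around `enterLong`/`enterShort`):

```python
def threshold_signal(pred, price, threshold):
    # trade only when the prediction is at least `threshold` away from price
    if pred - price > threshold:
        return 1   # long
    elif price - pred > threshold:
        return -1  # short
    return 0       # stand aside

# with a 2-pip threshold, a 1-pip predicted move in AUD/USD is ignored
print(threshold_signal(0.70012, 0.70002, 0.0002))  # 0
```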

Fitting an autoregressive model to historical prices of FX was a fun exercise that yielded a surprising result: namely, the deviation of the price series from the assumption of a random walk. The analysis suggests that short-term mean reversion of FX exists, but is unlikely to be something we can trade effectively.

When we think about what the strategy is really doing, it’s merely trading mean-reversion around a recent price level. It’s unlikely that the ten AR lags are doing anything more nuanced than trading a noisy short-term reversal effect that could probably be harnessed in a simpler and more effective way. For example, we could very likely get a similar result with less fuss by looking to fade extreme moves in the short term.

Maybe there *are* sophisticated and important interdependencies between those ten lags, but I don’t think we should assume that there are. In any event, no matter how we trade this effect, it’s unlikely to yield profit after costs.

The post Trading FX using Autoregressive Models appeared first on Robot Wealth.

The post Tesla’s inclusion in the S&P 500 – Is there a trade? appeared first on Robot Wealth.

The S&P index committee recently announced that Tesla, already one of the biggest stocks listed in the country, would be included in the S&P 500.

Here’s the press release:

Due to TSLA’s size, it was widely expected to have entered the S&P 500 index much earlier – but S&P has some discretionary criteria it applies to ensure that the index is an effective measure of the larger stocks in the market. I suspect they worried that TSLA’s recent parabolic move was unsustainable.

TSLA’s inclusion in the index is going to be a big deal because TSLA is so big. The S&P 500 is a market capitalization-weighted index. If prices stay roughly the same, then TSLA will represent over 1% of the index – putting it near the top 10 on its inclusion.

Now, there’s a lot of money that is tracking that index. On the rebalance date these indexers will need to:

- buy a lot of TSLA shares
- sell a lot of shares of stocks that are not TSLA.

Finger in the air, we’re talking $50–80 billion (ish) worth of rebalancing trading that will need to occur around December 21st.

Due to the sheer size of the rebalance, S&P is seeking feedback on whether the index should be rebalanced in two tranches, or all in one go.

Certainly, we know there’s going to be a large amount of rebalance trading going on. Could that generate dislocations that may be tradeable?

Quite possibly… though maybe not in immediately obvious ways.

Everybody knows that everyone knows there will be significant demand for TSLA stock on the rebalance. So, on the S&P announcement, the stock price of TSLA jumped about 13%. Markets are forward-looking like that, you see. They don’t wait for permission.

Anything obvious gets “priced in” pretty quick. But an understanding of over-reaction/under-reaction dynamics, trader constraints, and some statistical analysis can sometimes uncover noisy inefficiencies around these kinds of events.

Early studies such as The S&P 500 Index Effect in Continuous Time: Evidence from Overnight, Intraday and Tick-By-Tick Stock Price Performance by Brooks and Ward suggested that:

- Any excess returns for the index joiner were mostly incorporated in the price overnight after the announcement date *(which, of course, is untradeable)*
- There were still some excess returns realized in the period from the first day after the announcement up to the event
- There were consistent intraday trading patterns around the announcement event, suggesting inefficient front-running flows which may be exploited *(for example, see the image below, which shows the cumulative intraday abnormal performance on the day after the stock has been added to the index)*

Of course, time passes and the market continues to become more efficient. Banks and end-users develop more sophisticated rebalancing algorithms. The world moves onwards and upwards.

In a more recent paper, The Diminished Effect of Index Rebalances, Konstantina Kappou finds no tradeable abnormal returns between the announcement and the index rebalance dates. This suggests that market participants have become more effective in pre-positioning themselves for such an event, and indexers have become more sophisticated in avoiding market impact on rebalancing.

However – they do find what appear to be tradeable patterns on and after the index inclusion date.

In particular, using data between 2002 and 2013 (and a total of 276 index inclusions) the author finds highly significant excess returns for the stock on the first day it is included in the index (see green box) – which reverse thereafter.

Now – I’ve done no work to validate this. And I don’t intend to. (One has to prioritize one’s efforts.)

But this research suggests:

- If you’re looking to buy TSLA to harvest abnormal returns prior to the inclusion date, then you’re probably too late.
- But you may look to add TSLA exposure ahead of its first day of trading in the index and reduce it at the end of the day.
- Or, if you’re feeling fruity, you may look to short TSLA against the index at the end of the day and hold for a month or so.

At best these kinds of trades are marginal. You’re never going to get rich trading stuff like this.

Do your own analysis and trade small. S&P rebalancing in two tranches may change the dynamics here. If that happens then look into it and see if you can figure out a way you might be able to exploit the dynamics.


The post How to Connect Google Colab to a Local Jupyter Runtime appeared first on Robot Wealth.

Colaboratory, or Colab, is a hosted Jupyter notebook service requiring zero setup and providing free access to compute resources. It is a convenient and powerful way to share research, and we use it extensively in The Lab.

What’s The Lab?

The Lab is the RW Pro group’s portal for doing collaborative research together as a community. The key goals are:

- To develop members’ quant research skills by getting them hands-on with the research process.
- To scale the research effort through community collaboration.
- To make the fruits of that research effort available to the entire community.

**In short, we grow people’s research skills and the number of edges available to trade faster than an individual could grow them alone.**

Colab allows you to create, run, and share Jupyter notebooks without having to download or install anything. Integration with GitHub means that you can work entirely in the cloud:

While working in the cloud has benefits – such as no local setup – there are also limitations. For example:

- Repeatedly setting up the research environment from scratch
- Session disconnection if idle for too long
- Memory limitations

To get around such constraints, you might consider connecting Colab to a Jupyter server running locally.

But why run Colab locally, rather than working in a vanilla Jupyter notebook?

Mainly for the sharing and collaboration tools: one-click loading and pushing to GitHub, easy access to cloud storage buckets, and “snippet notebooks.”

Snippet notebooks contain chunks of self-contained code that can be injected into a working notebook. This is a great way to collaborate, ensure consistency and reproducibility, increase productivity, and share tools such that the code is visible in the notebook. We find this suits collaborative quant research better than, say, abstracting such functionality away in a package:

First, install JupyterLab locally. The easiest way is via Conda:

`conda install -c conda-forge jupyterlab`

or pip:

`pip install jupyterlab`

More detailed instructions can be found here.

The Colab team authored the `jupyter_http_over_ws` extension. Install it by doing:

`pip install jupyter_http_over_ws`

Then enable it by doing:

`jupyter serverextension enable --py jupyter_http_over_ws`

You can read more detailed information about this extension on its GitHub repository.

We need a local Jupyter server that trusts WebSocket connections from the Colab frontend. The following command and flags accomplish this:

```shell
jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0
```

Once the server has started, it will print a message with the initial backend URL used for authentication. You’ll need a copy of this in the next step:

In Colab, click the “Connect” button and select “Connect to local runtime”. Enter the URL you just copied and click “Connect”:

That’s it! You now have the Colab research environment running on your local Jupyter server.

Next time you want to connect to a local runtime, you only need to run steps 3 and 4 above.

When you connect to a local Jupyter server, you allow the Colab frontend to execute code in the notebook using local resources, accessing the local file system.

Before attempting to connect to a local runtime, make sure you trust the author of the notebook and ensure you understand the code that is being executed. Don’t run a notebook locally unless you understand the code!

The Lab is available to RW Pro members. Want in? Access to RW Pro is via our Bootcamp program. Join the waitlist here.


The post Where does FX sit in a Systematic Trading Portfolio? appeared first on Robot Wealth.

You rarely meet a rich forex trader. I’ve met plenty of rich traders who trade quant factors or stat arb. Plenty of market makers, futures spreaders and volatility traders that do nicely. But I don’t think I’ve ever met a rich forex trader.

Jeez man – what a downer!

Don’t run away, we’re gonna turn this around into something positive… bear with us!

This post is a BONUS LESSON taken directly from Zero to Robot Master Bootcamp. In this Bootcamp, we teach traders how to research, build and trade a portfolio of 3 strategies including an Intraday FX Strategy, a Risk Premia Strategy and a Volatility Basis Strategy. If you’re interested in adding strategies to your portfolio or are just keen to start on the path to becoming a successful and sustainable systematic trader, you can check out full details of the Bootcamp here.

Let’s look at our map of the trading landscape and briefly discuss why that is.

This map shows the effects we can take advantage of in the financial markets to make money, and the strategies we can use to exploit those effects.

On the left, we have risk premia (which we explore in Embrace the Mayhem Initiation, Week 0 of Bootcamp).

By taking on certain short term risks, such as the risk of sudden large losses in the equity markets, we tend to get rewarded over time. In other words, you tend to make money buying and holding stocks and bonds. You also tend to make money taking on certain illiquid investments and selling insurance.

Risk premia is the easiest game in town – if you can calibrate your risk/reward expectations sensibly.

Unfortunately, there is no clear risk premium associated with the FX market as a whole. The expected value of a trade in an FX pair is essentially zero. And that’s certainly true of the market as a whole.

There **is no broad FX risk premium we can collect**. No easy tailwind.

So we need to make all our returns trading actively. This makes things a fair bit harder – *but not impossible.*

In our map, we divide exploitable market effects into *Factor styles*, in light green at the bottom left, and *Other inefficiencies*.

Factor styles are pervasive broad cross-sectional effects which tend to be seen across and within most asset classes, including FX. They include short term mean reversion, momentum, carry, value and low volatility. If there are any “easy wins” in FX it is smart for us to start looking for them here.

The things we’re calling “Other inefficiencies” include your more unique alpha effects, including seasonality, cointegration effects and conditional over and under-reaction by traders to events. This can be a fertile place to look in less efficient markets like the equity and commodity futures markets. It’s harder (but not impossible) in the FX markets, which are ruthlessly efficient. But there are some edges we’ll look to exploit there.

Throughout Bootcamp, we’ll be sharing a number of these alphas that we’ve found and are currently actively trading.

The intent is that you will use these to make money, but even more importantly we want to show you how we discovered them, how we tested them, and how we implemented them. No doubt these alphas will disappear at some point, but if you know how to discover and exploit others, you’ll never be short of replacements. And there are literally dozens of ideas out there that you can test if you have the tools and know-how to do so.

- There is no risk premium inherent in the FX markets – which means we are missing the tailwinds that we like to trade with.
- We are therefore forced to generate returns through active trading in an efficient market.
- This is difficult, but not impossible. And there is no shortage of good ideas to pursue.
- We’ll show you a number of the alphas we trade.
- More importantly, we’ll show you how we discovered, tested and implemented these alphas.
- It’s hard to get rich just trading FX.
- Therefore, it’s very smart to view FX as one part of your trading operation, not the whole thing…

*In the next lessons, we’ll run through a quick history of the currency markets and currency trading. And we’ll talk about some other characteristics of the FX market that you simply must know before you wade into these waters.*


The post Some Things Just Go Up (If You Wait Long Enough) appeared first on Robot Wealth.

Here’s a chart of long-term asset performance….

- The blue line shows returns from US stocks from 1900 to today. That’s a 48,000x increase in nominal value.
- The yellow line shows the returns from US bonds from 1900 to today. That’s a 300x increase in nominal value.
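Those multiples translate into unremarkable-sounding annual rates once compounded over 120-odd years (the exact horizon is an assumption here; this is back-of-the-envelope only):

```python
def cagr(multiple, years):
    # compound annual growth rate implied by a total return multiple
    return multiple ** (1 / years) - 1

print(f"stocks: {cagr(48_000, 120):.1%}")  # roughly 9-10% per year, nominal
print(f"bonds:  {cagr(300, 120):.1%}")     # roughly 5% per year, nominal
```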

If you look at this in isolation things look easy. You just buy all this stuff.

And it is both that easy and not quite that easy…

We need to ask: *Why does this stuff go up? Can we be confident it’s going to go up in the future?*

This post is a lesson taken directly from Zero to Robot Master Bootcamp. In this Bootcamp, we teach traders how to research, build and trade a portfolio of 3 strategies including a Risk Premia Strategy, an Intraday FX Strategy and a Volatility Basis Strategy. If you’re interested in adding strategies to your portfolio or are just keen to start on the path to becoming a successful and sustainable systematic trader, you can check out full details of the Bootcamp here.

For more on Risk Premia Harvesting, including an examination of the risk factors behind risk premia, check out this post.

Well, take another peek at the chart above. Notice the *logarithmic y-axis*. That’s the best way to look at long term asset prices, but it doesn’t give a very good idea of what it would actually have felt like to be long that stuff.

Now, look at this, which takes the blip in the red square, corresponding to the Global Financial Crisis in 2008/9 and plots the S&P500 stock index in dollar terms.

Ouch!

That 50% decline looks benign in the long-term chart – but how would you feel if your million-dollar stock portfolio was suddenly worth $500k?

*Doesn’t sound fun, right?*

The reason that stocks tend to go up in the long term is that they tend to go down (sometimes violently) in the short and medium term.

This doesn’t just apply to stocks. Any asset whose fundamental value is dependent on particular uncertain factors (or risks) tends to increase more over the long term than the interest you would receive on the same money.

So we don’t say investors are compensated for investing in assets, we say that investors are compensated for *taking on risk*. Hence the concept of a “risk premium”.

We *only* get paid over the long run for taking on risks that others find distasteful.

The reward is the premium for taking the risk and we can’t divorce those effects. If we make the risk go away, the premium goes away too.

Our main lesson here is very simple.

- As investors, we are rewarded for taking on certain long term risks, which include buying assets that are sensitive to disappointment in economic growth, inflation and interest rates.
- So we want to get long exposure to these assets. And getting long lots of them dramatically reduces portfolio volatility through diversification.
- So a strategy that buys stocks, bonds and other risk assets is smart if your return horizon is long.

As always 80% of success is showing up. So the precise way you buy and manage risk assets matters a lot less than the fact that you do it at all.

But we’re going to put our best foot forward and build a simple but effective Risk Premia Harvesting strategy that manages risk in an active way.

So let’s talk simply and systematically about what we’re trying to achieve….

We’re going to ask and answer the following 4 questions…

- What effect(s) are we harnessing?
- Do we have clear evidence of these effects?
- Do we have a strong reason to think these effects will persist?
- Can we robustly harness these effects in a trading strategy?

Well, there are two…

By far the most important is to **harness risk premia effects** (or positive drift) in risk assets. We’re looking to get long and stay long these risk assets, and collect the long term risk premium. Which is a fancy way of saying we want to be long these assets because they go up.

Second, and less important, we noticed that **volatility trends** and that we can use that to smooth out the volatility of risk assets. This also appears to increase risk-adjusted performance as a side effect – but we’d want to do it even if it didn’t because it makes our exposures easier to manage and reason about.

Yes – an absolute ton. These are the two effects we can feel most certain about in the whole of trading… which is why we are looking at them first.

If we can’t feel confident about these two effects, then we’re not going to feel confident about anything in trading.

Yes – I think so.

- There’s a very good economic and behavioural rationale for the continued existence of risk premia effects.
- Risk premia harvesting is win-win. I’m happy to take on risk others don’t want in exchange for the rewards, and others are happy not to take it on. It’s sustainable in that way, and unlikely to be competed away.
- Finally, there’s a ton of empirical evidence of both effects being persistent across time and all kinds of financial assets.

Well, I think so, obviously. But we have some questions to answer, some self-imposed constraints we’re under, and some trade-offs to weigh up.

We’re going to try to put together a strategy which is tradeable in a small, non-margin account. So it’s going to trade a small number of ETF assets from the long side only, and we’ll need to be mindful that frequent small rebalance trades are not always cost-effective on a small account.

We also want something that trades infrequently enough that it can be traded by hand. At least to start with…

*Let’s get on with it… What assets are we going to trade…?*


The post What Assumptions Are You Making About “Time” In Your Trading? appeared first on Robot Wealth.

I recently listened to a podcast about one of the earliest human civilizations – the ancient Sumerians. Apparently, our system of minutes, hours, and days has been with us since the time of these ancient people, who developed it based on a simple base-12 counting system:

- There are three joints in each of the four fingers
- You can count twelve by tapping each joint in turn with the thumb of the same hand
- When you reach twelve, you hold up a thumb or finger on the opposite hand and start again
- When all five digits are outstretched, you’ve counted sixty – the number of seconds in a minute, and the number of minutes in an hour

The measurement of time is obviously a human construct. And our system for doing so is apparently based on human anatomy and then imperfectly aligned with our planet’s journey around its sun and rotation about its axis. Which all seems rather arbitrary.

This got me thinking.

*Do arbitrary decisions about our frame of reference have implications for how we interpret the world?*

Consider that a financial market consists of a series of events – for example, transactions where an asset changes hands *(but also submitted and amended orders)*. These events are sequential in nature; that is, they happen one after another. But there are loads of them. An active market might see *millions* of transactions in a single day.

That poses a significant problem for an analyst. How can we possibly make sense of such an enormous amount of data in an efficient and meaningful way? *Of course*, it is possible to analyze a market on an event basis (that is, tick-by-tick, or order-by-order), but such analysis requires significant computing power and is largely impossible to inspect visually.

The familiar answer is that we typically group events into categories, summarize or report their most interesting characteristics, and then analyze these summaries rather than the events themselves.

Typically, we do this by grouping events by time, and then reporting some summary data that describes the group as a whole.

That, of course, is the familiar **Open-High-Low-Close (OHLC)** bar or candle that we see in the typical price chart.

We could also report the mean price of all the transactions if we wished, as well as any other statistical properties of interest.
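With pandas and some made-up tick data (all names here are illustrative), this grouping-and-summarizing step is a one-liner per statistic:

```python
import pandas as pd

# hypothetical tick data: 120 trades, one every 30 seconds
idx = pd.date_range("2021-01-04 09:30", periods=120, freq="30s")
ticks = pd.Series(range(120), index=idx, dtype=float)

# group events by clock time, then summarize each 5-minute bucket
bars = ticks.resample("5min").ohlc()   # the familiar OHLC summary
means = ticks.resample("5min").mean()  # or any other statistic of interest
```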

Consider what happens when we use summary data in our analysis. Let’s start with an OHLC bar from a daily chart. From potentially millions of events that occurred during the day, we derive four values: the price of the first and last transactions of the day, as well as the highest and lowest prices of the day. That’s useful, but it also results in an information loss.

*For example*, we might be able to infer possible evolutions of price during the day from the shape of the bar, but we can’t be sure exactly how events unfolded. OHLC bars also summarise the overnight session into two data points – the close of one bar and the open of the next. But again, we don’t get any detail.

One consequence of this is that simulations that rely on summary data have imperfect information. They may need to rely on *assumptions*.

*For example*, say we had a trade in the market, and both the stop loss and take profit levels of the trade were within the high-low range of a particular bar.

Thanks to the information loss associated with summarizing our data, we must make an assumption, which has a (potentially huge) impact on the simulation results and their accuracy.
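A tiny example makes the ambiguity concrete. The two (made-up) tick paths below produce identical OHLC bars, yet one touches a take profit at 101 first while the other touches a stop at 99 first – a bar-based simulation cannot tell them apart:

```python
# two different intra-bar paths with the same open, high, low and close
path_a = [100, 101, 99, 100]  # touches 101 before 99 -> winner
path_b = [100, 99, 101, 100]  # touches 99 before 101 -> loser

def ohlc(ticks):
    return (ticks[0], max(ticks), min(ticks), ticks[-1])

print(ohlc(path_a) == ohlc(path_b))  # True: indistinguishable from the summary alone
```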

Also, consider the open-close boundaries of our OHLC bar. In the *24-hour currency and cryptocurrency markets*, *when does a daily bar begin and end?*

That decision directly impacts the OHLC data that we use in our analysis, and by extension, the results of that analysis.

This concept also extends to intra-day time periods.

*For example*, why do we typically summarize hourly data into hours that start and end neatly on the hour? Is there some principle related to the underlying market phenomena guiding this decision, or is it something that we have taken for granted without a lot of thought? What impact might this have – if any?

One consequence of the arbitrary nature of our frame of reference is that we can potentially pick a *different* arbitrary frame of reference to test various hypotheses or even to generate more data.

For example, say we use hourly price data in order to research a trading idea. We think we’ve found evidence of an edge. We can test the robustness of that edge by creating hourly bars that are offset by some number of minutes. If the edge is real, it should still show up on the offset data.
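In pandas, generating such offset bars is straightforward via the `offset` argument to `resample`, which shifts the bin edges (the data below is made up for illustration):

```python
import pandas as pd

# hypothetical minutely closes
idx = pd.date_range("2021-01-04 09:00", periods=180, freq="1min")
closes = pd.Series(range(180), index=idx, dtype=float)

on_the_hour = closes.resample("1h").ohlc()                  # bars starting at :00
offset_bars = closes.resample("1h", offset="17min").ohlc()  # bin edges shifted to :17
```

An edge that survives across several such offsets is less likely to be an artifact of an arbitrary bar boundary.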

If there’s one thing I’ve learned about researching the financial markets, it’s that **assumptions should always be tested**.

Some assumptions are obvious – if I use closing prices in a backtest, then I’m assuming that I got filled at the closing price. If I use a fixed spread in calculating trading costs, then I’m assuming that it’s a reasonable estimate of the spread at the time I trade. Whether these assumptions actually matter is context-dependent.

Some assumptions are less obvious, but still very real. For example, if I use walk-forward optimisation, then I’m assuming that there’s some level of autocorrelation in the optimal parameter set.

There are other assumptions – such as our use of hours, minutes, and seconds to summarise price data – that are so fundamental to our view of the world that we don’t even realise we’re making them. Thinking about these assumptions can not only lead to deeper insights into the nature of the markets but also reveal creative research techniques.


The post My Thoughts on Quantopian’s Closing appeared first on Robot Wealth.

I was very sad to learn that Quantopian is shutting down its community services.

Quantopian’s efforts to bring quant finance outside of institutions were a genuine game-changer. The educational content was solid, the tech was excellent, and the QuantCon conferences were professional, well-run, and inclusive in a way that you never see at the “finance insider” equivalents. Any of us who are passionate about markets must feel a sense of gratitude.

Quantopian’s closure speaks to a fundamental reality: **unique alpha is hard, particularly for the individual. **

Professional teams have major advantages over independent traders:

- A team to collaborate and share ideas with. No reinventing the wheel.
- Good training
- Technology that facilitates research and development

Quantopian’s vision was to eliminate these advantages. It faced significant challenges:

- The constraints on the alphas that it could allocate to. Alphas needed to be unique, relatively slow-moving, and trade liquid instruments. That’s a difficult game, played by some of the biggest and most well-resourced players in town.
- The inherent competitiveness among users in gaining an allocation meant there was always a good reason not to share your best work. That makes it hard to realise a community that is focused on collaboration for common benefit.

Quantopian is pivoting from crowd-sourced alpha to selling an enterprise-scale quantitative research platform. Which makes all the sense in the world when you’ve developed such brilliant technology.

**But what about the Quantopian community? And individuals dedicated to making a fist of quant trading?**

If you’re an independent trader, then the first challenge to Quantopian’s original vision need not apply to you. When you are your own risk manager, you can go hunting for alpha wherever you like. I suspect that many Quantopian users will use the skills they learned on the platform to pivot accordingly.

The QuantConnect platform would enable them to get up and running quickly – QuantConnect is developing a migration tool for Zipline algorithms, as well as referencing Quantopian’s open-source projects such as `alphalens` and `pyfolio`.

The second challenge is much more difficult to resolve. *How do you align the interests of a community of traders such that individuals are motivated to share and contribute?*

You need a vibrant and close-knit community, but you also need to make it worth people’s while. **They need to get out more than they put in.**

Our Robot Wealth Pro community has a collaboration portal that we call “The Lab.” The Lab is organised around Research Pods, which contain data, ideas, research, and peer-reviewed alphas. We provide a database of market edges, a knowledge base of quality training material, and guidance, feedback, and clear direction on research efforts.

The Lab serves three purposes for our community:

- It gets people hands-on with research – as well as contributing, you’ll learn a ton from the feedback, guidance, and collaboration of your peers
- It scales the research effort by enabling community contribution
- It makes the fruits of that research effort available to the entire community

Which means that over time, two things grow faster than they otherwise would:

- Your research and trading skills
- The number of alphas available to you to trade

And because we’re all independent traders rather than institutions, there’s almost no risk of us competing away any alphas unearthed by the community.

We invite all fellow market-obsessed alpha-hunters to join us in the RW Pro community after completing one of our Bootcamps where you’ll:

- Be initiated into the RW trading approach
- Learn quant research skills
- Add three systematic strategies to your portfolio

Join the waitlist for our next Bootcamp below, or click here to find out more.

