data

Posted on Jun 03, 2020 by Ajet Luka
2 comments.

In the world of Big Data, there are lots of tools and technologies to choose from, and choosing the "right" one depends on what you are building and the problems you are trying to solve. Trading firms have skilled teams that deploy and monitor data pipelines for their organisation, and that absorb the technical overhead that comes with them. Firms invest in data infrastructure and research because data is at the centre of what they do. Data pipelines need to be robust, meet the technical requirements set by the organisation, and also be cost-efficient. These are challenges that a solo systematic trader can have a hard time tackling, especially when you consider that solo traders also need to allocate their time to other parts of their trading business. So choosing a technology that makes pipelines easy to manage and deploy, and that also offers good cost efficiency, is very important for a systematic trader. Enter Apache Beam... Apache Beam is a unified programming model for batch and streaming data processing jobs. It comes...
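The excerpt cuts off before any code, but to make the "unified model" idea concrete, here is a minimal sketch of a batch pipeline in Beam's Python SDK. The file names and the word-count logic are illustrative assumptions, not from the original post:

```python
import apache_beam as beam

# A minimal batch pipeline: read lines, count tokens, write results.
# Swapping the runner (DirectRunner, DataflowRunner, ...) is a pipeline
# option, not a code change - that is the "unified model" in practice.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("trades.csv")    # hypothetical input file
        | "Split" >> beam.FlatMap(lambda line: line.split(","))
        | "PairWithOne" >> beam.Map(lambda token: (token, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Write" >> beam.io.WriteToText("counts")        # hypothetical output prefix
    )
```

The same pipeline code can run locally for testing and then on a managed runner for scale, which is exactly the low-overhead property a solo trader is after.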

Posted on Jun 02, 2020 by Kris Longmore
No comments.

Holding data in a tidy format works wonders for one's productivity. Here we will explore the tidyr package, which is all about creating tidy data. In particular, let's develop an understanding of the tidyr::pivot_longer and tidyr::pivot_wider functions for switching between different formats of tidy data. In this video, you'll learn:

- What tidy data looks like
- Why it's a sensible approach
- The difference between long and wide tidy data
- How to efficiently switch between the two formats
- When and why you'd use each of the two formats

What's tidy data? Tidy data is data where:

- Every column is a variable.
- Every row is an observation.
- Every cell is a single value.

Why do we care? It turns out there are huge benefits to thinking about the “shape” of your data and the best way to structure and manipulate it for your problem. Tidy data is a standard way of shaping data that facilitates analysis. In particular, tidy data works very well with the tidyverse tools, which means less time spent transforming and cleaning data and more time spent solving problems. In...
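The post works in R's tidyr, but the long/wide round trip is easy to sketch in pandas too. This is a pandas analogue, not the post's code: the toy prices table is an illustrative assumption, and pandas' melt and pivot play roughly the roles of pivot_longer and pivot_wider:

```python
import pandas as pd

# Wide tidy data: one row per date, one column per ticker.
wide = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "AAPL": [300.35, 297.43],
    "MSFT": [160.62, 158.62],
})

# Wide -> long (analogue of tidyr::pivot_longer):
# one row per (date, ticker) observation.
long = wide.melt(id_vars="date", var_name="ticker", value_name="close")

# Long -> wide (analogue of tidyr::pivot_wider):
# back to one column per ticker.
back_to_wide = (
    long.pivot(index="date", columns="ticker", values="close")
        .reset_index()
)
```

Long format tends to suit grouped calculations and plotting; wide format suits operations across series, such as correlations between tickers.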

Posted on May 12, 2020 by Robot James
3 comments.

In this post, we are going to construct snapshots of historic S&P 500 index constituents from freely available data on the internet. Why? Well, one of the biggest challenges in looking for opportunities amongst a broad universe of stocks is choosing which stock "universe" to look at. One approach is to pick the stocks that are currently in the S&P 500 index. Unfortunately, the stocks that are currently in the S&P 500 index weren't all there last year. A third of them weren't there ten years ago... If we create a historical data set by picking current S&P 500 index constituents, then we will be including historical data for smaller stocks that weren't in the index at that time. These are all going to be stocks that did very well historically, or else they wouldn't have made it into the index! So this universe selection technique biases our stock returns higher: the average past returns of current SPX constituents are higher than the average past returns of historic SPX constituents, due to this upward bias. It's easy...
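The excerpt stops before the how-to, but the core idea - replaying index additions and removals to get point-in-time membership - fits in a few lines of pandas. Here is a sketch under assumptions: the toy changes table is illustrative, not the post's actual data source:

```python
import pandas as pd

# Toy log of index changes: each row is an add or a remove on a date.
changes = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-02", "2019-06-01", "2019-06-01"]),
    "ticker": ["XYZ", "XYZ", "ABC"],
    "action": ["add", "remove", "add"],
})

def constituents_asof(changes: pd.DataFrame, asof: str) -> set:
    """Replay adds/removes up to `asof` to get point-in-time membership."""
    members = set()
    history = changes[changes["date"] <= pd.Timestamp(asof)].sort_values("date")
    for row in history.itertuples():
        if row.action == "add":
            members.add(row.ticker)
        else:
            members.discard(row.ticker)
    return members

print(constituents_asof(changes, "2019-03-01"))  # {'XYZ'}
print(constituents_asof(changes, "2019-12-31"))  # {'ABC'}
```

Backtesting against the membership set as it stood on each historical date, rather than today's list, is what removes the upward bias described above.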

Posted on Jun 09, 2019 by Kris Longmore
No comments.

I'm a bit late to the party with this one, but I was recently introduced to the feather format for working with tabular data. And let me tell you, as far as reading and writing data goes, it's fast. Really fast. Not only has it provided a decent productivity boost, but the motivation for its development really resonates with me, so I figured I'd briefly share my experiences for any other latecomers to the feather party. What is feather? It's a binary file format for storing data frames - the near-universal data container of choice for data science. Why should you care? Have I already mentioned that reading and writing feather files is fast? Check this out. Here I've created a pandas data frame with one million rows and ten columns. Here's how long it took to write that data frame to disk using both feather and gzip: Yes, you read that correctly: 94 milliseconds for feather versus 33 seconds for gzip! Here's the read time for each format:

Platform agnostic

The other thing I like about...
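The timing screenshots didn't survive the excerpt, but the comparison is easy to reproduce. Here's a sketch of the kind of benchmark described, where the file names and the gzip-pickle baseline are my own assumptions rather than the post's exact setup:

```python
import time

import numpy as np
import pandas as pd

# One million rows, ten columns, as in the post.
df = pd.DataFrame(
    np.random.randn(1_000_000, 10),
    columns=[f"col_{i}" for i in range(10)],
)

def timed(label, fn):
    """Run fn once and print the elapsed wall-clock time."""
    start = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

# Feather I/O (requires the pyarrow package).
timed("feather write", lambda: df.to_feather("df.feather"))
timed("feather read", lambda: pd.read_feather("df.feather"))

# A gzip-compressed baseline for comparison.
timed("gzip write", lambda: df.to_pickle("df.pkl.gz", compression="gzip"))
timed("gzip read", lambda: pd.read_pickle("df.pkl.gz", compression="gzip"))
```

Exact numbers will vary by machine and library version, but feather's column-oriented binary layout avoids both text serialisation and compression, which is where the order-of-magnitude gap comes from.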

Posted on Jun 11, 2018 by Kris Longmore
1 Comment.

Cryptocompare is a platform providing data and insights on pretty much everything in the crypto-sphere, from market data for cryptocurrencies to comparisons of the various crypto-exchanges, to recommendations for where to spend your crypto assets. The user experience is quite pleasant, as you can see from the screenshot of their real-time coin comparison table: As nice as the user interface is, what I really like about Cryptocompare is its API, which provides programmatic access to a wealth of crypto-related data. It is possible to drill down and extract information from individual exchanges, and even to take aggregated price feeds from all the exchanges that Cryptocompare is plugged into - and there are quite a few!

Interacting with the Cryptocompare API

When it comes to interacting with Cryptocompare's API, there are already some nice Python libraries that take care of most of the heavy lifting for us. For this post, I decided to use a library called cryptocompare. Check it out on GitHub here. You can install the current stable release by doing pip install cryptocompare, but I installed the latest development...
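The excerpt ends before any usage, but for a flavour of the library, here is a minimal sketch. I'm assuming the get_price and get_historical_price_day helpers; exact names, signatures, and return shapes may differ between versions of the cryptocompare package, so treat this as illustrative:

```python
import cryptocompare

# Spot price for BTC quoted in USD (assumed helper; check your version's API).
price = cryptocompare.get_price("BTC", currency="USD")
print(price)  # expected shape: {'BTC': {'USD': <price>}}

# Daily OHLCV history, again via an assumed helper.
history = cryptocompare.get_historical_price_day("BTC", currency="USD", limit=30)
for bar in history[:3]:
    # each bar is a dict with keys like time, open, high, low, close
    print(bar["time"], bar["open"], bar["close"])
```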

Posted on Jul 29, 2017 by Kris Longmore
7 comments.

In keeping with our recent theme of providing useful tidbits of algo trading practicalities, here's an elegant solution that resolves Yahoo's unceremonious exit from the free financial data space. Regular readers would know that I use various tools in my algo trading stack, but the one I keep coming back to, particularly when I'm ready to start running serious simulations, is Zorro. Not only is it a fast, accurate, and powerful backtester and execution engine, but the development team is also clearly committed to solving issues and adding features that really matter from a practical perspective. This is another example of the speedy and elegant resolution of a serious issue - namely, the loss of free access to good-quality, properly adjusted equities data, thanks to Yahoo's exit. Zorro version 1.60 is currently undergoing its final stages of beta testing and will likely be released publicly in the coming days. The latest version includes integration with Alpha Vantage's API, providing access to free, high-quality, properly adjusted stock and ETF price data. All you need to do to use it is sign...
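Zorro wires the Alpha Vantage integration in for you, but if you're curious what sits underneath, the REST call is simple. Here's a sketch using Python's requests, where the symbol and the placeholder API key are my own illustrative choices and the response keys are as I recall them from the Alpha Vantage docs:

```python
import requests

# Alpha Vantage's daily adjusted time series endpoint.
resp = requests.get(
    "https://www.alphavantage.co/query",
    params={
        "function": "TIME_SERIES_DAILY_ADJUSTED",
        "symbol": "SPY",             # illustrative symbol
        "outputsize": "compact",
        "apikey": "YOUR_API_KEY",    # placeholder - sign up for a free key
    },
)
data = resp.json()["Time Series (Daily)"]
for date, bar in list(data.items())[:3]:
    print(date, bar["5. adjusted close"])
```

The "properly adjusted" part matters: the adjusted close already accounts for splits and dividends, which is what you want feeding a backtest.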

Posted on May 21, 2017 by Kris Longmore
16 comments.

Recently, Yahoo Finance - a popular source of free end-of-day price data - made some changes to their server which wreaked a little havoc on anyone relying on it for their algos or simulations. Specifically, Yahoo Finance switched from HTTP to HTTPS and changed the data download URLs. No doubt this is a huge source of frustration, as many backtesting and trading scripts that relied on such data will no longer work. Users of the excellent R package quantmod, however, are in luck! The package's author, Joshua Ulrich, has already addressed the change in a development version of quantmod. You can update your quantmod package to the development version that addresses this issue using this command in R: devtools::install_github("joshuaulrich/quantmod", ref="157_yahoo_502") Of course, you need the devtools package installed, so do install.packages("devtools") first if you don't already have it. Once the package updates, quantmod::getSymbols(src = "yahoo") should work just as it did prior to the updates on the Yahoo Finance server. I can verify that this worked for me. Of course, if you don't want to update quantmod to a version that lives on...