Exploring the finnhub.io API

The changing face of market data providers

Over the last few years, a number of new market data providers have come online. They tend to have modern websites, broad coverage, and well-documented RESTful APIs. Their services are often priced very competitively – especially for personal use – and usually have generous free tiers.

One such newcomer is finnhub.io. Its offering includes historical stock, bond, crypto, and FX price data, plus real-time trades and quotes. Its fundamental data offering is noticeably broad and includes current values and point-in-time snapshots for numerous metrics. There are also some interesting alternative data sets, including measures of social media sentiment, insider transactions and insider sentiment, Senate lobbying, government spending, and others. In addition, there’s a real-time newsfeed delivered over websockets.

The free tier is quite generous and offers more than enough for proof-of-concept work and testing ideas. While you only get a year’s worth of historical data per API call on the free tier (more if you specify a lower resolution, like monthly), you can make up to 50 calls per minute.

In this post, we’ll explore the finnhub.io free tier via its REST API.

Get started

A nice Python library for working with the finnhub.io API is available: pip install finnhub-python

To access the API, you’ll need an API key. Get one here.

To get started, import the libraries we need and set up a finnhub.Client with your API key:

import finnhub
import os
import time
import datetime
from zoneinfo import ZoneInfo
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Set up client
# Note the FINNHUB_KEY environment variable stores my API key
finnhub_client = finnhub.Client(api_key=os.environ['FINNHUB_KEY'])
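
One housekeeping note: the free tier allows up to 50 API calls per minute, so if you make many requests in a loop, a simple client-side throttle helps avoid rate-limit errors. Here’s a minimal sketch (the interval is derived from the documented limit; adjust it for your plan). We won’t bother for the handful of one-off calls below, but it’s handy for loops:

# Pause between calls to stay under the free tier's ~50 calls/minute limit
RATE_LIMIT_SLEEP = 60 / 50  # seconds per call

def throttled(api_func, *args, **kwargs):
    '''Call an API function, then sleep to respect the rate limit'''
    result = api_func(*args, **kwargs)
    time.sleep(RATE_LIMIT_SLEEP)
    return result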

OHLCV stock prices

The stock prices endpoint requires a symbol, a resolution (‘D’ for daily data), and a date range consisting of ‘from’ and ‘to’ values as UNIX timestamps.

We’ll get a week’s worth of AAPL data from February 2023.

# helper function for making UNIX timestamps
def unix_timestamp_from_date(date, format='%Y-%m-%d'):
    '''Transform human readable date string to UNIX timestamp'''
    return int(
        datetime.datetime.strptime(date, format)
        .replace(tzinfo=ZoneInfo('US/Eastern'))
        .timestamp()
    )

# api query parameters
symbol = 'AAPL'
resolution = 'D'
from_date = unix_timestamp_from_date('2023-02-06')
to_date = unix_timestamp_from_date('2023-02-10')

# make request and print
res = finnhub_client.stock_candles(
        symbol, 
        resolution, 
        from_date, 
        to_date
      )

display(res)
{'c': [151.73, 154.65, 151.92, 150.87, 151.01],
 'h': [153.1, 155.23, 154.58, 154.33, 151.3401],
 'l': [150.78, 150.64, 151.168, 150.42, 149.22],
 'o': [152.575, 150.64, 153.88, 153.775, 149.46],
 's': 'ok',
 't': [1675641600, 1675728000, 1675814400, 1675900800, 1675987200],
 'v': [69858306, 83322551, 64120079, 56007143, 57450708]}

The data comes down as a dictionary of lists. The docs state that prices are adjusted for splits. Spot checking some data against other sources suggests it’s also adjusted for dividends.

The dictionary keys consist of letters that represent:

  • c: close prices
  • h: high prices
  • l: low prices
  • o: open prices
  • s: status
  • t: timestamps
  • v: volume traded

We’ll want to transform that response data into a pandas DataFrame:

# Ditch the status code
try:
  res.pop('s')
except KeyError as e:
  print("Already ditched status code")

# Create dataframe from remaining records
df = pd.DataFrame(res) \
  .rename(columns={
    'c': 'close', 
    'h': 'high', 
    'l': 'low', 
    'o': 'open', 
    't': 'timestamp', 
    'v': 'volume'
    }) \
  .set_index(keys = 'timestamp')

# I like things in a certain order
df = df[['open', 'high', 'low', 'close', 'volume']]

# Convert index to human-readable date format
df.index = pd.to_datetime(df.index, unit='s')

display(df)
open high low close volume
timestamp
2023-02-06 152.575 153.1000 150.780 151.73 69858306
2023-02-07 150.640 155.2300 150.640 154.65 83322551
2023-02-08 153.880 154.5800 151.168 151.92 64120079
2023-02-09 153.775 154.3300 150.420 150.87 56007143
2023-02-10 149.460 151.3401 149.220 151.01 57450708

Looks good!
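
If you plan to fetch candles for several symbols or date ranges, the munging above could be wrapped in a helper. This is just a sketch of the same steps we performed:

def candles_to_df(res):
    '''Convert a finnhub stock_candles response to an OHLCV DataFrame
    indexed by human-readable date'''
    data = {k: v for k, v in res.items() if k != 's'}  # drop the status code
    df = pd.DataFrame(data) \
      .rename(columns={
        'c': 'close', 
        'h': 'high', 
        'l': 'low', 
        'o': 'open', 
        't': 'timestamp', 
        'v': 'volume'
        }) \
      .set_index(keys='timestamp')
    df.index = pd.to_datetime(df.index, unit='s')
    return df[['open', 'high', 'low', 'close', 'volume']]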

Fundamental data

The basic company financials endpoint serves quite a lot of data. The response object is a dictionary of dictionaries that includes current values for various metrics (under the metric outer key) as well as historical point-in-time snapshots (under the series outer key).

# Get basic company financials
financials = finnhub_client.company_basic_financials(symbol, 'all')

# Outer keys of response object
display(financials.keys())
dict_keys(['metric', 'metricType', 'series', 'symbol'])

Current metrics

Let’s check out how many current metrics we have and display the first ten:

# current metrics

# how many?
print(f"Response contains {len(financials['metric'].keys())} \
current fundamental data points\n")

# display the first ten
display(list(financials['metric'].items())[:10])
Response contains 117 current fundamental data points

[('10DayAverageTradingVolume', 64.76467),
 ('13WeekPriceReturnDaily', 0.83284),
 ('26WeekPriceReturnDaily', -11.05993),
 ('3MonthAverageTradingVolume', 1528.63578),
 ('52WeekHigh', 179.61),
 ('52WeekHighDate', '2022-03-30'),
 ('52WeekLow', 124.18),
 ('52WeekLowDate', '2023-01-03'),
 ('52WeekPriceReturnDaily', -9.66959),
 ('5DayPriceReturnDaily', 1.0198)]

That looks quite comprehensive! For this ticker, we have values for 117 fundamental metrics.
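
If you want to filter or sort these, dropping the metrics into a pandas Series makes it easy. A quick sketch, using the 52-week fields we saw above:

# Current metrics as a Series, indexed by metric name
metrics = pd.Series(financials['metric'])

# e.g. pull out the 52-week fields
display(metrics.filter(like='52Week'))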

Historical point-in-time snapshots

The historical point-in-time data is under the series outer key. The data is further organised into annual and quarterly metrics, like this:

{
    'annual': {
        'annual_metric1': [
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            ... 
            etc
        ],
        'annual_metric2': [
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            ... 
            etc
        ],
    },
    'quarterly': {
        'quarterly_metric1': [
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            ... 
            etc
        ],
        'quarterly_metric2': [
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            {'period': date_string, 'v': value}, 
            ... 
            etc
        ],
    }
}

Let’s first look at what annual series we get:

# point-in-time values
display(financials['series']['annual'].keys())
dict_keys(['bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'grossMargin', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'pb', 'pe', 'pfcf', 'pretaxMargin', 'ps', 'quickRatio', 'roa', 'roe', 'roic', 'rotc', 'salesPerShare', 'sgaToSale', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio'])

Again, pretty comprehensive. Let’s put that data into a dataframe:

# Make a list of the available annual metrics
annual_metric_names = list(financials['series']['annual'].keys())

# Make a list of dataframes of each metric indexed by period and named appropriately
dfs = [
    pd.DataFrame(financials['series']['annual'][metric]) \
      .rename(columns={'v':metric}) \
      .set_index(keys='period') \
    for metric in annual_metric_names
]

# Join dataframes on index
annual_metrics = dfs[0].join(dfs[1:], how='outer')

annual_metrics
bookValue cashRatio currentRatio ebitPerShare eps ev fcfMargin grossMargin longtermDebtTotalAsset longtermDebtTotalCapital roa roe roic rotc salesPerShare sgaToSale totalDebtToEquity totalDebtToTotalAsset totalDebtToTotalCapital totalRatio
period
2022-09-24 50672 0.153563 0.8794 7.3158 6.1132 2514884.00 0.3369 0.4331 0.2805 0.5764 0.2829 1.9696 0.5813 0.6957 24.1536 0.5669 2.3881 0.3430 0.7048 1.1677
2021-09-25 63090 0.278449 1.0746 6.4601 5.6140 2493617.00 0.3147 0.4178 0.3108 0.5783 0.2697 1.5007 0.5019 0.5775 21.6909 0.5822 1.9903 0.3577 0.6656 1.2191
2020-09-26 65339 0.360710 1.3636 3.7818 3.2754 2041191.00 0.3205 0.3823 0.3046 0.5530 0.1773 0.8787 0.3217 0.3715 15.6615 0.6177 1.7309 0.3492 0.6338 1.2527
2019-09-28 90488 0.462022 1.5401 3.4378 2.9714 1071363.00 0.3070 0.3782 0.2712 0.4624 0.1632 0.6106 0.2783 0.3220 13.9909 0.6218 1.1940 0.3192 0.5442 1.3648
2018-09-29 107147 0.223525 1.1329 3.5449 2.9766 1186220.00 0.3417 0.3834 0.2563 0.4229 0.1628 0.5556 0.2686 0.3199 13.2797 0.6166 1.0685 0.3130 0.5166 1.4144
2017-09-30 134047 0.201252 1.2761 2.9202 2.3017 889855.00 0.3345 0.3847 0.2590 0.3893 0.1288 0.3607 0.1936 0.2456 10.9123 0.6153 0.8630 0.3082 0.4632 1.5556
2016-09-24 128249 0.259271 1.3527 2.7282 2.0766 674795.00 0.3662 0.3908 0.2345 0.3504 0.1420 0.3562 0.2122 0.2788 9.8013 0.6092 0.6786 0.2705 0.4043 1.6630
2015-09-26 119355 0.262002 1.1088 3.0740 2.3042 670483.00 0.3969 0.4006 0.1837 0.2903 0.1839 0.4474 0.2907 0.3878 10.0861 0.5994 0.5390 0.2216 0.3502 1.6980
2014-09-27 111547 0.218194 1.0801 2.1438 1.6132 608712.00 0.3803 0.3859 0.1250 0.1974 0.1704 0.3542 0.2691 0.3575 7.4638 0.6141 0.3164 0.1522 0.2404 1.9273
2013-09-28 123549 0.326607 1.6786 1.8784 1.4198 435827.00 0.3671 0.3762 0.0819 0.1207 0.1789 0.2998 0.2636 0.3487 6.5518 0.6238 0.1373 0.0819 0.1207 2.4805
2012-09-29 118210 0.278813 1.4958 2.0870 1.5767 607370.00 0.3850 0.4387 0.0000 0.0000 0.2370 0.3530 0.3530 0.4673 5.9129 0.5613 0.0000 0.0000 0.0000 3.0432
2011-09-24 76615 0.350912 1.6084 1.2884 0.9884 364842.00 0.4155 0.4048 0.0000 0.0000 0.2228 0.3383 0.3383 0.4410 4.1275 0.5952 0.0000 0.0000 0.0000 2.9271
2010-09-25 47791 0.543432 2.0113 0.7101 0.5412 254736.00 0.3176 0.3938 0.0000 0.0000 0.1864 0.2932 0.2932 0.3847 2.5191 0.6062 0.0000 0.0000 0.0000 2.7447
2009-09-26 31640 0.457414 2.7425 0.4623 0.3243 161495.00 0.2651 0.4014 0.0000 0.0000 0.1734 0.2603 0.2603 0.3710 1.6894 0.5986 0.0000 0.0000 0.0000 2.9948
2008-09-27 22297 1.045242 2.6411 0.3297 0.2422 81372.30 0.2879 0.3520 0.0000 0.0000 0.1692 0.2744 0.2744 0.3735 1.4842 0.6480 0.0000 0.0000 0.0000 2.6071
2007-09-29 14532 1.007759 2.3659 0.1770 0.1404 126607.00 0.2627 0.3317 0.0000 0.0000 0.1379 0.2405 0.2405 0.3033 0.9871 0.6683 0.0000 0.0000 0.0000 2.3437
2006-09-30 9984 0.992084 2.2519 0.0998 0.0809 57277.00 0.1490 0.2898 0.0000 0.0000 0.1156 0.1992 0.1992 0.2457 0.7861 0.7102 0.0000 0.0000 0.0000 2.3826
2005-09-24 7428 1.001147 2.9538 0.0685 0.0553 41187.00 0.2006 0.2901 0.0000 0.0000 0.1153 0.1788 0.1788 0.2212 0.5806 0.7099 0.0000 0.0000 0.0000 2.8170
2004-09-25 5076 1.119955 2.6613 0.0144 0.0123 11589.80 0.1341 0.2726 0.0000 0.0000 0.0330 0.0524 0.0524 0.0617 0.3816 0.7274 0.0000 0.0000 0.0000 2.7068
2003-09-27 4223 1.440815 2.4977 0.0000 0.0034 4629.27 0.0730 0.2752 0.0000 0.0000 0.0101 0.0163 0.0152 NaN 0.3050 0.7248 0.0720 0.0446 0.0672 2.6292
2002-09-28 4095 1.358263 3.2497 0.0008 0.0032 3267.85 0.0458 0.2792 0.0502 0.0716 0.0103 0.0159 0.0147 0.0039 0.2834 0.7208 0.0772 0.0502 0.0716 2.8588
2001-09-29 3920 1.521739 3.3880 -0.0178 -0.0013 3384.29 0.0778 0.2303 0.0526 0.0748 NaN NaN NaN NaN 0.2771 0.7697 0.0809 0.0526 0.0748 2.8658
2000-09-30 4107 0.616141 2.8076 0.0259 0.0390 6989.50 0.1265 0.2713 0.0441 0.0681 0.1155 0.1914 0.1784 0.1184 0.3956 0.7287 0.0730 0.0441 0.0681 2.5234

23 rows × 31 columns

Pretty handy… we have 23 years’ worth of annual company fundamentals.

Let’s do the same thing with the quarterly series. First, what metrics do we get?

# point-in-time values
display(financials['series']['quarterly'].keys())
dict_keys(['bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'quickRatio', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio'])

Let’s stick this data into a dataframe:

# Make a list of the available quarterly metrics
quarterly_metric_names = list(financials['series']['quarterly'].keys())

# Make a list of dataframes indexed by period and named by metric
dfs = [
    pd.DataFrame(financials['series']['quarterly'][metric]) \
      .rename(columns={'v': metric}) \
      .set_index(keys='period') \
    for metric in quarterly_metric_names
]

# Join dataframes on index
quarterly_metrics = dfs[0].join(dfs[1:], how='outer')

quarterly_metrics
bookValue cashRatio currentRatio ebitPerShare eps ev fcfMargin fcfPerShareTTM grossMargin longtermDebtTotalAsset roaTTM roeTTM roicTTM rotcTTM salesPerShare sgaToSale totalDebtToEquity totalDebtToTotalAsset totalDebtToTotalCapital totalRatio
period
2022-12-31 56727 0.149578 0.9380 2.2572 1.8801 2157515.00 0.3226 7.6303 0.4296 0.2873 0.2746 1.6345 0.5409 0.6478 7.3424 0.5704 1.9587 0.3204 0.6620 1.1956
2022-09-24 50672 0.153563 0.8794 1.5444 1.2855 2513943.00 0.3041 8.3332 0.4226 0.2805 0.2810 1.6090 0.5464 0.6539 5.5927 0.5774 2.3695 0.3404 0.7032 1.1677
2022-06-25 58107 0.211761 0.8646 1.4190 1.1955 2384979.00 0.3013 8.0064 0.4326 0.2816 0.2808 1.5297 0.5324 0.6323 5.1013 0.5674 2.0598 0.3559 0.6732 1.2089
2022-03-26 67399 0.220363 0.9268 1.8277 1.5247 2957573.00 0.3154 7.8393 0.4375 0.2947 0.2886 1.5288 0.5387 0.6309 5.9305 0.5625 1.7802 0.3422 0.6403 1.2379
2021-12-25 71932 0.251528 1.0378 2.5115 2.0964 3032429.00 0.4015 7.5044 0.4376 0.2797 0.2875 1.4981 0.5290 0.6151 7.5032 0.5624 1.7071 0.3221 0.6306 1.2326
...
2000-12-30 3712 1.061087 3.0092 -0.0222 -0.0103 3570.24 0.0089 0.0314 -0.0209 0.0520 NaN NaN NaN NaN 0.0533 1.0209 0.0838 0.0520 0.0773 2.6324
2000-09-30 4107 0.616141 2.8076 0.0041 0.0084 6989.50 0.1422 0.0537 0.2497 0.0441 0.1110 0.1859 0.1736 0.1153 0.0923 0.7503 0.0730 0.0441 0.0681 2.5234
2000-07-01 4176 0.717565 2.7533 0.0083 0.0099 16308.50 0.1342 NaN 0.2975 0.0433 NaN NaN NaN NaN 0.0901 0.7025 0.0718 0.0433 0.0670 2.5152
2000-04-01 4215 0.896924 2.6508 0.0083 0.0114 20122.50 0.0452 NaN 0.2823 0.0428 NaN NaN NaN NaN 0.0954 0.7177 0.0712 0.0428 0.0664 2.5097
2000-01-01 4414 0.807125 2.4982 0.0050 0.0092 16753.70 0.1754 NaN 0.2591 0.0395 NaN NaN NaN NaN 0.1172 0.7409 0.0680 0.0395 0.0636 2.3916

93 rows × 32 columns

Looks like we have essentially the same data as for the annual series, but at a quarterly resolution. Neat!
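
Since the annual and quarterly munging is identical, the pattern could be factored into a small helper – a sketch of the same steps we just performed twice:

def series_to_df(series):
    '''Convert a finnhub series dict (annual or quarterly) into a
    DataFrame indexed by period, with one column per metric'''
    dfs = [
        pd.DataFrame(records) \
          .rename(columns={'v': metric}) \
          .set_index(keys='period') \
        for metric, records in series.items()
    ]
    return dfs[0].join(dfs[1:], how='outer')

annual_metrics = series_to_df(financials['series']['annual'])
quarterly_metrics = series_to_df(financials['series']['quarterly'])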

Restatements

One thing to be aware of with this sort of data is that sometimes companies will “restate” their reported financials after the initial filing, usually to correct an error.

Using a dataset that incorporates restatements can introduce bias into an analysis, because you’d be analysing data as if it had been available at a time when it wasn’t.

To correct for that, you’d need an “as reported” dataset – one that records the information available at each point in time, without retrospective corrections.

The finnhub.io API docs don’t specify whether the endpoint I used here contains restatements. However, I suspect that it doesn’t because there’s an additional endpoint called “Financials As Reported”, which isn’t available on the free tier.

Alternative data

finnhub.io serves some interesting alternative data:

  • Insider sentiment
  • Social media sentiment
  • Senate lobbying disclosures
  • Government spending
  • Visa applications submitted by companies
  • Many others

Example: insider sentiment

The “Insider Sentiment” endpoint serves the Monthly Share Purchase Ratio (MSPR), which is a measure of whether insider transactions are dominated by buying (a value closer to 100) or selling (a value closer to -100) over the previous month. You can read about the MSPR and how it’s calculated here.

For now, let’s just grab 10 years’ worth of this data for a single stock and plot it against the stock’s forward returns, on the understanding that we don’t have enough data to say anything meaningful (around 120 observations in total).

Unlike the prices endpoint, this one accepts human-readable date strings.

# Get ten years' worth of insider sentiment data

# api query parameters
symbol = 'AAPL'
from_date = '2013-01-01'
to_date = '2023-02-21' 

is_response = finnhub_client.stock_insider_sentiment(
    symbol, 
    from_date, 
    to_date
)

# Print the first few records
display(is_response['data'][:4])
[{'symbol': 'AAPL',
  'year': 2013,
  'month': 1,
  'change': 246,
  'mspr': 0.2078967},
 {'symbol': 'AAPL',
  'year': 2013,
  'month': 2,
  'change': 3934,
  'mspr': 42.06587},
 {'symbol': 'AAPL',
  'year': 2013,
  'month': 3,
  'change': -26460,
  'mspr': -20.025429},
 {'symbol': 'AAPL',
  'year': 2013,
  'month': 4,
  'change': -9818,
  'mspr': -21.024454}]

# Make a dataframe from insider sentiment data
is_df = pd.DataFrame(is_response['data'])

is_df.head(10)
symbol year month change mspr
0 AAPL 2013 1 246 0.207897
1 AAPL 2013 2 3934 42.065870
2 AAPL 2013 3 -26460 -20.025429
3 AAPL 2013 4 -9818 -21.024454
4 AAPL 2013 6 -321301 -28.045816
5 AAPL 2013 7 -79734 -48.365260
6 AAPL 2013 8 -75200 -34.034233
7 AAPL 2013 9 -28171 -31.069471
8 AAPL 2013 10 1930 3.064367
9 AAPL 2013 11 -4486 -100.000000

Notice that we have some missing data – above we don’t have a record for month 5. We’ll need to deal with that later.
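
A quick way to see how much is missing is to compare against a complete grid of year-month pairs. A sketch (the range here covers the complete years of our request):

# Build the full set of year-month pairs for 2013-2022 and list those
# with no insider sentiment record
all_months = pd.MultiIndex.from_product(
    [range(2013, 2023), range(1, 13)], names=['year', 'month']
)
have = pd.MultiIndex.from_frame(is_df[['year', 'month']])
display(all_months.difference(have))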

I’ll also need to lag the mspr field (the sentiment score) to align it with the date on which it would have been actionable. We’ll do this once we have some price data.

Next we’ll grab some monthly price data and munge it for our needs:

resolution = 'M'
unix_from_date = unix_timestamp_from_date(from_date)
unix_to_date = unix_timestamp_from_date(to_date)

price_response = finnhub_client.stock_candles(
    symbol, 
    resolution, 
    unix_from_date, 
    unix_to_date
)

# Ditch the status code
try:
  price_response.pop('s')
except KeyError as e:
  print("Already ditched status code")

# Create dataframe from remaining records
price_df = pd.DataFrame(price_response) \
  .rename(columns={
    'c': 'close', 
    'h': 'high', 
    'l': 'low', 
    'o': 'open', 
    't': 'timestamp', 
    'v': 'volume'
  }) 

# Ditch stuff we won't need
price_df = price_df[['timestamp', 'close']]

# Convert timestamp to human-readable date format
price_df['date'] = pd.to_datetime(price_df['timestamp'], unit='s')

# create year and month fields
price_df['year'] = price_df['date'].dt.year
price_df['month'] = price_df['date'].dt.month

price_df.head()
timestamp close date year month
0 1356998400 16.267479 2013-01-01 2013 1
1 1359676800 15.764271 2013-02-01 2013 2
2 1362096000 15.809271 2013-03-01 2013 3
3 1364774400 15.813561 2013-04-01 2013 4
4 1367366400 16.061950 2013-05-01 2013 5

You can see that our data isn’t going to overlap neatly. There are some months in our sentiment dataframe that are missing from our prices dataframe and vice versa.

I’ll merge the two dataframes using an outer join – this preserves all of our data and inserts NaN wherever a record is missing.

From there, I can lag the sentiment score appropriately. Only then can I safely remove records containing NaN (doing so beforehand would misalign some of our data).

df = pd.merge(price_df, is_df, on=['year', 'month'], how='outer') \
    .sort_values(['year', 'month']) \
    .set_index(keys='date')
    
df.head(10)
timestamp close year month symbol change mspr
date
2013-01-01 1356998400 16.267479 2013 1 AAPL 246.0 0.207897
2013-02-01 1359676800 15.764271 2013 2 AAPL 3934.0 42.065870
2013-03-01 1362096000 15.809271 2013 3 AAPL -26460.0 -20.025429
2013-04-01 1364774400 15.813561 2013 4 AAPL -9818.0 -21.024454
2013-05-01 1367366400 16.061950 2013 5 NaN NaN NaN
2013-06-01 1370044800 14.161771 2013 6 AAPL -321301.0 -28.045816
2013-07-01 1372636800 16.161771 2013 7 AAPL -79734.0 -48.365260
2013-08-01 1375315200 17.400550 2013 8 AAPL -75200.0 -34.034233
2013-09-01 1377993600 17.026771 2013 9 AAPL -28171.0 -31.069471
2013-10-01 1380585600 18.667911 2013 10 AAPL 1930.0 3.064367

Now we can calculate close-to-close monthly returns and lag the sentiment score:

# Calculate monthly close-to-close returns
df['return'] = np.log(df.close/df.close.shift())

# Lag sentiment score
df['mspr_lag'] = df['mspr'].shift()

df.head(10)
timestamp close year month symbol change mspr return mspr_lag
date
2013-01-01 1356998400 16.267479 2013 1 AAPL 246.0 0.207897 NaN NaN
2013-02-01 1359676800 15.764271 2013 2 AAPL 3934.0 42.065870 -0.031422 0.207897
2013-03-01 1362096000 15.809271 2013 3 AAPL -26460.0 -20.025429 0.002850 42.065870
2013-04-01 1364774400 15.813561 2013 4 AAPL -9818.0 -21.024454 0.000271 -20.025429
2013-05-01 1367366400 16.061950 2013 5 NaN NaN NaN 0.015585 -21.024454
2013-06-01 1370044800 14.161771 2013 6 AAPL -321301.0 -28.045816 -0.125907 NaN
2013-07-01 1372636800 16.161771 2013 7 AAPL -79734.0 -48.365260 0.132102 -28.045816
2013-08-01 1375315200 17.400550 2013 8 AAPL -75200.0 -34.034233 0.073853 -48.365260
2013-09-01 1377993600 17.026771 2013 9 AAPL -28171.0 -31.069471 -0.021715 -34.034233
2013-10-01 1380585600 18.667911 2013 10 AAPL 1930.0 3.064367 0.092019 -31.069471
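
As flagged earlier, only now that the lag is in place is it safe to drop incomplete rows. A minimal sketch – note that seaborn’s lmplot drops missing values itself, so the plots below work on df directly either way:

# Drop rows missing a return or a lagged sentiment score, now that the
# lag has already been computed (dropping earlier would have misaligned it)
clean = df.dropna(subset=['return', 'mspr_lag'])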

An easy way to look for a relationship between this sentiment metric and next month’s returns is to just scatter plot them:

sns.lmplot(data=df, x='mspr_lag', y='return', height=4, aspect=1.5);

Unfortunately, we don’t see much of a relationship.
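
To put a rough number on that, we can check the correlation directly – a quick check, not a rigorous test:

# Correlation between lagged sentiment and monthly returns
# (pandas excludes the NaN pairs for us)
display(df['mspr_lag'].corr(df['return']))                     # Pearson
display(df['mspr_lag'].corr(df['return'], method='spearman'))  # rank-based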

The sentiment metric does jump around quite a bit – let’s see if smoothing it with a 3-month moving average reveals anything interesting:

df['mspr_roll'] = df['mspr_lag'].rolling(window=3).mean()
df
timestamp close year month symbol change mspr return mspr_lag mspr_roll
date
2013-01-01 1356998400 16.267479 2013 1 AAPL 246.0 0.207897 NaN NaN NaN
2013-02-01 1359676800 15.764271 2013 2 AAPL 3934.0 42.065870 -0.031422 0.207897 NaN
2013-03-01 1362096000 15.809271 2013 3 AAPL -26460.0 -20.025429 0.002850 42.065870 NaN
2013-04-01 1364774400 15.813561 2013 4 AAPL -9818.0 -21.024454 0.000271 -20.025429 7.416113
2013-05-01 1367366400 16.061950 2013 5 NaN NaN NaN 0.015585 -21.024454 0.338662
...
2022-10-01 1664582400 153.340000 2022 10 AAPL -1297681.0 -30.305650 0.103956 100.000000 NaN
2022-11-01 1667260800 148.030000 2022 11 AAPL -20200.0 -100.000000 -0.035243 -30.305650 -3.333313
2022-12-01 1669852800 129.930000 2022 12 NaN NaN NaN -0.130419 -100.000000 -10.101883
2023-01-01 1672531200 144.290000 2023 1 NaN NaN NaN 0.104829 NaN NaN
2023-02-01 1675209600 152.550000 2023 2 AAPL -1685.0 -5.882353 0.055667 NaN NaN

122 rows × 10 columns

sns.lmplot(data=df, x='mspr_roll', y='return', height=4, aspect=1.5);

The relationship is still super noisy. If anything, it runs slightly counter to what we’d expect, especially at extreme values of our smoothed metric.

Maybe the SEC is succeeding in stamping out insider trading.

The reality is that insiders buy and sell for all sorts of reasons, many of which have nothing to do with their views on the stock itself.

As I mentioned, there aren’t a whole lot of data points here – perhaps this would be interesting to revisit using a broader universe of stocks.
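
If you wanted to try that, the loop might look something like this – a sketch, with a hypothetical symbol list and a sleep to stay under the free tier’s rate limit:

# Fetch insider sentiment for a handful of tickers (hypothetical list),
# throttling to respect the free tier's ~50 calls/minute limit
symbols = ['AAPL', 'MSFT', 'AMZN', 'GOOGL']

frames = []
for sym in symbols:
    response = finnhub_client.stock_insider_sentiment(sym, from_date, to_date)
    frames.append(pd.DataFrame(response['data']))
    time.sleep(60 / 50)

# Each record carries its own symbol column, so a simple concat works
panel = pd.concat(frames, ignore_index=True)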

Conclusion

finnhub.io is a modern market data service with a well-documented REST API, open source client libraries, broad coverage, interesting data sets, and a generous free tier.

What’s your preferred source of market data and why? Let me know in the comments!
