The changing face of market data providers
Over the last few years, a number of new market data providers have come online. They tend to have modern websites, broad coverage, and well-documented RESTful APIs. Their services are often priced very competitively – especially for personal use – and usually have generous free tiers.
One such newcomer is finnhub.io. Its offering includes stock, bond, crypto, and FX historical price data, plus real-time trades and quotes. Their fundamental data offering is noticeably broad and includes current values and point-in-time snapshots for numerous metrics. There are also some interesting alternative data sets, including measures of social media sentiment, insider transactions and insider sentiment, senate lobbying, government spending, and others. In addition, there’s a real-time newsfeed delivered over websockets.
The free tier is quite generous and offers more than enough for doing proof of concept work and testing ideas. While you only get a year’s worth of historical data per API call on the free tier (more if you specify a lower resolution, like monthly), you can make up to 50 calls per minute.
In this post, we’ll explore the finnhub.io free tier via its REST API.
Get started
A nice Python library for working with the finnhub.io API is available: pip install finnhub-python
To access the API, you’ll need an API key. Get one here.
To get started, import the libraries we need and set up a finnhub.Client with your API key:
import finnhub
import os
import time
import datetime
from zoneinfo import ZoneInfo
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Set up client
# Note the FINNHUB_KEY environment variable stores my API key
finnhub_client = finnhub.Client(api_key=os.environ['FINNHUB_KEY'])
OHLCV stock prices
The stock prices endpoint requires a symbol, a resolution (‘D’ for daily data), and a date range consisting of ‘from’ and ‘to’ values as UNIX timestamps.
We’ll get a week’s worth of AAPL data from February 2023.
# helper function for making UNIX timestamps
def unix_timestamp_from_date(date, format='%Y-%m-%d'):
    '''Transform human readable date string to UNIX timestamp'''
    return int(
        datetime.datetime.strptime(date, format)
        .replace(tzinfo=ZoneInfo('US/Eastern'))
        .timestamp()
    )
# api query parameters
symbol = 'AAPL'
resolution = 'D'
from_date = unix_timestamp_from_date('2023-02-06')
to_date = unix_timestamp_from_date('2023-02-10')
# make request and print
res = finnhub_client.stock_candles(
    symbol,
    resolution,
    from_date,
    to_date
)
display(res)
{'c': [151.73, 154.65, 151.92, 150.87, 151.01], 'h': [153.1, 155.23, 154.58, 154.33, 151.3401], 'l': [150.78, 150.64, 151.168, 150.42, 149.22], 'o': [152.575, 150.64, 153.88, 153.775, 149.46], 's': 'ok', 't': [1675641600, 1675728000, 1675814400, 1675900800, 1675987200], 'v': [69858306, 83322551, 64120079, 56007143, 57450708]}
The data comes down as a dictionary of lists. The docs state that prices are adjusted for splits. Spot checking some data against other sources suggests it’s also adjusted for dividends.
The dictionary keys consist of letters that represent:

- c: close prices
- h: high prices
- l: low prices
- o: open prices
- s: status
- t: timestamps
- v: volume traded
We’ll want to transform that response data into a pandas DataFrame:
# Ditch the status code
try:
    res.pop('s')
except KeyError as e:
    print("Already ditched status code")

# Create dataframe from remaining records
df = pd.DataFrame(res) \
    .rename(columns={
        'c': 'close',
        'h': 'high',
        'l': 'low',
        'o': 'open',
        't': 'timestamp',
        'v': 'volume'
    }) \
    .set_index(keys='timestamp')

# I like things in a certain order
df = df[['open', 'high', 'low', 'close', 'volume']]

# Convert index to human-readable date format
df.index = pd.to_datetime(df.index, unit='s')
display(df)
| timestamp | open | high | low | close | volume |
|---|---|---|---|---|---|
2023-02-06 | 152.575 | 153.1000 | 150.780 | 151.73 | 69858306 |
2023-02-07 | 150.640 | 155.2300 | 150.640 | 154.65 | 83322551 |
2023-02-08 | 153.880 | 154.5800 | 151.168 | 151.92 | 64120079 |
2023-02-09 | 153.775 | 154.3300 | 150.420 | 150.87 | 56007143 |
2023-02-10 | 149.460 | 151.3401 | 149.220 | 151.01 | 57450708 |
Looks good!
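Since the free tier caps each candle request at roughly a year of daily data, one workaround is to stitch several one-year requests together. Here’s a minimal sketch of that idea – fetch_daily_candles is my own hypothetical helper, not part of the client library – reusing the unix_timestamp_from_date helper from above:

# Hedged sketch: work around the free tier's one-year-per-call limit on daily
# candles by looping over (start, end) date pairs and concatenating the results
def fetch_daily_candles(symbol, year_ranges):
    '''year_ranges: list of (from_date, to_date) date-string pairs'''
    frames = []
    for start, end in year_ranges:
        res = finnhub_client.stock_candles(
            symbol,
            'D',
            unix_timestamp_from_date(start),
            unix_timestamp_from_date(end)
        )
        res.pop('s', None)  # drop the status code
        frames.append(pd.DataFrame(res))
        time.sleep(1.5)  # stay well inside the 50-calls-per-minute cap
    return pd.concat(frames, ignore_index=True)

# e.g. fetch_daily_candles('AAPL', [('2021-01-01', '2021-12-31'),
#                                   ('2022-01-01', '2022-12-31')])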
Fundamental data
The basic company financials endpoint serves quite a lot of data. The response object is a dictionary of dictionaries that includes current values for various metrics (under the metric outer key) as well as historical point-in-time snapshots (under the series outer key).
# Get basic company financials
financials = finnhub_client.company_basic_financials(symbol, 'all')
# Outer keys of response object
display(financials.keys())
dict_keys(['metric', 'metricType', 'series', 'symbol'])
Current metrics
Let’s check out how many current metrics we have and display the first ten:
# current metrics
# how many?
print(f"Response contains {len(financials['metric'].keys())} \
current fundamental data points\n")
# display the first ten
display(list(financials['metric'].items())[:10])
Response contains 117 current fundamental data points
[('10DayAverageTradingVolume', 64.76467), ('13WeekPriceReturnDaily', 0.83284), ('26WeekPriceReturnDaily', -11.05993), ('3MonthAverageTradingVolume', 1528.63578), ('52WeekHigh', 179.61), ('52WeekHighDate', '2022-03-30'), ('52WeekLow', 124.18), ('52WeekLowDate', '2023-01-03'), ('52WeekPriceReturnDaily', -9.66959), ('5DayPriceReturnDaily', 1.0198)]
That looks quite comprehensive! For this ticker, we have values for 117 fundamental metrics.
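If you prefer pandas here too, the flat metric dictionary drops straight into a Series, which makes filtering and sorting easy (a small convenience sketch, not something the endpoint does for you):

# Current metrics as a pandas Series for easy filtering and sorting
current_metrics = pd.Series(financials['metric'])

# e.g. pull out all the 52-week fields
display(current_metrics.filter(like='52Week'))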
Historical point-in-time snapshots
The historical point-in-time data is under the series outer key. The data is further organised into annual and quarterly metrics, like this:
{
    'annual': {
        'annual_metric1': [
            {'period': date_string, 'v': value},
            {'period': date_string, 'v': value},
            ...
        ],
        'annual_metric2': [
            {'period': date_string, 'v': value},
            {'period': date_string, 'v': value},
            ...
        ],
        ...
    },
    'quarterly': {
        'quarterly_metric1': [
            {'period': date_string, 'v': value},
            {'period': date_string, 'v': value},
            ...
        ],
        'quarterly_metric2': [
            {'period': date_string, 'v': value},
            {'period': date_string, 'v': value},
            ...
        ],
        ...
    }
}
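Given that shape, a generic helper can flatten either frequency into a tidy long-form table before pivoting it wide. This is just a sketch of the idea (series_to_long is my own name, not part of the library) – below we’ll build the wide dataframe a slightly different way:

# Hedged sketch: flatten one frequency of the 'series' payload into
# long-form (metric, period, value) rows, then pivot wide if needed
def series_to_long(series, freq='annual'):
    rows = [
        {'metric': metric, 'period': obs['period'], 'value': obs['v']}
        for metric, observations in series[freq].items()
        for obs in observations
    ]
    return pd.DataFrame(rows)

# long_df = series_to_long(financials['series'], freq='annual')
# wide_df = long_df.pivot(index='period', columns='metric', values='value')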
Let’s first look at what annual series we get:
# point-in-time values
display(financials['series']['annual'].keys())
dict_keys(['bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'grossMargin', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'pb', 'pe', 'pfcf', 'pretaxMargin', 'ps', 'quickRatio', 'roa', 'roe', 'roic', 'rotc', 'salesPerShare', 'sgaToSale', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio'])
Again, pretty comprehensive. Let’s put that data into a dataframe:
# Make a list of the available annual metrics
annual_metric_names = list(financials['series']['annual'].keys())
# Make a list of dataframes of each metric indexed by period and named appropriately
dfs = [
    pd.DataFrame(financials['series']['annual'][metric]) \
        .rename(columns={'v': metric}) \
        .set_index(keys='period') \
    for metric in annual_metric_names
]
# Join dataframes on index
annual_metrics = dfs[0].join(dfs[1:], how='outer')
annual_metrics
| period | bookValue | cashRatio | currentRatio | ebitPerShare | eps | ev | fcfMargin | grossMargin | longtermDebtTotalAsset | longtermDebtTotalCapital | … | roa | roe | roic | rotc | salesPerShare | sgaToSale | totalDebtToEquity | totalDebtToTotalAsset | totalDebtToTotalCapital | totalRatio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022-09-24 | 50672 | 0.153563 | 0.8794 | 7.3158 | 6.1132 | 2514884.00 | 0.3369 | 0.4331 | 0.2805 | 0.5764 | … | 0.2829 | 1.9696 | 0.5813 | 0.6957 | 24.1536 | 0.5669 | 2.3881 | 0.3430 | 0.7048 | 1.1677 |
2021-09-25 | 63090 | 0.278449 | 1.0746 | 6.4601 | 5.6140 | 2493617.00 | 0.3147 | 0.4178 | 0.3108 | 0.5783 | … | 0.2697 | 1.5007 | 0.5019 | 0.5775 | 21.6909 | 0.5822 | 1.9903 | 0.3577 | 0.6656 | 1.2191 |
2020-09-26 | 65339 | 0.360710 | 1.3636 | 3.7818 | 3.2754 | 2041191.00 | 0.3205 | 0.3823 | 0.3046 | 0.5530 | … | 0.1773 | 0.8787 | 0.3217 | 0.3715 | 15.6615 | 0.6177 | 1.7309 | 0.3492 | 0.6338 | 1.2527 |
2019-09-28 | 90488 | 0.462022 | 1.5401 | 3.4378 | 2.9714 | 1071363.00 | 0.3070 | 0.3782 | 0.2712 | 0.4624 | … | 0.1632 | 0.6106 | 0.2783 | 0.3220 | 13.9909 | 0.6218 | 1.1940 | 0.3192 | 0.5442 | 1.3648 |
2018-09-29 | 107147 | 0.223525 | 1.1329 | 3.5449 | 2.9766 | 1186220.00 | 0.3417 | 0.3834 | 0.2563 | 0.4229 | … | 0.1628 | 0.5556 | 0.2686 | 0.3199 | 13.2797 | 0.6166 | 1.0685 | 0.3130 | 0.5166 | 1.4144 |
2017-09-30 | 134047 | 0.201252 | 1.2761 | 2.9202 | 2.3017 | 889855.00 | 0.3345 | 0.3847 | 0.2590 | 0.3893 | … | 0.1288 | 0.3607 | 0.1936 | 0.2456 | 10.9123 | 0.6153 | 0.8630 | 0.3082 | 0.4632 | 1.5556 |
2016-09-24 | 128249 | 0.259271 | 1.3527 | 2.7282 | 2.0766 | 674795.00 | 0.3662 | 0.3908 | 0.2345 | 0.3504 | … | 0.1420 | 0.3562 | 0.2122 | 0.2788 | 9.8013 | 0.6092 | 0.6786 | 0.2705 | 0.4043 | 1.6630 |
2015-09-26 | 119355 | 0.262002 | 1.1088 | 3.0740 | 2.3042 | 670483.00 | 0.3969 | 0.4006 | 0.1837 | 0.2903 | … | 0.1839 | 0.4474 | 0.2907 | 0.3878 | 10.0861 | 0.5994 | 0.5390 | 0.2216 | 0.3502 | 1.6980 |
2014-09-27 | 111547 | 0.218194 | 1.0801 | 2.1438 | 1.6132 | 608712.00 | 0.3803 | 0.3859 | 0.1250 | 0.1974 | … | 0.1704 | 0.3542 | 0.2691 | 0.3575 | 7.4638 | 0.6141 | 0.3164 | 0.1522 | 0.2404 | 1.9273 |
2013-09-28 | 123549 | 0.326607 | 1.6786 | 1.8784 | 1.4198 | 435827.00 | 0.3671 | 0.3762 | 0.0819 | 0.1207 | … | 0.1789 | 0.2998 | 0.2636 | 0.3487 | 6.5518 | 0.6238 | 0.1373 | 0.0819 | 0.1207 | 2.4805 |
2012-09-29 | 118210 | 0.278813 | 1.4958 | 2.0870 | 1.5767 | 607370.00 | 0.3850 | 0.4387 | 0.0000 | 0.0000 | … | 0.2370 | 0.3530 | 0.3530 | 0.4673 | 5.9129 | 0.5613 | 0.0000 | 0.0000 | 0.0000 | 3.0432 |
2011-09-24 | 76615 | 0.350912 | 1.6084 | 1.2884 | 0.9884 | 364842.00 | 0.4155 | 0.4048 | 0.0000 | 0.0000 | … | 0.2228 | 0.3383 | 0.3383 | 0.4410 | 4.1275 | 0.5952 | 0.0000 | 0.0000 | 0.0000 | 2.9271 |
2010-09-25 | 47791 | 0.543432 | 2.0113 | 0.7101 | 0.5412 | 254736.00 | 0.3176 | 0.3938 | 0.0000 | 0.0000 | … | 0.1864 | 0.2932 | 0.2932 | 0.3847 | 2.5191 | 0.6062 | 0.0000 | 0.0000 | 0.0000 | 2.7447 |
2009-09-26 | 31640 | 0.457414 | 2.7425 | 0.4623 | 0.3243 | 161495.00 | 0.2651 | 0.4014 | 0.0000 | 0.0000 | … | 0.1734 | 0.2603 | 0.2603 | 0.3710 | 1.6894 | 0.5986 | 0.0000 | 0.0000 | 0.0000 | 2.9948 |
2008-09-27 | 22297 | 1.045242 | 2.6411 | 0.3297 | 0.2422 | 81372.30 | 0.2879 | 0.3520 | 0.0000 | 0.0000 | … | 0.1692 | 0.2744 | 0.2744 | 0.3735 | 1.4842 | 0.6480 | 0.0000 | 0.0000 | 0.0000 | 2.6071 |
2007-09-29 | 14532 | 1.007759 | 2.3659 | 0.1770 | 0.1404 | 126607.00 | 0.2627 | 0.3317 | 0.0000 | 0.0000 | … | 0.1379 | 0.2405 | 0.2405 | 0.3033 | 0.9871 | 0.6683 | 0.0000 | 0.0000 | 0.0000 | 2.3437 |
2006-09-30 | 9984 | 0.992084 | 2.2519 | 0.0998 | 0.0809 | 57277.00 | 0.1490 | 0.2898 | 0.0000 | 0.0000 | … | 0.1156 | 0.1992 | 0.1992 | 0.2457 | 0.7861 | 0.7102 | 0.0000 | 0.0000 | 0.0000 | 2.3826 |
2005-09-24 | 7428 | 1.001147 | 2.9538 | 0.0685 | 0.0553 | 41187.00 | 0.2006 | 0.2901 | 0.0000 | 0.0000 | … | 0.1153 | 0.1788 | 0.1788 | 0.2212 | 0.5806 | 0.7099 | 0.0000 | 0.0000 | 0.0000 | 2.8170 |
2004-09-25 | 5076 | 1.119955 | 2.6613 | 0.0144 | 0.0123 | 11589.80 | 0.1341 | 0.2726 | 0.0000 | 0.0000 | … | 0.0330 | 0.0524 | 0.0524 | 0.0617 | 0.3816 | 0.7274 | 0.0000 | 0.0000 | 0.0000 | 2.7068 |
2003-09-27 | 4223 | 1.440815 | 2.4977 | 0.0000 | 0.0034 | 4629.27 | 0.0730 | 0.2752 | 0.0000 | 0.0000 | … | 0.0101 | 0.0163 | 0.0152 | NaN | 0.3050 | 0.7248 | 0.0720 | 0.0446 | 0.0672 | 2.6292 |
2002-09-28 | 4095 | 1.358263 | 3.2497 | 0.0008 | 0.0032 | 3267.85 | 0.0458 | 0.2792 | 0.0502 | 0.0716 | … | 0.0103 | 0.0159 | 0.0147 | 0.0039 | 0.2834 | 0.7208 | 0.0772 | 0.0502 | 0.0716 | 2.8588 |
2001-09-29 | 3920 | 1.521739 | 3.3880 | -0.0178 | -0.0013 | 3384.29 | 0.0778 | 0.2303 | 0.0526 | 0.0748 | … | NaN | NaN | NaN | NaN | 0.2771 | 0.7697 | 0.0809 | 0.0526 | 0.0748 | 2.8658 |
2000-09-30 | 4107 | 0.616141 | 2.8076 | 0.0259 | 0.0390 | 6989.50 | 0.1265 | 0.2713 | 0.0441 | 0.0681 | … | 0.1155 | 0.1914 | 0.1784 | 0.1184 | 0.3956 | 0.7287 | 0.0730 | 0.0441 | 0.0681 | 2.5234 |
23 rows × 31 columns
Pretty handy… we have 23 years’ worth of annual company fundamentals.
Let’s do the same thing with the quarterly series. First, what metrics do we get?
# point-in-time values
display(financials['series']['quarterly'].keys())
dict_keys(['bookValue', 'cashRatio', 'currentRatio', 'ebitPerShare', 'eps', 'ev', 'fcfMargin', 'fcfPerShareTTM', 'grossMargin', 'longtermDebtTotalAsset', 'longtermDebtTotalCapital', 'longtermDebtTotalEquity', 'netDebtToTotalCapital', 'netDebtToTotalEquity', 'netMargin', 'operatingMargin', 'pb', 'peTTM', 'pfcfTTM', 'pretaxMargin', 'psTTM', 'quickRatio', 'roaTTM', 'roeTTM', 'roicTTM', 'rotcTTM', 'salesPerShare', 'sgaToSale', 'totalDebtToEquity', 'totalDebtToTotalAsset', 'totalDebtToTotalCapital', 'totalRatio'])
Let’s stick this data into a dataframe:
# Make a list of the available quarterly metrics
quarterly_metric_names = list(financials['series']['quarterly'].keys())
# Make a list of dataframes indexed by period and named by metric
dfs = [
    pd.DataFrame(financials['series']['quarterly'][metric]) \
        .rename(columns={'v': metric}) \
        .set_index(keys='period') \
    for metric in quarterly_metric_names
]
# Join dataframes on index
quarterly_metrics = dfs[0].join(dfs[1:], how='outer')
quarterly_metrics
| period | bookValue | cashRatio | currentRatio | ebitPerShare | eps | ev | fcfMargin | fcfPerShareTTM | grossMargin | longtermDebtTotalAsset | … | roaTTM | roeTTM | roicTTM | rotcTTM | salesPerShare | sgaToSale | totalDebtToEquity | totalDebtToTotalAsset | totalDebtToTotalCapital | totalRatio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2022-12-31 | 56727 | 0.149578 | 0.9380 | 2.2572 | 1.8801 | 2157515.00 | 0.3226 | 7.6303 | 0.4296 | 0.2873 | … | 0.2746 | 1.6345 | 0.5409 | 0.6478 | 7.3424 | 0.5704 | 1.9587 | 0.3204 | 0.6620 | 1.1956 |
2022-09-24 | 50672 | 0.153563 | 0.8794 | 1.5444 | 1.2855 | 2513943.00 | 0.3041 | 8.3332 | 0.4226 | 0.2805 | … | 0.2810 | 1.6090 | 0.5464 | 0.6539 | 5.5927 | 0.5774 | 2.3695 | 0.3404 | 0.7032 | 1.1677 |
2022-06-25 | 58107 | 0.211761 | 0.8646 | 1.4190 | 1.1955 | 2384979.00 | 0.3013 | 8.0064 | 0.4326 | 0.2816 | … | 0.2808 | 1.5297 | 0.5324 | 0.6323 | 5.1013 | 0.5674 | 2.0598 | 0.3559 | 0.6732 | 1.2089 |
2022-03-26 | 67399 | 0.220363 | 0.9268 | 1.8277 | 1.5247 | 2957573.00 | 0.3154 | 7.8393 | 0.4375 | 0.2947 | … | 0.2886 | 1.5288 | 0.5387 | 0.6309 | 5.9305 | 0.5625 | 1.7802 | 0.3422 | 0.6403 | 1.2379 |
2021-12-25 | 71932 | 0.251528 | 1.0378 | 2.5115 | 2.0964 | 3032429.00 | 0.4015 | 7.5044 | 0.4376 | 0.2797 | … | 0.2875 | 1.4981 | 0.5290 | 0.6151 | 7.5032 | 0.5624 | 1.7071 | 0.3221 | 0.6306 | 1.2326 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
2000-12-30 | 3712 | 1.061087 | 3.0092 | -0.0222 | -0.0103 | 3570.24 | 0.0089 | 0.0314 | -0.0209 | 0.0520 | … | NaN | NaN | NaN | NaN | 0.0533 | 1.0209 | 0.0838 | 0.0520 | 0.0773 | 2.6324 |
2000-09-30 | 4107 | 0.616141 | 2.8076 | 0.0041 | 0.0084 | 6989.50 | 0.1422 | 0.0537 | 0.2497 | 0.0441 | … | 0.1110 | 0.1859 | 0.1736 | 0.1153 | 0.0923 | 0.7503 | 0.0730 | 0.0441 | 0.0681 | 2.5234 |
2000-07-01 | 4176 | 0.717565 | 2.7533 | 0.0083 | 0.0099 | 16308.50 | 0.1342 | NaN | 0.2975 | 0.0433 | … | NaN | NaN | NaN | NaN | 0.0901 | 0.7025 | 0.0718 | 0.0433 | 0.0670 | 2.5152 |
2000-04-01 | 4215 | 0.896924 | 2.6508 | 0.0083 | 0.0114 | 20122.50 | 0.0452 | NaN | 0.2823 | 0.0428 | … | NaN | NaN | NaN | NaN | 0.0954 | 0.7177 | 0.0712 | 0.0428 | 0.0664 | 2.5097 |
2000-01-01 | 4414 | 0.807125 | 2.4982 | 0.0050 | 0.0092 | 16753.70 | 0.1754 | NaN | 0.2591 | 0.0395 | … | NaN | NaN | NaN | NaN | 0.1172 | 0.7409 | 0.0680 | 0.0395 | 0.0636 | 2.3916 |
93 rows × 32 columns
Looks like we have essentially the same data as for the annual series, but at a quarterly resolution. Neat!
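As a quick sanity check, it’s easy to plot one of these series over time – a throwaway sketch using grossMargin, one of the keys we saw above:

# Plot a single quarterly series to eyeball the history
gm = quarterly_metrics['grossMargin'].copy()
gm.index = pd.to_datetime(gm.index)  # 'period' strings -> datetimes
gm.sort_index().plot(title='AAPL quarterly gross margin')
plt.show()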
Restatements
One thing to be aware of with this sort of data is that sometimes companies will “restate” their reported financials after the initial filing, usually to correct an error.
Using a dataset based on restatements can introduce some bias into an analysis because you’d be using data that you didn’t have available at the time you thought you had it.
To correct for that, you’d need an “as reported” dataset – one that records the information that was available at each point in time, without retrospective corrections.
The finnhub.io API docs don’t specify whether the endpoint I used here contains restatements. However, I suspect that it doesn’t because there’s an additional endpoint called “Financials As Reported”, which isn’t available on the free tier.
Alternative data
finnhub.io serves some interesting alternative data:
- Insider sentiment
- Social media sentiment
- Senate lobbying disclosures
- Government spending
- Visa applications submitted by companies
- Many others
Example: insider sentiment
The “Insider Sentiment” endpoint serves the Monthly Share Purchase Ratio (MSPR), which is a measure of whether insider transactions are dominated by buying (a value closer to 100) or selling (a value closer to -100) over the previous month. You can read about the MSPR and how it’s calculated here.
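As a loose illustration of a purchase-ratio style metric – not necessarily finnhub.io’s exact formula, see their docs linked above – a net-buying measure bounded between -100 and 100 can be built like this:

# Illustrative only: a generic purchase ratio of the form
# 100 * (buys - sells) / (buys + sells), bounded between -100 and 100.
# Finnhub's exact MSPR construction may differ.
def purchase_ratio(shares_bought, shares_sold):
    total = shares_bought + shares_sold
    return 100 * (shares_bought - shares_sold) / total if total else 0.0

print(purchase_ratio(3000, 1000))  # 50.0 -> net buying
print(purchase_ratio(500, 4500))   # -80.0 -> net selling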
For now, let’s just grab 10 years’ worth of this data for a single stock and plot it against the stock’s forward returns, on the understanding that we don’t have enough data to say anything meaningful (around 120 observations total).
This API can understand human-readable date strings, unlike the prices API.
# Get ten years' worth of insider sentiment data
# api query parameters
symbol = 'AAPL'
from_date = '2013-01-01'
to_date = '2023-02-21'
is_response = finnhub_client.stock_insider_sentiment(
    symbol,
    from_date,
    to_date
)
# Print the first few records
display(is_response['data'][:4])
[{'symbol': 'AAPL', 'year': 2013, 'month': 1, 'change': 246, 'mspr': 0.2078967}, {'symbol': 'AAPL', 'year': 2013, 'month': 2, 'change': 3934, 'mspr': 42.06587}, {'symbol': 'AAPL', 'year': 2013, 'month': 3, 'change': -26460, 'mspr': -20.025429}, {'symbol': 'AAPL', 'year': 2013, 'month': 4, 'change': -9818, 'mspr': -21.024454}]
# Make a dataframe from insider sentiment data
is_df = pd.DataFrame(is_response['data'])
is_df.head(10)
| | symbol | year | month | change | mspr |
|---|---|---|---|---|---|
0 | AAPL | 2013 | 1 | 246 | 0.207897 |
1 | AAPL | 2013 | 2 | 3934 | 42.065870 |
2 | AAPL | 2013 | 3 | -26460 | -20.025429 |
3 | AAPL | 2013 | 4 | -9818 | -21.024454 |
4 | AAPL | 2013 | 6 | -321301 | -28.045816 |
5 | AAPL | 2013 | 7 | -79734 | -48.365260 |
6 | AAPL | 2013 | 8 | -75200 | -34.034233 |
7 | AAPL | 2013 | 9 | -28171 | -31.069471 |
8 | AAPL | 2013 | 10 | 1930 | 3.064367 |
9 | AAPL | 2013 | 11 | -4486 | -100.000000 |
Notice that we have some missing data – above we don’t have a record for month 5. We’ll need to deal with that later.
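One quick way to see exactly which months are missing is to compare against a complete monthly range (a small convenience check, not part of the original response):

# Find the year/month combinations with no insider sentiment record
have = pd.to_datetime(is_df[['year', 'month']].assign(day=1)).dt.to_period('M')
full_range = pd.period_range('2013-01', '2023-02', freq='M')
print(full_range.difference(pd.PeriodIndex(have)))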
I’ll also need to lag the mspr field (the sentiment score) to align it with the date on which it would have been actionable. We’ll do this once we have some price data.
Next we’ll grab some monthly price data and munge it for our needs:
resolution = 'M'
unix_from_date = unix_timestamp_from_date(from_date)
unix_to_date = unix_timestamp_from_date(to_date)
price_response = finnhub_client.stock_candles(
    symbol,
    resolution,
    unix_from_date,
    unix_to_date
)
# Ditch the status code
try:
    price_response.pop('s')
except KeyError as e:
    print("Already ditched status code")

# Create dataframe from remaining records
price_df = pd.DataFrame(price_response) \
    .rename(columns={
        'c': 'close',
        'h': 'high',
        'l': 'low',
        'o': 'open',
        't': 'timestamp',
        'v': 'volume'
    })
# Ditch stuff we won't need
price_df = price_df[['timestamp', 'close']]
# Convert timestamp to human-readable date format
price_df['date'] = pd.to_datetime(price_df['timestamp'], unit='s')
# create year and month fields
price_df['year'] = price_df['date'].dt.year
price_df['month'] = price_df['date'].dt.month
price_df.head()
| | timestamp | close | date | year | month |
|---|---|---|---|---|---|
0 | 1356998400 | 16.267479 | 2013-01-01 | 2013 | 1 |
1 | 1359676800 | 15.764271 | 2013-02-01 | 2013 | 2 |
2 | 1362096000 | 15.809271 | 2013-03-01 | 2013 | 3 |
3 | 1364774400 | 15.813561 | 2013-04-01 | 2013 | 4 |
4 | 1367366400 | 16.061950 | 2013-05-01 | 2013 | 5 |
You can see that our data isn’t going to overlap neatly. There are some months in our sentiment dataframe that are missing from our prices dataframe and vice versa.
I’ll merge the two dataframes using an outer join – this will preserve all of our data and insert NaN into any record that’s missing.
From there, I can lag the sentiment score appropriately. Only then can I remove records that contain NaN (doing so prior would misalign some of our data).
df = pd.merge(price_df, is_df, on=['year', 'month'], how='outer') \
    .sort_values(['year', 'month']) \
    .set_index(keys='date')
df.head(10)
| date | timestamp | close | year | month | symbol | change | mspr |
|---|---|---|---|---|---|---|---|
2013-01-01 | 1356998400 | 16.267479 | 2013 | 1 | AAPL | 246.0 | 0.207897 |
2013-02-01 | 1359676800 | 15.764271 | 2013 | 2 | AAPL | 3934.0 | 42.065870 |
2013-03-01 | 1362096000 | 15.809271 | 2013 | 3 | AAPL | -26460.0 | -20.025429 |
2013-04-01 | 1364774400 | 15.813561 | 2013 | 4 | AAPL | -9818.0 | -21.024454 |
2013-05-01 | 1367366400 | 16.061950 | 2013 | 5 | NaN | NaN | NaN |
2013-06-01 | 1370044800 | 14.161771 | 2013 | 6 | AAPL | -321301.0 | -28.045816 |
2013-07-01 | 1372636800 | 16.161771 | 2013 | 7 | AAPL | -79734.0 | -48.365260 |
2013-08-01 | 1375315200 | 17.400550 | 2013 | 8 | AAPL | -75200.0 | -34.034233 |
2013-09-01 | 1377993600 | 17.026771 | 2013 | 9 | AAPL | -28171.0 | -31.069471 |
2013-10-01 | 1380585600 | 18.667911 | 2013 | 10 | AAPL | 1930.0 | 3.064367 |
Now we can calculate close-to-close monthly returns and lag the sentiment score:
# Calculate monthly close-to-close returns
df['return'] = np.log(df.close/df.close.shift())
# Lag sentiment score
df['mspr_lag'] = df['mspr'].shift()
df.head(10)
| date | timestamp | close | year | month | symbol | change | mspr | return | mspr_lag |
|---|---|---|---|---|---|---|---|---|---|
2013-01-01 | 1356998400 | 16.267479 | 2013 | 1 | AAPL | 246.0 | 0.207897 | NaN | NaN |
2013-02-01 | 1359676800 | 15.764271 | 2013 | 2 | AAPL | 3934.0 | 42.065870 | -0.031422 | 0.207897 |
2013-03-01 | 1362096000 | 15.809271 | 2013 | 3 | AAPL | -26460.0 | -20.025429 | 0.002850 | 42.065870 |
2013-04-01 | 1364774400 | 15.813561 | 2013 | 4 | AAPL | -9818.0 | -21.024454 | 0.000271 | -20.025429 |
2013-05-01 | 1367366400 | 16.061950 | 2013 | 5 | NaN | NaN | NaN | 0.015585 | -21.024454 |
2013-06-01 | 1370044800 | 14.161771 | 2013 | 6 | AAPL | -321301.0 | -28.045816 | -0.125907 | NaN |
2013-07-01 | 1372636800 | 16.161771 | 2013 | 7 | AAPL | -79734.0 | -48.365260 | 0.132102 | -28.045816 |
2013-08-01 | 1375315200 | 17.400550 | 2013 | 8 | AAPL | -75200.0 | -34.034233 | 0.073853 | -48.365260 |
2013-09-01 | 1377993600 | 17.026771 | 2013 | 9 | AAPL | -28171.0 | -31.069471 | -0.021715 | -34.034233 |
2013-10-01 | 1380585600 | 18.667911 | 2013 | 10 | AAPL | 1930.0 | 3.064367 | 0.092019 | -31.069471 |
An easy way to look for a relationship between this sentiment metric and next month’s returns is to just scatter plot them:
sns.lmplot(data=df, x='mspr_lag', y='return', height=4, aspect=1.5);
Unfortunately, we don’t see much of any kind of relationship.
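To put a rough number on that, we can look at the rank correlation between the lagged sentiment score and the next month’s return (pandas handles the NaNs pairwise):

# Spearman rank correlation between lagged sentiment and monthly return
print(df[['mspr_lag', 'return']].corr(method='spearman'))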
The sentiment metric does jump around quite a bit – let’s see if smoothing it with a 3-month moving average reveals anything interesting:
df['mspr_roll'] = df['mspr_lag'].rolling(window=3).mean()
df
| date | timestamp | close | year | month | symbol | change | mspr | return | mspr_lag | mspr_roll |
|---|---|---|---|---|---|---|---|---|---|---|
2013-01-01 | 1356998400 | 16.267479 | 2013 | 1 | AAPL | 246.0 | 0.207897 | NaN | NaN | NaN |
2013-02-01 | 1359676800 | 15.764271 | 2013 | 2 | AAPL | 3934.0 | 42.065870 | -0.031422 | 0.207897 | NaN |
2013-03-01 | 1362096000 | 15.809271 | 2013 | 3 | AAPL | -26460.0 | -20.025429 | 0.002850 | 42.065870 | NaN |
2013-04-01 | 1364774400 | 15.813561 | 2013 | 4 | AAPL | -9818.0 | -21.024454 | 0.000271 | -20.025429 | 7.416113 |
2013-05-01 | 1367366400 | 16.061950 | 2013 | 5 | NaN | NaN | NaN | 0.015585 | -21.024454 | 0.338662 |
… | … | … | … | … | … | … | … | … | … | … |
2022-10-01 | 1664582400 | 153.340000 | 2022 | 10 | AAPL | -1297681.0 | -30.305650 | 0.103956 | 100.000000 | NaN |
2022-11-01 | 1667260800 | 148.030000 | 2022 | 11 | AAPL | -20200.0 | -100.000000 | -0.035243 | -30.305650 | -3.333313 |
2022-12-01 | 1669852800 | 129.930000 | 2022 | 12 | NaN | NaN | NaN | -0.130419 | -100.000000 | -10.101883 |
2023-01-01 | 1672531200 | 144.290000 | 2023 | 1 | NaN | NaN | NaN | 0.104829 | NaN | NaN |
2023-02-01 | 1675209600 | 152.550000 | 2023 | 2 | AAPL | -1685.0 | -5.882353 | 0.055667 | NaN | NaN |
122 rows × 10 columns
sns.lmplot(data=df, x='mspr_roll', y='return', height=4, aspect=1.5);
The relationship is still super noisy. If anything, it’s a little opposite to what we’d expect, especially at the extreme values of our smoothed metric.
Maybe the SEC is succeeding in stamping out insider trading.
The reality is that insiders buy and sell for all sorts of reasons, many of which have nothing to do with their views on the stock itself.
As I mentioned, there aren’t a whole lot of data points here – perhaps this would be interesting to revisit using a broader universe of stocks.
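If you wanted to take that idea further, a simple loop over a list of tickers would do it. Here’s a rough sketch – the universe is hypothetical, and the sleep keeps us inside the free tier’s 50-calls-per-minute limit:

# Hedged sketch: gather insider sentiment across a small universe of stocks
symbols = ['AAPL', 'MSFT', 'AMZN', 'GOOGL']  # hypothetical universe
frames = []
for sym in symbols:
    resp = finnhub_client.stock_insider_sentiment(sym, '2013-01-01', '2023-02-21')
    frames.append(pd.DataFrame(resp['data']))
    time.sleep(1.5)  # respect the rate limit
universe_df = pd.concat(frames, ignore_index=True)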
Conclusion
finnhub.io is a modern market data service with a well-documented REST API, open source client libraries, broad coverage, interesting data sets, and a generous free tier.
What’s your preferred source of market data and why? Let me know in the comments!
Thanks Kris. I enjoyed this blog and found it interesting how much data there is and how reasonably easy it is to fetch, process, and do some visual analysis of it.
Looking forward to your next blog.
Cheers,
Alen
Thanks for reading Alen! Same here – it’s such a luxury having all that data at your fingertips and so easy to work with.
Thanks Kris, enjoyed the post. Suss on the value of the alternative data, but we really won’t know till it’s tested.
Hey Steven, yeah I agree. My personal experience is that much of that stuff tends to be correlated with contemporaneous returns, but not very useful in trying to predict forward returns. I’m sure there are exceptions out there, as well as novel use cases.
Thank you!
Very useful – I’m kind of standing on two legs these days. I learned a ton of R coding during your classes in RW, but I also see the use of doing stuff in Python from time to time. I’m currently working with Airflow (thanks to Ajet of RW), where Python comes in handy as well, although running R code with Airflow also works using a sidecar.
Wes
Hey Wes! Thanks for the comment. I’m with you on that one… I like doing data analysis in R more than any other language I’ve tried, but for data plumbing type applications, Python is usually my first choice.