[HFWWT 2]: Walk-Through 2
Trying Tiingo to find missing tickers from HFWWT 1
In HFWWT 1, we set up our script, accessed the necessary API keys, and scraped the State Street website for a list of the top 10 holdings in each of the 10 main sector ETFs. The list was great, but it did not include tickers, so we began our search using the AlphaVantage API to match tickers to company names. Unfortunately, the search endpoint was only able to find 30% of the tickers, so we decided we'd have to use another API. Today, we try Tiingo. We'll assume you've loaded most of the same script from part 1. We'll repeat the lines of code we used to make requests to the AlphaVantage search endpoint and then add the Tiingo requests.
# Get company names
companies = top_5['name'].to_list()
companies = [extract_words(name) for name in companies]
# Try AlphaVantage
tickers_av = []
missing_av = []
for company in companies:
    try:
        ticker = get_ticker_alpha_vantage(company, av_api_key)
        tickers_av.append(ticker)
    except Exception as e:
        missing_av.append(company)
        print(f"Error for {company}: {e}")
assert len(tickers_av) + len(missing_av) == 50, "You lost some tickers!"
if len(missing_av) > 0:
    print(f"Missing {len(missing_av)} tickers!")
else:
    print("Good to go!")
# Try Tiingo
headers = {
    'Content-Type': 'application/json'
}
comp_info = []
for company in companies:
    url = f"https://api.tiingo.com/tiingo/utilities/search?query={company}&token={ti_api_key}"
    r = requests.get(url, headers=headers)
    data = r.json()
    comp_info.append(data)
ticker_dict = {}
missing_ti = []
for idx, info in enumerate(comp_info):
    try:
        # Take the first search hit as the best match
        ticker_dict[info[0]["name"]] = info[0]["ticker"]
    except IndexError as e:
        print(f"Missing {companies[idx]} due to {e}")
        missing_ti.append(companies[idx])
tickers_ti = list(ticker_dict.values())
assert len(tickers_ti) + len(missing_ti) == 50, "You lost some tickers!"
if len(missing_ti) > 0:
    print(f"Missing {len(missing_ti)} tickers!")
else:
    print("Good to go!")
Having the assert line in there helps, especially if you accidentally copy over missing_av instead of missing_ti, as we did. Now, we'd like to point out that this code doesn't exactly follow the don't repeat yourself (DRY) principle. We could have, of course, turned the endpoint requests into separate functions, or even one function, though that would probably get a little unwieldy. Instead, we kept it procedural, as we don't intend to perform this search too many times, as you'll soon see.
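For readers who do want to see what a DRY version might look like, here is a minimal sketch. It is not part of our script: the names collect_tickers and fake_lookup are hypothetical, and fake_lookup stands in for a real API call so the example runs offline. The idea is simply to pass the lookup in as a function, so the loop, the error handling, and the sanity check are written once.

```python
from typing import Callable, Dict, List, Tuple


def collect_tickers(
    companies: List[str],
    lookup: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Run `lookup` on each company name and return (found, missing)."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for company in companies:
        try:
            found[company] = lookup(company)
        except Exception as e:  # broad on purpose, like the AlphaVantage loop
            print(f"Error for {company}: {e}")
            missing.append(company)
    # Same sanity check as before, without the hard-coded 50.
    # It also trips if a company name appears twice in the input.
    assert len(found) + len(missing) == len(companies), "You lost some tickers!"
    return found, missing


# Hypothetical stand-in for a real search endpoint
def fake_lookup(company: str) -> str:
    table = {"Linde": "LIN", "Netflix": "NFLX"}
    return table[company]  # raises KeyError for unknown companies


found, missing = collect_tickers(["Linde", "Netflix", "Acme"], fake_lookup)
print(found)    # {'Linde': 'LIN', 'Netflix': 'NFLX'}
print(missing)  # ['Acme']
```

Because each call returns its own missing list, there is no second variable to copy over by mistake.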
The print function reveals that with Tiingo we are "Missing 11 tickers!", a much better result, but still substantial. Let's now have a look at the tickers that we actually have in each list.
# Print AlphaVantage tickers
print(tickers_av)
>>> ['LIN', 'APD', 'ECL', 'NFLX', 'DIS', 'XOM', 'CVX', 'EOG', 'WMB', 'AMJB', 'BAC', 'GE', 'CAT', 'RTX', 'HON', 'UBER']
# Print Tiingo tickers
print(tickers_ti)
>>> ['LIN', 'APD', 'ECL', 'NFLX', 'DIS', 'XOM', 'CVX', 'EOG', 'WMB', 'JPM-P-D', 'BML-P-L', 'GEHCL', 'CAT', 'RTX', 'HON', 'UBER', 'AAPL', 'NVDA', 'MSFT', 'AVGO', 'CRM', 'COST', 'WMT', 'PG', 'CCHGY', 'PEP', 'NEEPR', 'SOJE', 'CEG', 'DUK-P-A', 'VST', 'UNH', 'JNJ', 'ABBV', 'MKGAF', 'AMZN', 'TSLA', 'HD', 'BKNG']
# Print difference
print(set(tickers_ti).difference(set(tickers_av)))
>>> {'SOJE', 'UNH', 'DUK-P-A', 'BKNG', 'TSLA', 'JPM-P-D', 'MSFT', 'AAPL', 'VST', 'MKGAF', 'AMZN', 'HD', 'WMT', 'PG', 'CRM', 'AVGO', 'CEG', 'JNJ', 'ABBV', 'COST', 'NEEPR', 'NVDA', 'CCHGY', 'PEP', 'BML-P-L', 'GEHCL'}
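Note that difference only runs one way: it shows what Tiingo returned that AlphaVantage did not. As a small sketch, using the two lists exactly as printed above, we can also look at the overlap and at the reverse direction. The variable names both, only_av, and only_ti are ours, chosen for illustration.

```python
# Lists copied from the printouts above
tickers_av = ['LIN', 'APD', 'ECL', 'NFLX', 'DIS', 'XOM', 'CVX', 'EOG', 'WMB',
              'AMJB', 'BAC', 'GE', 'CAT', 'RTX', 'HON', 'UBER']
tickers_ti = ['LIN', 'APD', 'ECL', 'NFLX', 'DIS', 'XOM', 'CVX', 'EOG', 'WMB',
              'JPM-P-D', 'BML-P-L', 'GEHCL', 'CAT', 'RTX', 'HON', 'UBER',
              'AAPL', 'NVDA', 'MSFT', 'AVGO', 'CRM', 'COST', 'WMT', 'PG',
              'CCHGY', 'PEP', 'NEEPR', 'SOJE', 'CEG', 'DUK-P-A', 'VST', 'UNH',
              'JNJ', 'ABBV', 'MKGAF', 'AMZN', 'TSLA', 'HD', 'BKNG']

av, ti = set(tickers_av), set(tickers_ti)

both = av & ti      # tickers the two sources agree on
only_av = av - ti   # returned by AlphaVantage only
only_ti = ti - av   # returned by Tiingo only (same as the difference above)

print(len(both))        # 13
print(sorted(only_av))  # ['AMJB', 'BAC', 'GE']
print(len(only_ti))     # 26
```

The three AlphaVantage-only tickers sit at the same positions in the list as JPM-P-D, BML-P-L, and GEHCL do in the Tiingo list, which may mean the two APIs resolved the same three company names differently.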
A few comments. First, you may not get exactly the same printout as above due to differences on the State Street website, the search endpoints, the API, or general randomness. Second, even though the AlphaVantage list is missing a lot of tickers, most of the ones it found are accurate except for AMJB. This is where domain knowledge comes in. While we may not know all the tickers for most of the big companies in each of the major S&P 500 sectors, we're familiar enough with them to notice something that looks amiss. AMJB is not familiar to us, and that's because when you search for it, it comes up as the Alerian MLP Index ETN. Definitely not a company!
On to Tiingo. While it has more tickers, it is definitely more problematic than AlphaVantage. For one, we see tickers with hyphens and tickers longer than four characters. A company generally only has a ticker longer than three characters if it trades on the NASDAQ or some other exchange, is foreign, or has a specific issue, like bankruptcy. OUCH! In general, however, most large companies have tickers that are no longer than four characters, GOOGL being a notable exception.
If we print the tickers with more than four characters we get the following:
# Print tickers longer than four characters
print([x for x in tickers_ti if len(x) > 4])
>>> ['JPM-P-D', 'BML-P-L', 'GEHCL', 'CCHGY', 'NEEPR', 'DUK-P-A', 'MKGAF']
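The same check can be broadened a little to catch hyphens as well as length, and to record why each ticker was flagged. This is only a sketch under our own assumptions: flag_suspicious is a hypothetical helper, and the four-character cutoff is the rule of thumb from the text, not a rule any exchange enforces.

```python
def flag_suspicious(tickers, max_len=4):
    """Return {ticker: [reasons]} for tickers that deserve a manual look."""
    flagged = {}
    for ticker in tickers:
        reasons = []
        if "-" in ticker:
            reasons.append("hyphen")
        if len(ticker) > max_len:
            reasons.append(f"longer than {max_len} chars")
        if reasons:
            flagged[ticker] = reasons
    return flagged


# A few tickers from the lists above, plus GOOGL as a known exception
sample = ['JPM-P-D', 'GEHCL', 'AAPL', 'CAT', 'GOOGL']
for ticker, reasons in flag_suspicious(sample).items():
    print(ticker, reasons)
# JPM-P-D ['hyphen', 'longer than 4 chars']
# GEHCL ['longer than 4 chars']
# GOOGL ['longer than 4 chars']
```

GOOGL gets flagged even though it is a perfectly good ticker, so a filter like this is best treated as a list of things to check by hand rather than a list of things to throw away.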
We have no idea where JPM-P-D comes from, or NEEPR for that matter. The upshot is that trying to resolve this issue is probably not worth the effort. On to our last resort: Yahoo! We'll examine that source in our next walk-through.