The plumbing is done. Now it's just fine-tuning the indexer to get the results we want. I'm happy with it so far. Comments needed.
Also need to find the right weighting.
It takes about 30 minutes to re-index the 50 million episodes. Have to figure out a good workflow for that. Maybe daily.
@dave Astoundingly accurate.
@ThewTheKooky Yes, I’m thinking 24 hours is a good starting point.
@dave I must confess to being a little confused about this. podcast:people is a new tag to help find people correctly, in a semantic way rather than just going through show notes and finding words.
https://api.podcastindex.org/api/1.0/search/episodes/byterm?q=dave+jones&pretty certainly finds everyone called Dave Jones. But it doesn't find *the* Dave Jones. Which it *could* if we were using the podcast:people tag.
So - and perhaps I'm being a bit dim - why would you want to be promoting this as a "people search" when it isn't that at all?
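For anyone wanting to try the term search from a script, here is a minimal sketch of how the URL quoted above can be built. It only constructs the URL; actually sending the request, and whatever authentication the API expects, is left out. The helper name `episode_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in the thread.
BASE_URL = "https://api.podcastindex.org/api/1.0/search/episodes/byterm"

def episode_search_url(term: str, pretty: bool = True) -> str:
    """Build the term-search URL (hypothetical helper, not part of any library)."""
    # urlencode turns spaces into '+', so "dave jones" becomes "dave+jones".
    url = f"{BASE_URL}?{urlencode({'q': term})}"
    if pretty:
        # 'pretty' is a bare flag in the quoted URL, so it is appended without a value.
        url += "&pretty"
    return url

print(episode_search_url("dave jones"))
```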
@jamescridland It’s just the framework for what’s to come when we get the person tag baked. When we get that tag it’s still going to take time for adoption, so I built a backing search to fill in the gaps.
Once we start indexing the person tag those results will get weighted up higher.
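A rough sketch of what "weighted up higher" could look like, assuming each hit carries a text-match score and a flag saying whether it came from a person tag. The `Hit` fields and the boost value are assumptions for illustration, not how the index actually scores results.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    episode_id: int
    text_score: float      # score from the backing text search (assumed field)
    from_person_tag: bool  # True if the match came from a person tag (assumed field)

# Hypothetical multiplier; the right weighting is still an open question.
PERSON_TAG_BOOST = 2.0

def rank(hits, boost=PERSON_TAG_BOOST):
    """Sort hits so that person-tag matches are boosted above plain text matches."""
    def final_score(hit):
        return hit.text_score * (boost if hit.from_person_tag else 1.0)
    return sorted(hits, key=final_score, reverse=True)

hits = [Hit(1, 0.9, False), Hit(2, 0.6, True), Hit(3, 0.4, False)]
print([h.episode_id for h in rank(hits)])  # -> [2, 1, 3]
```

With a boost of 2.0, the person-tag hit (0.6 becomes 1.2) outranks the stronger plain-text hit (0.9).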
What I’ve done is import the entire English-language dictionary as a stop-word list, so that the only things left for it to pick up are proper names. I think it’ll be a good base to work from.
@jamescridland Also, I kept getting asked for this feature so I didn’t want to make people keep waiting until the tag to get something they could use.
@jamescridland To say it a bit more clearly, because of the enormous stop-word list, the only things this endpoint _can_ pick up are proper names of people. (English-wise at least)
@martin @jamescridland Well, it's a little more complicated than that - as these things usually are. I took a full English-language dictionary and then subtracted names from it to fix the issue you're talking about. What's left over are problems like searching for "Tom Cotton", a US senator. "cotton" is still in the stop-word list, but "tom" isn't. So, going forward I'll have to refine the list to fix those things.
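The dictionary-minus-names idea, and the "Tom Cotton" problem, can be sketched in a few lines. The word lists here are tiny made-up samples and the function names are hypothetical; the real indexer presumably works differently.

```python
import re

def build_stop_words(dictionary_words, known_names):
    """Stop-word list = dictionary words minus known names (all lower-cased)."""
    names = {n.lower() for n in known_names}
    return {w.lower() for w in dictionary_words} - names

def candidate_name_tokens(text, stop_words):
    """Return the tokens in text that survive the stop-word filter."""
    tokens = re.findall(r"[A-Za-z']+", text)
    return [t for t in tokens if t.lower() not in stop_words]

# Made-up sample data, for illustration only.
dictionary_words = ["the", "senator", "spoke", "about", "cotton",
                    "tom", "farming", "with", "and"]
known_names = ["tom", "dave"]

stop_words = build_stop_words(dictionary_words, known_names)
text = "Senator Tom Cotton spoke with Dave Jones about farming"
print(candidate_name_tokens(text, stop_words))  # -> ['Tom', 'Dave', 'Jones']
```

"Tom" survives because it was subtracted from the dictionary as a name, but "Cotton" is filtered out as an ordinary word, which is exactly the kind of case the list needs refining for.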
@martin @jamescridland It's something that should be crowd-sourced. But, I'm not sure how deep to go until we get the person tag implemented. It's really just a stop gap for now. I made it public mostly because it was giving some real good results.
I need to surface those results somehow in the meantime.
@dave @martin @jamescridland by accident I searched for myself in Apple's podcast app and I was quite surprised when it showed me the guest appearance I made on Delingpole's podcast a few months back.
Apple are certainly surfacing search results from episode descriptions in a way that PodcastIndex doesn't do at present.
Search for brianoflondon and you see Delingpod.