When compiling hosting company stats, what sane limits should determine a host's size? Anchor.fm currently has 1,506,486 feeds in the Index, and a large percentage of them are garbage or old.
There's a lot of Spreaker garbage, too.
So, what should a realistic hosting stats page include? Only feeds that have published a new episode in the last X days? That doesn't seem right, since some podcasts are evergreen.
Maybe podcasts that have more than X episodes?
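One way to combine the two candidate filters is to count a feed if it published recently OR has a back catalog of some minimum size, so an evergreen show that stopped publishing still counts. This is a minimal sketch, not anyone's actual method: the `Feed` record and its field names are hypothetical, and the thresholds (365 days, 2 episodes) are arbitrary placeholders for the "X" values being debated.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Feed:
    # Hypothetical record; field names are assumptions, not the Index schema.
    url: str
    episode_count: int
    newest_item: Optional[datetime]  # publish date of the newest episode, if known


def counts_toward_host(feed: Feed, now: datetime,
                       max_age_days: int = 365, min_episodes: int = 2) -> bool:
    """Keep a feed if it published recently OR has enough episodes.

    Using OR (not AND) means an evergreen show with a deep back catalog
    still counts even if it stopped publishing long ago.
    """
    recent = (feed.newest_item is not None
              and now - feed.newest_item <= timedelta(days=max_age_days))
    return recent or feed.episode_count >= min_episodes


now = datetime(2021, 6, 1, tzinfo=timezone.utc)
old_one_episode = Feed("https://example.com/a.xml", 1, now - timedelta(days=900))
old_deep_catalog = Feed("https://example.com/b.xml", 50, now - timedelta(days=900))
print(counts_toward_host(old_one_episode, now), counts_toward_host(old_deep_catalog, now))
```

With these placeholder thresholds, the abandoned one-episode feed is dropped while the retired 50-episode show is kept.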
@dave This is what I debate with Podcast Industry Insights.
It's up to you to define your own labels. I consider it a "valid podcast" if it has downloadable media via an RSS feed and it's listed in a podcast catalog.
But I'm working on some new taxonomy and have joined the Podcast Taxonomy project to help guide industry-standard labels for stuff like this.
Even "active" and "inactive" are up for debate. That's why I changed some of my labeling.
@dave I would label that "inactive."
BTW, "podfade" probably can't be measured, since it's more about intention and communication (or lack thereof). So I think it's very important to avoid derogatory terms to describe inactive podcasts.
Except "podflash." That one's easy. :)
@theDanielJLewis When I say "junk" and "garbage," I just mean stuff like one-episode Anchor feeds where someone was messing around on a lark and it's now in the catalog forever. Or clone/fraud feeds.
@dave Yeah. That's where we need a good term that includes new podcasts, ongoing podcasts, and retired but timeless podcasts.
Everything in podcast pollution is, technically, still a podcast. But probably not something anyone would want to follow.
@dave To say, "It's not a podcast until you reach N episodes," is like telling a baby they're not a human until they reach a certain age.
@dave In a way, "active" could still describe a podcast like Serial. Even though it's not current/ongoing/publishing, it's still gaining lots of followers. But a third party can't measure that.
@dave I've thought about terms like "living," "sustained/maintained," "matured," "seasoned," "relevant," and more.
@dave "Relevant" is a bit more subjective, though.
@dave You could simply say "hosted podcast feeds," which is far more accurate and would also prevent Blubrry from being incorrectly ranked low. Unless you're looking at the enclosure URLs.
@theDanielJLewis I'm looking at both enclosure URLs and feed URLs to make it accurate. I plan to open source the compilation code when I feel good about its accuracy. Blubrry was my test case because it is so wrongly reported. If I can get it right, then I'll feel good about the logic.
@dave Libsyn will be slightly undercounted, too, if looking at only feed URLs, since many people use Libsyn with PowerPress.
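To illustrate why enclosure URLs matter for the Libsyn/PowerPress case: a feed served from someone's own WordPress site can still carry media hosted elsewhere. A minimal sketch, assuming a hand-maintained domain-to-host map (the `HOST_DOMAINS` entries and function names are hypothetical, and this is not the compilation code mentioned above), attributes a feed by majority vote over its enclosure domains and falls back to the feed URL only when no enclosure matches.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

# Hypothetical domain -> host map for illustration only. Real data would need
# many more entries, and redirect/stats prefixes can wrap another host's media
# URL and mask the real host; this sketch ignores that case.
HOST_DOMAINS = {
    "anchor.fm": "Anchor",
    "libsyn.com": "Libsyn",
    "blubrry.com": "Blubrry",
    "spreaker.com": "Spreaker",
}


def host_for_url(url: str) -> Optional[str]:
    """Match the URL's hostname against the map, including subdomains."""
    hostname = (urlparse(url).hostname or "").lower()
    for domain, host in HOST_DOMAINS.items():
        if hostname == domain or hostname.endswith("." + domain):
            return host
    return None


def attribute_feed(feed_url: str, enclosure_urls: Iterable[str]) -> Optional[str]:
    """Prefer the host serving most of the media; fall back to the feed URL's host."""
    votes = Counter(h for h in map(host_for_url, enclosure_urls) if h)
    if votes:
        return votes.most_common(1)[0][0]
    return host_for_url(feed_url)


# A self-hosted WordPress feed whose media lives on Libsyn.
print(attribute_feed("https://example.com/feed/podcast/",
                     ["https://traffic.libsyn.com/show/ep1.mp3"]))
```

Checking the feed URL alone would miss this feed entirely, which is the undercount described above.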
@theDanielJLewis I really have zero interest in "stats" other than to help me dedup this mountain of feeds. Also, the question comes up often about why we have a million and a half more feeds than Apple. And, I'd like to know how much of that is "junk".
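For the dedup side, one cheap first pass at catching verbatim clone feeds is to fingerprint each feed's set of enclosure URLs and group feeds whose fingerprints collide. This is a sketch under stated assumptions, not the Index's dedup logic: the input shape (feed URL mapped to its enclosure URLs) is hypothetical, and exact matching will miss clones that rehost the media under different URLs.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, List, Mapping


def enclosure_fingerprint(enclosure_urls: Iterable[str]) -> str:
    """Order-insensitive hash of a feed's distinct, whitespace-trimmed enclosure URLs."""
    normalized = sorted({u.strip() for u in enclosure_urls if u.strip()})
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


def group_clones(feeds: Mapping[str, Iterable[str]]) -> List[List[str]]:
    """Return groups of feed URLs that share an identical enclosure set."""
    groups = defaultdict(list)
    for feed_url, enclosures in feeds.items():
        urls = [u for u in enclosures if u.strip()]
        if not urls:
            continue  # feeds with no media would all collide on the empty hash
        groups[enclosure_fingerprint(urls)].append(feed_url)
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

Grouping rather than deleting leaves the decision about which copy is canonical (oldest, most fetched, and so on) as a separate step.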
@dave You have access to a stat, though I don't know if you use it yet: what is being requested/fetched from the Index. Surely an active feed is fetched more than a dead one (though a dead feed sitting in someone's podcatcher may get fetched until the end of time). In isolation it might be noisy, but in aggregate I think you'll find a signal.
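One way to damp the "fetched until the end of time" noise is to count distinct requesters per feed over a window instead of raw hits, so a single forgotten client polling a dead feed looks like an audience of one. A minimal sketch, assuming a hypothetical request log of `(feed_id, client_id)` pairs; what would actually identify a client in the Index's logs (API key, user agent, something else) is an open assumption, and `min_clients=3` is an arbitrary placeholder.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def distinct_clients_per_feed(requests: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct clients per feed over whatever window the log covers.

    Counting distinct clients, not raw hits, keeps one client that polls
    a dead feed forever from inflating that feed's apparent activity.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for feed_id, client_id in requests:
        seen[feed_id].add(client_id)
    return {feed: len(clients) for feed, clients in seen.items()}


def likely_active(requests: Iterable[Tuple[str, str]], min_clients: int = 3) -> Set[str]:
    """Feeds requested by at least `min_clients` distinct clients."""
    counts = distinct_clients_per_feed(requests)
    return {feed for feed, n in counts.items() if n >= min_clients}


# One client hammering f1 vs. three different clients each touching f2 once.
log = [("f1", "c1")] * 100 + [("f2", "c1"), ("f2", "c2"), ("f2", "c3")]
print(sorted(likely_active(log)))
```

Here f1 has 100 hits but only one requester, so only f2 clears the threshold.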