383: Repositioning Podscan: From Monitoring to Data Platform

Arvid:

Hey. It's Arvid, and this is The Bootstrapped Founder. Before we get to my latest positioning efforts, a word from paddle.com, the merchant of record and sponsor of this episode. So I've been using Paddle for everything really related to making money with my recent businesses, and they do more so that I have to do less. And in fact, they believe that the world needs MOR, and that's M-O-R, merchant of record, and I very much agree.

Arvid:

And I also really enjoy a nice nerdy wordplay. They deal with sales taxes, invoices, credit card charges, fraud, that kind of stuff, so I can focus on my own business. I get more of my time back, and they handle more of the behind-the-scenes complexity than I ever could. So, yeah, you probably should check out paddle.com to learn more. Coming out of MicroConf, I've gained some clarity about something that I've been feeling for a while.

Arvid:

I talked about it last week in the hotel room as I recorded last week's podcast episode, but I've had some time to think since then, mostly because right after that, I went to the reception and talked to so many people, and then talked to more people. And then on the plane ride back home, I thought a lot about all of that and came to a certain conclusion. I think it's time to reposition Podscan. I have to shift away from the initial thing that I built it as. It was kind of Google Alerts for podcasts.

Arvid:

That was the initial idea: alerting and monitoring. Those were the main features, because they were probably also the easiest ones to build initially to validate the idea. And I have to move away from this a little bit, because I have to move more decisively toward what is already the state of the platform: a comprehensive podcast database. I have to embrace what it is over what it was initially built to do. This is not just a cosmetic change.

Arvid:

It's a genuine pivot, really, in focus, my focus, and in the product's functionality, one that aligns better with how people actually use the product, and also with who I want to use the product more, based on the things that I see people do with it. Over the last couple of months, I've watched how customers interact with the platform. The most telling indicator for me is API usage, because I built this whole platform as an API-first product. Everything the platform does in the interface, everything you can click around and do, can also be done as an API call, as an automated feature.

Arvid:

And more and more customers are quite intensively using the API to access the data that Podscan offers, which at this point means tracking 3,000,000 podcasts with a lot of metadata: who's on the show, who's hosting the show, where the show is hosted, recency of episodes, a lot of demographics, that kind of stuff. Plus what is now, I think, 27,000,000 actual podcast episodes, with transcripts for a massive majority of them, just over 20-some million transcripts in the database itself. That's a lot of data, and it's more interesting to access it through an API than by clicking through stuff. Makes perfect sense. Right? The API is a really good way to get specific info out of this vast amount of data.

Arvid:

Some clients are making API requests in the six figures every single day at this point. That's my ideal customer, and that's quite significant. It tells me something about the importance of the product and the value that we're providing to those particular customers. I've been getting feature requests and ideas from my customers because they're using this to do real stuff, to actually solve real jobs to be done. People have been asking more and more about data exports, and not just a couple of things from a list that they might have inside of Podscan, but actual data exports of their full backlog of transcripts that sit in the database.

Arvid:

And to put that into perspective, I just ran an actual query. We have roughly 18,000,000 fully transcribed episodes that have passed all our completeness checks, with many more waiting to be processed or ready for a second pass to improve accuracy. Right? That's how I get to the 20-plus million. When customers ask for direct access to this kind of data volume, they're telling me something essential about what they value and what they need.

Arvid:

I think this mostly is a machine learning effort or AI training effort. That's probably what this data is going to be used for. At least those are the companies that are asking for it. So I see something happening there and it's not just me wanting them to ask me to do it, which was the state of mind that I had in the past. Now I'm actually getting these emails.

Arvid:

So this repositioning isn't just about how Podscan presents itself to the world. It's also about what I prioritize in development and where we focus our resources in the product. And the shift centers around two key principles. One is data agility. That's simply how quickly the data is available.

Arvid:

And then there's data fidelity, which is about ensuring the data is better: more reliable, more accurate, more complete. That's what fidelity is about. I think agility is kind of part of it. But I want to stress these two because I want the platform to be faster, so people get their results quicker, and I want it to be more precise and more accurate, so people don't have to build a lot of their own heuristics for detecting errors. I want to do that for them.

Arvid:

And those principles are now driving the development roadmap for Podscan much more than before. I'm focusing on features and on things that customers have been requesting for a while, but that previously felt unfeasible or too complicated or maybe not worth the effort. But now that I see people actually using this, and with the new direction clarified, these features have moved to the foreground, because, let's just be honest, these customers also pay more. It's a fact that once you use the API significantly, you have to be on a more expensive plan. And I recently implemented metered billing for plans that exceed 5,000 API requests a day, which means that for every couple thousand API requests beyond that, I get more money.
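To make that billing model concrete, here's a minimal sketch of how such an overage calculation could work. Only the 5,000-requests-per-day inclusion comes from the episode; the block size and price per block are made-up illustration values, not Podscan's actual pricing.

```python
# Hypothetical sketch of a metered-billing overage calculation.
# Only the 5,000-requests-per-day quota is stated in the episode;
# BLOCK_SIZE and PRICE_PER_BLOCK are assumed illustration values.
import math

INCLUDED_REQUESTS_PER_DAY = 5_000  # included in the base plan (stated)
BLOCK_SIZE = 1_000                 # assumed billing granularity
PRICE_PER_BLOCK = 5.00             # assumed price per extra block, in USD

def daily_overage_charge(requests_today: int) -> float:
    """Charge for one day: every started block above the included quota."""
    overage = max(0, requests_today - INCLUDED_REQUESTS_PER_DAY)
    blocks = math.ceil(overage / BLOCK_SIZE)
    return blocks * PRICE_PER_BLOCK

print(daily_overage_charge(4_200))   # under the quota: 0.0
print(daily_overage_charge(12_500))  # 7,500 over -> 8 started blocks -> 40.0
```

Billing per started block rather than per exact request keeps invoices predictable; the real system may round differently.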

Arvid:

So allowing people to get more data, and putting more interesting stuff in the data that makes them want even more data, is clearly a matter of directional alignment. So for me, it's very important to focus on features that provide this. What's fascinating about this pivot is how it's forcing me to focus much more intently on the specific ways that my customers use that data. And that varies somewhat by customer segment, but there are common patterns emerging. One major use case that keeps coming up is host and guest tracking.

Arvid:

A lot of the people that look into the data try to recognize people in those shows. Right? Many users aren't just interested in the content of the conversation, which is interesting to begin with, but they wanna know about the people behind the voices. They're asking things like how trustworthy are these people that are speaking there? How can I figure out more about them?

Arvid:

Who is saying what? What are the credentials of this host or this guest? What's their background expertise? What other shows have they been on? Is their reputation out there somewhere?

Arvid:

And it's not just the content, it's about context at this point. Who someone is shapes how we interpret what they say. So any kind of additional information I could give toward that end is useful. I've actually been extracting this kind of information for months, but in a display only form. So for every episode that I have, I extract this data and I say, this is the host, this is the guest, and I extract the names, the social links in their profiles, maybe their website, their occupation, if I could figure it out from the episode content, which was likely the case, and that was it.

Arvid:

And when a new episode for that podcast came in, I would do the same data extraction again. Sometimes the new extraction would be better, more precise, and sometimes a little bit worse than previous attempts, because it was always done on this per-episode basis. Right? Sometimes people would say what their occupation is and I could pick it out, and sometimes they wouldn't and I wouldn't really know. I would know their name, but I wouldn't know: is this the same person or not?

Arvid:

So now I'm working on something much more powerful, and that is entity tracking. The concept is quite simple, but game-changing for the data fidelity and the interconnectedness of the data. If a person can be reliably detected by their name on a show, then when the same name with the same social media handles, and maybe the same occupation, shows up on another show, well, it's likely the same exact person. And this means that I can say about this one person: they appear here as a host, here as a guest, they're mentioned in this episode, and they've sponsored that one. All in one.

Arvid:

One entity linked to all these things, with all these roles. Entity recognition and attribution are becoming core capabilities of the data platform. And I'm super excited about this. Because if you think about the fact that we ingest 50,000 podcast episodes every single day, each of which has at least one person attached, likely two because there's a guest, and maybe a sponsor, and maybe they talk about a couple of people, that means that data flows into the system already highly connected with other things. Right?

Arvid:

Who talked about this person, and where? That now becomes an actual question you can ask of the API, or of the data behind the API, if you phrase it right, and you get an answer. It becomes very, very interesting. And unlike before, where this information was just layered on top of transcription data, we're now tracking these things as entities in their own database. And this opens up incredible possibilities, like following the same person across all podcasts they've ever appeared on, or connecting appearances on one podcast with mentions on other shows that happened right after, or, maybe the most interesting thing, creating a graph of interconnections between people in the podcast ecosystem, either visualized to make it interesting, or as a data model to run queries on if you have a graph database. Very exciting.

Arvid:

This is extremely valuable for people who are using Podscan for research and outreach purposes. Imagine being able to query: give me a list of all people who have been on this show and on these five other shows, and who have been mentioned over there. That's the stuff we actually want. Right? If we wanna find a new guest, we wanna find out where they have been talked about and where they have talked, and that was super hard to get in the past.
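That kind of multi-show query boils down to set operations over appearance and mention data. Here's a toy sketch with made-up shows and people, just to show the shape of the query:

```python
# Toy sketch of the "appeared on these shows AND mentioned over there"
# query as plain set operations. All shows and people are made up.
appearances = {
    "Show A": {"Ada", "Grace", "Linus"},
    "Show B": {"Ada", "Grace"},
    "Show C": {"Linus"},
}
mentions = {
    "Show D": {"Ada", "Alan"},
}

# People who appeared on both Show A and Show B...
on_both = appearances["Show A"] & appearances["Show B"]
# ...and were also mentioned on Show D.
result = on_both & mentions["Show D"]

print(sorted(result))  # ['Ada']
```

A real graph database would express this as a path query, but the underlying operation, intersecting sets of entities attached to shows, is the same.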

Arvid:

Now with entity tracking, it becomes very easy. And that kind of capability transforms Podscan from a useful tool into an essential data platform for other useful tools to be built on top of. Now, I won't sugarcoat it here: the recognition part is surprisingly complicated. Entity recognition is generally fraught with false positives wherever you look, but it's particularly hard with data that comes from audio, gets transcribed, and then is analyzed. There's always a little bit of loss along the way.

Arvid:

Because think about a scenario like somebody says, hey, it's me. I'm John from this kind of podcast. And they never mention their full name. So there's no additional information. No homepage.

Arvid:

No email address. No social media link. It's just John from podcast XYZ. Then I ingest another podcast and it says, hey, it's John from podcast ABC. Is that a different John?

Arvid:

Does one person have two podcasts? Is it the same John, or is it a very different John? There are lots of Johns. Right? Is he the host of one show and a guest on another?

Arvid:

It's hard to determine this programmatically with limited data. But this is where our approach gives us an edge, because we extract so much data from podcast episodes and the adjacent social media profiles, and we have been able to develop really reliable heuristics. We can say with, let's say, reasonable confidence that we know this person, we've seen them before on a similar show, and it's quite likely that this is the John we're talking about. Right?

Arvid:

John has hosted this podcast 10 times. It's probably the same John. And the system that I currently have running in my testing environment is quite reliable at this, but it took time to set up. The challenge was really finding the right balance between the flexibility and precision choices that you need to make in a heuristic, because you need enough leniency to handle slightly different data that still belongs to the same entity. Like, a sponsor might have different tracking links for each episode that they sponsor, say, a discount code in the URL, but they're still the same sponsor.

Arvid:

Still wanna recognize them as the same thing. But at the same time, you don't want two people with, like, similar sounding social media profile names to be automatically attributed to one individual. Like, that's just merging too much data. So a lot of stuff going on, a lot of interesting little technical challenges, but I know that there are entire businesses that focus solely on solving these problems. I've talked to them in the past to maybe solve my problems inside of Podscan for me.
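One common way to handle the tracking-link case is to canonicalize URLs before comparing them, stripping the parameters that vary per episode. A minimal sketch; which parameter names count as tracking noise is an assumption here, and a real system would keep a longer list:

```python
# Sketch of "same sponsor, different tracking links" normalization:
# strip per-episode query parameters (discount codes, UTM tags) so two
# episode-specific URLs collapse to one canonical sponsor identity.
# The set of parameters treated as tracking noise is an assumption.
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref", "code"}

def canonical_sponsor_url(url: str) -> str:
    # Lowercasing the whole URL is a simplification; paths can be
    # case-sensitive on some servers.
    parts = urlsplit(url.lower())
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"),
                       urlencode(kept), ""))

a = canonical_sponsor_url("https://sponsor.example/deal?code=PODCAST20")
b = canonical_sponsor_url("https://Sponsor.example/deal/?utm_source=ep42")
print(a == b)  # True: both collapse to the same canonical URL
```

The precision risk cuts the other way too, as noted above: normalize too aggressively and genuinely different entities get merged, so the stripped parameters have to be chosen conservatively.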

Arvid:

But for our purposes right now, the system works reliably for any kind of normal thing, like names of brands. For people with names that are often mistranscribed or very common, it became less reliable for a bit, but that's the nature of working with massive datasets. And I fixed it by just adding the mistranscriptions or more specifiers to the heuristic, and now it detects them pretty well. I'll just keep refining the system to make it better and more accurate. The goal here is to enable it as a searchable feature for everybody on the platform and then API centric stuff for people who build automations on top of it.

Arvid:

Right? You can search for a name and find all the episodes they're in, all the episodes they're mentioned in, and so on, turning the platform from this alerting and monitoring thing into a fully data-centric thing. And this product pivot is a consequence of being at MicroConf 2025 this year in New Orleans. I finally understood who my main customer is and should be. And in the week leading up to it, I had a couple of conversations with a marketing expert who helped me see this more clearly.

Arvid:

And that said, I'm not abandoning any of the existing user segments, because I think there's still a progression here. There's a pattern that I've observed, a transition path that customers often take, because many people start using Podscan manually, alone, often as the first person in their agency or marketing department to adopt the tool for a specific project. They just need something to track alerts. And then later, they might get others in their organization interested.

Arvid:

They invite them into a team, and eventually someone turns it into a more automated part of their business or their project or the job that they need done. And the agency then starts using the API. And that's when they upgrade to a higher plan because that's when the API features kick in that they need. I've seen several customers follow this exact path and it tells me that maintaining both the manual interface and the powerful API is important to get people to be API customers. Features just serve different stages of the customer journey.

Arvid:

It's a journey through the product to a higher plan. And this repositioning has clarified not just what Podscan is, but also what it's not. I don't think we're just a marketing monitoring tool anymore. I think Podscan has morphed into a comprehensive podcast data platform with quite unique capabilities, not only around entity tracking and relationship mapping, but around data connectivity in general.

Arvid:

And by focusing on this core value prop of having the best, the most comprehensive podcast data platform, I think we can make better decisions about where to invest development time. Features that enhance data quality, comprehensiveness, and even accessibility are priorities now, compared to monitoring capabilities that have kind of become secondary. They're still there. Right? I've built this in the past.

Arvid:

It's still working because it doesn't ever change, but I'm not gonna refine that too much. I'm gonna dive deep into the data now. For existing users, this means that they would just have more powerful data capabilities and more reliable information, particularly for those using the API, just richer datasets, more sophisticated query possibilities. And that's exciting because I can also build features on top of that. But, you know, that's future talk.

Arvid:

And those who are considering Podscan right now probably have a clearer understanding of the value proposition. Well, they will soon, because I'm still not communicating this too well. I'm incredibly excited to move in this direction, building out entity tracking and attribution. That's just the right step. And it aligns with how the most engaged users are already using the platform, but I need to do something about how we present it.

Arvid:

So in the coming weeks, you will see these capabilities roll out both in the user interface and in the API, and you'll be able to follow entities across shows, understand connections between these things, and gain insights that simply were not possible before. But I have to do something about how I present the landing page of the product. Like, right now, it's a mix of everything. It's focused on alerting. It has API stuff.

Arvid:

It talks about every single thing kind of on the same level. And if the talk given at MicroConf gives me any kind of indication, then I should be moving the landing page toward a very platform-centric projection of what Podscan is, and move the alerting and all the other features down on the page, if not onto their own landing pages. I need to make very clear to potential customers what the main purpose of Podscan is, so I get more who are interested in API capabilities and fewer who just want alerting. Because if I want the customer journey to move toward being an API customer, then I should get people on board from day one, right, from the first time they see the product. So I really need to do this very soon.

Arvid:

Moving the self-determined positioning of the product into how I communicate about it on the landing page, and I really wanna get it done within the week. So I'm kinda tasking myself, just to stay accountable, to get this done before I record the next episode of this podcast. And I guess next week, you'll see if I have done it or not, but I just need to move a couple things around to make it more obvious. But that's thinking work, and thinking work takes time and the right place to do it. So, yeah, that's my task for myself for this week: to shift how I communicate the main purpose of Podscan, and who the main target customer, the main ICP, is for this product and for the whole business.

Arvid:

And I really wanna get it done because, you know, the platform is being developed and becoming better and better, and I need more people to actually understand that this would be for them. So that's the plan. Tune in next week. See what happens. And that's it for today.

Arvid:

Thank you for listening to The Bootstrapped Founder. You can find me on Twitter at @arvidkahl. If you wanna support me and the show, please share Podscan.fm with your professional peers who you think would benefit from having this amazing data API. It's a near real-time podcast database with a really well-designed push and pull API. So please spread the word to those who need to stay on top of the podcast ecosystem and present data to their own customers.

Arvid:

Thank you so much for listening. Have a wonderful day, and bye bye.

Creators and Guests

Arvid Kahl
Host
Empowering founders with kindness. Building in Public. Sold my SaaS FeedbackPanda for life-changing $ in 2019, now sharing my journey & what I learned.