The Bootstrapped Founder | Transcript: 360: Product-Market Fit & Time-to-First-Value

360: Product-Market Fit & Time-to-First-Value

December 6, 2024 / 18:01 / E360

Arvid: 00:00

Hey. It's Arvid and this is the Bootstrapped Founder. This episode is sponsored by paddle.com, a company that truly has found product market fit, which is the topic of today's episode. And if you wanna focus your efforts on finding your product market fit, I highly recommend using Paddle because they take care of all the things like taxes, refunds, chasing after invoices for you. So check them out at paddle.com.

Arvid: 00:30

The journey to product market fit, which is the topic here, has always been fascinating to me. It's probably one of the most fascinating things in entrepreneurship. Because it's a journey of struggle and overcoming obstacles and all that. And my personal experience with PodScan has been quite enlightening and challenging. So while we're not yet profitable, we're really close but we're not.

Arvid: 00:51

We're getting that much closer every single month. There's constant growth and I wanna share some insights about how we're finding our way there with you here today. Because I think there's something to learn, or at least something to maybe try to replicate, attempt, or change in your own ways. So from the start, I struggled with clarity about what PodScan was going to be. Because when I started building this, the potential for what it could be seemed vast, like massive.

Arvid: 01:17

With these transcripts and then data mined insights from every podcast conversation out there, like seconds after it's being released, I was wondering, what couldn't I do? Right? PodScan could do everything. We could be this alerting system, mention tracking platform, a tool for downloading somebody's entire thoughtscape. Everything they ever said, or were even talked about, from the Internet.

Arvid: 01:38

Or maybe even be this comprehensive data platform for allowing others to analyze podcast trends over time or categories or geographic regions, all of these are valid use cases. And people actually use PodScan for each of these. I have customers doing one of these things exactly and they're paying for it. But as a solopreneur, focusing on multiple directions simultaneously is always challenging and it's not always a good idea. Each use case requires different positioning.

Arvid: 02:07

I need to talk about the business differently. I need to emphasize what the product can do differently. And I have to communicate the value of the business, of a subscription for that matter, in a different way. So when I talk to a podcast marketing agency, they don't care about, like, data extraction, file formats, or API specifications. They need to know if they can effectively place their clients on podcasts.

Arvid: 02:31

That is the job that they were hired for. And, conversely, businesses building on top of PodScan's Firehose API, all the transcript data that is sent out as soon as we transcribe a podcast, well, they don't need our alert tracking component or the UX for that. They don't care. They just want reliable access to raw data and then do whatever they wanna do with it. So it's all about perspective here.

Arvid: 02:53

So while juggling all of these different possibilities, I think a pattern emerged for me over the last couple months. There's one particular customer type for whom PodScan's value is more immediately apparent than to others. Right? One of them stands out and that's podcast agencies. When they log in, they instantly understand how they can search for podcasts and track their clients' names and mentions and get notifications that then help them place their clients on shows.

Arvid: 03:23

It's very clear, it's actionable, and it's a value prop that I can highlight. And this realization that I can kind of focus on this led to some crucial product decisions that I made over the last month or so. Initially, in the beginning, notifications for podcasts were quite basic. I had the podcast name, the episode title, maybe a thumbnail, and then the text of the mention itself. That was the basic version.

Arvid: 03:48

I thought that was enough because people could then, you know, take action. It was sufficient for that. But through conversations with my customers who use PodScan for booking podcast appearances or reaching out for sponsorships, anything that's related to interacting with some podcast host somewhere, I discovered that they actually had 2 questions that they critically needed answers for. 2 very specific questions. The first one was, is it worth my time?

Arvid: 04:16

Is reaching out to this podcast worth my time? And the second one, well, if it is, how can I most easily reach out and get my person placed on this show or get the sponsorship or whatever? Right? Is it worth even thinking about it? And if so then how do I talk to them?

Arvid: 04:29

And the first question led me to tackle one of podcasting's biggest challenges. That is audience size metrics. Like figuring out how many listeners a show really has. Because this information is surprisingly hard to come by. Podcast platforms, like the big hosting platforms out there, they don't publicly share listener counts because of privacy, and people don't want to talk about it.

Arvid: 04:52

And unless hosts voluntarily share their numbers, it is completely opaque. Like, there are podcasts out there that you think may have like a couple hundred listeners. They have thousands or maybe hundreds of thousands of listeners. You don't know. Might be a niche somewhere, some weird niche podcast that you think is like, no, but who's listening to this?

Arvid: 05:11

And they have hundreds of thousands of people listening. You couldn't tell from any data out there. I mean, there are signs and hints, like, if the podcast exists as kind of a YouTube show somewhere and they have millions of subscribers, it's likely that the podcast itself also has a couple, but that is all information that has nothing to do with the actual podcast itself. It's adjacent. And even the major platforms like Apple Podcasts and Spotify only show rankings within the categories, if things even chart, but never actual listener numbers.

Arvid: 05:42

At least not reliably. You won't find download counts per episode anywhere public. It's all estimated or if people talk about it, it's super rare. You have to really chase these down. It's intentionally opaque data that platforms and creators guard closely because obviously if you have a podcast with a lot of listeners sharing that number with a potential advertiser that's great.

Arvid: 06:05

Right? But if you don't have a lot of listeners but you wanna find a sponsor, well, you wanna make it as appealing as possible. So, you know, it's very opaque. People don't wanna share it. And this drove me to build something that I thought initially was impossible, which is why I hadn't done it in the last, what, 6 months until I started working on this.

Arvid: 06:26

I built a machine learning system for estimating podcast audience sizes. The journey for that started with a pretty intense manual data collection. I spent weeks gathering information about thousands of podcasts out there where hosts had publicly shared their listener counts in interviews, on social media, during episodes. I really dove deep. And PodScan was very helpful because I could just search for this kind of stuff inside my own database, which was really cool.

Arvid: 06:52

And for each podcast, I collected over 260 different data points. Some of which, obviously, automated from my own data source of a podcast. Right? But some also painstakingly by going to websites and looking at numbers and getting them in. For each podcast, I looked at public metrics.

Arvid: 07:12

Things like rankings and review counts and ratings across all these different platforms. They were pretty easy to track. It's just a massive chore to do it. And then from my own database I inferred things like content patterns, episode frequency, how often do they release an episode, how long are these on average, what's the deviation, right? You can infer a lot of interesting features, which is what these things are called in machine learning, about a dataset just from the relationship between these things.
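To illustrate the kind of derived features described here (release frequency, average length, deviation), a minimal Python sketch follows. PodScan is described in this episode as a PHP application, so this is not its code; the function name, the input shape, and the feature names are all assumptions made for the example.

```python
from datetime import datetime
from statistics import mean, pstdev

def cadence_features(episodes):
    """Derive release-cadence features from (published_at, duration_seconds) pairs.

    The input shape and the returned feature names are hypothetical.
    """
    if len(episodes) < 2:
        raise ValueError("need at least two episodes to measure cadence")
    ordered = sorted(episodes, key=lambda e: e[0])
    # Days between consecutive releases.
    gaps = [(b[0] - a[0]).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    # Episode lengths in minutes.
    durations = [e[1] / 60 for e in ordered]
    return {
        "episode_count": len(ordered),
        "mean_gap_days": mean(gaps),
        "gap_stddev_days": pstdev(gaps),
        "mean_duration_min": mean(durations),
        "duration_stddev_min": pstdev(durations),
    }
```

A show that publishes every Monday would come out with a mean gap of 7 days and a gap deviation near zero, which is one way to turn "how consistently are they publishing" into a number.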

Arvid: 07:39

And guest appearance rates, how many of these episodes have guests, which is also data that PodScan has and automatically extracts from each episode. I looked into historical data. Podcast age, how old is the show? How many episodes have they released? And how consistently are they publishing?

Arvid: 07:53

Was there a break? Is it very consistent? Looked into engagement signals. Like, is there social media presence attached? Are there traffic indicators for the website?

Arvid: 08:02

And then looked into the categories that this podcast is ranked in as well. Because there you can kind of tell depending on the category a show is in that also affects how big the audience might be. So the real challenge after collecting all of this data came in building the machine learning model itself. I implemented it directly in my PHP application which was adding a layer of complexity on top. Because most machine learning stuff happens in languages like Python, right?

Arvid: 08:30

That's kind of the data analysis and data extraction language that is used quite a lot in machine learning. A lot of the AI stuff, of the generative AI, a lot of the CUDA systems, like all the GPU things that are happening out there with graphics cards computing a lot of data, is sitting on top of Python. But I chose PHP because I wanted the model to happen directly inside my Laravel application. I didn't wanna have a microservice anywhere. I wanted this model to be represented as some kind of blob somewhere that my application could load into RAM, run real quick, and estimate the audience.

Arvid: 09:05

That's what I wanted. And that required careful architecture to handle the computational load quite efficiently. And the system that I have uses a neural network with multiple hidden layers and then performs gradient descent on top of that, to optimize for finding the best correlations between these input features and known audience sizes that I tracked. And one of the trickiest aspects here was handling outliers and incomplete data. There are ways to do this in machine learning.
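For readers who want to see what "a neural network trained with gradient descent" can look like without any framework, here is a small sketch in plain Python. It is not the system from the episode: that one is described as PHP with multiple hidden layers and about 260 input features, while this example uses a single hidden layer, two made-up features, and invented hyperparameters. It assumes the target is the base-10 logarithm of the listener count, which is a modeling choice of this example, not something stated in the episode.

```python
import math
import random

def train_mlp(X, y, hidden=4, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer tanh network with full-batch gradient descent."""
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    history = []  # mean squared error per epoch
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, target in zip(X, y):
            # Forward pass.
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                 for j in range(hidden)]
            err = sum(W2[j] * h[j] for j in range(hidden)) + b2 - target
            loss += err * err
            # Backward pass: accumulate gradients of 0.5 * err^2.
            gb2 += err
            for j in range(hidden):
                gW2[j] += err * h[j]
                dh = err * W2[j] * (1.0 - h[j] ** 2)
                gb1[j] += dh
                for i in range(n_in):
                    gW1[j][i] += dh * x[i]
        history.append(loss / n)
        # Gradient descent step on the averaged gradients.
        b2 -= lr * gb2 / n
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
    return (W1, b1, W2, b2), history

def predict(params, x):
    """Return the predicted log10 audience for one feature vector."""
    W1, b1, W2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * hj for w, hj in zip(W2, h)) + b2
```

With features scaled to roughly the 0 to 1 range, `10 ** predict(params, x)` turns the output back into a listener estimate. The returned parameter tuple is small enough to serialize and load into memory at request time, which is the "blob" idea mentioned above.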

Arvid: 09:31

I learned a lot about machine learning in trying to get this podcast estimator going. Not every podcast has these 260 data points available. You know, not every show has a ranking on Spotify or not every show has a YouTube channel attached, but the system needed to be able to work with partial information. I implemented a weighted feature system that helps and adjusts the confidence based on the quality and quantity of available data. It's a complex beast but it is working.
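One common way to work with partial information is to fill missing values with a neutral default, add a "was this present" mask so the model can tell the difference, and derive a confidence score from how much weighted data was available. The sketch below shows that idea in Python. The feature names and weights are hypothetical, and the episode does not say that PodScan's weighted feature system works this way.

```python
# Hypothetical importance weights per feature; not taken from the episode.
FEATURE_WEIGHTS = {
    "spotify_rank": 3.0,
    "apple_reviews": 2.0,
    "youtube_subs": 2.0,
    "episode_count": 1.0,
}

def prepare_features(raw):
    """Return (model_input, confidence) for a dict that may lack some features.

    model_input is the feature values followed by a 0/1 presence mask;
    confidence is the weighted share of features that were present.
    """
    values, mask = [], []
    for name in FEATURE_WEIGHTS:
        value = raw.get(name)
        values.append(0.0 if value is None else float(value))
        mask.append(0.0 if value is None else 1.0)
    available = sum(w for w, m in zip(FEATURE_WEIGHTS.values(), mask) if m)
    confidence = available / sum(FEATURE_WEIGHTS.values())
    return values + mask, confidence
```

For a show with only review counts and an episode count, `prepare_features({"apple_reviews": 400, "episode_count": 120})` yields a confidence of 0.375 under these example weights, which could then be shown next to the estimate or used to decide when to fall back to something simpler.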

Arvid: 09:57

The current system achieves an impressive sub 3% error rate for the median error, which means that 50% of the guesses that it makes are within 3% of the actual value, which is quite significant in a world where podcasts wildly differ in terms of audience size. And I think the average error that it creates is somewhere around 20-some, 30%, which also isn't too big. That's just like, you know, you have a 2,000 listener audience and 20% of that is what, 400. So it says, maybe 2.5 k. That's still very precise.

Arvid: 10:32

And the biggest error that the system currently has, and I'm constantly training new models in the background, I'm talking about the current, like the right now model that I have, is, like, what is it? A 1000%, like a 10 x. So it might say, well, it's a 1,000 listeners for real, but it says 10,000 just because the data looks like it might be. So still fine.

Arvid: 10:55

It's still acceptable for the outliers to have these kinds of numbers, but most of the estimates are very precise, which is exactly what an estimator does. Right? You always have the good and the slightly worse data. But for everything I checked it with, which is what I do with my actual data, it's been very, very close. Been really cool to build.
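The three numbers discussed above (median error, average error, worst-case error) can be computed from pairs of known and estimated audience sizes. The short Python sketch below assumes "error" means absolute percentage error relative to the known value; the episode does not spell out the exact definition, and the function names are made up for this example.

```python
from statistics import mean, median

def percentage_errors(actual, predicted):
    """Absolute percentage error of each estimate relative to the known value."""
    return [abs(p - a) * 100 / a for a, p in zip(actual, predicted)]

def summarize_errors(actual, predicted):
    """Median, mean, and maximum absolute percentage error."""
    errors = percentage_errors(actual, predicted)
    return {
        "median_pct": median(errors),
        "mean_pct": mean(errors),
        "max_pct": max(errors),
    }
```

A single large outlier barely moves the median but dominates the mean and the maximum, which is why the median can sit under 3% while the average is much higher. Note that under this definition a tenfold overestimate (1,000 actual, 10,000 estimated) registers as a 900% error.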

Arvid: 11:16

And to maintain and improve accuracy for this thing, I built a switchable model architecture so I can deploy new models as they are trained without any service interruption. I just, like, throw them up and the system uses the model immediately. And each version is tracked and evaluated, so I have automated performance monitoring to make sure that it's always using the most accurate predictions. And if it can't predict something, there's like a very simple heuristic to say, this is a podcast that's, like, what, 4 years old and has like 400 ratings. It's probably 20,000 or so listeners.

Arvid: 11:50

Right? It's a very like, it's just a linear thing that I have to fall back on if the actual machine learning model is not working for some reason. And that's just to get a number in front of my customers for them to make a choice. It was like 3 or 4 weeks of work just to produce this tiny little number that they can see and say, yeah, okay. This is a podcast that I can spend some time on.
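The combination of swappable models and a simple linear fallback can be sketched as a small registry. This is an illustration in Python under assumptions, not the PHP implementation from the episode: the class and method names are invented, and the fallback coefficients are hypothetical values picked only so that the spoken example (about 4 years old, about 400 ratings) lands near 20,000.

```python
import threading

def fallback_estimate(ratings_count, age_years):
    """Linear fallback with hypothetical coefficients (not from the episode)."""
    return int(45 * ratings_count + 500 * age_years)

class ModelRegistry:
    """Keeps versioned predictors and lets a new one be activated at runtime."""

    def __init__(self):
        self._lock = threading.Lock()
        self._models = {}
        self._active = None

    def deploy(self, version, predict_fn):
        # Register the new model and make it the active one immediately.
        with self._lock:
            self._models[version] = predict_fn
            self._active = version

    def estimate(self, features):
        """Return (estimate, source), where source is a version or 'fallback'."""
        with self._lock:
            version = self._active
            predict_fn = self._models.get(version)
        if predict_fn is not None:
            try:
                return predict_fn(features), version
            except Exception:
                pass  # Fall through to the heuristic if the model fails.
        return fallback_estimate(features["ratings_count"], features["age_years"]), "fallback"
```

Returning the source alongside the number makes it possible to track which model version produced each estimate, which is the hook that per-version evaluation would need.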

Arvid: 12:12

You know, to place my client. It was quite some work. And the 2nd major challenge after I built this, streamlining the outreach part, required a completely different kind of technical solution here. So I already have contact information in the database and that's kind of scattered across all kinds of data sources, right? Some of the contact information lives in the RSS feed of a podcast, some is in the episode description, some people put it in the show notes, there are social profiles that are linked to a podcast or that are mentioned in a podcast. There's marketing websites that are also in the feed or in the description, and then there's historical information just from the conversations.

Arvid: 12:50

Where people talk about their contact information. So I built a contact information extraction pipeline that takes these sources and uses NLP, natural language processing, to identify and validate contact details. And the system can then distinguish between general information and specific guest stuff that I can kind of put aside, because sometimes people really just wanna talk to the host but hey, sometimes the guests too. So I have that as well. And then there's confidence scoring internally too: email addresses that I find in the official RSS feed obviously are more likely to be the actual email to reach people at than some email I find from an episode description or something that is mentioned on a show.
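The source-based confidence idea can be shown with a much simpler stand-in than the NLP pipeline described here. The Python sketch below uses a regular expression instead of NLP to find email addresses, then keeps the highest-confidence source for each address. The source names and scores are hypothetical; only the ordering (RSS feed above episode description above spoken mentions) follows what is said in the episode.

```python
import re

# Simple email pattern; a regex stand-in for the NLP step in the episode.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical confidence per source.
SOURCE_CONFIDENCE = {
    "rss_feed": 0.9,
    "show_notes": 0.6,
    "episode_description": 0.5,
    "transcript": 0.3,
}

def extract_contacts(sources):
    """Map {source_name: text} to a list of (email, confidence, source).

    Each address is reported once, with the most trusted source it appeared in,
    sorted from most to least confident.
    """
    best = {}
    for source, text in sources.items():
        score = SOURCE_CONFIDENCE.get(source, 0.1)
        for email in EMAIL_RE.findall(text):
            email = email.lower()
            if score > best.get(email, (0.0, ""))[0]:
                best[email] = (score, source)
    contacts = [(email, score, source) for email, (score, source) in best.items()]
    return sorted(contacts, key=lambda c: -c[1])
```

An address that appears both in the feed and in an episode description is reported once, attributed to the feed, so the most trustworthy option shows up first.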

Arvid: 13:30

But all of this goes into the contact data outreach capabilities of PodScan. And with both the audience metrics and the contact data systems in place, I completely redesigned the notification system. Because, like I said in the beginning, those notifications that I had were rudimentary. They were enough but they were not enough. They were just some information, they were lacking the actionable information.

Arvid: 13:53

So now when users receive a mention alert, they see an estimated audience size, they see a growth trend over time, because I have historical data, right, over all these audience size calculations, and direct contact options. Like, they have an email if possible, because the best way to contact is always through email, or social feeds right there, and then they can see, okay, this person is also reachable here. And that allows them to just collect all of these things. I have a list feature too that I recently added so people can actually add these mentions to a list and then have a one click export that allows them to connect their CRM systems to start their own campaigns. And if there is historical interaction data, I make this available too.
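As a rough picture of what an enriched alert and a one-click export could contain, here is a Python sketch. The field names, the `mailto:` link, and the CSV layout are assumptions for illustration; the episode does not describe PodScan's actual data model or export format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional

@dataclass
class MentionAlert:
    """Hypothetical shape of an enriched mention alert."""
    podcast: str
    episode_title: str
    mention_text: str
    estimated_audience: int
    audience_trend_pct: float
    contact_email: Optional[str] = None

    def mailto_link(self):
        # A link like this opens the user's email client when clicked.
        return f"mailto:{self.contact_email}" if self.contact_email else None

def export_csv(alerts):
    """Serialize a list of alerts to CSV text for import into a CRM."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(MentionAlert)])
    writer.writeheader()
    for alert in alerts:
        writer.writerow(asdict(alert))
    return buffer.getvalue()
```

Putting the audience estimate and the contact address in the same record is what lets a user answer both questions, whether a show is worth the time and how to reach it, without leaving the alert.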

Arvid: 14:37

The results have been remarkable for this. Our trial to paid conversion rates have improved quite significantly over the last week or so because users find this clear value in these metrics for prioritizing their outreach efforts. That's what they come here to do. And if they see that the data is actually helping them do it, they start paying for the business. They start paying for the subscription.

Arvid: 14:57

The average time from receiving a notification to initiating contact is now also much faster because they don't have to jump through hoops. They can click on the email right there and an email window opens up. Right? That is making it easy for my prospective customers to see the value in the platform and to use it immediately to get their job done. So I don't think we're done evolving with this, but it has been a major step, almost micro pivoting into this more customer centric feature.

Arvid: 15:27

I think the next challenge here is demographics. Figuring out location, gender distribution, age range, that kind of stuff. This might involve audio analysis. I'm really looking forward to this, like trying to figure out what the age of the people talking is just from the audio data. That's something that as a technical person I'm super interested in. Or text processing is still in there, or maybe both. I'm excited about voice analysis, another round of NLP for audience targeting, figuring out the keywords.

Arvid: 15:54

I might actually do some kind of, this age group uses these kind of phrases thing and from there analyze the way people talk to figure out who they're talking to. And then geographic distribution mapping is gonna happen, content categorization is gonna be improved and trend detection is also a thing I'm working on. But all of this is meant to help. Help people make a choice. And what I've learned through this journey is that product market fit isn't just about having valuable features, like features are always great, but it's about making that value immediately obvious and actionable to users.

Arvid: 16:30

Right? I've been extracting all of this contact information for months, but I didn't make it available. They had to jump through hoops to get there and that was a problem. And for PodScan, this means moving away from this one size fits all approach to optimizing specific features for our best fit customers. The API part of the business serves businesses wanting to process podcast data professionally.

Arvid: 16:52

So I do a lot of professional API development on that side. But the alerting system focuses on agencies needing quick insights and action paths. So I optimize that part for them. And I don't think this is a pivot really. We're not changing what PodScan fundamentally does.

Arvid: 17:07

We're just getting better at showing the right value to the users at the right time. Right? To the right users at the right time. Sometimes that means building something you initially thought impossible, like a machine learning system in PHP. But when it serves your users' core needs, it is worth the effort.

Arvid: 17:24

And that's it for today. Thank you for listening to the Bootstrapped Founder. You can find me on Twitter at arvidkahl, a r v i d k a h l. You'll find my books and my Twitter courses there too. And if you wanna support me and the show, please tell everyone you know about podscan.fm and leave a rating and a review for this podcast by going to ratethispodcast.com/founder.

Arvid: 17:43

Makes a massive difference if you show up there, because then the podcast will show up in other people's feeds, and that will truly help the show. Thank you so much for listening. Have a wonderful day, and bye bye.

Creators and Guests

Host

Arvid Kahl

Empowering founders with kindness. Building in Public. Sold my SaaS FeedbackPanda for life-changing $ in 2019, now sharing my journey & what I learned.
