349: Navigating Constraints as a Bootstrapper
Hey. I'm Arvid, and this is The Bootstrapped Founder. In recent podcast episodes, I've been discussing constraints and how I deal with limited resources as a bootstrapper. Today, I wanna dive deeper into this topic, and I wanna share my experiences with PodScan and how I navigate these challenges with limited adjustable income and fixed expenses. This episode is sponsored by paddle.com.
Arvid:More on that later. Running PodScan involves a delicate balance between various processes. There's transcription of podcasts, there is data extraction, and then question answering and alert verification. And, unfortunately, all of these things need a GPU to work efficiently. Graphics cards and their respective compute cycles are a rare commodity right now.
Arvid:And even with new providers flooding the market (I think just a couple of days ago, DigitalOcean started offering GPU droplets), this stuff is still very expensive. So my core business processes compete for resources all the time. Managing a steady flow of podcasts coming in while extracting valuable data from them is crucial to my customers, who I want to see PodScan as something valuable. And when extracted data then fits an alert, that's the third part and the main selling point of my business right now, it needs verification, and that's quite a bit of GPU compute. It creates an interesting balance that I have to maintain, especially when it comes to managing queue sizes and server capacity, because that's really what I'm limited by.
Arvid:As a main part of my UI and the underlying API, I am running Meilisearch. That is an open source, very fast search engine, running on a fairly sizable machine. But the structure of the data that I push there requires massive indexing too. Just imagine a couple million items, all of them with this massive transcript attached, and then you wanna do full text search on it. Obviously, every day I'm pushing tens of thousands of these to add to an index that already has millions, and it just takes a lot of compute.
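To make that concrete, here is a minimal sketch of what pushing an episode into a Meilisearch index looks like with the official Python client. The index name and document fields are illustrative assumptions, not PodScan's actual schema.

```python
import meilisearch

# Connect to a self-hosted Meilisearch instance (URL and key are placeholders).
client = meilisearch.Client("http://localhost:7700", "masterKey")
index = client.index("episodes")  # hypothetical index name

# Searching full transcripts means every added document triggers heavy indexing work.
index.update_searchable_attributes(["title", "description", "transcript"])

index.add_documents([{
    "id": "ep-12345",
    "title": "Navigating Constraints as a Bootstrapper",
    "description": "Arvid on GPUs, queues, and prioritization.",
    "transcript": "Hey. I'm Arvid, and this is The Bootstrapped Founder...",
}])
```

Indexing cost scales with the size of the searchable fields, which is why multi-hour transcripts arriving tens of thousands of times per day can push the indexing queue far behind.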
Arvid:And this can lead to the queue on that server falling behind. If that doesn't get updated quickly, it results in outdated data on the system. And that's a dilemma. I need to balance volume (customers being able to find everything on this API) with fidelity (data being as accurate as it can be) and recency too.
Arvid:Right? Finding the most recent episodes first. All of these things are important, but I need to prioritize, and prioritizing at this point becomes extremely important, because it wouldn't make sense for PodScan to favor a 4 year old episode of a podcast that people have already talked about and kind of forgotten over one that has been released, like, 2 hours ago. To my customers, recent episodes offer way more potential for user engagement and for collaborations, or even just placement opportunities if you're a podcast agency.
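One plausible way to encode that recency-first tradeoff is a decaying priority score. This is a sketch under assumed weights and field names; the episode doesn't describe PodScan's actual formula.

```python
import time

def indexing_priority(episode: dict, now: float | None = None) -> float:
    """Score an episode for the indexing queue: recent first, popular as tiebreaker."""
    now = now or time.time()
    age_hours = (now - episode["published_at"]) / 3600
    recency = 0.5 ** (age_hours / 24)  # recency value halves every 24 hours
    popularity = min(episode.get("review_count", 0) / 100, 1.0)
    return 0.7 * recency + 0.3 * popularity  # weights are guesses

episodes = [
    {"id": "old", "published_at": time.time() - 4 * 365 * 86400, "review_count": 500},
    {"id": "fresh", "published_at": time.time() - 2 * 3600, "review_count": 3},
]
# The 2-hour-old episode wins despite far fewer reviews.
print(sorted(episodes, key=indexing_priority, reverse=True)[0]["id"])  # -> "fresh"
```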
Arvid:I often tell my users, if you respond to a recent episode, something that has been released very, very recently, you talk to the host quickly or you post a link to the whole episode on social media, you can potentially get a partnership out of this, or your podcast agency finds a place for a client. The more recent you are, the better. That's why timeliness is more important than completeness, but both are relevant. Right? You want both because I want search to work historically too. I want people to find that old episode if they're doing research, if they're trying to figure out what people said, but I also want the recent ones to be in there.
Arvid:So I can't just ignore one over the other. And in my constrained world of what right now is probably like 30-some back end servers, so 30 graphics cards, and this one search server with limited compute capacity, I am constantly deciding where data should flow and who and what gets priority. It's not just a technical thing really. It's almost a mental exercise in prioritization. And I have found that it really helps to conceptualize all the software stuff, all these system priorities, as a reflection of actual customer priorities.
Arvid:I could theoretically do the developer thing and build a perfect system with, like, ideal balance and really cool mechanisms and all that, but the ultimate decider of whether this is good or bad is how my customers react and what they truly need. Because they are literally paying for this. Clearly I'm paying for this too, it's an expense, but the revenue they provide pays for it. So they need to understand it, they need to get something out of it. So I talk to them as I usually do, and I listen to them when they have something to say.
Arvid:And their feedback, both the explicit stuff and the underlying implicit thing, has been very instrumental in figuring out and implementing my priorities. So here's how I came up with my priorities, what they are, I guess, and what triggered them. I'm gonna start with the biggest one, data extraction. It's crazy how important this is. My customers were most vocal about extracted data not being complete or present when they expected it.
Arvid:They didn't care, like, at all about transcription speed as long as it happened within the same day. It didn't matter to them if it's, like, 10 minutes after an episode was released or an hour, maybe even 10 hours, as long as the data they needed was there. And this was a surprise to me initially because I had assumed that transcription speed and, like, the time to first email would be the primary concern that people have, but they would rather wait for data to be complete and enriched over getting it quickly. Ideally, obviously, it would be both, but in these conversations and in the actual behavior that I see from paying customers, they're fine with the delay as long as more data is in there. But the other way around, they start complaining. Very interesting observation, and it clearly impacts my priority here.
Arvid:The same goes for my AI verification, my, what do I call it, my inference step that ensures that the keywords mentioned in a transcript actually happen in the right context. Right? My customers can put a question into their keyword list, and if that question is answered with a yes, then and only then do they get a notification. And if it's a no, then it's ignored. It's like if you have a name like Meta or Facebook and you only want notifications when people talk about your brand, Facebook or Meta, in the context of, I don't know, shareholder value or something like this.
Arvid:Right? Something very specific. You can add this line of text, and that is the verification part of it. And I think this is very important to a lot of customers who have brands with very common names, and the speed of that was not a concern for them at all, as long as it happened roughly on the same day. Again, speed doesn't matter until it's super delayed. This tool is crucial to them for maintaining the accuracy of their alerts without being overwhelmed, and customers were super forgiving of slight delays there. And, again, accuracy and completeness over speed, something I did not necessarily expect when I started building PodScan.
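Here is a minimal sketch of what such a yes/no verification gate could look like, assuming a local model served through llama-cpp-python. The model file, prompt wording, and the alert stand-in are all illustrative, not PodScan's actual code.

```python
from llama_cpp import Llama

# Load a local instruct model (the path is a placeholder for any GGUF build).
llm = Llama(model_path="llama-3.2-3b-instruct.Q4_K_M.gguf", n_ctx=8192, verbose=False)

def verify_mention(excerpt: str, question: str) -> bool:
    """Ask a strict yes/no question about the context of a keyword mention."""
    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "Answer strictly with 'yes' or 'no'."},
            {"role": "user", "content": f"Transcript excerpt:\n{excerpt}\n\nQuestion: {question}"},
        ],
        max_tokens=3,
        temperature=0.0,
    )
    answer = result["choices"][0]["message"]["content"].strip().lower()
    return answer.startswith("yes")

excerpt = "...Meta's earnings call focused heavily on shareholder value this quarter..."
if verify_mention(excerpt, "Is Meta discussed in the context of shareholder value?"):
    print("send alert")  # stand-in for the actual notification pipeline
```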
Arvid:And I had several calls and chat messages where people told me that they truly need, slash value, maximally useful, as-enriched-as-possible data as an input for their own next process, their job to be done, whatever it is they're doing. Whether it's integrating into a CRM or an outreach tool or a database or just into an Excel sheet, it doesn't matter. I've come to think of business processes, the things that I do and the things that my customers do, as machines with an input and an output chute, where stuff gets put in, it comes out on the other side, and the magic happens in the middle. When I look at my business, obviously, I have an input and an output and all the stuff in the middle that I do. But for my customers, all they see is my output, and that is the input for their own machine, which has its own input chute and output chute: they stuff something into the input, something happens, and then they have an output that they need to deliver to somebody else. So for my customers, the more enriched data they get, the better their following process works.
Arvid:My output is their input. That's all that matters. They need the highest fidelity data possible. Speed, again, not that important, which is interesting. Again, that was surprising to me.
Arvid:And finally, reliability is very important to them. For my API users, having reliable and consistent data, both in the schema and in the actual data itself, is crucial. They prefer having all possibly extractable data even if it means a slight delay in other things and other processes. APIs don't really do well with change, so having data that is kinda unreliable, sometimes there, sometimes not, that's problematic. Data being the same shape, with the same expected density, is very important to people.
Arvid:I used to expose new episodes onto my API even before they were transcribed and extracted, and just acted like it was already a full data set for people to pull, and customers asked for flags to be able to hide incomplete data from their tooling. So whenever they would send a request, they did not want to see incomplete data until it was fully there. I didn't expect that because I use APIs differently, but apparently, my customers do not, and they pay for this, so they get what they need. And based on customer feedback, I've learned that my priority should be extracting as much data as possible from the podcasts that people actually want to learn about rather than tracking all podcasts everywhere as quickly as possible, which was my initial mission, the initial idea that I had. And this realization led me to focus on improving the data extraction capabilities of PodScan.
Arvid:I've been running a lot of experiments trying to find the right balance between speed and quality of extraction, and this has been taking up quite some time during the last week or so. And before we jump into that, let me talk for a minute about the sponsor of this episode, Paddle, because it's kind of related. When I started PodScan, I had yet another prioritization choice to make. How much of my time and effort should go into dealing with customer payments? You know how founders are.
Arvid:We have to do all the things. Right? We have to build the thing. We have to market the thing. We have to sell the thing, and then we have to get the money that people owe us for the thing.
Arvid:And I have implemented my fair share of payment systems over the last couple decades. And for PodScan, I wanted to make the optimal choice from day 1. And since I was building on Laravel, I looked into what the ecosystem would give me, which was Laravel Spark, a kinda plug-and-play payment portal solution. And to my delight, they supported Paddle, like, from the get go. And why was I so happy about this?
Arvid:Well, I've used other payment providers in the past, and I had to deal with creating invoices, calculating sales taxes, I had to go to APIs to get, like, the current VAT in Europe, that kind of stuff. I had to chase payments that didn't go through. For PodScan, I was not gonna do that again. Paddle is a merchant of record and they take care of all these things for me, so I can spend my time actually scratching my head over data extraction problems instead of not getting paid. So if you're looking for a payment provider that takes care of you and your customers needs, I recommend Paddle.
Arvid:Check them out at paddle.com. I'm currently working with Meta's latest local large language model, Llama 3.2, I guess, is what it is right now. It has a couple variants. They're all very interesting. The bigger variants have vision capabilities now, which is new for Meta's models, but that's really not particularly useful for podcasts.
Arvid:I'm more interested in the smaller text models, particularly the 1 billion and the 3 billion parameter models that can run on edge devices or servers with limited resources. I think they're actually trying to get these things to run on CPUs effectively, which is really, really cool if you think about what that means: every computer out there can deal with this eventually. And even right now, they're so small that it's quite exciting, because these models can run on about 10 gigabytes of graphics RAM, sometimes even less, which is something that even normal graphics cards can handle. Laptops can handle this.
Arvid:And this means they'll be very performant in their regular state, those models, and even more so when quantized and preloaded onto my back end servers. As you can probably tell, I'm quite excited to get these things on the road. And if I can get the Llama 3.2 3B Instruct model, that's the one I'm currently working with, running where I currently have the Llama 3.0 7B Instruct model, I could potentially speed up extraction significantly, probably double it, without losing much in quality or in terms of accuracy for summaries and data extraction precision. I might even try the 11B model and see if I can make it work within my constraints at the same speed as the 7B model before. So there's a lot of really fun experimentation with these local LLMs that I'm currently doing.
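This kind of experiment is easy to sketch with llama-cpp-python. The GGUF filenames below are placeholders for whatever quantized builds are available locally; the point is comparing tokens per second between the 7B-class model and the 3B one.

```python
import time
from llama_cpp import Llama

prompt = "Summarize this transcript in three sentences: ..."  # truncated sample input

# Placeholder filenames; substitute your own quantized GGUF builds.
for path in ["llama-7b-instruct.Q4_K_M.gguf", "llama-3.2-3b-instruct.Q4_K_M.gguf"]:
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    start = time.time()
    out = llm(prompt, max_tokens=128)
    tokens = out["usage"]["completion_tokens"]
    print(f"{path}: {tokens / (time.time() - start):.1f} tokens/sec")
```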
Arvid:And the idea here is to find the perfect balance between accuracy of the extraction that it does and cost of the GPU that is used for that, cost per cycle. Right? Cost per transcription that I run through it. And fortunately, that is one of the most wonderful things about this whole space right now. Every few weeks, new models or new implementations or inference strategies appear, and they make it into the tools that I use.
Arvid:So I just need to git pull and recompile, and then it's, like, 20% faster. And this makes everything just a little bit faster or a little bit cheaper. So the true challenge is to keep up with the tech. Which models are out there? Is there an optimized version of the model? Is there an optimized version of, like, llama.cpp or whatever?
Arvid:Well, that is what I'm currently dealing with. It gets me quite excited, it's probably audible, but it's not the only thing I'm dealing with, you know, a founder with a thousand hats. I have another challenge that I wanna share because it highlights just how hard it is to deal with things that don't scale well compared to things that actually do. One of the most computationally intensive processes in podcast analysis, like when you have the raw audio and you wanna deal with it, is diarization.
Arvid:And that's the process of detecting and labeling different speakers in a podcast. This became a significant bottleneck in my system, and it's a great example of how constraints force you to think creatively. Before I started diarizing episodes, and that was like the first couple months of PodScan, I was just transcribing them. Didn't care, like, who said what. I just wanted the raw text so I could get to the keywords.
Arvid:But more and more of my customers told me, hey, it would be really cool if I could either know exactly who's speaking or at least see that there are different speakers in here. So before I did this, I could process like 120,000 episodes a day with, like, medium to high quality transcription. Very easily. If I went for extremely high quality, it would still be around 80,000, which significantly outperformed the 30,000 new podcast episodes that are released every day. Right?
Arvid:I could easily cover today's 30k, and then with the remaining 50 to 70,000 episodes that I could run on any given day, I could go like 2 days back into the past. So every day, I would step one day ahead and two days back into my backlog, which was great. But after introducing diarization, this number dropped dramatically, from 120,000 to 50,000 or less per day, depending on the length of these shows, sometimes as low as 35,000 episodes. That is just enough to do what comes in every day. And that's unlike transcription, where you can choose different models to balance speed and accuracy. You can have, like, a super fast model that often makes mistakes, but it still gets the gist of it.
Arvid:Or you can have the highest quality model that takes, like, 12 times as much time to actually transcribe, but it gets you the best results. Diarization doesn't care. It's a very linear affair. It takes the same time to diarize a high quality expert panel where 10 people talk about very important stuff as it does a simple monologue or a reading from a book. And to address this, I had to build an internal prioritization system that considers factors like podcast popularity or user interactions from customers of PodScan.
Arvid:Have they interacted with it or not? And search frequency: how often does this podcast appear in people's searches? This prioritization system determines which podcasts get diarized and which receive regular transcription, so I can keep up with my backlog. So I've really adopted an 80/20 approach here. I provide standard treatment by default to every podcast, and then I flag podcasts for special treatment based on user interaction and importance.
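A simple version of that flagging decision might look like this. The signals match the ones mentioned (interactions, searches, following), but the weights and the threshold are invented for illustration.

```python
def should_diarize(podcast: dict) -> bool:
    """Flag a podcast for the expensive diarization path."""
    score = (
        2.0 * podcast.get("customer_interactions", 0)  # clicks, opens, alert hits
        + 1.0 * podcast.get("search_hits", 0)          # appearances in searches
        + 0.5 * podcast.get("followers", 0)            # customers following the show
    )
    return score >= 10.0  # below this, standard transcription only

popular = {"customer_interactions": 4, "search_hits": 3, "followers": 2}
obscure = {"customer_interactions": 0, "search_hits": 1, "followers": 0}
print(should_diarize(popular), should_diarize(obscure))  # True False
```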
Arvid:If a podcast gets a certain amount of interaction from my customers, it will eventually get flagged by the system as one of the important ones and get diarized. If people don't interact with the podcast, if it never mentions any of the keywords that people currently have in the system on PodScan, then maybe it's not important enough to be prioritized for diarization, and it isn't. It will still flag keywords in the future, and that will bring it back onto the map, where it might get flagged if people interact with it. And as you can imagine, all of these transcripts, diarized or not, that's a lot of data. 30,000-plus episodes a day occupy quite a lot of space in my database, and that brings me to yet another constraint that I'm dealing with all the time: search data indexing.
Arvid:PodScan really serves 2 main functions. It's a business that has kinda two products. Not too happy about this. I would like to streamline that either into an even bigger suite of things or into different businesses, but I'll get to that when I get to that. The first one is kind of a Google Alerts for podcast mentions.
Arvid:Right? It's very much real time. Like, this was just mentioned on this podcast. Here, you get an email. Now you can do stuff with it.
Arvid:And on the other side, it's kind of a Google-like search for podcasts with a lot of historical data. You wanna figure out what somebody said on the Tim Ferriss Show 2 years ago? Yeah. That's what you do. You search for it, you find it, you read it, and then you do something with that.
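On the Meilisearch side, a historical lookup like that is a single query. As before, the index name, field names, and attribute configuration are illustrative assumptions.

```python
import meilisearch

client = meilisearch.Client("http://localhost:7700", "masterKey")

# Assumes "podcast_name" was configured as a filterable attribute
# and "published_at" as a sortable one.
results = client.index("episodes").search(
    "morning routine",
    {
        "filter": 'podcast_name = "The Tim Ferriss Show"',
        "sort": ["published_at:desc"],
        "limit": 10,
    },
)
for hit in results["hits"]:
    print(hit["title"])
```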
Arvid:And the search function is accessed through our interface, which some people use, and the API, which more people use. I have a lot of people that build tools on top of this. So when I encountered issues with Meilisearch earlier, we mentioned this, right? Being millions of items behind in indexing because I was just throwing way too much data at it. My API users reached out about the lack of fresh information, and they were very clear in their communication.
Arvid:They said, I use the API to get fresh information. I feed this into my own system. If the information on there is not fresh, this is a problem for me. And this feedback led me to develop another prioritization system for search database updates. Initially, I had the approach that every episode with a transcript that I have, whether highest quality or lowest quality, popular or not, should be in the search database.
Arvid:That's kind of the volume thing. I wanted everything to be in there. And when that overflowed the database, I switched to only including important episodes. But that meant that for the less popular podcasts, the titles and descriptions of their episodes were not searchable, so those podcasts would never be found, which wasn't ideal either. So now I've found a balance here.
Arvid:Every episode's title, description, podcast name, and metadata are written to the search database, but full transcripts are only included for episodes from podcasts with a certain level of popularity. Again, are people using it? Are people following it? It's a dynamic system, but let's just say: if the number of reviews on iTunes for this podcast is over 10, then its episodes' full transcripts are added. If it's below, and this fluctuates obviously, then they might be omitted.
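In code, that rule could look like the sketch below. The review threshold of 10 comes straight from the episode; the queue-size adjustment anticipates the dynamic behavior described next, and its numbers are invented.

```python
def build_search_document(episode: dict, queue_size: int) -> dict:
    """Decide how much of an episode goes into the search index."""
    doc = {
        "id": episode["id"],
        "title": episode["title"],
        "description": episode["description"],
        "podcast_name": episode["podcast_name"],
    }
    # Raise the popularity bar when the indexing queue is backed up.
    threshold = 10 if queue_size < 100_000 else 50
    if episode.get("itunes_review_count", 0) > threshold:
        doc["transcript"] = episode["transcript"]  # full text only for popular shows
    return doc
```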
Arvid:And this dynamic priority system adjusts based on the current queue size of my back end, and it allows for more room for popular podcasts likely to contain information that's relevant to PodScan's professional users. That's really what it is. It's not a perfect solution, and I kind of hate the idea that I can't include everything right now, but it's a necessary compromise given my current constraints. And as a bootstrapper, I often face the unfortunate reality of not being able to do everything simultaneously. Prioritization has to happen because without it, I won't be able to attract customers because the service is gonna be subpar on every level.
Arvid:And without customers, the business won't be profitable, so it won't be around for much longer. So I'm very much enjoying the kind of limitations that I have around my back end systems and stuff, because it allows me to really min max the efficiency of these models and see which one of these many open source models can do the most work and create the most value for my customers. But every component of PodScan that interacts with the world of podcasts and the data in there has to have this balance of priorities, and that is derived from customer conversations and the observed behavior. I've started tracking a lot of behaviors of my customers, like, internally, obviously, and kinda anonymized as well. Like, what things are clicked on the most?
Arvid:Which items are opened? Which podcasts are frequented a lot? And I continually ask questions to just understand what customers need, and then automatically prioritize these things in future interactions they might have with the software. And I focus on data quality.
Arvid:Fidelity is very important for me, like completeness and accuracy, timeliness, and then quality of service. Like, what can I deliver? How quickly can I do it? So for instance, and this is just an example of how I go about this now, I get a lot of podcasts coming in every day from things like city council meetings and religious readings. They're interesting and they contain a lot of relevant information depending on who you are, but it's content with a small but well defined audience.
Arvid:And unless that audience, and this is the bootstrapper mentality, is a customer of PodScan right now, I just transcribe these things quickly to catch most of it and then summarize them. I don't need 100% accuracy on stuff that nobody looks at right now. It might be different for podcasts that people are actively following; every episode of those needs to be high quality. But as a bootstrapper, I can always adjust things once I find customers who care about this data. I can reanalyze these things later.
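That quality tiering is straightforward to express in code. This sketch uses faster-whisper as one plausible transcription stack; the episode doesn't say which models PodScan actually runs, so the model names and tiers are assumptions.

```python
from faster_whisper import WhisperModel

def transcribe(audio_path: str, high_priority: bool) -> str:
    """Followed podcasts get the slow, accurate model; the rest get a fast one."""
    model_size = "large-v3" if high_priority else "base"
    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, _info = model.transcribe(audio_path)
    return " ".join(segment.text.strip() for segment in segments)

# City council meeting: fast pass. Actively followed show: full quality.
print(transcribe("council_meeting.mp3", high_priority=False)[:200])
```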
Arvid:But right now, my focus should be on things that people already use. So in the end, navigating limitations as a bootstrapper is about understanding my customers' needs, and having those customers in the first place. Maybe that's also important: having people who pay for this stuff. Right?
Arvid:I need to set clear priorities, and I continuously optimize my systems to deliver the most value within my constraints that is noticeable by my customers. It's challenging, but I think it's very rewarding as a process because it forces me to think creatively and stay closely attuned to my users' needs. And this approach has not only helped me make the most of my limited resources, but has also given me valuable insights into what my customers truly value versus what I thought they valued, and the discrepancy between the two. It's a constant process of adjustment and refinement, but it's what allows a bootstrapped business like PodScan to compete and thrive in a resource intensive field, like trying to get all podcasts everywhere under control.
Arvid:So, yeah, that's what I'm dealing with. And that's it for today. Thank you so much for listening to The Bootstrapped Founder. Big shout out to Paddle for sponsoring this episode. You can find me on Twitter at arvidkahl, a r v i d k a h l, and you'll find my books and my Twitter course there too.
Arvid:If you wanna support me and this show, please tell everyone you know about podscan.fm and leave a rating and a review by going to ratethispodcast.com/founder. It makes a massive difference if you show up there, because then the podcast will show up in other people's feeds, and any of this truly helps the show. Thank you so much for listening. Have a wonderful day, and bye bye.