327: Two (Surprisingly Scary) Tales of Platform Risk

Arvid:

Over the last couple of weeks, I ran into 2 very distinct, but still pretty dangerous instances of platform risk, and I wanna share them with you today. I am Arvid Kahl. Welcome to The Bootstrapped Founder podcast. This week, I ran into 2 very interesting instances of platform risk in my own software business. I will share what happened, what I did, and how I intend to combat them in the future.

Arvid:

This episode is sponsored by acquire.com. More on that later. And before I get into the nitty gritty details of all of this, be aware that this episode might get a bit technical, a bit too technical for some. But don't worry. There's something to be learned from all of this even if you don't know what an AWS RDS blue green deployment is.

Arvid:

You don't need to. I'll explain it, and I'll explain the context of all of this, because the problem of platform risk is very universal. Every software business has to run on somebody else's computer. I guess every business, for that matter, depends on other businesses to function. Your suppliers, your vendors, your resellers, they all impact your success, and that is particularly true for software founders.

Arvid:

Few of us get to self-host our servers in our basements or in conveniently located data centers. Almost all parts of a software business's stack are either run or maintained by somebody else. And for PodScan, my own business, it is no different. Over the last couple of weeks, I ran into 2 very distinct but still pretty dangerous instances of platform risk, and I wanna share them with you today. The first instance started 2 weeks ago when I found an email from Amazon Web Services, from AWS, hidden in my to-be-read-later stash.

Arvid:

I use hey.com for email and it was in my transactional email stream. So I didn't see it for a couple of days because I only occasionally check this. And I'm glad I did check because I found an email that warned me that AWS thought there was unauthorized access to my account. And if I didn't respond within 5 days, they might close the account. And I found this email 3 days in.

Arvid:

So I still had some time to respond, but I was pretty upset. This email could have cost me my entire business, because I run PodScan's database on AWS using their managed database service, RDS. And all of my data, including the backups, is on AWS in that account, and they thought there was unauthorized access. So they needed me to respond quickly or they would delete my account, with an extremely short ultimatum of 5 days, 2 or 3 of which had already passed. And of course, it happened at the worst possible time for me.

Arvid:

Just hours before I left for a long weekend vacation, I received that email. And I was faced with the possibility that within a couple of days, all my data could be restricted or, at worst, deleted. Now, had I been at home and known I'd be home for a couple of days, I could have devised a backup extraction strategy and all of that, but try doing that from a moving vehicle or the hotel WiFi. We're talking about just under a terabyte of data here. That's not gonna happen if you don't have anything prepared for it.

Arvid:

But I didn't panic. I just sighed and got to work. I immediately reached out to AWS customer support to just figure out what was going on. And they told me again there was unauthorized access and I should check stuff, so I looked into all the logs in my account and found no illegal access. But what I did find, what probably triggered their internal alert was that I had set up a new server for my search engine that I use for PodScan.

Arvid:

It's a Meilisearch server, or a server that has Meilisearch installed, and I used an old AWS access key for that that I created back, like, half a year ago when I started PodScan. It went unused then. I just created it and set it up in my Laravel Forge system, but now the system used it. And me suddenly using that key to provision a pretty beefy server caused AWS to decide that this was probably a leaked key, unauthorized access, and to restrict my account. And in a way, I really like the security approach here.

Arvid:

They restricted the account immediately after suspecting tampering. They forced a full password reset for me as the main root user, and they limited my account so I couldn't create any new instances or even security keys before I had talked to the security team. And this led to an email back and forth over several days, with AWS telling me what they thought had happened and me proving it wasn't illicit access. And throughout my vacation, like, once a day, I responded with an email to their requests. And by the last day of my vacation, everything was finally resolved.

Arvid:

I had to go through every single service and deployment on AWS and then explain what it was, what it was for, who accesses it, from where, and if it's legitimate. And once they had all the information that they needed, restrictions were lifted and all was back to normal. But that could only happen because I found that email and acted swiftly. And that situation made me realize that my data is not entirely safe if AWS can just turn it off whenever they please with a couple days of ultimatum. I needed to back up my now sizable data, the whole database, to other accounts and have a local copy.

Arvid:

So I'm putting systems into place here to back up and restore my data from AWS without being entirely dependent on them and that particular account. My database is very critical to the success of PodScan as a business, so I need to protect it with external backup systems. Think weekly dumps onto my local computer, as long as it can still manage the database size, and then uploading them into non-AWS backup locations. That's what I'm gonna do with this. It's not a perfect solution.
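For reference, here's roughly what such a dump-and-offload job could look like as a script. This is only a sketch: the hostnames, database name, bucket, and endpoint are placeholders, not PodScan's actual setup, and it assumes the mysqldump client plus an S3-compatible storage provider outside of AWS.

```python
# Sketch of a weekly "dump and offload" backup job (all names and endpoints are placeholders).
# Assumes mysqldump is installed and boto3 has credentials for a non-AWS, S3-compatible provider.
import datetime
import gzip
import shutil
import subprocess

import boto3

DB_HOST = "podscan-db.example.com"                          # placeholder database endpoint
DB_NAME = "podscan"                                         # placeholder database name
DUMP_FILE = f"backup-{datetime.date.today():%Y-%m-%d}.sql"

# 1. Dump the database to a local file (password comes from ~/.my.cnf or the MYSQL_PWD env var).
with open(DUMP_FILE, "w") as out:
    subprocess.run(
        ["mysqldump", "--host", DB_HOST, "--single-transaction", DB_NAME],
        stdout=out,
        check=True,
    )

# 2. Compress the dump to keep storage and egress costs down.
with open(DUMP_FILE, "rb") as src, gzip.open(DUMP_FILE + ".gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# 3. Upload to a non-AWS, S3-compatible bucket (endpoint URL is provider-specific).
s3 = boto3.client("s3", endpoint_url="https://s3.example-provider.com")
s3.upload_file(DUMP_FILE + ".gz", "podscan-backups", DUMP_FILE + ".gz")
```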

Arvid:

It may not be easily automated. It could be, but there's a lot of data and there's a lot of egress and ingress cost depending on where you store it, but it is better than not having a copy of it at all. And the value of PodScan's data is such that even if I had to revert back to the state of the database a week ago, it would still be extremely valuable, because there are already millions of database rows in there that make up the value of PodScan. So it's perfectly fine to have a slightly older backup. It will catch up anyway, because the reality of podcasts out there is that old podcasts stick around for a while, so I can just rescan them and all that.

Arvid:

So it's fine to have a backup somewhere. That's what I'm trying to say. And that is my solution for this first instance of platform risk that I encountered. But the second instance is also very interesting, and that happened a couple of days ago. And in a way, it's kind of related to the cause of AWS's false positive detection.

Arvid:

It was this new server that I created on AWS the day before they restricted my account, and that server started malfunctioning. And again, I'm using Meilisearch on there, and that is a very fast search engine that allows, like, sub-50-millisecond responses to search queries on arbitrarily large data. It's really cool. It's effectively using a copy of my existing relational database, that MySQL database that I host on RDS, and then it's optimized for quick search retrieval. Right?

Arvid:

The normal database is optimized for all kinds of database shenanigans, and that Meilisearch thing is really just for really quick type-ahead search and typo detection and being able to, you know, have certain search queries optimized for, like, trending and all that kind of stuff. It's really, really cool. But judging from my performance logs, that cool piece of software started malfunctioning ever so slightly every now and then over the last couple of days. And then it completely stopped working. It would just block requests for over 2 minutes at a time, and that would make both searching and keeping the index current impossible.
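To make "keeping the index current" concrete: the application continually pushes rows from the primary database into the search index and then queries it. Here is a minimal sketch using the official Meilisearch Python client; the index name and document fields are invented for illustration.

```python
# Minimal sketch of feeding and querying a Meilisearch index (illustrative names only).
# Assumes a Meilisearch server is running locally and the `meilisearch` package is installed.
import meilisearch

client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
index = client.index("episodes")  # hypothetical index name

# Push (or update) documents copied over from the relational database.
# Indexing is asynchronous, so a real job would wait for the task to finish.
index.add_documents([
    {"id": 1, "podcast": "Example Podcast", "title": "Platform Risk", "transcript": "..."},
])

# Fast, typo-tolerant search against that copy of the data.
results = index.search("platform risk")
print([hit["title"] for hit in results["hits"]])
```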

Arvid:

And with search being an essential tool for my API and my UI, I needed to fix that. And that was quite the oddest day. I found that nobody else seemed to have this problem. I was looking on Google, and I even asked, like, ChatGPT, which was half down as well. Another example of a platform dependency: if you use these kinds of tools and they're not working, you're not working.

Arvid:

So I was trying to find any kind of source of this, and I found 2 people, like, 1 or 2 people on Stack Overflow that had a similar problem with the software over the last couple of years. And the solution that they found was that there wasn't really a solution. They had to delete the search index and repopulate it from scratch, which at any sizable database volume can take a long time. In my case, with my database, it takes almost half a day. And there was no way to repair the broken index, which I guess was the suggested cause of the problem.

Arvid:

Meilisearch is great as a piece of software when it works, but it does occasionally run into issues at scale that it itself can't reliably detect or warn about or even report as an error. It just locked up on me, and I didn't know why. So I had to kinda guess. And, ultimately, I went for this great reset there. I had no alternative solution, which meant the system had to catch up and was performing subpar for a couple hours as a completely new index had to be created and was slowly populated with data.

Arvid:

That made me realize that I'm quite dependent on this particular system as well. So while my index was growing, I started looking into solutions to take Meilisearch either out of the picture in the first place or, at the very least, to work with my existing database, my MySQL database that I have, and then have multiple search engines running simultaneously outside of it. And as I was waiting for things to catch up, I dove into my PHPStorm and built a front end search feature that works entirely in MySQL, using a full text index on the fields that my customers search in, which are, like, podcast names and episode names and the full transcripts and descriptions of these shows. And of course, yet again, this is not a quick thing to deploy, because the code itself is just a few queries and some UI. I wrote this in like 30 minutes, but the index that I need is another story, because my MySQL server does not yet have a full text index on all the transcripts.
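For the curious, the MySQL side of such a feature boils down to a FULLTEXT index plus MATCH ... AGAINST queries. Here's a rough sketch of what that could look like from application code; the table and column names are invented for illustration, not PodScan's actual schema, and it assumes the mysql-connector-python package.

```python
# Sketch of a MySQL full-text search query as it might be issued from application code.
# Table/column names are hypothetical; assumes mysql-connector-python and valid credentials.
import mysql.connector

conn = mysql.connector.connect(
    host="podscan-db.example.com", user="app", password="secret", database="podscan"
)
cur = conn.cursor(dictionary=True)

# One-time setup, very slow on large tables (hence the replica trick described later):
# ALTER TABLE episodes ADD FULLTEXT idx_episode_search (title, description, transcript);

# Natural-language full-text search over the indexed columns; the MATCH column list
# must match the columns of the FULLTEXT index exactly.
cur.execute(
    """
    SELECT id, title,
           MATCH(title, description, transcript) AGAINST (%s IN NATURAL LANGUAGE MODE) AS score
    FROM episodes
    WHERE MATCH(title, description, transcript) AGAINST (%s IN NATURAL LANGUAGE MODE)
    ORDER BY score DESC
    LIMIT 20
    """,
    ("platform risk", "platform risk"),
)
for row in cur.fetchall():
    print(row["id"], row["title"], row["score"])
```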

Arvid:

And there are about 6,300,000 transcripts in there right now, each many thousands of words. That's billions of words that need to get into that index. And creating such an index takes a long time, and different databases handle index creation differently. And I learned that a full text index in my version of MySQL can only be built in what they call shared mode. That means there is a write lock on the table while the index is being built.

Arvid:

So nobody can write to the table. Applications can still read from it, but they can't add new rows or update existing ones. And that is problematic for an active application that is trying to check millions of podcasts every day and has tens of thousands of new episodes going into the database at all times. And I tried creating the index on my production system, which was not a smart move. And 5 hours in, it still wasn't done.

Arvid:

And all of a sudden, my application just went down, because the backlog of writes just kept growing. Things were happening on the server and they wanted to write to the database, and the database was like, hey, guys, you wait until I'm done here. And that was just amassing a lot of data that overwhelmed my MySQL server. I thought it would at least keep up with reads, but that was way too optimistic. Everything kind of froze, and I had to kinda stop the whole thing.

Arvid:

Once I did that, my system quickly recovered. And then, after being interrupted by a power outage here of all things, which is a whole other example of platform risk, but at least that gave me some time to mow the lawn and get my mind off things, I reconsidered what had happened and looked into solutions. And I learned that there is a way to get a MySQL full text index without slowing down the whole system. I'm just gonna share this with you in case you run into a similar issue and wanna solve this, wanna build a big index on a production system. So you need to create a read replica on wherever you host this. Like, a read replica for an SQL server is effectively a copy of that SQL server that cannot be written to.

Arvid:

It can only be read from, and it is in constant sync with the main server. Right? The main server takes reads and writes, and everything gets kinda copied onto that other server, and you can read from it in a faster way. It kinda takes some load off the main server and allows that read replica to really be focused on reading data quickly. So you create this read replica, and you turn off the read-only protection on the replica.

Arvid:

You're not supposed to write to it, but you can if you turn that protection off. Then you kinda disconnect it from your system, so you don't read off that server, and then you create your index on this replica so that, like, a couple hours into creation of the index, it is completely done and is automatically kept up to date. And then, here's the trick: you promote that read replica to be the main server. You kinda switch out the database connections on the fly in your application as this promotion happens.

Arvid:

So now the read replica with the index becomes the new main server, and your old main server becomes the read replica, which duplicates the index while there's no load on it. Right? It just creates the index. It will take a couple hours as well. So that's how you get a big index on a production system.
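If you wanted to script that sequence on RDS, the steps roughly map onto a few boto3 calls. This is only a sketch of the order of operations under that assumption: the instance identifiers and parameter group name are placeholders, and a real run needs waiters, verification between steps, and an application-side switch of the database endpoint.

```python
# Rough sketch of the "build the index on a replica, then promote it" sequence on RDS.
# Identifiers and the parameter group name are placeholders; real runs need waiters and checks.
import boto3

rds = boto3.client("rds")

# 1. Create a read replica of the production instance.
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="podscan-replica",
    SourceDBInstanceIdentifier="podscan-main",
)

# 2. Make the replica writable by flipping the read_only parameter in its parameter group.
rds.modify_db_parameter_group(
    DBParameterGroupName="podscan-replica-params",
    Parameters=[
        {"ParameterName": "read_only", "ParameterValue": "0", "ApplyMethod": "immediate"}
    ],
)

# 3. ...build the FULLTEXT index on the replica here (see the MySQL sketch above)...

# 4. Promote the replica to a standalone primary once the index exists,
#    then point the application's database connection at it.
rds.promote_read_replica(DBInstanceIdentifier="podscan-replica")
```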

Arvid:

You kinda create a copy, you just let it run, you create the index there, and then you switch it over. And once this is done, I will have a full text index on all my podcast episodes and transcripts, and this will allow me to implement search more reliably and kind of more locally, in a way, without needing an extra search server with another binary that may or may not lock up at any point in time without me knowing why. So I'll still use Meilisearch. I like it, but I will not rely on it exclusively for features. I plan to build a backup system for quick search alongside Meilisearch: still using it on my UI and API, with a fallback to my own database using MySQL's full text index.
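The shape of that fallback is simple: try the search server first, and if it errors out, answer the query from MySQL's full-text index instead. A minimal sketch, reusing the hypothetical pieces from the earlier examples:

```python
# Minimal sketch of "search server first, database fallback" (all names hypothetical).
# `search_index` is a Meilisearch index object and `db_cursor` a MySQL cursor,
# as set up in the earlier sketches.
def search_episodes(search_index, db_cursor, query: str) -> list:
    try:
        # Primary path: the dedicated search server.
        return search_index.search(query)["hits"]
    except Exception:
        # Fallback path: the relational database's own full-text index.
        db_cursor.execute(
            """
            SELECT id, title
            FROM episodes
            WHERE MATCH(title, description, transcript)
                  AGAINST (%s IN NATURAL LANGUAGE MODE)
            LIMIT 20
            """,
            (query,),
        )
        return db_cursor.fetchall()
```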

Arvid:

Building a software business is never without its surprises, and most of them are just these externalities that you cannot be fully prepared for. All you can do is have the courage to face those challenges every single time they rear their ugly little heads and build processes to prevent those things from happening again. That's what business is. Like, you run into a problem, you solve it, and you kinda set yourself up not to run into the problem again. And this week taught me that I need to have reliable backup mechanisms to reduce platform risks for the core things in my business.

Arvid:

I'm working on implementing these solutions to ensure the critical functionality of PodScan can always work as long as it's connected to a functional and well maintained database which I externalize to a provider that I pay to do this for me. And if you have to work with platforms, you have to know the risks they pose and prepare accordingly. I wanna briefly thank my sponsor acquire.com. Imagine this, you're a founder who's built a great software business, you found your customers, and you're making solid recurring revenue because what you offer matters to them. But it's kinda becoming less and less important to you.

Arvid:

Maybe you've hit a skill ceiling or you just wanna do something else. No matter the reason, you're losing interest in your business. And unfortunately, this too often becomes a story of inaction and stagnation. And in the end, the business becomes less and less valuable or even completely worthless. And that's not okay.

Arvid:

You put in so much work. So how about listing it for sale on acquire.com? Thousands of founders have sold their businesses there for life-changing amounts of money and found a new home for their software babies. So go to try.acquire.com/arvid and see for yourself if this is the right option for you, right now or in the future.

Arvid:

It never hurts to be prepared. Thank you so much for listening to The Bootstrapped Founder today. You can find me on Twitter at arvidkahl, a r v i d k a h l, and you'll find my books and my Twitter course there too. If you wanna support me and the show, please subscribe to my YouTube channel, get the podcast in your podcast player of choice, and leave a rating and a review by going to ratethispodcast.com/founder. It really makes a massive difference if you show up there, because then the podcast will show up in other people's feeds.

Arvid:

And that's where I would love for it to be. Any of this will help the show. Thank you so much for listening. Have a wonderful day and bye bye.

Creators and Guests

Arvid Kahl
Host
Empowering founders with kindness. Building in Public. Sold my SaaS FeedbackPanda for life-changing $ in 2019, now sharing my journey & what I learned.