308: Michael Taylor — Prompt Engineering for Fun & Profit
On The Bootstrapped Founder today is Mike Taylor, a prompt engineering expert. We talk about AI: why AI is more like people than you would think, how to build the perfect prompt, and how the field of AI is developing and will develop for indie hackers, founders, and entrepreneurs in the future. This episode is sponsored by Acquire.com. More on that later.
Arvid: Now here is Mike. Michael, do you think generative AI will one day be as ubiquitous as concepts like databases or email, these things that are all around us all the time? Will generative AI be that for us as well?
Mike: I think there'll be an interim period where it is, although I suspect a lot of those old abstractions will become irrelevant. Like, we might not care about databases one day. The thing I think about is an article I read about how kids these days don't understand file systems because they grew up with Google Drive. And in Google Drive, you can just search.
Mike: And because the search is so good, they never organize stuff into folders. That's really surprising to me, right? As a child of the eighties and nineties, I meticulously organized stuff.
Mike: But it really made me think. I don't know if you've used object storage with the startup you're building, but even in Google Cloud Storage there is no real file system. It's just blobs. The folders are just part of the name: if you put a folder name and a forward slash in front of an object's name, it calls that a folder. Right? It displays it as a folder, but that's not how it is actually stored.
Mike: It's not actually in that folder, if that makes sense.
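To make Mike's point concrete: in object storage a "folder" is only a shared key prefix. The sketch below is a plain-Python simulation, not a cloud client. It mimics, as we understand it, the prefix-plus-delimiter style of listing that S3's ListObjectsV2 and Google Cloud Storage's object listing expose. The `bucket` contents and the `list_folder` helper are made up for illustration.

```python
def list_folder(keys, prefix="", delimiter="/"):
    """Emulate a delimiter-based listing over a flat set of object keys.

    Returns (objects, folders): keys that sit directly under `prefix`, and the
    distinct "common prefixes" that a UI would draw as folders.
    """
    objects, folders = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Another delimiter follows, so the UI shows this key "inside" a sub-folder.
            folders.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


# A flat mapping of key -> bytes: no directories exist anywhere in it.
bucket = {
    "reports/2024/q1.csv": b"...",
    "reports/2024/q2.csv": b"...",
    "reports/summary.txt": b"...",
    "logo.png": b"...",
}

print(list_folder(bucket))                     # top level
print(list_folder(bucket, prefix="reports/"))  # "inside" the reports folder
```

Deleting every key that starts with `reports/` makes the "folder" vanish too, because nothing but the names ever held it together.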
Arvid: That's such an interesting point. The new technology acts like the old thing that us old people know, the way things used to work. Yeah.
Arvid: Wasn't there an article on Hacker News about this just a couple of weeks ago, that AWS is not a file system? Right? Like, that was...
Mike: Yeah.
Arvid: ...where it's really just blob storage. It's meticulously secured with checksums and everything, but there is no file structure. That's just set on top. That's an interesting point. I guess if you only have a mobile phone, and you don't have anything comparable to Windows Explorer or the macOS Finder,
Arvid:Like, you never actually have to deal with files in the context of just them being filed files somewhere. You always interact with them as data in an application. Okay. That's interesting. So what what will we see in that regard with with AI tools?
Arvid:Like, what steps will they take away that we take for granted right now?
Mike: Yeah, good question. Nobody knows is the short answer, but if I was venturing a guess... I always like to say, and I can't remember who said this originally, that you should live in the future and build what's missing. Right? I think it was Paul Graham, in one of his essays.
Mike: So when I talk to people about the prompt engineering stuff I'm doing, they're like, oh, you're living in the future, man. It's kind of crazy what you're doing right now. So if I take my own life as a guide to the abstractions I care about with AI, and to what I've abandoned now that AI is doing most of my work, it's that I don't really care about code very much anymore. I used to work in data science.
Mike: I actually used to manage a team of data scientists, and I cared very much about how the code was written and what the results were. We'd step through it line by line. But now, quite often, I'm just asking ChatGPT. It's writing some code, and I'm not even looking at the code it's writing.
Mike: It gives me the answer, I check that answer, and I think, yeah, that matches my intuition. But ultimately, that code is written for one purpose and is never looked at again. And I suspect that's going to keep expanding. Right? ChatGPT's data analysis is already pretty good at that, but I've seen some demos from people like Next.js.
Mike: I think the team at Vercel is pretty forward-thinking on this. They're building one-time-use UIs. So if you're chatting to ChatGPT, it builds a UI. Right?
Mike: You can imagine this: it would build a form that's related just to that query, and then throw it away. So I feel like a lot of code is going to be thrown away and won't be seen as an asset so much anymore. Right now, it's like, I have a really good code base, and I need to make sure nobody steals it, because that's how I'm going to raise money from investors. I think a lot of the value of code as an asset will be stripped away.
Arvid: Yeah. I guess that's part of the underlying threat a lot of developers feel: that part of their job, the creation of code, the actual authorship, the penmanship of code, is going to be taken away. But I also have the feeling that with that comes something that did not exist before, which is the wrangling of the code. Right? You become more like an AI cowboy who just keeps pushing it back into place,
Arvid: instead of being the person actually herding the specific animals, to really drive this metaphor into the ground, but you know what I mean. The job changes there. Have you seen things happen in your interactions with these AIs, with code in particular, that you never thought you'd have to do before? Because I certainly have in building my new product.
Arvid: It's about 80% code generated by AI that I just try to make fit into an existing system. Do you experience the same?
Mike: It's like working with a petulant child sometimes. You write this amazing prompt, you try it a few times locally, and it's like, yeah, this seems to be getting me the results I need. And then you run it a thousand times, and once in every hundred times it just refuses to do the task. Right?
Mike: And there's no rhyme or reason why. It's the same task. So I think that really changes the way you have to think about writing code, and it actually brings it closer to... I came into coding later in life. I actually built and ran a marketing agency.
Mike: When I was managing that team, I had about 50 people. When you're managing 50 people and there are 30 days in the month, it's at least one person's worst day of the month every day. Right? If you're just an employee, you have one or two bad days a month, whatever, and the rest of the days are fine.
Mike: If you're managing a team of people, there's always someone's bad day happening that day, and you're the one it filters up to. And I feel like it's the same when I'm managing models. It's actually closer to being a manager of people than to being a software developer.
Arvid: That's very interesting, man. I love the analogy with the bad day, particularly as we are living in this rapidly evolving world where new model versions come out on a daily basis. If you go to Hugging Face and look at the most recently updated models, you will find hundreds, if not thousands, of models with a new version uploaded just today. Right? And some of them are the big ones that a lot of people use.
Arvid: So it feels like the bad day for a model is not just a bad day. It's also a bad version deployed on a certain day. All of a sudden you have to judge: is this good or not? And honestly, I struggle with this, because I run several of these models myself.
Arvid: I don't use them through a hosted API. I run them on GPU instances at, I don't know, Lambda Labs or AWS, just VMs with a GPU attached. And boy, is it hard to wrangle these things into doing what you want them to do reliably. It's what you said: they sometimes just flat out refuse for no apparent reason.
Arvid: The black box is a big issue for me. How do you deal with this? I know you do a lot of prompt engineering, and that's the big topic I really want to dive into with you today, because it's something I do every day. Everybody does it.
Arvid: Every software engineer is trying to figure out how to get AI to do their bidding. How do you deal with the fact that you have no insight into these models other than what comes out of them? You don't really know their inner workings. How does this affect how you approach talking to them?
Mike: Yeah. I think you have to approach it more like a researcher studying human behavior, in a way. You have to observe what's happening in different extreme cases, and then you start to form a pattern: okay, 5 times out of 10, it refuses.
Mike: But obviously each refusal might be different, because there's some nondeterminism there. There's some randomness in the responses. So you have to go through and notice that pattern. It's the same way that, if someone was researching a tribe in the Amazon and trying to figure out how these people communicate, what the structure of the tribe is, how they behave in certain situations,
Mike: that's what they would do. Right? They would record a lot of footage or a lot of transcripts, then go through the transcripts meticulously and say, okay, I've noticed this phenomenon happens 10% of the time, and in these cases, this is likely to happen.
Mike: So you're really doing research into this new creature. It's a new form of intelligence that is even weirder than an uncontacted tribe, because they're human. They're just like us. This is a simulation of a human, and it's not quite behaving the same way we do.
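Observations like "5 times out of 10 it refuses" carry a lot of uncertainty at small sample sizes. As a hedged illustration (our addition, not something discussed in the episode), the Wilson score interval is one standard way to put error bars on a rate estimated from repeated runs; the counts below just mirror the figures mentioned in the conversation.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# "5 out of 10 refuse" vs. the same rate over 100 runs,
# and "one in every hundred" observed over 1,000 runs.
for refusals, runs in [(5, 10), (50, 100), (10, 1000)]:
    low, high = wilson_interval(refusals, runs)
    print(f"{refusals}/{runs} refusals: plausible rate {low:.1%} to {high:.1%}")
```

With 10 runs, the plausible refusal rate still spans roughly 24% to 76%; only with many more runs does the interval get tight enough to compare prompt versions.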
Arvid: Yeah, it's funny. I thought our first encounter with aliens would look different, but this is what it is. Right? We are literally seeing a new form of life, for lack of a better term, because obviously it's not organic life, and it's not...
Mike: Yeah. It's not real life. It's just a very good simulation of how a human would respond.
Arvid: You know, the term "real" here... As an avid fan of science fiction, I've been a Star Trek fan all my life and I've read a lot of hard sci-fi, I just love the idea of "what if." Right? That's what this is about. I couldn't tell you if this is life or not, the whole debate around AGI and Q* put aside.
Arvid: All of these terms are very academic. But there is a feeling that there's something in these systems, even in the most basic, tiny 1.3-billion-parameter, 500-megabyte LLMs, that is inescapably different from how technology used to be. There's something in there that creates something out of nothing, even though we kind of know how these networks and models work. To me, that could well be understood as consciousness a couple of hundred years from now.
Arvid: You know how, in retrospect, we always figure out that things had been a certain way and we just didn't see it? It kind of feels like AI is at a stage where, in a couple of decades, this will clearly be something different from what we perceive it as right now. Do you feel the same? Do you feel this is on the verge?
Mike: Yeah, I suspect that. Fundamentally, I don't care that much about the theoretical question of whether this is life, and it's pretty clearly not right now. Right?
Mike: But it could get to the point where we would have to treat AI with more respect. I always say please and thank you to ChatGPT.
Arvid: Me too. Me too.
Mike: Yeah. But I suspect that eventually it might get to the point where you have AI rights. Because ultimately, if it simulates life very closely, is there that much of a difference? It's the same way you can feel real emotion when you're playing a video game, which is just a simulation. And some people actually prefer playing video games to real life.
Mike: So I suspect the theoretical question won't really matter that much. What will really matter is how people behave around AI and how we, as a society, form norms around how to work with AI. I think that's going to be an important shift.
Arvid: Yeah, it goes both ways. Right? You have ethics for how you treat AI, as the potential life or consciousness it could be, but you also have ethics in what you do with AI: the work you create and the work you let it create. I think the biggest debate over the last couple of weeks was around Google's Gemini and its misrepresentation of history in generated images, and that was a big deal.
Arvid: And the question was: is it ethical to misrepresent history, or is it ethical to ask for things the AI does not want to do or is trained to decline? Where do we draw the line? Because a lot of visual AI has the obvious problems with pornography or with themes that are deemed socially unacceptable. How does prompt engineering play into this?
Arvid: Because I know people have been trying to effectively jailbreak these systems to get what they want out of them. As somebody who teaches people how to prompt engineer, how do you deal with the ethical implications there?
Mike: Yeah. I don't know if I'm just attracted to these fields for some reason, but we had the same problem in growth hacking. The agency I built was a growth hacking agency. Right?
Mike: And I was interested in growth hacking because it's what happens when you take a developer and force them to work on marketing. Then the magic happens. They squirm for a bit, but eventually they produce magic. But yeah.
Mike: But then there were a lot of marketers who thought, if I do a boot camp or learn a little bit of JavaScript, I can say I'm a growth hacker and get paid like a developer. And what they really ended up doing was spamming everyone's contacts and address books. So growth hacking ended up being associated with a lot of spam and bad behavior, trying to get something for nothing. The word "hacking" obviously didn't help there.
Mike: It's a similar problem in prompt engineering: you get mixed reactions from people when you say you're a prompt engineer. What I mean by prompt engineering is someone who works with AI to build a system that gets useful and reliable outputs. It's just like engineering a bridge: you want the bridge to be reliable. You don't want it to suddenly turn to jelly.
Mike: Right? So you need to understand a little bit about how bridges work and how physics works in order to make sure the process of building those bridges is reliable, gets the same results every time, and nobody gets in trouble. Prompt engineering is like that to me. If you're building a production system like you have with your app, you need to make sure it's reliable.
Mike: You can't have it refuse a request every now and again. Right? Or hallucinate and make something up. With your tool, it might say something was said in a podcast that wasn't, and that could cause a lot of trouble for some guests.
Mike: Right? So that's how I see prompt engineering, but a lot of people see it as these spammy lists, like "here are 800 of the best prompts for this." And I'm like, okay. You look through them, and it looks like someone casting a spell.
Mike: It's like an incantation. That's how you know it's a lot of crap.
Arvid: Yeah. Honestly, that's exactly how I feel about most of this whole AI world, in many ways. It has a sense of wizardry to it. Right? It feels magical.
Arvid: Yeah. And it is kind of an incantation. You evoke a result just by telling something what to do, but you have to use the right words in the right order, with a swish and flick, to use the Harry Potter methodology. Right?
Arvid: There is something about casting a spell that a lot of prompt engineering looks like to the uninitiated. Obviously, once you look into it, you understand how tokens work, how context works, and you can even dive into embeddings and how the data gets ingested, but it feels magical. I want to get back to you mentioning the bridge, because civil engineering and architecture have a lot of certification to make sure people don't build bridges that turn into jelly. Do you think prompt engineering, and AI education in particular, would benefit from that kind of certification, or do you feel it's still the wild west and we'll see where this goes right now?
Mike: Yeah, good question. I would say it depends on what you're using it for. There probably needs to be some sort of university degree in AI, or in prompt engineering, an actual one.
Mike: In the same way that if you go to university to become an engineer, you get trusted a little more with these sorts of problems. But if every industry required that, I think you'd get a lot less innovation in the fields where you should be able to just mess around. Equally, I suspect even civil engineering would benefit a little from more risk-taking in the testing phase. I'm actually wearing a SpaceX t-shirt, so this makes me look like an Elon fanboy, but he blew up a lot of rockets.
Mike: Right? And obviously he has smart people on the team who are qualified engineers. But I think prompt engineering works best like that: you're really testing the limits of the model. You're testing weird things.
Mike: And then in production, you want to be really safe, the way you want the rocket to be really safe when you put humans on board. But the way you create safety is by having an almost unregulated, crazy amount of testing of really creative ideas. So I think you also have to decide what stage of the product you're at.
Arvid: Yep, that makes sense to me. The problem right now, to me, is that there is such an incredible pace of development, and also, in the best sense, incredible accessibility. Everybody can prompt engineer. It's not like the current version of Llama 2 or Llama 3, or GPT-4 or GPT-5, is hidden behind some academic wall where only certified prompt engineers can figure stuff out.
Arvid: Everybody can either run these models themselves locally or at least use a reasonably priced API to access them. And these tools are ever more interconnected: they can execute functions, do lookups on the internet, and even invoke other services. Again, there's this magical thing where the thing I tell to do something actually calls somebody and does something for me. How weird is that?
Arvid: Right? There's a lot of risk of it overstepping boundaries or, again, behaving unethically. So I think the pace of it all makes it hard to even tier this. There's a development tier, a testing tier, a QA tier, a staging tier, and a production tier. That's how we do software.
Arvid: But for these models, it feels like it's all happening at the same time. Do you see that too?
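On Arvid's worry about models "overstepping boundaries" once they can call functions: one common pattern is that the model only *requests* a call, and application code decides whether to run it. The sketch below assumes a hypothetical JSON shape `{"name": ..., "arguments": {...}}` and a made-up `get_weather` stub; it is not any provider's actual function-calling API.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical stub; a real tool would call an external service.
    return f"Sunny in {city} (stub data)"


# Only functions listed here can ever be executed, whatever the model asks for.
ALLOWED_TOOLS = {"get_weather": get_weather}


def dispatch(model_output: str) -> str:
    """Parse a model's tool request and run it only if it is allowlisted."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return "error: model output was not valid JSON"
    if not isinstance(call, dict):
        return "error: expected a JSON object"
    name = call.get("name")
    args = call.get("arguments", {})
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        return f"refused: tool {name!r} is not allowlisted"
    if not isinstance(args, dict):
        return "error: arguments must be a JSON object"
    try:
        return tool(**args)
    except TypeError as exc:  # wrong or missing argument names
        return f"error: bad arguments ({exc})"


print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}'))
print(dispatch('{"name": "send_email", "arguments": {"to": "everyone"}}'))
```

The allowlist does not make the model behave; it only limits what a misbehaving model can actually trigger, which is the part a developer controls.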
Mike: Yeah, for sure. I would say there are usually distinct phases in the projects I'm working on. Typically, whether I'm working with a client or just doing it myself, you start in ChatGPT or in the playground, and you're messing around with the prompts. You have a problem.
Mike: You think, I wonder if AI could do this. Right? And then you're like, oh, it actually does a pretty good job, but there are some problems. So you start to note down the problems, you make changes to the prompt, and there's this trial-and-error phase.
Mike: Right? And if you keep going with trial and error, I think that's when you get to these incantations. After you've been working on a problem for too long and you've seen too many versions of the same thing, you start to get weird, and that's when you start casting spells. But I think you quickly need to move out of that phase, once it's working okay, into a more rigorous optimization phase. The difference between the two is that when you're doing trial and error with ChatGPT, or with an image model like DALL-E, you have a tendency to overextend the prompt, and you also have a tendency to make changes that aren't really improving the performance.
Mike: They just look like they are because you got a lucky hit. Right? It's actually very similar to the early days of medicine, if you're looping this back to incantations, where they'd say, oh, we need to balance his humors, so we'll bleed some
Arvid: Yes.
Mike: blood from his left arm. Yeah. And the reason those things existed is people mistaking correlation for causation: it happened and it worked once, so they kept doing it, even though they never actually tested whether it continued to work.
Mike: Right? So I think you need to get more scientific with something that's going to be in production. That could be as simple as running the prompt 30 times, pasting the outputs into a Google Doc, and then reviewing them: okay,
Mike: how often does it do bad things? What types of bad things does it do? And then you get more of an understanding. When I'm doing prompt engineering, I'm in a Jupyter notebook writing Python, with my function that calls ChatGPT. It won't just call it once.
Mike: It'll call it 100 times, and it'll do it asynchronously. So instead of calling ChatGPT, getting the answer back, then calling ChatGPT again and getting the next answer, it calls it 100 times at once and you get 100 answers back at once. It's a lot faster. Otherwise, it would take hours.
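Here is a minimal sketch of that "call it 100 times at once, then review" workflow, using only the standard library. `call_model` is a hypothetical stand-in that simulates latency and an occasional refusal so the example runs offline; in practice you would swap in your provider's async client. The semaphore is our addition, to stay under API rate limits.

```python
import asyncio
import random
from collections import Counter


async def call_model(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for an async LLM API call (simulated, offline)."""
    await asyncio.sleep(0.01)  # pretend network latency
    if random.Random(seed).random() < 0.05:  # simulated occasional refusal
        return "I'm sorry, I can't help with that."
    return f"Answer to: {prompt}"


def classify(output: str) -> str:
    # Crude, hypothetical grouping of outputs into failure modes for review.
    if not output.strip():
        return "empty"
    if output.startswith("I'm sorry"):
        return "refusal"
    return "ok"


async def run_many(prompt: str, n: int = 100, max_concurrency: int = 20) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)  # cap in-flight requests

    async def one(i: int) -> str:
        async with sem:
            return await call_model(prompt, seed=i)

    # All n calls are scheduled together instead of one after another.
    return await asyncio.gather(*(one(i) for i in range(n)))


outputs = asyncio.run(run_many("Summarise this episode", n=100))
tally = Counter(classify(o) for o in outputs)
print(tally)
```

The `Counter` at the end is the automated version of scrolling through the Google Doc: it tells you how often each kind of bad output shows up.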
Mike: And then I have some automated evaluation as well. Say I'm trying to make a blog post longer, because right now, when you ask it to generate a long blog post, it will write at most 100 words. If I'm trying to find different ways to improve that, I'll do it systematically: I'll test version A 100 times and version B 100 times, and then see if there's actually any aggregate difference. I think that's when you get into real engineering, as opposed to this witchcraft stuff.
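A sketch of "version A 100 times, version B 100 times" with an automated metric (word count). To decide whether the gap is a real aggregate difference or a lucky hit, this uses a permutation test on the difference in means. That choice of test is ours; the episode does not say which statistic Mike uses. `generate` is a hypothetical stand-in that *simulates* output lengths, so the numbers it produces are not real model results.

```python
import random


def generate(variant: str, seed: int) -> str:
    """Hypothetical stand-in for an LLM call with prompt variant A or B.

    Lengths are simulated (B is made longer on purpose) so the sketch runs
    offline; swap in a real model call to compare real prompts.
    """
    rng = random.Random(f"{variant}-{seed}")
    mean_words = 110 if variant == "B" else 90
    return " ".join(["word"] * max(1, round(rng.gauss(mean_words, 15))))


def word_count(text: str) -> int:
    # The automated eval: here simply "how long is the output?"
    return len(text.split())


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=5_000, seed=0):
    """Two-sided permutation test on the difference in means (b - a)."""
    rng = random.Random(seed)
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # pretend the A/B labels don't matter
        diff = mean(pooled[len(a):]) - mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 smoothing so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


scores_a = [word_count(generate("A", i)) for i in range(100)]
scores_b = [word_count(generate("B", i)) for i in range(100)]
diff, p_value = permutation_test(scores_a, scores_b)
print(f"B - A: {diff:+.1f} words on average, p = {p_value:.4f}")
```

A small p-value says the difference is unlikely to be shuffle noise; a large one is the statistical version of "you just got a lucky hit".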
Arvid: I love this, because that's exactly what I've been doing over the last couple of days. For Podscan, I have this question-answering feature where my users can ask a specific question of every podcast out there. If the question is answered with yes, they get a notification. That's the general idea, and that's how I use inference and AI in my system.
Arvid: Right? It looks for keywords, and if it finds any keyword, it checks whether the episode actually answers the question with yes or no. But for that, I needed a system that can reliably and truthfully answer yes or no to any question. That turned out to be quite complicated, because a lot of systems out there, even ChatGPT, are pretty good at saying yes or no, but sometimes they just answer with "maybe" or "probably" or something like that. Right?
Arvid: It's hard to quantify words like that if you expect a yes or no. So I needed to find a model that was good at question answering. I'm using, I think, the Falcore model, which is specifically trained for QA, and then I needed to figure out the right prompt. And all these models have very different prompt styles. It's not just that you write a text.
Arvid: Some are trained on certain formats, where the bot says this, then the user says that, and then you get the response, or sometimes the prompt has to start with system tags or similar markers. They're all very specific, and it took me ages to find a reliable prompt that does not give false positives. It may be wrong in the other direction: I don't want it to say yes to something that is a no, but it's fine for it to say no to something that might be a yes. I don't care about the false negatives. Those are acceptable, because I have about 20,000 podcast episodes coming in every day.
Arvid: It's fine if one or two are missed, but a false positive is a problem. So I set up a plain Python script, not a Jupyter notebook, that consistently runs this local AI over, I think, 100 text fragments, each paired with a question and an expected yes-or-no answer, and checks the results. It has been running for two days straight, over 30,000 runs at this point. And by playing with the prompt, I got it to around 99.8% accuracy, which is bizarre, because I never expected to do this kind of research work by building something that scans podcasts.
Arvid: It's bizarre, but you kind of have to do it. Right? This is how you have to optimize these models.
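A minimal sketch of the kind of harness Arvid describes. This is not his actual script, which isn't shown in the episode. It does two things: it maps free-form output ("Yes.", "Probably.", "no") to a strict yes/no, treating anything that isn't an explicit leading "yes" as no, and it counts false positives and false negatives separately, since only the former are costly here. `fake_model` and the three test cases are hypothetical.

```python
import re
from dataclasses import dataclass


def parse_yes_no(raw: str) -> bool:
    """Map free-form model output to a strict yes/no.

    Only an explicit leading "yes" counts as yes. "Maybe", "probably",
    "no", or anything unexpected is treated as no, which trades some false
    negatives for fewer false positives.
    """
    match = re.match(r"\s*(yes|no)\b", raw, flags=re.IGNORECASE)
    return match is not None and match.group(1).lower() == "yes"


@dataclass
class Case:
    transcript: str
    question: str
    expected: bool


def evaluate(cases, ask_model):
    tp = fp = tn = fn = 0
    for case in cases:
        predicted = parse_yes_no(ask_model(case.transcript, case.question))
        if predicted and case.expected:
            tp += 1
        elif predicted:
            fp += 1  # the costly error in this setup
        elif case.expected:
            fn += 1  # tolerated: a missed mention
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(cases),
        "false_positives": fp,
        "false_negatives": fn,
    }


def fake_model(transcript: str, question: str) -> str:
    # Hypothetical stand-in for the real local model call.
    if "sort of" in transcript:
        return "Probably."
    return "Yes." if "pricing" in transcript else "No."


cases = [
    Case("We talk a lot about pricing our SaaS.", "Does the episode discuss pricing?", True),
    Case("We sort of touch on pricing at the end.", "Does the episode discuss pricing?", True),
    Case("This episode is all about gardening.", "Does the episode discuss pricing?", False),
]
result = evaluate(cases, fake_model)
print(result)
```

Running the same loop after every prompt change turns "it seems better" into a false-positive count you can compare across versions.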
Mike: Yeah, for sure. The keyword for anyone interested in looking more into what you're doing there is evals. That's what all the AI people call it. Right?
Mike: Evaluations. When OpenAI releases a new model, they'll have benchmarks: different sets of questions and answers, different tests the AI can take. Right? Some of them measure reasoning ability, so they have a lot of question-answering and reasoning sets. Then some measure mathematical ability, and others grammar and English literature.
Mike: Some measure the ability to handle other languages. For example, if you have podcasts in other languages, you could translate them, if that becomes important. Typically, I'm custom-building evals every time for my clients, because it doesn't really matter to my client whether the model is good at reasoning. It just matters whether it can do this specific task. Right?
Mike: It's just like recruiting. In a job interview, you want to figure out: can they use Excel? Can they do this? How good are they at doing this?
Mike: And sometimes you have to recruit a model that isn't very good at that yet and fine-tune it, train it, just like you would train an employee.
Arvid: I love that this is, like, the fifth time you've compared working with AIs to working with people. As agents, I mean, that's what they are. Right? They are agents of our intention, and they can make some kind of decision, conscious or not, in our stead. So I love this comparison.
Arvid: That is really cool. And yes, you definitely have to evaluate them. But one thing that comes to mind is overfitting, because these benchmarks are also quite public. Right? What prevents AI systems from including these benchmarks in their training data and then acing them?
Arvid: Is that a problem? I don't know that part of the space too well. Is this an issue?
Mike: It is. I'm not involved too much in the public benchmarks, because I just look at it like this: okay, if someone tells me a new model is good, I'll try it myself and see if it works on my tasks. Right? But ultimately, a lot of those benchmarks are becoming meaningless in some respects, for a couple of reasons.
Mike: One is that there's probably some bad behavior going on, where people are intentionally overfitting on the exam questions. That's one thing. I'd like to think that's relatively rare, because I think a lot of the people who work in these AI research labs are relatively ethical, thankfully. But I would also suspect it's happening at a wide scale unintentionally.
Mike: I'll give you an example. GPT-4 can pass the bar exam. It's pretty smart. Right?
Mike: But if you give it novel legal questions, it fails really badly. If you think about it, the bar exam is a question-and-answer set. It's an eval for humans, and it's based on the training data for those humans. Right? They have to have read certain cases at Harvard Law School in order to answer those questions.
Mike: And GPT-4 has read those cases too, and it has perfect recall. So sometimes those benchmarks aren't really testing the ability of the AI so much as the quality of the dataset it was trained on. And then I think the third thing is that a lot of these models aren't just a one-shot or zero-shot prompt model these days. It's not just that you type a question and get an answer back. There's a lot of stuff happening in the background.
Mike:So when you ask ChatGPT to write some code for you, in the back end it's actually multiple calls to the model. The first call will make a plan of what needs to be written. Then the second call will go and write the code, if that makes sense. And then it has another call where it can run the code, and it passes an error message back to itself. It says, oh, I hit an error.
Mike:Sorry. And then it will attempt to fix it. Right? So it's one call for you, but it's actually multiple calls in the back end. And the one I recall recently that was interesting was that Google's model by default can search the web.
Mike:So if you're testing Google's model on any question, and that answer is online in any way, shape, or form, it's like giving Google an open-book exam. Right? So Gemini's scores are massively inflated, because it can go search for the answer. It doesn't have to look back in its training data, if that makes sense.
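The plan, write, run, and fix cycle Mike describes can be sketched in a few lines of Python. This is a minimal illustration of the general pattern, not how ChatGPT is actually implemented: `call_model` is a hypothetical stand-in for any text-in, text-out model call, `fake_model` is a scripted stub so the example runs offline, and a real system would run generated code in a sandbox rather than with `exec()`.

```python
import traceback
from typing import Callable, Optional

# Hypothetical model interface: prompt text in, reply text out.
ModelFn = Callable[[str], str]


def run_snippet(code: str) -> Optional[str]:
    """Run generated code; return the error text, or None if it ran cleanly.

    Caution: exec() runs arbitrary code. Real systems sandbox this step.
    """
    try:
        exec(code, {})
        return None
    except Exception:
        return traceback.format_exc(limit=1)


def plan_write_fix(task: str, call_model: ModelFn, max_attempts: int = 3) -> str:
    # Call 1: ask only for a plan.
    plan = call_model(f"Make a short plan for this task:\n{task}")
    # Call 2: ask for code that follows the plan.
    code = call_model(f"Write Python code for this plan:\n{plan}")
    for _ in range(max_attempts):
        error = run_snippet(code)
        if error is None:
            return code
        # Pass the error message back to the model and ask it to fix the code.
        code = call_model(f"This code failed:\n{code}\nError:\n{error}\nFix it.")
    raise RuntimeError("no working code after retries")


# Scripted stub standing in for a real model, so the loop can be exercised.
calls: list = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("Make a short plan"):
        return "1. add 2 and 2\n2. check the result"
    if prompt.startswith("Write Python code"):
        return "total = 2 + two"  # deliberately broken: NameError
    return "total = 2 + 2\nassert total == 4"


final_code = plan_write_fix("add two and two", fake_model)
print(final_code)
```

From the user's side this is one request, but the stub records three model calls (plan, write, fix), which is the gap between "one call for you" and "multiple calls in the back end".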
Arvid:It's like cheating on the test, but in a good way. Yeah. Right?
Mike:Yeah. But just like school versus work, a lot of the behavior that would get you thrown out of school would actually get you promoted at work. So, you know, you should go and cheat on the test. Right? If you can, if you're employed, or if it's your own company in particular.
Arvid:I like what you said about the bar exam, where the exam is really just a test of how well people studied the underlying training data. And I think the interesting part here will be how we can reliably create tests that test that, but are also novel enough to show where the limitations of these systems are. Right? And I wonder if this is gonna be almost a self-cannibalizing thing, where AIs, or LLM systems, are built to generate similar yet unique questions, or test datasets, for other AIs to be tested on. Or maybe this is gonna be something that people will always have to do.
Arvid:It's always gonna be a human ingenuity thing. Where do you think this might go? Which direction might this take?
Mike:Yeah. You know what? I think there are a couple of things that will happen. So one is, benchmarks overall will become less important as they just get beaten all the time. And I don't think we'll necessarily come up with better benchmarks.
Mike:Right? That will end in a few years, I think, or become less relevant. It'll be in the paper that they publish, but it won't be something that average people talk about. I think it was really just important when we were making a lot of rapid progress in AI, and it was really important that a model got, like, 10% better at reasoning. Right?
Mike:And especially if it became the new state of the art. But pretty soon these models are gonna get to the point where they surpass the abilities needed to do most of the tasks that we need them to do. At which point you'll just use the one you like, I guess. I'm going back to humans again, but if you could hire a bunch of people from Harvard or Oxford or another leading university, of that level of intelligence, and they're already intelligent enough to do the job, then the important thing is: do you wanna spend 8 hours with them every day? So I think the personality of the model will start to make a difference. And therefore, we'll become very tribal.
Mike:There'll be people who love ChatGPT-type models. There'll be people who love Claude from Anthropic. I already see that actually happening. Claude has a little bit more personality than ChatGPT. So I've seen people saying, oh, even though Claude is slightly worse in some things, I prefer Claude, and I'm gonna use Claude from now on.
Mike:And people will also go tribal around the companies as well. There'll be the Microsoft camp. There'll be the Google camp. Etcetera, etcetera. So I suspect it'll come down more to personal preference.
Mike:And then I think the other thing is, the real test is gonna be how well they perform in the real world, or in virtual worlds. So, for example, the test of whether Tesla's self-driving car is doing a good job is how many times the driver has to intervene. And it's basically impossible to fake that test. Right? If the driver feels unsafe enough to intervene, then that's a failure.
Mike:Right? And the more they can eradicate that, the better the AI is doing. Right? That's the real benchmark. And in order to get there, they didn't just let loose a self-driving car.
Mike:Right? And have it crash into the wall, because it needs tons of data to be good enough. What they did is they slowly automated different parts of the driving experience, since some things are easier than others. They also did a lot of testing in a virtual world. So they have their own version of Grand Theft Auto.
Mike:You can think about it that way: the car can drive around, but it can drive around 30,000 times a day. Right? There's no limitation in the simulation, but the simulation obeys the rules of physics. So at the very least, the model learns how to obey the rules of physics, and it knows, okay, if I steer too hard to the right, I'm gonna hit the wall.
Mike:And then it's kind of ready for real-world behavior. So I suspect there's gonna be a lot more stuff like that. I've seen people get models to play Minecraft, and that's a really good test for agentic behavior. Can it make decisions about what to do next? And I've seen people say that the real test of a model is gonna be: can you just say, go make me money online? And if it does, then it's succeeded.
Mike:Yeah. But then you get into the ethics of whether that's a good thing or not. Yeah. So, along that line, have you seen Devin, the new developer agent that's been doing jobs on Upwork and all this stuff?
Mike:What are your thoughts on that?
Arvid:Devin has been interesting for two reasons. Obviously, the technology is very interesting. And for certain things, like writing unit tests for my code base, that's perfect. I'm like, okay, sure, go ahead. The technology is evolving so rapidly.
Arvid:I don't feel threatened by it. I feel kind of empowered by it, knowing that there's something that will take away these things that I would otherwise have to spend a lot of time on myself, or figure out how to hire somebody for, right? It's nice to see technology take over that part of technology creation as well. The interesting part for me has been the reaction in the community, which has been split along this line as well. The community is either, oh, no.
Arvid:It's gonna take our jobs. Right? We're gonna lose everything we have. Developers aren't worth anything anymore. You should never learn how to code.
Arvid:People have been saying this for some reason: don't learn how to code, machines are gonna do it anyway. Which, honestly, in my opinion, is just as reductive as saying you shouldn't learn how to read or write because you have audiobooks. It doesn't make any sense. Right?
Arvid:The capacity to think, the capacity to structure thought, to architect solutions to a problem, that's what coding is. The writing part is irrelevant. And so I guess you should still learn how to code, and how to think, and how to express instructions to something, which is effectively what prompting Devin is, in a way, too. It's coding, but on a different cognitive level. And the other side of this conversation is just very open.
Arvid:It's like, great, another agent so I don't have to do the work that I don't like doing. I like to conceptualize. I like to make money online. I don't wanna implement that blog.
Arvid:I don't wanna implement that affiliate system. Let that thing do it. It comes down to: do we see it as a threat or as a tool? It's like the ever-present debate about weapons. Right?
Arvid:Is a kitchen knife a murder weapon, or is it a tool to make food? Yes is the answer. And Devin is exactly the same. Devin's answer is yes.
Arvid:Whatever it is, it's yes. So how do you feel about this from the prompt engineering side?
Mike:Yeah. I had a little bit of a taste of this. I was doing a ton of prompt writing, and I was working on a different book, not the one that's coming out in June for O'Reilly, but a different one for a different publisher, which I won't name. It was gonna be a big collection of prompts. So exactly like the incantations that I was railing against before.
Mike:But my plan was to make it more scientific and show some actual test results for each prompt. Right? So it was gonna be, like, 200 prompts. And what I found is, I got really tired of doing it, and I thought, maybe GPT-4 can write prompts. And it was great.
Mike:It was actually really good, to the point where I couldn't be bothered to write prompts anymore, because it was actually pretty good. So then I was thinking, what have I done? I've been charging hard to automate everyone else's job, and I've just accidentally automated my own job. Yeah. But the funny thing is, it's now at the point where I'm using that to get a good baseline, but the really powerful prompts are the ones where I have some knowledge that's not in the training data, and some opinion or preference that the average person doesn't have.
Mike:And I put that in the prompt, and then you don't really need to care about the rest of the formatting and stuff like that. That's kind of basic. That's boilerplate. Right?
Mike:So let Devin do the boilerplate. Right? Let GPT-4 do the boilerplate for you. And then you can do the stuff that you actually care about and enjoy.
Arvid:Yeah. In many ways, I think this discussion goes way beyond tech. It goes into universal basic income and the capacity to freely live a life full of meaning and all that. But even within the confines of the AI world, prompting, and LLMs, it feels like, yeah, machines should do the baseline stuff. The foundational work should be done by the automatable systems, the systems that have much more capacity to work through this than we as humans do.
Arvid:It's like what we both do when testing our data. We don't sit there, get a result from ChatGPT, check it, then send out the next one and check that. We send them out in bulk. They come back in bulk. We run an evaluation on them, then we look at the data, and then we dive into the specifics.
Arvid:Right? In my case, I look at the thing that always gets answered wrong, and then I try to figure out how I can change this number here to make this specific thing go up or down. Right? That's where we're good. We're good at spotting things that need to be done that a machine would never see.
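Arvid's batch workflow (send everything out, score it, then zoom in on what keeps failing) can be sketched as a small aggregation in Python. The `Result` record, the case names, and the idea of running each case several times are assumptions made for illustration; the step that decides whether an answer was `correct` is left to whatever evaluation you already run.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    case_id: str   # which test prompt/input this run belongs to (hypothetical IDs)
    correct: bool  # whether the model's answer matched the expected one


def pass_rates(results: list) -> dict:
    """Fraction of runs per test case that were answered correctly."""
    totals: dict = defaultdict(int)
    passes: dict = defaultdict(int)
    for r in results:
        totals[r.case_id] += 1
        passes[r.case_id] += int(r.correct)
    return {cid: passes[cid] / totals[cid] for cid in totals}


def always_wrong(results: list) -> list:
    """Cases that failed on every run: the ones worth a human look."""
    return sorted(cid for cid, rate in pass_rates(results).items() if rate == 0.0)


# Invented batch of scored results, two runs per case.
batch = [
    Result("refund-policy", True), Result("refund-policy", True),
    Result("date-math", False), Result("date-math", False),
    Result("tone", True), Result("tone", False),
]
print(pass_rates(batch))    # {'refund-policy': 1.0, 'date-math': 0.0, 'tone': 0.5}
print(always_wrong(batch))  # ['date-math']
```

The machine does the bulk scoring; the short list of always-failing cases is where human judgment about what to change comes in.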
Mike:Exactly. Yeah. I feel like people are gonna end up doing a lot more primary research. Starting a startup or being an indie hacker will be, I think, much more about carving off a specific niche of all the problems left in the world, and actually going and running experiments to figure it out. Because I think that's something we have a really strong capacity for.
Mike:I found this with my content writing as well, because I went through a period of doing a lot of content writing. At my marketing agency, 60% of our leads came from our blog, so it was really big for us. I'm terrible at networking, but I was good at writing, so I kind of substituted one for the other. I was writing a lot professionally. And actually, it was a big part of my identity.
Mike:People knew me for it. Right? And then I went through a phase of doing everything through ChatGPT. I had already automated a lot with GPT-3, and then when GPT-4 came out, it was so much better. So I was like, okay.
Mike:Why do I need to write anything anymore? And now I've started writing again, and my writing is so much better, because what I do is go to ChatGPT and ask it to write something on the topic that I'm interested in. And then I look for holes, and I go, no, it's wrong about this. I'm pretty sure it's wrong about this.
Mike:Right? I need some proof, but I'm pretty sure that this is not correct. So I'll go and run an experiment, or go collect some data, and do the actual research, and then I'll write it up. Right? And I think it's pushing me to be a better writer now.
Mike:I went through this weird phase where I just stopped writing and lost all hope, but now I'm back into it. And I'm enjoying it more than I ever did, because I'm not writing the boring stuff that you'd have to write for SEO anymore. ChatGPT can kind of do that stuff. But now I'm writing the stuff where I'm going out and finding something new about the world, and then I'm becoming the training data for the next version of ChatGPT.
Mike:You know, I think if you can be in the training data more than you're using the training data, then that's a good balance.
Arvid:I love that. Be the training data. That's something to strive towards. Right?
Mike:Yeah. For sure.
Arvid:It makes a lot of sense, because you're on the edge of technology. Right? You're already using the latest technology if you deal with AI systems like this. So you might as well be the person that influences the next steps, instead of just being the one that benefits from the past steps, even though that is great. That reminds me, you brought up the book, and I do wanna talk about this.
Arvid:You've been writing a book about prompt engineering and generative AI, and that field is fast-paced. Over the last couple of weeks, we've been presented with Sora, for that matter, a model that I never expected to appear this quickly: video generation. How do you deal with this when writing a book about it? How do you keep up with the technology, all these new models and all these new things, in a book that at some point is hopefully gonna be an artifact in time? Do you even want to change that?
Arvid:Do you wanna keep it updated all the time? How are you gonna deal with this?
Mike:Yeah. I mean, I feel like the book publishing model will have to change in some respects. I suspect that's what we'll probably do in the future, and O'Reilly, I think, are already publicly talking about this sort of thing. Right?
Mike:As part of the contract we signed, they've optioned the right to basically ingest our book into a chatbot. Right? So people can talk to our book. I don't think they're actually doing that yet, but it's something they've talked about doing at some point.
Mike:And I'm sure all book publishers are thinking about this. Right? You talk about sci-fi, and I love sci-fi as well, and I imagine the sci-fi of the future won't be about waiting for the next book in the series. They'll build a world, and then you can query that world and maybe go on your own adventures in it. Right?
Mike:And you could go deep on a specific topic. Right? That's what I suspect will happen in the future. Yeah. In practical terms, how did we approach this for our book?
Mike:So I have a coauthor, James Phoenix, who I also work with on a few projects. And it definitely helped not to have the whole weight of keeping up with everything in AI on my shoulders. He did a lot more of the LangChain stuff and went deeper on the more technical aspects. And I focused more on image generation, Stable Diffusion, and also the general principles of prompting, which the book is based on. And the way we tried to approach this was, I started using AI in 2020.
Mike:It was actually the year I left the agency, and I got access to GPT-3. Actually, first it was Copy.ai, and I was like, this is amazing, what does it use? Right? And then I got access to GPT-3.
Mike:And then what I found is, when it went from GPT-3 to GPT-4, or to 3.5 and then quickly 4 afterwards, a lot of the old tricks, the hacks we had to use to get the model to do anything useful, didn't apply anymore. And what we were left with was, I guess, 5 general principles that we refined over time. I already had a blog post on this; it's what led to the book deal. And because they're general principles that still worked from GPT-3 to GPT-4, what I'm hoping is that when they release GPT-5, they will continue to work there.
Mike:Right? And I guess it's no coincidence that I keep comparing managing GPTs to managing humans. I studied business management and then did a master's in economics, and I noticed that pretty much all of these principles are basically business management principles. So one is give direction. You would never hire a human employee and then not give them a brief on the type of tasks that you want them to do.
Mike:Right? You wouldn't hire an agency and say, you make up the marketing campaign, I don't care. You would say, okay, here's the brief.
Mike:Here's the kind of thing I'm looking for. Right? So that's the first principle. And then specify the format. What do you want back: a numbered list, an unordered list, or a paragraph of text?
Mike:How many paragraphs of text do you want? Or, if you're building a tool, do you want it back as JSON, as structured data, so you can put it into a database or display it on a web page? And then the third one is give examples. Typically, if a human is struggling to do a task, you'd say, here are some examples of how this task has been done well in the past that I like. And that gives them a really good sense, because sometimes it's really hard to explain exactly what you want.
Mike:So if you find some good examples, it's maybe easier to infer the nuance: oh, I kind of want it like this, I get it now. Right? And the same trick works for GPT. I won't go through everything, but it struck me one day that this is kind of like what I learned back in business management school.
Mike:You know? So, yeah, there are parallels there. Yeah.
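The first three principles Mike lists (give direction, specify the format, give examples) map directly onto how a prompt string gets assembled. The Python sketch below is a hypothetical helper, not the book's code; the product-naming task, the JSON shape, and the stand-in reply are invented for illustration, and no real model is called.

```python
import json


def build_prompt(direction: str, task: str, output_format: str,
                 examples: list) -> str:
    """Assemble a prompt from three principles: direction, format, examples."""
    # Principles 1 and 2: the brief, then the required output format.
    parts = [direction, "", f"Respond only with {output_format}.", ""]
    # Principle 3: few-shot examples of work that was done well before.
    for example_input, example_output in examples:
        parts += [f"Input: {example_input}", f"Output: {example_output}", ""]
    # The actual task, left open for the model to complete.
    parts += [f"Input: {task}", "Output:"]
    return "\n".join(parts)


prompt = build_prompt(
    direction="You are a product marketer naming consumer gadgets.",
    task="noise-cancelling earbuds for runners",
    output_format='JSON of the form {"names": [...]} with exactly 3 names',
    examples=[("smart water bottle",
               '{"names": ["HydraSense", "SipSmart", "FlowTrack"]}')],
)
print(prompt)

# Stand-in for a model reply; asking for JSON means it can be parsed directly.
reply = '{"names": ["StrideQuiet", "PaceMute", "RunHush"]}'
names = json.loads(reply)["names"]
```

Specifying JSON is what lets the reply go straight into a database or onto a web page, as Mike notes; a real tool would still validate the parsed result, since models do not always follow the requested format.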
Arvid:The transfer of knowledge here is so impressive. Right? First of all, our capacity to do this as people shows you how cool it actually is that we can take these wildly different principles and just apply them to something new. But it also shows just how similar AI agents and humans are. Right?
Arvid:They are able to do things if you explain them well, if you give them the format, the examples, the intention, and all that. That is definitely helpful. It's cool. I love the fact that you're writing a book about this, because I love books. I love having a library to look things up in and to just learn things from.
Arvid:I know this is a changing field, but I think the underlying principles will be valid for a long while. Even just the ideas of embeddings, or text splitting and all these ways of feeding things into models, and giving context, context windows, all of this. This will probably stick around for a while. And in our terms, a while might be 2 or 3 years. Who knows?
Arvid:But still, the concepts of that work are not gonna be outdated immediately. You also have, and I learned this very recently, a fairly successful Udemy course about this too. Right? You went multimodal with this, to use the term.
Arvid:And that one seems to have worked pretty well too. Is it the same ideas, the same content, or how does that work?
Mike:Yeah. Good question. We get this a lot, actually, because I had written this blog post on the principles of prompting, and they actually came about because I was doing a lot of image generation for the first book that I was writing. This was a self-published book on marketing. And I was trying to do designs in Midjourney version 4, so it was really crappy at the time.
Mike:And I did a lot of prompt engineering to figure it out. And then I wrote this blog post, and I kept updating it over the years. So then when O'Reilly came knocking and said, hey, would you like to publish a book? I was like, yeah.
Mike:Of course. The way I learned how to code was reading O'Reilly books. It's pretty amazing. So I jumped on that opportunity, but it did take a few months, right, to go through the approvals, pitch the ideas, and shape the table of contents, and then also time to write it and edit it.
Mike:Right? So, you know, AI doesn't hang around that long, and I was having a lot of really good ideas all the time. So what we did was publish this Udemy course, and it was based on the same principles. But obviously, because it's a different format, multimodal, as you said, it's very different from the blog post, and different again from the book. The book is much more in the vein of an O'Reilly book, where it explains these topics in a comprehensive way.
Mike:It goes deeper into why things are the way they are. Whereas the Udemy course is much more of a quick hit, because that's what Udemy people want. Right? The Udemy course is obviously in video format, so it appeals to different people. And it's organized much more as: here are different projects that you can do.
Mike:So the book is lots of practical tips and examples and theory, and then the last chapter is one overarching project that brings everything together. Whereas the Udemy course is 5 videos on the principles, and then lots of crazy stuff that we've done with AI. So it's very different, with the same underlying theme. It's definitely the same authors. Right?
Mike:But, yeah, very different use cases, very different target audiences.
Arvid:No, and that's what makes these things so interesting. Right? Some people are very cerebral. They can read the book.
Arvid:They can go through everything and understand all the basics, the foundations, and then build on top of that, build the project. And some people just wanna be inspired. They just wanna see what can be done. Right? It's really cool to see you offer this in multiple ways.
Arvid:It's an approach that I've used as well, and I really appreciate it. Well, that is really cool. Now I have another book to read and another course to take. So, alright. Yeah.
Arvid:I guess my weekend is fully booked now. If people wanna learn more about this topic, and about you, the work that you do, the products you create, and the knowledge that you share, where would you like them to go?
Mike:Yeah. So the book is on the O'Reilly platform. You can get a free trial. I think it's 10 days, which should be enough to skim it and see if it's useful for you. And it'll be in print.
Mike:It's actually on preorder on Amazon now. So it'll be in print in June, hopefully, if editing goes well.
Arvid:What's its full name?
Mike:Yeah. So it's Prompt Engineering for Generative AI, by Mike Taylor and James Phoenix. I also work with James on a company called Vexpower, which is an education platform. That's part of why we did the Udemy course: we wanted to see how Udemy did it, you know, reverse engineer their success. But then the Udemy course blew up.
Mike:Right? It was way more successful than our tech business. So you could check that out as well. But we also just set up a new company called Brightpool. It's brightpool.dev.
Mike:There's not really anything on the website right now. It's just a Notion page, but that's where we're gonna start putting random, interesting stuff we work on. We're building a portfolio of different projects, seeing which ones take off, and doing the indie hacker thing, as you'd say.
Arvid:That is awesome.
Mike:That's what we're gonna be doing. Yeah.
Arvid:Very cool. Well, I'm gonna put all of these things in the show notes, including, I guess, your Twitter handle and everywhere else you wanna be found. I really appreciate you talking to me about this. I burn for this topic right now. The presence of this in my day-to-day is incredible.
Arvid:I use ChatGPT and all my local systems for hours every day, and it's nice to talk to somebody who really deeply understands this and who also has a methodical, scientific approach to making sure that we get the right results. I really appreciate you sharing all of these insights and your understanding of the space and where it might or might not go. It's really, really cool. And thanks again for making this connection between people and AI. I did not think about it like this before.
Arvid:I can't see myself not thinking about it this way in the future.
Mike:Yeah. I...
Arvid:I think it was always gonna be this way, though. It really is infectious. Man, thank you so much for being on the show. I really appreciate it.
Mike:Yeah, no. It's a pleasure to be here, and I've been a longtime fan, so it's great to be among the crowd now. You know? I'm one of you guys now.
Arvid:And that's it for today. I will now briefly thank my sponsor, acquire.com. Imagine this: you're a founder who's built a really solid SaaS product. You've acquired all those customers, and everything is generating really consistent monthly recurring revenue.
Arvid:That's the dream of every SaaS founder. Right? The problem is you're not growing. For whatever reason, maybe it's lack of skill or lack of focus or plain lack of interest. You don't know.
Arvid:You just feel stuck in your business, with your business. What should you do? Well, the story that I would like to hear is that you buckled down, you reignited the fire, and you started working on the business, not just in the business. And all those things you did, like audience building and marketing and sales and outreach, really helped you, and 6 months down the road you're making all that money. You tripled your revenue, and you have this hyper-successful business.
Arvid:That is the dream. The reality, unfortunately, is not as simple as this. The situation you might find yourself in looks different for every single founder facing this crossroads. The problem is common, but it looks different every time. What doesn't look different every time is that the story just ends up being one of inaction and stagnation.
Arvid:Because the business becomes less and less valuable over time, and eventually completely worthless if you don't do anything. So if you find yourself at this point already, or you think your story is likely headed down a similar road, I would consider a third option, and that is selling your business on acquire.com. Because capitalizing on the value of your time today is a pretty smart move. It's certainly better than not doing anything. And acquire.com is free to list.
Arvid:They've helped hundreds of founders already. Just go check it out at try.acquire.com/arvid. That's me. And see for yourself if this is the right option for you and your business at this time.
Arvid:You might just wanna wait a bit and see if it works out half a year from now or a year from now. Just check it out. It's always good to be in the know. Thank you for listening to The Bootstrapped Founder today. I really appreciate that.
Arvid:You can find me on Twitter at arvidkahl, A-R-V-I-D-K-A-H-L, and you'll find my books and my Twitter course there too. If you wanna support me and this show, please subscribe to my YouTube channel and get the podcast in your podcast player of choice, whatever that might be. Do let me know; it would be interesting to see. And leave a rating and a review by going to ratethispodcast.com/founder.
Arvid:It really makes a big difference if you show up there, because then this podcast shows up in other people's feeds, and that's, I think, where we all would like it to be. Just helping other people learn and see and understand new things. Any of this will help the show. I really appreciate it. Thank you so much for listening.
Arvid:Have a wonderful day and bye bye.