Hi there, my name is Vincent, and this is going to be a course on using LLMs as building blocks for your Python programs.
To help explain what this course is going to be about, it might help to just use a tangible example.
So in front of me here is an LLM, and I can talk to it.
This is Claude from Anthropic, by the way.
Now what's kind of the miracle with these LLMs is that you effectively don't need any knowledge about artificial intelligence or machine learning.
You really just have to give it a prompt, and the LLM, quite typically, is able to generate text that contains the answer that you're interested in.
So, in this particular case, if I were to give it a sentence like "these days it might be preferable to use polars instead of pandas," and I were to then ask the LLM, hey, could you fetch me all the Python libraries, then it is able to accomplish this task.
This is pretty impressive on its own, but from the perspective of a Python program, this really isn't what you want.
After all, this seems to be a system where text goes in, but also text comes out.
And if I were a Python program, it would be a whole lot more convenient if, instead of text coming out, just a list came out.
In this case, a nice structured list that names the packages in question is probably a lot nicer than just having text going out.
There are tools at our disposal that are going to help us accomplish this, so this course is going to talk about that.
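To make that concrete, here is a purely hypothetical sketch of the kind of building block this course works toward; the function name and behavior are placeholders, not code from the course.

```python
# Hypothetical end goal: text goes in, structured data comes out.
def extract_libraries(text: str) -> list[str]:
    """Ask an LLM to list the Python libraries mentioned in `text`."""
    ...  # an LLM call plus a schema, which is what the rest of the course covers

# extract_libraries("These days it might be preferable to use polars instead of pandas.")
# -> ["polars", "pandas"]
```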
Feel free to skip to the next video if you just want to get started right away, but I figured it couldn't hurt to maybe also spend one video explaining to you who I am.
My name is Vincent, and I'm a person that's been in the Python and data ecosystem for well over a decade now.
If you go to YouTube and if you go to the PyData channel, you'll see a fair amount of talks that I gave, and a bunch of them got quite popular.
I'm also well known for my work in open source.
I have well over a dozen packages under my name at this point, some of which have gotten quite popular.
scikit-lego is a popular plugin for the scikit-learn ecosystem that has well over a million downloads now, but I've also written some tools that might help you do very rapid prototyping with LLMs.
In particular, I'm going to talk about smartfunc at the end of the course.
Another place people tend to know me from is my work over at CalmCode.
This is a pretty popular learning resource for Python tools.
A lot of people also tend to know me from my work at startups.
Before, I have worked at Rasa, which is a company that makes tools that allow you to write your own chatbot.
I've also worked over at Explosion, where I worked on annotation software, as well as a little bit of prompt engineering, such that the Explosion stack could properly integrate with LLMs for NLP.
That's also stuff I've worked on.
And I've also worked for a company called Probabl, which is all about the scikit-learn ecosystem.
One of the things I did there, by the way, is I fine-tuned a version of Mistral, such that it could actually generate proper scikit-learn pipelines.
A lot of the LLMs tend to get that wrong, by the way, because they write pipelines like you would have written them maybe five years ago, and a lot of them tend to ignore the modern tools.
So some of my knowledge in LLMs actually comes from working at these companies, but I also tend to use LLMs a lot for my own private scripts and workflows.
In particular, one thing I did recently: I have a YouTube channel where I review these split ergonomic keyboards.
The channel has gotten quite popular, but I wanted my reviews to not just be on YouTube, I also wanted to have them on my own blog.
So how do you do that?
Well, you write an LLM pipeline, something that can take YouTube videos and turn that into a proper blog post with a nice summary such that people can also find me via Google on my own site.
Hopefully this helps paint a picture of who I am, but hopefully it also helps paint a picture of how I like to teach.
As you've probably noticed by now, I really like to do this doodling.
I have a drawing tablet that allows me to draw over anything that is on my screen, and this really is my preferred way of teaching, because I honestly like to think that my face can be a bit of a distraction, and it's much better for me to focus on the code that is on display and to make sure that I get some of the ideas across.
So in this course, I will do exactly that.
You're not going to see my face anymore after this.
But what you will see are a couple of these segments where I do a whiteboarding exercise where I try to explain a concept.
And you can also see me emphasize bits of code that are important as I explain how they work.
So again, this is the last time you're going to see my face like this.
But next up, we're going to talk about some of the tools and software that you have to install in order to follow along in this course.
Now, before we dive into the course, it is good to know that we have a Git repository that contains all the code.
We have all of these Python files, and these are pretty much self-contained, which also means that you don't have to program along.
You can just Git clone and have everything locally.
So if you want to follow along live, now would be a good time to get this code on your machine.
You don't have to code along, though.
You can also definitely watch all of these videos and maybe review the code later at your own convenience.
Now, in order to get started, we have to go and install some software that we're going to go ahead and use.
And in particular, the two tools that I do assume that you've got around are uv and Marimo, which is a modern Python notebook that will help us do a bunch of rapid prototyping.
If you don't have uv installed, one thing you can do is install it with pip, but I am going to assume at this point that you've installed it.
The next thing that you've got to do is make a virtual environment, which you can do with uv.
I'm also going to source this virtual environment now.
And from here, I can go ahead and pip install Marimo.
Given that Marimo is installed, I can go ahead and use it.
And let's start a new Marimo notebook called demo.py.
By running this command, I'm going to make an isolated virtual environment for just this one notebook.
That's what the sandbox flag is for.
And this demo.py file is going to contain all the code that we need to run our notebook.
So I've got a notebook over here that can run Python, but at this point you might be wondering why I'm so keen on using Marimo in this course.
So to explain that, I'm going to just highlight a few features that I think are very useful, especially if you're interested in doing things with LLMs.
Before I can get to that though, I have to explain this one thing that makes Marimo really different.
So I'm just gonna have a couple of cells here, A is equal to one, B is equal to two, and let's also have a cell over here where I add A and B together.
So far, there shouldn't be too many surprises.
Sure, the output over here is on top.
And if you're used to Jupyter, the output is typically below.
But we're still talking about Python cells that run Python code.
But notice what happens if I were to change a value in this one cell over here.
I can hit play on this one cell.
And the moment that I do, you will notice that the cell below over here automatically updates on its own.
And this is something that is very different from Jupyter.
In Jupyter, you would have to run that manually.
What is actually happening under the hood here is that Marimo is detecting in what order these cells have to run.
It does that by looking at these variables that are declared.
So it knows that cell A has to run, just like cell B has to run, before it can run cell C.
There is this dependency graph that Marimo is figuring out under the hood.
And by doing this, Marimo is also able to update things.
If you make a change to this cell over here, all of its children also have to update.
This is a reactive notebook, so to say.
This is great because it allows for a very fun party trick, and that is that I can mix and match user interface elements.
So I can say, hey, let's add a slider.
And next up, I can say that the value of A is actually the value that's attached to that one slider.
And kind of the magic thing now is that I have a slider at my disposal, and by changing the value of the slider, cells can actually update automatically as a response.
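If you have never seen a marimo notebook, a minimal sketch of the cells described here could look like this (each comment marks a separate cell; the slider range is just an assumption):

```python
import marimo as mo

# Cell 1: a UI element.
slider = mo.ui.slider(start=1, stop=10)
slider

# Cell 2: read the slider's current value; marimo tracks that this cell depends on `slider`.
a = slider.value
b = 2

# Cell 3: reruns automatically whenever `a` or `b` changes, e.g. when the slider moves.
a + b
```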
It's this behavior over here, the ability to add UI elements and have that interact with Python code, that is going to make it amazing if we want to do rapid prototyping.
And this is actually something that you're going to do a lot when you're working with LLMs.
You want to try out lots of ideas very quickly.
And part of that is writing Python code.
But another good chunk of that is also toying around with some user interface elements at the same time.
And I've just found that Marimo is a really great tool for that.
Now, I do want to add a disclaimer at this point, and that is to mention that at the time of making this recording, I'm actually employed by Marimo, the company that makes this notebook.
So definitely feel free to consider my bias a bit here.
Now, one final lesson about this reactive nature is that you do have to rethink the way that a notebook works a little bit.
Right now, if I make a change in a cell, everything automatically updates.
And that's not what you want all the time.
You don't want to accidentally trigger heavy compute workloads, for example.
That is something that we can change over here because we can set cell changes to be lazy if we want to.
This is much more similar to Jupyter if that's what you're used to.
Another thing that's also just good to be aware of is that because Marimo really tries to figure out in what order all of these cells have to run, you are not allowed to redeclare a value.
This tends to be one of the main things that Jupyter users can be confused by, because in normal Python code as well as in Jupyter, you can overwrite variables all the time.
Marimo takes a principled stance here though, because if we have this dependency graph, well, if there are two nodes that both declare the variable a, well then which one is leading?
Marimo needs to make an assumption in that scenario.
And that's why Marimo says you can't do this.
Anyway, enough about Marimo, let's dive into some LLM code.
The main topic of this course is to get Python to communicate with LLMs to make useful software.
However, one place to get started is to maybe just mention that the LLM landscape is kind of broad these days.
There's a lot of vendors.
In the beginning, ChatGPT from OpenAI was the main vendor you would talk to.
But these days, there are lots of competitors.
You've also got Claude from Anthropic.
There's also Mistral.
There's lots of open source models that you can run locally on your machine.
So Ollama is actually something you might be interested in running.
And there's also a vendor on the internet called Groq that allows you to run these open source models on very expensive servers that are really, really fast.
So in designing this course, I was kind of wondering, well, what I could do is I could pick one vendor and just use that through the entire course.
But odds are that whatever vendor might be optimal for your use case is definitely going to change as time moves forward.
That's also something to keep in the back of your mind when you're working on building LLM-based software.
You probably don't want to overinvest in any one of these vendors just yet, and that means that you want to invest in a tool that allows you to mix and match and switch around any of these vendors.
All of these vendors typically come with their own APIs, and you can use their Python SDKs directly.
But for this course, I figured maybe the best place to get started is to use this one library called LLM, which is a library that's made by Simon Willison.
Now, the interesting thing about this library, if you go to GitHub, is that your first impression might be that it's meant to be run from the command line.
If I were to scroll down here and have a look at the quick start, then you can definitely see that there's lots of these command line prompts.
And this was certainly the original goal.
If you go to the documentation page, though, you'll notice that there's actually a very nice Python API that's attached.
Not only that, this library also comes with a very big plugin ecosystem.
This plugin ecosystem includes lots of local models that you can run.
But there are also plenty of these remote APIs that you can call.
And that means that you can use this tool with Mistral, with Gemini, with Anthropic, and a whole zoo of other providers.
And that's going to be great for rapid prototyping as well.
In this course, we're also gonna talk about some other tools, but I felt that it might be good to motivate why I've chosen this LLM tool.
And one of the main reasons really is that it just supports lots of these APIs natively by just installing a plugin.
And it's also a library that really does come with some good building blocks right from the get-go.
So let's start using this in a notebook.
So let's give this LLM tool a quick spin.
In order to use this library, you are going to have to install it first.
And you can do that from the command line, but you can also go to this manage packages tab.
You can type LLM in here and also add it natively.
Besides loading this LLM library, I'm also loading this dotenv library over here.
If you want to install that, by the way, you have to run pip install python-dotenv in order to install it.
The name of the import is different from the name of the pip install command.
But this library, and especially this command, is going to make sure that I've got an environment variable loaded that contains my secret that allows me to communicate with OpenAI, because the model that I'm using over here is the GPT-4o mini model from OpenAI.
And just for good measure, this is what a .env file would look like.
Locally, you would have a name of a key, and that will be followed by equals and then a string that contains your secret key.
Definitely keep this a secret.
Make sure you don't accidentally add this to GitHub or anything like that.
But by configuring this and by then loading this .env file, I am able to communicate with this particular model.
Now from here, the API is actually relatively simple.
You have to come up with some sort of a model, and then you can use that model by giving it a prompt.
And the system here is pretty much put text in, and then you can expect text to come out.
However, what you get out of this model prompt function is a response object.
And that response object does come with a couple of extra verbs that you can play with.
In this particular case, I am asking it to give me all the data in JSON, and that gives me some extra information.
The main content that I've been asking for is under this content key.
So you can see that indeed it has been writing a haiku of sorts about Python, which is neat.
But there's also some extra information that could be useful.
The full name of the model is being logged in here.
There's also this timestamp.
And there's also some interesting information like the usage over here.
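Putting the pieces described so far together, a minimal sketch of this setup could look like the following; the model name and environment variable are based on what is shown on screen.

```python
import llm
from dotenv import load_dotenv

load_dotenv()  # reads a local .env file, e.g. a line like OPENAI_API_KEY=...

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Write a haiku about Python.")

print(response.text())  # the generated text
print(response.json())  # extra metadata: full model name, timestamp, token usage
```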
This information can be relevant because the number of tokens that are being used tends to have an effect on the price that you have to pay.
And roughly speaking, when we're talking about tokens in LLM land, we are roughly saying the amount of text that goes into a system or out of a system.
Tokens are not exactly words.
They're also not exactly letters.
Kind of the best way to think about it is that if you have words like geography and words like geology, then one thing you could do is you could maybe chunk these words up.
Maybe there's a chunk "geo", there's another chunk "logy", maybe another one "graphy", right?
These tokens are determined by a compression trick to make sure that the vocabulary that the LLM needs to know about is kept at bay a bit.
It's a bit of an implementation detail, and I'm not going to go too much in depth in it, but it is good to know that tokens, whenever we talk about them in the context of LLMs, they're not exactly words, they're more like subwords.
And the more tokens that you use, the more you pay.
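Purely as an illustration, and not part of the course tooling, you can inspect this subword behavior yourself with the tiktoken package, assuming you install it separately:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one of OpenAI's tokenizers
for word in ["geography", "geology"]:
    token_ids = enc.encode(word)
    print(word, [enc.decode([t]) for t in token_ids])
# Both words get split into a few subword chunks rather than letters or whole words.
```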
So, so far, so good.
We have an LLM library that we can install.
We have to do something with the key, but eventually we get an API that allows us to pick a model and that model can be prompted.
So in a lot of ways, you could argue, hey, maybe this is enough for my Python program.
You wouldn't necessarily be wrong, but one thing it is good to observe is that you can, of course, pick a different model.
Different models have different prices and there can be lots of good reasons not to only use one.
And this is where the LLM library also has a nice helper.
You can call llm.get_models(), and this is going to give you a long list of names of all the models that are available to you.
These models tend to come and go, but you can already see by just looking at what OpenAI gives us, there is a big list to pick from.
And as the name implies, some of these models have extra features, like the ability to also handle audio, for example.
But these are still just all the OpenAI models, and you might be keen to give a different model a spin.
If you want to use extra models over here, we're going to have to install a plugin.
So I'm going to go to the packages tab over here.
And I'm going to go ahead and install llm-anthropic.
I got this name from the docs, by the way.
But once I install this, we should also be able to see new models appear when I rerun this cell.
So I'm going to add it.
It's now been installed.
And after restarting the notebook and also restarting this one cell over here, we can scroll more to the bottom.
And then you do see that we have a bunch more models at our disposal.
I'm going to keep on using the OpenAI models in this course, because at the time of making this recording, it still feels like this is one of the most ubiquitous models out there.
But definitely feel free to pick your own model and your own favorite.
Just keep in the back of your mind, if you want to have the name of the model, this is the string that you want to copy.
And sometimes that means that you shouldn't just copy the name, but you should copy this full string that includes this date.
There are distinctions and different milestones for these models, and you want to be explicit.
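A minimal sketch of browsing those models from Python, assuming the llm library (plus any plugins) is installed:

```python
import llm

for model in llm.get_models():
    print(model.model_id)
# Copy the full, dated model id from this list when you want to be explicit
# about exactly which milestone of a model you are using.
```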
All right, so we're able to change this model.
We're also able to change this prompt and we get a lot of useful information back, but there's still something extra that you typically want to do.
As mentioned earlier, this is a system where text can go in and text can come out.
But most of the time when text goes in, we actually want to have structured information to go out.
And if you're using a somewhat modern LLM, then there's actually a technique that you can use to ensure that you do indeed get structured information out.
And that is to go ahead and not just provide a prompt, but to also provide a schema.
The thinking is that we can declare what kind of JSON format we would like to see come back.
And by doing that, we should be able to get JSON out in the format that we specify.
We're gonna see in a moment that these schemas tend not to always give you perfect results.
So it's always good to at least verify manually once in a while, but using schemas in general tends to be a great practice.
And as luck would have it, the LLM library actually supports Pydantic.
So what you can do is you can declare a Pydantic class that tells the model something about the structure that we expect to come back.
So in this example, I can give it a prompt that says, hey, I want to get a haiku about Python.
And I can give it this class over here.
And as you can see here, I expect a topic to be a string, and I expect to see haikus that are a list of strings.
I can actually see that indeed these haikus that I get back, well, that is actually a list of strings.
In this particular case, though, I was actually hoping to generate more than one haiku.
So instead of having a list of strings, it actually tends to make a little bit more sense to pass in a list of these Haiku classes.
Now, to be clear, this class is really just a wrapper around a string.
But notice the difference in what I get out over here by making this one change.
So I'm not going to do a list of strings.
I'm going to do a list of these objects.
When I do that, then I can indeed see that haikus now becomes a list of dictionaries.
Each dictionary has a poem key, and that has a haiku in it.
And I can also confirm that this topic also got filled in nicely.
So I can say, hey, if I want to have a haiku about Python, then the topic is also something that was extracted as well.
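A minimal sketch of the schema setup described here, using the llm library's Pydantic support; the class and field names are approximations of what is shown on screen.

```python
import json
import llm
from pydantic import BaseModel

class Haiku(BaseModel):
    poem: str

class Haikus(BaseModel):
    topic: str
    haikus: list[Haiku]  # a list of Haiku objects, not just a list of strings

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Give me a few haikus about Python.", schema=Haikus)

data = json.loads(response.text())  # the response text is JSON matching the schema
print(data["topic"])
for item in data["haikus"]:
    print(item["poem"])
```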
There are still lots of details to get right here, by the way.
Note how I've not specified the number of haikus that I'm generating over here.
And if that's something that's really important, then I should probably fiddle around with the string a bit more.
But at this point, I mainly hope to show that having schemas around can be super duper useful.
The only downside of using these schemas is that they are not always supported.
It's mostly the more modern models that tend to have support here.
And even if a schema is supported, I have found that not every single model out there is able to give the best guarantees that what you receive over here is actually matching the schema.
But still, in the end, we usually want to have structured information.
So having these schemas around definitely makes a lot of sense.
So far, what we've been doing is we've been taking this model object and we've been prompting it directly.
And what I mean by that is that we've usually just taken our model.
We have then called a prompt on it and that one response that we got back, that was the final result.
But if you use ChatGPT or any other LLM, then odds are that you experience it as a conversation.
And that is something that the LLM library can also mimic if you want.
To do that, though, you are going to have to make a declaration that we are dealing with a conversation instead of just a mere prompt.
This will give you a conversation object, and it's this object that allows you to take conversational turns.
So just to give an example, I can start by saying, hey, give me a haiku about Python as the first prompt.
We can see from the responses, by the way, that indeed we get a haiku back.
And then after that, I can say, hey, give me another one, but have this one be about snakes.
Note that with "another one," the only way to really know what we mean here is to know about the history of the conversation.
And we can see that indeed it is generating a haiku over here.
So it's able to use something from the past of the conversation in order to give a better response down below here.
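A minimal sketch of the conversational API described here, with the same two prompts:

```python
import llm

model = llm.get_model("gpt-4o-mini")
conversation = model.conversation()

first = conversation.prompt("Give me a haiku about Python.")
print(first.text())

# "another one" only makes sense because the conversation keeps its history around
second = conversation.prompt("Give me another one, but have this one be about snakes.")
print(second.text())
```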
And you can actually use this conversation in a couple of interesting ways.
One thing that I might do, for example, is I might ask the LLM to check on itself.
In the first turn of the conversation, I can ask it to accomplish some sort of a task, maybe detect a topic in a sentence or something like that.
And then in the second turn of the conversation, I can ask it to maybe check if this task was done correctly.
There is a cost to this.
After all, you are spending more tokens in order to do this checking over here.
But this is one of those techniques that you can apply in your LLM program to maybe get a little bit more certainty about the outputs that the LLMs provide.
Another thing you can do with this, though, is you can actually have a widget inside of Marimo that allows you to interact with a conversation.
We are not going to use this much in this course, but definitely feel free to check out the documentation if you're interested.
But one thing I could do, for example, is say, give me a haiku about Python and another one about snakes.
Definitely feel free to play around with this widget because you can customize it with some Python code if you like, but we're not going to use this much in our course.
We're definitely more assuming that the LLMs that you're going to be interacting with are going to serve a purpose in a Python program.
And it might not be via a chat interface directly to a user.
At this point in time, we've discussed how this model.prompt works, as well as some tools around it.
And technically, this is already quite sufficient to build something that's quite useful.
So let's explain the setup.
What I've done over here is I've created a text widget that you can see above over here.
That's defined by this part of the code.
This is just a Marimo widget.
And what I can do is I can go ahead and paste some text in here and then hit submit.
And when I do, some code could trigger.
And in particular, the code it is going to trigger is this summary function over here.
As we'll see in a bit, the text from this widget is going to go into this function, and then it's going to be used inside of this prompt.
And in this prompt, I'm asking to get me a summary of a bit of text.
And then the text that I pass into this function over here goes into this f string.
I've also got a summary class defined over here that I'm going to be using as a schema.
I'm going to get a bit of JSON back, and I'm going to load that into a Python dictionary.
But what I hope is clear is that we're asking our LLM to make a summary of any text that we put in here, but also with a little bit of structure.
So I'm going to have a title, a summary, as well as a pros and cons list.
And what I'm about to do is I'm about to put in a transcript of a YouTube video that I made that concerns itself with a specific keyboard.
So a whole lot of text.
I'm going to hit submit.
And that is going to trigger this one function down below over here.
And it took about six seconds to do the round trip.
But here we are.
The transcripts were indeed about a keyboard called the Glove80.
And I can see a summary.
I can see a title as well as a pros and cons list.
And this is an interesting point in time in this course, because as far as building blocks go, we have really just been discussing a few of these methods.
But because we can do all sorts of things with text, hopefully you can imagine that this does cover a lot of ground.
We can come up with lots of useful functions that can do all sorts of elaborate tasks for us, as long as we can come up with a good prompt, hook it up with a good LLM, and maybe also provide it with a good schema.
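The notebook code itself is in the repository; a rough sketch of the pattern described in this video, with approximated widget, prompt, and field names, could look like this:

```python
import json
import llm
import marimo as mo
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    summary: str
    pros: list[str]
    cons: list[str]

text_widget = mo.ui.text_area(placeholder="Paste a transcript here...")
model = llm.get_model("gpt-4o-mini")

def summarize(text: str) -> dict:
    prompt = f"Give me a structured summary of the following text:\n\n{text}"
    response = model.prompt(prompt, schema=Summary)
    return json.loads(response.text())

# In a later cell, the notebook reads text_widget.value and calls summarize(...)
```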
Before moving on to some advanced topics, I figured I might share this one notebook, which is something that I actually tend to use regularly, that uses some LLM code and is very similar to what we saw before.
It's just that this notebook is a fair bit more elaborate than what we saw in the previous video.
There is a lot of code over here, but eventually you are going to hit this one UI element that allows me to pass a YouTube URL, or in this case, an ID of a video.
And then after I hit submit, it's going to go ahead and download the associated YouTube video.
Then once I have that video on disk, I'm using an open source model called Whisper to turn the spoken text in that video into text in a file.
And then that text is being passed on to an LLM.
I wrote a prompt for it.
And the whole point is that afterwards, I get a little bit of a summary out that I can quite easily turn into something that I can copy and paste on my blog.
This, in my mind, is a perfect example of something that will save me a lot of time.
And I can imagine there's a fair amount of these use cases for you as well.
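The full notebook is more elaborate than this, but a rough sketch of the pipeline described here, assuming the yt-dlp and openai-whisper packages, could look like the following:

```python
import llm
import whisper
from yt_dlp import YoutubeDL

def download_audio(video_id: str, filename: str = "video.m4a") -> str:
    # Download just the audio track of the YouTube video.
    options = {"format": "bestaudio", "outtmpl": filename}
    with YoutubeDL(options) as ydl:
        ydl.download([f"https://www.youtube.com/watch?v={video_id}"])
    return filename

def transcribe(path: str) -> str:
    # Turn the spoken audio into text with a local Whisper model.
    return whisper.load_model("base").transcribe(path)["text"]

def blog_summary(transcript: str) -> str:
    model = llm.get_model("gpt-4o-mini")
    return model.prompt(f"Turn this transcript into a blog post summary:\n\n{transcript}").text()
```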
And one thing I do want to mention that's nice about doing this in the Marimo notebook is that what you're looking at here is a notebook in edit mode.
But what I can also do is hit this button over here.
It's going to toggle the app view.
And this allows me to just look at the UI elements and the output elements of all of my cells.
And lo and behold, once everything is done, here is the summary that I can go ahead and copy and paste to my blog.
And there's a copy to clipboard button.
And I like to think that if you spend a little bit of time making a notebook like this, maybe make a few of these notebooks that contain these little apps that just automate a little thing for you, this is great stuff to help get the ball rolling in terms of exposing yourself to some LLM tools.
So at this point in the course, it might be good to take a break and to see if you can come up with a notebook that you can kind of turn into your own app that can go ahead and automate something for you.
You're also free to check out the next videos, of course, but this will be a good point in time to see if you can maybe challenge yourself a little bit.
So far, the main ingredients of this course were that we would take an LLM, combine it with a prompt that would contain some sort of a task, and then we had nice ways to maybe wrap that up inside of a Python function where text could go in and maybe some structured information could go out.
And for a lot of use cases, this is actually a great starting point.
But let's talk about a few things that actually have been easy so far.
So far, we've been talking about a relatively small scale.
That is to say, just one example at a time.
So far, we've also been talking about situations where there was always a human in the loop.
There's lots of great use cases for this, and keeping a human in the loop in general is a good idea.
But there are also use cases where you want to really automate something, and then you cannot watch every single example manually.
So in the next couple of videos, we're going to address some extra features that are good to know about if you're going to be writing LLM tools, so to say, and that dive into these topics specifically.
And one part of it will be that we're going to talk about some general Python techniques that are going to be useful, like caching, as well as asynchronous programming.
But when we're going to talk about this human-in-the-loop aspect, we are also going to have to talk a little bit about methodology.
It's not always about what tools to use, it's also about knowing how to use them.
And especially when it comes to LLMs that have to automate a task, we really need to think about measuring the performance of an LLM.
So to discuss all of this, what we're going to do is we're going to take a relatively classic text classification example, and we're going to see if we can maybe tackle it with an LLM.
In a lot of ways, it's definitely doable, but even for relatively simple tasks like this one, there are a few things to keep in the back of your mind.
So I'm back inside of a Marimo notebook and this is the second notebook that you can find in the GitHub repository.
Note that the same GitHub repository also has this spam.csv file and we're not going to take this entire file.
We're going to keep it nice and cheap.
We're just going to have a look at the top 200 examples or so.
200 examples isn't a whole lot but it's enough to maybe get some sort of measurement going so to say.
And the whole point with this data set is that we have these text messages that were sent to people and some of these messages were definitely spam and other messages were not spam.
This is the ham versus spam label.
And it's a little bit unbalanced.
Luckily, there are not that many spam messages.
There's definitely more non-spam messages.
But maybe it will be nice to see if we can get an LLM to actually handle this labeling for us.
And to be very clear, you could definitely use classic machine learning algorithms over here.
That would be a pretty good idea.
But for sake of a tangible example, let's just see if we can get LLMs to handle this task.
Now before moving on, let's just do a quick back of the envelope calculation.
If I recall correctly, if we were a little bit unlucky, it could take anywhere between four and five seconds to make a request to OpenAI and get something back.
This depends a little bit on the task, but let's say it's around this ballpark.
If we have 200 examples and, worst case scenario, each takes five seconds, that's around 1,000 seconds.
Divide that by the 60 seconds in a minute, and this would roughly take a quarter of an hour to run if I were to do one request after another.
And you know, if you're patient, that can be fine.
But in this particular case, we might as well go with some async code instead.
In a nutshell, the whole point of running something asynchronously can be thought of as a metaphor.
Let's say that you're baking a pizza, and let's say that you've got an oven.
Well, once the pizza is ready to go into the oven, you don't have to sit next to the oven and stare at the oven while it is in there.
While the pizza's in the oven, you can actually go and maybe prepare another pizza.
Writing your code using the async functionality in Python effectively means that if you are this person over here, that you're gonna spend less time looking at the oven and more time preparing more pizzas.
And for this course, we're gonna ignore most of the details of async, which would be a course of its own, but it is good to keep in the back of your mind that we are going to be running our code somewhat concurrently and that I'm about to show you some code that does this.
So let's demonstrate some benefits of running code asynchronously.
Over here, I have a function that is asynchronous.
It doesn't really do much.
It just multiplies a number by two, but it also sleeps right there in the middle.
This is to effectively simulate communicating with a server.
It's going to take a little while before we get our response back.
What I'm able to do, thanks to this helper over here that I've got from a plugin, I'm able to run this function concurrently over many inputs.
And if I run this function, I also get this nice little progress bar on top.
But again, the way to think about this is that we are going to have lots of these inputs.
So I have a range full of inputs over here.
I have input zero that's going to go off and run on a function.
I've got input one.
I've got many of these inputs.
And I've also got this max concurrency setting over here.
So that means that when I hit number nine, that's going to be sent off.
But then we are not going to be running anything concurrently anymore until one of these items comes back.
And when we do, then we move on to the next item in the list, etc, etc.
But it does effectively mean that I've got this pool of 10 concurrent connections, I guess you could say, of things that are going to be running.
And to perhaps put that in a metaphor, this is kind of like having those 10 ovens, such that we can always have 10 pizzas in an oven, so to say.
But just to confirm that this works, I can hit this play button to run this one cell.
And we're able to see that indeed it jumps a little bit and it seems to go in batches of 10.
And if I have a look at the right over here, we can see that this cell took about 10 seconds to run.
Given that we had 100 inputs and a concurrency of 10, that also makes sense.
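The video uses a helper from a plugin for this; as a plain-asyncio equivalent of the same idea, a sketch with a concurrency cap could look like this:

```python
import asyncio

async def double(x: int) -> int:
    await asyncio.sleep(1)  # simulate waiting on a slow server
    return x * 2

async def run_all(inputs, max_concurrency: int = 10):
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(x):
        async with semaphore:  # at most 10 "pizzas in the oven" at once
            return await double(x)

    return await asyncio.gather(*(limited(x) for x in inputs))

# 100 inputs at a concurrency of 10 finish in roughly 10 seconds:
# results = asyncio.run(run_all(range(100)))
```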
So we're going to reuse this pattern a whole lot.
But instead of having a function that just sleeps, we are going to call an LLM asynchronously, which is something that the LLM library totally supports.
However, just like with the schema, it should be said that there are some models that support async, but not every model is supported here.
Some implementations of some of the plugins don't support this asynchronous method, which is good to keep in the back of your mind.
But if you're curious to know which models do support it, you can scroll down to see the full output list from the get_async_models function over here.
And these are all the asynchronous models that you can go ahead and use.
Another thing to remember is that the fact that the function is async is great, but it could also be the case that you want to have a function that's both async and supports the schema.
And if memory serves, for example, GPT-3.5 Turbo does not support schemas, even though it does support async.
So it can take a while to find the model that is just right.
For this course, though, what I'm going to go ahead and do is use GPT-4 and GPT-4o.
And we're going to be comparing these two models.
Before I do, though, there's this one aspect of performance that I should discuss first.
Because if I run my code asynchronously, then that should definitely save me a whole lot of time.
But there's still something that can go wrong even when I run code asynchronously.
So let's consider that we are going to have a Python function, and let's also consider that it's going to be an asynchronous Python function.
Then a really big benefit is that this function can run concurrently, which means that we will be waiting way less.
But why even wait?
And what I mean by that is, let's consider the inputs.
I'm going to have a text message, as well as a prompt that maybe goes into a function like this.
And then the LLM needs to provide some sort of a label, so to say.
But you can also imagine that I might try more than one LLM.
So sure, there's a bunch of inputs and an output.
But what you can also imagine is that I'm going to be trying out a whole bunch of different LLMs and maybe a whole bunch of different prompts to figure out which one works best.
But it would really be a shame that if I have a set of inputs that's exactly the same as something that I ran earlier, that I would still run this function.
It would be a whole lot nicer if I could maybe relay that off to some sort of a cache.
The setup here would be that we have our text message.
We have our prompt and maybe our LLM.
And then what we're going to do is we're first going to check, hey, was that combination something that we've used before?
We're going to check a little database for that.
And if that's the case, well, then we are just going to use the cache to give us the answer.
If the answer is no, then we are going to use our Python function to call the LLM and maybe do the big heavy compute.
But the benefit of having all your answers in a cache is kind of twofold.
For one, having this on disk is just kind of nice, because every time that you want to maybe run a new prompt, you will still have your old results on disk and available for comparison.
But the second really big benefit is that you don't suddenly incur big costs, because you never really want to be in a position where you accidentally forgot to save your results, and therefore you're going to have to run 500 of these requests again.
That's just going to cost you a lot of money if you're not careful.
So in the code that we're about to discuss, we're going to use a lightweight cache that you can use with Python called diskcache.
And the way that all of that works is we're going to be running a local SQLite database, but you're not going to interact with it directly.
You're going to be interacting with the diskcache library instead.
But it is good to know that all of your data is stored locally on disk in SQLite.
And you have manual control over it as well.
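A minimal sketch of that dictionary-like behavior, assuming the diskcache package:

```python
from diskcache import Cache

cache = Cache("llm-cache")  # a folder on disk, backed by SQLite
key = ("some text message", "is this spam or ham?", "gpt-4o")

if key in cache:
    result = cache[key]   # reuse an earlier answer
else:
    result = "spam"       # placeholder for the expensive LLM call
    cache[key] = result   # store it for next time
```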
So with that bit of context, hopefully you understand why I have declared a cache variable over here.
I'm using diskcache, this points to a folder on disk, and this is just a reference to a SQLite database that is going to take care of all the caching on my behalf.
I'm also making a dictionary of models here, by the way, such that I can use a string to refer to a model from the LLM library.
Note, both of these two models are async.
I've also got a base prompt over here, as well as a base model that's defined.
And I've got my asynchronous function.
Note how it works internally though.
A text can go in, a prompt can go in, and a model can go in.
All that combined here becomes a tuple.
And this cache basically behaves like a Python dictionary does.
So I could say, hey, take that tuple.
And if that tuple already appears in the cache, well, then just take the tuple as a key and return the result.
So that's kind of like grabbing the value from a dictionary.
And if that's not the case, well, then we are going to do this concurrent calling of the LLM by selecting the right model, and then by sending it the prompt that we are interested in giving it.
Note, by the way, that in this case, the prompt will be something like, hey, is this spam or ham?
And then there's a little bit of text over here, which is the actual text from the message, but both have to go to the LLM.
Now, one thing to also remember with these async functions is that you do have to call them a little bit differently than normal.
Because the function is asynchronous, you're going to have to put this await keyword in front of it, just so that Python understands that this is indeed an asynchronous function and that we're going to be waiting until the function is done running.
But we can see that besides this async bit, it still behaves like a normal Python function.
It's just one that uses an LLM under the hood.
But this is also a function that can use the cache if we are interested in that.
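A sketch of what such a cached, asynchronous classify function could look like; the model names and the prompt wording are approximations of what is shown on screen:

```python
import llm
from diskcache import Cache

cache = Cache("spam-cache")
models = {
    "gpt-4": llm.get_async_model("gpt-4"),
    "gpt-4o": llm.get_async_model("gpt-4o"),
}
base_prompt = "Is this spam or ham? Reply only with spam or ham."

async def classify(text: str, prompt: str = base_prompt, model: str = "gpt-4") -> str:
    key = (text, prompt, model)
    if key in cache:                 # same text/prompt/model combination seen before?
        return cache[key]
    raw = await models[model].prompt(f"{prompt}\n\n{text}").text()
    result = raw.strip().lower()
    cache[key] = result
    return result

# In the notebook: label = await classify("WIN a FREE phone now!!!")
```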
And that also means that I can set up a pretty big loop over here where I have my classify function and I'm going to be looping over all sorts of texts that would like to get predicted.
And you can also see that when I run this function and when I inspect the progress bar that follows, that 200 of these examples get done in about 0.1 seconds.
And lo and behold, the reason is that I'm using a cache.
So even if I were to run this multiple times by accident, I still wouldn't waste a whole lot of time or tokens, because everything is stored on disk in the first place.
And that's a really nice situation to be in in general.
If you're interested in running this yourself locally, though, one bit of advice that I do have is that you check this max concurrency setting, because depending on your vendor, as well as the payment tier, this max concurrency setting needs to go up or down.
It can be that you hit an API rate limit pretty quickly.
So let's check the results.
I have a few of them over here and let's also remind ourselves of what we're actually running.
The prompt over here is, is this spam or ham?
Repeat only with spam or ham.
That's the thing I'm asking for over here.
That prompt is passed along to the LLM together with the text that I wanted to infer.
And when I have a quick look, it seems that it gets about 66 and a half percent right.
And you could have a little bit of an argument, you know, I'm only taking 200 examples, is that enough?
But this does give us a pretty good ballpark figure.
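For reference, the kind of accuracy check described here boils down to comparing predictions with the known labels; a small sketch with made-up values:

```python
true_labels = ["ham", "spam", "ham", "ham"]   # from the spam.csv file
predictions = ["ham", "ham", "ham", "spam"]   # from the classify function

accuracy = sum(p == t for p, t in zip(predictions, true_labels)) / len(true_labels)
print(f"accuracy: {accuracy:.1%}")
```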
I didn't stop there though, and I tried a whole bunch of settings down below just to see if maybe I could squeeze a bit more juice out of it.
In general, you could say that GPT-4 is a pretty good model, but GPT-4o is a better one, or at least it's more expensive.
And I also have two different prompts.
I have a prompt over here that's relatively short, and I've got a different prompt over here that's longer, but does a better job of explaining what we need.
After all, when I think about spammy text messages, especially back in the early aughts, then a lot of them were about promising cheap or free goods.
So in my mind, mentioning this might be good for performance, but it turns out that it actually depends a little bit more on the model itself as well.
We don't really see that much of an uplift if I add this little bit of context, but if I add this bit of context and if I use GPT-4o, then I do seem to get a pretty substantial boost in performance.
Instead of being around 66, we are now around 72%, which I would argue is a bit better.
Though you should always keep in the back of your mind that this model is also more expensive, so this boost in accuracy may actually not be worth it, depending on how much extra dollars you're going to have to spend on this.
At this point in time, I don't want to suggest that what we've got over here is a good enough model for this particular use case.
After all, I don't know if 72% is actually good enough.
Could be that this number needs to be higher, and it can also be that it's really just too expensive to take this into consideration.
But the most important thing at this point in time is more the way that I'm going about this, the methodology, so to say.
There are lots and lots of variables to consider from the prompts to the model that I pick.
And if this is something I'm honestly thinking about automating, then I should at least think about measuring some base statistics.
And in the case of text classification, it's relatively easy because it is either correct or it isn't.
But the main thing I want to drive home is that this is something you actively want to think about.
And the best advice that I can give you is this: in this case, the evaluation is relatively easy, because we are trying to figure out a yes-or-no answer.
In practice, this whole idea that a yes-no question is easy to measure is something to keep in the back of your mind.
Sure, you might have to annotate some data yourself.
But one thing that you can try to keep track of is maybe you can come up with a very common failure scenario.
Maybe there's a type of mistake that the LLM tends to make.
And then over time, you can track, hey, which prompt makes this mistake more or less often.
And again, if you have a very specific scenario, that very quickly becomes a yes-no kind of a question, which means that you can start tracking these kinds of statistics.
Iteration really can be the name of the game here, and the only way to do iteration well is to also think about methodology, or at least take it seriously.
There are lots of ways to solve a problem, especially in LLM land, but there are also lots of ways to fool yourself if you're not careful.
So definitely do your future self a favor, take this iteration seriously, and try to come up with useful metrics that can help tell you when a system is performing better or when it's performing worse.
If you scroll all the way at the end of this notebook, you will notice this note that I added myself manually.
It reads that running this entire experiment cost me about $2 in token costs.
I can argue, you know, $2, not too much.
But then again, I tried maybe six variants in total on these 200 examples.
And it's safe to say that if I were to think about real life scenarios at a business, 200 examples is nothing.
This cost can really skyrocket if you're not careful.
And there's also another aspect to all of this.
And that is the fact that, you know, because what we're dealing with here is a very typical machine learning classification kind of a problem, we actually have decades of work in other systems and algorithms that also solve this problem.
So I figured, you know, what might happen if I were to run this in a scikit-learn pipeline instead?
So I'm taking the exact same 200 examples to validate my results on.
I'm going to be taking 200 other examples to train a very basic system on.
And lo and behold, it seems that I get about 88% accuracy using a very standard, basic little scikit-learn pipeline over here.
Now, that's not to say that, if I were to maybe scroll up here to my list of different prompts, I couldn't come up with a better prompt.
And it's also not to say that maybe there will be better models in the future over here.
And there might also be something to be said that this task that I'm dealing with over here is maybe also something that LLMs tend to have a little bit of difficulty with.
If I just have a look at some of the examples over here, which is something you should do in real life as well, you do see that there's definitely a lot of slang being mentioned here, not to mention horrible spelling.
And depending on what these LLMs are trained on, I can definitely imagine that it would have a hard time dealing with this slang speak in text messages.
But still, the larger overarching point over here is the fact that if you are dealing with a classic classification use case, then maybe the best default to benchmark against is a simple scikit-learn text classification pipeline.
Because if you can solve it with a simple scikit-learn classification pipeline, then it is going to be way cheaper than anything that you might be able to do with an LLM provider anytime soon.
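The exact components of the notebook's pipeline may differ, but a standard baseline of the kind described here could be as simple as this sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# texts/labels: 200 training examples; valid_texts/valid_labels: the same 200
# validation examples that the LLM prompts were scored on.
pipeline = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
# pipeline.fit(texts, labels)
# print(pipeline.score(valid_texts, valid_labels))  # roughly 0.88 in the video
```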
In the previous section, we were dealing with a somewhat easy problem, if you think about it, when it comes to LLMs.
And what I mean by easy here isn't so much the prompts, which indeed were pretty easy.
The real thing that was easy here was the evaluation.
As mentioned before, we were dealing with a yes-no scenario.
Either the classification was correct or it wasn't.
And this really makes evaluation a whole lot easier.
But because LLMs tend to be so flexible, you sometimes want to tackle a problem that I'm going to call hard.
And again, the thing I want to zoom in on here is the evaluation of the whole setup.
As I hinted earlier, the way to turn a hard problem into an easier problem is to maybe cheat a little bit and to maybe wiggle around until you're able to figure out some sort of a way to turn the hard problem into a more simple yes-no formulation.
And as I will explain in a bit, this is actually more doable than it might seem originally.
But before talking about the how we're going to go about this, let me give you one example that I made as a hobby project of something that I would argue is definitely more of a hard problem.
This is a hobby project of mine.
If you go to thedailybin.com, you're going to see this newspaper-like experience that is fully LLM generated.
Now, to be clear, this is something I really just made for myself.
I just put it up online because it's easy for me to reach.
But what you're looking at here are summaries of discussions on Hacker News.
And for every single discussion, I'm trying to evaluate if it's a discussion I might be interested in reading.
For every single discussion, I'm also generating a title, I am generating a summary of the discussion, and I'm also generating an image that goes along with it.
There are lots of interesting steps here, and lots of moments where we need a prompt.
For example, if you've got lots of Hacker News posts, there is one prompt that needs to turn that into some sort of a subset because normally Hacker News has 30 articles or so on the site, but I want to turn that into a subset of 10, which are probably going to be the most interesting.
And for that, I'm going to need a prompt to be able to deal with that.
This, I might argue, is a relatively easy part of the task.
The task that is much harder is to take one of these Hacker News discussions and to actually turn that into a summary because what makes a good summary, what is a good style of writing, those are kind of hard to quantify up front.
You're gonna want to read the entire article to make a judgment on that.
So in the next few lessons, I want to talk about problems that are kind of like this, where it's not necessarily going to be easy to declare some sort of a yes-no number originally, but as we'll see in a bit, there is a labeling trick that we can actually do, and that's going to help us out immensely.
So let's formalize everything just a little bit.
I have an LLM.
That LLM also comes with a prompt.
And this is going to output me some text.
Then in this case, it would be amazing if I could maybe take a text like this, apply some sort of a function that then turns that into a number that tells me how good that text is.
Many people have tried this and you can read lots of interesting academic papers on this topic.
But I have found nine times out of 10, especially if you're trying to do something in business, that relying on some sort of a magic formula that turns text into a number is a pretty bad strategy.
In this case, there's actually something you can do that's a bit simpler.
Because why just have one LLM and a prompt?
You can actually have two.
And let's assume that these are two different systems.
So I've got system A and system B, and maybe the LLMs are different.
Maybe the prompts are different.
That's not necessarily too important.
The main thing that's important is that we have some system A that can generate one, two, or many of these texts.
And the same thing we can do with system B.
And that means that we can always compare.
And the nice thing about comparing is that either the thing that was generated by process A was better, or the thing that was generated by process B was.
And lo and behold, we now have our binary decision once again.
Whenever you're dealing with a system that's hard to quantify with some sort of quantitative label, you can always resort to this technique.
And in order to do this, you don't need a whole lot to get started.
If you have a Python notebook at your disposal, you can build an interface that can do exactly this.
And sure, you are still going to want to look at the data and actually do some of this annotation yourself, but I might also argue you don't need thousands of examples in order to draw a conclusion from this exercise.
Also note, by the way, that once you've done this, and let's say, for example, that you find out that option B is preferable over option A, then odds are that maybe you've learned something, and maybe you've got some inspiration to try yet another prompt.
And that means you can do another LLM prompt combination, generate some more texts, and now you're going to compare these two systems.
But if you're doing this, you're kind of in a good spot because lo and behold, at this point, you're doing iteration.
You're making continuous improvements until you end up with a system that genuinely feels like it does something right.
The art here really is to both look at the data and also try and be quantitative.
So let's dive into a notebook that actually shows you how you can do this.
So I've got my third notebook open over here that you can also find on GitHub.
And let's talk about some of the things that are implemented here.
There's a little bit of boilerplate on top.
I'm loading in some libraries and I'm making sure that my environment variables are set.
Then, like before, I'm grabbing a model, I'm setting up a cache, and I'm making this one function over here that can generate a haiku.
Or at least, I hope that's how you pronounce it.
It could also be haiku.
This function will accept a couple of things.
It's going to accept a prompt.
It's going to accept a topic.
but it's also going to accept a seed.
The thinking here is that I want to generate haikus about a specific topic.
So I'm going to need a prompt, I'm going to need a topic, but I would also like to generate more than one poem, and that's what this seed parameter is for.
By adding a term here for the seed, I'm making it part of the cache key.
Otherwise, if there were no seed over here, I could call this function with the same prompt and the same topic, and I would get the same response back every single time.
Allowing for a seed value here, we are able to somewhat prevent that.
And then there's a couple of for loops over here that are really just meant to fill the cache.
So there's a list of topics, there's a list of prompts, and I'm going to try out four different seeds.
And that's what this function is going to go ahead and try out.
Now, what prompts and topics am I using?
I'm keeping it simple for now.
I'm saying write me a haiku about a topic, and write me a funny haiku about a different topic that rhymes.
For good measure, by the way, notice that I'm also passing this seed along here as part of the prompt.
And I'm doing this manually in an attempt to prevent the LLM from sending me the same response back every single time.
This is not a perfect strategy, and you should expect a couple of duplicates if you do things like this.
But for the intents and purposes of this notebook, this is a technique that works well enough.
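A sketch of what this seeded, cached generation function could look like; the names and prompt wording are approximations of the notebook code:

```python
import llm
from diskcache import Cache

cache = Cache("haiku-cache")
model = llm.get_async_model("gpt-4o-mini")

async def generate_haiku(prompt: str, topic: str, seed: int) -> str:
    key = (prompt, topic, seed)  # the seed is part of the cache key
    if key in cache:
        return cache[key]
    # The seed also goes into the prompt text, to nudge the LLM toward variety.
    full_prompt = f"{prompt} The topic is: {topic}. (seed: {seed})"
    result = await model.prompt(full_prompt).text()
    cache[key] = result
    return result
```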
If then, in hindsight, I have a look at what my cache contains, then I have this long list.
I have my prompt.
I have my topic.
Notice that in this particular case, I'm not logging the LLM, because I'm using the same one everywhere here.
But you can see that this one particular prompt generated this one particular result for this one particular topic.
And the same prompt and the same topic are mentioned below here, but you can see that there is definitely a different poem attached to them.
So this is my data set with lots of texts that have been generated.
And again, I'm glossing over lots of details here.
But the next step is to take that list, this stream of lots of examples, and turn it into a stream that I can use for annotation.
This is a data frame that has lots of examples.
I'm taking that data frame and I'm joining it with itself.
By doing this, I get all sorts of combinations.
I make sure that I never get the same prompt on the left and right hand side.
And then I shuffle by sampling everything with a fraction of one.
And this gives me a random stream of things that I can go ahead and annotate.
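As a rough sketch of that pairing step, assuming a pandas DataFrame with prompt, topic, and result columns, and assuming the join key is the topic so that we always compare two poems about the same subject; the real notebook may differ in the details.

```python
import pandas as pd

# df is assumed to have the columns "prompt", "topic", and "result".
pairs = df.merge(df, on="topic", suffixes=("_left", "_right"))

# Never put the same prompt on the left and right hand side.
pairs = pairs[pairs["prompt_left"] != pairs["prompt_right"]]

# Shuffle by sampling everything with a fraction of one.
pairs = pairs.sample(frac=1, random_state=0).reset_index(drop=True)
```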
And then we get to the important bit, which is over here, because this is the interface that allows me to do some light annotation.
This is one poem, this is a different one, and I can determine which one I think is better.
In this particular case, I think this one, the right one, is better.
And for the next pair, I think I like the left one a little bit better.
Note, by the way, that the way this is configured is that there are also keyboard shortcuts available for all of these buttons.
And I'm also able to skip, and this is useful because there could be all sorts of reasons why making a direct comparison doesn't make sense for a particular set of examples.
But as you can imagine, you can just do a little bit of clicking left and right.
And if I were to scroll down and, again, go past a bunch of boilerplate, then at the bottom over here, you can see all of my annotated examples.
I had a prompt on the left, a result on the left, a prompt on the right, as well as a result on the right.
And I've got an outcome that tells me which of these two prompts was better during this one annotation that I did.
And again, I can start building charts on top of this.
And again, I can also eventually draw some conclusions.
But this setup is, albeit somewhat minimal, definitely quite general.
Whenever you are in a situation where you have a thing that is very hard to quantify directly, the thing that you can at least try to do going forward is to take two things, A and B, and compare them instead.
2:33 |
If you're not interested in using this notebook directly, feel free to skip this video.
But Marimo is doing something interesting here that deserves a small deep dive, just in case you are interested.
As you might remember, Marimo is a little bit different.
I can definitely have a cell that says something like a = 1, but what I cannot do is have another cell that then overwrites that variable.
That's not allowed, because a was already defined.
But then you might wonder, well, how does that work with annotations?
Because in that case, we would probably be dealing with a list instead of an integer.
But the same property does kind of hold here.
I am not really able to make a change to that list directly by redefining the variable.
Now, there are hacky things that I could do to work around this restriction.
But to play it safe, it's better to use this abstraction that Marimo provides natively, which is this idea of a state.
So just to show how that works, you are able to declare state by calling mo.state.
And in this case, I'm saying the default value for that state is an empty list.
And then out of that come two functions.
The first one allows me to get the state that is currently relevant.
Or I have this other one that allows me to set the state.
So no surprise there.
If I just call get_state like that, I get an empty list.
But I can now call set_state, and do something like take the old value by calling get_state over here and then add the number one to it as a list, so to say.
If I were to now run this one cell, we can see that get_state automatically updates.
If I were to run that again, then again, I can see an update.
And lo and behold, every time that I run something here, this state over here updates.
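As a small sketch of that mechanic; the variable names here are mine and the notebook's may differ.

```python
import marimo as mo

# Declare a piece of state with an empty list as the default value.
get_state, set_state = mo.state([])

get_state()  # -> []

# Append an element; any cell that calls get_state() re-runs automatically.
set_state(get_state() + [1])
```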
If you have a look at how all these buttons are implemented, by the way, you will notice that they call this update function.
Whenever a button is clicked, I call update with a different parameter that goes in.
And if I scroll down, I can see what that update function does.
What does it do?
Well, it gets some state and it sets some state.
And this is the mechanic inside of a Marimo notebook that you are going to want to use if you are going to build annotation interfaces.
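Put together, the pattern looks roughly like this; a minimal sketch, since the actual notebook stores richer records per annotation and also wires up keyboard shortcuts.

```python
import marimo as mo

get_annotations, set_annotations = mo.state([])

def update(outcome: str):
    # Append the outcome of the current comparison to the annotation state.
    set_annotations(get_annotations() + [outcome])

left_btn = mo.ui.button(label="Left is better", on_click=lambda _: update("left"))
right_btn = mo.ui.button(label="Right is better", on_click=lambda _: update("right"))
skip_btn = mo.ui.button(label="Skip", on_click=lambda _: update("skip"))

mo.hstack([left_btn, right_btn, skip_btn])
```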
Going about state this way is, as far as Marimo goes, definitely more of an advanced topic.
And when you're doing data analysis, you typically shouldn't have to mess around with any of this.
But in the case of making an annotation framework, it actually helps a lot to have a bit of state around, in which case, you do want to use this mechanic.
12:49 |
1:25 |
2:41 |
The first tool I want to discuss is this library over here called SmartFunk.
And full disclosure, this is a library that I made.
The idea of SmartFunk is to build on top of the library that Simon Willison made.
So it's really just doing stuff on top of this LLM library.
I figured I might be able to come up with an API that makes it just easier for me to do very rapid prototyping.
Because in the end, the thing that I really want with LLMs usually is to literally just have some sort of a smart function, some sort of a Python function that is able to take some input and apply some LLM sauce to it to get the output that I'm interested in.
The base premise is that you're going to import a backend.
You can also import an asynchronous version of it, but you need some sort of a backend.
And a backend in general is just a string that points to a backend model that you can go ahead and call.
Then I have this LMify object over here.
This is something that I can use as a decorator.
The thinking is that if I apply this decorator to whatever Python function, I then have something that is, quote unquote, smart.
And how does it work?
Well, it's going to apply itself to a Python function, it's going to grab the docstring from that Python function, and it's going to take that as the prompt.
In fact, this docstring is written as a Jinja2 template, so it can also do something clever.
It can detect that I need some sort of text input.
And where could I grab this from?
Well, the original function also has an input.
So the steps here really are that I have a docstring over here.
I'm going to inject whatever input is passed to this function into my docstring.
That then becomes a prompt that is sent to the LLM.
And then I get something back in return that I'm hopefully interested in.
It's also designed in such a way that you can attach a type.
So I can define this pydantic object like so.
And I can add that as a type at the end over here.
And by doing this, I am also able to control the structure of the information that I get back like what we've seen before.
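As a rough sketch of the pattern being described here, a decorated function can look something like this. I'm using the names from the smartfunc README (a backend decorator factory); treat the import names, model id, and return behavior as assumptions, since the notebook on screen may use a slightly different API.

```python
from pydantic import BaseModel
from smartfunc import backend  # assumption: the decorator factory described in the README

class Summary(BaseModel):
    summary: str
    pros: list[str]
    cons: list[str]

@backend("gpt-4o-mini")  # assumption: any model id that the underlying llm library knows about
def describe_pokemon(pokemon: str) -> Summary:
    """Describe the pokemon {{ pokemon }} and give me a pros and cons list."""

# The docstring is treated as a Jinja2 template, the function input fills it in,
# and the Summary annotation controls the structure of what comes back.
describe_pokemon("Pikachu")
```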
Again, what I'm doing here really is nothing more than syntactic sugar.
But I do like to look at this one implementation over here.
And I like how just dense it is.
It is still a Python function.
But with very few characters, I can actually go ahead and get started and try some things out.
And as you can see over here, when I give this function the keyword Pikachu, and when I ask it to describe a Pokemon, now you can see that indeed I get a summary with a pros and cons list.
And again, if you have a look at how little text I'm actually writing here, I would argue that this does feel like an improvement, if only for the rapid prototyping segment.
2:52 |
Now, because I made this tool, I definitely feel like this implementation here is very cute.
But if you take a step back, there are definitely also a few things wrong with it.
For starters, note how this function, when we define it, doesn't actually return anything in its body.
But that is not what the return type annotation claims.
It's definitely cute to add type information here and to then have a decorator that can deal with it.
But if you are running a proper type checker in Python, then this is totally going to start throwing some errors.
And that's not great if you intend to use something like this in production.
Furthermore, you could also look at this and say, well, that docstring idea is also kind of cute, but it's also kind of indirect.
A more direct approach might be to say, well, what we can also do is make a proper template here, one that actually uses Jinja.
We can then just return the render of that template, doing something like this.
And suddenly we are really just writing Python again and not doing any docstring magic.
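Something along these lines, as a sketch of that more direct style; the template text and function name are made up for illustration.

```python
from jinja2 import Template

POKEMON_TEMPLATE = Template(
    "Describe the pokemon {{ pokemon }} and give me a pros and cons list."
)

def pokemon_prompt(pokemon: str) -> str:
    # Plain Python: render the template and return the prompt as a string.
    return POKEMON_TEMPLATE.render(pokemon=pokemon)
```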
And from a perspective of taste, I definitely think that this is nicer.
And it's something I got wrong when I initially designed this library.
Around the time that I started writing this, though, I also started looking at this other project called Mirascope.
And it actually seems to take this idea of having a function that does most of the heavy lifting, but it is able to do that without any of the hacks that I've actually introduced here.
And just for comparison, if you were to use this Mirascope library, this is what the code would look like.
You would definitely add a decorator just like before.
But now inside of this decorator, you would have your provider, you would have a model to pick from.
And if you want to use a response model, then you add that to the decorator itself.
This way, the decorator can still do everything it needs to do in order to pass this definition along to the LLM.
But notice that as a direct consequence, we can still use correct type information.
We can say that Pokemon is a string that's an input to this function.
And this function actually returns a string without any surprises.
And that's because instead of doing doc string shenanigans, we just return a Python string.
And this Python string is going to be the prompt that we actually send to the LLM that's defined in the original decorator.
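Roughly along these lines, as a sketch of Mirascope's decorator-based style; the exact import path and argument names come from Mirascope's documentation as I remember it and may differ between versions.

```python
from mirascope import llm  # assumption: Mirascope's unified call decorator
from pydantic import BaseModel

class Summary(BaseModel):
    summary: str
    pros: list[str]
    cons: list[str]

@llm.call(provider="openai", model="gpt-4o-mini", response_model=Summary)
def describe_pokemon(pokemon: str) -> str:
    # No docstring tricks: the returned string simply becomes the prompt.
    return f"Describe the pokemon {pokemon} and give me a pros and cons list."

response = describe_pokemon("Pikachu")  # comes back as a Summary instance
```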
And lo and behold, if I were to have a look at the response, you see that we get something very similar back as well.
And also for good measure, by the way, if I have a look at this response that we get back, then we see that it is indeed a Pydantic type.
It's a summary object that we defined earlier.
And I will say, in general, I think Mirascope really has good taste.
So if you are in the market for an LLM tool to take you further, I do think that this library has a lot of things going for it.
Also because it seems well maintained and there are people behind it that are actively working on it.
It also seems to support lots of different providers, including some models that you can run locally on your machine as well.
2:01 |
In this course, we've mainly been using LLMs that are hosted on the internet, but you can also install this tool called Ollama that makes it very easy to run these models locally.
You typically don't get the same models that the hosted LLM providers offer.
These are different models.
These are ones that are made to be open source, but you can definitely download a few that are really good.
There's a registry that you can pull a model from.
And if I were to just have a quick look at models already downloaded, then you can see that these models, although they're not really small, in the larger scheme of things, 8 gigabytes on disk really isn't much.
And let's just take this Gemma 3 model over here.
I'm on a Mac Mini, by the way.
And let's just run this to see how fast it can generate text.
So I'm going to call ollama run with this one model.
It takes a while to load, but I can start talking to it now.
So maybe something like describe Pikachu for me with a pros and cons list.
And there it goes.
This to me feels pretty quick.
I'm sure it could run even slightly faster if I wasn't recording at the same time, but I would not call this slow at all.
This feels like a pretty fast model.
And you can also see that it's definitely taking my instructions to heart, which is also good.
Both the LLM library as well as Mirascope can actually integrate with Ollama running locally.
Definitely feel free to check documentation if that's something you're interested in.
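For example, once Ollama is running and a model has been pulled, a local call from Python can look roughly like this, using the official ollama Python package; the model name is just the one from this demo, and the exact response shape may vary by version.

```python
import ollama  # assumption: the official Ollama Python client, talking to the local server

response = ollama.chat(
    model="gemma3",  # assumption: whatever model you pulled, e.g. via `ollama pull gemma3`
    messages=[
        {"role": "user", "content": "Describe Pikachu for me with a pros and cons list."},
    ],
)
print(response["message"]["content"])
```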
But one thing to keep in the back of your mind is that, of course, if you're running a model locally, the async features aren't going to be that beneficial to you because you'll be bottlenecked by your own CPU.
Furthermore, the structured output that we saw earlier, where we're able to define the JSON structure that we want to see, is also something that's not necessarily supported by all models.
Ollama does support it for a few models out there, but definitely not all of them.
Another thing that's also good to point out is that some models also have support for images.
This holds for both the LLM library and Mirascope, as well as for some of these Ollama models.
So if you're interested in doing multimodal things, that is also something to keep in the back of your mind here.
3:50 |
Finally, there's this one other library that I think you should at least know about, and that is called Instructor.
If you scroll down on the front page of the project, then you're going to see this one example over here that does a really good job of explaining how you can use it.
In this case, what you've got to do is import the OpenAI client from the OpenAI library.
Instructor can then wrap around it.
Then this client is something that you can go ahead and use by passing it a model, by passing it a response model, like this definition of the structured output.
And then you can give it this dictionary where you can pass user and system interactions.
And you can also put the prompt in here.
And you could argue that this is a pretty elaborate way, but also quite feature complete to converse with an LLM.
Now, personally, I do feel that this is a little bit verbose.
I kind of don't like it that I need to know about the library that I'm using under the hood.
I personally prefer to just use strings here.
And I also think that the functional approach is a little bit nicer to look at compared to what I see over here.
But there is this one thing that Instructor does that is pretty dang neat that really does deserve attention.
And the easiest way to explain that is to look at another example that's on their docs.
Again, a Pydantic model is being defined over here.
But something that's being added that I think is pretty interesting is you can see that there's this custom validator over here.
After all, with Pydantic you can specify some types, but you can also add extra validators.
In this somewhat silly example over here, we are demanding that the name field that's being passed along must be in uppercase.
And if that's not the case, a value error will be thrown.
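A sketch of that kind of model, using a Pydantic field validator; the field names mirror the somewhat silly example being described and are purely illustrative.

```python
from pydantic import BaseModel, field_validator

class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, value: str) -> str:
        # Demand uppercase names; raising here is what triggers the retry flow below.
        if value != value.upper():
            raise ValueError("name must be in uppercase")
        return value
```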
So how does this chat completion actually use that?
Well, to draw the order of things, let's say that you've got your laptop over here and then you send your prompt to the LLM.
You can imagine that this prompt contains this bit of context over here, but it also has a little bit of extra text added that describes the user details that you want back.
Let's also assume that this is an LLM that doesn't support structured output.
Well, then one thing I can do is I can still internally add something to the prompt that makes it clear that I want to see JSON output.
And I can also take this user details base model, attach that to the prompt to make it clear what kind of output I want to get.
And then the LLM can give me a response back.
Then that response can be checked on my laptop.
and it could be the case that then this validation throws an error.
What I can then do is send a new prompt to the LLM that takes the old one but also includes the error that was thrown.
So this is the error prompt, so to say, and we can confront the LLM with the mistake that it made in the hope that we get a response back that is in fact better.
If that's not the case, then we can again adapt the prompt, send a new error message, and hope the LLM does better.
Et cetera, et cetera.
There's a retry mechanic in here that does something clever with this base model, and it could theoretically also be used on models that do not support structured outputs.
You could argue that this feature was more useful back in the GPT-3.5 days, because back then getting structured output was insanely hard, and maybe future models are all going to support this natively.
Still, the idea of having custom validators and using them to make the next prompt better, and the fact that it can do that for you under the hood, is a clever feature and something I really do appreciate.
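Put together, the retry-on-validation-error flow looks roughly like this; a sketch based on Instructor's documented patterns, with an arbitrary model name and retry count, reusing the UserDetails model from the sketch above.

```python
import instructor
from openai import OpenAI

# Wrap the OpenAI client so that response_model parsing and retries are handled for us.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserDetails,  # the Pydantic model with the uppercase validator
    max_retries=3,               # on a validation error, re-prompt with the error message
    messages=[
        {"role": "user", "content": "Extract the user details: jason is 25 years old."},
    ],
)
```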
So at least for me right now, I think I might personally still prefer Mirascope for a lot of my projects, but I can definitely see that there might be moments when I'm somewhat forced to resort back to Instructor, simply because it has this mechanic implemented so well.
2:38 |
And this, I think, is a nice place to wrap up this first course in how to build things with LLMs.
At this point, I hope you agree that, in the end, a lot of the stuff you've got to be mindful of sits a little bit more in the methodology, as well as in just understanding the business case.
I do want to part with maybe two lessons, though, that I learned personally that I do think are worth emphasizing at this point in time now that you've taken this entire course.
The first point: if you've never really done much with these LLMs before, go and find excuses to do so.
Try and find some motivating examples that automate something for you.
This course really should give you just enough building blocks to really consider doing things.
And the worst thing you could do right now is convince yourself that you need to go read a book first, when you've never actually built something for yourself that's already quite useful.
So definitely go ahead and do that first.
The second thing to also just mention, which is a bit unfortunate, but something to keep in the back of your mind, is to consider that some of the best LLMs right now are hosted, particularly by companies like OpenAI or Anthropic.
And that comes with downsides if you ever want to do something in production, because if you end up using their API, maybe in a cron job, odds are that you are going to see that they tend to go down a lot, at least at the time of making this recording.
I have had to add lots and lots of retry mechanics to my cron jobs that generate the daily bin, and I was honestly quite surprised at how often the pipeline would just break, simply because the backend couldn't handle the traffic that's being thrown at it right now.
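If you end up in the same spot, a retry decorator is a cheap insurance policy. Here is a minimal sketch with the tenacity library; the wrapped function and the retry limits are just illustrative.

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(5), wait=wait_exponential(min=1, max=60))
def call_llm(prompt: str) -> str:
    # Hypothetical wrapper around whatever client you use; any raised exception
    # (timeouts, 5xx responses) triggers another attempt with exponential backoff.
    return send_prompt_to_provider(prompt)  # placeholder for your own call
```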
Some of these issues might go away, but if you are considering doing something very serious in production, this phenomenon on its own might also be a reason to at least keep an eye on the open models as well.
After all, these you can host yourself, if only as a fallback scenario.
And it also really helps that these open models are getting better as time moves forward.
So even though right now OpenAI and Anthropic might have some of the more expensive models, that might be okay, because you could argue they're, quote unquote, good.
That could be the case, but that shouldn't be a reason that you're going to use them all over the place and forever.
There are lots of other providers out there, like Mistral, Gemini from Google, as well as some of the Llama models from Facebook, and tons of other ones from the community.
There can definitely still be good reasons to consider using these kinds of models, but I would argue it's a little bit dangerous to outright ignore all of these alternatives.
Just keep that in the back of your mind.
And thanks for listening to this course.