simple.ai by @dharmesh
A Mere Mortal's Visual Guide To AI Vector Embeddings

so you can be the life of the party

Dharmesh Shah
January 10, 2024

I’m going to ask you to take a bit of a trust fall with me along two dimensions (geeky pun not intended).

1) Trust me that understanding, at a high level, what vector embeddings are and how they work is important. They’re a key element of generative AI and knowing what they are and how they work will expand your mental model when thinking about AI. They’re useful.

2) Trust me that you will be capable of understanding this even if you’re a mere mortal and not a developer and you haven’t ever written any code. I explained this to my 12 year old son and he loved it.

Back to Some Grade School Math

Wait, wait, math?! I know you’re tempted to just hit the back button or go work on that other task on your list that doesn’t involve math — but stick with me here for just a few minutes.

I promise this will be painless and it will lay the foundation for learning about vector embeddings in a way that most other people — even developers — often don’t.

We’re going to take baby steps.

Let’s start with a line or an axis. Starts at 0 and goes on for a bit.

Now we can put a point “A” anywhere on that line and the only thing we need to specify where A is on the line is a single number (could be positive or negative, but we’ll just stick with positive). So, A could be 2 units from the origin (x=2).

This is what geeky people might call a “1 dimensional space”. Because, in this world, there’s just one dimension and when there’s just one dimension, you can describe any point in the space using just one number. In this case, 2.

It would look something like this (excuse my crude drawing skills).

OK, so we have one point A. It’s easy to then also imagine another point B that’s somewhere else on the line. Let’s say at 5 units. Once again, B needs just one number to specify where it is.

Now, we have something that might look like this (if drawn by a well-intentioned toddler).

Now, given that we have any two points and their “coordinates” (which in this case is a single number each), we can intuitively see that it is possible to calculate the distance between those points. Here, we can just eyeball it and say the distance is 3 units. (We’re not going to actually have to calculate the distances. Just understand that there is a distance and that it’s calculatable. Yes, I know that’s not a word.)

Now, let’s get a bit fancier, but not much. So far, we’ve had a single dimension (the x-axis) and so every point had just one number to describe where it was (its x-value).

We could add a second dimension, a y-axis. When we do that, in order to describe where any point is, we now need two numbers (an x-value and a y-value). The x-value tells us how far the point is from the origin along the x-axis and the y-value does the same for the y-axis.

This could look something like this.

So, let’s do a quick recap:

1) In a one-dimensional space (just the x-axis), every point is described with one number.

2) In a two-dimensional space (x and y axes), every point is described with two numbers.

3) Given that we know the coordinates for any two points, we can calculate the geometric distance between them. (We don’t have to get into the nitty-gritty of it and use Pythagoras’ theorem and what-not). We just have to understand that it could be calculated.
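If you're curious what "it could be calculated" looks like in practice, here's a minimal Python sketch of the recap above. Points A and B are the ones from the line (at 2 and 5). The two-dimensional points P and Q are made up for illustration and are not taken from the drawings.

```python
import math

# One dimension: each point is a single number.
a = 2
b = 5
distance_1d = abs(b - a)

# Two dimensions: each point is a pair (x, y).
# P and Q are made-up points, chosen only for illustration.
p = (1, 2)
q = (4, 6)
distance_2d = math.dist(p, q)  # straight-line distance (Pythagoras under the hood)

print(distance_1d)  # 3
print(distance_2d)  # 5.0
```

You don't need to follow the code. The point is only that the computer needs nothing more than the coordinates to get the answer.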

Movin’ On Up…To More Dimensions.

Now, here’s the first big jump in intuition we’re going to be making.

Just like we did with a 1 dimension space and a 2 dimension space, we could also go to 3 dimensions. Then, how many numbers would we need to describe each point? That’s right, 3 numbers. So in 3-dimensional space, the coordinate system uses 3 numbers.

So far so good. We actually deal with 3 dimensional spaces all the time in the world we live in, so it’s not hard to imagine.

Once again, if we knew the coordinates of any two points (3 numbers each), then we could calculate the distance between those two points.

Head Exploding Time

OK, now take a breath and let me hit you with something bigger.

Although we can’t easily visualize it, one could in theory imagine even more dimensions. 4, 5…100. Though we can’t visualize it, and it doesn’t exist in our physical world (that we know of), it’s completely fine as an abstraction. An idea.

So, let’s assume we had a 100 dimension space. How many numbers would we need to describe any point in that space? That’s right…100. The coordinate system would have a series of 100 numbers that described each point.

And, just like we could do with 1 dimension, 2 dimensions and 3 dimensions, even with 100 dimensions, if we knew the coordinates of any two points, we could calculate the distance between those two points. The math would be fancier, but it’s possible.

Congratulations — you’ve now made one of the biggest cognitive leaps required in this article. Give yourself a pat on the back.

Of course, we don’t have to stop at 100. Why not 1,000? Why not 1,536? It doesn’t really matter, right? The number of dimensions defines the coordinate system, and we can still calculate the distance between any two points if we know the coordinates of those two points.
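For the curious, here is a rough sketch of why the number of dimensions doesn't matter to the math. The `distance` function below is a hypothetical helper written for this example: it subtracts the coordinates pair by pair, squares the differences, adds them up and takes the square root. It works the same way for 2 numbers or 100. The two points are made up.

```python
import math

def distance(point_a, point_b):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(point_a) != len(point_b):
        raise ValueError("both points need the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point_a, point_b)))

# Two made-up points in a 100-dimensional space.
origin = [0.0] * 100
ones = [1.0] * 100

print(distance(origin, ones))  # 10.0
```

Swap the 100 for 1,536 and the same function still works; only the lists get longer.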

Assigning Meaning To Dimensions

We’ve been dealing with just abstract theories so far, but imagine if our dimensions were used to measure a particular attribute/quality of something.

Let’s say we had a 3 dimensional space and each dimension represented the taste characteristics of a particular food: sweet, sour and bitter (we’ll ignore salty and savory).

If we were to put a point in this space to represent a donut, it would likely have a high “sweet” value and a lower sour and bitter value. And if we plotted a point for a lemon, it’d have a higher sour value. We could also plot cupcake and gorgonzola cheese. If we plotted all these points in this space and calculated the distance between them, we’d discover that donut is closer to cupcake than it is to gorgonzola. Makes sense, right?
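Here's a small sketch of that food example in Python. The taste scores are invented for illustration (on a 0 to 10 scale, in the order sweet, sour, bitter), so treat the exact numbers as assumptions. Only the comparison matters.

```python
import math

# Made-up taste scores from 0 to 10, in the order (sweet, sour, bitter).
foods = {
    "donut": (9, 1, 1),
    "cupcake": (8, 1, 1),
    "lemon": (2, 9, 2),
    "gorgonzola": (1, 3, 6),
}

def taste_distance(first, second):
    """Straight-line distance between two foods in the 3-dimensional taste space."""
    return math.dist(foods[first], foods[second])

print(taste_distance("donut", "cupcake"))     # 1.0
print(taste_distance("donut", "gorgonzola"))  # about 9.6
```

With these scores, donut lands much closer to cupcake than to gorgonzola, which matches our intuition.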

So, in theory, we could come up with a “system” to plot anything based on a set of “attributes”. And, we could use as many dimensions as we wanted/needed. Then, we could figure out how “close” one thing was to the other based on the coordinates of those things.

And, that’s the second cognitive leap. We can model all kinds of things using an n-dimensional space, and come up with a way to plot “points” within that space based on what each dimension represents.

And that’s basically what vector embeddings are. It’s just a fancy term for what is basically a set of coordinates for a point in an n-dimensional space used to describe something.

But, like I said…baby steps.

Vector Embedding Models

Let’s bring this back to AI and use a real-world example.

OpenAI has a piece of software called ada-002 (full name: text-embedding-ada-002). It’s a vector embedding model. There are open source models too, but OpenAI’s is very commonly used (and what I use).

The vector embedding software can take English text and map it into a space with 1,536 dimensions. It does this based on the semantic meaning of that text.

So, let’s absorb that for a second.

You can pass a piece of text like:

Sentence A: “Computers have greatly amplified the power of humans”

It turns that into an embedding — which is nothing more than a series of 1,536 numbers.

In fact, for fun, here’s what the first set of 25 numbers looks like for that exact sentence (but trust me, I have the full 1,536).

-0.004466521553695202, -0.0015770458849146962, 0.016135841608047485, -0.027395540848374367, 0.00568555761128664, 0.012439410202205181, -0.019635654985904694, -0.012242792174220085, -0.015205180272459984, -0.02925686351954937, 0.008500481955707073, 0.012098604813218117, 0.005200564861297607, -0.003122960450127721, -0.0009470467921346426, 0.015467338263988495, 0.03937617316842079, -0.002954196184873581, 0.003295001806691289, -0.017066504806280136, -0.009300065226852894, 0.009509791620075703, 0.0033081097062677145, -0.012806432321667671, -0.027133382856845856

So, now we have the coordinates for a single point in 1,536-dimensional space that captures the meaning of Sentence A about computers and humans.

But the vector embedding model can do that for any phrase, paragraph or even an entire document. It analyzes that piece of text and turns it into a vector embedding which is just a set of coordinates consisting of 1,536 numbers.

Now, imagine we took a different sentence and found its coordinates/embedding:

Sentence B: “Homo sapiens are unique as a species in use of digital tools.”

We could then “plot” that sentence too, with its own coordinates. It would be a different set of 1,536 numbers. And because we know the coordinates of both points, we could calculate the distance between those two “points”. And, that distance represents how close the two sentences are in meaning.

What we would find is that the distance is relatively short because the two sentences are related. This is despite the fact that one uses the word humans, the other homo sapiens. One uses computers and the other uses “digital tools”. It’s because vector embeddings capture the meaning of the text — not just the literal words.
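To make that comparison concrete, here's a toy sketch. Two things in it are assumptions. First, the vectors are made-up 4-number stand-ins, not real 1,536-number embeddings. Second, instead of straight-line distance it uses cosine similarity, a common alternative for comparing embeddings that measures how closely two points line up in direction (higher means more alike). Sentence C is a hypothetical unrelated sentence added for contrast.

```python
import math

def cosine_similarity(u, v):
    """Close to 1.0 means pointing the same way; close to 0 means unrelated directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 4-number stand-ins for real embeddings.
sentence_a = [0.9, 0.1, 0.3, 0.0]  # the sentence about computers and humans
sentence_b = [0.8, 0.2, 0.4, 0.1]  # the sentence about Homo sapiens and digital tools
sentence_c = [0.0, 0.9, 0.0, 0.5]  # hypothetical unrelated sentence, e.g. about cooking

similar = cosine_similarity(sentence_a, sentence_b)
unrelated = cosine_similarity(sentence_a, sentence_c)

print(similar > unrelated)  # True
```

With a real embedding model, you would get the vectors from the model instead of typing them in, and the comparison step would look the same.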

And remember, the embedding model can do this for entire documents too — like a blog post or an essay. And, it can plot all those documents into this high-dimensional space.

Vector Embeddings and Semantic Search

OK, we’ve covered a lot of ground.

We now know that technology exists to take any piece of text and plot it as a point in a high-dimensional space based on its meaning and context. And, we can calculate the distance between any two points — so we can effectively find other pieces of text that are similar or related.

This unlocks a major capability: semantic search. Or search based on meaning, not literal keywords.

Let’s look at an example. Let’s say you had a million+ emails you’ve exchanged over the years. You could create a vector embedding for each of those emails and store them in what’s called a vector database (or vector store).

Then, instead of searching for emails using keywords (like you usually do), you could run queries like:

Netflix documentary about some defunct tech company named after a fruit.

And, here’s how it would work: First, we create a vector embedding for that search query (which means we can plot the “point” in our high-dimensional vector space). Then, we can use the vector database to find the closest “points” to the point representing our query based on the semantic distance. Voila! We have the emails where we discuss Blackberry, the documentary.
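The two steps above can be sketched in a few lines of Python. Everything here is a toy: the "vector database" is a plain list, the email subjects are invented, and the 3-number embeddings are made up by hand rather than produced by a model. The `search` function is a hypothetical helper, not any real database's API.

```python
import math

# A toy "vector database": each email is stored next to its embedding.
# The 3-number embeddings are made up; a real model would produce them.
emails = [
    ("Have you seen the Blackberry movie yet?", [0.9, 0.1, 0.2]),
    ("Quarterly budget review is on Friday", [0.1, 0.9, 0.1]),
    ("Recipe for apple pie", [0.3, 0.2, 0.9]),
]

def search(query_embedding, store, top_k=1):
    """Return the top_k stored texts whose points are closest to the query point."""
    ranked = sorted(store, key=lambda item: math.dist(query_embedding, item[1]))
    return [text for text, _ in ranked[:top_k]]

# Pretend this is the embedding of the query about the fruit-named tech company.
query = [0.8, 0.2, 0.3]

print(search(query, emails))  # ['Have you seen the Blackberry movie yet?']
```

Measuring the distance to every single email is fine for three emails. For a million, a real vector database would likely use smarter indexing to find the nearest points quickly, but the idea is the same.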

When you see this work, it’s quite magical.

Congrats! You now know what vector embeddings are, how they work and why they’re useful. And, you can be the life of the party wherever you go (YMMV).

In a future post, we’ll look at vector embeddings in action for Retrieval Augmented Generation (RAG) which is all the rage — and a commonly used approach to using LLMs to answer questions using a large body of private documents.