As we already know, computers can only work with numbers, and yet there are computer systems that are capable of understanding our texts. How does this happen? The mechanism used is one that translates words or phrases into a numerical representation known as embeddings. As Jeremy Howard mentions in his book AI Applications Without a PhD, the artificial intelligence community sometimes likes to use rather pompous names for concepts that are actually very simple, and this is somewhat the case with embeddings.

Let's see how they are built. Imagine we are in a situation where a numerical representation has already been assigned to a set of words using two numbers. Where would we place the word apple? Near position A there are several round objects. Near B there are words that have to do with construction. But in position C we would have the word apple near others related to fruit. This would be a good location, since the objective of embeddings is that similar words correspond to nearby points and different words correspond to distant points.

Let's see another example. Suppose we have already assigned a numerical representation to the words dog, puppy and calf. Where would we place the word cow? All three positions could make some sense, but if we place it in position C we would be capturing some relationships between the words, which is precisely another of the objectives of embeddings. In this case we would be capturing two analogies: on the one hand, puppy is to dog as calf is to cow; and on the other, puppy is to calf as dog is to cow. Thus, this embedding would be capturing two properties of the words: age and size.

And that is basically what embeddings are. The difference is that the ones we use in real applications have hundreds or thousands of dimensions; that is to say, a word translates to a vector of hundreds or thousands of numbers.

As we detail in the article associated with this video, these embeddings allow for visualizations and classroom activities that are very interesting and that could be the 21st-century equivalent of learning to explore a dictionary. But these word embeddings have certain limitations when it comes to recognizing sentences, since the same word can mean different things depending on the context.
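To make the analogy idea concrete, here is a minimal sketch with toy two-dimensional vectors. The coordinates and the axis interpretations ("size" and "age") are invented for illustration and do not come from any real embedding model; the point is only to show that the offset from puppy to dog roughly matches the offset from calf to cow.

```python
import numpy as np

# Toy 2-D "embeddings": first coordinate ~ size, second coordinate ~ age.
# These numbers are made up purely to illustrate the analogy idea.
words = {
    "puppy": np.array([0.2, 0.1]),   # small, young
    "dog":   np.array([0.3, 0.9]),   # small, adult
    "calf":  np.array([0.8, 0.1]),   # large, young
    "cow":   np.array([0.9, 0.9]),   # large, adult
}

# If the embedding captures age and size, the vector offsets line up:
# puppy -> dog ("growing up") should be close to calf -> cow.
grow_up_small = words["dog"] - words["puppy"]
grow_up_large = words["cow"] - words["calf"]
print(grow_up_small, grow_up_large)  # both roughly [0.1, 0.8]

# Equivalently, we can "solve" the analogy puppy : dog :: calf : ?
candidate = words["calf"] + grow_up_small
closest = min(words, key=lambda w: np.linalg.norm(words[w] - candidate))
print(closest)  # -> "cow"
```

Real word embeddings work the same way, only with hundreds of dimensions learned from text rather than two hand-picked ones.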
Fortunately, since the arrival of transformers, with their attention mechanism that allows context to be understood, we now also have embeddings that are capable of assigning a numerical representation to complete sentences in a coherent way. Thus, we can see that the sentence "Nothing pleases me more than basketball" is semantically closer to "I love basketball" than "I love football" is, despite the fact that these last two share more identical words.

There are even multilingual sentence embeddings, in which sentences that mean the same thing in different languages receive close numerical representations.

As we will see in upcoming episodes, these word and sentence embeddings are the foundation of large language models like GPT-3 and BLOOM. But until we get there, keep playing with the challenges and tasks we propose on our website, as they will let you interact directly with the internal workings of many of the artificial intelligence systems we use daily.
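As a rough sketch of how the sentence comparison above could be reproduced in practice, the snippet below uses the sentence-transformers library with a multilingual model. The library and the model name paraphrase-multilingual-MiniLM-L12-v2 are assumptions for illustration, not something mentioned in the video; any sentence-embedding model would serve the same purpose.

```python
# Hedged sketch: requires `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer, util

# Assumed model choice; a multilingual sentence-embedding model.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "Nothing pleases me more than basketball",
    "I love basketball",
    "I love football",
]
embeddings = model.encode(sentences)  # one vector per sentence

# Cosine similarity: higher means semantically closer.
print(util.cos_sim(embeddings[0], embeddings[1]))  # "Nothing pleases..." vs. "I love basketball"
print(util.cos_sim(embeddings[1], embeddings[2]))  # "I love basketball" vs. "I love football"

# Because the model is multilingual, a translation should also land nearby,
# e.g. the Spanish "Me encanta el baloncesto" vs. "I love basketball".
print(util.cos_sim(model.encode("Me encanta el baloncesto"), embeddings[1]))
```

If the embedding behaves as described in the video, the first similarity should come out higher than the second even though the last two sentences share more words.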