As we already know, computers can only work with numbers, and yet there are computer systems capable of understanding our texts. How does this happen? The mechanism used translates words or phrases into a numerical representation known as embeddings. As Jeremy Howard says in his book "Deep Learning for Coders with fastai and PyTorch: AI Applications Without a PhD", the artificial intelligence community sometimes likes to use rather pompous names for concepts that are actually very simple. And with embeddings, this is somewhat the case. Let's see how they are built.

Imagine we are in a situation where a numerical representation, using two numbers, has already been assigned to a set of words. Where would we place the word "apple"? Near position A there are several round objects. Near position B there are words related to construction. But at position C, we would have the word "apple" close to other words related to fruits. This would be a good location, since the goal of embeddings is for similar words to correspond to nearby points, and for different words to correspond to distant points.

Let's see another example. Suppose we have already assigned a numerical representation to the words "dog", "puppy", and "calf". Where would we place the word "cow"?
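The idea of "similar words at nearby points" can be sketched with a few hand-made two-number embeddings. The coordinates below are invented purely for illustration; real embeddings are learned from data, not written by hand.

```python
import math

# Hypothetical 2-number embeddings, invented for illustration only:
# fruit words cluster in one region, construction words in another.
embeddings = {
    "pear":   (8.0, 2.0),
    "banana": (8.5, 1.5),
    "brick":  (1.0, 9.0),
    "cement": (1.5, 8.5),
    "apple":  (8.2, 1.8),   # position C, near the fruits
}

def distance(a, b):
    """Euclidean distance between two 2-D embeddings."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

apple = embeddings["apple"]
print(distance(apple, embeddings["pear"]))   # small: similar words, nearby points
print(distance(apple, embeddings["brick"]))  # large: different words, distant points
```

With "apple" placed at position C, its distance to the fruits comes out small and its distance to the construction words comes out large, which is exactly the property we want.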
All three positions could make sense, but if we place it at position C, we would be capturing some relationships between the words, which is precisely another goal of embeddings. In this case, we would be capturing two analogies. On one hand, "puppy" is to "dog" as "calf" is to "cow". And on the other, "puppy" is to "calf" as "dog" is to "cow". Thus, this embedding would be capturing two properties of the words: age and size. And that, basically, is what embeddings are. The difference is that the ones we use in real applications have hundreds or thousands of dimensions, meaning that a word is translated into a vector of hundreds or thousands of numbers.

As detailed in the article associated with this video, these embeddings allow visualizations and classroom activities that are very interesting and that could be the 21st-century equivalent of learning to explore a dictionary.

But these word embeddings have certain limitations when it comes to recognizing phrases, since the same word can mean different things depending on the context. Fortunately, since transformers were born, with their attention mechanism that allows understanding the context, we also have embeddings capable of coherently assigning a numerical representation to complete phrases.
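The two analogies can be expressed as vector arithmetic: the displacement that takes "puppy" to "dog" (a change in age) should also take "calf" to "cow". The toy vectors below are invented for illustration, with one coordinate standing in for age and the other for size; real embeddings learn such directions implicitly across hundreds of dimensions.

```python
# Toy 2-D embeddings, invented for illustration: the first number
# stands in for age, the second for size of the animal.
puppy = (0.0, 2.0)   # young, small
dog   = (1.0, 2.0)   # adult, small
calf  = (0.0, 9.0)   # young, large
cow   = (1.0, 9.0)   # adult, large  (position C)

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])

# "puppy" is to "dog" as "calf" is to "cow":
# the same age displacement maps puppy -> dog and calf -> cow.
print(add(calf, sub(dog, puppy)))   # (1.0, 9.0), i.e. cow

# "puppy" is to "calf" as "dog" is to "cow":
# the same size displacement maps puppy -> calf and dog -> cow.
print(add(dog, sub(calf, puppy)))   # (1.0, 9.0), i.e. cow
```

Both analogies land on the same point, which is why placing "cow" at position C captures the structure between the four words.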
Thus, we can see that the phrase "I like basketball more than anything" is semantically closer to "I love basketball" than to the phrase "I love football", even though these last two share more words in common. And there are even multilingual phrase embeddings, where phrases that mean the same thing in different languages receive a close numerical representation.

As we will see in future installments, these word and phrase embeddings are the basis of large language models like GPT-3 and BLOOM. But until we get there, don't stop playing with the challenges and tasks we propose on our website, as they will allow you to interact directly with the internal workings of many of the artificial intelligence systems we use daily.
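The basketball example can be sketched with cosine similarity, the usual way of comparing phrase embeddings. The three vectors below are invented for illustration; in practice they would come from a trained sentence encoder, not be written by hand.

```python
import math

# Hypothetical sentence embeddings, invented for illustration.
vectors = {
    "I like basketball more than anything": [0.90, 0.80, 0.10],
    "I love basketball":                    [0.85, 0.75, 0.15],
    "I love football":                      [0.20, 0.30, 0.90],
}

def cosine(a, b):
    """Cosine similarity: close to 1.0 means very similar meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = vectors["I like basketball more than anything"]
print(cosine(query, vectors["I love basketball"]))  # high similarity
print(cosine(query, vectors["I love football"]))    # lower similarity
```

Even though "I love basketball" and "I love football" share more surface words, the embedding places the two basketball phrases closer together, because similarity is measured in meaning space, not by word overlap.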