1
00:00:00,000 --> 00:00:04,320
People perceive the world through their senses, but how do machines perceive the world?

2
00:00:05,360 --> 00:00:12,820
Computers use different types of sensors, such as microphones, cameras, radars, or GPS receivers,

3
00:00:13,080 --> 00:00:17,980
to receive information from the environment around them and build a representation of their surroundings.

4
00:00:20,890 --> 00:00:27,449
But computers only know how to work with numbers, so all the information they receive from their sensors must be stored as a set of numbers.

5
00:00:27,449 --> 00:00:35,270
For example, a black and white image is encoded as a matrix of numbers, where each value indicates the brightness of one pixel.

6
00:00:36,329 --> 00:00:43,409
If the image is in color, three numbers are stored for each pixel, representing the brightness of the red, green, and blue components.
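
As a rough illustration of the two encodings just described, here is a minimal sketch in plain Python. The 3 by 3 size, the 0 to 255 brightness range, and the names `gray` and `color` are assumptions made for the example, not something stated in the video.

```python
# A tiny 3x3 grayscale image: each value is one pixel's brightness.
# Assumption: the common 8-bit range, where 0 is black and 255 is white.
gray = [
    [0, 128, 255],
    [0, 128, 255],
    [0, 128, 255],
]

# A color image of the same size: each pixel holds three numbers,
# the brightness of its red, green, and blue components.
# The color values here are arbitrary and only serve the illustration.
color = [[(v, 0, 255 - v) for v in row] for row in gray]

height, width = len(gray), len(gray[0])
numbers_gray = height * width        # one number per pixel
numbers_color = height * width * 3   # three numbers per pixel

print(numbers_gray, numbers_color)   # prints: 9 27
```

The color version needs three times as many numbers as the grayscale one for the same number of pixels.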

7
00:00:44,509 --> 00:00:52,729
Sounds are also encoded as a series of numbers, which indicate the waveform values at different moments, taking hundreds or thousands of samples every second.
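
The idea of storing a sound as a series of waveform values can be sketched as follows. The sample rate, the tone frequency, and the helper `sample_tone` are hypothetical choices for this example only.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed value for illustration)
FREQUENCY = 440.0    # frequency of the tone in Hz (assumed)
DURATION = 0.01      # seconds of sound to encode

def sample_tone(freq, rate, seconds):
    """Return the waveform value of a pure tone at each sampling instant."""
    n = round(rate * seconds)
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

samples = sample_tone(FREQUENCY, SAMPLE_RATE, DURATION)
print(len(samples))  # prints: 80
```

Each entry in `samples` is the height of the wave at one moment, so one hundredth of a second of sound already takes 80 numbers at this rate.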

8
00:00:53,729 --> 00:00:56,570
Does the fact that a machine can receive information from the world

9
00:00:56,570 --> 00:00:59,030
already make it an artificial intelligence system?

10
00:01:00,289 --> 00:01:02,729
Well, no. For us to consider it as such,

11
00:01:02,969 --> 00:01:05,870
it needs to be capable of extracting meaning from that information.

12
00:01:06,930 --> 00:01:10,569
Let's think about a supermarket door that opens when a sensor detects movement.

13
00:01:13,480 --> 00:01:16,780
The system is too simple to be able to perceive who or what is entering

14
00:01:16,780 --> 00:01:18,700
and to make decisions based on that.

15
00:01:22,469 --> 00:01:25,409
And thanks to this limitation we can enjoy the wonderful videos

16
00:01:25,409 --> 00:01:31,980
of wild animals walking through supermarket aisles, as Touretzky and Gardner-McCune joke in their

17
00:01:31,980 --> 00:01:38,840
chapter on artificial intelligence literacy in this magnificent work. But how do computers

18
00:01:38,840 --> 00:01:45,980
extract meaning from a set of numbers that represents an image, for example? This transformation

19
00:01:45,980 --> 00:01:50,640
from signal to meaning occurs in progressive stages through a process called feature extraction.

20
00:01:54,620 --> 00:01:58,319
On screen we have an image of the number four, written by a person, that the computer has

21
00:01:58,319 --> 00:02:03,799
already encoded into a matrix of numbers using the information from its camera. But how could it know

22
00:02:03,799 --> 00:02:10,199
that it's a 4 and not a 1 or a 7? By searching for specific combinations of values that represent

23
00:02:10,199 --> 00:02:16,219
light and dark pixels in small areas of the image, in this case 3 by 3 pixels, the location and

24
00:02:16,219 --> 00:02:22,020
orientation of different edges in the image can be detected. Thus, the result of applying a filter

25
00:02:22,020 --> 00:02:26,740
to detect left edges is shown in the image on the right, where the areas detected as left edges

26
00:02:26,740 --> 00:02:33,000
appear marked in red. The opposite areas, in this case the right edges, are shown in blue.
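
A left-edge filter of this kind can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 5 by 5 test image, the Prewitt-style kernel `LEFT_EDGE`, and the helper `apply_filter` are hypothetical, and the video does not say which filter values it actually uses. The sketch also assumes a bright stroke on a dark background.

```python
# Hypothetical 5x5 image: a bright vertical stroke (1) on a dark background (0).
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# Prewitt-style 3x3 kernel (an assumed choice): the response is positive
# where pixels get brighter from left to right, that is, at the left edge
# of a bright stroke, and negative at the right edge.
LEFT_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def apply_filter(img, kernel):
    """Slide the 3x3 kernel over every position where it fits (no padding)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h - 2):
        row = []
        for c in range(w - 2):
            total = sum(
                kernel[i][j] * img[r + i][c + j]
                for i in range(3)
                for j in range(3)
            )
            row.append(total)
        out.append(row)
    return out

response = apply_filter(image, LEFT_EDGE)
for row in response:
    print(row)  # each row prints: [3, 0, -3]
```

Positive values correspond to the areas drawn in red (left edges) and negative values to the areas drawn in blue (right edges). Swapping the rows and columns of the kernel would give a filter that responds to top edges instead.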

27
00:02:42,000 --> 00:02:49,180
Let's now apply a filter to detect top edges. See? Well, through this staged process of feature

28
00:02:49,180 --> 00:02:54,139
extraction, in which different types of filters are used and combined, a signal is

29
00:02:54,139 --> 00:02:59,740
transformed into meaning. Something very similar is done with sound, for example for speech

30
00:02:59,740 --> 00:03:04,419
recognition, since each vowel and each consonant can be associated with different patterns of a

31
00:03:04,419 --> 00:03:09,080
spectrogram, which is a visual representation that makes it possible to identify the variations

32
00:03:09,080 --> 00:03:15,400
in frequency and intensity of sound. But there are artificial intelligence systems that are not

33
00:03:15,400 --> 00:03:21,860
only capable of translating audio into text, but also seem to understand these texts. How is

34
00:03:21,860 --> 00:03:28,539
this possible? Well, that's precisely what we're going to see in the next video.