1 00:00:00,000 --> 00:00:04,320 People perceive the world through their senses, but how do machines perceive the world?
2 00:00:05,360 --> 00:00:12,820 Computers use different types of sensors, such as microphones, cameras, radars, or GPS receivers, among others,
3 00:00:13,080 --> 00:00:17,980 to receive information from the environment around them and build a representation of their surroundings.
4 00:00:20,890 --> 00:00:27,449 But computers only know how to work with numbers, so all the information they receive from their sensors must be stored as a set of numbers.
5 00:00:27,449 --> 00:00:35,270 For example, a black-and-white image is encoded as a matrix of numbers, where each value indicates the brightness of one pixel.
6 00:00:36,329 --> 00:00:43,409 If the image is in color, three numbers are stored for each pixel, representing the brightness of the red, green, and blue components.
7 00:00:44,509 --> 00:00:52,729 Sounds are also encoded as a series of numbers, which indicate the waveform values at different moments, taking hundreds or thousands of samples every second.
8 00:00:53,729 --> 00:00:56,570 Does the fact that a machine can receive information from the world
9 00:00:56,570 --> 00:00:59,030 already make it an artificial intelligence system?
10 00:01:00,289 --> 00:01:02,729 Well, no. For us to consider it as such,
11 00:01:02,969 --> 00:01:05,870 it needs to be capable of extracting meaning from that information.
12 00:01:06,930 --> 00:01:10,569 Let's think about a supermarket door that opens when a sensor detects movement.
13 00:01:13,480 --> 00:01:16,780 The system is too simple to be able to perceive who or what is entering
14 00:01:16,780 --> 00:01:18,700 and to make decisions based on that meaning.
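To make the encodings described above a little more concrete, here is a minimal sketch in Python of how a grayscale image, a color image, and a short sound could be stored as plain numbers. The pixel values, the 0 to 255 brightness range, the 8000 samples-per-second rate, and the helper name `sample_tone` are all assumptions chosen for illustration; none of them come from the video.

```python
import math

# Assumed toy 3x3 grayscale image: one brightness value per pixel
# (0 = black, 255 = white is a common convention, assumed here).
gray_image = [
    [0, 128, 255],
    [0, 128, 255],
    [0, 128, 255],
]

# Assumed toy 2x2 color image: three values per pixel (red, green, blue).
color_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]


def sample_tone(frequency_hz, sample_rate_hz, duration_s):
    """Return the waveform values of a pure tone measured at regular time steps."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


# 10 milliseconds of a 440 Hz tone, sampled 8000 times per second.
samples = sample_tone(frequency_hz=440, sample_rate_hz=8000, duration_s=0.01)

print(len(gray_image), "x", len(gray_image[0]), "grayscale pixels")
print("top-left color pixel (R, G, B):", color_image[0][0])
print(len(samples), "audio samples for 10 ms of sound")
```

In every case the computer ends up holding nothing but lists of numbers; any meaning has to be extracted from them afterwards.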
15 00:01:22,469 --> 00:01:25,409 And thanks to this limitation, we can enjoy the wonderful videos
16 00:01:25,409 --> 00:01:31,980 of wild animals walking through supermarket aisles, as Churisky and Garner joke in their
17 00:01:31,980 --> 00:01:38,840 chapter on artificial intelligence literacy in this magnificent work. But how do computers
18 00:01:38,840 --> 00:01:45,980 extract meaning from a set of numbers that represents an image, for example? This transformation
19 00:01:45,980 --> 00:01:50,640 from signal to meaning occurs in progressive stages through a process called feature extraction.
20 00:01:54,620 --> 00:01:58,319 On screen we have an image of the number 4 written by a person, which the computer has
21 00:01:58,319 --> 00:02:03,799 already encoded into a matrix of numbers using the information from its camera. But how could it know
22 00:02:03,799 --> 00:02:10,199 that it's a 4 and not a 1 or a 7? By searching for specific combinations of values that represent
23 00:02:10,199 --> 00:02:16,219 light and dark pixels in small areas of the image, in this case 3 by 3 pixels, the location and
24 00:02:16,219 --> 00:02:22,020 orientation of different edges in the image can be detected. Thus, the result of applying a filter
25 00:02:22,020 --> 00:02:26,740 to detect left edges is shown in the image on the right, where the areas detected as left edges
26 00:02:26,740 --> 00:02:33,000 appear marked in red. The opposite areas, in this case the right edges, are shown in blue.
27 00:02:42,000 --> 00:02:49,180 Let's now apply a filter to detect top edges. See? It is through this staged process of feature
28 00:02:49,180 --> 00:02:54,139 extraction, in which different types of filters are used and combined, that a signal is
29 00:02:54,139 --> 00:02:59,740 transformed into meaning.
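The 3 by 3 edge filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the filter values, the toy image (a bright vertical stroke on a dark background), and the function name `apply_filter` are hypothetical, and the video does not say which exact filter values it uses.

```python
# Hypothetical 3x3 filter: positive response where brightness rises from
# left to right (the left edge of a bright stroke), negative where it falls
# (the right edge).
LEFT_EDGE_FILTER = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Hypothetical 3x3 filter: positive response where brightness rises from
# top to bottom (the top edge of a bright stroke).
TOP_EDGE_FILTER = [
    [-1, -1, -1],
    [0, 0, 0],
    [1, 1, 1],
]


def apply_filter(image, kernel):
    """Slide a 3x3 kernel over the image; return one response per valid position."""
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += kernel[i][j] * image[r + i][c + j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy 5x6 image: a bright vertical stroke (1) on a dark background (0).
image = [[0, 0, 1, 1, 0, 0] for _ in range(5)]

left_response = apply_filter(image, LEFT_EDGE_FILTER)
top_response = apply_filter(image, TOP_EDGE_FILTER)

print("left-edge filter:")
for row in left_response:
    print(row)
print("top-edge filter:")
for row in top_response:
    print(row)
```

With this toy image, the left-edge filter gives positive values on the left side of the stroke and negative values on the right side, which corresponds to the red and blue areas in the video. The top-edge filter gives zeros everywhere, because a purely vertical stroke has no top edge inside the image.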
With sound, something very similar is done, for example for speech
30 00:02:59,740 --> 00:03:04,419 recognition, since each vowel and each consonant can be associated with different patterns in a
31 00:03:04,419 --> 00:03:09,080 spectrogram, which is a visual representation that shows how the frequency
32 00:03:09,080 --> 00:03:15,400 and intensity of a sound vary over time. But there are artificial intelligence systems that are not
33 00:03:15,400 --> 00:03:21,860 only capable of transcribing audio into text, but also seem to understand these texts. How can
34 00:03:21,860 --> 00:03:28,539 this be? How is this possible? Well, that's precisely what we're going to see in the next video.
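As a rough illustration of the spectrogram mentioned above, the following Python sketch cuts a signal into short frames and measures how strong each frequency is in every frame. It uses a deliberately naive discrete Fourier transform so that it needs only the standard library. The sample rate, frame size, test tones, and function names are assumptions made for this example, and real speech recognizers use far more refined processing.

```python
import cmath
import math

SAMPLE_RATE = 8000  # assumed samples per second
FRAME_SIZE = 64     # assumed samples per analysis frame


def dft_magnitudes(frame):
    """Strength of each frequency bin (up to half the frame length) in one frame."""
    n = len(frame)
    magnitudes = []
    for k in range(n // 2):
        coeff = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(coeff))
    return magnitudes


def spectrogram(samples, frame_size):
    """Analyse consecutive, non-overlapping frames of the signal."""
    return [
        dft_magnitudes(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def tone(frequency_hz, n_samples):
    """Pure sine tone, used here as a stand-in for a real recording."""
    return [
        math.sin(2 * math.pi * frequency_hz * t / SAMPLE_RATE)
        for t in range(n_samples)
    ]


# A low tone followed by a high tone: the pattern changes over time.
signal = tone(500, 128) + tone(2000, 128)

spec = spectrogram(signal, FRAME_SIZE)
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in spec]

for i, peak in enumerate(peak_bins):
    print(f"frame {i}: strongest frequency ~ {peak * SAMPLE_RATE / FRAME_SIZE:.0f} Hz")
```

The first two frames peak at 500 Hz and the last two at 2000 Hz. Each row of `spec` is one column of the picture a spectrogram draws, and it is in pictures like this that the patterns of vowels and consonants are looked for.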