Recognizing audio - Educational content
When you access the application URL v2-learningml.org, alongside text, image and number recognition,
00:00:00
you'll see there's a new type of recognition: sound.
00:00:16
Let's see how it works.
00:00:19
We click on it and the three phases of supervised learning appear, just like in the other types of recognition.
00:00:22
The training phase to collect data, the learning phase to build the model, and the testing phase.
00:00:27
We're going to build a model that will be capable of distinguishing my voice from a whistle and from the background noise in the room.
00:00:33
So we create the three classes we need, in this case voice, whistle and background.
00:00:39
Good, and now it's about adding examples of voice, whistle or background sound.
00:00:54
I'm going to start with the voice because while I speak and explain how the recording works we'll be collecting voice samples.
00:00:59
When we want to collect sound samples, we simply click record and then you'll see it will start
00:01:05
collecting sample recordings of about one second each, automatically; that is,
00:01:11
it will keep recording until we stop it. If we click stop, the recording stops. In this case
00:01:16
it has collected 12 recordings of approximately one second of my voice. If we want to play it back
00:01:21
to see what it has recorded, we click here and we hear how it has been collecting the different things
00:01:26
I've been saying. Here the interesting thing is to capture the timbre, because that's what
00:01:31
this recognizes quite well: the timbre of sounds. If we don't like any of the samples, we can simply
00:01:36
delete it. Let's imagine we don't want number 12: we click on the trash can button and it's deleted.
00:01:42
Now we're going to take sound samples. It's very important, since the samples we take are
00:01:46
one second long, that during the second or so it's recording it really records what we want.
00:01:52
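The one-second sampling and the advice to review what was captured can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy sample rate and the loudness threshold are hypothetical, and the video does not show how LearningML itself cuts or checks its recordings.

```python
import math

def split_into_clips(samples, sample_rate, clip_seconds=1.0):
    """Cut a recording into consecutive fixed-length clips, dropping any short tail."""
    clip_len = int(sample_rate * clip_seconds)
    if clip_len <= 0:
        raise ValueError("clip length must be at least one sample")
    n_clips = len(samples) // clip_len
    return [samples[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

def rms(clip):
    """Root-mean-square amplitude, used here as a rough loudness measure."""
    return math.sqrt(sum(x * x for x in clip) / len(clip))

def clips_to_review(clips, min_rms=0.1):
    """Indices of clips that are nearly silent (threshold is an assumption)."""
    return [i for i, clip in enumerate(clips) if rms(clip) < min_rms]

# Hypothetical recording at a toy rate of 8 samples per second:
# two loud seconds, one silent second, then a loud second plus a short tail.
recording = [0.5] * 16 + [0.0] * 8 + [0.5] * 10
clips = split_into_clips(recording, sample_rate=8)
quiet = clips_to_review(clips)
```

For a voice or whistle class, a clip flagged this way is a candidate for the trash can button, since that second probably did not capture the intended sound.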
That's why it's good to review afterwards how the samples were collected, to see if it really
00:01:57
recorded what we wanted. We always have to keep in mind that data quality is fundamental to then
00:02:02
obtain a good model. Well, let's collect whistle sounds. I'll stop it. Well, 13 samples. Since the
00:02:08
number 13 brings bad luck and we're going to be a little superstitious, we'll take advantage and
00:02:33
delete the last sample. Good, and now we're going to take 12 background samples. I'll simply press
00:02:37
record, stay quiet and it will capture the ambient noise there is, a bit of the fan motor, anyway
00:02:44
there's always noise everywhere we go. Good, 12 samples more or less. Remember that,
00:02:50
whatever the data are, whether images, texts, numbers or, in this case, sound,
00:03:09
it's important that each class has more or less the same number of samples, what's called a balanced
00:03:15
data set. Good, we now have the sample data set. Now it's time for learning, that is, building the
00:03:20
model. We click here and well, the machine learning algorithm will analyze that data to build a model
00:03:28
capable of recognizing those three timbres. Good, it has been trained. It took 9.3 seconds and now
00:03:33
we're going to test it. To test it, well, we do the same as when we collected data. We press the
00:03:46
record button, in this case from the testing phase and see what happens. Well, first we stay quiet to
00:03:53
see if it picks up the background. Perfect, it picked up the background noise. Now I'm going to
00:03:59
speak. Hello, hello, hello. And again it got it right, it recognized the voice. And now I'm going
00:04:06
to make a small whistle. And we see it recognized the whistle. And well, this is how to build sound
00:04:14
recognition models. Well, next I'm going to make a program with Scratch that uses the model we just
00:04:21
created for sound recognition. We click on the cat and we'll see that in the LearningML blocks
00:04:29
there's a new block called record audio. This block works very similarly to this record button:
00:04:34
when executed, it records a sound of approximately one second, and that sound is converted
00:04:40
into a vector, a multidimensional vector, which is what will really be passed to the machine
00:04:46
learning algorithm to recognize it. And how is classification performed? Well, as we do with the
00:04:51
rest of the classification problems, with this classify item block. What happens is that here we're going
00:04:57
to place audio as an argument. Let's see, let's try it. First we're going to execute it with silence
00:05:01
to see if it detects the background. Very good, now I'm going to execute it while speaking.
00:05:09
Hello, hello, hello, hello.
00:05:20
And now I'm going to execute it while whistling.
00:05:23
As we can see, it works exactly the same
00:05:27
as the rest of the recognitions,
00:05:30
but in this case recording samples of one second duration.
00:05:33
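The idea behind the record audio and classify item blocks (a one-second sound becomes a vector, and the vector is what the model classifies) can be sketched in Python. Everything here is an assumption made for illustration: the two features (loudness and zero-crossing rate), the nearest-centroid classifier and the synthetic sine-wave "recordings" are stand-ins, since the video does not say which features or algorithm LearningML uses.

```python
import math

def features(clip):
    """Turn a clip into a 2-dimensional vector: (RMS loudness, zero-crossing rate)."""
    energy = math.sqrt(sum(x * x for x in clip) / len(clip))
    crossings = sum(1 for a, b in zip(clip, clip[1:]) if (a < 0) != (b < 0))
    return (energy, crossings / (len(clip) - 1))

def train(examples):
    """Learning phase: average the feature vectors of each class (nearest centroid)."""
    centroids = {}
    for label, clips in examples.items():
        vectors = [features(c) for c in clips]
        centroids[label] = tuple(
            sum(v[i] for v in vectors) / len(vectors) for i in range(2)
        )
    return centroids

def classify(centroids, clip):
    """Testing phase: return the class whose centroid is closest to the clip's vector."""
    v = features(clip)
    return min(centroids, key=lambda label: math.dist(v, centroids[label]))

def tone(freq, amp, n=1000):
    """Synthetic one-second clip (hypothetical stand-in for a real recording)."""
    return [amp * math.sin(2 * math.pi * freq * t / n) for t in range(n)]

# Training phase: a balanced toy data set, two clips per class.
examples = {
    "background": [tone(5, 0.01), tone(8, 0.02)],
    "voice": [tone(20, 0.5), tone(30, 0.4)],
    "whistle": [tone(200, 0.5), tone(220, 0.4)],
}
model = train(examples)
label_whistle = classify(model, tone(210, 0.45))
label_voice = classify(model, tone(25, 0.45))
label_background = classify(model, tone(6, 0.015))
```

A real model would use richer features that capture timbre, but the flow is the same as in the Scratch program: record, convert to a vector, classify.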
And with this we could make some type of program.
00:05:37
For example, imagine making a model that recognizes the words up, down, left and right
00:05:40
and then, with Scratch, making a program that moves the cat based on what the user is saying,
00:05:46
that goes up when "up" is said, down when "down" is said, etc. Well, that will be the subject of
00:05:52
a later video. For now we'll stick with this so you get an idea of how this new LearningML
00:05:59
functionality works.
00:06:04
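The program imagined at the end, where the cat moves according to the recognized word, could look roughly like the following Python sketch. It is not the Scratch program from the later video: the function name, the step size and the list of recognized labels are hypothetical, and the labels are assumed to come from a model trained on the words up, down, left and right.

```python
# Direction of movement for each recognized word (y grows upwards, as on the Scratch stage).
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def move_sprite(position, label, step=10):
    """Return the new (x, y) position; unknown labels leave the sprite in place."""
    dx, dy = MOVES.get(label, (0, 0))
    x, y = position
    return (x + dx * step, y + dy * step)

# Hypothetical sequence of labels returned by the classifier, one per second of audio.
position = (0, 0)
for label in ["up", "up", "right", "background", "down"]:
    position = move_sprite(position, label)
```

Ignoring any label that is not a direction keeps the cat still while only background noise is heard.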
- Language(s):
- Subjects:
- Technology
- Tags:
- Artificial Intelligence
- Educational levels:
- Compulsory Secondary Education (Educación Secundaria Obligatoria)
- Ordinary
- First Cycle
- First Year
- Second Year
- Second Cycle
- Third Year
- Fourth Year
- Curricular Diversification 1
- Curricular Diversification 2
- First Cycle
- Compensatory
- Ordinary
- Author(s):
- Juan David Rodríguez García
- Uploaded by:
- David G.
- License:
- Attribution - NonCommercial - ShareAlike
- Views:
- 11
- Date:
- 6 August 2025 - 0:29
- Visibility:
- Public
- School:
- IES MARIE CURIE Loeches
- Duration:
- 06′ 07″
- Aspect ratio:
- 1.78:1
- Resolution:
- 1920x1080 pixels
- Size:
- 73.51 MBytes