In the video, we talked about how for stops, because they don’t have airflow happening during the sound itself, the sound while the stop is going on is pretty much the same no matter what stop it is: silence. The main clue we use to tell where the stop was made is formant transitions: the influence that the stop has on the articulation of the sounds before or after it.
But where the stop got made isn’t the only piece of information we need in order to tell what sound we heard. We also need to know about the voicing of the consonant - is it voiced, voiceless, or aspirated? But since there’s no airflow during the sound itself, it’s not like with a fricative like [z] or an approximant like [l]. The vocal fold vibration doesn’t carry throughout the course of the sound we’re making. Instead, what we pay attention to is when the voicing starts. This is known as voice onset time, or VOT.
What this means is that for stops, we pay attention to really fine timing about when the vocal folds start up their vibration again. A difference of a few tens of milliseconds either way changes the category of sound we place it in. And we can look at spectrograms to see how this works.
So we can break this down as follows. It’s very easy to tell when a stop gets released, because suddenly, there’s a lot more noise in the spectrogram. Take a look at this spectrogram of [pʰi]:
In the first section, not much is happening (beyond some background noise on the mic). But then, the mouth opens, and air starts flowing out - suddenly, there’s a lot more noise, and therefore darkness, in the spectrogram. But it’s not until the last section, where we see the pulses at the bottom, that voicing actually starts. And if we look at the time stamps at the bottom, we can see it’s about 67 milliseconds after the release of the stop. So we say that this bilabial stop has a VOT of +67 ms.
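If you want to see the arithmetic spelled out, here's a minimal sketch in Python. VOT is just the time voicing starts minus the time the stop is released. The function name `vot_ms` and the two timestamps are made up for illustration; only the 67 ms gap between them comes from the example above.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return voice onset time in milliseconds.

    Positive means voicing starts after the release;
    negative means voicing starts before it.
    """
    return (voicing_onset_s - release_s) * 1000.0


# Hypothetical timestamps in seconds, as you might read them off a spectrogram.
# They are chosen so the gap matches the +67 ms in the [pʰi] example.
release = 0.250
voicing_onset = 0.317

print(f"VOT = {vot_ms(release, voicing_onset):+.0f} ms")  # VOT = +67 ms
```

Measuring from the release is what lets the same formula give a negative number when voicing begins before the stop is let go.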
That makes this sound an aspirated [pʰ]. If there’s an extended delay between when the stop is released and the voicing begins, so if the VOT is significantly positive, then the stop is said to be aspirated. And all aspirated stops are also voiceless, because, well, there’s no voicing around during the stop. But what if the delay is pretty short, like here in [stoɹ]?
The [t] is the part in the middle, and the period from when the [t] is released, to when the voicing starts, is pretty short, only about 11 ms. When the voicing starts pretty close to the release, we call that a plain voiceless stop: voiceless, but not aspirated. So VOTs around zero will be voiceless.
Finally, we can start having voicing occur a little bit before the stop is released. We can still start airflow up from our lungs before we let go of a closure, and get some vibration going! We just can’t do it for very long before we let go. But we can see that voicing in the spectrogram before the release, like with this [bɹi]:
Here, we can see the release. And before it, we already see the pulses. This stop has a negative VOT, because voicing is starting before release, and that’s where we measure from. If you have a significantly negative VOT, then you’re a voiced stop, like the [b] here.
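Putting the three cases side by side, here's a rough sketch in Python of how VOT sorts a stop into a category. The cutoffs of -20 ms and +30 ms are assumptions picked purely for illustration, not measured boundaries, and the function name `classify_stop` is made up too. The -50 ms value for a voiced stop is likewise hypothetical, since we didn't put a number on the [b] above.

```python
def classify_stop(vot: float,
                  voiced_below: float = -20.0,
                  aspirated_above: float = 30.0) -> str:
    """Sort a stop into a voicing category from its VOT in milliseconds.

    The default cutoffs are illustrative assumptions, not established values.
    """
    if vot < voiced_below:
        return "voiced"        # voicing starts well before the release
    if vot > aspirated_above:
        return "aspirated"     # long delay between release and voicing
    return "voiceless"         # voicing starts close to the release


# +67 and +11 come from the examples above; -50 is a hypothetical value.
for label, vot in [("[pʰ] in [pʰi]", 67.0), ("[t] in [stoɹ]", 11.0), ("a voiced stop", -50.0)]:
    print(f"{label}: VOT {vot:+.0f} ms -> {classify_stop(vot)}")
```

The cutoffs are parameters rather than fixed numbers so you can slide them around and see how the same VOT would land in a different category.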
Now, these aren’t the only cues that we use to tell us about voicing for stops; we can also look at things like how long the stop closure lasts, how long the previous vowel was, etc. But VOT is a big measure for stops, and different languages carve up the range of possible VOTs differently when deciding what qualifies as what kind of stop. But that’s a topic for another day. For now, we’ll just leave it at this: what kind of voicing you have for a stop is largely just about timing.
So how about it? What do you all think? Let us know below, and we’ll be happy to talk with you about how we hear different consonants, or other sounds. There’s a lot of interesting stuff to say, and we want to hear what interests you!