More on acoustics

The most important principle of physics on which verbal communication is based is that vibrating bodies send out waves that are propagated through the environment. Our articulatory organs produce a number of vibrations; these vibrations need a medium to be transmitted through. The medium through which speech sounds travel is usually the air. A classical example of a vibrating body is the tuning fork. When you strike a tuning fork, its prongs move in one direction, then back to the starting point, and then in the opposite direction, to roughly the same extent. The movement continues with decreasing amplitude until the vibration dies out completely.

It is because of friction with the environment that the movement eventually dies out. Ideally, if the vibrating body were placed in a vacuum, the energy of the initial impulse would be kept constant and the movement would continue forever. However, as the vibrating body is surrounded by air, its movement is transmitted to the surrounding air molecules, which vibrate accordingly. The vibration of the prong of the tuning fork can be represented graphically by a sound wave.
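The dying-out of the vibration can be sketched numerically. The following is only a minimal illustration, assuming that the prong follows a sine-type wave whose amplitude shrinks exponentially; the function names and the values chosen for amplitude, frequency and damping are hypothetical and are not taken from any measurement.

```python
import math

def envelope(t, amplitude=1.0, damping=3.0):
    """Largest displacement the prong can still reach at time t (seconds).

    amplitude and damping are illustrative values, not measurements.
    """
    return amplitude * math.exp(-damping * t)

def damped_vibration(t, amplitude=1.0, frequency_hz=440.0, damping=3.0):
    """Displacement of the prong at time t: a cosine wave inside the envelope."""
    return envelope(t, amplitude, damping) * math.cos(2 * math.pi * frequency_hz * t)

# damping = 0.0 stands for the idealised case without friction:
# the envelope never shrinks and the movement never dies out.
for t in (0.0, 0.5, 1.0):
    print(f"t={t:.1f}s  with friction={envelope(t):.3f}  "
          f"without friction={envelope(t, damping=0.0):.3f}")
```

The larger the damping value, the faster the movement dies out; setting it to zero reproduces the ideal case described above.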

In such a graph, the vertical axis measures the amplitude or intensity of the sound, while the horizontal one measures the duration of the vibration in time. The amplitude of a sound wave corresponds to the loudness of the sound we perceive, and a simple rule is: the higher the amplitude, the louder the sound. The conventional way of referring to the intensity, loudness or amplitude of sounds is to use the decibel scale. The decibel scale does not express the absolute intensity of a sound, but the ratio, on a logarithmic scale, between the intensity of a sound and a reference intensity. A complete movement, that is, one starting from the initial point, going as far as the maximum amplitude, then back to the point of rest and beyond it to the maximum amplitude in the opposite direction, and finally back again to the point of rest, is called a cycle. The higher the number of cycles per unit of time (second), the higher the frequency of the vibration. The time it takes for a cycle to be completed is called the period of the vibration. Frequency is measured in cycles per second (cps) or hertz (Hz). Sounds having a constant period (in other words, sounds displaying a regular vibration) are called periodic sounds. Musical sounds are the typical example of this kind of sound.
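Two of the relations mentioned here can be made concrete with a short sketch: the decibel value as a ratio to a reference intensity, and the period as the inverse of the frequency. The reference intensity used below (1e-12 watts per square metre, a value commonly used for sound in air) is an assumption, since the text does not name one; the function names are hypothetical as well.

```python
import math

# Assumption: a commonly used reference intensity for sound in air,
# in watts per square metre. The text itself does not specify a value.
REFERENCE_INTENSITY = 1e-12

def level_db(intensity, reference=REFERENCE_INTENSITY):
    """Intensity level in decibels: ten times the base-10 logarithm of the ratio."""
    return 10 * math.log10(intensity / reference)

def period_seconds(frequency_hz):
    """Time taken by one complete cycle at the given frequency."""
    return 1.0 / frequency_hz

print(level_db(1e-6))         # a sound a million times more intense than the reference
print(period_seconds(200.0))  # one cycle of a 200 cps vibration, in seconds
```

Because the scale is logarithmic, multiplying the intensity by ten adds 10 decibels rather than multiplying the decibel value by ten.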

However, in the case of other sounds, successive periods vary, and these sounds are called aperiodic. In reality, periodic vibrations are seldom simple, the vibration being of a more complex kind than that represented by the simple sinusoidal wave described above. A vibrating body vibrates at several rates at once, the ensuing vibration of the entire body being a wave that is not sinusoidal and that differs from any of the simple sine waves of which it is the result. The sinusoidal components of any complex periodic sound are called the harmonics of the respective sound. The higher harmonics are integral multiples of the lowest harmonic, which is called the fundamental frequency or the fundamental of the respective sound. Thus, if a sound has a fundamental frequency of 200 cps and one of its higher harmonics is at, say, 400 cps, we say that the latter is the 2nd harmonic of the sound, since its frequency is twice that of the fundamental. Even though the various rates of vibration will result in a given timbre (tonality) of the sound, which is different from any of the harmonics, it will always be the fundamental that essentially defines (gives the quality of) a given sound. This kind of specification, which includes the fundamental and the harmonics of a sound, is called the spectrum of the respective sound.
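The relation between the fundamental and its harmonics can be sketched as follows. This is only an illustration: the function names and the amplitudes given to the harmonics are hypothetical, while the 200 cps fundamental is the example used above.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """The first `count` harmonics: integral multiples of the fundamental."""
    return [n * fundamental_hz for n in range(1, count + 1)]

def complex_wave(t, fundamental_hz, amplitudes):
    """Value at time t of a complex periodic wave built by adding sine waves.

    amplitudes[0] belongs to the fundamental, amplitudes[1] to the 2nd harmonic, etc.
    """
    return sum(
        a * math.sin(2 * math.pi * n * fundamental_hz * t)
        for n, a in enumerate(amplitudes, start=1)
    )

print(harmonic_frequencies(200, 3))  # the fundamental, the 2nd and the 3rd harmonic

# Illustrative amplitudes: the sum is not a sine wave, but it still repeats
# once per period of the fundamental (1/200 of a second).
amps = [1.0, 0.5, 0.25]
print(complex_wave(0.001, 200, amps), complex_wave(0.001 + 1 / 200, 200, amps))
```

The two values printed on the last line agree up to rounding, which shows that the complex wave keeps the period of its fundamental even though its shape differs from that of any single component.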

An essential feature of any sound is its pitch. Pitch is, roughly speaking, the way in which we perceive the frequency of a sound; it is, in other words, the perceptual correlate of the frequency of that sound. We can say that the higher the fundamental frequency of a sound, the higher its pitch, or rather that we perceive the sound as having a higher pitch. This correlation is not, however, linear, as there is not always a direct proportionality between the frequency of a sound and our perception of that frequency. Pitch has a very important role in intonation, as we shall see later. Pitch differs a lot from one speaker to another. Women, for instance, generally have higher voices than men, so the pitch of their utterances will be higher. (The frequency of vocal cord vibration generally ranges between 80 and 200 Hz in men, while the vibration of women’s vocal cords can reach 400 Hz.)
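The non-linear relation between frequency and perceived pitch can be illustrated with the mel scale. Note that the mel scale is not mentioned in the text: it is brought in here purely as an assumption, as one widely used approximation of perceived pitch, and the formula below is one common variant of it.

```python
import math

def hz_to_mel(frequency_hz):
    """Approximate perceived pitch in mels (one common formula; an assumption here)."""
    return 2595 * math.log10(1 + frequency_hz / 700)

for f in (200, 400, 1000, 2000, 4000):
    print(f"{f:5d} Hz  is roughly {hz_to_mel(f):7.1f} mel")
```

Under this approximation, doubling the frequency does not double the perceived pitch, and the discrepancy grows at higher frequencies.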

How is it, then, that we recognize a sound as being “the same” even if it is pronounced by persons whose voices have very different pitches? The answer is that, though the fundamental and the number of harmonics obviously differ in the two cases (the voice with the higher pitch having fewer, more widely spaced harmonics within a given frequency range), the shape of the spectrum of the two sounds is pretty much the same, in the sense that the harmonics with the greatest amplitude are at about the same frequency in both cases. While vowels and sonorants have spectra which resemble those of periodic sounds (of the kind musical sounds are), obstruents, and particularly the voiceless ones, are aperiodic sounds, which makes them pretty similar to pure noises.
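The idea that two voices with different fundamentals can share the same spectral shape can be sketched in a few lines. Everything in this sketch is hypothetical: the single-peaked curve standing in for the shape of the spectrum, its peak at 600 cps, its width, and the two fundamentals of 100 and 200 cps.

```python
import math

def spectral_shape(frequency_hz, peak_hz=600.0, width_hz=150.0):
    """Hypothetical shape of the spectrum: relative amplitude at a given frequency."""
    return math.exp(-((frequency_hz - peak_hz) / width_hz) ** 2)

def strongest_harmonic(fundamental_hz, max_hz=3000.0):
    """Frequency of the harmonic that receives the greatest amplitude."""
    harmonics = []
    n = 1
    while n * fundamental_hz <= max_hz:
        harmonics.append(n * fundamental_hz)
        n += 1
    return max(harmonics, key=spectral_shape)

print(strongest_harmonic(100.0))  # voice with the lower fundamental -> 600.0
print(strongest_harmonic(200.0))  # voice with the higher fundamental -> 600.0
```

The two voices have different harmonics, yet in both cases the strongest one sits at the same frequency, which is the sense in which the shape of the spectrum stays the same.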

There are, then, three essential acoustic parameters that characterize a given sound (a sound having a certain quality): its amplitude or intensity, which we perceive as loudness; its frequency, which we perceive as pitch; and its duration. A given sound, say the vowel /e/, can be pronounced with various degrees of intensity, so that the amplitude varies, but fundamentally the sound is the same. In spite of frequency variations (which we perceive as variations in pitch) in the pronunciation of the above-mentioned vowel by different persons, we will still identify the “same” sound. We can also vary the length of the vowel and we will still say that the sound has not fundamentally changed its quality.

The anatomy and physiology of both the articulation and audition processes drastically limit the range of sounds that we can produce and perceive, respectively. In other words, we can only utter sounds within a certain range of frequency and intensity, and their duration is also limited. Likewise, our auditory system is able to perceive and analyze only sounds whose frequency and intensity are situated between certain values and whose duration is limited. The vibrations of a body can be transmitted, often with a higher amplitude, by a phenomenon called resonance. Certain bodies have the property of transmitting vibrations in this way and they are called resonators. It is enough to think of musical instruments for this physical process to become clear. If we take a violin, for instance, the strings play the role of vibrating bodies, while the body of the instrument acts as a resonator.

This is true not only of string instruments, but of wind instruments as well. If we take a flute or a bassoon, we can easily see that the air that is pushed into the instrument when we blow into it makes the air already inside the instrument vibrate, and the body of the instrument again plays the role of resonator.

A similar process can be witnessed in the case of speech. Remembering our description of the main articulators, we shall again mention the glottis as the first essential segment of the speech tract that shapes the sounds that we produce. The vocal cords have the role of vibrating bodies, while the pharynx, the oral cavity and the nasal cavity act as resonators. The versatility of these cavities (notably the oral cavity), which can easily modify their shape and degree of aperture, the mobility of the tongue and the complexity of the human speech-producing mechanism enable human beings to articulate a remarkable variety of sounds in terms of their acoustic features. The initially weak vibrations of the vocal cords, having a wide range of frequencies, are taken over and amplified by the above-mentioned resonators. The amplitude and frequency of the sounds that are further transmitted by the resonators depend very much on the size and shape of these resonators. Resonance does not characterize only cavities that modify the acoustic features of a sound, however. Vibrating bodies themselves are characterized by various degrees of resonance. Resonators can amplify or damp the formants of the given sound by enhancing or suppressing various frequencies. This accounts for the wide variety in the parameters of the sounds different human beings are able to produce. Each of the features of the articulators of an individual has an impact on the types of sounds that individual utters. The musicality of the sounds that we produce largely depends on the characteristics of our phonatory system, too. Vowels, for instance, have distinct and constant patterns of resonance (the resonating cavities assume certain shapes whenever a given sound is uttered) and thus we can always recognize the respective sound by its distinctive mark. The various positions of the soft palate will direct the air through either the oral or the nasal cavity, or through both of them. This will give the sounds we produce a nasal or an oral character. As pointed out above, the shape and degree of openness of the mouth can vary. The tongue, the lips, the teeth and the movement of the mandible can also influence speech production, assigning various acoustic characteristics to the sounds we articulate. The qualities of the vibrating bodies themselves (in our case the vocal cords) largely influence the timbre of the sound that is produced.
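The way a resonator enhances some frequencies and suppresses others can be sketched with the textbook response of a driven, damped oscillator. This is a strong simplification and not a model of the vocal tract: the function name, the natural frequency of 500 cps and the sharpness value q are all hypothetical.

```python
import math

def resonator_gain(frequency_hz, natural_hz=500.0, q=10.0):
    """Steady-state amplitude gain of a simple driven, damped resonator.

    natural_hz is the frequency the resonator favours and q the sharpness
    of the resonance; both are illustrative values.
    """
    f, f0 = frequency_hz, natural_hz
    return f0 ** 2 / math.sqrt((f0 ** 2 - f ** 2) ** 2 + (f * f0 / q) ** 2)

for f in (100.0, 500.0, 2000.0):
    print(f"{f:6.0f} Hz  gain = {resonator_gain(f):.2f}")
```

A gain above 1 means the vibration is amplified, a gain below 1 that it is damped. Changing natural_hz plays the part of changing the size and shape of the cavity: a different set of frequencies is then enhanced.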

Speech perception also fundamentally relies on the vibrating characteristics of various membranes, on the possibility of transmitting these vibrations and converting them into neural impulses. Certain segments of the auditory system, too, act as resonators, amplifying the basic features of the sounds that reach our ear, or, on the contrary, damping these sounds, often in order to protect our auditory organs.

As we have said, acoustic phonetics is the branch of phonetics where data are most amenable to measurement and quantification. While we can hardly think of apparatus being used in other linguistic fields like syntax or semantics, the situation is different in the case of phonetics, as scientists have devised various instruments that provide an “image” of the way people speak, in the form of graphs representing the sounds we produce. One such instrument is the acoustic spectrograph, an appliance similar in many ways to a seismograph or an electrocardiograph (devices that record seismic and heart activity respectively). It marks on paper the vibrations caused by speech sound production. The graphs it produces are called spectrograms and represent the frequency of the sound on the vertical axis and its duration on the horizontal axis.
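What a spectrogram records can be imitated digitally: the signal is cut into short successive frames and, for each frame, the energy present at each frequency is computed. The sketch below does this with a plain discrete Fourier transform, which is an assumption about method and not a description of how the paper-based spectrograph works; the sample rate, the frame length and the two test tones of 400 and 1600 cps are illustrative choices.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (illustrative)
FRAME = 200         # samples per frame, giving one frequency step every 40 cps

def tone(frequency_hz, n_samples):
    """A pure sine tone, used here as a stand-in for a recorded sound."""
    return [math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE) for n in range(n_samples)]

def frame_spectrum(frame):
    """Energy found at each frequency step within one frame (naive Fourier transform)."""
    size = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * n / size) for n, x in enumerate(frame)))
        for k in range(size // 2)
    ]

def spectrogram(signal):
    """One spectrum per frame: time runs along the list, frequency within each entry."""
    return [frame_spectrum(signal[i:i + FRAME]) for i in range(0, len(signal) - FRAME + 1, FRAME)]

def peak_frequency(spectrum):
    """Frequency (in cps) at which a frame carries the most energy."""
    k = max(range(len(spectrum)), key=spectrum.__getitem__)
    return k * SAMPLE_RATE / FRAME

# A 400 cps tone followed by a 1600 cps tone, a tenth of a second each.
signal = tone(400, 800) + tone(1600, 800)
print([peak_frequency(s) for s in spectrogram(signal)])
```

The printed list shows the frequency with the most energy in each successive frame: it stays at 400.0 for the first four frames and then jumps to 1600.0, which is what a dark band shifting upwards on a spectrogram would show.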

The darker bands in the spectrogram are called the formants of the respective sounds and they represent the frequencies at which a greater amount of energy is concentrated. Normally, two or at most three formants are used to describe a certain sound. Formants are essential for the acoustic representation of sounds and all voiced sounds have a formant structure. Different classes of sounds have, as shown above, different acoustic parameters. We have already mentioned the fact that, of the two major classes of sounds, vowels and consonants, the former are closer, acoustically speaking, to musical sounds, as their vibration comes closer to the ideal of constant periodic vibration. Vowels in their turn have distinct acoustic features. Front vowels, for instance, are acute sounds, displaying higher frequencies in their second formant (between 1800 and 2300 cps), while back vowels are, comparatively, graver sounds, their second formant ranging between 800 and 1000 cps. We can also distinguish between compact and diffuse vowels, depending on whether the main formants are close to each other or wider apart in the spectrum of the sound. Thus, low or open vowels have their formants grouped towards the middle of the spectrum and are consequently compact, while high or close vowels are diffuse, the distance between their formants being greater. Consonants, too, can be clearly distinguished on the basis of their acoustic features. Non-peripheral (dental, alveolar, alveopalatal, palatal) sounds are acute, as their formants are situated among the upper frequencies of the spectrum, while peripheral consonants are grave, as their formants are situated among the lower frequencies of the spectrum.
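The second-formant ranges quoted above for vowels can be turned into a small lookup. This is only a sketch: the function name and the three sample values are hypothetical, and the ranges are simply those given in the text, so a value falling between them is reported as unclassified and no attempt is made to guess.

```python
def classify_by_second_formant(f2_cps):
    """Label a vowel as acute or grave from its second formant, using the text's ranges."""
    if 1800 <= f2_cps <= 2300:
        return "acute (front vowel range)"
    if 800 <= f2_cps <= 1000:
        return "grave (back vowel range)"
    return "outside the ranges quoted in the text"

# Hypothetical second-formant values, chosen only to exercise each branch.
for f2 in (2100, 900, 1400):
    print(f2, "cps:", classify_by_second_formant(f2))
```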