Sound Frequency Cues. Animals rely on sensitivity to sound frequency cues for communicating and for discriminating the emotion or meaning of vocalizations. The range of audible frequencies varies considerably across animals. The diagram above ranks animals according to the highest sound frequency they can hear. Thus bats hear higher sound frequencies (150,000 Hz, 150 kHz) than dogs (40,000 Hz, 40 kHz), and humans hear only up to 20,000 Hz or 20 kHz. The behavioral sensitivity to frequency is directly related to the frequency sensitivity of the cochlea. The range of frequencies animals hear is correlated with the range of frequencies they generate with vocalizations (echolocation in bats; sonar vocalizations in dolphins; speech in humans). Animals with small heads are not sensitive to low frequency sounds because the low frequencies are not transferred through the very small ossicles of their middle ear. Animals with big heads (like elephants) can hear very low frequencies because the larger ossicles are capable of conducting the low frequency sound. The range of frequency sensitivity is related to the size of the cochlea: animals with a large cochlear distance (base to apex) hear a large frequency range.
Sound Intensity Cues. Two standard metrics are used to describe sound intensity. With one metric, sound pressure changes are expressed in units of pascals or micropascals (µPa; micro, µ = 1/1,000,000). Audible natural and man-made sounds span a range of 20 to 200,000,000 micropascals as illustrated above. To avoid having to write all those zeros it is standard practice to express sound intensities as decibels. To calculate decibels, first divide the measured pressure (Pactual) by a reference pressure (Pref) expressed in the same units; then take the log base ten of that ratio and multiply by 20. The reference is an ideal intensity threshold for hearing, chosen so that sound pressure levels are easily compared across individuals in a group (e.g. land versus marine mammals). For humans, the ideal reference intensity is the intensity threshold for hearing of a human measured when presenting sound through the air. For whales, the ideal reference intensity can be the average intensity threshold for hearing of a whale measured in water. Natural and man-made sounds span a sound pressure level (SPL) range of 0 to 140 decibels (dB) as illustrated on the lower axis in the figure above.
$$\mathrm{SPL} = 20 \log_{10}\left(\frac{P_{\mathrm{actual}}}{P_{\mathrm{ref}}}\right)$$
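As a minimal sketch of the decibel calculation described above, the following Python snippet divides a measured pressure by a reference pressure, takes the log base ten, and multiplies by 20. It assumes pressures are given in micropascals and that the reference is 20 micropascals (the low end of the audible range quoted above); the function name spl_db and the constant name are illustrative choices, not part of the original notes.

```python
import math

# Assumption: reference pressure of 20 micropascals, taken from the low end
# of the audible pressure range quoted in the text (hearing in air).
P_REF_MICROPASCALS = 20.0


def spl_db(p_actual, p_ref=P_REF_MICROPASCALS):
    """Sound pressure level in dB: 20 * log10(p_actual / p_ref).

    Both pressures must be positive and in the same units.
    """
    if p_actual <= 0 or p_ref <= 0:
        raise ValueError("pressures must be positive")
    return 20.0 * math.log10(p_actual / p_ref)


# The two ends of the audible pressure range, in micropascals.
quietest = spl_db(20.0)           # ratio of 1
loudest = spl_db(200_000_000.0)   # ratio of 10,000,000

print(round(quietest, 6), round(loudest, 6))
```

Under these assumptions the two ends of the pressure range map onto the 0 to 140 dB scale, which is why the decibel axis is so much more compact than the pressure axis.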
Human Audibility Curve. Above is a human audibility curve. The standard way to compare the hearing range between animals is to test the audibility of sound frequencies at different sound intensity levels and to generate an audibility curve, which is also called an audiogram. Humans hear a frequency range between 20 and 20,000 Hz (or 0.020 and 20 kHz, respectively). Sound frequencies below 20 Hz and above 20 kHz are inaudible to humans. Humans are most sensitive (lowest intensity thresholds) to sound frequencies around 2,000 Hz (2 kHz). Beyond 20 years of age most people begin to lose their sensitivity to the highest sound frequencies and no longer hear 19 or 20 kHz, as evident in this demonstration of the Mosquito ringtone.
(Mosquito Ringtone)
(Figure Source: Wikipedia)
Variation in Mammalian Hearing Shown in Audiograms. Audiograms and audibility curves can be measured in two ways. First, they can be measured behaviorally by having the animal detect and report sounds that are heard as you vary the sound intensity level (dB) and frequency (Hz). Second, you can measure the auditory brainstem response (ABR) with electrodes placed on the forehead and earlobes. When electrodes are arranged in the standard way, they can measure the electric response of several spiral ganglion sensory fibers and brainstem neurons that fire in unison when the cochlea is activated by a given sound. ABR measures do not require any subjective behavioral response, and they can be done easily in subjects that have a hard time understanding instructions for behavioral tasks, like babies and dolphins. Audiograms differ across animals. Hence what is "ultrasonic" or beyond hearing range to the human is not necessarily "ultrasonic" to small animals such as a mouse. Mice, bats and rats have much smaller heads and ear canals, and these small animals have maximal sensitivity for high frequency sounds (>10,000 Hz or >10 kHz). Cats and dogs have maximum sensitivities that overlap with that of humans but extend into the higher frequency range. Notice that all the audiograms have a minimum where hearing requires the lowest sound levels. This is called the audiogram minimum threshold. We learned earlier that sound frequency sensitivity changes as you move along the cochlear surface from base to apex. The audiogram reflects sensitivities to sound frequency and intensity level along the surface of the cochlea. In other words, it would take higher sound intensities to activate the base and apex of the cochlea than somewhere in the middle.
Acoustic Properties of Sound: Waveforms and Sonograms. The top graph is a sound pressure waveform illustrating how sound pressure level changes over time for a mystery sound, which turns out to be a cat's meow. To listen to the sound you may click here. The bottom graph is the frequency decomposed sonogram, or sound frequency sonogram. In a sonogram the sound pressure level is indicated by a gray color scale where the lowest and highest sound levels in the meow appear light gray and black, respectively. This meow contains sound pressure at frequencies ranging from 500 to 4500 Hz, as evident from the black bands (also called peaks) between the two cyan colored dotted lines. The sound intensity and frequency (y-axis) content of the meow change over time (x-axis). The dark band located at about 2500 Hz (red dotted line) is the highest intensity frequency component throughout most of this "meow" sound file. Since neurons in the primary auditory cortex (AI) respond primarily to one characteristic frequency (CF), information from many neurons in AI must be combined in order to build a neural code of the meow. (An acoustic waveform is a fancy term for a sound, such as the contents of a .wav or .mp3 file.)
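To make the idea of a frequency decomposed sonogram concrete, here is a rough Python sketch that slices a waveform into short overlapping windows and measures how much energy each window contains at each frequency. It is not the method used to produce the figure. The test signal is a hypothetical stand-in (a strong 2500 Hz tone plus a weaker 500 Hz tone), not the actual meow recording, and the function names, the 8000 Hz sample rate and the window sizes are all assumptions chosen for illustration.

```python
import cmath
import math


def tone(freq_hz, n_samples, sample_rate, amplitude=1.0):
    """A pure tone: one sample of a sine wave per time step."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def sonogram(samples, sample_rate, window=256, hop=128):
    """Short-time Fourier magnitudes.

    Returns (freqs_hz, frames), where frames[t][k] is the magnitude of
    frequency freqs_hz[k] in the t-th time window.
    """
    # Hann window tapers each slice so its edges do not create false peaks.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window - 1))
            for i in range(window)]
    n_bins = window // 2 + 1
    freqs = [k * sample_rate / window for k in range(n_bins)]
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        seg = [samples[start + i] * hann[i] for i in range(window)]
        frame = []
        for k in range(n_bins):
            # Direct discrete Fourier transform of one frequency bin.
            acc = sum(seg[i] * cmath.exp(-2j * math.pi * k * i / window)
                      for i in range(window))
            frame.append(abs(acc))
        frames.append(frame)
    return freqs, frames


sample_rate = 8000          # samples per second (assumed)
n_samples = 800             # 0.1 seconds of sound
strong = tone(2500, n_samples, sample_rate, amplitude=1.0)
weak = tone(500, n_samples, sample_rate, amplitude=0.3)
signal = [a + b for a, b in zip(strong, weak)]

freqs, frames = sonogram(signal, sample_rate)

# For each time window, find the frequency with the largest magnitude,
# i.e. the "darkest band" in that column of the sonogram.
dominant = [freqs[max(range(len(f)), key=f.__getitem__)] for f in frames]
print(dominant)
```

Each entry of frames corresponds to one column of a sonogram image, and the list dominant tracks the darkest band over time. A direct transform like this is slow and is written out only for clarity; practical tools use a fast Fourier transform.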
Frequency Cues for Grouping Sounds. The ability to understand or comprehend speech is called speech intelligibility. Speech intelligibility drops when segments of a speech sound file are digitally cut out, as shown in the above sound wave file and corresponding sonogram on the left. The speech becomes very garbled when eight 200 millisecond segments are removed, as in this Sound file with 200 ms gaps. If you replace the gaps with segments of noise the speech sounds better, as shown here: (the 6 kHz noise, 1.5 kHz speech with Gaps). This illustrates how sounds that are actually missing from a speech signal can be filled in perceptually by the brain. This illusory phenomenon, known as the phonemic restoration effect, demonstrates how resistant speech perception is to interruptions in noisy environments. With phonemic restoration there is increased apparent continuity and increased intelligibility of speech. The perceived increase in continuity of the sound when the gaps are filled with noise is called illusory auditory continuity, and it is not restricted to speech. It is perceived in music, environmental sounds, and pure tones and is sometimes also called acoustic restoration. Phonemic restoration can be thought of as a special case of a more general category of auditory continuity illusions. In this demonstration a noise was used to "fill" the missing gaps of speech; however, you can get a similar effect when natural speech and sounds are masked by noise such as coughing, hammering or music. Masking of speech occurs when speech and noise have similar intensity and frequency range. As a rule, the restored speech sounds more continuous if the sound placed in the gap has a center frequency similar to that of the speech. When the center frequencies of the noise masker and the speech sounds are similar there is high masking potential and a high likelihood of hearing continuous speech.
In the example given above, if you shift the center frequency of the noise so that it matches the center frequency of the speech, the perceived continuity of the speech increases and speech intelligibility improves. Example sound file (1500 Hz), which is also illustrated in the figure below. (Sound File and Figure Source: Makio Kashino (2006) Acoustic Soc & Tech)
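The stimulus manipulation behind these demonstrations (cut out 200 ms segments, then fill the gaps with noise) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 1500 Hz tone stands in for the speech recording, one gap is cut every 400 ms (the notes do not say how the eight gaps were spaced), and the filler is plain broadband noise rather than noise centered on 6 kHz or 1.5 kHz. The function names cut_gaps and fill_with_noise are hypothetical.

```python
import math
import random


def cut_gaps(samples, sample_rate, gap_ms=200, period_ms=400):
    """Silence one gap_ms segment at the end of every period_ms of signal.

    Returns the gapped copy and a list of (start, end) sample indices.
    """
    gap = sample_rate * gap_ms // 1000
    period = sample_rate * period_ms // 1000
    out = list(samples)
    gaps = []
    for start in range(period - gap, len(out), period):
        end = min(start + gap, len(out))
        out[start:end] = [0.0] * (end - start)
        gaps.append((start, end))
    return out, gaps


def fill_with_noise(samples, gaps, amplitude=1.0, seed=0):
    """Replace each silent gap with uniform random (broadband) noise."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    out = list(samples)
    for start, end in gaps:
        out[start:end] = [rng.uniform(-amplitude, amplitude)
                          for _ in range(end - start)]
    return out


sample_rate = 8000                      # samples per second (assumed)
n_samples = sample_rate * 3200 // 1000  # 3.2 seconds of sound
# Stand-in for speech: a steady 1500 Hz tone (an assumption, not real speech).
speech_stand_in = [math.sin(2 * math.pi * 1500 * i / sample_rate)
                   for i in range(n_samples)]

gapped, gaps = cut_gaps(speech_stand_in, sample_rate)
filled = fill_with_noise(gapped, gaps)

print(len(gaps), "gaps of", gaps[0][1] - gaps[0][0], "samples each")
```

With these settings the signal contains eight 200 ms gaps, matching the number removed in the demonstration. Matching the noise to the center frequency of the speech, as described above, would additionally require band-pass filtering the noise, which this sketch leaves out.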
Phonemic Restoration With Metallica. Here is another demonstration of phonemic restoration. First listen and try to guess what is said in this sentence with 200 ms Gaps. Then see if you hear more information when the gaps are filled in with Metallica music to create a phonemic restoration. This filling-in process requires high-level cortical brain processing. Speech with 100 ms Gaps. Speech with 100 ms Gaps plus Metallica: Phonemic Restoration.
Conveying Emotion with Unexpected Sound Frequency Cues. Music and vocal communications elicit emotion by including an element of surprise. For example, the musical score above illustrates the musical structure of a notoriously sad popular song, Someone Like You, by British singer-songwriter Adele. This song, like many sad songs, includes unexpected key changes or dissonant notes that create unexpected melodic structures. In the musical score above, an unexpected low-frequency note is slipped in right before the final note, creating a sense of surprise; this melodic structure is called an appoggiatura. Psychologists and neuroscientists find that unexpected tone patterns evoke strong, oftentimes excitatory, neural responses in the auditory sensory pathways of mammals (Malmierca et al., 2012). Adele's song is so notorious for making people cry that there is a skit about it (Saturday Night Live (SNL) skit). (Figure Source: Wall Street Journal)
Conveying Emotion with Dynamic Range of Sound Intensity Cues. Dynamic range is the difference between the lowest and highest sound intensities we can hear. Dynamic range is measured in decibels (dB). The typical dynamic range for a cassette recording is around 60 dB, while compact discs (CDs, digital) can reach a dynamic range of 96 dB. "For years we've tried to recreate the excitement of a live performance by trying to maintain as wide a dynamic range as possible. This has always been difficult with analog recording. We had to keep the softest signals above the noise floor while keeping the loudest signals below the level of distortion. To keep the soft signals from being buried in tape hiss, we had to record with as high a level as possible. To keep our loud signals from distorting, we had to compress the signal which resulted in a restricted dynamic range. As the years went by, many improvements were made in recorder and tape technology. This, along with various types of tape noise reduction systems, helped to improve the dynamic range of our recordings, but it was still limiting.
Then one day we awoke to a new technology, 'digital recording.' Wow, now with a dynamic range of over 90 dB, our recordings could almost rival a live performance. Well, in theory, yes. However, the music industry had other ideas.
"Rather than use this new technology to take advantage of its wide dynamic range, the music industry followed the opposite direction. They decided that louder is better. Suddenly, we found ourselves in a race to see whose CD was the loudest." Hot music like that described below has less dynamic range because it is all loud.
Dynamic range and other complexities such as frequency, pitch, and key signature are important cues that can be used to convey emotion and drama in music. Aristotle spoke of the similarity between our experiences of music and drama. In the Politics, he referred to music as the most "imitative" of the arts: ". . . music produces by its sounds the same effects that nature produces by human character in action. A good poem or a good song arouses in us the same feelings and emotions as do the actions of a man." More recently, German physiologist and physicist Hermann Helmholtz held that music can imitate and express not only overt physical motions but also "the mental conditions which naturally evoke similar emotions, whether of the body or voice . . ." (Optional Reading: Adapted from Roger Bissell).
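Returning to the numbers quoted for recording formats: dynamic range in decibels uses the same 20 times log base ten rule as sound pressure level, applied to the ratio of the loudest to the quietest level. The short Python sketch below shows where a figure of about 96 dB for CDs can come from, assuming that CD audio stores each sample as a 16-bit number (a property of the CD format that is not stated in the notes above). The function name dynamic_range_db is illustrative.

```python
import math


def dynamic_range_db(quietest, loudest):
    """Dynamic range in dB between two positive amplitude levels."""
    if quietest <= 0 or loudest <= 0:
        raise ValueError("levels must be positive")
    return 20.0 * math.log10(loudest / quietest)


# Assumption: 16 bits per sample, so the largest representable amplitude
# is 2**16 times the smallest step.
cd_bits = 16
cd_range = dynamic_range_db(1, 2 ** cd_bits)

print(round(cd_range, 1))  # about 96.3 dB
```

Read the other way, a 60 dB range such as the one quoted for cassettes corresponds to a loudest-to-quietest amplitude ratio of 1000 to 1.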
In this video, The Swan by Nathan Chan, a young cellist explains how he captures emotion in his rendering of the famous cello solo piece called "The Swan". UConn's Library of Music has a link to Naxos where you can search for music. Another good example of the use of dynamic range to convey emotion is Tchaikovsky: Serenade for Strings, Mvmt 3 (NYSA Live Aberystwyth International Music Festival).
(Figure Source, Wall Street Journal)
Advanced Reading, Demos and References
Sound file with 200ms gaps filled with low amplitude noise
Sound file with 200ms gaps filled with high amplitude noise
Demo: speech comprehension when interrupted by a cough
Demo: speech comprehension when interrupted by a cough but the sentence is not completed
Cambiata Illusion
Get High Now Auditory Illusions
(Figure source: Gehr et al., Hear Res. 2000)
(Demonstration of real-time spectrogram)