Emotional Cues in Music, Audibility Curves, Auditory Continuity and Dynamic Range

  • Optional Tutorial on calculating log
  • Wikipedia URL explains sound pressure level and calculating decibels.
  • CDMasteringServices.com, "What happened to dynamic range?"
  • Optional Current Event Reading: Is "Death Magnetic" by Metallica Too Loud?
  • Optional: Isabelle Charrier URL
  • Optional Reading on the impact of music "getting louder", TIMESONLINE

  • Objectives
  • Identify categorical and emotional cues in music and vocal communication
  • Learn how to calculate sound pressure level
  • Learn two important cues used to identify sounds: sound level and sound frequency
  • Learn units for measuring sound level (pascals) and explain why we log transform it into decibels
  • Define appoggiatura
  • Describe how neurons change their responses to appoggiaturas in tone sequences
  • Define dynamic range and give an example of how a compressed dynamic range can result in an overall increased intensity of a sound waveform
  • Describe the perceptual disadvantage of hot music
  • Give examples of how dynamic range can be used to emotionally shape music
  • Describe how to plot Audibility Curves that show hearing thresholds as a function of SPL and Frequency
  • Define auditory continuity, acoustic restoration, phonemic restoration, masking potential
  • Describe how acoustic properties of sound determine the effectiveness of phonemic restoration
  • Understand the difference between sound waveform and sonograms (spectrograms)
  • Describe the relationship between cochlear frequency sensitivity and the audibility curve
  • Explain how cochleotopy allows you to represent the frequency components of a sound separately

  • Sound Frequency Cues. Animals rely on sensitivity to sound frequency cues for communicating and for discriminating the emotion or meaning of vocalizations. The range of frequencies that are audible varies considerably across animals. The diagram above ranks animals according to the highest sound frequency they can hear. Thus bats hear higher sound frequencies (150,000 Hz, or 150 kHz) than dogs (40,000 Hz, or 40 kHz), and humans hear only up to 20,000 Hz (20 kHz). The behavioral sensitivity to frequency is directly related to the frequency sensitivity of the cochlea. The range of frequencies animals hear is correlated with the range of frequencies they generate with vocalizations (echolocation in bats; sonar vocalizations in dolphins; speech in humans). Animals with small heads are not sensitive to low frequency sounds because the low frequencies are not transferred through the very small ossicles of their middle ear. Animals with big heads (like elephants) can hear very low frequencies because their larger ossicles are capable of conducting the low frequency sound. The range of frequency sensitivity is also related to the size of the cochlea: animals with a large cochlear distance (base to apex) hear a large frequency range.

    Sound Intensity Cues. Two standard metrics are used to describe sound intensity. With the first metric, sound pressure changes are expressed in units of pascals or micropascals (micro, µ = 1/1,000,000). Audible natural and man-made sounds span a range of 20 to 200,000,000 µPa as illustrated above. To avoid having to write all those zeros, it is standard practice to express sound intensities as decibels. The first step of calculating a decibel is to divide the actual pressure (Pactual) by a threshold reference pressure (Pref) so that sound pressure levels are easily compared across individuals in a group (e.g. land versus marine mammals). To calculate decibels, first divide the measured pressure (in pascals) by a reference pressure (in pascals); then take the log base ten of that ratio and multiply by 20. For humans, the reference is the intensity threshold for hearing measured when presenting sound through the air. For whales, the reference can be the average intensity threshold for hearing measured in water. Natural and man-made sounds span an SPL range of 0 to 140 decibels (dB) as illustrated on the lower axis in the figure above.

    SPL = 20 log10 ( Pactual / Preference )

    Pactual is the sound pressure (in pascals) of the measured sound.
    Preference is the sound pressure (in pascals) of a universal reference sound.
    The universal reference is the absolute sound level threshold (in pascal units) for human hearing in normal air (20 µPa).
    Preference in rare cases can be the sound level threshold (in pascals) for whale hearing or for human hearing under water!
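    The calculation above can be sketched in a few lines of Python (the function name spl_db and the example pressures are our own illustrations; the 20 µPa reference is the standard threshold for human hearing in air):

    ```python
    import math

    # Reference pressure: absolute threshold of human hearing in air,
    # 20 micropascals (2e-5 Pa).
    P_REF_AIR = 20e-6

    def spl_db(p_actual, p_ref=P_REF_AIR):
        """Convert a sound pressure in pascals to sound pressure level in dB."""
        return 20 * math.log10(p_actual / p_ref)

    # A sound at the human hearing threshold is 0 dB SPL:
    print(spl_db(20e-6))           # → 0.0
    # Multiplying the pressure by 10 adds 20 dB:
    print(round(spl_db(200e-6)))   # → 20
    # Roughly a very loud concert (~2 Pa):
    print(round(spl_db(2.0)))      # → 100
    ```

    Note that the same function works for a marine mammal audiogram simply by passing a different p_ref measured in water.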

    Human Audibility Curve. Above is a human audibility curve (Figure Source: Heffner, 2004). The standard way to compare the hearing range between animals is to test the audibility of sound frequencies at different sound intensity levels and to generate an audibility curve, which is also called an audiogram. The human audiogram indicates that humans hear a frequency range between 20 and 20,000 Hz (or 0.02 and 20 kHz, respectively). Sound frequencies below 20 Hz and above 20 kHz are inaudible to humans. The minimum threshold (i.e. the lowest intensity (dB) threshold) is at 4,000 Hz (4 kHz), indicating that humans are most sensitive to sound frequencies at or near 4,000 Hz. Beyond 20 years of age most people begin to lose their sensitivity to the highest sound frequencies and no longer hear 19 or 20 kHz, as evident in this demonstration of the Mosquito ringtone (Mosquito Ringtone).

    Variation in Mammalian Hearing Shown in Audiograms. Audiograms and audibility curves can be measured in two ways. First, they can be measured behaviorally by having the animal detect and report sounds that are heard as you vary the sound intensity level (dB) and frequency (Hz). Second, you can measure the auditory brainstem response (ABR) with electrodes placed on the forehead and earlobes or near the ear canal. When electrodes are arranged in the standard way, they can measure the electric response of several spiral ganglion sensory fibers and brainstem neurons that fire in unison when the cochlea is activated by a given sound. ABR measures do not require any subjective behavioral response, and they can be done easily in subjects that have a hard time understanding instructions for behavioral tasks, like babies and dolphins. Audiograms differ across animals. Hence what is "ultrasonic" or beyond hearing range to a human is not necessarily "ultrasonic" to small animals such as a mouse. Mice, bats and rats have much smaller heads and ear canals, and these small animals have maximal sensitivity for high frequency sounds (>10,000 Hz or >10 kHz). Cats and dogs have maximum sensitivities that overlap with that of humans but extend into the higher frequency range. Notice that all the audiograms have a minimum where hearing requires the lowest sound levels. This is called the audiogram minimum threshold. We learned earlier that sound frequency sensitivity changes as you move along the cochlear surface from base to apex. The audiogram reflects sensitivities to sound frequency and intensity level along the surface of the cochlea. In other words, it would take higher sound intensities to activate the base and apex of the cochlea than somewhere in the middle.
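    The idea of an audiogram minimum threshold can be expressed as a tiny sketch. The threshold values below are illustrative round numbers, not measured data, but they reproduce the shape described above: thresholds fall toward a minimum near 4 kHz and rise again at the frequency extremes.

    ```python
    # Illustrative human audiogram: frequency (Hz) mapped to the quietest
    # detectable level (dB SPL) at that frequency. Values are made up to
    # show the U-shape, not taken from a measured audiogram.
    human_audiogram = {
        125: 45, 250: 25, 500: 10, 1000: 5,
        2000: 3, 4000: 0, 8000: 15, 16000: 40,
    }

    def audiogram_minimum(audiogram):
        """Return (frequency, threshold) at the audiogram minimum,
        i.e. the frequency the listener is most sensitive to."""
        freq = min(audiogram, key=audiogram.get)
        return freq, audiogram[freq]

    best_freq, best_level = audiogram_minimum(human_audiogram)
    print(best_freq, best_level)  # → 4000 0
    ```

    Swapping in a mouse-like dictionary whose minimum sits above 10 kHz would shift the returned frequency accordingly, mirroring the cross-species comparison in the paragraph above.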

    Acoustic Properties of Sound: Waveforms and Sonograms. The top graph is a sound pressure waveform illustrating how sound pressure level changes over time for a mystery sound. To listen to the sound you may click here. The bottom graph is the frequency-decomposed sound, called a sound frequency sonogram or spectrogram. In a sonogram (spectrogram) the sound pressure level is indicated by a gray color scale, where the lowest and highest sound levels in the meow appear light gray and black, respectively. This meow contains sound pressure at frequencies ranging from 500 to 4500 Hz, as evident from the black bands (also called peaks) between the two cyan colored dotted lines. The sound intensity and frequency (y-axis) content of the meow change over time (x-axis). The dark band located at about 2500 Hz (red dotted line) is the highest intensity frequency component throughout most of this "meow" sound file. Since neurons in the primary auditory cortex respond primarily to one characteristic frequency (CF), information from many neurons in AI must be combined in order to build a neural code of the meow. (Figure and sound file source: Gehr et al., Hear Res. 2000)
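    The waveform-versus-spectrogram distinction can be sketched by frequency-decomposing a synthetic two-component signal. The sample rate, frame sizes, and component frequencies below are illustrative stand-ins, not values from the meow recording; the point is that the short-time Fourier transform turns a single pressure-versus-time trace into a time-by-frequency map whose strongest band marks the dominant component.

    ```python
    import numpy as np

    fs = 16000                       # sample rate (Hz), illustrative
    t = np.arange(0, 0.5, 1 / fs)    # 0.5 s of signal
    # Synthetic two-component sound: energy at 500 Hz and (stronger) 2500 Hz
    x = np.sin(2 * np.pi * 500 * t) + 2 * np.sin(2 * np.pi * 2500 * t)

    # Short-time Fourier transform: slice the waveform into overlapping
    # windowed frames and take the magnitude spectrum of each frame.
    frame, hop = 512, 256
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]
    spec = np.abs(np.fft.rfft(frames, axis=1))   # shape: time x frequency
    freqs = np.fft.rfftfreq(frame, 1 / fs)

    # The strongest band averaged over frames sits at 2500 Hz, the dominant
    # component, just as the darkest band would in a printed sonogram.
    peak_hz = freqs[spec.mean(axis=0).argmax()]
    print(peak_hz)  # → 2500.0
    ```

    Plotting spec with time on the x-axis and freqs on the y-axis (e.g. with matplotlib's imshow) would reproduce the gray-scale sonogram layout described above.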

    Frequency Cues for Grouping Sounds. The ability to understand or comprehend speech is called speech intelligibility. Speech intelligibility drops when segments of a speech sound file are digitally cut out, as shown in the sound wave file above and the corresponding sonogram on the left. The speech becomes very garbled when eight 200 millisecond segments are removed, as in this Sound file with 200 ms gaps. If you replace the gaps with segments of noise, the speech sounds better, as shown here: (the 6 kHz noise, 1.5 kHz speech with Gaps). This illustrates how sounds that are actually missing from the speech signal can be filled in perceptually by the brain. This illusory phenomenon, known as the phonemic restoration effect, demonstrates how resistant speech perception is to interruptions in noisy environments. With phonemic restoration there is increased apparent continuity and increased intelligibility of speech. The perceived increase in continuity of the sound when the gaps are filled with noise is called illusory auditory continuity, and it is not restricted to speech: it is perceived in music, environmental sounds, and pure tones, and is sometimes also called acoustic restoration. Phonemic restoration can thus be thought of as a special case of a more general category of auditory continuity illusions. In this demonstration a noise was used to "fill" the missing gaps in speech, but you can get a similar effect when natural speech and sounds are masked by noise such as coughing, hammering, or music. Masking of speech occurs when the speech and the noise have similar intensity and frequency range. As a rule, the phonemic restoration sounds more continuous if the 'restored' sound placed in the gap has a similar center frequency. When the center frequencies of the noise masker and the speech sounds are similar, there is high masking potential and a high likelihood of hearing continuous speech.
    In the example given above, if you shift the center frequency of the noise so it has the same center frequency as the speech, the perceived continuity of the speech increases and speech intelligibility improves. Example sound file (1500 Hz), which is also illustrated in the figure below. (Sound File and Figure Source: Makio Kashino (2006) Acoustic Soc & Tech)
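    The gap-and-fill manipulation behind these demonstrations can be sketched numerically. A pure tone stands in for speech, and all parameters (sample rate, gap position, noise level) are illustrative assumptions, not the settings used in the Kashino sound files.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    fs = 8000                                   # sample rate (Hz), illustrative
    # 1 s pure tone standing in for a speech signal
    speech = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)

    gap = int(0.2 * fs)                         # one 200 ms gap
    start = 2000                                # gap position, illustrative

    # Deletion: silence in the gap -> the signal sounds interrupted.
    gapped = speech.copy()
    gapped[start:start + gap] = 0.0

    # Restoration demo: fill the same gap with noise at a level comparable
    # to the surrounding signal; this comparable intensity (and, for real
    # speech, comparable center frequency) is what gives the noise high
    # masking potential and licenses the continuity illusion.
    noise = rng.normal(0, speech.std(), gap)
    restored = gapped.copy()
    restored[start:start + gap] = noise

    # The silent gap carries no energy; the noise-filled gap carries energy
    # comparable to the surrounding signal.
    print(gapped[start:start + gap].std(), restored[start:start + gap].std())
    ```

    Writing gapped and restored to audio files and listening to them reproduces, in miniature, the difference between the silent-gap and noise-filled demonstrations linked above.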

    Phonemic Restoration With Metallica. Here is another demonstration of phonemic restoration. First listen and try to guess what is said in this sentence with 200 ms Gaps. Then see if you hear more information when the gaps are filled in with Metallica music to create a phonemic restoration. This filling-in process requires high level cortical brain processing. Speech with 100 ms Gaps
    Speech with 100 ms Gaps plus Metallica: Phonemic Restoration

    Conveying Emotion with Unexpected Sound Frequency Cues. Music and vocal communications elicit emotion by including an element of surprise. For example, the musical score above illustrates the musical structure of a notoriously sad popular song, Someone Like You, by the British singer-songwriter Adele. This song, like many sad songs, includes unexpected key changes or dissonant notes that create unexpected melodic structures. In the musical score above, an unexpected low frequency note is slipped in right before the final note, creating a sense of surprise; this melodic structure is called an appoggiatura. Psychologists and neuroscientists find that unexpected tone patterns evoke strong, oftentimes excitatory, neural responses in the auditory sensory pathways of mammals (Malmierca et al., 2012). Adele's song is so notorious for making people cry that they made an SNL skit about it.
    (Saturday Night Live (SNL) skit)
    (Figure Source: Wall Street Journal)


    Conveying Emotion with Dynamic Range of Sound Intensity Cues. Dynamic range is the difference between the lowest and highest sound intensities we can hear, and it is measured in decibels (dB). The typical dynamic range for a cassette recording is around 60 dB, while compact discs (CDs, digital) can reach a dynamic range of 96 dB. "For years we've tried to recreate the excitement of a live performance by trying to maintain as wide a dynamic range as possible. This has always been difficult with analog recording. We had to keep the softest signals above the noise floor while keeping the loudest signals below the level of distortion. To keep the soft signals from being buried in tape hiss, we had to record with as high a level as possible. To keep our loud signals from distorting, we had to compress the signal, which resulted in a restricted dynamic range. As the years went by, many improvements were made in recorder and tape technology. This, along with various types of tape noise reduction systems, helped to improve the dynamic range of our recordings, but it was still limiting. Then one day we awoke to a new technology, 'digital recording.' Wow, now with a dynamic range of over 90 dB, our recordings could almost rival a live performance. Well, in theory, yes. However, the music industry had other ideas. Rather than use this new technology to take advantage of its wide dynamic range, the music industry followed the opposite direction. They decided that louder is better. Suddenly, we found ourselves in a race to see whose CD was the loudest." Hot music like that described below has less dynamic range because it is all loud.

    The Wall Street Journal ran a front page article picking up on this "hot" topic entitled, "Even Heavy-Metal Fans Complain That Today's Music Is Too Loud!!!" If you find an article in the Wall Street Journal, you know there is big money involved. In the article, Ethan Smith summarizes the business side of why record labels are compressing the dynamic range of music. To summarize, they do it so that songs play loud, carry further from iPods into the environment surrounding the listener, and hopefully sell more songs. The article suggests that Metallica has joined the "hot" music club with their release of "Death Magnetic". This 2008 album has been digitally mastered to reduce the dynamic range and increase the average loudness... and some say to increase the mp3 and CD sales. You can hear the loss of dynamic range by comparing two Metallica songs, "Blackened" versus "End of the Line". The figure above is a plot of the sound pressure (loudness) for these two songs. Note that the sound wave file for "Blackened" (top) has breaks in the loudness where there is no music. The quiet parts of the song contrast with the loud parts, creating a large dynamic range. Dynamic range has always been important for music composition. The newer song, "End of the Line" (bottom), takes out all the "quiet" stretches of music and boosts the overall (average) loudness. This instructional video created by Matt Mayfield demonstrates how compression of music changes the musical experience.
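    The trade-off described above can be sketched numerically: boosting and clipping a waveform raises its average loudness while shrinking its dynamic range. The envelope, boost factor, and dynamic_range_db helper below are all illustrative assumptions, not the actual mastering chain used on any Metallica recording.

    ```python
    import numpy as np

    def dynamic_range_db(x, floor=1e-4):
        """dB difference between the loudest and quietest short-term
        envelope levels of a waveform (a crude dynamic-range estimate)."""
        env = np.abs(x).reshape(-1, 100).max(axis=1)   # peak per 100 samples
        env = np.clip(env, floor, None)
        return 20 * np.log10(env.max() / env.min())

    fs = 8000
    t = np.arange(fs) / fs
    # Alternate loud and quiet passages, like the breaks in "Blackened".
    envelope = np.where((t * 4).astype(int) % 2 == 0, 1.0, 0.05)
    song = envelope * np.sin(2 * np.pi * 440 * t)

    # "Hot" mastering sketch: boost everything, then clip the peaks.
    hot = np.clip(song * 10, -1.0, 1.0)

    # Compression raises the average level but collapses the dynamic range.
    print(dynamic_range_db(song), dynamic_range_db(hot))
    print(np.abs(song).mean(), np.abs(hot).mean())
    ```

    The quiet passages of the original sit about 26 dB below the loud ones; after the boost-and-clip step everything is loud, so the gap collapses even though the average level goes up, which is exactly the "all loud" character of hot music.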

    Dynamic range and other complexities such as frequency, pitch, and key signature are important cues that can be used to convey emotion and drama in music. Aristotle spoke of the similarity between our experiences of music and drama. In the Politics, he referred to music as the most "imitative" of the arts: ". . . music produces by its sounds the same effects that nature produces by human character in action. A good poem or a good song arouses in us the same feelings and emotions as do the actions of a man." More recently, the German physiologist and physicist Hermann Helmholtz held that music can imitate and express not only overt physical emotions but also "the mental conditions which naturally evoke similar emotions, whether of the body or voice . . ." (Optional Reading: Adapted from Roger Bissell). In this videotape, a young cellist, Nathan Chan, explains how he captures emotion in his rendering of the famous cello solo piece "The Swan" (The Swan by Nathan Chan). UConn's Library of Music has a link to Naxos where you can search for music. Another good example of the use of dynamic range to convey emotion is Tchaikovsky: Serenade for Strings, Mvmt 3 (NYSA Live Aberystwyth International Music Festival).

    Advanced Reading, Demos and References
    Sound file with 200 ms gaps filled with low amplitude noise
    Sound file with 200 ms gaps filled with high amplitude noise
    Demo speech comprehension when interrupted by a cough
    Demo speech comprehension when interrupted by a cough but the sentence is not completed.
    Cambiata Illusion
    Get High Now Auditory Illusions
    Figure source, Gehr et al., Hear Res. 2000
    Skidmore Demos
    Montreal Affective Voices download soundfiles
    Shahin and Miller, 2009, Advanced Reading Phonemic Restoration
    Figure and Soundfile Source, Kashino 2006, Advanced Reading Phonemic Restoration
    Advanced Reading Phonemic Restoration
    Birds like the Attenborough Lyrebird rival mammals in the range and complexity of their songs.
    Reference: An Introduction to the Psychology of Hearing 5th ed, Brian C.J. Moore, Elsevier Press.
    Advanced Reading Review, Cochlear Mechanics in Mammals
    Advanced Reading Review, Evolution of low frequency hearing
    Advanced Reading Review, Relationship of spiral turns of cochlea to audible hearing range, West, 1985
    Advanced Reading Review, update on Von Bekesy (Nobel prize winner) cochlear traveling wave
    Advanced Reading Review, Popper lab vertebrate hearing
    Figure Source, Heffner, Anatomical Record Part A, 2004
    Frequency Hearing Range Several Mammals
    Someone Like You (Adele Cover) -Steve and Sam

    Topic Czar: Heather Read