31 May Music Therapy & Trauma: Insights from the Polyvagal Theory
Music is intertwined with emotions, affect regulation, interpersonal social behavior, and other psychological processes that describe basic human experiences in response to environmental, interpersonal, and even intrapersonal challenges. These psychological processes shape our sense of self, contribute to our abilities to form relationships, and determine whether we feel safe in various contexts or with specific people. Although these processes can be objectively observed and subjectively described, they represent a complex interplay between our psychological experience and our physiology.
This chapter will provide a novel insight into the traditions of music as a therapy aiding physical and mental health. Music therapy is more than listening to music or singing or playing a musical instrument. Music therapy involves active interactions among three features: 1) therapist, 2) client, and 3) music. In the following pages, the Polyvagal Theory will be used to present a plausible model to explain how and why music therapy would be helpful in supporting physical health and in enhancing function during compromised states associated with mental and physical illness including the consequences of trauma. The Polyvagal Theory provides a strategy to understand the mechanisms and processes that enable music and music therapy to improve social engagement behaviors and to enhance the regulation of bodily and behavioral state. The theory provides insights that bridge music therapy to the nervous system and health outcomes. The Polyvagal Theory will deconstruct music therapy into two components: 1) the interpersonal relationship between therapist and client, and 2) the acoustic features of music being used in the therapeutic setting.
The Polyvagal Theory: A primer
Our nervous system functions as a sentry by continuously evaluating risk in the environment. Through neural surveillance mechanisms (i.e., neuroception), our brain identifies features of risk or safety (Porges, 2004). Many of the features of risk and safety are not learned, but are hardwired into our nervous system and reflect adaptive strategies associated with our phylogenetic history. The way we react to the specific acoustic frequency bands that constitute music is determined by the same neural circuits that evaluate risk in our environment. For example, low-frequency sounds elicit a sense of danger associated with an approaching predator. Prokofiev, in Peter and the Wolf, exploits this biological feature by conveying the impending and predictable danger with the low-frequency sounds of the kettledrums.
Specific acoustic frequency bands in the environment elicit different emotional experiences, which are paralleled by adaptive physiological states. Each of these physiological states is functionally an adaptive state that influences affect regulation, social engagement behaviors, and our ability to communicate. We experience these states with feelings of safety, danger, or ultimate demise (i.e., life threat). Physiological state is an implicit component of the subjective experience of listening to or producing music. Music not only changes our emotive state, but also elicits changes in our physiology that parallel the feelings of anxiety, fear, panic, and pain. For example, while listening to certain melodies, we relax, slow our heart rate, and smile. However, while listening to other music we may start to imagine danger and visualize marching off to war or protecting our loved ones. The feelings of danger will change our facial expression and increase our heart rate.
As Oliver Sacks discussed in Musicophilia (Sacks, 2007), music appears to be part of the human experience, yet no brain area or circuit has been identified to explain or represent music. This chapter approaches the issue by asking a different question. Rather than seeking specificity in the neural regulation required to process and to express music, the chapter will discuss the convergence and similarity between the neural mechanisms required to process music and the neural mechanisms required to process features of social engagement behaviors and risk in the environment. This convergence between physiological state and music-related emotional experience is neurophysiologically determined and explained by the Polyvagal Theory (Porges, 1995, 1997, 1998, 2001, 2003, 2007). The Polyvagal Theory will be used as an organizing principle to explain how music, and especially music when expressed via music therapy, can recruit the neural mechanisms that integrate facial muscles and visceral state, which in turn promotes restorative affective states and prosocial behavior.
The Polyvagal Theory emerged from the study of the evolution of the vertebrate autonomic nervous system. The theory is based on the functions of a part of our nervous system that automatically regulates several major organs such as the heart, the lungs, and the gut. Since the neural regulation of these organs occurs automatically and often without our awareness, the neural structures regulating these organs are known as the autonomic nervous system. The autonomic nervous system is dynamically regulated by our brain. The regulation is bidirectional, with the brain and its neural sentries continuously monitoring body state and body state dynamically influencing brain function. Moreover, the neural regulation of the autonomic nervous system is linked to the neural regulation of the muscles of the face and head. The muscles of the face and head are involved both in listening to and in producing music. The muscles that we use to signal our emotional state are involved not only in the production of vocal and instrumental music (i.e., via wind instruments), but also in the process necessary to actively listen to music (i.e., the modulation of our middle ear muscles).
The Polyvagal Theory is particularly important in understanding the mechanisms underlying music therapy, which requires within the therapeutic setting both the processing of acoustic stimuli and face-to-face social interactions. Thus, the Polyvagal Theory provides insights into the beneficial effects of music therapy, since it provides an understanding of the neural control of structures involved in the two features of music therapy: 1) social interactions between the client and the therapist, and 2) listening and expressing music.
Historically, the autonomic system was described as having two opposing components, one labeled sympathetic, and the other parasympathetic. This organizational model was used to describe the function of the autonomic nervous system in the late 1800s and the early 1900s. In the 1920s this paired-antagonism model was formalized (Langley, 1921). The paired-antagonism model characterized the function of the autonomic nervous system as a constant battle between the sympathetic nervous system (associated with fight/flight behaviors) and the parasympathetic nervous system (associated with growth, health, and restoration). Because most organs of the body, such as the heart, the lungs and the gut, have innervations from both sympathetic and parasympathetic components, the paired-antagonism model evolved into “balance theories.” Balance theories attempted to link “tonic” imbalances to both physical and mental health. For example, a sympathetic dominance might be related to symptoms of anxiety, hyperactivity, or impulsivity, while a parasympathetic dominance might be related to symptoms of depression or lethargy. In addition to the tonic features of autonomic state, the balance theories were assumed to explain the reactive features of the autonomic nervous system.
The Polyvagal Theory proposes that the autonomic nervous system reacts to real world challenges, not as a balance system, but in a predictable hierarchical manner that parallels, in reverse, the phylogenetic history of the autonomic nervous system in vertebrates. In other words, if we study the evolutionary path through which the autonomic nervous system unfolded in vertebrates (i.e., from ancient jawless fish to bony fish, amphibians, reptiles, and mammals), we learn not only that there is an increase in the growth and complexity of the cortex, but also that there is a change in composition and function of the autonomic nervous system. In mammals, the autonomic nervous system functions as a hierarchical system that parallels phylogenetic states in reverse and not as the balance between sympathetic/parasympathetic systems.
The Polyvagal Theory: The biobehavioral quest for safety, survival, and a painless death
To survive, mammals must distinguish friend from foe, determine when an environment is safe, and communicate with their social unit. These survival-related behaviors limit the extent to which a mammal can be physically approached, whether vocalizations will be understood, and whether coalitions can be established. Moreover, these behavioral strategies, which are used to navigate through the “stress of life,” form the bedrock upon which social behaviors and higher cognitive processes can be developed and expressed. Thus, learning and other expansive mental processes must be structured, manipulated and studied within the context of how the environment fosters or ameliorates stress-related physiological states.
The Polyvagal Theory proposes that the evolution of the mammalian autonomic nervous system provides the neurophysiological substrates for affective processes and stress responses. The theory proposes that physiological state limits the range of adaptive behaviors and psychological experiences. Thus, the evolution of the nervous system determines the range of emotional expression, quality of communication, and the ability to regulate body and behavioral state including the expression and recovery of stress-related responses. Relevant to adaptive social and emotional behaviors, these phylogenetic principles illustrate the emergence of a brain-face-heart circuit and provide a basis, not only for investigating the relation between several features of mental health and autonomic regulation, but also for deconstructing how music and music therapy can support mental and physical health.
The investigation of human phylogenetic history identifies changes in neural regulation that occurred as vertebrates evolved from jawless fish to humans and other mammals. Phylogenetic development resulted in humans having an increased neural control of the heart via the myelinated mammalian vagal system (see above). The evolution of the myelinated vagus was paralleled by an enhanced neural control of the face, larynx, and pharynx. This integrated face-heart system enabled complex facial gestures and vocalizations associated with social communication to influence physiological states. The face-heart system can “cue” others of safety and danger via facial expressions or vocalization, while promoting transitory mobilization by increasing heart rate. Removing the myelinated vagal inhibition from the heart physiologically supports this biobehavioral process of mobilization. These mechanisms provide us with an understanding of how a warm smile simultaneously reflects a calm state and triggers calmness and a sense of safety and benevolence in the observer. In contrast, an angry face reflects a “mobilized” state and triggers a matching defensive state in the observer. Vocalizations, in addition to facial expression, can reflect and trigger bodily states. Similar to the smile described above representing a calm state, melodic patterns of vocalizations, which are not shrill or booming, provide convergent cues to the observer. However, a drop in pitch associated with a booming voice will startle individuals and make them scared, while a high-pitched, shrill voice reflects the anxiety and fear of another.
Three phylogenetically defined autonomic circuits supporting adaptive behaviors
The Polyvagal Theory emphasizes the neurophysiological and neuroanatomical distinction between the two branches of the vagus (i.e., tenth cranial nerve) and proposes that each vagal branch is associated with a different adaptive behavioral and physiological response strategy to cope with stressful events. The theory describes three phylogenetic stages in the development of the mammalian autonomic nervous system. These stages reflect the emergence of three distinct subsystems, which are phylogenetically ordered and behaviorally linked to social engagement, mobilization, and immobilization.
The theory emphasizes the phylogenetic origins of brain structures that regulate social and defensive behaviors. For example, prosocial behaviors cue others that the environment is safe. Safe environments signal the individual to dispense with the hypervigilance required to detect danger and allow this precautionary strategy to be replaced with social interactions that further calm and lead to close proximity and physical contact. The prototypical prosocial behaviors in mammals are related to nursing, reproduction, interactive play, and being able to be calm in the presence of another. In contrast, defensive behaviors could be categorized into two domains: one related to mobilization including fight and flight behaviors and the other related to immobilization and death feigning that might be associated with dissociative psychological states. Thus, if music therapy can trigger the circuits of social engagement, it will not only support affect regulation and social interactions, but also promote health, growth, and restoration.
The Social Engagement System
As mammals evolved from more primitive vertebrates, a new face-heart circuit emerged to detect and to express signals of safety in the environment (e.g., to distinguish and to emit facial expressions and intonation of vocalizations) and to rapidly calm and turn off the defensive systems (i.e., via a myelinated vagal pathway to the heart) to foster proximity and social behavior. This recent neural circuit can be conceptualized as a Social Engagement System. The Social Engagement System involves pathways that travel through several cranial nerves (V, VII, IX, X and XI). These pathways regulate the expression, detection, and subjective experiences of affect and emotion. The Social Engagement System is an integrated system with both a somatomotor component regulating the striated muscles of the face and a visceromotor component regulating the heart and bronchi. The system is capable of dampening physiological arousal and stress reactions (e.g., activation of the sympathetic nervous system and HPA-axis activity). By calming the viscera and regulating facial muscles, this system enables and promotes positive social interactions in safe contexts. Neuroanatomically, this system includes special visceral efferent pathways that regulate the striated muscles of the face and head and the myelinated vagal fibers that regulate the heart and lungs (see Porges, 1998, 2001, 2003). The source nuclei for both the special visceral efferent and myelinated vagal pathways communicate with each other and originate in a similar area of the brainstem.
The Social Engagement System regulates facial muscles including the sphincter muscles around the eyes (e.g., promoting social gaze and emotional expressivity), middle ear muscles (e.g., extracting human voice from background sounds), muscles of mastication (e.g., ingestion), laryngeal and pharyngeal muscles (e.g., sucking, swallowing, vocalizing, breathing) and muscles of head turning and tilting (e.g., social gesture and orientation). Collectively, these muscles act as filters that limit social stimuli (e.g., observing facial features and listening to human voice) and determinants of engagement with the social environment. Interestingly, the neural pathways regulating the orbicularis oculi, a sphincter muscle around the eye involved in expressive displays, also are involved in the dynamic regulation of the stapedius muscle in the middle ear (Djupesland, 1976). Thus, the neural mechanisms for emotional cueing via eye contact are shared with those needed to listen to human voice. As a cluster, difficulties in behaviors associated with the Social Engagement System (e.g., avoidant gaze, nonresponsiveness to human voice, reduced facial affect and vocal prosody, and atypical or lack of head gesture) are common features of individuals with autism, PTSD, and other psychiatric disorders. Thus, astute clinicians infer from facial expressions and vocal prosody difficulties in both social engagement behaviors and physiological state regulation.
Human responses to trauma are devastating and compromise subsequent social behavior and emotion regulation. Understanding the mechanisms underlying the mammalian “hardwired” response to life threat may demystify these debilitating consequences. From this neurophysiological perspective, a variety of clinical features, including severely compromised social behavior and difficulties in emotion regulation, are predictable. An understanding of the mechanisms mediating these atypical behaviors in response to trauma is helpful to the client, the family, and the therapist in developing supportive and restorative contexts and treatments. Functionally, our nervous system is continuously evaluating risk in the environment through an unconscious process of neuroception (see above). Specific features in the environment trigger physiological states associated with feelings of safety, danger, or ultimate demise. The human nervous system evolved efficiently to shift between conditions of safety and danger. We easily adjust and calm following situations requiring fight or flight maneuvers. We use social interactions with appropriate and contingent facial expressions, intonation of our voice (i.e., prosody), and gaze to calm and be calmed. However, in contrast to challenges of danger, reactions to life threat are not easily remediated. Attempts to socially engage a traumatized individual, rather than calming, may result in defensive strategies of rage and anger. Life threat triggers a very ancient neural circuit that severely limits social engagement behaviors and may distort neuroception, resulting in a detection of risk when there is no apparent risk. Thus, treatment of trauma requires a new model distinct from the traditional psychotherapeutic strategies of face-to-face dialog to trigger the calm states associated with the social engagement system.
Music may provide this portal to the social engagement system and avoid the initial face-to-face interactions that may be misinterpreted as threat in a traumatized individual.
As vertebrates evolved from reptiles to mammals, the structures at the end of the mandible (i.e., jaw bone) that define the middle ear bones became detached (Luo, Crompton, & Sun, 2001; Rowe, 1996; Wang, Hu, Meng, & Li, 2001). For humans and other mammals, sound in the environment impinges on the eardrum and is transmitted from the eardrum to the inner ear via the small bones in the middle ear known as ossicles. When the stapedius (innervated via a branch of the facial nerve) and the tensor tympani (innervated via a branch of the trigeminal nerve) muscles contract, the ossicular chain becomes more rigid and dampens the amplitude of the low-frequency acoustic stimulation from the environment reaching the inner ear. This process is similar to tightening the skin on a kettledrum. When the skin is tightened, the pitch of the drum is higher. When the ossicular chain is tightened, similar to the stretched skin, only higher frequencies bouncing against the eardrum are transmitted to the inner ear and to the auditory processing areas of the brain.
The functional impact of the middle ear muscles on the perceived acoustic environment is to markedly attenuate the low-frequency background sounds that dominate most acoustic environments and to facilitate the extraction of high-frequency sounds associated with human voice (and other vocalizations made by mammals). Loud low-frequency sounds functionally mask the soft high-frequency sounds associated with human voice. In humans, the ossicular chain is regulated primarily by the stapedius muscle, and tensing the stapedius prevents this masking effect (Borg & Counter, 1989). In fact, individuals who can voluntarily contract middle ear muscles exhibit an attenuation of approximately 30 dB at frequencies below 500 Hz, while there is no or minimal attenuation at frequencies above 1000 Hz (Kryter, 1985).
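To give a sense of scale for the figures cited above, the following minimal Python sketch converts a decibel attenuation into the fraction of sound-pressure amplitude that remains. The two endpoints (about 30 dB below 500 Hz, none above 1000 Hz) come from the text; the function names and the smooth interpolation between 500 and 1000 Hz are illustrative assumptions, not measured data.

```python
import math


def db_to_amplitude_ratio(attenuation_db: float) -> float:
    """Fraction of sound-pressure amplitude remaining after an attenuation in dB."""
    return 10 ** (-attenuation_db / 20)


def middle_ear_attenuation_db(freq_hz: float,
                              low_cut_hz: float = 500.0,
                              high_cut_hz: float = 1000.0,
                              max_db: float = 30.0) -> float:
    """Hypothetical piecewise model of attenuation from contracted middle ear muscles.

    Below low_cut_hz the full attenuation applies; above high_cut_hz none applies.
    The log-frequency interpolation in between is an assumption for illustration.
    """
    if freq_hz <= low_cut_hz:
        return max_db
    if freq_hz >= high_cut_hz:
        return 0.0
    frac = math.log(high_cut_hz / freq_hz) / math.log(high_cut_hz / low_cut_hz)
    return max_db * frac


if __name__ == "__main__":
    for f in (125, 250, 500, 700, 1000, 2000):
        att = middle_ear_attenuation_db(f)
        print(f"{f:>5} Hz: {att:5.1f} dB attenuation, "
              f"{db_to_amplitude_ratio(att):.3f} of amplitude remains")
```

Under this sketch, a 30 dB attenuation leaves roughly 3 percent of the original sound-pressure amplitude, which illustrates how strongly low-frequency background sound can be reduced while the frequencies of voice pass largely unchanged.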
The evolution of the mammalian middle ear enabled low amplitude, relatively high-frequency airborne sounds (i.e., sounds in the frequency of human voice) to be heard, even when the acoustic environment was dominated by low frequency sounds. Detached middle ear bones were a phylogenetic innovation that enabled mammals to communicate in a frequency band that could not be detected by reptiles. Reptiles can only hear lower frequencies due to a dependence on bone conduction.
Studies have demonstrated that the neural regulation of middle ear muscles, a necessary mechanism to extract the soft sounds of human voice from the loud sounds of low-frequency background noise, is defective in individuals with language delays, learning disabilities and autistic spectrum disorders (Thomas, McMurry, & Pillsbury, 1985). Middle ear infection (i.e., otitis media) may result in a total inability to elicit the “reflexive” contraction of the stapedius muscles (Yagi & Nakatani, 1987). Disorders that influence the neural function of the facial nerve (e.g., Bell’s palsy) not only influence the stapedius reflex (Ardic, Topaloglu, Oncel, Ardic, & Uguz, 1997), but also affect the patient’s ability to discriminate speech (Wormald, Rogers, & Gatehouse, 1995). The observed difficulties that individuals with a variety of physical and mental disorders have in extracting human voice from background sounds may be dependent on the same neural system that regulates facial expression. Thus, deficits in the Social Engagement System would compromise, not only the expression of emotion, but also social awareness and even language development.
The perception of sound is not equal at all frequencies. We hear sounds at low frequencies as if they were softer than they really are. In contrast, we are relatively accurate in estimating the acoustic energy of human voice. This phenomenon was initially reported as the Fletcher-Munson equal loudness contours (Fletcher & Munson, 1933) and illustrated how human perception attenuated the “loudness” of low-frequency sounds. As measurement technologies improved, researchers refined the perceived loudness contours and sound meters were modified to include a scale known as dBA, which adjusted for the perceived differences in loudness as a function of frequency (i.e., the acoustic energy of lower frequencies had to be greatly increased to be perceived at the equivalent loudness of higher frequencies). This contrasts with sound pressure level, which describes the physical energy of the signal and does not apply any perceptually-based weighting to the frequencies that constitute the acoustic stimulation. The perceptual process of hearing low-frequency sounds as softer parallels the antimasking functions of the middle ear muscles (attenuating the sounds at low frequencies).
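The frequency dependence of the dBA scale can be made concrete with a short Python sketch. The formula and constants below are the standard A-weighting curve as commonly published for sound level meters (IEC 61672); they are not taken from this chapter, and the function name is an illustrative choice.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """Approximate A-weighting correction in dB at a given frequency.

    Uses the standard analytic A-weighting curve, normalized so that
    the correction is about 0 dB at 1000 Hz. Negative values mean the
    frequency is counted as quieter than its physical sound pressure level.
    """
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(numerator / denominator) + 2.00


if __name__ == "__main__":
    for f in (50, 100, 250, 500, 1000, 2000, 4000):
        print(f"{f:>5} Hz: {a_weighting_db(f):+6.1f} dB")
```

The weighting is strongly negative at low frequencies and close to zero around 1000 Hz, which mirrors the point made above: low-frequency energy must be much greater to be perceived as equally loud.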
Consistent with the parallel between music and social communication, the same frequency band that characterizes melodies defines, in human voice, the frequency band in which all “information” (i.e., verbal content) is communicated. When this frequency band is weighted to enhance the understanding of voice, it is known as the “index of articulation” (Kryter, 1962) and more recently as the “speech intelligibility index” (ANSI, 1997). These indices emphasize the relative importance of specific frequencies in conveying the information embedded in human speech. In the normal ear, acoustic energy within the primary frequencies of these indices is not attenuated as it passes through the middle ear structures to the inner ear. The frequency band defining the index of articulation is similar to the frequency band that composers have historically selected to express melodies. It is also the frequency band that mothers have used to calm their infants by singing lullabies. Modulation of the acoustic energy within the frequencies of human voice that characterize music, similar to vocal prosody, will recruit and modulate the neural regulation of the middle ear muscles, functionally calm behavioral and physiological state, and promote more spontaneous social engagement behaviors.
Based on the Polyvagal Theory, we are able to deconstruct music therapy into biobehavioral processes that stimulate the Social Engagement System. When the Social Engagement System is stimulated, the client responds both behaviorally and physiologically. First, the observable features of social engagement become more spontaneous and contingent. The face and voice become more expressive. Second, there is a change in physiological state regulation that is expressed in more regulated and calmer behavior. The improved state regulation is mediated by the myelinated vagus, which directly promotes health, growth, and restoration. However, for some clients, especially those who have been traumatized, face-to-face interactions are threatening and do not elicit a neuroception of safety. If this is the case, then the Social Engagement System can potentially be triggered through vocal prosody or music while minimizing direct face-to-face interactions.
Music therapy provides a special portal to reengage the social engagement system that does not require an initial face-to-face interaction. Music can be used to stimulate the social engagement system without requiring face-to-face reciprocity. Since melodic music contains acoustic properties similar to vocal prosody, music may be used to recruit the social engagement system by challenging and modulating the neural regulation of the middle ear muscles. If the social engagement system is effectively recruited, positive facial expressions will emerge, eye gaze will spontaneously be directed at the therapist, and the traumatized individuals will shift to a more calm and positive physiological state.
Stephen Porges – The Polyvagal Theory