Hearing voices is the quintessential sign of mental illness, especially schizophrenia, but Auditory Verbal Hallucinations (AVH) also show up in mania, psychotic depression, delirium, dementia, substance use, and even otherwise healthy people. Most individuals describe “hearing” external voices, which can be single or multiple, familiar or strange, male or female. The voices can comment on events, converse with each other, insult the hearer, or tell them to do things. Hallucinations don’t even have to be voices, and might instead be animal or environmental sounds. But classically they consist of recognizable speech from outside the self.
As you’ve already guessed, we don’t understand how this happens. We can’t even agree whether voices are perceptions or some other mental process. We have several theories, which I will discuss at a high level, but none seems able to fully explain the phenomenon.
Intrusive Memories
Auditory hallucinations could arise from erroneous activation of partial memories. We have a constant stream of suppressed mental activity. When inhibition fails, these thoughts pop up into awareness. To explain their alien quality, there must also be impairment in how the brain contextualizes memories. When both “hits” occur, the result is a flow of aberrant memories, such as speech fragments, that arise seemingly from nowhere.
This model purports to explain both verbal and non-verbal auditory hallucinations, identifiable voices, their personal and emotional quality, and the sense of external origin. The problem is that AVH often have nothing to do with the person’s past life, and that people can converse with their voices. Both are tough to explain as memory fragments.
Is it Me?
AVH could be a distorted inner monologue that’s mistaken for external speech. Auditory hallucinations are mostly people talking, so it’s a good bet that language regions of the brain are involved. First, talking to yourself generates “sub-vocalizations” in the larynx, which are detectable in hallucinating people. Second, imaging studies in people with AVH regularly find abnormalities in brain areas that control speech. For instance, studies have found activity in the speech-generating Broca’s area for auditory but not visual hallucinations. Others find changes only in receptive language areas, which points to inner hearing rather than inner speech.
This theory is attractive, but more so for thought insertion than AVH. It doesn’t seem to explain aspects like voices being perceived as specific people, the presence of multiple voices, or their often-derogatory tone. Why would self-generated inner speech consist of your mom saying that you’re horrible? Also, people with schizophrenia retain an inner monologue, so why do only some thoughts feel external?
Auditory Noise and Aberrant Salience
Auditory hallucinations are “heard,” implying a perceptual quality, and just as we see faces in neutral patterns, humans are prone to cast random sounds into voices. The idea here is that altered auditory processing leads to spurious noise being treated as signal, granted excess salience, then combined with memory representations and other top-down expectations.
This is predictive coding, a major paradigm in neuroscience. It treats the brain as an information processor that constantly compares bottom-up information (sensory input) with top-down beliefs. When they match up you don’t pay attention, but a discrepancy flags something to check out. Hallucinations would require both defective sensory processing and biased expectations, so that inaccurate perceptions don’t trigger an error signal and thus seem reliable.
This model gets points for using the terms predictive coding and salience, a must for any self-respecting neuroscience theory. Also, some studies find that more vivid hallucinations correspond to more sensory cortex activity. However, it’s hard to believe that AVH are just perceptions. If true, why do psychotic people mostly report auditory hallucinations, rarely visual, and hardly ever olfactory or tactile? Doesn’t predictive coding apply there too? And why are they so often voices instead of dogs barking, music, or cars revving?
It’s a big leap from defective auditory processing to specific, articulate voices having a conversation with each other. As usual with predictive coding, “top-down expectations” does a lot of work here!
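To make the two-hit logic of predictive coding concrete, here is a toy sketch, not a model anyone has fit to data. It assumes that both the top-down expectation and the bottom-up input can be written as Gaussian beliefs about how voice-like a sound is, and that the brain combines them weighted by precision (inverse variance). The names `Belief`, `perceive`, and `error_signal`, and all the numbers, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    """Gaussian belief about how voice-like a sound is (0 = noise, 1 = clear voice)."""
    mean: float
    variance: float

    @property
    def precision(self) -> float:
        # Precision (inverse variance) stands in for how much a signal is trusted.
        return 1.0 / self.variance


def perceive(prior: Belief, sensory: Belief) -> Belief:
    """Combine top-down expectation and bottom-up input, weighted by precision."""
    total = prior.precision + sensory.precision
    mean = (prior.precision * prior.mean + sensory.precision * sensory.mean) / total
    return Belief(mean, 1.0 / total)


def error_signal(prior: Belief, sensory: Belief) -> float:
    """Precision-weighted prediction error: the mismatch, scaled by trust in the senses."""
    return sensory.precision * (sensory.mean - prior.mean)


if __name__ == "__main__":
    quiet_room = 0.0  # hypothetical input: no voice is actually present
    scenarios = {
        "typical": (Belief(0.0, 0.5), Belief(quiet_room, 0.1)),
        "biased expectation only": (Belief(1.0, 0.5), Belief(quiet_room, 0.1)),
        "biased expectation + noisy input": (Belief(1.0, 0.5), Belief(quiet_room, 5.0)),
    }
    for name, (prior, sensory) in scenarios.items():
        percept = perceive(prior, sensory)
        err = error_signal(prior, sensory)
        print(f"{name}: percept={percept.mean:.2f}, error={err:.2f}")
```

In this sketch a biased expectation alone produces a large error signal and the percept stays close to "no voice." Only when the sensory input is also unreliable does the percept drift toward "voice" while the error signal shrinks. Note how much of the result depends on the numbers chosen for the prior, which is the criticism above in miniature.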
Feed-Forward Model
So perhaps a slightly different view would help. The feed-forward model posits a brain self-monitoring system. When the brain generates a motor command, it also sends an “efference copy” to sensory areas. This acts as a self-tag and comparator: the intended and perceived motion should cancel out, so to speak. If action occurs but the duplicate signal is defective or absent, the movement doesn’t feel internally generated. The self-tag is missing.
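The comparator idea can be sketched in a few lines. This is a cartoon under stated assumptions: signals are single numbers, the efference copy is a scaled duplicate of the motor command, and anything left over after subtraction beyond a tolerance is attributed to the outside world. The function name `attribute_source` and the parameters `copy_gain` and `tolerance` are hypothetical, chosen only for illustration.

```python
def attribute_source(motor_command: float, sensed: float,
                     copy_gain: float = 1.0, tolerance: float = 0.2) -> str:
    """Label a sensation "self" or "external" using an efference-copy comparator.

    copy_gain is a hypothetical knob: 1.0 is an intact efference copy,
    0.0 means the copy never arrives.
    """
    predicted = copy_gain * motor_command  # expected sensory consequence of one's own action
    residual = sensed - predicted          # what remains after the prediction is cancelled
    return "self" if abs(residual) <= tolerance else "external"


if __name__ == "__main__":
    inner_speech = 1.0  # arbitrary strength of a self-generated speech command
    print(attribute_source(inner_speech, sensed=1.0))                 # intact copy
    print(attribute_source(inner_speech, sensed=1.0, copy_gain=0.0))  # missing copy
    print(attribute_source(0.0, sensed=1.0))                          # real outside sound
```

With an intact copy the self-generated signal cancels and is tagged "self." With the copy missing, the same signal is indistinguishable from a real outside sound, and both are tagged "external."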
Feed-forward is an elegant explanation for how voices seem to arise outside the self, and an even better model for functional neurological disorders. Speech is a type of action, so it fits with the Inner Speech concept. On the other hand, it has the same problems: why do the voices sound like other people, and why do only some thoughts and actions lack the self-tag? Finally, it’s not clear how this model fits with overactive dopamine in mesolimbic circuits.
Conclusion
This is all unsatisfying. Predictive coding¹ is the very model of a modern neuroscience theory, but it always seems to prove too much. Can’t everything wrong in the brain be chalked up to either erroneous top-down or bottom-up signals? With the primacy of recognizable speech in AVH - it’s hearing voices, not hearing leaf-rustling - brain language areas must be key. But going from inner speech to multiple identifiable voices is also a stretch, and here we are again.
Maybe classic Auditory Verbal Hallucinations need three or four “hits” to manifest: abnormal speech production, abnormal speech reception, abnormal general auditory processing, and biased expectations about the world.
Maybe hearing voices is not one thing but multiple similar things, like lightheadedness and dizziness and weakness, so no single model can explain it all. Or maybe it’s one thing with multiple causes, like atrial fibrillation (antipsychotics are the beta-blockers in this analogy).
Maybe we have unreliable data. We discount most of what psychotic people say; should we discount their descriptions of voices too? If top-down expectations are so important, then cultural memes around psychosis will prime people to experience and interpret hallucinations in certain ways, obscuring the underlying pathophysiology.
Maybe we can’t understand auditory hallucinations without a broader model for psychosis, distinct from the disease concept of schizophrenia, just as we distinguish the physiologic state of heart failure from various heart conditions.
References
Waters et al (2006). Auditory hallucinations in schizophrenia: Intrusive thoughts and forgotten memories. Cognitive Neuropsychiatry, 11(1), 65–83. https://doi.org/10.1080/13546800444000191
Jones SR, Fernyhough C. Thought as action: inner speech, self-monitoring, and auditory verbal hallucinations. Conscious Cogn. 2007 Jun;16(2):391-9. doi: 10.1016/j.concog.2005.12.003. Epub 2006 Feb 7.
Zmigrod L, Garrison JR, Carr J, Simons JS. The neural mechanisms of hallucinations: A quantitative meta-analysis of neuroimaging studies. Neurosci Biobehav Rev. 2016 Oct;69:113-23. doi: 10.1016/j.neubiorev.2016.05.037. Epub 2016 Jul 26. PMID: 27473935.
Waters et al. Auditory hallucinations in schizophrenia and nonschizophrenia populations: a review and integrated model of cognitive mechanisms. Schizophr Bull. 2012 Jun;38(4):683-93. doi: 10.1093/schbul/sbs045.
Hugdahl K. "Hearing voices": auditory hallucinations as failure of top-down control of bottom-up perceptual processes. Scand J Psychol. 2009 Dec;50(6):553-60. doi: 10.1111/j.1467-9450.2009.00775.x
Tracy DK, Shergill SS. Mechanisms Underlying Auditory Hallucinations-Understanding Perception without Stimulus. Brain Sci. 2013 Apr 26;3(2):642-69. doi: 10.3390/brainsci3020642.
Upthegrove, R., Broome, M. R., Caldwell, K., Ives, J., Oyebode, F., & Wood, S. J. (2016). Understanding auditory verbal hallucinations: A systematic review of current evidence. Acta Psychiatrica Scandinavica, 133(5), 352–367. https://doi.org/10.1111/acps.12531
Thakkar KN, Mathalon DH, Ford JM. Reconciling competing mechanisms posited to underlie auditory verbal hallucinations. Philos Trans R Soc Lond B Biol Sci. 2021 Feb;376(1817):20190702. doi: 10.1098/rstb.2019.0702.
McCleery A, Wynn JK, Mathalon DH, Roach BJ, Green MF. Hallucinations, neuroplasticity, and prediction errors in schizophrenia. Scand J Psychol. 2018 Feb;59(1):41-48. doi: 10.1111/sjop.12413.
Brain Language Regions Image from Ravi, Prakash, Harish & Korostenskaja, Milena & Castillo, Eduardo & Lee, Ki Hyeong & Baumgartner, James. (2017). Automatic Response Assessment in Regions of Language Cortex in Epilepsy Patients Using ECoG-based Functional Mapping and Machine Learning. 10.48550/arXiv.1706.01380.
¹ Can we fit the feed-forward model into predictive processing? The tension is whether hallucinations result more from aberrant bottom-up noise or from overly strong top-down expectations about the world. For instance, psychotic delusions are neatly explained as your brain seeing patterns that aren’t there - fitting reality to match expectations. So, perhaps when the motor efference copy is missing, the brain weights self-related information less. This mechanically shifts the balance toward prior expectations. Thus weak and strong priors can co-exist.