Technological bias is about more than audio quality—it’s about the forces that influence whose stories are told and how.
With the large number of women running for President this year, the word “shrill” is enjoying a resurgence in the national vocabulary, following its previous heyday as an insult hurled at Hillary Clinton during the 2016 Presidential campaign. This spike in usage is hardly a revelation; women who speak publicly and challenge authority have long been dismissed as “shrill” or “grating.” What’s less widely understood is how the design of the technology that transmits human voices has shaped this gendered invective since the dawn of the broadcast era: everything from microphones to modes of transmission has been optimized for lower voices.
Starting in the late nineteenth century, women made up the majority of telephone-switchboard operators, but, when the new medium of commercial broadcast radio became popular, in the nineteen-twenties, women’s voices fell out of favor. When station directors were interviewed by Radio Broadcast magazine in 1924, they asserted that women sounded “shrill,” “nasal,” and “distorted” on the radio, and claimed that women’s higher voices created technical problems. The criticism didn’t end with pitch and timbre, however: the personality, authenticity, and sense of humor of female speakers were also questioned. Newspapers and magazines repeatedly referred to women on air as “affected,” “stiff,” “forced,” and “unnatural.” W. W. Rogers, a publicist for KDKA, in Pittsburgh, declared that “a woman speaker is rarely a success, and, if I were a broadcast manager, which I am not, I would permit few women lecturers to appear.” Charles Popenoe, the station manager of both WJZ and WJY, in New York, justified his avoidance of female announcers by offering an unscientific survey claiming that ninety-nine per cent of listeners preferred male to female announcers.
At the time, voice technology was still in flux, and many regulatory decisions proved to have lasting consequences. The proliferation of AM (amplitude-modulated) radio stations in the early nineteen-twenties led to frequent signal interference, and by 1927 Congress decided to intervene by regulating the bandwidth allotted to each station. As a result of both these limitations and advances in telephony research, most broadcasters and equipment manufacturers eventually limited their signals to a range between three hundred and three thousand four hundred hertz (a range known as “voiceband”), which was viewed as the bare minimum amount of frequency information needed to adequately transmit speech. Unfortunately, the researchers and regulators who were deciding on this range primarily took lower voices into account when doing so. In the January 1927 issue of the Bell Laboratories Record, J. C. Steinberg, in a brief titled “Understanding Women,” quips that “man’s traditional inability to understand women may have a basis of fact if one so wishes to interpret certain recent experiments in our Laboratories.” Steinberg’s experiments showed that the voiceband frequencies reduced the intelligibility of female speech by cutting out the higher frequency components necessary for the perception of certain consonants. Steinberg asserted that “nature has so designed woman’s speech that it is always most effective when it is of soft and well modulated tone.” Hinting at the age-old notion that women are too emotional, he wrote that a woman’s raised voice would exceed the limitations of the equipment, thus reducing her clarity on air. He viewed this as a personal and biological failing on women’s part, not a technical one on his.
Experiments by the scientists Harvey Fletcher and Wilden Munson in 1933 showed that the human hearing apparatus is naturally more sensitive to frequencies between a thousand and seven thousand hertz, and that sounds in that range will be perceived as louder than those below a thousand hertz when emitted at the same volume. This sensitivity likely has roots in evolutionary biology; warning calls for many species also sit in this range, and failure to hear them could mean death. For modern listeners, this sensitivity aids in the perception of consonants, which result from short, high-frequency noise bursts that punctuate the more continuous, lower-frequency pitched components that we perceive as vowels. However, for female voices, these noise bursts generally occur between five thousand and seven thousand hertz, whereas, for men, they lie below five thousand hertz. Capping a signal at three thousand four hundred hertz didn’t significantly impact intelligibility for many men, but it certainly did so for most women, because it removed a significant portion of the sonic information critical for consonant identification. This distortion was exacerbated by a common practice prompted by the erroneous belief that women spoke more softly than men: engineers automatically turned up the volume knob when a woman took her place behind the mic. Many elements of female speech already sit in a range to which we are naturally more sensitive, and the improperly tuned equipment made women’s voices sound piercing or harsh.
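The arithmetic behind the voiceband cutoff can be sketched in a few lines of Python. This is a rough illustration, not a model of real speech: it treats each consonant band as evenly spread across its range and the voiceband as a hard cutoff, and it only measures how much of a band falls inside the three-hundred-to-three-thousand-four-hundred-hertz window. The five-to-seven-thousand-hertz band comes from the figures above; the two-thousand-hertz lower edge for lower voices is an assumption made for the sake of the example, as are all of the names in the code.

```python
def overlap_fraction(band, passband):
    """Return the fraction of `band` (low_hz, high_hz) that lies inside `passband`.

    Assumes energy is spread evenly across the band, which real speech is not.
    """
    width = band[1] - band[0]
    if width <= 0:
        raise ValueError("band must have positive width")
    low = max(band[0], passband[0])
    high = min(band[1], passband[1])
    return max(0.0, high - low) / width


# Voiceband limits as given in the text.
VOICEBAND = (300.0, 3400.0)

# Consonant noise bursts for higher voices, as given in the text.
HIGHER_VOICE_BURSTS = (5000.0, 7000.0)

# Assumption: the text only says "below five thousand hertz" for lower voices;
# the 2000 Hz lower edge is a hypothetical value chosen for illustration.
LOWER_VOICE_BURSTS = (2000.0, 5000.0)

if __name__ == "__main__":
    print(overlap_fraction(HIGHER_VOICE_BURSTS, VOICEBAND))
    print(overlap_fraction(LOWER_VOICE_BURSTS, VOICEBAND))
```

Under these assumptions, none of the higher band survives the cutoff, while part of the lower band does. That asymmetry is the one Steinberg's experiments pointed to.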
In a 1928 Scientific American article titled “Why Is a Radio Soprano Unpopular?,” Steinberg admits that “the speech characteristics of women, when changed to electrical impulses, do not blend with the electrical impulses of our present day radio equipment.” While broadcast and voice technologies have significantly progressed during the last century, even the most up-to-date media formats have yet to solve many of the double binds in which those with higher voices may find themselves. The development of frequency-modulation (FM) radio in the nineteen-thirties, which allowed for higher-bandwidth signals and better audio quality, still didn’t significantly improve the lot of the spoken broadcast voice, especially because FM radio didn’t become more popular than AM radio until the nineteen-seventies. Even today, many data-compression algorithms and Bluetooth speakers disproportionately affect high frequencies and consonants, and women’s voices lose definition, sounding thin and tinny.
Consequently, women are still receiving the same advice that they were given in the nineteen-twenties: lower the pitch of your voice, and don’t show too much emotion. By following that advice, women expose themselves to another set of criticisms, which also have a long history: they lack personality, or they sound “forced” and “unnatural.” A 1906 description of switchboard operations notes that “the training of the voice to become soft, low, melodious, and to carry well is the most difficult lesson an operator has to learn.” In June, when a reporter asked the Democratic-debate coach Christine Jahnke how female candidates can avoid being considered shrill, she said she advises them to “very purposefully slow your pace and lower the tone a bit, because that will add meaning or gravitas to whatever it is you’re talking about.” At the first Democratic-primary debate, Kamala Harris’s measured tones as she addressed Joe Biden were widely lauded, whereas Kirsten Gillibrand was labelled as “shrill,” “affected,” and “inauthentic.”
A century of negative commentary on the female voice has had wide-ranging effects: a 1998 study of young Australian women found that the average frequency of female speech dropped twenty-three hertz between 1945 and 1993. Margaret Thatcher famously worked with voice coaches to hone her auditory image, dropping her voice sixty hertz between the nineteen-sixties and the nineteen-eighties. One of the most notable of the many bizarre deceptions in the Theranos saga involved Elizabeth Holmes’s deep voice; when I analyzed recordings of her speaking I found that the disparity between what is likely her real voice and her performative one is around a hundred hertz, which, in that range, is equivalent to nearly half an octave. (Her family denied that she changed her speaking voice.) And, of course, no discussion of lowered female speech would be complete without mentioning the increased prevalence of vocal fry, the gravelly, bottoming-out effect that occurs when a speaker lowers her voice to its most extreme limits. Unfortunately, many studies show that a manipulated voice is easily discernible to listeners, and that it particularly hurts the credibility of female speakers.
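The phrase “in that range” matters because pitch is heard as a ratio, not a difference: the same hundred-hertz gap spans a larger musical interval at a low pitch than at a high one. A minimal Python sketch of the conversion follows, using the standard relation of twelve semitones per doubling of frequency. The starting pitches are hypothetical values chosen for illustration, not measurements of any speaker, and the function name is likewise an assumption.

```python
import math


def semitones_between(low_hz, high_hz):
    """Return the musical interval between two frequencies, in semitones.

    An octave (a doubling of frequency) is twelve semitones, so half an
    octave is six.
    """
    if low_hz <= 0 or high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(high_hz / low_hz)


if __name__ == "__main__":
    gap_hz = 100.0
    # Hypothetical starting pitches, for illustration only.
    for base_hz in (150.0, 200.0, 250.0):
        interval = semitones_between(base_hz, base_hz + gap_hz)
        print(f"{base_hz:.0f} Hz + {gap_hz:.0f} Hz spans {interval:.1f} semitones")
```

The lower the starting pitch, the more semitones a hundred hertz covers, which is why a figure in hertz says little about how large a change sounds unless the range is also stated.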
As the stakes now rise for the women on the campaign trail, they continue to face an unseen opponent in the form of the technology carrying their voices to voters. While old habits will probably continue to die hard, some of these biases can be counteracted with more conscientious audio engineering (which is admittedly difficult when negotiating the intense schedules, short staffing, and small budgets of the campaign trail). But technological bias is about more than audio quality—it’s about the forces that influence whose stories are told and how. In the end, the word “shrill” is not about the off-putting volume, pitch, or timbre of a woman’s voice—it’s an attempt to silence a voice. Whether as engineers, politicians, journalists, or artists, we must examine our own biases and work toward ensuring that everyone has an equal voice in our democracy.