What Would You Do?

The journalism that tweaks reality, then reports what happens

On a Friday morning last January, a group of Washington, D.C., commuters played an unwitting role in an experiment. As they emerged from the L’Enfant Plaza metro station, they passed a man playing a violin. Dressed in a long-sleeved T-shirt, baseball cap, and jeans, an open case for donations at his feet, he looked like an ordinary busker. In reality, he was Joshua Bell, an internationally renowned musician. The idea was to gauge whether Bell’s virtuosic playing would entice the rushing commuters to stop and listen.

The experiment’s mastermind was Washington Post staff writer Gene Weingarten, who had dreamed it up after seeing a talented keyboardist be completely ignored as he played outside another metro station. “I bet Yo-Yo Ma himself, if he were in disguise, couldn’t get through to these deadheads,” Weingarten says he thought at the time. Ma wasn’t available to test the hypothesis, but Bell was.

For three-quarters of an hour, Bell played six pieces, including some of the most difficult and celebrated in the classical canon. Of 1,097 passersby, twenty-seven made donations totaling just over $30. Seven stopped for more than a minute. The remaining 1,070 breezed by, barely aware of the supremely talented violinist in their midst.

When Weingarten’s account of the experiment ran in the Post’s magazine three months later, readers followed the narrative with rapt attention that contrasted starkly with the indifference of the commuters. The article was discussed on blogs and other forums devoted to classical music, pop culture, politics, and social science. Weingarten said he received more feedback from readers than he had for any other article he had written in his thirty-five-year career. Many were taken with the chutzpah of disguising Joshua Bell as a mendicant just to see what would happen. Others were shocked that people could ignore a world-class musician. Still others argued that the results were insignificant: rerun the experiment outdoors on a sunny day, they said, and Bell would draw a massive crowd.

I was one of those rapt readers, but I wasn’t quite sure what to make of the piece’s appeal. Was it just a clever gimmick or was there something more profound going on? At the same time, the story felt familiar. Indeed, Weingarten’s experiment was a recent entry in a journalistic genre with deep, quirky roots.

Working on a hunch that begs to be tested or simply struck with an idea for a good story, journalistic “experimenters,” for lack of a better term, step out of their customary role as observers and play with reality to see what will happen. At their worst, these experiments are little more than variations on reality-TV operations that traffic in voyeurism and shame. At their best, they manage to deliver discussion-worthy insights into contemporary society and human nature. The very best, perhaps, serve up a bit of both. In any case, the growing number of journalists and news operations who do this sort of thing are heirs to a brand of social psychology practiced from the postwar years through the early seventies. During this period, considered by some the golden age of the discipline, experiments were bold and elaborately designed and frequently produced startling results. Many were conducted outside the laboratory and often placed subjects in stressful or disturbing situations.

These experiments also have roots in forms of investigative, immersion, and stunt journalism that have been practiced for more than a century. In 1887, while working on an exposé of asylum conditions, muckraker Nellie Bly demonstrated that one could feign insanity to gain admission to a madhouse—and when she began to insist that she was in fact perfectly sane, doctors interpreted her claims as delusions. In so doing, Bly anticipated psychologist David Rosenhan’s classic 1972 experiment in which “pseudopatients” claiming to hear voices were admitted to psychiatric hospitals and then kept for an average of several weeks despite reverting to sane behavior.

It’s difficult to pinpoint when the genre shifted, but by 1974, when New York City’s WNBC-TV asked its viewers to call in and pick the perpetrator of a staged purse snatching from a lineup of suspects, the journalistic experiment had attained its modern form. The station was flooded with calls and, after fielding over 2,100, cut the experiment short. The results: respondents picked the correct assailant no more frequently than they would have by guessing.

Over the last decade, as best-sellers such as The Tipping Point and Freakonomics have lent social science a sheen of counterintuitive hipness and reality television has tapped into a cultural fascination with how people behave in contrived situations, journalistic experimentation has become increasingly common. In addition to The Washington Post Magazine, it has been featured in The New York Times, Harper’s, and Reader’s Digest. Its most regular home, however, has been on network-television newsmagazines.

ABC’s Primetime has staged a series of experiments in recent years under the rubric “What Would You Do?” which enact provocative scenarios while hidden cameras capture the reactions of the public. Chris Whipple, the producer who conceived the series, refers to it as a “Candid Camera of ethics.” Starting with a nanny verbally abusing a child, the series has gone on to present similar scenarios: an eldercare attendant ruthlessly mocking an old man; a group of adolescents bullying a chubby kid; a man viciously berating his girlfriend, seeming on the verge of violence; etc.

The sequences tend to begin with the narrator pointing out that many pass right by the incident. Several witnesses are confronted and asked to explain why they didn’t step in. One man, who gave the fighting couple a long look before continuing on his way, reveals that he is an off-duty cop and says he determined that no laws were being broken, so there was nothing for him to do. The focus shifts to those who did intervene, and the camera lingers over the confrontations, playing up the drama.

These experiments are, in a sense, the flip side of the reality-TV coin: rather than show how people act in manufactured situations when they know they’re being watched, they show us how people act when they don’t. And the experiments have clearly appealed to viewers. From the first minutes of its first hour, when its ratings doubled those of the previous week, “What Would You Do?” has been a success. After the series appeared periodically in 2005 and 2006, ABC ordered five new hours that were scheduled to air last November before the writers’ strike put them on hold. It is, Whipple says, highly “watchable” television.

In the world of print, Reader’s Digest has come closest to making such experiments a franchise. Over the last two years, the magazine has pitted cities around the world against each other in tests of helpfulness and courtesy, to determine which city is most hospitable. The first round used the following three gauges to separate the rude from the solicitous in thirty-five cities: the percentage of people who picked up papers dropped by an experimenter; the percentage who held the door for experimenters when entering buildings; and the percentage of clerks who said “Thank you” after a sale. When the scores were tallied, it was clear that Reader’s Digest had hit the counterintuition jackpot: the winner was New York City. According to Simon Hemelryk, an editor with the UK edition of Reader’s Digest who came up with the idea for the tests, the press response was “totally, totally mad.” Hundreds of media outlets picked up the story. David Letterman presented a tongue-in-cheek, top-ten list of the “Signs New York City Is Becoming More Polite.”

The notion that New Yorkers are more polite than commonly believed was also at the center of a 2004 experiment conducted by The New York Times. Reenacting an experiment originally performed by graduate students of social psychologist Stanley Milgram at the City University of New York in the early seventies, two Times reporters asked riders on crowded subway cars to relinquish their seats. Remarkably, thirteen of fifteen did so. But the reporters found that crossing the unspoken social boundaries of the subway came at a cost: once seated, they grew tense, unable to make eye contact with their fellow passengers. Jennifer Medina, one of the reporters, says that she and Anthony Ramirez, her partner on the story, found the assignment ludicrous at first. “It was like, ‘What? Really? You want me to do what?’” she says. “We made so much fun of it while we were doing it, but we got so much feedback. It was one of those stories that people really talked about.” And papers around the world took notice: within weeks, reporters in London, Glasgow, Dublin, and Melbourne had repeated the experiment.

In these journalistic experiments, the prank always lurks just beneath the surface and is clearly part of the genre’s appeal. During ABC Primetime’s experiments, there always comes the moment when host John Quiñones enters and, with a soothing voice and congenial smile, ends the ruse. These people are actors. You have been part of an experiment. And in that moment, no matter how serious the scenario, there is always the hint of a practical joke revealed, a touch of “Smile, you’re on Candid Camera!”

Sometimes the experiment is overwhelmed by the prank. Last year, Radar Magazine sent a reporter to snort confectioner’s sugar in various New York City locales. The idea was to test anecdotal evidence from a New York Times article that cocaine use was growing more publicly acceptable. (The results: public snorting was actively discouraged at the New York Public Library’s main reading room, but not at a Starbucks or Vanity Fair editor Graydon Carter’s Waverly Inn.) Carter’s own Spy Magazine pulled a classic prank/experiment in the late eighties when it sent checks of dwindling value to moguls in an attempt to determine who was the cheapest millionaire. (Donald Trump reportedly cashed one for just thirteen cents.) Even Borat was, in a sense, an extended experiment in the extremes to which a Kazakh “journalist” could push pliant Americans, and was anticipated by one of Primetime’s “What Would You Do?” episodes in which a taxi driver goes off on racist or homophobic rants, baiting riders either to defy him or join in.

If Medina, the Times reporter, was made uneasy by the whiff of “stunt” in the subway experiment, she is not the only one. Even Weingarten, whose Joshua Bell experiment was a monumental success, looks at the genre slightly askance. Asked whether he plans to conduct similar experiments in the future, he replies: “If I can think of one this good, there’s no reason I’d quail at it. But, you know, you also don’t want to go off and be the stunt writer. I would need to feel as though the next thing I’m doing was of equal sociological importance. And this wasn’t just a lark. We had something we wanted to examine, and it was the nature of the perception of beauty.”

The appeal of the best journalistic experiments, indeed, runs much deeper than their entertainment value. Medina came to see her role in the subway experiment as that of a “street anthropologist or something, which is essentially what [reporters] are supposed to be doing every day.” And Weingarten received over one hundred messages from people who said that his piece on the Bell experiment made them cry. (One testimonial from an online chat Weingarten had with readers: “I cried because I find it scary and depressing to think of how obliviously most people go through daily life, even smart and otherwise attentive people. Who knows what beautiful things I’ve missed by just hurrying along lost in my thoughts?”) In essence, many readers imagined themselves as actors in the story. Weingarten set out to chronicle an experiment; he ended up writing a deeply effective profile of his own readers. “What Would You Do?” asks Primetime—and that, on some level, is the question that all such journalistic experiments ask. Would you walk by the famous violinist? Would you give up your seat on the subway? Would you protect a woman from an abusive boyfriend?

In that quirky, postwar “golden age” of the discipline that informs today’s journalistic experimenters, researchers captured the public imagination with bold, elaborately choreographed experiments that frequently drove subjects to extreme behavior or confronted them with seemingly life-or-death situations.

Stanley Milgram, the designer of the subway-seat experiment, was one of the most creative social psychologists of that era. His infamous obedience experiment, first performed in 1961, in which subjects were instructed to shock a man in a separate room every time he gave an incorrect answer on a memory test, showed that normal people were capable of great cruelty. Sixty-five percent of the subjects went to the maximum—450 volts—despite the test-taker’s cries of pain and pleas to be released due to a heart condition. By the end, the test-taker no longer responded at all, having presumably passed out or died. (In reality, the test-taker was an actor and his protests tape-recorded.) Even more unsettling was Stanford professor Philip Zimbardo’s 1971 prison experiment, in which college students randomly assigned to play the role of guards in a mock prison terrorized those playing inmates. Slated to run for two weeks, it was terminated after six days, during which several “prisoners” came close to nervous breakdown.

Given the dramatic nature of these experiments, it’s little wonder they’ve provided such inspiration to journalists. Bill Wasik, an editor at Harper’s, started the flash mobs trend in 2003 as an homage to Milgram, whom he considers as much performance artist as scientist. Flash mobs were spontaneous gatherings in which participants showed up at a given location for a brief period and did something absurd, such as drop to their knees en masse before a giant Tyrannosaurus Rex at Toys “R” Us. In a piece published in Harper’s, Wasik explained that he saw the mobs as a Milgram-esque test of hipster conformity. Like a hot new indie band, he hypothesized, the mobs would rapidly gain popularity before being discarded as too mainstream and, ultimately, co-opted by marketers, which is more or less what happened.

Wasik argues that the popular resonance of experiments by Milgram and others of the golden age derives from the compelling narratives they created. “It’s like a demonstration whose value is more in the extremes that you can push people to and the extremes of the story that you can get out of what people do or don’t do,” he says. “Milgram could have done an authority experiment in which he got people to do all sorts of strange things that didn’t seem to be simulating the death of the participant.” Many contemporary social psychologists credit researchers from this fertile era with cleverly demonstrating how frequently human behavior defies expectations. But others, such as Joachim Krueger of Brown University, argue that the experiments were designed in ways that guaranteed unflattering results. “You could call it a ‘gotcha psychology,’” he says.

Due in part to the rise of ethical concerns, contemporary social psychologists rarely do experiments that take place outside the laboratory or that involve deception or stressful situations. This has left journalistic experimenters as a sort of lost tribe of devotees of the golden-age social psychologists. Unlike investigative journalism, these experiments have largely flown under the ethical radar. This may be because, while some journalistic experiments are frivolous, they are on balance innocuous. However, as experimenters increasingly tackle sensitive topics, they have begun to draw some heat. In 2006, conservative bloggers accused Dateline of trying to manufacture a racist incident by bringing a group of Arab-looking men to a NASCAR race. And, last November, these same bloggers ripped an experiment by Primetime in which same-sex couples engaged in public displays of affection in Birmingham, Alabama, for attempting to provoke homophobic reactions. (As of press time, the same-sex segment had not yet aired, but according to the Fox affiliate in Birmingham, which broke the story, Birmingham police received several complaints from people disgusted by the sight of two men kissing in public.)

But what of the oft-cited “rule” that journalists should report the news rather than make it? Michael Kinsley, who conducted a 1985 experiment while at The New Republic to determine whether the Washington, D.C., elite actually read the books they act as though they have read, rejects the premise. “If you’ve got no other way to get a good story,” he says, “and you’re not being dishonest in what you write and publish, what’s wrong with it?” Kinsley’s experiment involved slipping notes deep into fashionable political books at several D.C. bookstores, offering $5 to anyone who called an intern at the magazine. In five months, not a single person claimed the reward.

Journalistic experiments have been criticized far more consistently for their scientific, rather than ethical, shortcomings. Robert Cialdini, an Arizona State University social psychologist, believes strongly in the value of communicating psychological insights via the media, but he has found that journalists don’t always value the same material that he does. For a 1997 Dateline segment on conformity, he conducted an experiment showing that the number of people who donated to a New York City subway musician multiplied eightfold when others donated before them. A fascinating result, but even more fascinating to Cialdini was that people explained their donations by saying that they liked the song, they had some spare change, or they felt sorry for the musician. These explanations did not end up in the finished program. “To me, that was the most interesting thing, the fact that people are susceptible to these social cues but don’t recognize it,” says Cialdini. “I think that’s my bone to pick with journalists—they’re frequently interested in the phenomenon rather than the cause of the phenomenon.”

Others are frustrated by the premium journalists place on appealing to a mass audience. Duncan Watts, a Columbia University sociologist, designed an experiment for Primetime to test Milgram’s small-world theory—commonly known as “six degrees of separation”—that people divided by great social or geographical distance are actually connected by a relatively small number of links. In the experiment, two white Manhattan residents competed to connect with a black boxer from the Bedford-Stuyvesant neighborhood of Brooklyn using the fewest links, then the boxer had to connect with a Broadway dancer. All three connections were made using at most six links. Watts says that after the segment aired in late 2006, he received an e-mail from its producer, Thomas Berman, saying that its ratings had been poor. (An ABC spokeswoman insists that the network was satisfied with the ratings.) “One of the limitations of this model is that it’s crowd-driven, it’s about entertainment,” says Watts. “It’s a bit of a Faustian bargain.”

Another quibble that some social psychologists have with these journalistic experiments is the use of the word “experiment” to describe them in the first place. To a dyed-in-the-wool researcher, an experiment involves comparing a control group with an experimental one, in which a single condition has been varied so that any changes in the outcome can be clearly attributed. Practically no journalistic “experiment” meets this standard, but many golden-age experiments didn’t either, strictly speaking. In addition, practically every journalistic experiment includes a disclaimer that its results are decidedly unscientific.

Wendell Jamieson, city editor at The New York Times who assigned the subway-experiment story, chafes at calling the exercise an “experiment,” pointing out that it was conducted in connection with another article about the original experiment. “It’s just a fun way to take a different approach to a story,” Jamieson says, comparing it to when he was at the New York Daily News and sent a reporter to Yankee Stadium during a subway series dressed in Mets regalia. “It’s tabloid trick two-hundred and fifty-two.” Bill Wasik, the Harper’s editor who started flash mobs, points out that using the word “experiment” is a way for journalists to appropriate the “alpha position” of science, lending their endeavors a sort of added legitimacy. “The piece is wearing a lab coat,” Wasik says of his own article, which repeatedly describes flash mobs as an experiment, “but it’s not entirely scientific by any means.”

Perhaps no media outlet has tried harder to achieve uniformity in conducting its experiments than Reader’s Digest. Detailed instructions for how to conduct its “studies” are distributed to researchers in more than thirty cities around the world to ensure that their results will be comparable. For the courtesy tests, researchers were told how long dropped papers were to be left on the ground, how far to walk behind people entering buildings to see whether they would hold the door, and what sort of demeanor to adopt when speaking with clerks who were being tested to see whether they would say “Thank you.” Nonetheless, despite all the careful planning, New York City’s courtesy title may need to be affixed with an asterisk. Robert Levine, a social psychologist at California State University, Fresno, did a series of helpfulness experiments in the early nineties in which New York City placed dead last out of thirty-six United States cities. While this doesn’t necessarily contradict the Reader’s Digest result, in which New York was the only U.S. city tested among a global selection of cities, Levine points out that all the Reader’s Digest New York tests were carried out at Starbucks, yielding a potentially skewed sample. What if Starbucks employees and customers are simply more courteous than New Yorkers as a whole? “I’m not saying they screwed up,” says Levine, “but that was certainly a flag that was raised for me.”

So maybe journalists can and should be more careful in how they design experiments, but that debate, in many ways, is beside the point. The best examples of the genre are undeniably good journalism, and the lesser lights, for the most part, amount to innocuous entertainment. Indeed, my hope is that some enterprising reporter is even now hatching a plan to find out whether Joshua Bell really would draw such a big crowd outdoors on a sunny day in D.C.

Daniel Weiss is a freelance writer based in New York City.
