When Roadrunner, a documentary about late TV chef and traveler Anthony Bourdain, opened in theaters last month, its director, Morgan Neville, spiced up promotional interviews with an unconventional disclosure for a documentarian. Some words viewers hear Bourdain speak in the film were faked by artificial intelligence software used to mimic the star’s voice.
Accusations from Bourdain fans that Neville had acted unethically quickly came to dominate coverage of the film. Despite that attention, how much of the fake Bourdain’s voice is in the two-hour movie, and what it said, has been unclear—until now.
In an interview that made his film infamous, Neville told The New Yorker that he had generated three fake Bourdain clips with the permission of Bourdain’s estate, all from words the chef had written or said but that were not available as audio. He revealed only one, an email Bourdain “reads” in the film’s trailer, but boasted that the other two clips would be undetectable. “If you watch the film,” The New Yorker quoted the Oscar-winning Neville saying, “you probably don’t know what the other lines are that were spoken by the AI, and you’re not going to know.”
Audio experts at Pindrop, a startup that helps banks and others fight phone fraud, think they do know. If the company’s analysis is correct, the deepfake Bourdain controversy is rooted in less than 50 seconds of audio in the 118-minute film.
Pindrop’s analysis flagged the email quote disclosed by Neville and also a clip early in the film apparently drawn from an essay Bourdain wrote about Vietnam titled “The Hungry American,” collected in his 2006 book, The Nasty Bits. It also highlighted audio midway through the film in which the chef observes that many chefs and writers have a “relentless instinct to fuck up a good thing.” The same sentences appear in an interview Bourdain gave to the food site First We Feast on the occasion of his 60th birthday in 2016, two years to the month before he died by suicide.
All three clips sound recognizably like Bourdain. On close listening, though, they appear to bear signatures of synthetic speech, such as odd prosody and unnaturally rendered fricatives like the “s” and “f” sounds. One Reddit user independently flagged the same three clips as Pindrop, writing that they were easy to pick out on a second viewing of the film. The film’s distributor, Focus Features, did not respond to requests for comment; Neville’s production company declined to comment.
When Neville predicted that his use of AI-generated media, sometimes termed deepfakes, would be undetectable, he may have overestimated the sophistication of his own fakery. He likely did not anticipate the controversy or attention his use of the technique would draw from fans and audio experts. When the furor reached the ears of researchers at Pindrop, they saw the perfect test case for software they built to detect audio deepfakes; they set it to work when the movie debuted on streaming services earlier this month. “We’re always looking for ways to test our systems, especially in real-world conditions—this was a new way to validate our technology,” says Collin Davis, Pindrop’s chief technology officer.
Pindrop’s results may have resolved the mystery of Neville’s missing deepfakes, but the episode portends future controversies as deepfakes become more sophisticated and accessible for both creative and malicious projects.