
Audio deepfake detective developing new sleuthing techniques

Electrical and computer engineering PhD student You “Neil” Zhang has received a National Institute of Justice fellowship to develop audio deepfake detection systems. (University of Rochester photo / J. Adam Fenster)

A National Institute of Justice fellowship allows a Rochester graduate student to develop novel defenses against deepfake scams.

With artificial intelligence-powered audio generation making it increasingly hard to distinguish between real and fake audio, an electrical and computer engineering PhD student is working to develop tools to protect against scammers. You “Neil” Zhang of the Audio Information Research (AIR) Lab at the University of Rochester received a competitive National Institute of Justice (NIJ) graduate research fellowship to develop new audio deepfake detection systems.

Audio deepfakes have grabbed headlines and caught the attention of cybersecurity experts as scammers increasingly use voice-related AI schemes to target individuals’ bank accounts or spread political disinformation. Zhang expects these attacks to grow in both scale and sophistication.

“Audio deepfake techniques are getting more and more advanced these days,” he says. “People are developing new text-to-speech and voice-conversion techniques that criminals might take advantage of to threaten the security of more of our systems. We need to develop more advanced detection systems to keep up.”

Under the guidance of Zhiyao Duan, an associate professor of electrical and computer engineering and of computer science, Zhang will target the problem from three different angles. First, he intends to create robust algorithms that can spot new and unknown deepfake techniques using a novel training strategy called multicenter one-class learning. The approach would also account for widely used audio recording and compression methods.
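The core idea of one-class approaches like this is to model only what genuine speech looks like and flag anything that falls too far outside it; multiple centers can represent genuine speech captured under different recording or compression conditions. The sketch below is purely illustrative of that scoring idea (the function names, embeddings, and centers are invented for this example, not Zhang’s actual formulation):

```python
import numpy as np

def multicenter_one_class_score(embedding, centers):
    """Score an utterance embedding by its distance to the nearest
    bona fide center; a larger score suggests a possible deepfake.
    (Illustrative sketch only.)"""
    dists = np.linalg.norm(centers - embedding, axis=1)
    return float(dists.min())

# Toy example: two centers learned from real speech recorded under
# different conditions (e.g., studio audio vs. compressed phone audio).
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
real_like = np.array([0.2, -0.1])   # close to a center -> low score
fake_like = np.array([2.5, 2.5])    # far from all centers -> high score
print(multicenter_one_class_score(real_like, centers))
print(multicenter_one_class_score(fake_like, centers))
```

In practice the embeddings would come from a trained neural network and a decision threshold would be tuned on held-out data; the advantage of the one-class framing is that it does not require examples of every deepfake technique in advance.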

Zhang is also developing watermarking techniques for the audio-generation process that identify the origins of deepfakes. Akin to exploding dye packs that help identify money stolen from banks, these watermarks could be presented as evidence in cases where fraudulent activity took place, according to Zhang.

“Criminals who create these deepfakes typically do not have a very high technology background,” says Zhang. “They are often using open-source code or an API [application programming interface] provided by a company. So, if the developers who create these algorithms can add a watermark to their systems, experts can identify where deepfakes occur.”
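A common way to realize such a watermark is to add an imperceptibly weak pseudorandom signature, derived from a secret key, to every generated waveform; anyone holding the key can later test suspect audio by correlation. The toy spread-spectrum sketch below illustrates the principle only (the functions, key, and thresholds are assumptions for this example, not a production scheme or Zhang’s design):

```python
import numpy as np

def embed_watermark(audio, key, strength=1e-3):
    """Add a low-amplitude pseudorandom signature derived from `key`."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(len(audio))
    return audio + strength * pattern

def detect_watermark(audio, key, strength=1e-3):
    """Correlate against the key's pattern to test for the signature."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(len(audio))
    score = float(np.dot(audio, pattern)) / len(audio)
    # A marked signal yields score close to `strength`; unmarked, near 0.
    return score > strength / 2

# A generator's API could stamp every output with its own key;
# investigators holding that key can later test suspect audio.
clean = np.zeros(16000)             # stand-in for one second of audio
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42))   # True: watermark present
print(detect_watermark(clean, key=42))    # False: no watermark
```

Real audio watermarks must additionally survive compression, resampling, and deliberate removal attempts, which is what makes the research problem hard.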

Lastly, Zhang aims to create systems that also use visual information, when available, to spot telltale signs of deepfakes, such as mismatches in the synchronization of audio-visual cues.
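One simple signal of this kind of mismatch is the time offset between speech activity and lip motion, which can be estimated by cross-correlating the two tracks. The sketch below illustrates the idea on toy signals (the function and signals are invented for this example; real systems use learned audio-visual features rather than raw curves):

```python
import numpy as np

def av_sync_offset(audio_activity, lip_motion):
    """Estimate the frame offset between a speech-activity curve and a
    lip-motion curve via cross-correlation. A large offset (or a weak
    correlation peak) can flag a possible deepfake."""
    a = audio_activity - audio_activity.mean()
    v = lip_motion - lip_motion.mean()
    corr = np.correlate(a, v, mode="full")
    k = int(np.argmax(corr)) - (len(v) - 1)
    return -k  # positive: lip motion lags the audio

# Toy signals: lip motion lags the audio by 3 frames, as might happen
# when a synthetic voice track is pasted onto real video.
t = np.arange(100)
audio_activity = np.sin(2 * np.pi * t / 25)
lip_motion = np.roll(audio_activity, 3)
print(av_sync_offset(audio_activity, lip_motion))   # 3
```

Genuine footage should yield an offset near zero with a strong correlation peak; a pasted-on voice track tends to drift or land off by several frames.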

Zhang was part of the first cohort of an augmented and virtual reality training program for PhD students at the University of Rochester funded through the National Science Foundation (NSF) Research Traineeship (NRT) program. The traineeship was launched to give doctoral students the skills needed to advance AR/VR technologies and help them gain an appreciation for the broader cultural and societal implications of the technologies.

Beyond the new initiative funded by the NIJ, Zhang and his fellow researchers at the AIR Lab are studying other ways AI voice generation is disrupting established sectors, including the music industry. Through their SingFake project, for example, they are developing detection systems that identify when deepfake techniques have been used to create authentic-sounding covers of songs in another artist’s voice.