
Comment
March 11, 2019

Making a ‘deepfake’: How creating our own synthetic video helped us learn to spot one

By Hazel Baker

Hazel Baker is global head of user-generated content newsgathering at Reuters news agency.

Those of us working in the field of verification know there is no shortage of ways video can be distorted.

Every day, the specialist team of social media producers I lead at Reuters encounters video that has been stripped of context, mislabelled, edited, staged or even modified using CGI.

It’s a challenge to verify this content, but it’s one we are committed to undertaking, because we know that very often the first imagery from major news events is that captured by eyewitnesses on their smartphones – it’s now an integral part of storytelling.

As technology advances, so does the range of ways material can be manipulated. One new threat on the horizon is the rise of so-called “deepfakes” – a startlingly realistic new breed of AI-powered altered videos.

These “deepfakes” (a portmanteau of “deep learning” and “fake”) are created by taking source video, identifying patterns of movement within a subject’s face, and using AI to recreate those movements within the context of a target piece of video.

In effect, a video can be created of any individual, saying anything, in any setting.

It’s not hard to see how the possibilities arising from this are serious – both for the individuals concerned, and for the populations targeted by a fake video. The potential verification challenges posed by this new type of fake video are therefore something I have to take seriously.

New challenges

When approaching third-party content, I like to think in terms of worst-case scenarios: how could this video, or the context it has appeared in, have been manipulated?

I also know through my experience of training journalists in verification techniques that delving into real-life examples is the best way to learn.

But when I looked into the world of deepfakes, I found very few news-relevant examples to study.

That’s why I decided to join forces with Nick Cohen, global head of video product at Reuters, to create our own piece of synthetic video as part of a newsroom experiment.

The aim of this was not to trick our colleagues, but rather to consider how our verification workflows might be tested by it.

Learning to create a fake

As we researched the topic, it became clear we would need some technical expertise. This came in the form of a London-based startup that uses generative AI to create “reanimated video”, allowing for seamless dubbing into any language.

Previous deepfakes created by academic teams have involved public figures like Barack Obama and Angela Merkel. But I knew that every time such figures speak, there are very often rows of cameras on them, plus a team of aides. For me, the real risk came when there was only one camera in a room, and few witnesses to back up what was actually said.

We picked a situation in which this could be the case: an interviewee speaking “down-the-line” to a reporter from a remote studio. It’s a common set-up for news broadcasters.

A colleague of ours agreed to help us out, and we filmed her giving a convincing performance outlining her imaginary company’s upcoming expansion plans.

Next, we enlisted the help of our French colleague, who repeated the process in her native tongue. We gave both sets of video files to the technology start-up, and it took them just a few hours to map the French mouth movements and use generative AI to recreate the whole lower portion of our English-speaking colleague’s face.
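For readers curious what that mapping step can look like in practice, the sketch below is a minimal, purely illustrative example; it is not the start-up’s actual pipeline, and the file name is hypothetical. It uses the open-source OpenCV and MediaPipe libraries to pull out lip-landmark positions frame by frame from a driving video, the kind of per-frame mouth track a generative model could then be conditioned on.

```python
# Illustrative sketch only: extract per-frame lip landmarks from a
# "driving" video (here, the French-speaking colleague). This is NOT
# the start-up's pipeline; the file name below is hypothetical.
import cv2
import mediapipe as mp

# A small subset of lip landmark indices from MediaPipe's 468-point face mesh.
LIP_LANDMARKS = [61, 291, 0, 17, 13, 14, 78, 308]

def extract_mouth_track(video_path):
    """Return a list of per-frame (x, y) lip-landmark coordinates (or None)."""
    track = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=False,
                                         max_num_faces=1,
                                         min_detection_confidence=0.5) as face_mesh:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames as BGR.
            results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_face_landmarks:
                lm = results.multi_face_landmarks[0].landmark
                track.append([(lm[i].x, lm[i].y) for i in LIP_LANDMARKS])
            else:
                track.append(None)  # no face detected in this frame
    cap.release()
    return track

# A generative model (not shown here) would then be conditioned on this
# per-frame mouth track to re-synthesise the lower face of the target speaker.
mouth_track = extract_mouth_track("french_interview.mp4")
```

Even a toy extraction like this hints at why a still speaker with little facial expression gives the best results: fewer competing movements make the mouth track cleaner.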

We may have literally put words in her mouth, but our English colleague was left speechless when she watched herself talking with the voice of her French counterpart.

In our example we had the French script match the English. But clearly even a small change, such as a few extra billion here or there, could matter a great deal in a real-life scenario.

Red flags and instinct

Creating this video was an education in itself: we learnt what source material was required to give the most convincing results. It also gave us a piece of synthetic video that we could start showing to colleagues.

Normally when we attempt to verify video, we don’t look at a clip in isolation. We consider where, why and how it was shared. We trace it back to the source and we ask endless questions.

However, I did not wish to plant this content on social media, so I simply sent the video to my colleagues via a chat app so they could view it on a mobile screen, and asked for their reactions.

Some of my colleagues knew I was working on this project, and they were quick to identify the red flags. Here’s what they noted:

  1. Audio to video synchronisation: our newsroom tackles sync issues all the time, but in this case the video appears to slip in and out of sync with the audio, which is extremely unusual.
  2. Mouth shape and sibilant sound issues: native French speakers were particularly attuned to this: the shape of the mouth and lips is spot on in parts, but less convincing in others. Sibilant sounds (s and sh) are especially problematic.
  3. Static subject: our colleagues spotted that the speaker was oddly robotic. They were correct: we’d asked her to be, because a still speaker, with little facial expression, gives the best results.

Interestingly, those of my colleagues who had no idea that we had created this video reacted very differently. They felt a sense of unease, and some commented on sound issues, but they found it hard to put their finger on precisely what was wrong.

This was an important learning point, and for me it is why our modest experiment was so useful.

It reminded us that we need to listen carefully to our instincts. But it also demonstrated that pre-exposure to a problem allows us to assess it more critically.

As such, I hope that even this brief exposure to a piece of synthetic video has helped our journalists approach third-party video material armed with additional awareness.

Of course, this is not enough on its own.

Solutions

We also need to consider what technical solutions we can employ to help identify synthetic video, and this is something we are exploring.

Moreover, we are acutely aware that this technology is continually evolving and the red flags we identified may not be there in later iterations. We must track how these programmes develop.

Finally, we need to keep training our journalists to recognise that deepfake video simply sits at one, albeit extreme, end of a scale in which video can be distorted.

For each piece of content we need to apply a framework of verification, forensically examining the material itself, the source and the context in which it has been shared.

This methodical process, along with our editorial judgment and our ability to consult Reuters’ subject experts globally, will continue to drive our approach to video verification, whatever the potential level of manipulation.



