Deep Fakes Technology, Science, Legality and History: A Research Review


Deepfakes (a portmanteau of "deep learning" and "fake") are synthetic media in which a person in an existing image or video is replaced with someone else's likeness. While the act of faking content is not new, deep fakes leverage powerful techniques from machine learning and artificial intelligence to manipulate or generate visual and audio content with a high potential to deceive.

The main machine learning methods used to create deep fakes are based on deep learning and involve training generative neural network architectures, such as autoencoders or generative adversarial networks (GANs). Deepfakes have garnered widespread attention for their uses in celebrity pornographic videos, revenge porn, fake news, hoaxes, and financial fraud. This has elicited responses from both industry and government to detect and limit their use.

When Did Photo Manipulation Start?

Photo manipulation was developed in the 19th century and was soon applied to motion pictures. Technology steadily improved during the 20th century, and more quickly with digital video.

Deepfake technology has been developed by researchers at academic institutions beginning in the 1990s, and later by amateurs in online communities. More recently the methods have been adopted by industry.

Academic research

Academic research related to deepfakes lies predominantly within the field of computer vision, a subfield of computer science. An early landmark project was the Video Rewrite program, published in 1997, which modified existing video footage of a person speaking to depict that person mouthing the words contained in a different audio track. It was the first system to fully automate this kind of facial reanimation, and it did so using machine learning techniques to make connections between the sounds produced by a video's subject and the shape of the subject's face.

Contemporary academic projects have focused on creating more realistic videos and on improving techniques. The "Synthesizing Obama" program, published in 2017, modifies video footage of former President Barack Obama to depict him mouthing the words contained in a separate audio track. The project lists as the main research contribution its photorealistic technique for synthesizing mouth shapes from audio.

The Face2Face program, published in 2016, modifies video footage of a person's face to depict them mimicking the facial expressions of another person in real-time. The project lists as the main research contribution the first method for re-enacting facial expressions in real-time using a camera that does not capture depth, making it possible for the technique to be performed using common consumer cameras.

In August 2018, researchers at the University of California, Berkeley published a paper introducing a fake dancing app that can create the impression of masterful dancing ability using AI. This project expands the application of deepfakes to the entire body; previous works focused on the head or parts of the face.

Researchers have also shown that deepfakes are expanding into other domains, such as tampering with medical imagery. In this work, it was shown how an attacker can automatically inject or remove lung cancer in a patient's 3D CT scan. The result was so convincing that it fooled three radiologists and a state-of-the-art lung cancer detection AI. To demonstrate the threat, the authors successfully performed the attack on a hospital in a white-hat penetration test.

A survey of deepfakes, published in May 2020, provides a timeline of how the creation and detection of deepfakes have advanced over the last few years. The survey identifies that researchers have been focusing on resolving the following challenges of deepfake creation:

Generalization. High-quality deepfakes are often achieved by training on hours of footage of the target. The challenge is to minimize the amount of training data required to produce quality images and to enable trained models to run on new identities (unseen during training).

Paired Training. Training a supervised model can produce high-quality results, but requires data pairing. This is the process of finding examples of inputs and their desired outputs for the model to learn from.

Data pairing is laborious and impractical when training on multiple identities and facial behaviours. Some solutions include self-supervised training (using frames from the same video), the use of unpaired networks such as Cycle-GAN, or the manipulation of network embeddings.
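The self-supervised workaround can be sketched in a few lines: a degraded frame from a video becomes the model input and the original frame its target, so training pairs come for free from any single video. The `degrade` function below is a hypothetical stand-in for the real corruptions (warping, compression) used in practice.

```python
import random

def make_selfsupervised_pairs(frames, degrade):
    """Build (input, target) training pairs from frames of one video:
    the degraded frame is the input, the original frame is the target."""
    return [(degrade(f), f) for f in frames]

# Toy "frames" as flat pixel lists; the degradation is additive noise.
random.seed(0)
frames = [[0.1 * i + j for j in range(4)] for i in range(3)]
noisy = lambda f: [p + random.uniform(-0.05, 0.05) for p in f]
pairs = make_selfsupervised_pairs(frames, noisy)
```

Every frame yields one pair with no manual labeling, which is exactly why this sidesteps the pairing bottleneck.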

Identity leakage. This is where the identity of the driver (i.e., the actor controlling the face in a reenactment) is partially transferred to the generated face. Some solutions proposed include attention mechanisms, few-shot learning, disentanglement, boundary conversions, and skip connections.

Occlusions. When part of the face is obstructed with a hand, hair, glasses, or any other item then artefacts can occur. A common occlusion is a closed mouth that hides the inside of the mouth and the teeth. Some solutions include image segmentation during training and in-painting.
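As a toy illustration of the in-painting idea (an illustrative invention, not any specific published method), the sketch below uses a segmentation mask to mark occluded pixels in a 1-D "image" and fills them in from their unmasked neighbours.

```python
def inpaint(pixels, mask):
    """Replace masked (occluded) pixels with the mean of their
    immediate unmasked neighbours; leave everything else untouched."""
    out = list(pixels)
    for i, occluded in enumerate(mask):
        if occluded:
            neighbours = [pixels[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(pixels) and not mask[j]]
            out[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return out

row = [1.0, 2.0, 9.9, 4.0, 5.0]            # 9.9 is an occluding artefact
mask = [False, False, True, False, False]  # segmentation marks it occluded
repaired = inpaint(row, mask)
```

Real systems operate on 2-D images with learned in-painting networks, but the division of labour is the same: segmentation finds the occlusion, in-painting fills it plausibly.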

Temporal coherence. In videos containing deepfakes, artefacts such as flickering and jitter can occur because the network has no context of the preceding frames. Some researchers provide this context or use novel temporal coherence losses to help improve realism. As the technology improves, these artefacts are diminishing.

Overall, deepfakes are expected to have several implications for media and society, media production, media representations, media audiences, gender, law and regulation, and politics.
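A minimal version of a temporal coherence loss can be written directly: it averages the squared change between consecutive frames, so flicker is penalized while steady video scores zero. This is an illustrative formulation, not a specific published loss.

```python
def temporal_coherence_loss(frames):
    """Mean squared difference between consecutive frames;
    large values indicate flicker/jitter between frames."""
    total, count = 0.0, 0
    for prev, cur in zip(frames, frames[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
        count += len(prev)
    return total / count

# Toy 2-pixel frames: a steady clip and a flickering one.
steady = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
flicker = [[1.0, 2.0], [3.0, 0.0], [1.0, 2.0]]
```

Adding such a term to the training objective pushes the generator toward frames that change smoothly over time.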

Amateur development

The term deepfakes originated around the end of 2017 from a Reddit user named "deepfakes". He, as well as others in the Reddit community r/deepfakes, shared deepfakes they created; many videos involved celebrities' faces swapped onto the bodies of actresses in pornographic videos, while non-pornographic content included many videos with actor Nicolas Cage's face swapped into various movies.

Other online communities remain, including Reddit communities that do not share pornography, such as r/SFWdeepfakes (short for "safe-for-work deepfakes"), in which community members share deepfakes depicting celebrities, politicians, and others in non-pornographic scenarios. Other online communities continue to share pornography on platforms that have not banned deepfake pornography.

Commercial development

In January 2018, a proprietary desktop application called FakeApp was launched. The app allows users to easily create and share videos in which their faces are swapped with each other's. As of 2019, FakeApp has been superseded by open-source alternatives such as Faceswap, the command line-based DeepFaceLab, and web-based apps such as De*****sW**.com.

Larger companies are also starting to use deepfakes. The mobile app giant Momo created the application Zao, which allows users to superimpose their face onto television and movie clips using a single picture. The Japanese AI company DataGrid made a full-body deepfake that can create a person from scratch, which it intends to use for fashion and apparel.

Audio deepfakes also exist, along with AI software capable of detecting deepfakes and of cloning human voices after just five seconds of listening time. A mobile deepfake app, Impressions, was launched in March 2020; it was the first app for creating celebrity deepfake videos on mobile phones.


Deepfake technology can be used not only to fabricate the messages and actions of others but also to revive deceased individuals. On 29 October 2020, Kim Kardashian posted a video of her late father Robert Kardashian, whose face was created with deepfake technology. The hologram was created by the company Kaleida, which uses a combination of performance, motion tracking, SFX, VFX, and deepfake technologies in its hologram creation.

There was also an instance in which Joaquin Oliver, a victim of the Parkland shooting, was resurrected with deepfake technology. Oliver's parents, on behalf of their nonprofit organization Change the Ref, teamed up with McCann Health to produce a deepfake video advocating for a gun-safety voting campaign. In the message, Joaquin encourages viewers to vote.

Techniques of Deep Fake Technology

Deepfakes rely on a type of neural network called an autoencoder. These consist of an encoder, which reduces an image to a lower-dimensional latent space, and a decoder, which reconstructs the image from the latent representation. Deepfakes utilize this architecture by having a universal encoder that encodes a person into the latent space.

The latent representation contains key features about their facial features and body posture. This can then be decoded with a model trained specifically for the target. This means the target's detailed information will be superimposed on the underlying facial and body features of the original video, represented in the latent space. A popular upgrade to this architecture attaches a generative adversarial network to the decoder.
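The shared-encoder, per-identity-decoder data flow described above can be caricatured in a few lines of Python. The "encoder" and "decoders" here are trivial stand-ins (a mean and a fixed offset), invented purely to show how a latent code from one face is decoded with another identity's decoder; real systems use trained neural networks for both.

```python
def encode(face):
    """Shared encoder: compress a 'face' (pixel vector) to a latent code.
    Here the code is just the mean pixel, standing in for pose/expression."""
    return [sum(face) / len(face)]

def make_decoder(identity_offset):
    """Stand-in for a decoder trained on one identity: it paints that
    identity's 'appearance' (an offset) onto the latent code."""
    def decode(latent):
        return [latent[0] + identity_offset for _ in range(4)]
    return decode

decode_a = make_decoder(0.0)   # decoder trained on identity A
decode_b = make_decoder(5.0)   # decoder trained on identity B

face_a = [1.0, 2.0, 3.0, 2.0]            # a frame of identity A
swapped = decode_b(encode(face_a))       # A's pose, rendered as B
```

The swap is just routing: encode with the shared encoder, decode with the other identity's decoder.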

A GAN trains a generator (in this case, the decoder) and a discriminator in an adversarial relationship. The generator creates new images from the latent representation of the source material, while the discriminator attempts to determine whether or not the image is generated.

This causes the generator to create images that mimic reality extremely well as any defects would be caught by the discriminator. Both algorithms improve constantly in a zero-sum game. This makes deep fakes difficult to combat as they are constantly evolving; any time a defect is determined, it can be corrected.
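The zero-sum dynamic can be sketched with a toy numeric loop. The threshold "discriminator" and the fixed correction step are illustrative assumptions, not how real GAN gradients work: each time the discriminator catches the generator's output as fake, the generator nudges its output toward the real data until it passes.

```python
def discriminator(x, real_mean=1.0, tol=0.1):
    """Toy discriminator: judge a sample 'real' if it lies close
    to the real data's mean."""
    return abs(x - real_mean) < tol

g = 0.0                          # generator's current output (starts far off)
for _ in range(20):
    if not discriminator(g):     # defect caught by the discriminator ...
        g += 0.1                 # ... so the generator corrects toward real
```

After the loop, the generator's output sits inside the discriminator's tolerance, mirroring the "any defect found is corrected" dynamic described above.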

Applications of Deep Fakes

Education: Deepfake technology facilitates numerous possibilities in the education domain. Schools and teachers have been using media such as audio and video in the classroom for quite some time. Deepfakes can help an educator deliver innovative lessons that are far more engaging than traditional visual and media formats.

AI-generated synthetic media can bring historical figures back to life for a more engaging and interactive classroom. A synthetic reenactment, or the voice and video of a historical figure, may have more impact and engagement and can be a better learning tool.

For example, JFK's undelivered speech on resolving the Cold War was recreated with synthetic audio in his own voice and speaking style, helping students learn about the issue in a creative way.

Synthetic human anatomy, sophisticated industrial machinery, and complex industrial projects can be modelled and simulated in a mixed-reality world, letting students learn and collaborate using Microsoft HoloLens. Creative use of synthetic voice and video can increase overall success and learning outcomes at scale and at limited cost.


For many decades, Hollywood has used high-end CGI, VFX, and SFX technologies to create artificial but believable worlds for compelling storytelling. In the 1994 movie Forrest Gump, the protagonist meets JFK and other historical figures; the scenario was accomplished using CGI and other techniques at a cost of millions of dollars. Today, sophisticated CGI and VFX technologies are used in movies to generate synthetic media for telling captivating stories. Deepfakes can democratize this costly VFX technology, making it a powerful tool for independent storytellers at a fraction of the cost.

Deepfakes offer a great opportunity to make a positive impact on our lives. AI-generated synthetic media can be highly engaging and a powerful enabler. Deepfakes can give people a voice, purpose, and the ability to create impact at scale and with speed. Novel ideas and possibilities for empowerment have emerged from all walks of life, from art, expression, and public safety to accessibility and business. Deepfakes can create opportunities for all people, regardless of their limitations, by amplifying their agency.

Nonetheless, as access to synthetic media technology increases, so does the risk of exploitation. Deepfakes can be used to damage reputations, fabricate evidence, deceive the public, and undermine trust in democratic institutions.
