Transcript: Rijul Gupta podcast

GRIP spoke to the CEO of DeepMedia AI about the challenges and opportunities presented by deepfake technology.

This is a transcript of the podcast “Rijul Gupta of DeepMedia AI on the ethics of deepfake technology,” a conversation between GRIP Senior Reporter Carmen Cracknell and Rijul Gupta, CEO of DeepMedia AI.

[INTRO]

Carmen Cracknell: Thanks very much for joining me, Rijul. I’ll just do a quick intro. Rijul Gupta is the CEO of DeepMedia AI. So I’d love for you to tell me a bit more about your company and what you do.

Rijul Gupta: Yes, so Deep Media AI aims to be the gold standard ethical synthetic media brand. Essentially, we pioneer synthetic generation and detection at the same time, and we find ethical applications for technology like speech synthesis, voice cloning, facial reanimation, while using the advancements in that space to build unbeatable detection tools.

Carmen Cracknell: And what brought you into this area exactly?

Rijul Gupta: You know, it’s an interesting story. I remember one day back in 2017, I was working as a machine learning engineer at the time, and I came across a deepfake video for the very first time. It was Jordan Peele on the left-hand side and Barack Obama on the right-hand side, and Jordan Peele’s face, voice, emotions, everything was being transferred to Obama, absolutely perfectly. I saw that video and was immediately struck by the immense power that this technology had to transform media and communications across the board. I wanted to make sure that I was involved in the space so we could find ethical applications for that technology, while also simultaneously protecting people against misinformation created with it. So we spent the first few years developing machine learning algorithms and figuring out how the technology worked. I hired a couple of software engineers and a designer to wrap up the technology that I was building; many of the algorithms were proprietary, and we could wrap them up into products. But the company really took shape when I started working with my now co-founder, Emma.

Carmen Cracknell: Awesome. And it’s interesting to hear that you founded the company to take deepfakes in a more ethical direction, because obviously the technology has received a lot of negative press. How can it be a force for good?

Rijul Gupta: Well, deepfakes and synthetic media are just a technology, right? And like any technology, it’s going to be used in various applications. The technology itself is neither good nor bad; it’s how we use it. Right? And so if you look at how deepfakes have been traditionally used, it actually parallels a lot of early applications of truly innovative technology. People don’t realize this, but in the early days of the internet, or social media, or even video cameras in general, they were used for things like pornography. They were used for fraud and scams. They were used as misinformation tools. But very quickly, a lot of people figured out how to reapply the same technology, and they were able to build massive corporations that made massive impacts on society, everything from film and television production to social media companies like Twitter and Facebook to internet companies like Google. Right? And so if we’re looking at the synthetic media space, yes, it’s true that the early applications are things like fraud and scams and pornography. But there are ethical applications. And at DeepMedia, we think the largest, most important ethical application is something we call universal translation, which takes people who speak one language, like English or French, understands what they’re saying, clones their voice and face, and generates outputs of that person saying the same thing in 50 different languages.

Carmen Cracknell: And the technology is moving so fast. Do you think the ethics can keep up with that?

Rijul Gupta: Yeah, I think that because technology moves so quickly, it’s really important for leaders in the space to define ethics and ethical boundaries. It’s going to be very difficult for governments and society to catch up. They will, but it’s going to take five, 10, maybe 20 years, something like that. And in that time, unless the leaders in the space have an ethical backbone, unless they care about ethics and about the applications they’re building, people are going to get hurt. And unfortunately, we see a lot of large companies, a lot of leaders in this space, not caring about ethics. They release products that let people clone any voice they want, create text-to-speech with any voice they want, or face-swap or lip-reanimate any face they want. You just have to pay a little bit of money, and then you can upload any president, any news journalist, anyone you want, and create massive amounts of misinformation. And that is not okay. And so until we as tech leaders step up and say that’s not okay, it’s going to be hard. The genie is out of the bottle at this point, right? And so it’s up to us to decide where the line is drawn. And I think the line is drawn at the point of misinformation.

Carmen Cracknell: And this is very relevant to the major thing that’s being talked about right now, ChatGPT. How has that influenced what you’re doing?

Rijul Gupta: Well, we think of ChatGPT kind of like the telegraph. A lot of innovative technologies, when they first come out, start in the text space, right? So if you think back to early forms of electronic communication, it started with the telegraph. People were communicating over the wire in ones and zeros, you know, ons and offs. But very quickly, the technology evolved into telephones and radios and televisions. ChatGPT is truly innovative, and it’s going to transform the way people work and the way people live. But while that’s the telegraph, what we’re building at DeepMedia is like the television. We’re focusing on the audio and video domains and really pioneering synthetic media in that space.

Carmen Cracknell: Are you working directly with ethics and compliance professionals? And do you know anything about how that relates to what you’re doing?

Rijul Gupta: Yeah, so I can’t speak about the details, but we do have very close relationships with people on Capitol Hill, people in the DOD, very established university professionals. And we are doing our best to help educate the lawmakers to help educate the thought leaders about how synthetic media technology works, the rapid rate at which it’s advancing, and steps that we can put in place from the top down and from the bottom up to make sure that people are protected against the unethical use cases.

Carmen Cracknell: And do you see any restrictions emerging on what you’re doing in terms of legal restrictions?

Rijul Gupta: Yeah, so you know, we have been working very hard on this for the past few years. And you might have seen the recent news about China’s regulation of the deepfake industry on social media. We think that a similar thing is going to happen in the United States and Australia, and even the European Union, within the next one to two years, where social media companies will be legally required, probably not to remove synthetically manipulated content, but to tag it as misinformation and present that tag to their end users. Additionally, we think that through our efforts, we will help enable some form of regulation around the creation of pornographic deepfakes without people’s consent. You might have seen recent news articles about deepfake porn of YouTubers, Twitch streamers, and some celebrities. And while it is difficult to determine who created that content, it is possible to determine who’s hosting it. And so over the next few years, the people hosting that content will be liable for tagging it as deepfake-produced and removing it from their platforms.

Carmen Cracknell: Do you think this technology is going to be more important for say the people you’re working with, like the military, the Department of Defense or in the private sector, like financial services?

Rijul Gupta: There are different applications, and each one is quite important. I think it’s going to be critical to national security, for the United States, for our allies, for any country really, to be able to detect what’s real and what’s not. Synthetic manipulations, deepfakes, are getting so good, so fast, that we’re already at the place where many people can’t determine whether something is real or fake, right? And they’re getting cheaper and faster. So over the next 12 months, we’re going to see advances in vocal and facial manipulation where anyone, and you don’t even have to be a machine learning engineer, anyone will be able to create a convincing fake video, post it online, send it to people over WhatsApp or email, and create disturbances in reality, right? And that presents a very pressing threat to people, organizations, and institutions that depend on knowing the truth. Many of those institutions are based in government and in militaries. So it is absolutely critical to all of our security as citizens of these nations to prevent that misinformation from causing chaos. But it’s equally important, I believe, that the average citizen has trust in what they’re seeing, whether it’s online, whether it’s something sent to them by a friend, whether it’s on social media. People need trust in these institutions; otherwise, society breaks down. So I don’t think you can say one is more important than the other. They’re both critical to making sure the world works the way it has for the past thousand years.

Carmen Cracknell: Yeah, I mean, I already asked you about how you’re working with compliance and legal professionals. Do you think regulators could be doing more? And if so, what should they be doing right now to prevent what you’re saying?

Rijul Gupta: Yeah, regulators are currently doing a lot. There’s not a lot they can discuss publicly, but a lot is being done, or at least actively talked about. With regulation, it takes time, and time is not something that technology is usually beholden to. Right. And so there is an inherent mismatch between the advancement of this technology, how good it’s getting and how quickly, and the ability of lawmakers, policymakers, and regulators to catch up. And it’s not for lack of trying; there are serious efforts being made in the space. But just due to the nature of government, there’s going to be that lag, and with that lag will come problems. There will be people who get hurt, and there’s not a lot we can do about it. But I can say that we at DeepMedia are doing whatever we can to help make sure policymakers are informed, and they’re working as hard as they can to get laws on the books to protect people as quickly as they can.

Carmen Cracknell: Yeah, that issue of regulation not being able to keep up with tech is something that comes up in a lot of the conversations that I have. Given that, what are your goals for DeepMedia for the next 10 years?

Rijul Gupta: Our goals at DeepMedia for the next 10 years are to ensure that synthetic media advances down a path that leads to the best possible future for humanity. And to us, that looks like regulation from the top down. I don’t think you’ll often find corporations, or startups especially, advocating for regulation of their own industry. This is not something we’re doing from a business perspective; this is something we’re doing from an ethical perspective. At the end of the day, myself, my co-founder, everyone who works at the company, we’re highly ethical human beings. We’re people with families. We need to sleep at night. And so we’re doing whatever we can to make sure that meaningful, but also common-sense, regulation is put in place, so that when people do use this technology unethically, there are consequences, at least on the books. Now, that being said, people break laws. There are always going to be people who break laws, no matter how many laws you put on the books. And so being able to detect when someone has created a synthetically manipulated piece of content is also critical to achieving our mission of ethical synthetic media. That looks like advancing the detection side of things at a very rapid pace, at a pace that can keep up with or even push beyond what is going on on the generative side. And to that end, we’re working very closely with the United States Department of Defense, and various organizations within the DOD, including the Air Force, to pioneer and build new detection tools, train them to very high levels of accuracy, and implement these detectors in ways that are usable by the DOD, by intelligence communities, by people in news organizations, on social media, and even by consumers themselves.

Now, there is a bit of a challenge there, because the more we advance these detectors, the more unethical users will be able to run their own deepfakes through them and figure out ways to combat them. So we have to make sure that we protect against that. And that’s where the third pillar of our efforts comes in, and that is advancing the generative side. Being able to find ethical applications of synthetic media, applications like universal translation, allows us to dedicate time and resources to advancing the technology of voice cloning, vocal synthesis, facial reanimation, face swap, and lip reanimation. All of these technologies are integrated in one form or another as part of our universal translator. So the more time and resources we dedicate to the universal translator, the better that technology gets. And that generative side feeds right back into our detectors, so we will always be at the forefront of this technology. Being able to pioneer the generative side of synthetic media is critical to making sure that our detectors are always trained on the most cutting-edge, high-quality deepfakes possible. And that is absolutely necessary to make sure that our detection accuracy and ability is always six to 12 months ahead of anyone else on the generative side of the space. So, again, informing and helping policymakers, providing detection tools, and making sure those detection tools are trained on the highest-quality, cutting-edge deepfakes, to us, enables the possibility, 10 years down the line, of creating a world where synthetic media is used ethically for the best possible purposes of humanity, and DeepMedia is a leader in that generative AI space.

Carmen Cracknell: Just a more specific question, because we work a lot with financial services. I’m not sure if this is something you deal with, but what’s your view on facial recognition technology and banks’ use of it?

Rijul Gupta: Yeah, this is an interesting one. So there have been cases over the past few years where people have been able to synthetically alter their face and/or voice on live stream chats, contact people’s bankers and financial institutions, and steal tens of millions of dollars from those institutions. I remember an example reported in Forbes where hackers cloned the voice of a CEO and stole $35 million. Last year, there was an example of hackers deepfaking the face of, I believe, a senior executive at Binance on a live stream chat, getting their coin listed, and then using that to defraud millions of people out of millions of dollars, right? As we look towards the future, this stuff is going to become more and more common. That’s one of those things that, no matter how many laws you put on the books, people are not going to stop doing. And that’s why building detection software and integrating it into these financial institutions is completely critical to making sure that people trust these financial institutions, and to protecting them from large-scale liability, which is coming in 2023.

Carmen Cracknell: Awesome. Well, those were all my questions. Was there anything else I’ve missed that you wanted to add that’s important?

Rijul Gupta: There are a lot of people who are scared, as they should be. But at the end of the day, this technology will be implemented in the ways that we as a society decide it will be implemented. And so if we can come together and make sure it’s used ethically, I think that we have a really good chance of building a future that is equitable and fair for all of us.

Carmen Cracknell: Awesome. Well, thank you very much for talking to me, Rijul.

Rijul Gupta: Yeah, thank you. Have a good day.

Carmen Cracknell: You too.
