Transforming Hollywood: Matt Panousis on Democratizing VFX with AI


Salli: [00:00:00] You're listening to The Business Leadership Podcast with Edwin Frondozo.

Matt: The product we're here to talk about, and the product that we just launched after two years of R&D, is LipDub AI.

Once you have a trained model, generating about one minute of new content of us speaking takes a few minutes, that's all.

Edwin: Good morning, good afternoon, and good evening, biz leader. Welcome to another episode of The Business Leadership Podcast. I'm your host, Edwin Frondozo, and today we are featuring a special episode from our Future Narrative miniseries, recorded live at the Collision conference in Toronto, Canada.

In this miniseries, we explore the future of leadership, innovation, and storytelling with visionary leaders who are not just designing products but are creating entirely new worlds and markets.

Joining me today is Dr. Paul Newton, and [00:01:00] together we'll be speaking with Matt Panousis. Matt is the co-founder and COO of Monsters, Aliens, Robots, and Zombies, aka MARZ, a company that is revolutionizing the visual effects (VFX) industry through artificial intelligence.

In our conversation, Matt will discuss how MARZ addresses the labor-intensive and costly nature of traditional VFX, aiming to speed up processes and make them more affordable and accessible. We'll explore two of MARZ's key products: Vanity AI, which assists VFX artists with beautification and de-aging, and the new LipDub AI, which improves dubbed content by seamlessly syncing audio and facial movements.

So without further ado, here we go.

We're now speaking with Matt Panousis. He's the co-founder and COO of Monsters, Aliens, Robots, and Zombies, aka [00:02:00] MARZ.

How are you doing today?

Matt: I'm doing well, just enjoying Collision here.

Edwin: Amazing. I'm super excited. Thank you for taking the time to join us. Jumping right in, Matt, what problem is MARZ solving in the entertainment industry?

Matt: Yeah, effectively what we're solving for is the problem that VFX today, as we know it, is very labor intensive, which means it's very expensive.

It's very slow. We're trying to speed it up and make it more affordable, and we're using artificial intelligence to do that, looking at very specific applications. Are you familiar with VFX?

The reason I say use case is because VFX is somewhat of an umbrella term.

So within VFX, you may be creating a monster. You may be turning the Toronto skyline into the New York skyline. You may be creating a fantastical environment that's never existed. It's quite a broad umbrella term for anything to do with computer graphics in film and TV. And so you can't really [00:03:00] look at AI as solving for visual effects.

So we solve very specific applications, and we try to solve them with end-to-end automation.

Edwin: End-to-end automation. So for the very specific use cases that you're working through, is it just time that you're looking to solve, or is it resources, or making it accessible for everyone within the industry?

Matt: Yeah, I think it really depends on the use case. Our first product was a product called Vanity AI, and Vanity AI is a VFX artist tool.

Edwin: Okay.

Matt: The use case is beautification and the de-aging of celebrities.

Edwin: I like that. It's very specific.

Matt: Very specific. It takes so long to build these AI products, especially when you're doing your own research like we do.

Edwin: Okay.

Matt: But in our view, you have to solve a specific problem. Technically, companies working on text-to-video are solving a swath of different VFX [00:04:00] problems, but those kinds of problems require insane amounts of data.

Edwin: Yeah.

Matt: At least for a company like us, we like to stay more focused. So that Vanity product is a VFX artist tool; it saves them about 90 percent of the time they would traditionally spend doing that work. But the product we're here to talk about, and the product that we just launched after two years of R&D, is LipDub AI.

Also very specific. The original thesis behind LipDub was: we all watched Squid Game, and we all had that same visceral experience that you get when you see a piece of dubbed content where the lips don't match. So we said, wouldn't that be a cool problem to solve? We were already focused on the face, so we built a product that allows Hollywood studios to feed in new audio tracks for their dubs, and we recreate the face so that in every market that gets the content, it looks like it was made in that market. So we lip sync to any dub track, effectively.

Edwin: Yeah, I was just [00:05:00] picturing all of that. Wow, that's interesting.

This is a little technical and maybe a bit out of scope when it comes to business leadership and talking about narratives, but I'm just curious: are you recreating just the mouth part?

Matt: No, you can't.

Edwin: You have to do the whole face.

Matt: Yeah, we go from the eyes down.

Edwin: Oh, because I guess everything moves.

Matt: Everything moves, and you don't want to send the network contradictory signals.

Edwin: And when you say network, you mean the audience?

Matt: No, I mean the neural net. As an example, if in the original performance your chin was down because your mouth was open, it would look weird if you didn't go higher on the face: your cheeks would provide a contradictory signal, the network wouldn't know what to do, and so it would compromise. And also, the eyes are the hardest thing to replicate with AI.

There's a lot of depth in the eyes. What's that old saying? The eyes are the...

Edwin: The gateway to the soul.

Matt: Gateway to the soul. A lot of times when you see CG faces, they look artificial.

Edwin: [00:06:00] Oh.

Matt: That was more of a side benefit.

But we're really excited about LipDub AI because we had this thesis for Hollywood originally, which was, simply put: let's make your dubs look real and make the content much more appealing to international audiences, both North American viewers watching international content and international viewers watching North American content.

But what's been cool is seeing all the appetite from other industries. In fact, most of our business today is made up of global advertisers that are using us to localize advertisements in a way that's cost effective but also creates a super culturally relevant, engaging end product; online educators and big YouTubers that want to grow the reach of their channels; and even some podcasters.

Edwin: Yeah. We just chatted with someone, maybe an hour or two ago, and he's getting a million subscribers every month. He's an educator, and he's been doing a lot of dubbing into different regions as well. He came to mind as you were saying that. Even [00:07:00] before you said the word YouTubers, I was wondering whether YouTubers could benefit from this.

Matt: Yeah, especially because where we really excel is the quality of our articulation and the fidelity of textures. If I were to recreate your face, I could show every single strand of hair, right? That level. But also the ability to do complex content.

So even a scene like this, where there are two speakers and both of us are speaking from the side, would be very difficult for any off-the-shelf model. This is really where we excel. We can do movement, we can do side poses, we can do objects interfering with the mouth, like a mic. Hollywood obviously has a lot of that, but YouTube has a ton of it. So we always think of YouTube as Hollywood-lite. But it's also a growth market where, candidly, Hollywood's in somewhat of a state of decline and YouTube's in a state of growth.

So we're really excited about both spaces.

Edwin: I'm curious, just because of my background in software [00:08:00] development: with different subjects, like myself or Paul, how long does it take your system to learn our language and learn our faces? Is there an onboarding phase right now when you work with new subjects?

Matt: So the way LipDub works is, you can think of it as a video setup, which is probably the most computationally heavy part of the process. Say it was an interview of me and you that we wanted to take into 10 languages. The first step is the user's going to have to come in and label me and you.

So the system knows I'm Matt and you're Edwin. It'll track our faces perfectly, and then it'll train a model on us. There's a pre-trained model, but then we do a little bit of fine-tuning on literally this content, so that when it recreates your face, it's going to be perfect.

When it recreates my face, it's going to be perfect. That takes about three hours. But you do not have to do that for every language. Once you have a trained model, generating about one minute of new content of us speaking takes a few minutes, that's all.
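For readers who think in code, here is a minimal sketch of the setup-then-generate workflow Matt just described. Every class and function name below is hypothetical, not LipDub AI's actual API; it only mirrors the steps he lists: label the speakers, fine-tune a shared pre-trained model once per video, then reuse it to generate every language cheaply.

```python
# Hypothetical sketch of the onboarding flow Matt describes.
# None of these names are the product's real API; they only mirror
# the steps from the conversation.

from dataclasses import dataclass


@dataclass
class FaceModel:
    """Per-speaker model fine-tuned from a shared pre-trained base."""
    speaker: str
    trained: bool = False

    def fine_tune(self, video_path: str) -> None:
        # One-time setup cost: roughly three hours in Matt's example.
        print(f"Fine-tuning base model on {self.speaker} from {video_path}")
        self.trained = True

    def rerender_face(self, dub_audio: str) -> str:
        # Per-language cost: a few minutes per minute of footage.
        assert self.trained, "fine-tune before generating"
        return f"{self.speaker} lip-synced to {dub_audio}"


def localize(video_path: str, speakers: list[str],
             dub_tracks: dict[str, str]) -> dict[str, list[str]]:
    # Step 1: the user labels each speaker so the system knows who is who.
    models = [FaceModel(name) for name in speakers]

    # Step 2: track faces and fine-tune once per speaker (the ~3-hour part).
    for model in models:
        model.fine_tune(video_path)

    # Step 3: reuse the trained models for every dub track (the cheap part).
    return {lang: [m.rerender_face(audio) for m in models]
            for lang, audio in dub_tracks.items()}


print(localize("interview.mp4", ["Matt", "Edwin"],
               {"es": "spanish_dub.wav", "ja": "japanese_dub.wav"}))
```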

Edwin: That's great. [00:09:00] And I'm wondering, with the LipDub product, would my mother know the difference?

Matt: I don't think you would be able to tell. I'll show you some examples after.

Edwin: We'd love to share it with everyone who's listening, too.

Matt: Oh my god, yeah, I'll absolutely show you some stuff.

Again, we originally built this with the view of solving it for Hollywood, so it has to look perfect; otherwise Hollywood won't buy it.

Even these other markets that have slightly lower quality bars, like YouTube, still appreciate it. At the end of the day, it's a visual product; the value is in the visual. So to the extent the visuals don't look good, it defeats the purpose.

Edwin: It makes sense. Matt, this is all fascinating stuff, and I'm really excited to see it. I'm curious, from your point of view and within your industry, whether it's in the FX industry or the VFX industry, what other major disruptions are happening [00:10:00] that you're seeing?

Matt: In VFX in Hollywood, which is where we came from originally, there are two different streams of innovation. There's a certain breed of software coming out that leverages some very powerful models but works in line with how artists currently work today. A lot of work in VFX is done with a node-based system: within a node, you'll have specific discrete functions you want to perform, and you don't want to do them by hand on every frame. So you set up a flowchart, and then you can run it across everything. There's a stream of new software that follows that same logic but incorporates diffusion models.

As an example, one node might say: take the video of Matt and Edwin. The next node says: mask the table. The next node says: remove the table and inpaint. Remove-and-inpaint would normally be a manual step; now it's a diffusion step.

There's a breed of software like [00:11:00] that; that particular one is ComfyUI.
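The node-graph pattern Matt describes can be sketched in a few lines of Python. This is a generic illustration of the concept only, not ComfyUI's real API: each node is a discrete function, the graph is set up once and then run across every frame, and the inpainting node stands in for the diffusion step he mentions.

```python
# Generic sketch of a node-based VFX pipeline, in the spirit of
# Matt's example. Not ComfyUI's real API; each "node" is just a
# function applied frame by frame.

from typing import Callable

Frame = dict                      # stand-in for real image data
Node = Callable[[Frame], Frame]   # one discrete step in the flowchart


def load_video(frame: Frame) -> Frame:
    frame["source"] = "video of Matt and Edwin"
    return frame


def mask_table(frame: Frame) -> Frame:
    # Classic deterministic step: isolate the region to change.
    frame["mask"] = "table region"
    return frame


def inpaint_with_diffusion(frame: Frame) -> Frame:
    # What used to be a manual paint-out is now a diffusion step
    # that fills the masked region automatically.
    frame["result"] = f"table removed from {frame['source']}"
    return frame


def run_graph(nodes: list[Node], frames: list[Frame]) -> list[Frame]:
    # Set up the flowchart once, then run it across every frame.
    out = []
    for frame in frames:
        for node in nodes:
            frame = node(frame)
        out.append(frame)
    return out


graph = [load_video, mask_table, inpaint_with_diffusion]
for f in run_graph(graph, [{} for _ in range(3)]):
    print(f["result"])
```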

But Wonder Dynamics, now part of Autodesk, was very similar. Theirs was: how do we very easily bring CG characters into live footage? In the same way, their model is broken out into discrete steps, but the point of this brand of software is that artists are still in the loop at every step. With Wonder Dynamics, it was: take my video, isolate the speaker, track the speaker.

Pick a new character, drag it over the speaker, and then we could use it to make you a robot, very easily. You wouldn't have to be an expert. So these are tools that are speeding VFX artists up but are still VFX tools. Vanity, by the way, our first product, is in line with that first branch of software.

That's obviously disruptive; it's making artists much faster. The second swath of innovation, though, is what we call end-to-end automation. Our second product, LipDub, falls into that category. Also in that category are text-to-video [00:12:00] models: Runway, Pika, and everybody saw OpenAI's Sora recently.

These models are extremely disruptive to the VFX space. They're not leverageable yet, from both a quality and a control standpoint. Text is a very limiting input; it's very hard to describe exactly what you're looking for in a video with text. But look at images, which were the craze last year. It's amazing how quickly we advanced, because that was literally groundbreaking, and now it's not that interesting anymore.

But it took a very similar path, where the first thing ever was text-to-image. And then people started to say: what if I don't want to start with text? What if I want to start with a picture I already have and manipulate it? Or what if I want to start with a sketch of a monster I'm conceptualizing and have the AI take it from there?

That's how that space evolved, right? More controls. Video is now in a similar place to where text-to-image was a year ago, where we're just now getting to a quality bar that's really [00:13:00] good. But the biggest issues are that text is very limiting and there's no temporal consistency across hundreds of shots.

If I wanted to recreate you across thousands of shots, you would look a little different in every shot, or the background might look different. And I can't perfectly control what I get from the models. As an example, when somebody's paying us, a VFX firm, to create some world for the opening scene of Game of Thrones, they want perfect control over exactly how the video looks and where the cameras are.

Edwin: They know what they want.

Matt: Yeah. So these text-to-video models are somewhat limited, but the flip side is that the potential is insane. We're talking about things that used to take armies of VFX artists months to do. While the control isn't there yet and the quality needs to get better, that can now be done with a keystroke.

Almost immediately.

Edwin: And all these tools, correct me if I'm wrong, are really just speeding things up, like super tools for the VFX industry. At the end of [00:14:00] the day, you still need these artists, these creatives, to really control it, for lack of a better word. I know you were using the word control.

Matt: Yeah, I think it just depends on the audience. You could make the argument that a VFX artist is going to be able to optimize a tool like Sora, or Kling, which just came out, or Dream Machine. Once the controls are there, a VFX artist is going to do the best job with those tools. But then you have other markets, like YouTube.

They never had the money for anything remotely close to this level of VFX. They're going to take the out-of-the-box stuff that you can get today, and I think they're going to start to do some amazing stuff on YouTube.

Edwin: They're outside of the industry, right? So you don't even know what they're thinking.

Matt: You don't know what they're thinking, but they've just never had access to this, right? Visual effects have always been primarily the real estate of Hollywood because they cost so much: tens and sometimes hundreds of millions. Now you have things that are very good that you can accomplish with a keystroke. [00:15:00]

And so that's why advertising is leaning in heavily. I think VFX in the advertising landscape is going to get disrupted pretty soon because of technology like text-to-video. And I think YouTubers are just going to have so much fun with it, because there are endless possibilities now for what I can do with my YouTube content, and I don't need an army of VFX artists to do it.

Edwin: Yeah, that's cool. I guess this is more on the personal side, but I wanted to ask you: what does a fully democratized world of FX across industries look like?

Matt: I think it looks like what I just described. I'll use the text-to-video example, like I just did, and I'll use the LipDub example.

Text-to-video represents democratized visual effects: visual effects being this thing that historically is unbelievably expensive, unbelievably time consuming, and only affordable for the biggest Hollywood productions. Now take a 13-year-old kid starting a YouTube channel, who [00:16:00] has a vision for a film that he wants to make. Will it look as good as a theatrical release? No, but he can make something that actually, from a visual standpoint, has the same elements as what you would see in a Hollywood film. That is the definition of democratization, right? It's accessibility to something that you've never had access to before.

It's going to cut across industries. Hollywood is going to adopt it once the controls are there. Other industries are going to be so excited about what it gives them today that they're going to start adopting it now. LipDub, too, was built for Hollywood, but it's cheap and fast, and that's why we're excited.

YouTubers who don't have insane budgets are now, for the first time, going to be able to localize their channels. Even Hollywood couldn't afford to lip sync content in the past. Across a 120-minute film, every character in every scene has to be lip synced, in every language; you're talking about [00:17:00] hundreds of millions of dollars. That's why they never attempted it. That's why even in Hollywood we still put up with bad dubs that aren't synced. Even Hollywood couldn't afford it.

Let alone the kid in his basement on YouTube. Now it's accessible not only to Hollywood, but it's going to be accessible to YouTubers. It's going to be accessible to the guy with a really cool course that he wants to put out in different languages. These tools are super fast and super affordable, which means they're accessible, and all these industries are going to be picking them up as a result.

Edwin: That's amazing, Matt. Thank you for sharing. What's the vision that you're building for the future?

Matt: Right now, we're really excited about LipDub.

We're really excited about the idea of helping to connect the global community by removing the language barriers that exist in media, namely video. We're seeing it in both Hollywood and YouTube. To your point about that YouTuber who was starting to dub: he's not the first one. MrBeast was the first, and we're now working with him to lip sync the rest of his stuff. [00:18:00] In Hollywood, shows like Squid Game, Money Heist, and Dark have proven how much people don't care where content's made; a good story is a good story. But there are barriers, both audio and visual. We think that by lifting those barriers, there's going to be an unbelievable level of connection in the global community that has never existed. Another example would be in the YouTube space: we're doing K-pop folks right now. In the past, if I were to watch their videos with captions, I would feel like I somewhat got to know them, but not in the same way as somebody who spoke my language.

And when I watch a lip-dubbed video of a K-pop star, I just feel like I know that person for the first time. We all live on this small planet, but there are these very real barriers that prevent us from connecting. For us, that's a really meaningful purpose; at the least, it's a product vision that we're going to focus on for quite some time.

And then over time, we're going to continue to work on more products that help democratize different elements [00:19:00] of visual effects. But certainly for now, this one's taking up a lot of our time.

Edwin: Yeah, that's great. Any final thoughts for the business leaders, entrepreneurs, and founders that are listening today?

Matt: Just generally?

Edwin: Advice, recommendations...

Matt: Like, for entrepreneurs?

Edwin: Or founders, or even executives.

Matt: Yeah, I don't know, it's a hard one. I'll just say, focus on the problem...

Edwin: It could be about navigating the disruptions of AI. What advice would you give? Because you're doing that right now.

Matt: What I'd say is there are two ways you can approach it. You can look at the needs of your users and ask yourself what needs are being unmet, what problem isn't being solved. But I also find, at least the way we did it:

We asked studios: what do you wish you had that you weren't getting? We learned that they loved the quality; they just wanted it faster and cheaper. The other thing we did is look at the trends in the underlying tech, and then ask yourself the question: if the tech continues on this trajectory, what will it be able to do?

And if it can do those things, what will that mean [00:20:00] for my space? I find sometimes people assume that the tech either doesn't work or isn't going to continue to accelerate. I think there are enough proof points at this point to say you should probably take the other stance, which is: assume the tech gets incredibly good, incredibly fast.

If it does, what will it mean for my business? As an example, there were so many chatbot companies that I don't think asked the question of how good LLMs would get and how quickly. If they had asked themselves that question, they might have known their core business was about to be disrupted. Even in our space, visual effects, when we started in 2019, the concept that AI would be able to generate, with a keystroke, something that looked even remotely as good as what a VFX artist could produce was laughable, and people laughed at us for pursuing the space. But we took the stance of saying: let's just assume that it's true.

Let's assume that AI is going to get really good at graphics quickly. What's that going to mean for our space? It would mean disruption of the core VFX model. So instead [00:21:00] of being part of that core VFX model, let's be part of that innovation. Look at the tech trends. Between asking your users what they need and looking at the tech trends you think are relevant, you're probably going to come up with a product in the matrix of those two questions.

Edwin: That's great advice. I love that advice, man. It's been an absolute pleasure. Thank you for joining us.

Matt: Thanks so much, Edwin.

Edwin: That's it, biz leader. Thank you for joining me on this special episode of The Business Leadership Podcast, part of our Future Narrative miniseries, which was recorded live at the Collision conference in Toronto, Canada.

This was, for me, an amazing conversation with Matt Panousis, exploring how MARZ is transforming the VFX industry.

For links to all the resources we discussed, to connect with Matt, and to learn more about the Future Narrative project, please swipe [00:22:00] into the show notes within the app that you're listening to now.

And if you are interested in reading, that's right, reading more about Matt and other business leaders we've profiled, please join the waitlist for our upcoming book.

And by the way, if you found any value in this episode, please subscribe, rate, and share it with the very first person that comes to mind who could benefit and, more importantly, would be grateful to hear from you. Your support helps us grow and brings you more great content.

Thanks again for tuning in and being part of our community. Until next time, have a 100X day.
