Brief overview of the process:
- I first captured a few screenshots from YouTube of the people I turned into animated characters.
- I uploaded them to ChatGPT’s image tool, Sora.
- Using the following prompt, I generated an image:
“3D Pixar-style animation, closed mouth smile, open eyes looking directly into the camera.”
- I recorded an audio clip of the person’s voice from YouTube, using Downie to extract the MP3. You can upload audio samples of up to 60 seconds in length (see the trimming sketch after this list).
- The generated image was uploaded to Lemon Slice (formerly Infinity AI), a video foundation model that enables expressive, talking characters.
- I also uploaded the audio to Lemon Slice so the system could clone their voice.
- For the script, I used Claude 3.7 Sonnet. Here’s an example prompt:
“Based on her LinkedIn profile (attached), please create a short (600-character) script that Liberty White can use to promote herself. I will be using the Pixar-style animations I created for Liberty to read this script. Please make sure it’s engaging.”
Your written script can be up to 3,600 characters in length (see the length-check sketch after this list).
- Finally, I used that script in Lemon Slice to generate the lip-synced video using the “Expressive” setting under Model V 2.5 (a rough API sketch follows this list).
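Since Lemon Slice caps voice samples at 60 seconds, it can help to trim the extracted MP3 before uploading. Here’s a minimal sketch using the pydub library (my own choice of tool, not part of the workflow above; the filenames are placeholders, and pydub needs ffmpeg installed):

```python
from pydub import AudioSegment  # pip install pydub; requires ffmpeg

# Load the MP3 extracted with Downie (filename is hypothetical).
clip = AudioSegment.from_mp3("voice_sample.mp3")

# Keep only the first 60 seconds (pydub slices in milliseconds).
trimmed = clip[:60_000]

# Export the trimmed sample for upload to Lemon Slice.
trimmed.export("voice_sample_60s.mp3", format="mp3")
```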
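Because generated scripts can run long, a quick length check against the 3,600-character cap (and the 600-character target from my Claude prompt) avoids surprises at upload time. A minimal sketch; the two limits are the ones quoted above, everything else is illustrative:

```python
MAX_CHARS = 3_600    # Lemon Slice's script limit, quoted above
TARGET_CHARS = 600   # the length requested in the Claude prompt

def check_script(script: str) -> None:
    """Warn if a script exceeds the target, fail if it exceeds the hard cap."""
    n = len(script)
    if n > MAX_CHARS:
        raise ValueError(f"Script is {n} characters; Lemon Slice accepts up to {MAX_CHARS}.")
    if n > TARGET_CHARS:
        print(f"Warning: {n} characters exceeds the {TARGET_CHARS}-character target.")
    else:
        print(f"OK: {n} characters.")

check_script("Hi, I'm Liberty White...")  # placeholder script text
```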
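If you’d rather automate the final generation step than use the web UI, Lemon Slice also exposes an API. The sketch below is only a rough illustration: I haven’t verified the endpoint, field names, or response schema, so LEMONSLICE_API_URL, voice_id, text, and expressiveness are all assumptions here; check the official Lemon Slice API docs for the real ones.

```python
import os
import requests

# Endpoint and field names below are assumptions for illustration only;
# consult the Lemon Slice API documentation for the actual schema.
LEMONSLICE_API_URL = "https://lemonslice.com/api/v2/generate"  # hypothetical

payload = {
    "img_url": "https://example.com/liberty_pixar.png",  # the Sora-generated image
    "voice_id": "cloned-voice-id",      # hypothetical id from the voice-clone upload
    "text": "Hi, I'm Liberty White...", # the Claude-generated script
    "model": "V2.5",                    # the model version used above
    "expressiveness": "expressive",     # or "medium" for a cleaner result
}

resp = requests.post(
    LEMONSLICE_API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['LEMONSLICE_API_KEY']}"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # presumably a job id or video URL
```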
Alex Lindsay
Liberty White
I’ve included this video as an example of what can occasionally go wrong when using the “Expressive” setting in Lemon Slice. It may introduce unexpected visual artefacts, as you’ll notice here.
If that happens, I recommend either regenerating the video or switching to the “Medium” setting for a cleaner, more reliable result.
Tony Mobley
Try to capture an expressive voice segment; I find those work best when making a clone of a person’s voice.