My tests show the character has a mask on his face #16

Open
zhanghongyong123456 opened this issue Dec 8, 2023 · 16 comments

@zhanghongyong123456

  1. result:
    image

  2. src:
    image

What causes this, and how can I get rid of it?

@Elsaam2y
Owner

This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail around the lips, since the model wasn't trained on similar skin tones like those in this video. It would be better to try changing the skin color in this video.
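If it helps, here is a minimal sketch (not part of this repo) of one way to nudge a video's color distribution toward a reference face before running inference. The file names are placeholders and the LAB mean/std transfer is just one possible choice:

```python
# Hypothetical preprocessing sketch: shift a video's color distribution toward a
# reference face image before lip-sync inference, via mean/std transfer in LAB space.
import cv2
import numpy as np

def color_transfer(frame, ref_stats):
    """Match the LAB mean/std of `frame` to precomputed reference statistics."""
    lab = cv2.cvtColor(frame, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(0)
    std = lab.reshape(-1, 3).std(0) + 1e-6
    ref_mean, ref_std = ref_stats
    lab = (lab - mean) / std * ref_std + ref_mean
    return cv2.cvtColor(np.clip(lab, 0, 255).astype(np.uint8), cv2.COLOR_LAB2BGR)

# Reference statistics from a face image with skin tones the model handles well.
ref = cv2.cvtColor(cv2.imread("reference_face.png"), cv2.COLOR_BGR2LAB).astype(np.float32)
ref_stats = (ref.reshape(-1, 3).mean(0), ref.reshape(-1, 3).std(0))

cap = cv2.VideoCapture("input.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("recolored.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(color_transfer(frame, ref_stats))
cap.release()
out.release()
```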

@zhanghongyong123456
Author

> This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail around the lips, since the model wasn't trained on similar skin tones like those in this video. It would be better to try changing the skin color in this video.

Is there any other good driving method?

@Inferencer

Inferencer commented Dec 12, 2023

> This is probably caused by the color distribution in your video. The face mask is detected correctly, but the merged lip-sync output simply shows no detail around the lips, since the model wasn't trained on similar skin tones like those in this video. It would be better to try changing the skin color in this video.
>
> Is there any other good driving method?

If you want to animate a single image as shown above, D-ID would be your best bet; you can get free credits with every email you sign up with, and I'm fairly sure it handles facial hair and skin discoloration. Other than that, there is wav2lip + upscaling, but the results would be poor. The rest of the tools out there are trained on real people without facial hair and with typical skin, so you would get results similar to the above, if an error didn't appear first. Possibly SadTalker would work too, and it's open source. If you need to animate it, you could look into compositing the results onto the original video with masking in a video editor, although I'm fairly sure SadTalker can copy the head movements.
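For the masked-compositing idea, here is a minimal per-frame sketch (file names are placeholders, not tied to any of the tools above) of blending a generated face frame back onto the original frame with a feathered mask:

```python
# Hypothetical compositing sketch: paste a generated (lip-synced) frame back onto
# the original frame using a soft mask, so only the mouth region is replaced.
import cv2
import numpy as np

original = cv2.imread("original_frame.png")
generated = cv2.imread("generated_frame.png")          # same size as original
mask = cv2.imread("mouth_mask.png", cv2.IMREAD_GRAYSCALE)  # white = replace

# Feather the mask edges so the seam is less visible.
mask = cv2.GaussianBlur(mask, (31, 31), 0).astype(np.float32) / 255.0
mask = mask[..., None]  # broadcast over the 3 color channels

blended = (generated.astype(np.float32) * mask +
           original.astype(np.float32) * (1.0 - mask)).astype(np.uint8)
cv2.imwrite("composited_frame.png", blended)
```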

@zhanghongyong123456
Author

> D-ID

  1. Thank you very much for your reply. Now I know it is because of the hair and skin color; I always thought the face was too blurry to be detected. I wonder whether D-ID is based on SadTalker training: I find that D-ID and SadTalker give similar results (some head movement is allowed, the rest is forbidden). Or does D-ID just use SadTalker's way of processing video, but get better results than SadTalker?

  2. I found a project that seems to work very well; have you looked at it? [https://stylelipsync.github.io/] I could get the predicted mesh but not the subsequent mouth mask. Can you give some advice on how to get the mouth mask? (A rough sketch follows below.)
    image
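One possible way to go from predicted lip landmarks/mesh vertices to a mouth mask (a hedged sketch, not StyleLipSync code; `lip_points` stands in for whatever 2D lip vertices you can project from the predicted mesh for one frame):

```python
# Hedged sketch: rasterize a mouth mask from predicted 2D lip points by filling
# their convex hull and dilating it slightly.
import cv2
import numpy as np

def mouth_mask_from_landmarks(lip_points, frame_h, frame_w, dilate_px=10):
    """Fill the convex hull of the lip points and grow it a little."""
    mask = np.zeros((frame_h, frame_w), dtype=np.uint8)
    hull = cv2.convexHull(np.asarray(lip_points, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    if dilate_px > 0:
        kernel = np.ones((dilate_px, dilate_px), np.uint8)
        mask = cv2.dilate(mask, kernel)
    return mask

# Example with made-up points for a 256x256 frame:
lips = [(120, 200), (140, 190), (160, 195), (170, 210), (150, 225), (130, 220)]
mask = mouth_mask_from_landmarks(lips, frame_h=256, frame_w=256)
cv2.imwrite("mouth_mask.png", mask)
```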

@Inferencer

> D-ID
>
> 1. Thank you very much for your reply. Now I know it is because of the hair and skin color; I always thought the face was too blurry to be detected. I wonder whether D-ID is based on SadTalker training: I find that D-ID and SadTalker give similar results (some head movement is allowed, the rest is forbidden). Or does D-ID just use SadTalker's way of processing video, but get better results than SadTalker?
>
> 2. I found a project that seems to work very well; have you looked at it? [https://stylelipsync.github.io/] I could get the predicted mesh but not the subsequent mouth mask. Can you give some advice on how to get the mouth mask?
>    (image)

I also downloaded that repo; it's a shame they deleted their code and didn't release their training scripts. There's no point looking into it further if they are not continuing (I'm assuming they plan to keep it closed now). And yeah, D-ID is based off of SadTalker; they just did some magic to it.

@davidmartinrius

The other magic is https://github.com/thygate/stable-diffusion-webui-depthmap-script

You can use depth maps to create videos from images (or the new Stable Video Diffusion, SVD).

Depth maps + SadTalker and you get awesome talking faces with animated videos.

Actually, you can do everything with AUTOMATIC1111, centralized in just one app.

You are welcome.

David Martin Rius
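For the SadTalker step of that workflow, a minimal sketch of driving a single (depth-animated or plain) portrait with audio via SadTalker's command-line interface. The flag names below are taken from SadTalker's README as I remember it; treat them as assumptions and check against your local checkout:

```python
# Hedged sketch: call SadTalker's inference script on a portrait + audio clip.
# Assumes a SadTalker checkout in ./SadTalker with checkpoints downloaded;
# "portrait.png" and "speech.wav" are placeholder inputs.
import subprocess

subprocess.run(
    [
        "python", "inference.py",
        "--driven_audio", "speech.wav",
        "--source_image", "portrait.png",
        "--result_dir", "./results",
        "--still",               # keep head motion minimal
        "--preprocess", "full",  # keep the full frame instead of a tight crop
    ],
    cwd="SadTalker",
    check=True,
)
```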

@davidmartinrius


> I also downloaded that repo; it's a shame they deleted their code and didn't release their training scripts. There's no point looking into it further if they are not continuing (I'm assuming they plan to keep it closed now). And yeah, D-ID is based off of SadTalker; they just did some magic to it.

By the way, I think the code was never released. There was just a README, but they finally deleted the repo.

@Inferencer


> By the way, I think the code was never released. There was just a README, but they finally deleted the repo.

Yeah, the code was released (not the training code). I downloaded the pretrained model + code, then they disappeared two days later. Unfortunately, the model is person-specific, not generalized.
I've seen a couple of repos do that now, especially when they decide to switch to a B2C/paid model or have concerns about the use of their work in the wild, so now I grab them regardless of whether they're complete.

@zhanghongyong123456
Author


> Yeah, the code was released (not the training code). I downloaded the pretrained model + code, then they disappeared two days later. Unfortunately, the model is person-specific, not generalized. I've seen a couple of repos do that now, especially when they decide to switch to a B2C/paid model or have concerns about the use of their work in the wild, so now I grab them regardless of whether they're complete.

Thank you very much for your reply. I found this project, but the author said the results are too good, so they did not publish the inference model:
https://hangz-nju-cuhk.github.io/projects/StyleSync

And there is the latest project, GAIA: Zero-shot Talking Avatar Generation. Unfortunately, its project homepage is also down.
image

@zhanghongyong123456
Author

> The other magic is https://github.com/thygate/stable-diffusion-webui-depthmap-script
>
> You can use depth maps to create videos from images (or the new Stable Video Diffusion, SVD).
>
> Depth maps + SadTalker and you get awesome talking faces with animated videos.
>
> Actually, you can do everything with AUTOMATIC1111, centralized in just one app.

  1. I want to achieve voice-driven mouth movement. How do I combine depth maps with SadTalker? I have only briefly tested SadTalker.
  2. About SVD: it just moves the whole picture; it doesn't make only the mouth move.

@flipkast

Hi @Inferencer, you said that you downloaded the pretrained model and code of StyleLipSync. Can you share it privately? We can figure out how to train new models, etc. I run a service related to lip sync, and we have a team that will help us on this.

@Inferencer

> Hi @Inferencer, you said that you downloaded the pretrained model and code of StyleLipSync. Can you share it privately? We can figure out how to train new models, etc. I run a service related to lip sync, and we have a team that will help us on this.

https://drive.google.com/drive/folders/1W9RAyqu2hwrieaWtGG19GmSjkkhreyIA?usp=sharing

@Inferencer


> Thank you very much for your reply. I found this project, but the author said the results are too good, so they did not publish the inference model: https://hangz-nju-cuhk.github.io/projects/StyleSync
>
> And there is the latest project, GAIA: Zero-shot Talking Avatar Generation. Unfortunately, its project homepage is also down.

I recently found the new homepage for GAIA: https://gaiavatar.github.io/gaia/

@paulovasconcellos-hotmart

Did you guys manage to reproduce the StyleLipSync training algorithm?

@Inferencer

> Did you guys manage to reproduce the StyleLipSync training algorithm?

Nope, but this is coming next month; you could drive it with a 3DMM or something so that it is controlled with audio rather than a driving video:
https://yudeng.github.io/Portrait4D-v2/

@paulovasconcellos-hotmart

Do you have any recommendations for open-source models for lip sync that can be used commercially? All the ones I'm finding have (1) no code or (2) a non-commercial license.
