instruct_pix2pix problem #7678
              
mechigonft asked this question in Q&A (unanswered, 0 replies)
I'm using the instruct_pix2pix training method to regenerate backgrounds for cut-out food images. However, I've noticed that the generated backgrounds often contain numerous fragmented and distorted cups, plates, and bowls. What could be the reason for this? I've examined my training data, and although it also includes cups, plates, and bowls, there is only one of each, and all are in their normal shape. Could you help me look into this issue?

cut-out food image:
after regenerating the background:

my training data example:

input_image:
edited_image:

training script:
export MODEL_NAME="/models/stable-diffusion-v1-5"
export DATASET_ID=""
export OUTPUT_DIR=""
accelerate launch --mixed_precision="fp16" /ossfs/workspace/diffusers/examples/instruct_pix2pix/train_instruct_pix2pix.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_ID \
  --enable_xformers_memory_efficient_attention \
  --resolution=256 --random_flip \
  --train_batch_size=1 --gradient_accumulation_steps=1 --gradient_checkpointing \
  --max_train_steps=5000 \
  --checkpointing_steps=10000 --checkpoints_total_limit=1 \
  --learning_rate=5e-05 --max_grad_norm=1 --lr_warmup_steps=0 \
  --conditioning_dropout_prob=0.05 \
  --mixed_precision=fp16 \
  --seed=42 \
  --output_dir=$OUTPUT_DIR
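For reference, the arithmetic implied by these flags can be sketched as below. This is only a rough check, not part of the original script: it assumes a single-process run (the post does not say how many GPUs were used) and that the training script writes an intermediate checkpoint each time the global step reaches a multiple of `--checkpointing_steps`. The variable names are illustrative.

```python
# Values copied from the training command above.
train_batch_size = 1
gradient_accumulation_steps = 1
max_train_steps = 5000
checkpointing_steps = 10000

# Assumption: single-process run; the post does not state the number of GPUs.
num_processes = 1

# Samples contributing to each optimizer step.
effective_batch_size = train_batch_size * gradient_accumulation_steps * num_processes

# Intermediate checkpoints written before training stops,
# assuming one is saved whenever global_step % checkpointing_steps == 0.
intermediate_checkpoints = max_train_steps // checkpointing_steps

print("effective batch size:", effective_batch_size)
print("intermediate checkpoints:", intermediate_checkpoints)
```

Under these assumptions each optimizer step sees a single sample, and no intermediate checkpoint is written because `--checkpointing_steps` is larger than `--max_train_steps`; only the final output would be saved.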
inference script:
import PIL.Image
import PIL.ImageOps
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

model_id = ''  # <- replace this
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
generator = torch.Generator("cuda").manual_seed(0)

image_path = '/ossfs/workspace/result.png'


def download_image(image_path):
    # Despite the name, this loads a local file and normalizes it to RGB.
    image = PIL.Image.open(image_path)
    image = PIL.ImageOps.exif_transpose(image)  # apply the EXIF orientation, if any
    image = image.convert("RGB")
    return image


image = download_image(image_path)

# Prompts tried; only the last assignment takes effect.
# prompt = 'replace the background with a clean and concise background, simple and clean'
# prompt = 'replace the background picture to pure white background'
prompt = 'extend background'

num_inference_steps = 20
image_guidance_scale = 1.5
guidance_scale = 10

edited_image = pipe(
    prompt,
    # The pipeline's keyword argument is negative_prompt (written as ng_prompt in the original post).
    # negative_prompt='other food and drinks, white empty cups, white empty bowls, white empty plates, cutlery, knives and forks, chopsticks, complex background',
    negative_prompt='cups, bowls, plates',
    image=image,
    num_inference_steps=num_inference_steps,
    image_guidance_scale=image_guidance_scale,
    guidance_scale=guidance_scale,
    generator=generator,
).images[0]
edited_image.save("/ossfs/workspace/result_extend_background.png")
System Info
$ diffusers-cli env
Setting ds_accelerator to cuda (auto detect)
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.
- diffusers version: 0.28.0.dev0
- Platform: Linux-5.10.134-13.al8.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.16
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- Huggingface_hub version: 0.22.2
- Transformers version: 4.33.2
- Accelerate version: 0.21.0
- xFormers version: 0.0.21
- Using GPU in script?:
- Using distributed or parallel set-up in script?: