Generate Stunning AI Images for Free Using Diffusion Models

usmanmalik57

In this tutorial, you will see how to generate stunning images from text prompts using state-of-the-art diffusion models from Hugging Face. You'll learn about base diffusion models and how combining them with a refiner produces more detailed results. Diffusion models are powerful because they start from pure noise and iteratively denoise it into an image.

Advanced generative AI tools like Midjourney and OpenAI DALL·E 3 use diffusion models to generate photo-realistic AI images. However, these services charge fees to generate images. With openly available diffusion models from Hugging Face, you can generate AI images for free. So, let's dive in!

Installing Required Libraries

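To begin, let's install the libraries necessary for this project. Execute the following commands in a notebook cell to get all dependencies ready (the leading ! runs a shell command from a notebook; drop it if you run the commands in a terminal):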

!pip install diffusers --upgrade
!pip install invisible_watermark transformers accelerate safetensors
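Before moving on, it can help to confirm that the packages actually landed in the active environment. The snippet below is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration, and the distribution names in the list are assumed to match the PyPI project names (a package that is not found is simply reported as None).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Names assumed to match the PyPI distributions installed above, plus torch,
# which the scripts in this tutorial import directly.
required = ["diffusers", "transformers", "accelerate", "safetensors",
            "invisible-watermark", "torch"]

for name, ver in installed_versions(required).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

If any line prints NOT INSTALLED, rerun the install commands before loading a pipeline.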

Generating AI Images Using Base Diffusion Models

Some state-of-the-art text-to-image diffusion models, SDXL among them, consist of a base model and a refiner. We'll first generate an image using only the base diffusion model. We will use the stabilityai/stable-diffusion-xl-base-1.0 (SDXL) model for image generation. SDXL employs an ensemble of expert models for latent diffusion. Initially, the base model generates (noisy) latent images, which are then refined by a specialized model during the final denoising stages. You can also use other text-to-image diffusion models from Hugging Face.

The following Python script initializes a Hugging Face pipeline for the diffusion model and sets it up for GPU acceleration.


from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16,
                                         use_safetensors=True,
                                         variant="fp16")
pipe.to("cuda")
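The script above assumes a CUDA-capable GPU is present. As a hedged sketch, the helper below (pick_device_and_precision is a name invented here, not part of diffusers) checks what is available and falls back to the CPU with full precision, since half precision is generally meant for GPU inference. Expect CPU generation with SDXL to be very slow.

```python
import importlib.util


def pick_device_and_precision():
    """Return (device, use_half): "cuda" with half precision if a GPU is usable,
    otherwise "cpu" with full precision."""
    if importlib.util.find_spec("torch") is None:
        # torch is not installed, so there is nothing to accelerate with.
        return "cpu", False
    import torch
    if torch.cuda.is_available():
        return "cuda", True
    return "cpu", False


device, use_half = pick_device_and_precision()
print(device, use_half)

# How the result could feed the pipeline call from the tutorial (illustrative):
# pipe = DiffusionPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0",
#     torch_dtype=torch.float16 if use_half else torch.float32,
#     use_safetensors=True,
#     variant="fp16" if use_half else None,
# )
# pipe.to(device)
```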

The next step is to pass a text prompt to the pipeline's prompt parameter. As the script below shows, the pipeline returns its results in the images list, and the first element of that list is the generated image.


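prompt = "A texas ranger riding a white horse"

# The pipeline returns a list of images; take the first (and only) one
image = pipe(prompt=prompt).images[0]

# As the last expression in a notebook cell, this displays the image
image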

Output:

image1.png

Look at the image generated above; isn't it cool? The SDXL license also allows commercial use, but check the license terms on the model card before relying on that.
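To keep a generated image around, you can write it to disk. By default the pipeline returns PIL images, which expose a save method. The sketch below derives a file name from the prompt; both helper names (prompt_to_filename and save_generated) are hypothetical conveniences introduced here, not part of diffusers.

```python
import re
from pathlib import Path


def prompt_to_filename(prompt, suffix=".png", max_len=60):
    """Turn a free-text prompt into a filesystem-friendly file name."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_len].rstrip("-")
    return (slug or "image") + suffix


def save_generated(pil_image, prompt, out_dir="."):
    """Save any image object exposing .save(path) and return the path used."""
    out_path = Path(out_dir) / prompt_to_filename(prompt)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    pil_image.save(out_path)
    return out_path


print(prompt_to_filename("A texas ranger riding a white horse"))
# prints: a-texas-ranger-riding-a-white-horse.png
```

With the image from the script above in hand, a call such as save_generated(generated_image, prompt, "outputs") would write it under an outputs folder, where generated_image stands for whatever variable holds the pipeline result.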

Generating Refined Images Using Ensemble of Experts

Using an ensemble of experts, that is, a base model followed by a refiner, you can create more refined images. To do so, you first create the base pipeline as you did before. Next, you create a refiner pipeline that reuses the base pipeline's second text encoder and VAE, so those components are not loaded twice.

The refiner will build upon the image created by the base model to deliver a more polished, detailed final output.

The script below creates our base model and refiner.


from diffusers import DiffusionPipeline
import torch

# load both base & refiner
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
    use_safetensors=True
)
base.to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

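In the following script, we specify that the ensemble of experts should take 40 denoising steps in total to generate an image from noise. The base model handles the first 80% (32 steps), stops there (denoising_end), and returns latents instead of a decoded image (output_type="latent"). The refiner then picks up from the same point (denoising_start) and uses the remaining 20% (8 steps) to refine the image.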


n_steps = 40
high_noise_frac = 0.8

prompt = "A panda sitting on a table having a drink in a wooden room"

# run both experts
image = base(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
).images

image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
).images[0]

image

Output:

image2.png

From the above output, you can see a cute panda having a drink in a wooden room. Excellent, isn't it?
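If you want to experiment with a different split between the two experts, the arithmetic behind the 80/20 example is easy to check. The function below is only an illustration of that arithmetic (split_steps is a name made up here); how diffusers maps the fraction onto scheduler timesteps internally may round differently for values that do not divide evenly.

```python
def split_steps(n_steps, high_noise_frac):
    """Return (base_steps, refiner_steps) for a given total and base fraction."""
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")
    base_steps = round(n_steps * high_noise_frac)
    return base_steps, n_steps - base_steps


base_steps, refiner_steps = split_steps(40, 0.8)
print(base_steps, refiner_steps)  # prints: 32 8
```

A lower fraction hands more of the steps to the refiner, and a higher one leaves more of them with the base model.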

Conclusion

Diffusion models allow you to create stunning AI images. You can use diffusion models from Hugging Face to generate AI images for free.

In this tutorial, we employed the SDXL model for image generation. The base model generates (noisy) latent images, which are then refined by a specialized model during the final denoising stages. The base model can also function independently as a standalone module.

I invite you to try these models and share what you generated.

rproffitt

Read this today:

A.I. made me believe in the concept of the human soul by showing me what art looks like without it.
