Getting the Best Virtual Try On Results: Part II

A short visual guide for the additional sampling controls in the FASHN app and API.

Written by Dan Bochman | September 26, 2024


Introduction

In this short guide, we cover the advanced controls FASHN provides for its diffusion-based virtual try-on model, with visual examples showing how to get successful generations across a range of scenarios. The controls discussed are: Timesteps, Guidance Scale, Seed, and Number of Samples.

[Image: the sampling controls panel in the FASHN app]

Diffusion 101

Before diving into the next sections, it’s helpful to have a basic understanding of how diffusion models work for image generation, without getting too technical.

Diffusion models edit images by gradually adding and then removing noise over a series of steps. At each denoising step, the model replaces noise with new pixels, guided by the instructions we provide — such as a text prompt or, in our case, an image of the garment we want to try on. In the earlier steps, more noise is added, allowing many pixels to be altered; in the later steps, less noise is added, refining only small details.
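To make this concrete, here is a deliberately toy denoising loop in Python. The denoiser is a placeholder stand-in, not FASHN's actual model; only the overall shape of the process — stepping from heavy to light noise with a seeded random generator — mirrors the description above.

```python
import numpy as np

def denoise_step(x, t, timesteps, garment):
    # Placeholder stand-in: a real model would predict new pixels here,
    # guided by the garment image (and/or a text prompt).
    return x * (1.0 - t / timesteps)

def sample(person, garment, timesteps=50, seed=None):
    rng = np.random.default_rng(seed)      # the seed fixes all randomness
    x = rng.standard_normal(person.shape)  # start from pure noise
    for t in reversed(range(1, timesteps + 1)):
        x = denoise_step(x, t, timesteps, garment)
        # Early steps (large t) re-inject a lot of noise, so coarse structure
        # can still change; late steps add very little, refining fine details.
        x = x + (t / timesteps) * rng.standard_normal(x.shape)
    return x
```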

For example, let’s consider the following inputs for virtual try-on, where we want to transfer the black t-shirt onto the model:

[Image: the input model photo and garment image]

Model: https://www.instagram.com/ireneisgood

Garment: https://www.pinterest.fr/pin/326440673004655181/

Now let’s visualize what the try-on process looks like behind the scenes over a series of steps:

[Image: intermediate steps of the denoising process, from noise to the final try-on result]

Pretty cool, huh? 😎 By gradually restoring the image from noise, we get to intervene at intermediate steps and modify the original image!

Now, let’s go over some of the parameters FASHN provides to control this process.

Timesteps

The default (and maximum) number of steps FASHN uses is 50. This is sufficient for virtually all cases, ensuring a smooth generation process with enough steps to edit both coarse and fine details in the image.

[Image: try-on result with the default 50 timesteps]

Model: FASHN AI

Garment: https://www.cefinn.com/products/riley-funnel-neck-blouse-cornflower-blue

The result is great, but for relatively simple cases like the one above, we can often achieve an equally good result with significantly fewer steps, reducing the time required to generate an image.

Let’s compare how the virtual try-on results change for the same inputs when using 10, 20, 30, 40, or 50 timesteps:

[Image: try-on results at 10, 20, 30, 40, and 50 timesteps]

We can see that 10 timesteps were insufficient for an accurate result, but 20 timesteps produced a result just as good as 50 timesteps, with less than half the runtime.

Now, let’s try a slightly more challenging example, such as the Prada t-shirt from earlier, using the same FASHN AI model.

[Image: Prada t-shirt try-on results at 10 to 50 timesteps]

In this case, 20 steps were not enough for a good result, but the output stabilizes at 30 steps.

For the final experiment, let’s try a difficult pose and a complex garment combination:

[Image: a difficult pose paired with a complex garment]

Model: FASHN AI

Garment: https://www.urbanoutfitters.com/shop/kimchi-blue-katie-mesh-floral-graphic-long-sleeve-tee

[Image: difficult-pair try-on results at 10 to 50 timesteps]

Even for this challenging virtual try-on task, 30 steps were enough to achieve a high-quality result, and running additional steps does not improve it further.

At the time of writing this guide, we were unable to find a natural example where 30 timesteps were insufficient. However, more timesteps can provide added stability with respect to the initial noise, or seed, as we’ll explore in the following sections.
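To experiment with this parameter programmatically, a sweep might look like the sketch below. The endpoint and parameter names (`/v1/run`, `model_image`, `garment_image`, `timesteps`) are assumptions based on the controls described in this guide; consult the API reference for the exact request schema.

```python
import os
import requests

API_URL = "https://api.fashn.ai/v1/run"  # assumed endpoint; check the API docs
HEADERS = {"Authorization": f"Bearer {os.environ['FASHN_API_KEY']}"}

MODEL_IMAGE = "https://example.com/model.jpg"      # your own input URLs
GARMENT_IMAGE = "https://example.com/garment.jpg"

def run_tryon(timesteps):
    # Parameter names are assumptions mirroring the controls in this guide.
    payload = {
        "model_image": MODEL_IMAGE,
        "garment_image": GARMENT_IMAGE,
        "timesteps": timesteps,
    }
    response = requests.post(API_URL, json=payload, headers=HEADERS)
    response.raise_for_status()
    return response.json()  # may contain a job id to poll for the result

# Find the smallest step budget that is stable for your inputs
# (30 was enough for every example in this guide).
for steps in (10, 20, 30, 40, 50):
    print(steps, run_tryon(steps))
```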

Guidance Scale

The guidance scale can be interpreted as the degree to which we force the diffusion model to incorporate the instruction guidance (in our case, the garment image) at each step.

The primary practical use case for adjusting this parameter arises when the try-on generation is perfect in terms of fit and details but slightly off regarding color saturation.

Let’s explore an example with a flat lay graphic t-shirt:

[Image: default try-on result with the Canyonland graphic t-shirt]

Model: FASHN AI

Garment: https://www.urbanoutfitters.com/shop/hybrid/canyonland-utah-tee

We achieve an impressive result in terms of fitting the flat lay image and maintaining consistency in the graphic print details; however, the colors are a bit oversaturated. This is precisely where tweaking the guidance scale can make a difference. Let’s experiment with a few different values.

[Image: try-on results across a range of guidance scale values]

This is subjective, of course, but I believe most will agree that the leftmost result, generated with a guidance scale of 1.5, is the truest to the source in this series.

As a rule of thumb, if you want colors to appear softer and less saturated, try decreasing the guidance scale. Conversely, if the colors or details are not “popping” enough, try increasing it!
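In code, this is a one-parameter change. Here's a short sweep under the same assumed request shape as the Timesteps sketch above (`guidance_scale` as the parameter name is likewise an assumption):

```python
import os
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['FASHN_API_KEY']}"}
payload = {
    "model_image": "https://example.com/model.jpg",           # your own input URLs
    "garment_image": "https://example.com/flat-lay-tee.jpg",
}

# Lower values -> softer, less saturated colors; higher values -> stronger,
# more saturated colors and sharper details.
for scale in (1.5, 2.0, 2.5, 3.0):
    response = requests.post(
        "https://api.fashn.ai/v1/run",              # assumed endpoint
        json={**payload, "guidance_scale": scale},  # assumed parameter name
        headers=HEADERS,
    )
    response.raise_for_status()
    print(scale, response.json())
```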

Seed

Remember how we mentioned that noise is added at each step? This noise is inherently random and significantly influences the generation result. In most computer programs, randomness isn’t truly random; you can reproduce a random sequence of operations using what’s called a Seed.

This is useful when you want to either (see the sketch below):

  1. Reproduce earlier results

  2. Force different results
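Here's a hedged sketch of both uses, with the same assumed request shape as the earlier examples (`seed` as the parameter name is an assumption):

```python
import os
import random
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['FASHN_API_KEY']}"}
payload = {
    "model_image": "https://example.com/model.jpg",     # your own input URLs
    "garment_image": "https://example.com/garment.jpg",
}

def run_tryon(seed):
    response = requests.post(
        "https://api.fashn.ai/v1/run",   # assumed endpoint
        json={**payload, "seed": seed},  # assumed parameter name
        headers=HEADERS,
    )
    response.raise_for_status()
    return response.json()

reproduced = run_tryon(seed=42)  # 1. fixed seed: reproduces an earlier result
different = run_tryon(seed=random.randint(0, 2**31 - 1))  # 2. new seed: forces a different result
```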

Number of Samples

Directly related to the seed, the Number of Samples parameter allows you to generate multiple images at once, each with different noise. This can increase your chances of obtaining a good result and is especially useful when working with fewer timesteps, as results can be more volatile.

Here’s an example of four images generated with 30 timesteps and a guidance scale of 2.5, where the only difference is the seed:

[Image: four samples generated with identical settings and different seeds]

The second-from-the-left result is great, while the others are quite poor. If someone were to generate only a single image and receive the first (left) result, they might conclude that the technology doesn’t work.
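If you'd rather not manage seeds yourself, you can request a batch in a single call. Here's a sketch matching the example above (30 timesteps, guidance scale 2.5), with `num_samples` as an assumed parameter name:

```python
import os
import requests

HEADERS = {"Authorization": f"Bearer {os.environ['FASHN_API_KEY']}"}

# Four samples per call, each drawn with a different seed internally.
payload = {
    "model_image": "https://example.com/model.jpg",     # your own input URLs
    "garment_image": "https://example.com/garment.jpg",
    "timesteps": 30,
    "guidance_scale": 2.5,
    "num_samples": 4,                                   # assumed parameter name
}
response = requests.post("https://api.fashn.ai/v1/run", json=payload, headers=HEADERS)
response.raise_for_status()
print(response.json())  # inspect all four results and keep the best one
```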

❣️ TL;DR Tips

🧘🏻 If time is not an issue, keeping the default of 50 timesteps and setting the Number of Samples to 4 will maximize your chances of getting a good result with each run.

🏃 For fast experimentation, set timesteps to 30; if you get a bad result, change the seed and retry.

🌈 If colors are oversaturated and/or details are too sharp, decrease the guidance scale.

Closing Words

In this short guide, we explored the advanced Sampling Controls that FASHN provides, allowing for finer control over the image generation process. We discussed the trade-offs of different configurations and how to adjust the controls based on your work session needs (e.g., fast experimentation).

Still not getting satisfactory results? Don’t be discouraged! If you haven’t already, join our Discord community and share with us what you’re trying to achieve. The FASHN team and other community members are here to help you get the results you’re looking for.