StableDiffusion3ImageToImage class
keras_hub.models.StableDiffusion3ImageToImage(backbone, preprocessor, **kwargs)
An end-to-end Stable Diffusion 3 model for image-to-image generation.
This model has a generate() method, which generates images based on a combination of a reference image and a text prompt.
Arguments
- backbone: A keras_hub.models.StableDiffusion3Backbone instance.
- preprocessor: A keras_hub.models.StableDiffusion3TextToImagePreprocessor instance.
Examples
Use generate() to do image generation.
import numpy as np
import keras_hub

image_to_image = keras_hub.models.StableDiffusion3ImageToImage.from_preset(
    "stable_diffusion_3_medium", image_shape=(512, 512, 3)
)
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    }
)
# Generate with batched prompts.
image_to_image.generate(
    {
        "images": np.ones((2, 512, 512, 3), dtype="float32"),
        "prompts": ["cute wallpaper art of a cat", "cute wallpaper art of a dog"],
    }
)
# Generate with different `num_steps`, `guidance_scale` and `strength`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    },
    num_steps=50,
    guidance_scale=5.0,
    strength=0.6,
)
# Generate with `negative_prompts`.
image_to_image.generate(
    {
        "images": np.ones((512, 512, 3), dtype="float32"),
        "prompts": "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
        "negative_prompts": "green color",
    }
)
from_preset method
StableDiffusion3ImageToImage.from_preset(preset, load_weights=True, **kwargs)
Instantiate a keras_hub.models.Task from a model preset.
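A preset is a directory of configs, weights and other file assets used to save and load a pre-trained model. The preset can be passed as one of:
1. a built-in preset identifier like 'bert_base_en'
2. a Kaggle Models handle like 'kaggle://user/bert/keras/bert_base_en'
3. a Hugging Face handle like 'hf://user/bert_base_en'
4. a path to a local preset directory like './bert_base_en'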
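The four preset forms listed above can be told apart by their prefix. The sketch below is only an illustration of that naming convention: `describe_preset` is a hypothetical helper, not part of KerasHub, and it does not reproduce how the library actually resolves a preset.

```python
def describe_preset(preset: str) -> str:
    """Hypothetical helper: label which of the four preset forms a string uses."""
    if preset.startswith("kaggle://"):
        return "kaggle"
    if preset.startswith("hf://"):
        return "huggingface"
    # Assumption: strings that start like a filesystem path are local directories.
    if preset.startswith(("./", "../", "/", "~")):
        return "local"
    # Anything else is treated as a built-in preset identifier.
    return "builtin"


for handle in (
    "bert_base_en",
    "kaggle://user/bert/keras/bert_base_en",
    "hf://user/bert_base_en",
    "./bert_base_en",
):
    print(f"{handle!r} -> {describe_preset(handle)}")
```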
For any Task subclass, you can run cls.presets.keys() to list all built-in presets available on the class.
This constructor can be called in one of two ways: either from a task-specific base class like keras_hub.models.CausalLM.from_preset(), or from a model class like keras_hub.models.BertTextClassifier.from_preset(). If calling from a base class, the subclass of the returned object will be inferred from the config in the preset directory.
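Arguments
- preset: string. A built-in preset identifier, a Kaggle Models handle, a Hugging Face handle, or a path to a local directory.
- load_weights: bool. If True, saved weights will be loaded into the model architecture. If False, all weights will be randomly initialized.
Examples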
import keras_hub

# Load a Gemma generative task.
causal_lm = keras_hub.models.CausalLM.from_preset(
    "gemma_2b_en",
)
# Load a Bert classification task.
model = keras_hub.models.TextClassifier.from_preset(
    "bert_base_en",
    num_classes=2,
)
Preset | Parameters | Description |
---|---|---|
stable_diffusion_3_medium | 2.99B | 3 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
stable_diffusion_3.5_large | 9.05B | 9 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. Developed by Stability AI. |
stable_diffusion_3.5_large_turbo | 9.05B | 9 billion parameters, including CLIP L and CLIP G text encoders, MMDiT generative model, and VAE autoencoder. A timestep-distilled version that eliminates classifier-free guidance and uses fewer steps for generation. Developed by Stability AI. |
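As a rough guide when choosing between these presets, the parameter counts in the table can be turned into an approximate size for the weights alone. This is a back-of-the-envelope sketch: `weights_size_gb` is a hypothetical helper, the figures cover only the listed parameters (not activations or any other runtime memory), and the dtype a given preset actually loads in is not stated here.

```python
def weights_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts copied from the preset table above.
PRESET_PARAMS = {
    "stable_diffusion_3_medium": 2.99e9,
    "stable_diffusion_3.5_large": 9.05e9,
    "stable_diffusion_3.5_large_turbo": 9.05e9,
}

for name, num_params in PRESET_PARAMS.items():
    fp32 = weights_size_gb(num_params, 4)  # 4 bytes per float32 value
    fp16 = weights_size_gb(num_params, 2)  # 2 bytes per float16/bfloat16 value
    print(f"{name}: ~{fp32:.1f} GB in float32, ~{fp16:.1f} GB in float16")
```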
backbone property
keras_hub.models.StableDiffusion3ImageToImage.backbone
A keras_hub.models.Backbone model with the core architecture.
generate method
StableDiffusion3ImageToImage.generate(
    inputs, num_steps=50, strength=0.8, guidance_scale=7.0, seed=None
)
Generate images based on the provided inputs.
Typically, inputs is a dict with "images" and "prompts" keys. "images" are reference images within a value range of [-1.0, 1.0], which will be resized to self.backbone.height and self.backbone.width, then encoded into latent space by the VAE encoder. "prompts" are strings that will be tokenized and encoded by the text encoder.
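Reference images usually start out as 8-bit pixels in [0, 255], so they need rescaling to the [-1.0, 1.0] range described above before being passed in. A minimal sketch of that conversion with NumPy follows; `to_model_range` is a hypothetical helper name, not a KerasHub function.

```python
import numpy as np


def to_model_range(image_uint8: np.ndarray) -> np.ndarray:
    """Map uint8 pixels in [0, 255] to float32 values in [-1.0, 1.0]."""
    image = image_uint8.astype("float32")
    # 0 maps to -1.0 and 255 maps to 1.0.
    return image / 127.5 - 1.0


# A tiny 1x1 RGB "image" covering both ends of the uint8 range.
pixels = np.array([[[0, 128, 255]]], dtype="uint8")
scaled = to_model_range(pixels)
print(scaled.dtype, float(scaled.min()), float(scaled.max()))
```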
Some models support a "negative_prompts" key, which helps steer the model away from generating certain styles and elements. To enable this, add "negative_prompts" to the input dict.
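One way to keep the optional key tidy is to assemble the input dict in a small function that adds "negative_prompts" only when it is supplied. The sketch below assumes nothing beyond the three dict keys named above; `build_inputs` is a hypothetical helper, not part of KerasHub.

```python
import numpy as np


def build_inputs(images, prompts, negative_prompts=None):
    """Hypothetical helper: assemble the dict passed to `generate()`."""
    inputs = {
        "images": np.asarray(images, dtype="float32"),
        "prompts": prompts,
    }
    # Only include the optional key when the caller provides it.
    if negative_prompts is not None:
        inputs["negative_prompts"] = negative_prompts
    return inputs


inputs = build_inputs(
    np.ones((512, 512, 3)),
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompts="green color",
)
print(sorted(inputs))
# The result would then be passed on, e.g. image_to_image.generate(inputs).
```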
If inputs is a tf.data.Dataset, outputs will be generated "batch-by-batch" and concatenated. Otherwise, all inputs will be processed as batches.
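The `num_steps` and `strength` arguments in the signature above interact: in image-to-image diffusion pipelines, `strength` commonly decides how far into the noise schedule the reference image is pushed, and therefore how many of the `num_steps` denoising iterations actually run. The sketch below illustrates that common convention only; `denoising_schedule` is a hypothetical helper, and the exact rounding KerasHub uses internally is an assumption.

```python
def denoising_schedule(num_steps: int, strength: float) -> tuple:
    """Return (starting_step, steps_run) for a given strength.

    Assumption: steps_run scales linearly with strength, so strength=1.0
    runs every step and strength=0.0 runs none.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    steps_run = round(num_steps * strength)
    starting_step = num_steps - steps_run
    return starting_step, steps_run


for strength in (0.6, 0.8, 1.0):
    start, run = denoising_schedule(50, strength)
    print(f"strength={strength}: skip {start} steps, run {run} steps")
```

Under this convention a lower `strength` keeps the output closer to the reference image, because fewer denoising steps are applied to it.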
Arguments
- inputs: Python data, tensor data, or a tf.data.Dataset. The format must be one of the following:
  - A dict with "images", "prompts" and/or "negative_prompts" keys.
  - A tf.data.Dataset with "images", "prompts" and/or "negative_prompts" keys.
- strength: float. The extent to which the reference images are transformed. Must be between 0.0 and 1.0. When strength=1.0, images is essentially ignored: the added noise is at its maximum and the denoising process runs for the full number of iterations specified in num_steps.
preprocessor property
keras_hub.models.StableDiffusion3ImageToImage.preprocessor
A keras_hub.models.Preprocessor layer used to preprocess input.