
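On the evening of June 12, 2024, Stability AI released the open weights of its Stable Diffusion 3 Medium model as scheduled. With 2 billion parameters, it is the company's most advanced open text-to-image model to date.
Stable Diffusion 3 Medium is relatively small, which lets it run well on consumer PCs and laptops as well as on enterprise GPUs. That size also gives it the potential to become the next standard for text-to-image models.
Key features:
Improved image quality: the model produces noticeably higher-quality, more detailed images.
Complex prompt understanding: it handles complex text prompts better and turns text descriptions into images more accurately.
Resource efficiency: resource usage has been optimized, so it delivers good performance with less compute.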

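The images above are the 10 official examples.
Today I will walk through deploying the Stable Diffusion 3 Medium model on a local PC.
The model first has to be downloaded from the official Hugging Face site.
Download URL: https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main#/
Note that you must log in with a Hugging Face account and fill in the required license agreement before you can download the model. It cannot be downloaded anonymously.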

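The screenshot above shows all the files in the model repository. A brief explanation:
The comfy_example_workflows folder contains 3 files. These are workflows that are loaded later in ComfyUI, so download them.

The demo-images folder holds the official example images shown above. You do not need to download it.

The text_encoders folder holds the text encoder files. Download these according to your needs. We only use two of them: clip_g.safetensors and clip_l.safetensors.

The remaining files in the root directory are the models themselves. sd3_medium.safetensors does not include the text encoders. The other models bundle them, including T5-XXL + CLIP variants in FP8 and FP16 precision. Download whichever fits your needs; you do not need all 4 models. I recommend sd3_medium_incl_clips_t5xxlfp8.safetensors.

The sd3demo_prompts.txt file contains the 10 official prompts: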
a female character with long, flowing hair that appears to be made of ethereal, swirling patterns resembling the Northern Lights or Aurora Borealis. The background is dominated by deep blues and purples, creating a mysterious and dramatic atmosphere. The character's face is serene, with pale skin and striking features. She wears a dark-colored outfit with subtle patterns. The overall style of the artwork is reminiscent of fantasy or supernatural genres
Digital art, portrait of an anthropomorphic roaring Tiger warrior with full armor, close up in the middle of a battle, behind him there is a banner with the text "Open Source".
photo of a dog and a cat both standing on a red box, with a blue ball in the middle with a parrot standing on top of the ball. The box has the text "SD3"
selfie photo of a wizard with long beard and purple robes, he is apparently in the middle of Tokyo. Probably taken from a phone.
A vibrant street wall covered in colorful graffiti, the centerpiece spells "SD3 MEDIUM", in a storm of colors
photo of a young woman with long, wavy brown hair tied in a bun and glasses. She has a fair complexion and is wearing subtle makeup, emphasizing her eyes and lips. She is dressed in a black top. The background appears to be an urban setting with a building facade, and the sunlight casts a warm glow on her face.
anime art of a steampunk inventor in their workshop, surrounded by gears, gadgets, and steam. He is holding a blue potion and a red potion, one in each hand
photo of picturesque scene of a road surrounded by lush green trees and shrubs. The road is wide and smooth, leading into the distance. On the right side of the road, there's a blue sports car parked with the license plate spelling "SD32B". The sky above is partly cloudy, suggesting a pleasant day. The trees have a mix of green and brown foliage. There are no people visible in the image. The overall composition is balanced, with the car serving as a focal point.
photo of young man in a black suit, white shirt, and black tie. He has a neatly styled haircut and is looking directly at the camera with a neutral expression. The background consists of a textured wall with horizontal lines. The photograph is in black and white, emphasizing contrasts and shadows. The man appears to be in his late twenties or early thirties, with fair skin and short, dark hair.
photo of a woman on the beach, shot from above. She is facing the sea, while wearing a white dress. She has long blonde hair
We will use these prompts later when running the model.
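The download described above can also be scripted instead of clicking through the website. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that you have already logged in and accepted the model license. The target folder name `sd3_download` is hypothetical, and the workflow file path is inferred from the folder name and the downloaded file name, so check it against the repository listing.

```python
from pathlib import Path
from typing import List, Optional

REPO_ID = "stabilityai/stable-diffusion-3-medium"

# Files this walkthrough uses; paths are relative to the repository root.
WANTED_FILES = [
    "sd3_medium_incl_clips_t5xxlfp8.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip_l.safetensors",
    # Assumed path: folder name plus the workflow file name used later.
    "comfy_example_workflows/sd3_medium_example_workflow_basic.json",
    "sd3demo_prompts.txt",
]


def download_all(target_dir: str, token: Optional[str] = None) -> List[Path]:
    """Download every wanted file into target_dir and return the local paths."""
    # Imported lazily so the file list can be inspected without the package.
    from huggingface_hub import hf_hub_download

    paths = []
    for filename in WANTED_FILES:
        local = hf_hub_download(
            repo_id=REPO_ID,
            filename=filename,
            local_dir=target_dir,
            token=token,  # None falls back to the token saved by a prior login
        )
        paths.append(Path(local))
    return paths

# Example call (needs network access and an accepted license):
# download_all("sd3_download")
```

The gated repository rejects unauthenticated requests, so the call fails until the license has been accepted on the website with the same account.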
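If this is your first time using ComfyUI, download the packaged archive from the official GitHub releases page.
Project URL: https://github.com/comfyanonymous/ComfyUI/releases
Choose the build that matches your CUDA version.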

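Just extract the archive. I already had ComfyUI installed on my machine, so I only need to update it. The latest ComfyUI supports SD3.

Run update_comfyui.bat to update directly to the latest version.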

Place the downloaded model sd3_medium_incl_clips_t5xxlfp8.safetensors in the D:\temp\ComfyUI_windows_portable\ComfyUI\models\checkpoints folder.
Copy clip_g.safetensors and clip_l.safetensors to the D:\temp\ComfyUI_windows_portable\ComfyUI\models\clip folder.
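The two copy steps above can be sketched as a small script. This is only an illustration under stated assumptions: `place_models` and `PLACEMENT` are hypothetical names, and the script simply copies whichever of the three files it finds under a download folder into the matching subfolder of ComfyUI's models directory.

```python
import shutil
from pathlib import Path
from typing import List

# Destination subfolder (relative to ComfyUI's models folder) for each file.
PLACEMENT = {
    "sd3_medium_incl_clips_t5xxlfp8.safetensors": "checkpoints",
    "clip_g.safetensors": "clip",
    "clip_l.safetensors": "clip",
}


def place_models(download_dir: Path, models_dir: Path) -> List[Path]:
    """Copy each known file found under download_dir into models_dir/<subfolder>."""
    copied = []
    for name, subfolder in PLACEMENT.items():
        # rglob also finds files kept in subfolders such as text_encoders/.
        matches = sorted(download_dir.rglob(name))
        if not matches:
            print(f"skipped (not found): {name}")
            continue
        dest_dir = models_dir / subfolder
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / name
        shutil.copy2(matches[0], dest)  # copy2 also preserves timestamps
        copied.append(dest)
    return copied
```

With the paths from this article, the call would look like `place_models(Path("sd3_download"), Path(r"D:\temp\ComfyUI_windows_portable\ComfyUI\models"))`, where `sd3_download` stands for wherever you saved the files.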

In the D:\temp\ComfyUI_windows_portable folder, double-click run_nvidia_gpu.bat to start ComfyUI.

Once the program has started, it automatically opens http://127.0.0.1:8188/#/ in the browser.
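If the browser does not open on its own, a quick check shows whether the server is actually listening at that address. This is a minimal sketch using only the Python standard library; `is_server_up` is a hypothetical helper, and it only tests that some HTTP server answers at the URL, not that it is ComfyUI.

```python
import urllib.error
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188/"


def is_server_up(url: str = COMFYUI_URL, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at url, False if nothing is reachable."""
    # An empty ProxyHandler keeps system proxy settings out of a localhost check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status, so it is running.
        return True
    except OSError:
        # Connection refused, timeout, or similar: nothing is listening.
        return False
```

Calling `is_server_up()` a few seconds after starting run_nvidia_gpu.bat should return True once the model folders have been scanned and the web server is ready.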

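Drag the workflow file downloaded earlier, comfy_example_workflows_sd3_medium_example_workflow_basic.json, into the browser window to open it.

In the Load Checkpoint node, select the sd3_medium_incl_clips_t5xxlfp8.safetensors model.

In the CLIP loader node, select the clip_g.safetensors and clip_l.safetensors files downloaded earlier.

None of the other nodes need to be changed.
Next, paste in a prompt: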
Digital art, portrait of an anthropomorphic roaring cat warrior with full armor, close up in the middle of a battle, behind him there is a banner with the text "AI".
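This prompt is the second official prompt with the subject and the banner text swapped. As a small illustration, the same substitution can be written in code; `customize` is a hypothetical helper, not part of any tool used here.

```python
OFFICIAL_PROMPT = (
    "Digital art, portrait of an anthropomorphic roaring Tiger warrior with full armor, "
    'close up in the middle of a battle, behind him there is a banner with the text "Open Source".'
)


def customize(prompt: str, subject: str, banner_text: str) -> str:
    """Swap the subject and the quoted banner text in the official tiger prompt."""
    return prompt.replace("Tiger", subject).replace('"Open Source"', f'"{banner_text}"')


custom_prompt = customize(OFFICIAL_PROMPT, "cat", "AI")
print(custom_prompt)
```

Keeping the banner text inside double quotes follows the pattern of the official prompts, which all quote the text that should appear in the image.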

Leave the negative prompt below at its default.
Click "Queue Prompt" in the small side panel and wait about 1 minute for the image to be generated.
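Queueing can also be done without the button by posting to the local server's `/prompt` endpoint. The sketch below assumes the workflow has been exported in ComfyUI's API format, which is a different JSON layout from the workflow file dragged into the browser. The file name `workflow_api.json` and the helper names are hypothetical.

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8188"


def build_queue_request(workflow: dict, server: str = SERVER) -> urllib.request.Request:
    """Build the POST request that asks the server to queue an API-format workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,  # supplying data makes urllib send a POST
        headers={"Content-Type": "application/json"},
    )


def queue_workflow_file(path: str) -> bytes:
    """Load an API-format workflow from disk, queue it, and return the raw reply."""
    with open(path, encoding="utf-8") as f:
        workflow = json.load(f)
    with urllib.request.urlopen(build_queue_request(workflow)) as resp:
        return resp.read()

# Example call (needs a running ComfyUI instance):
# queue_workflow_file("workflow_api.json")
```

Splitting the request construction from the network call makes it easy to inspect the payload before anything is sent.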


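On my NVIDIA RTX 3060-series card, the run used a little under 6 GB of VRAM. I used the Medium model in its FP8 variant; if the image quality is not good enough for you, try the FP16 variant.
Below are a few of the images I generated.
Figure 1

Figure 2

Figure 3

Figure 4

Figure 5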

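Some people online have reported failed results when generating human figures with SD3. I have not tested this thoroughly, but judging from the few images I generated, people do come out with some problems: the faces in the last 2 images are distorted. In other respects the overall results are quite good. I hope the larger SD3 models to come will fix these issues. That is all for today's post. See you in the next article.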