Magnusviri[0], the original author of the SD M1 repo credited in this article, has merged his fork into the Lstein Stable Diffusion fork.<p>You can now run the Lstein fork[1] on M1 as of a few hours ago.<p>This adds a ton of functionality: a GUI, upscaling & facial improvements, weighted subprompts, etc.<p>This has been a big undertaking over the last few days, and I highly recommend checking it out. See the Mac M1 readme [2].<p>[0] <a href="https://github.com/magnusviri/stable-diffusion" rel="nofollow">https://github.com/magnusviri/stable-diffusion</a><p>[1] <a href="https://github.com/lstein/stable-diffusion" rel="nofollow">https://github.com/lstein/stable-diffusion</a><p>[2] <a href="https://github.com/lstein/stable-diffusion/blob/main/README-Mac-MPS.md" rel="nofollow">https://github.com/lstein/stable-diffusion/blob/main/README-...</a>
Everyone posting their pip/build/runtime errors here illustrates everything that's wrong with tooling built on top of Python and its ecosystem.<p>It would be nice to see the ML community move on to something that's actually easily reproducible and buildable without "oh, install this version of conda", "run pip install for this package", "edit this line in this python script".
It's insane to me how fast this is moving. I jumped through a bunch of hoops 2-3 days ago to get this running on my M1 Mac's GPU, and now it's way easier. I imagine we will have a nice GUI (I'm aware of the web UI; I haven't set it up yet) packaged as a Mac .app by the end of next week. Really cool stuff.
Is there a good set of benchmarks available for Stable Diffusion? I was able to run a custom Stable Diffusion build on a GCE A100 instance (~$1/hour) at around 1Mpix per 10 seconds. I.e, I could create a 512x512 image in 2.5 seconds with some batching optimizations. A consumer GPU like a 3090 runs at ~1Mpix per 20 seconds.<p>I'm wondering what the price floor of stock art will be when someone can use <a href="https://lexica.art/" rel="nofollow">https://lexica.art/</a> as a starting point, generate variations of a prompt locally, and then spend a few minutes sifting through the results. It should be possible to get most stock art or concept art at a price of <$1 per image.
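As a rough sanity check on that price floor, here's the arithmetic as a sketch (using only the ~$1/hour A100 rate and ~2.5 s/image figure quoted above):<p><pre><code> # Back-of-the-envelope cost per 512x512 image on a ~$1/hour A100.
gpu_cost_per_hour = 1.00
seconds_per_image = 2.5
images_per_hour = 3600 / seconds_per_image         # 1440 images/hour
cost_per_image = gpu_cost_per_hour / images_per_hour
print(f"~${cost_per_image:.4f} per raw image")     # ~$0.0007
# Even if you keep only 1 in 100 generations, compute stays well under
# $0.10 per kept image; the human time spent sifting dominates the cost.
</code></pre>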
Bananas. Thanks so much... to everyone involved. It works.<p>14 seconds to generate an image on an M1 Max with the given instructions (`--n_samples 1 --n_iter 1`)<p>Also, interesting/curious small note: images generated with this script are "invisibly watermarked" i.e. steganographied!<p>See <a href="https://github.com/bfirsh/stable-diffusion/blob/main/scripts/txt2img.py#L253" rel="nofollow">https://github.com/bfirsh/stable-diffusion/blob/main/scripts...</a>
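For the curious, that watermarking goes through the invisible-watermark package; a minimal standalone sketch of the same kind of encode/decode round trip (file names and the payload bytes here are just placeholders, not what the script embeds):<p><pre><code> import cv2
from imwatermark import WatermarkEncoder, WatermarkDecoder

payload = b"SDV1"                     # illustrative payload
bgr = cv2.imread("out.png")           # placeholder file name

# Embed the bytes in the image's frequency domain (DWT+DCT): invisible
# to the eye, but recoverable by the decoder below.
encoder = WatermarkEncoder()
encoder.set_watermark("bytes", payload)
cv2.imwrite("out_wm.png", encoder.encode(bgr, "dwtDct"))

decoder = WatermarkDecoder("bytes", 8 * len(payload))
print(decoder.decode(cv2.imread("out_wm.png"), "dwtDct"))
</code></pre>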
After playing around with all of these ML image generators I've found myself surprisingly disenchanted. The tech is extremely impressive, but I think it's just human psychology that when you have an unlimited supply of something you tend to value each instance of it less.<p>Turns out I don't really want thousands of good images. I want a handful of excellent ones.
I've been playing with Stable Diffusion a lot the past few days on a Dell R620 CPU (24 cores, 96 GB of RAM). With a little fiddling (not knowing any Python or anything about machine learning) I was able to get img2img.py working by simply comparing that script to the txt2img.py CPU patch. It was only a few lines of tweaking. img2img takes ~2 minutes to generate an image with 1 sample and 50 steps; txt2img takes about 10 minutes for 1 sample and 50 steps.<p>The real bummer is that I can only get ddim and plms to run on a CPU. All of the other samplers crash and burn. ddim and plms don't seem to do a great job of converging for hyper-realistic scenes involving humans. I've seen other samplers "shape up" after 10 or so steps in explorations people have posted online, where increasing the step count just gives you a higher-fidelity and/or more realistic image. With ddim/plms on a CPU, every step seems to give me a wildly different image; you wouldn't know that step 10 and step 15 came from the same seed/sample, they change so much.<p>I'm not sure if this is just because I'm running it on a CPU or if ddim and plms are just inferior to the other samplers, but I've mostly given up on generating anything worthwhile until I can get my hands on an NVIDIA GPU and experiment with faster turnarounds.
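For anyone wondering what those few lines of tweaking generally look like, here's an illustrative, self-contained sketch of the pattern the CPU patches apply (the Linear layer stands in for the real model; the exact lines differ between forks):<p><pre><code> import torch
from contextlib import nullcontext

# Pin the device to "cpu", stay in float32 (the half-precision path assumes
# CUDA), and swap torch.autocast("cuda") for a no-op context manager.
device = torch.device("cpu")
model = torch.nn.Linear(4, 4).to(device)   # stand-in for the diffusion model
x = torch.randn(1, 4, device=device)
precision_scope = nullcontext              # was: torch.autocast("cuda")
with precision_scope():
    y = model(x)
print(y.shape)
</code></pre>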
Are we being pranked? I just followed the steps but the image output from my prompt is just a single frame of Rick Astley...<p>EDIT: It was a false-positive (honest!) on the NSFW filter. To disable it, edit txt2img.py around line 325.<p>Comment this line out:<p><pre><code> x_checked_image, has_nsfw_concept = check_safety(x_samples_ddim)
</code></pre>
And replace it with:<p><pre><code> x_checked_image = x_samples_ddim</code></pre>
For those as keen as I am to try this out, I ran these steps, only to run into an error during the pip install phase:<p>> ERROR: Failed building wheel for onnx<p>I was able to resolve it by doing this:<p>> brew install protobuf<p>Then I ran pip install again, and it worked!
Is there any way to keep up with this stuff / a beginner's guide? I really want to play around with it, but it's kind of confusing to me.<p>I don't have an M1 Mac; I have an Intel one with an AMD GPU. Not sure if I can run it? I don't mind if it's a bit slow. Or what is the best way of running it in the cloud? Anything that can produce high-res images for free?
I'd rather see someone implement glue that allows you to run arbitrary (deep learning) code on any platform.<p>I mean, are we going to see "X on M1 Mac" for every X now in the future?<p>Also, weren't torch and tensorflow supposed to be this glue?
Without k-diffusion support, I don't think this replicates the Stable Diffusion experience:<p><a href="https://github.com/crowsonkb/k-diffusion" rel="nofollow">https://github.com/crowsonkb/k-diffusion</a><p>Yes, running on M1/M2 (the MPS device) was possible with modifications. img2img and inpainting also work.<p>However, you'll run into problems when you want k-diffusion sampling or textual inversion support.
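The device-selection part of those modifications is the easy bit; a hedged sketch of the usual pattern (PyTorch 1.12+ exposes the MPS backend):<p><pre><code> import torch

# Prefer Apple's Metal (MPS) backend when available, otherwise fall back.
if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

x = torch.randn(2, 3, device=device)
print(x.device)  # e.g. "mps:0" on an M1/M2 Mac
</code></pre>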
How long does it take to generate a single image? Is it in the 30 min type range or a few mins? It's hypothetically "possible" to run e.g. OPT175B on a consumer GPU via Huggingface Accelerate, but in practice it takes like 30 mins to generate a single token.
Has anybody had success getting newer AMD cards working?<p>ROCm support seems spotty at best, I have a 5700xt and I haven't had much luck getting it working.
The difference between an M2 Air (8GB/512GB) and an M1 Pro (16GB/1TB) is much more than I expected.<p><pre><code> * M1 Pro (16GB/1TB) can run the model in around 3 minutes.
 * M2 Air (8GB/512GB) takes ~60 minutes for the same model.
</code></pre>
I knew there would be some throttling due to the M2 Air's fanless design, but I had no idea it would be a 20x difference (although the M1 Pro does have double the RAM, and I don't have any other MacBooks to test this on).
A few suggested changes to the instructions:<p><pre><code> /opt/homebrew/bin/python3 -m venv venv # [1, 2]
venv/bin/python -m pip install -r requirements.txt # [3]
venv/bin/python scripts/txt2img.py ...
</code></pre>
1. Using /opt/homebrew/bin/python3 lets you drop the suggestion about "You might need to reopen your console to make it work" and ensures folks are using the python3 just installed via Homebrew, as opposed to Apple's /usr/bin/python3, which is currently 3.8. It also works regardless of the user's PATH. We can be fairly confident /opt/homebrew/bin is correct, since that's the standard Homebrew location on Apple Silicon, and folks who've installed it elsewhere will likely know how to modify the instructions.<p>2. No need to install virtualenv, since Python 3.6+ ships with a built-in venv module that covers most use cases.<p>3. No need to source an activate script. Call the python inside the virtual environment and it will use the virtual environment's packages.
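A quick way to double-check which interpreter and site-packages you're actually getting (run it with venv/bin/python; the paths in the comments are just examples):<p><pre><code> import sys

# Inside a venv, sys.prefix points at the venv directory, while
# sys.base_prefix points at the interpreter the venv was created from.
print(sys.executable)    # the python binary actually running this script
print(sys.prefix)        # e.g. .../stable-diffusion/venv
print(sys.base_prefix)   # e.g. /opt/homebrew/opt/python@3.10/Frameworks/...
</code></pre>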
Very interestingly, this is the first true use case I've noticed where new, bleeding-edge technology seemingly runs much better on M1 than on Intel GPUs.
I keep running into issues, even after installing Rust in my conda environment. Specifically, the issue seems to be building wheels for `tokenizers`:<p><pre><code> warning: build failed, waiting for other jobs to finish...
error: build failed
error: `cargo rustc --lib --message-format=json-render-diagnostics --manifest-path Cargo.toml --release -v --features pyo3/extension-module -- --crate-type cdylib -C 'link-args=-undefined dynamic_lookup -Wl,-install_name,@rpath/tokenizers.cpython-310-darwin.so'` failed with code 101
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects
</code></pre>
Any suggestions?
I'm working on getting this running. Instead of "venv/bin/activate" I had to run "source venv/bin/activate". And I got an error installing the requirements, fixed by running "pip install pyyaml" as a separate command.
How is Stable Diffusion on DreamStudio.ai so much faster than the reports here? Seems to only take 5-10 seconds to generate an image with the default settings.<p>I.e. How are they providing access to GPU compute several orders of magnitude more powerful than an M1, for free?
Hm, when I run the example, I get this error:<p>> expected scalar type BFloat16 but found Float<p>Has anyone seen this error? It's pretty hard to google for.
Thanks for writing this up!! I enjoyed getting TensorFlow running on the M1, although a multi-headed model I was working on wouldn't run.<p>I just made a card for my Dad's 101st birthday using OpenAI's image-generation service (he loved it), and when I get home from travel I will use your instructions in the linked article.<p>Any advice for running Stable Diffusion locally vs. Colab Pro or Pro+? My M1 MacBook Pro only has 8 GB of RAM (I didn't want to wait a month for a 16 GB model). Is that enough? I also have a 1080 with 10 GB of graphics memory. Is that sufficient?
Thanks for this - it's rare to see a setup guide that actually works at each step!<p>I did need to run the troubleshooting step too; that could probably just be moved up as a required step in the guide.
Between this and efforts to add a 3D dimension to 2D images, I don't see much of a future for digital multimedia creator jobs.<p>Even TikTok could become an endless stream of ML models.<p>Fears of a tech dystopia may be overblown; the masses will just shut off their gadgets and live more simply if labor markets implode within the traditional, politically correct economic system we have.<p>Open source AI is on the verge of upending the software industry and copyright. I dig it.
For me:<p><pre><code> File "/Users/layer/src/stable-diffusion/venv/lib/python3.10/site-packages/torch/serialization.py", line 250, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/ldm/stable-diffusion-v1/model.ckpt'
</code></pre>
The directory is empty. Hmm.<p>I forgot to<p><pre><code> mv sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
</code></pre>
On a Mac Studio<p><pre><code> data: 100%|| 1/1 [00:43<00:00, 43.20s/it]
Sampling: 100%|| 1/1 [00:43<00:00, 43.20s/it]</code></pre>
Thanks for the writeup! It works smoothly on my M1 MacBook Pro!<p>A few days ago I tried the Stable Diffusion code and was not able to get it to work :( Then I gave up...<p>Today, following the steps in this blog post, it worked on the very first try. Happy!
If you have a top of the line M1 MBP but the hard drive is 2TB, would it make sense to plug in an external hard drive for the 4TB model or would it render the effort futile due to performance issues?
Thanks for this tutorial. I had errors and spent time fixing them, and then I found this script that installs the LStein project on M1: <a href="https://github.com/glonlas/Stable-Diffusion-Apple-Silicon-M1-Install" rel="nofollow">https://github.com/glonlas/Stable-Diffusion-Apple-Silicon-M1...</a><p>On my side this is what got it working: I ran it and it installed cleanly.
I keep getting `No module named 'ldm'` after I run `python scripts/dream.py --full_precision`. I've confirmed the 'ldm' environment is active in conda. Any ideas?
The various articles/tutorials seem a bit confusing: even though they say "M1", they also worked fine for me on an Intel Mac (and do end up using the GPU).<p>Does anyone know how to think about the --W, --H, and --f flags to create larger images? I have 64GB of memory, but I get errors from PyTorch saying things like "Invalid buffer size: 7.54 GB" when I try to increase W and H, and I haven't managed to make the Python process use more than about 15GB by playing around so far.
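For what it's worth, in the CompVis scripts --W/--H are the output pixel dimensions and --f is the latent downsampling factor (default 8), so sampling happens on a (4, H/f, W/f) latent. An illustrative sketch of why memory climbs so fast, under the simplifying assumption that self-attention cost scales with the square of the latent token count:<p><pre><code> # Latent grid sizes for the default f=8, C=4.
f, C = 8, 4
base_tokens = (512 // f) ** 2
for side in (512, 768, 1024):
    tokens = (side // f) ** 2
    print(f"{side}px -> latent {(C, side // f, side // f)}, {tokens} tokens, "
          f"attention ~{tokens ** 2 / base_tokens ** 2:.0f}x the 512px cost")
</code></pre>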
Anyone know the largest possible image size > 512x512? I'm getting the following error when trying 1024x1024 with 64 GB of RAM on an M1 Max:<p>/opt/homebrew/Cellar/python@3.10/3.10.6_2/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Anyone else surprised that the results just aren't very good? I followed the instructions and it works, but the output seems kind of like Deep Dream-level results circa 2015. Lots of blurry eyes in the wrong spots, and most objects seem really blurry and cut off. Nothing like the demo examples I've seen online. Are the default options not optimal for getting the best results?
As a side note, this model appears to have difficulty with the prompt "wolf with bling walking down a street" and often generates an image that I am fairly sure is not unique and is not representative of the idea the text is trying to communicate.
One beautiful thing I realized about all this progress in AI.<p>We will still need people to do the hard yards and get dirt under their fingernails. I am firmly in the camp of those people.<p>Fancy algorithms won't dig holes, lay rail tracks over hundreds of miles, or build houses all across the world.
What's this log message about when generating an image?<p>Creating invisible watermark encoder (see <a href="https://github.com/ShieldMnt/invisible-watermark" rel="nofollow">https://github.com/ShieldMnt/invisible-watermark</a>)...
Tried "transparent dog", got rickrolled. Why is this NSFW? ...anyway, I disabled the filter and... it's pretty neat! Calling all AI Overlords, soon. :))
I don't want to sound lazy, but I was expecting a .dmg for Macs, and I don't seem to find one. Am I blind, or has it simply not been prepared yet?
Note: I ran this but haven't been able to get img2img working yet. I borked it up trying to get conda working.<p>It's been a lot of fun to play with so far, though!
Is there a proper term to encapsulate M1/M2 Macs now that we have the M2? I.e., "Apple Silicon Macs" works but is a bit long. MX Macs? M-series? ARM Macs?
I just got rick-rolled by the model.<p>Using the prompt: "1990s textbook background mephis style"[sic] (yup I meant memphis)[0], I got back this: [1]. Rerunning the same prompt, I got: [2].<p>[0] <a href="https://files.littlebird.com.au/Shared-Image-2022-09-02-10-29-53-RILg9J.png" rel="nofollow">https://files.littlebird.com.au/Shared-Image-2022-09-02-10-2...</a><p>[1] <a href="https://files.littlebird.com.au/grid-0004-2xXAGF.png" rel="nofollow">https://files.littlebird.com.au/grid-0004-2xXAGF.png</a><p>[2] <a href="https://files.littlebird.com.au/grid-0005-kcfgq7.png" rel="nofollow">https://files.littlebird.com.au/grid-0005-kcfgq7.png</a>