Thread #108051632
File: Screenshot from 2026-02-03 11-49-00.png (244.6 KB)
Comfyui has Ace Step 1.5
Currently broken on AMD, apparently, but if you have nvidia, you can gen songs locally, and with lots of good features.
https://www.reddit.com/r/comfyui/comments/1quzawn/acestep_15_is_now_available_in_comfyui/
>>
File: Screenshot from 2026-02-03 11-58-49.png (23.8 KB)
>>108051632
the error I'm getting, with amd.
>>
>>108051632
MY FIRST GEN
https://files.catbox.moe/hqkvul.mp3
btw, has anyone ever gotten the gradio crap to work with amd?
>>
File: Screenshot from 2026-02-03 13-23-27.png (70.2 KB)
God damn github, what a shit website, I searched it like 10 times and it pukes like it's being ddos'd. massive trash.
>>
no luck solving
probability tensor contains either `inf`, `nan` or element < 0
(ie I still have to use cpu vae - that or buy an s tier nvidia gpu to just do stuff that comfyui is incapable of)
try to keep the thread alive, I must do life crap
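fwiw, that `probability tensor contains either inf, nan or element < 0` message is what PyTorch raises when sampling (torch.multinomial) gets handed a probability tensor with bad values, which usually means nan/inf crept in somewhere earlier. Rough stdlib-only sketch of the same check, assuming you can dump the values to a plain list. `bad_probability_entries` is a made-up helper name, not a ComfyUI or PyTorch function:

```python
import math


def bad_probability_entries(probs):
    """Return (index, value) pairs a sampler would reject: nan, inf, or negative."""
    return [
        (i, p)
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


if __name__ == "__main__":
    # Hypothetical values, only here to show the three failure cases.
    sample = [0.2, float("nan"), 0.5, -0.1, float("inf")]
    for index, value in bad_probability_entries(sample):
        print(f"index {index}: {value}")
```

If this comes back empty for the final probabilities, the bad values are being produced somewhere else in the pipeline.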
>>
>>108053243
For me, I have to use --cpu-vae
MIOPEN_FIND_MODE=FAST python main.py --cpu-vae
the miopen_find_mode thing helps with my amd card, on Linux.
anyway, --cpu-vae is necessary on amd, because comfyui's audio vae is immature. really, audio everything is immature on comfy.
anyway, back to slow genning.
possible areas of solution, which I may try to investigate as time allows:
>zluda comfyui
>(actually I won't bother with this one) some other audio gen thingies don't have the cpu-vae issue. heartmula didn't, not sure why. I had to restart the server every few gens, again, not sure exactly how that happened.
>>
HOW DO YOU KNOW IF YOU ARE USING CPU VAE????
Your cpu usage will go way up. I find all cores get used, but none is pegged constantly: a single core, say CPU7, will go 1%, 20%, 80%, then back to 1%.
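If you'd rather log it than eyeball a system monitor, here's a rough Linux-only sketch that reads per-core usage from /proc/stat twice and diffs the counters. Function names are made up for illustration, and it only tells you the CPU is busy, not that the VAE is the thing keeping it busy:

```python
import os
import time


def parse_proc_stat(text):
    """Map 'cpuN' -> (busy_ticks, total_ticks) from the text of /proc/stat."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines start with 'cpu0', 'cpu1', ...; bare 'cpu' is the aggregate.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # First 8 fields: user nice system idle iowait irq softirq steal.
        ticks = [int(x) for x in parts[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        total = sum(ticks)
        cores[parts[0]] = (total - idle, total)
    return cores


def usage_percent(before, after):
    """Per-core busy percentage between two parse_proc_stat() snapshots."""
    result = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        elapsed = total1 - total0
        result[name] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return result


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        first = parse_proc_stat(f.read())
    time.sleep(1.0)  # sampling interval; one second is an arbitrary choice
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for core, pct in sorted(usage_percent(first, second).items()):
        print(f"{core}: {pct:.0f}%")
```

Run it in a loop while the decode step is going and you should see the same jumpy per-core pattern.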
>>108054987
comfyui isn't considered malware. audio gen is very different from /ldg/ and /sdg/ interests. 99% of those guys will never be interested in even listening to any music at all. music is kind of a Gen X thing. Once you get past Gen X, music just isn't that important to people. That's why "hit songs" for 2025 per Billboard were on average 3 minutes or so. vs 5 minutes as the traditional song length, with a few super long songs making it to the top some years.
>>108054980
correction, this is what is working for me (AMD XTX rx 6950xt):
PYTORCH_ALLOC_CONF=expandable_segments:True TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 MIOPEN_FIND_MODE=FAST python main.py --use-pytorch-cross-attention --novram --cpu-vae
I don't know if I need all options lol. so many combinations to try, but hopefully I'll take the time to look into it.
This is btw what I used with SongBloom
Yes I am genning. It will take maybe 30 minutes for a 120 minute song, for me, I guess. We'll see. This is only my second acestep 1.5 gen.
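On the "so many combinations to try" part: a small sketch that prints every combination of the env vars and flags from the launch line above, so they can be tried one at a time. It assumes `--cpu-vae` always stays on (per the thread) and that everything else is optional, which is a guess. It only prints the lines, it doesn't launch anything:

```python
import itertools
import shlex

# Env vars and flags copied from the launch line above; which ones matter is unknown.
ENV_VARS = {
    "PYTORCH_ALLOC_CONF": "expandable_segments:True",
    "TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL": "1",
    "MIOPEN_FIND_MODE": "FAST",
}
OPTIONAL_FLAGS = ["--use-pytorch-cross-attention", "--novram"]
REQUIRED_FLAGS = ["--cpu-vae"]  # assumption: always kept, since the thread reports needing it on AMD


def launch_lines():
    """Yield one shell line per combination of env vars and optional flags."""
    env_items = list(ENV_VARS.items())
    for n_env in range(len(env_items) + 1):
        for env_subset in itertools.combinations(env_items, n_env):
            for n_flags in range(len(OPTIONAL_FLAGS) + 1):
                for flag_subset in itertools.combinations(OPTIONAL_FLAGS, n_flags):
                    env_part = [f"{key}={shlex.quote(value)}" for key, value in env_subset]
                    command = ["python", "main.py", *flag_subset, *REQUIRED_FLAGS]
                    yield " ".join(env_part + command)


if __name__ == "__main__":
    for line in launch_lines():
        print(line)
```

That is 32 lines total (8 env subsets times 4 flag subsets), starting from the bare `--cpu-vae` launch and ending with the full line from the post.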
>>
>>108055108
>>108053243
yeah, so I'm getting a vae decode error too, with audio decode, despite being under cpu decode. it's not a ram issue.
I had one successful gen, so I know it *can* work...
>>
>>108055405
share!
https://catbox.moe/
so far, my gens are taking forever, but I think this one will succeed (I had a node that overrides others, that is incompatible with ace step; it might be an old node that I should avoid anyway)
>>
>>108055492
Thanks!
be sure to include what your style prompts were!
First one is doing what I knew the base model could do. I would sometimes get like 1 second that punched through the low bitrate and laid down bass.
second
isn't it strange rich people can't get a girlfriend?
>>108055519
Yeah but it's the first song model that has promise. heartmula doesn't follow the prompt. ace step 1.35 was low bitrate. songbloom's method is experimental and not ready for prime time.
>>108055545
style prompt: teen pop
>>
>>108055545
>>108055576
also, it took me <38 minutes, and needed <50gb system memory (to cpu vae). idk why it needs that much lol
>>
File: 3221311515.png (150.1 KB)
Official Gradio UI has a lot of features, including LoRA training etc. Plus I feel like the Comfy version is broken: on a 3090 it reloads the prompt each time.