Thread #108062725
File: long dick general.jpg (1 MB)
1 MB JPG
App Mode Edition
Discussion of Free and Open Source Diffusion Models
Prev: >>108058266
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>Anima
https://huggingface.co/circlestone-labs/Anima
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
https://tagexplorer.github.io/
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
GPU Benchmarks: https://chimolog.co/bto-gpu-stable-diffusion-specs/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Bakery: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
File: o_00008_.png (2.2 MB)
2.2 MB PNG
>>
>>
>>
>>
File: o_00010_.png (1.4 MB)
1.4 MB PNG
>>
>>
>>
File: z_image_bf16_00126_.png (3.1 MB)
3.1 MB PNG
>>
>>
File: Polemic.jpg (298.5 KB)
298.5 KB JPG
This is bad. I didn’t know Comfy gave Anima $1 million. Funding models that are likely to go closed source like in their repo specifies is bad for the ecosystem. Why not give Laxhar, Lodestone, Neta Yume, WAI or Newbie $1 million for a full finetune on ZiB? Randomly handing out huge sums of money to friends makes no sense and looks shady. Or are we seriously supposed to believe Anima is worth $1 million?
>>
>>
>>
>>
>>
>>
>>108062933
No, why did Comfy deliberately give 1 million dollars to literally who? I didn’t even ask for a finetune on a 2B model; we are going backward instead of forward. What is Neta Lumina Labs supposed to do now that Comfy is deliberately sponsoring and supporting one model? Comfy is creating unfair and uneven competition.
>>
>>
>>
>>
File: 00064-2147368178.png (841.5 KB)
841.5 KB PNG
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: Flux2-Klein_00976_.png (3.1 MB)
3.1 MB PNG
>>
>>
>>
>>
>>
File: o_00019_.png (1.2 MB)
1.2 MB PNG
>>
>>
>>
>>
>>
>>
>>
>>
File: z_image_bf16_00131_.png (2.2 MB)
2.2 MB PNG
>>108062909
seems to work actually.
has to be portrait
>>
>>
>>
File: rxn_imokaywiththis.png (17.5 KB)
17.5 KB PNG
>>108062926
Anima ain't bad for a first shot. I appreciate the smaller text encoder and low RAM requirements, even if the sampler feels on the slow side.
>>
>>
>>
File: z_image_bf16_00132_.jpg (1.4 MB)
1.4 MB JPG
>>
>>
>>108062957
Neta Art ran out of money a long time ago, and nobody has been doing anything with their model other than the NetaYume guy, duongve, who seems to be doing useful stuff WRT Anima now anyway, e.g.
https://huggingface.co/circlestone-labs/Anima/discussions/25#6981e0a14721e99b00df1f2e
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108063154
>>108063164
chatgpt give me instructions for baking a chocolate cake
>>
File: 3087428.jpg (12.1 KB)
12.1 KB JPG
Why does the latent2rgb preview fail to display Klein's lighting properly? The difference is insane.
>>
File: 1739507491053018.png (2.2 MB)
2.2 MB PNG
>>108063166
yet he keeps asking for them
>>
File: Flux2-Klein_00045_.png (1 MB)
1 MB PNG
>>108063168
I am banned from Civitai.
>>
>>
>>
>>
>>108063196
>>108063189
What's wrong with it?
>>
>>
>>
>>
>>
>>
>>
>>108062966
The idea that “more models = better” is wrong. What made SDXL strong was the fact that almost everyone built on top of it. A single strong base model allowed the community’s work to stack: LoRAs, finetunes, tools, and techniques could be combined and improved together.
Today we have the opposite situation. People are spread across many different base models: ZiB, Klein, Anima, Newbie, NetaYume, Qwen. Each one has its own isolated ecosystem, and we get fragmentation instead of collective progress.
Even though there are more models now, the overall quality doesn’t improve at the same rate, because effort is divided. Fewer strong, widely adopted base models would produce better results than many disconnected ones.
>>
>>
>>108063177
Kijai implemented Flux.2 TAESD support, the PR just hasn't been accepted yet
https://github.com/Comfy-Org/ComfyUI/pull/12043
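For context on why the two previews can look so different: a latent2rgb-style preview is just a fixed per-pixel linear map from latent channels to RGB, so it can only roughly approximate what a learned decoder (the full VAE, or a tiny TAESD-style one) would produce. A minimal sketch of the idea, assuming a made-up 4-channel latent and purely illustrative factors (these are not the real Klein or ComfyUI values, and the function names are hypothetical):

```python
# Hypothetical factors: one (R, G, B) weight triple per latent channel.
# Real previewers ship a fitted matrix per model; these numbers are made up.
FACTORS = [
    (0.30, 0.20, 0.10),
    (0.10, 0.30, 0.20),
    (-0.20, 0.10, 0.30),
    (0.05, -0.10, 0.15),
]
BIAS = (0.5, 0.5, 0.5)


def latent_pixel_to_rgb(latent, factors=FACTORS, bias=BIAS):
    """Project one latent pixel (a list of channel values) to 8-bit RGB."""
    rgb = []
    for c in range(3):
        # Weighted sum over latent channels plus a constant offset.
        value = bias[c] + sum(x * f[c] for x, f in zip(latent, factors))
        # Clamp to [0, 1] before scaling to 0..255.
        value = min(1.0, max(0.0, value))
        rgb.append(int(value * 255))
    return tuple(rgb)


def latent_to_rgb(latent_image):
    """Apply the same linear map to every pixel of a rows x cols latent."""
    return [[latent_pixel_to_rgb(px) for px in row] for row in latent_image]
```

Every pixel is mapped on its own with the same weights, so anything the real decoder reconstructs from neighbouring latents or nonlinear combinations of channels never shows up in this kind of preview.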
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108063316
We've seen this before.
On the Linux desktop, hundreds of distros and incompatible systems weakened adoption, while unified platforms like Windows and macOS stayed dominant.
In messaging, open standards like XMPP fragmented and lost to centralized apps like WhatsApp and Telegram.
In game dev, progress only accelerated once engines like Unreal and Unity became common standards.
>>
File: o_00024_.png (1.7 MB)
1.7 MB PNG
>>
File: FluxKlein4B_Distilled_Output124141.png (3.6 MB)
3.6 MB PNG
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: o_00031_.png (1.5 MB)
1.5 MB PNG
>>
File: o_00033_.png (1.1 MB)
1.1 MB PNG
>>
>>
File: z_image_turbo_bf16.safetensors_00069_.png (3.8 MB)
3.8 MB PNG
>>
>>
>>
File: o_00035_.png (1.4 MB)
1.4 MB PNG
>>
File: Flux2-Klein_00981_.png (383.4 KB)
383.4 KB PNG
>>108063623
>>
>>
>>
File: z_image_bf16_00137_.jpg (3 MB)
3 MB JPG
>>108063592
>>
>>
>>
>>108063623
I had strange patterns at the image edges just a week ago. Then when I took a template workflow, the patterns were gone. My own workflow was pretty much identical and didn't use custom nodes. I also did a node refresh, which didn't help either.
I have zero idea what happened.
One possibility is the issue was pycache related.
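If stale bytecode is the suspect, one way to rule it out is to delete every `__pycache__` directory under the install and let Python regenerate them on the next start. A minimal sketch, assuming you pass in the folder the install lives in (the function name and the `ComfyUI` path in the usage comment are placeholders, not anything from the thread):

```python
import shutil
from pathlib import Path


def clear_pycache(root):
    """Delete every __pycache__ directory below root; return the removed paths."""
    removed = []
    # Materialize the list first so deleting does not disturb the directory walk.
    for cache_dir in list(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed


# Usage (path is a placeholder): clear_pycache("ComfyUI")
```

Only the cached bytecode is removed; the `.py` sources stay in place, so the worst case is a slightly slower first launch.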
>>
>>
File: Screenshot 2026-02-04 213021.png (348.7 KB)
348.7 KB PNG
>prompt for spread legs
>get 3 legs
happens every single time.
z-image will be superseded, it's a matter of when not if
>>
>>
File: o_00044_.png (1.2 MB)
1.2 MB PNG
>>
File: z_image_bf16_00141_.jpg (2.7 MB)
2.7 MB JPG
when there's a will
>>
>>
>>
>>
File: z_image_bf16_00127_.jpg (2.8 MB)
2.8 MB JPG
>>108063911
some women have thick ankles anon
>>
>>
>>
>>
>>
post your z-image negatives
ugly, mature woman, old woman, man, african, bad proportions, bad anatomy, transgender, transvestite, deformed, asian, korean, chinese, japanese, low quality, (jpeg artifacts:1.2), ambiguous items,
>>
>>
>>
>>108063959
>>108063975
Does this just work equally well while saving on tokens?
>>
File: z_image_bf16_00142_.jpg (2.9 MB)
2.9 MB JPG
>>
File: ComfyUI_00022_.png (1.9 MB)
1.9 MB PNG
>>108063959
(低质量, 最差质量:1.4), 文字, 水印, 签名, jpeg伪影, 模糊, 低分辨率, 颗粒感, 照片, 摄影, 写实, 真实, 相机, 超写实, 变异的手, 糟糕的脸, 多余的肢体, 畸形, 缺失的手指, 悬浮的肢体, 断开的肢体, 斜视, 变形, 糟糕的人体解剖, 糟糕的手, 缺失的手臂, 多余的腿, 融合的手指, 手指过多, 长脖子, 变形的盔甲, 弯曲的织针, 悬浮的羊毛, 糟糕的龙, 扭曲的城堡, 变形的建筑, 糟糕的透视, 消失点错误, 裁剪的头部
>>
>>
>>
File: doomerism.png (62.9 KB)
62.9 KB PNG
kek
>>
>>
>>
>>
File: z_image_bf16_00145_.jpg (3.1 MB)
3.1 MB JPG
>>
>>
>>
>>
File: 1756457239912161.png (889.9 KB)
889.9 KB PNG
>>
File: z_image_bf16_00146_.png (3.5 MB)
3.5 MB PNG
>>
>>
File: 1740776839167255.mp4 (1.7 MB)
1.7 MB MP4
>>
>>
>>
File: Qwen Image Max_This eye-level, vertical (1).jpg (126.9 KB)
126.9 KB JPG
>>108064043
has this doomer forgotten that qwen image exists?
>>
File: z_image_bf16_00148_.png (3.6 MB)
3.6 MB PNG
>>
>>
>>
File: Flux2-Klein9BDistill_00054_.png (1.4 MB)
1.4 MB PNG
>>
>>
>>
>>
>>
File: z_image_bf16_00150_.jpg (2.7 MB)
2.7 MB JPG
>>108064391
cute
>>108064423
no u
>>
>>
>>108064437
>Comfy really botched his implementation of Ace Step
99% sure it was intentional at this point. I fucking hate the gradio interface, but there's nothing in it that comfy shouldn't be able to do easily.
>>
>>
>>
>>
>>
I'm diving into Civitai.
I'm a newbie.
I've only used ZIT default. Used it for a few weeks. It's fun. But I'm moving on to the next step. I look at Civitai and I see there are LoRAs and checkpoints. I don't know what those are yet, but they look like add-ons to improve ZIT.
Anyone have any advice before I try out these LoRAs and checkpoints?
>>
>>
>>
>>
>>
>>
>>
File: 1587347161774.jpg (42.9 KB)
42.9 KB JPG
>get qwen tts to work
>give it profanities to limit-test
>it all works
>>
File: 1756191505462776.png (1.6 MB)
1.6 MB PNG
>>
>watching tutorial on qwen 3 tts
>it's the jeet in disguise making tutorials
>he uses opera winfrey as a voice clone
>mentions how he finds her super sexy and commanding
Fucking jeets and their god damn desi aunt fucking whatever bullshit fetish.
>>
>>
>>
>>
File: My Hand.jpg (25.9 KB)
25.9 KB JPG
>>108064962
Hand? Here.
>>
>>
>>
>>
>>
File: 69b9df65-ffa2-4efa-8f9c-69c5e65fb403.png (1.9 MB)
1.9 MB PNG
>>
File: 092cb397-e41d-448e-b6f7-5d35d7d83645.png (587.7 KB)
587.7 KB PNG
>>108065120
>>
>>
File: ComfyUI_Anima_00034_.png (1.5 MB)
1.5 MB PNG
>>108064437
Yes, I've been telling anons this, since Comfy is missing the thinking option. Though even the Gradio implementation has its issues; repaint is not working properly.
Anyways, more explorations with Acestep 1.5. I feel like I'm getting its prompting rhythm
More raw kino gens
https://files.catbox.moe/kwh1vm.mp3
https://files.catbox.moe/3v4zxj.mp3
The model truly is as good if not better than Suno v4.5 on its best gens, truly a miracle for local.
>>
File: Flux2-Klein_00364_.png (1.5 MB)
1.5 MB PNG
>>
>>
File: Flux2-Klein_00366_.png (1.5 MB)
1.5 MB PNG
>>
>>108064437
Comfy is trying to cast a wide net with his implement-first strat. It's the sole reason he captured the majority market share. Sadly, most of the new support tends to be shit and you're better off using custom nodes.
>>
>>
>>108065272
Yes, these gens I posted are at least v4.5 tier. Still trying to test the overall musicality of it. It seems sensitive to BPM changes and the precise way genres are described, but songs that are just as catchy as that are possible. As for Udio, it's not there yet, but 85% of the way there. Udio's perk is that it's very catchy out of the box- https://files.catbox.moe/dkchd5.mp3
ACEStep 1.5 can already do the catchiness on good seeds, but it doesn't have that big model/RLHF energy to always get it right, so it just needs a LoRA or tune
Yes, not quite there yet, but we are so close.
>>
File: z_imageBASEd_00395_.jpg (709.8 KB)
709.8 KB JPG
>>
>>
>>
>>
File: z_imageBASEd_00397_.jpg (712.2 KB)
712.2 KB JPG
>>108065422
>Skate 2.
that was a great game
>>108065455
>Zimage fixed that?
You can get busty women but you have to prompt for 'massive hyper-tits' and all that crap, so I just trained a LoRA filled with busty Asian gravure models, amateurs and K-pop stars for a more natural body.
>>
>>
File: 1756471130812543.png (3.2 MB)
3.2 MB PNG
just got done watching some shoelacer slop
>>
File: 1741401472990345.png (2.8 MB)
2.8 MB PNG
my laces are tied
>>
File: 1755365430573123.jpg (683.7 KB)
683.7 KB JPG
oh no shoelacer-kun, let me tie them too!
>>
File: Flux2-Klein_00373_.png (1.7 MB)
1.7 MB PNG
>>
File: 2ed72500-9665-4185-9459-fa4a00652a17.png (1.3 MB)
1.3 MB PNG
>>108065455
NTA, but if you can't prompt large breasts, that's a skill issue. Stock zit
>>
File: Flux2-Klein_00369_.png (1.6 MB)
1.6 MB PNG
>>
File: Flux2-Klein_00375_.png (1.6 MB)
1.6 MB PNG
>>108065543
what do you type into the promptbox to get such massive tits? lmao
>>
File: 1760369877590311.jpg (615.1 KB)
615.1 KB JPG
i dont need your shoelacing bitch, I can solo this nigga
>>
File: 1756686804564084.jpg (474 KB)
474 KB JPG
AIEEEE shoelacerkun please let me be your onehole! end of story.
>>
File: Flux2-Klein_00996_.png (2.1 MB)
2.1 MB PNG
>>
File: Flux2-Klein_00376_.png (1.5 MB)
1.5 MB PNG
>>
File: lo9.png (2.7 MB)
2.7 MB PNG
>>108065540
>>108065543
wouldn't even rape
>>108065499
would rape
>>108062854
>>108063142
>>108063364
>>108063532
>>108064244
neat
>>108065569
gem
>>108062926
>creative AI
>>
>>
>>108065587
but would u rape
>>108065563
>>
File: 49765101-0231-46b9-95f7-586c601346d2.png (1.8 MB)
1.8 MB PNG
>>
File: yy8.png (1.3 MB)
1.3 MB PNG
>>108065623
yes
>>108065626
neat
>>
File: ea802c7a-765d-4fbd-832b-8b268775dcd6.png (1.8 MB)
1.8 MB PNG
>>
File: 979d1cd3-e7fb-4d97-a558-e84dec9d3c36.png (1.8 MB)
1.8 MB PNG
>>
>>
>>108065330
Damn, I just tried with the 4B clip and it's a night and day difference. Leave it to Comfy to default to the worst settings.
Also, LoRAs work great
https://huggingface.co/Sayoyo/ACE-Step1.5-ZUTOMAYO-LoRA
https://files.catbox.moe/vlvgzk.mp3
>>
File: z_imageBASEd_00408_.jpg (873.3 KB)
873.3 KB JPG
>>
>>
File: g6u.png (2.1 MB)
2.1 MB PNG
>>108065691
that artificial tin can thing that you also see in TTS makes me want to stab my ears brah
>>
File: c33b3c93-f7eb-4290-8c87-803966d99506.png (2.2 MB)
2.2 MB PNG
>>
>>
File: 20f81cac-82c0-4373-9ac2-159a77521b5d.png (1.5 MB)
1.5 MB PNG
>>
>Have to verify E-mail to post in this thread (don't have to do this for other threads, what?)
>Verify from my throwaway proton mail
>Invalid or expired link despite it being 2 minutes old
>Practically forced to post here from the proxy website
Anyone encountered this? Even allowing third party cookies doesn't help (not that I would leave this shit on). Why do they keep making 4chan worse and worse all the time?
>>
>>
>>108065759
>Have to verify E-mail to post in this thread
Is this why I've seen so much less schizo wars lately? If what you say is true. You're probably in the greater area the schizo was using as his shitting ground.
>>
>>108065691
I'm convinced there's a slander campaign against ace step.
Yeah. It's not suno 5 tier. Not even really 4.5 tier, but you'd think people jammed toothpicks into their ears the way people react on reddit.
>>
>>108065769
I'd assume he uses the same website as me right now so I doubt it. The shit they're doing is ineffective against determined schizos, it just keeps making the experience more and more annoying for the occasional posters. But hey, at least the 15 minute captcha is gone.
>>
>>108065775
>you'd think people jammed toothpicks into their ears the way people react on reddit.
It's very sensitive to how you prompt it. To get good results you need good settings and a good prompt; a small change results in bad lyrics adherence. With the LoRA that anon is using, the musicality per gen for that genre improves a lot, which shows what the model can do with LoRAs (I recommend archiving it as a guide for training LoRAs because it probably won't last). Also, the default voices in ACEStep 1.5, especially for English, are admittedly not the best and very robotic, which could explain some of the bad reception. But this LoRA proves it's possible to bring it up to Udio's level at least in Japanese, and English could very likely be improved substantially too (possibly using various artists with different voices).
>>
>>
>>
File: 1759615666334823.jpg (579.2 KB)
579.2 KB JPG
Anima's understanding of various angles and poses alone BTFOs illustrious
>>
>>
File: z_imageBASEd_00420_.jpg (913.6 KB)
913.6 KB JPG
almost does ak47
>>
>>108065805
Here's where every model stands
ACEStep 1.5 turbo- v4.5 tier (seed dependent)
ACEStep 1.5 turbo LoRA/finetune enhanced or combinations of it- v5+ (I've never heard a Suno gen that sounds human, that LoRA gen is even better than v5), Udio tier (or better)
The reason it is better than Udio with a LoRA is that the instrument sound quality is better than what Udio lets you stream, while the LoRA brings the musicality up to v5/Udio level (which are about equal, I'd wager, though I have no access to v5).
>>
>>108065775
Because music is a whole different thing compared to images/videos/text. People's tastes in music can be very personal and specific, while other mediums tend to have people going "good enough". You aren't recreating anything from reality like generic 1girl slop, so if it sucks, people have a very strong reaction to it. It's also why I rarely use music AI in general other than to play around with some music ideas I made. 90% of text2music sounds like derivative ass, Suno/Udio included.
>>
>>108065832
It's this one https://youtu.be/1l1EtvXV1PQ?t=423
>>
>>
>>
>>108065881
I would be more interested in something like giving the AI my midi notes and small phrases, or my own recorded guitar parts, and it would improvise a song out of these, with specific instruments etc. I have zero interest in throwing dice to get slop music...
>>
>>
>>108065973
>my own recorded guitar parts
ACEStep 1.5 gives you tools precisely for that purpose, similar to what the API models already have and could take away at any moment, like the rights to any song you modify (e.g. Udio). It will only get better from here on out. Some of you musicians who can't get it through your thick skulls that it is meant to be a tool like anything else will be in for a surprise in the future.
>>
>>
>>
>>
>>
File: ComfyUI_09952.png (2.6 MB)
2.6 MB PNG
>>108062926
Damn, where do I stick my hand out for ol' money bags to see it?
>>108066091
It's fine... if you just use it for making LoRAs.
>>
File: 1742957065898503.webm (3.3 MB)
3.3 MB WEBM
How do you make a LoRA waifu? I mean, I want to gen my ideal woman and create a large dataset of her, but I have no idea how to maintain consistency between the gens for the dataset with the current models. I've tried the edit models and they're good enough to get one-off different angles, but for a dataset they're all over the place and constantly change the features/proportions.
>>
>>
>>
>>108066132
Say I used zit and got the perfect gen and I wanted to use it as the base reference for the rest of the dataset. How would I keep the details exact? I want to get her at different angles. Under different lighting. NSFW. Different expressions. It's just so hard to keep things consistent if you're not training on photos of a real person/celebrity.
>>
File: 1754878040094049.png (308.1 KB)
308.1 KB PNG
help
>>
>>
>>
File: z-image_00260_.png (1.6 MB)
1.6 MB PNG
>>
>>
File: ComfyUI_09910.png (2.6 MB)
2.6 MB PNG
>>108066171
Horse!? Where you getting horse?
>>
>>
File: z-image_00269_.png (1.8 MB)
1.8 MB PNG
>>
>>
>>
>>108066228
I haven't played with the angle LoRAs. Thanks. Klein and Qwen edits do a fine job if you're perfecting a gen, but they change the face and details of the person at the more extreme angles, so it doesn't build a consistent dataset, unfortunately. I also like a little asymmetry and imperfections for realism and they almost always get lost even with simple head turns. The API models are a lot better at that.
>>
File: o_00047_.png (2.2 MB)
2.2 MB PNG
>>
>>108066292
>>108066217
same picture
>>
File: o_00049_.png (2.2 MB)
2.2 MB PNG
>>
>>
>>
File: 1739826706370388.png (2.6 MB)
2.6 MB PNG
>>108066316
look, they're different
>>
File: 1748552242366460.png (3.4 MB)
3.4 MB PNG
genned this kino.
free for your friends
yw
>>
File: trv.png (3.5 MB)
3.5 MB PNG
>>108066344
fake and gay
>>
>>
File: 1752629273428.jpg (2.2 MB)
2.2 MB JPG
>>108066257
You could probably make an argument about her thicker/stockier than average legs maybe being horse-like, but her face is extremely feminine (weak chin, upturned nose, etc).
Here's a real picture of her. Not horse-like in the slightest!
>>
>>
>>
File: o_00054_.png (1.2 MB)
1.2 MB PNG
>>
File: hmm.png (121.2 KB)
121.2 KB PNG
>>108066381
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108062926
>likely to go closed source like in their repo specifies
obvious bait but where in the repo do they specify that it'll go closed source?
>>108062957
>literally who
tdrussell has made a couple of goated finetunes. They made the first storytelling finetune of llama 3, and unlike most story/narration tunes it wasn't slopped to the gills.
Would you rather another million for frankensteined models or another SDXL tune?
>>
>>
>>
>>108066498
I'm aware that the dribbling morons from stablediffusion's subreddit often break containment, come here and ask stupid shit. I can no longer tell when someone is being a bad faith troll or is just a goonbrained redditor.
>>
File: Flux2-Klein_01005_.png (377.9 KB)
377.9 KB PNG
holy shit
https://huggingface.co/Tongyi-MAI/Z-Image-Edit
>>
>>
>>
>>
>>
>>
File: 1768113995188083.png (3.2 MB)
3.2 MB PNG
>tfw no goofs of new captioning model
>>
>>
>>108066594
>>108066594
>>108066594
two rules for the new thread
>discuss free and open source diffusion models
>do not court death
>>
>>