File: highlights_g_108990829_1780806702_1.jpg (1.7 MB)
Discussion and Development of Local Image, Video, and Music Models
Previous: >>108990829
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
SDWebUI: https://rentry.org/ldg-lazy-getting-started-guide#the-stable-diffusion-web-ui-lineage
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, & Upscalers
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/tdrussell/diffusion-pipe
https://github.com/kohya-ss/sd-scripts
https://github.com/kohya-ss/musubi-tuner
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
https://animadex.net
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>Wan
https://github.com/Wan-Video/Wan2.2
>LTX-2.3
https://huggingface.co/collections/Lightricks/ltx-23
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
>>
More ACEStep XL ZUTOMAYO LoRA kino, this time just J-Rock. These are the raw, unmastered outputs, without DCW enabled.
https://vocaroo.com/1cCVdYQg5ZnQ
https://vocaroo.com/1mpl1HJOqEvD
For those not aware, I made and refined an ACEStep XL LoRA training guide a while back https://rentry.co/s8fg8ber
I think I've now found the definitive way to run inference on ACEStep XL and get the most quality out of it, both with and without a LoRA. I tested against the prompts on the official showcase: https://ace-step.github.io/ace-step-v1.5.github.io/
Notice how all the music Turbo makes sounds washed out. The model you want in order to fix this while retaining most of Turbo's musical abilities is the Base Turbo XL merge from https://huggingface.co/scragnog/ace-step-1.5-gguf-merge-models/tree/main
I have tested other merges as well and have settled on this one as the best for all LoRAs trained on Base, as well as for non-LoRA outputs.
With this model alone, there's no immediate need to master the outputs, because they aren't noisy by default. The UI I now use is https://github.com/scragnog/HOT-Step-CPP, since it has more samplers and settings (DPM++ 3M, which I used here, etc.).
As always, on 90% of prompts DiT-only outputs are best: 50 steps with a guidance scale of 12+. DCW is not needed with this model, as it's that good, and the DCW settings I shared previously were not the best on this model (they seemed to mute some instrumentals).
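The settings from that post can be collected into one small config object so they don't drift between sessions. This is a minimal Python sketch: the class and field names are hypothetical and are not HOT-Step-CPP's actual option names, and treating 12 as a hard floor for guidance is an assumption based on the "12+" recommendation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AceStepInferenceSettings:
    """Hypothetical container for the settings described in the post.

    Field names are illustrative only; they are not HOT-Step-CPP options.
    """
    model: str = "Base Turbo XL merge"  # from scragnog/ace-step-1.5-gguf-merge-models
    sampler: str = "DPM++ 3M"
    steps: int = 50
    guidance_scale: float = 12.0        # "12+", so 12 is treated as the floor here
    dit_only: bool = True               # best on ~90% of prompts per the post
    dcw_enabled: bool = False           # not needed with this merge

    def validate(self) -> None:
        # Guard against accidentally drifting below the recommended values.
        if self.steps < 1:
            raise ValueError("steps must be positive")
        if self.guidance_scale < 12.0:
            raise ValueError("guidance scale is below the suggested 12+")


if __name__ == "__main__":
    settings = AceStepInferenceSettings()
    settings.validate()
    print(asdict(settings))
```

A frozen dataclass is used so a shared preset can't be mutated by accident; override fields by constructing a new instance instead.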
>>
>mfw Resource news
06/06/2026
>HuggingFace VFS Plugin: Native Total Commander file system for Hugging Face models
https://github.com/mikinko/HuggingFace_WFX
>ComfyUI Lance AIO: Custom nodes to run Lance-3B
https://github.com/SteveImmanuel/comfyui-lance-aio
>Cube: Generative AI System for 3D
https://github.com/Roblox/cube
>The token bill comes due: Inside the industry scramble to manage AI’s runaway costs
https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs
06/05/2026
>RhymeFlow: Training-Free Acceleration for Video Generation with Asynchronous Denoising Flow Scheduling
https://simon-dcs.github.io/Website-of-RhymeFlow
>Complexity-Balanced Diffusion Splitting
https://noamissachar.github.io/CBS
>Can We Predict The Human Preference For Text-to-Image Content Prior To Generation And Is It Even Useful To Do So?
https://github.com/LSU-ATHENA/HPM-Predict
>SAM-Flow: Source-Anchored Masked Flow for Training-Free Image Editing
https://github.com/chwbob/Sam-Flow
>Geometry-Aware Dataset Condensation for Diffusion Model Training
https://github.com/2018cx/GADC
>StoryVideoQA: Scaling Deep Video Understanding with a Large-Scale, Multi-Genre and Auto-Generated Dataset
https://github.com/nercms-mmap/StoryVideoQA
>Lightricks to split into two companies as it cuts 75 jobs
https://www.calcalistech.com/ctechnews/article/r1dgjt5gmg
>Akium Sampler: Custom k-diffusion sampler for Stable Diffusion Forge / A1111
https://github.com/AkiumAI/akium-sampler
>When AI builds itself: Our progress toward recursive self-improvement, and its implications
https://www.anthropic.com/institute/recursive-self-improvement
>U.S. Government Officials In Talks To Acquire Shares In AI giants
https://www.notus.org/technology/trump-ai-stake-openai
06/04/2026
>Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
https://echo-team-joy-future-academy-jd.github.io/Echo-Infinity
>>
>mfw Research news
06/06/2026
>Physics-Informed Video Generation via Mixture-of-Experts Latent Alignment
https://arxiv.org/abs/2606.04737
>Real-Time Generation of Streamable Talking Portrait Video with Reference-Guided Deep Compression VAEs
https://arxiv.org/abs/2606.01620
>SymTRELLIS: Symmetry-Enforced Voxel Latents for 3D Generation
https://arxiv.org/abs/2606.04108
>Resonant Minds: Closed-Loop Social Avatars with Theory of Mind
https://arxiv.org/abs/2606.05896
>Pool-Select-Refine: Allocation-Aware Generative Dataset Distillation with Soft-Label-Guided Latent Refinement
https://arxiv.org/abs/2606.01920
>MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
https://arxiv.org/abs/2606.04688
>CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences
https://arxiv.org/abs/2606.00931
>Diversity Matters: Revisiting Test-Time Compute in Vision-Language Models
https://arxiv.org/abs/2605.30713
>Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models
https://arxiv.org/abs/2606.03730
>Density-Aware Translation of Spurious Correlations in Zero-Shot VLMs
https://arxiv.org/abs/2606.01710
>Splatshot: 3D Face Avatar Generation from a Single Unconstrained Photo
https://arxiv.org/abs/2606.01493
>Disentangling Visual and Factual Correctness in LVLMs' Visualization Literacy
https://arxiv.org/abs/2606.03142
>Chroma Clues: Leveraging Color Statistics to Detect Synthetic Images
https://arxiv.org/abs/2606.02224
>Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs
https://arxiv.org/abs/2606.03879
>P2-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
https://arxiv.org/abs/2606.03376
>Visual Persuasion: What Influences Decisions of Vision-Language Models?
https://arxiv.org/abs/2602.15278
>>
File: anima edit test.png (1.7 MB)
>>108997048
OK, you don't need the patch, but you do need a LoRA like this https://civitai.red/models/2652469/anima-edit-experimental?modelVersionId=2978373 or this https://civitai.red/models/2650553/anima-edit-nude-filter-clothes-change-more?modelVersionId=2976234
Unfortunately, they seem like very underbaked proofs of concept, or are specialized for narrow tasks like changing clothing.
I wonder how many image pairs you would need before the model gains general-purpose editing capability, i.e. properly applying t2i concepts it knows to i2i even if it never saw the precise task during training. Would you need a proper finetune, or can that still be done as a LoRA?
>>
>>108996994
Krea 2 is not a local model, and anyone suggesting that Ideogram 4 ISN'T the most censored local model of all time by a ridiculously massive margin should actually rope ASAP. It's like six gorillion times worse than anything else ever was, period, end of story. Only a shill would claim otherwise.
>>
>>108997072
Lyric following is just perfect for those two gens, btw.
Miku (DECO*27) generalizes well too:
https://vocaroo.com/152fHPEB9In7
https://vocaroo.com/1iYfkcT7xV0N
>>108997078
Actually, I was thinking of dropping these LoRAs, but given nobody has released anything for ACEStep XL, releasing them might get the music mafia after me, because they're insane. Training LoRAs is very easy; at least on Modal with a rented H100, it's pretty quick.
>>
>>108997251
There's an extreme astroturfing campaign on Reddit right now for it.
It makes the Qwen shilling look organic.
>>108996994
All of their models are fully uncensored, on the level of pretty much every popular local finetune. I had some stuff slip through the cracks unprompted.
I would be very surprised if they released even one of those versions locally without safety alignment.
We'll see I guess.
>>
>>108997315
OK, here's my yap:
>you aren't allowed to train LoRAs or finetune it for NSFW
IANAL, obviously, but technically the license only says:
>However, we may also implement certain safety measures, content protections and other technological measures for the Model, including content filters and watermarking, and you agree that you will not circumvent, remove, alter, deactivate, degrade or thwart any such measures.
It doesn't explicitly forbid nudity anywhere in the license. I wonder if you could argue that you're just finetuning the model for boobies and the like (perfectly legal, and you never agreed not to do that anywhere) and that any change in the way the filter functions was a side effect. The model can already generate naked people with the correct JSON prompt; it just generates weird flesh sludge for the nether regions because it never saw enough cunts and penises during training. If these requests aren't already flagged by the filter with JSON prompting, and I'm just making them look less shit, how am I circumventing the filter?
And no, JSON prompting isn't circumvention, since it's an explicit feature they trained the model for.
>>
>>108997315
"People will keep using the Z family and/or Klein" is more like it, given Ideogram 4 isn't actually better than either of those in any meaningful way, even putting the laughable built-in safety filters aside.
>>
>>108997386
The license also says that any usage of the model or derivative models has to abide by their referenced usage policy, and that usage policy prohibits you from generating anything lewd or pornographic. Obviously they can't stop people from doing what they want on their own devices but they can absolutely tell CivitAI or Huggingface to stop hosting finetunes or LoRAs that allow people to violate that part of the license, and that's an issue that at least Civit has folded on before.
>>
File: Ernie-Image_00073_.png (1.7 MB)
>>108997072
>>108997284
>>108997082
For me, it's my Fate Gear LoRA. Pure kino, able to do so much complex stuff with its instruments.
https://vocaroo.com/1iHt3NwfVDPi
https://vocaroo.com/12ERUa8VqKVB
>>108997083
>can i plug instrumentals into this model and generate pure vocals from it?
Model should be able to handle instrumentals and vocals separately, depending on what you mean by that. For pure vocals (acapellas) you need a LoRA as far as I know.
>>108997303
These are just model weights though. Maybe community fear is overblown.
>>
>>108997395
If you peruse the /r/stablediffusion subreddit right now, you'd think Ideogram is the second coming, a giant leap forward. But with all the shilling, I've yet to see a single gen that looks like anything approximating a real photo.
I don't know what to call the style, hyper-realistic? Like a lifelike drawing combined with cgi? But certainly nothing real looking. Is ideogram training on AI data or something?
>>
>>108997444
I think it looks fine but in a way that is totally equivalent to other models that exist. Nobody I've seen has even tried to explain how it's actually BETTER than other models. Like can it even do editing? I've seen zero examples of that.
>>
>>108997442
>Model should be able to handle instrumentals and vocals separately, depending on what you mean by that.
i want to compose an instrumental and then use the model as the singer so i can have all of my stems separate for mixing
>>
>>108997315
> you aren't allowed to train LoRAs or finetune it for NSFW
wait, I thought the point of open weights was that they could be finetuned to do anything? Localkeks are getting excited over censored garbage they can't even teach to do porn?
>>
>>108997472
I remember a time when they released Llama 1 for researchers only, and nobody gave a fuck; people used it to goon and shared the weights anyway.
Now everyone is kvetching and clutching pearls over a license that cannot feasibly touch them. I don't know why this place is full of weeping vaginas now.
>>
>>108997444
It's funny how that subreddit was up in arms over SD2 and SD3's censorship and now it's exclusively populated by retarded thirdies who salivate over any new model regardless of how censored it is or how dogshit the results look.
>>
>>108997174
>>108997176
thanks!
>>
>>108997174
> https://github.com/Roblox/cube
> Acknowledgments
> We thank the leadership, Nishchaie Khanna, Karun Channa, Anupam Singh, and David Baszucki, for their support and guidance throughout this work.
lol
>>
>>108997631
Lodestone has unsurprisingly fucked up Zeta Chroma and has shifted to working on some 2.5B pixel space model instead while Zeta and Radiance train in the background in the hope that they'll be usable by July.
>>
https://huggingface.co/circlestone-labs/Anima/discussions/174#6a1ef4729f9c1460465d145f
>TensorArt's commercial license is permissive, and they can choose to use the model and charge for it however they want. They pay only a per-image fee, nothing else is restricted or costs anything. They can allow whatever creator monetization programs they want. The license doesn't require the model to be gated, paywalled, or anything else. It doesn't charge for model training either.
>I see a great number of people calling me greedy and unreasonable. I think the license fees I'm charging for Anima are very reasonable, and much lower than you would get from almost any other comparable model, based on the information I can find. I'm trying to build a sustainable business, and if I allow large platforms to use Anima for free, I will just go out of business and never train another model again. If you believe I'm being greedy and unreasonable, then please explain specifically what you think I should do differently.
>>
>>108997619
HOLY FUCK
PLEASE PLEASE PLEASE PLEASE end up being good.
This guy went radio silent months ago. I guess this is what he ended up working on.
Not trying to jinx it, but in theory this has the ingredients to succeed: a decent-quality base model, a large TE, the best VAE, and someone who has made large-scale finetunes in the past without going too schizo about them. But obviously lots of things can still go wrong easily; I wish him the luck he needs to succeed.
This guy is also training out of his own pocket, I believe, so non-commercial restrictions shouldn't be a problem.
>>
>>108997631
He's wasted 6-8 months chasing pixel space, which he just can't get to converge. Radiance is dead (no more training), and Zeta is practically dead.
Eventually he will give up on pixel space, but he will have wasted SO much time that it's just over at this point.
>>
>>108997639
>Lodestone has unsurprisingly fucked up Zeta Chroma and has shifted to working on some 2.5B pixel space
I'd say "I told you so" to the anons I told this would happen, but I'm sure they've all roped by now.
>>
>>108997651
Was that only the case for the original BFL license, and the Klein license isn't as draconian when it comes to arbitrary termination?
I might be misremembering but I believe something like this should be the case.
>>
>>108997670
No, it's entirely arbitrary. If they interpret it as you circumventing their safety protections, they can just yank your right to distribute it, meaning sites like CivitAI etc. will have to take it down.
>>
>>108997680
>Retards who think you can just ignore licenses are more annoying
I've trained dozens of LoRAs for models that told me I can't. You've never seen them. Nobody ever will. It doesn't affect me at all. I hate license faggots because they all carry the implication that everything will be served to them on a silver platter if the license allows it.
It's just outrage at not being spoon fed wrapped in the guise of pretending to care about the law.
Utter faggotry.
>>
File: zeta loss.png (117.0 KB)
>>108997639
He ditched the DINO stuff from Zeta. Do you know if that's why it's training much faster than before, or if he pruned the dataset or changed something else?
Not that I expect anything with this loss curve to turn out good, which is why I hope the bigASP guy succeeds.
While we are at it do you know if anything else changed with Radiance?
And lastly, what's this 2.5b model? I can't think of anything public that fits the bill.
>>
>>108997716
civitai are supreme faggots but other sites? Lol I don't think they're monitoring jack shit. As long as you're not posting it with nsfw previews. Anyway, if a model is really good and does all the nsfw stuff, loras required would be few and far between.
>>
>>108997706
>I've trained dozens of LoRAs for models that told me I can't.
That's not the issue, what you do on YOUR computer can't be controlled, but you can't build an ecosystem around a model if you can't share loras / finetunes for it, which is the case here should BFL yank the license.
Why even take the risk, he could have gone with Z-Image Base or Flux 2 Klein 4b.
>>
>>108997710
I haven't been paying super close attention to what he's been doing with Zeta, just that he changed some training stuff and was giving it until the end of July. I remember reading on his Discord that he fucked with the batch size so that might be why it's training faster. He started Radiance up again for some reason (I don't know the exact reason) around the time that people started noticing that Zeta wasn't getting much better.
This is apparently the 2.5B model https://huggingface.co/lodestones/debug-flow
>>
>>108997727
When was the last time we got a large-scale finetune worth a shit from anyone, regardless of license? Nobody gives a fuck about lode of shit stones' money furnace.
As long as I have a training script and a GPU, I don't care what others think about the license or their shitty ecosystem. Again: expecting to be served, wrapped in concern for a license.
>>
>>108997716
if the model is good enough, loras will emerge. it really is that simple. there has yet to be an actual good model that failed to take off. i said the same thing back when chroma amounted to 3 loras per week and chromakeks pissed and shat themselves but look at it now, everyone moved on because it sucked.
if there truly was a generational leap in nsfw models people would find a way to build a community around it. remember that civitai only got popular because people wanted to easily browse and share loras for the nai leak
>>
>>108997731
They don't have to, they just contact sites like Civitai and tell them that model x is violating their license, and it will be removed
Even a mega-autist like lodestone has been very clear that he would never do a large finetune on anything not permissively licensed
>>
>>108997471
The closest that I know of is the Cover NoFSQ feature.
I can think of a workflow that might work. I haven't tried pure instrumentals; I'll try that and let you know. But here's something that might work: you generate with your instruments, then lower the cover strength, and you should get lyrics aligned over similar instrumentals. Then you place the vocals on the original instruments.
Here's the Cover NoFSQ feature used on Black No.1 at 0.3 strength. The lyrics are aligned, though the instruments are off, so this should theoretically work.
https://vocaroo.com/12EgDtyMeM0z
>>108997619
Was his dataset really that good, or is everyone nostalgiafagging? Last I checked, the latest bigASP model was from the SDXL days, and since then he's had plenty of opportunities to shine (a Chroma1-HD tune, Lumina, etc.) and never delivered.
>>
File: radiance.jpg (84.8 KB)
>>108997666 >>108997678
its working quite ok, not sure what you're complaining about.
sam altman didn't give him 1.5 trillion USD, and on consumer hardware training takes quite a lot of time. unsurprisingly. including attempts that didn't work he still basically achieved yet another model.
>>
>>108997878
i don't understand. did you get an output that was just the vocals and nothing else? i don't want the model to apply the vocals to the song, i need the separate vocals so i can mix it in my own DAW and apply effects without affecting the instruments
>>
File: Ernie-Image_00076_.png (1.9 MB)
>>108997878
Model's meme potential is just insane.
>>108997897
Nah, this was the workflow ->
Input song, custom lyrics on ACEStep.cpp. Enable the cover-nofsq feature and use the instrumentals you want as a source (Src option checked).
Write the caption, metadata, and lyrics for your generation. Cover strength should be anywhere between 0.01 and 0.5 (experiment to see what works best for your particular song). I found 0.3 to be the sweet spot for this on the Base Turbo merge.
The model makes a cover with similar instruments and aligned lyrics (in this case it has lyrics, but yours won't so it will just place vocals described in the caption over the song that doesn't have them). That's what that vocaroo is.
Then theoretically you'd want to use something like a vocalremover API or model to isolate the vocals from the generation and mix them that way.
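The last step of that workflow, laying the isolated vocals back over your own instrumental, is normally done in a DAW, but the core operation is just sample-wise addition with a gain. A minimal Python sketch, assuming both stems are already mono, at the same sample rate, and decoded to float samples in [-1, 1]; the function names, the `offset` parameter, and the peak-normalization choice are illustrative and not part of ACEStep.cpp or any vocal remover.

```python
def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude factor."""
    return 10 ** (db / 20)


def mix_stems(instrumental, vocals, vocal_gain_db=0.0, offset=0):
    """Overlay a vocal stem onto an instrumental stem.

    Both stems are mono lists of float samples in [-1.0, 1.0] at the same
    sample rate. `offset` delays the vocals by that many samples.
    If the sum clips, the whole mix is peak-normalized back to 1.0.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    gain = db_to_gain(vocal_gain_db)
    length = max(len(instrumental), offset + len(vocals))
    mix = [0.0] * length
    for i, sample in enumerate(instrumental):
        mix[i] += sample
    for i, sample in enumerate(vocals):
        mix[offset + i] += gain * sample
    peak = max((abs(s) for s in mix), default=0.0)
    if peak > 1.0:
        # Scale everything down together so the relative balance is kept.
        mix = [s / peak for s in mix]
    return mix


if __name__ == "__main__":
    print(mix_stems([0.5, 0.5, 0.5], [0.25, 0.25], offset=1))
```

Normalizing the whole mix rather than hard-clipping keeps the vocal/instrument balance you chose, which is the point of having separate stems in the first place.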
>>
File: zeta.png (968.8 KB)
>>108997710
>Not that I expect anything with this loss curve to turn out good.
It's not a single training run with unchanged settings, nor is it on the low end of a high-to-low LR schedule, so this doesn't mean that much.
In the end it's mostly just a question of whether the non-DINO distance training works better now. Zeta is still very rough.
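Since the run mixes different settings and LR phases, one way to read such a curve is to smooth each segment separately instead of blending across the points where settings changed. A minimal Python sketch; the EMA factor and the idea of resetting the average at known change points are assumptions for illustration, not how lodestone actually logs or plots his runs.

```python
def ema_smooth(values, alpha=0.9):
    """Exponential moving average: s[t] = alpha * s[t-1] + (1 - alpha) * v[t]."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(float(v))  # seed the average with the first point
        else:
            smoothed.append(alpha * smoothed[-1] + (1 - alpha) * v)
    return smoothed


def smooth_by_segment(values, boundaries, alpha=0.9):
    """Smooth each segment on its own; `boundaries` are indices where settings changed."""
    inner = sorted(b for b in set(boundaries) if 0 < b < len(values))
    edges = [0] + inner + [len(values)]
    out = []
    for start, end in zip(edges, edges[1:]):
        out.extend(ema_smooth(values[start:end], alpha))
    return out


if __name__ == "__main__":
    losses = [0.9, 0.8, 0.85, 0.6, 0.55, 0.58]
    print(smooth_by_segment(losses, boundaries=[3], alpha=0.5))
```

Resetting at each boundary means a jump caused by a settings change shows up as a clean break instead of a slow slide that looks like learning.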
>>
>>108997965
Worth noting: if you have just the vocals of a song, there's a "complete" feature that adds instruments to it: https://github.com/ace-step/ACE-Step-1.5/blob/main/docs/en/ace_step_musicians_guide.md
No clue if that works the other way around with an instrumental as input, but it's worth a try for your use case, and if it doesn't work, then the cover-nofsq feature it is.
>>
>>108997965
eh, a vocal isolation model wouldn't sound as good as the original. i guess i have to wait for someone to make a model that does it since i already have all of my drums and other stuff as separate tracks, so i want the vocals to be the same for full control
>>
what's the deal with ideogram??? it seems to have the most censorship of any local model, yet bypassing it apparently unlocks some of the craziest prompt comprehension available? so it's like early dall-e 3? is it possible this gets any finetuning attention or is it just another point-and-look like flux [dev]?
>>
>>108998059
>it seems to have the most censorship of any local model, yet bypassing it apparently unlocks some of the craziest prompt comprehension available? so it's like early dall-e 3?
Got examples of that? Isn't this model slopped to hell and back btw.
>>
>>108998073
i genuinely don't know, i'm just reading around. all i've seen is just generic shit i would expect from nano banana. some people are saying the model is really good if you bypass the filter using a long json prompt, and others are saying it's censored slop unusable for nsfw and unable to be finetuned. i'm curious if anyone actually has any interesting nsfw outputs or if it's all just a shill brigade making shit up
>>
>create a comic in Ideogram
>copy the JSON format
>paste it in an LLM and ask it to continue the comic's story along with the official prompting instructions
>spits out a new JSON prompt that continues the story
It's not perfect but it's like a visual chatbot instead of just text. Schizos can seethe all they want but I haven't had this much fun with image models in a while.
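That loop can be scripted: serialize the previous page's JSON prompt, wrap it with the prompting guide and the next story beat, and parse the JSON the LLM sends back. A minimal Python sketch; the JSON keys in the example are hypothetical (Ideogram's actual prompt schema is not reproduced here), and the call to the LLM itself is left out because no specific API was named.

```python
import json


def build_continuation_request(previous_prompt: dict, guide: str, beat: str) -> str:
    """Assemble the text to paste into (or send to) an LLM."""
    return (
        f"{guide.strip()}\n\n"
        "Here is the JSON prompt for the previous comic page:\n"
        f"{json.dumps(previous_prompt, indent=2, ensure_ascii=False)}\n\n"
        f"Write the JSON prompt for the next page. Story beat: {beat}\n"
        "Reply with a single JSON object and nothing else."
    )


def parse_reply(reply: str) -> dict:
    """Extract the outermost JSON object from an LLM reply.

    Tolerates chatter or Markdown code fences around the object.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    previous = {"style": "manga", "panels": [{"description": "a girl waves at a robot"}]}
    print(build_continuation_request(previous, "Official prompting instructions go here.", "the robot waves back"))
```

Feeding each parsed reply back in as `previous_prompt` gives the "visual chatbot" loop, one page at a time.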
>>
>>108998152
>ideogram too has a lot of character knowledge
Does it? I didn't try too much but couldn't get it to gen any other Vocaloid girl besides Miku properly.
It seems to know "a bit" about wide variety of characters, but the amount of characters it can recognizably gen without major errors isn't very high.
Oh, and don't get me wrong, it still knows more than local base model releases nowadays; most are completely safetycucked out of captioning any characters during training. But it still knows a tiny fraction of what booru models know.
>>
File: radiance.png (2.3 MB)
>>108997948
sure.
i think it's not as interesting for 1girl, realistic atm (qwen/z-image and so on are better)
>>
File: radiance.jpg (108.1 KB)
>>108997948
>>
File: ComfyUI_00044_.jpg (1.2 MB)
>>108998197
That's not a GUI, anon.
>>
File: radiance.jpg (119.4 KB)
>>108997948
beach with a fortress on a rock in the ocean
>>
>>108998253
>>108998257
thanks. It leaves a lot to be desired
>>
File: Screenshot 2026-06-07 115059.png (2.8 MB)
Oof.
>>108998277
Yeah I know it will translate your regional prompting, but I want it visually, it's so much faster and easier.
>>108998281
I have a feeling that id4 is good at taking directly from its training images, so it injects whatever else was in the image if your prompt is tiny. So this was probably a promotional wallpaper for some anime shit.
>>
>>108998270
You only need "Load Diffusion Model INT8 (W8A8)" node technically. You should enable on the fly quantization and convrot toggles, set the appropriate model type and then you can load any bf16 checkpoint. You can use save int8 node to make your own checkpoints so that you don't need to wait through quantization in the future.
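For anyone wondering what on-the-fly quantization of a bf16 checkpoint to W8A8 means: weights and activations are both mapped to int8 with a scale factor, multiplied as integers, and rescaled. A minimal pure-Python sketch of one common scheme (symmetric per-output-channel weight scales plus a dynamic per-tensor activation scale); whether the INT8 loader node uses exactly this scheme is an assumption, the convrot toggle is not modeled, and real implementations run fused GPU kernels.

```python
def _symmetric_scale(values):
    """Scale that maps the largest magnitude onto 127 (1.0 for all-zero input)."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127 if max_abs > 0 else 1.0


def _to_int8(values, scale):
    return [max(-127, min(127, round(v / scale))) for v in values]


def quantize_per_channel(weight):
    """Symmetric int8 weight quantization with one scale per output row."""
    q_rows, scales = [], []
    for row in weight:
        scale = _symmetric_scale(row)
        q_rows.append(_to_int8(row, scale))
        scales.append(scale)
    return q_rows, scales


def quantize_activation(x):
    """Dynamic per-tensor int8 quantization of an activation vector."""
    scale = _symmetric_scale(x)
    return _to_int8(x, scale), scale


def w8a8_linear(q_weight, w_scales, x):
    """y = W @ x with int8 x int8 products summed as integers, then rescaled."""
    q_x, x_scale = quantize_activation(x)
    out = []
    for q_row, w_scale in zip(q_weight, w_scales):
        acc = sum(qw * qx for qw, qx in zip(q_row, q_x))  # int32 accumulator on real hardware
        out.append(acc * w_scale * x_scale)
    return out


if __name__ == "__main__":
    weight = [[1.0, -2.0], [0.5, 0.25]]
    q_weight, scales = quantize_per_channel(weight)
    print(w8a8_linear(q_weight, scales, [1.0, 1.0]))  # close to the exact [-1.0, 0.75]
```

Saving `q_weight` and `scales` once is the idea behind the save int8 node: the quantization pass is skipped on later loads.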
Lora situation is a bit messy. Dynamic and preloading have the highest quality. Dynamic has 10% speed hit and pre-load means you need to sit through quantization again every time you change loras.
Stochastic and None still have decent quality, when they work. Some LoRAs will work with both; some prefer one or the other. Stochastic is the safer choice overall in my experience.
Here, I re-ran my Brazil Miku gen from the last thread with int8 >>108996329. Just in case there's confusion, you obviously don't need the CFG stuff:
https://litter.catbox.moe/027m1d9fs3aswy79.png
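On the LoRA modes: one plausible reading, which is an assumption and not confirmed by the node's docs, is that Dynamic applies the LoRA at runtime, pre-load re-quantizes a merged bf16 model, and Stochastic vs None decide how the LoRA delta is rounded when folded straight into the int8 weights. The sketch below shows why stochastic rounding helps in that last case: deltas smaller than half a quantization step, which round-to-nearest erases, survive on average. The function names are hypothetical.

```python
import math
import random


def lora_delta(B, A, strength=1.0):
    """Dense LoRA update strength * (B @ A); B is out x r, A is r x in."""
    rank = len(A)
    return [
        [strength * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(len(A[0]))]
        for i in range(len(B))
    ]


def merge_into_int8_row(q_row, scale, delta_row, stochastic=True, rng=None):
    """Fold one row of a LoRA delta into int8 weights that share one scale."""
    rng = rng or random.Random()
    merged = []
    for q, d in zip(q_row, delta_row):
        target = q + d / scale  # desired value in quantized units, usually fractional
        if stochastic:
            low = math.floor(target)
            # Round up with probability equal to the fractional part,
            # so the expected result equals the target.
            rounded = low + (1 if rng.random() < target - low else 0)
        else:
            rounded = round(target)  # round-to-nearest: sub-half-step deltas vanish
        merged.append(max(-127, min(127, rounded)))
    return merged


if __name__ == "__main__":
    rng = random.Random(0)
    delta = lora_delta([[0.3]], [[1.0]])[0]  # 0.3 quantization steps when scale == 1.0
    print(merge_into_int8_row([10], 1.0, delta, stochastic=False))  # nearest rounding drops it
    samples = [merge_into_int8_row([10], 1.0, delta, rng=rng)[0] for _ in range(10_000)]
    print(sum(samples) / len(samples))  # averages close to 10.3
```

Each individual weight is noisier with stochastic rounding, but across millions of weights the LoRA's effect is preserved, which would fit "Stochastic is the safer choice overall."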
>>
File: Flux2-Klein_01110_.png (1.5 MB)
>>108997619
So I tried it and yeah it needs a lot of work.
I wish him the best.
>>
>>108998329
Vagina is technically the term for the internal portion so that image sort of makes sense, probably has a bit of medical internal camera shit in its data somewhere. Try labia or vulva or cleft of venus or something.
>>
File: radiance.jpg (124.6 KB)
>>108998268
no problem
>>
File: ComfyUI_00058_.png (3.5 MB)
I wonder if there's a way to reduce how much the prompt takes from the training data. Like this is just straight up like 80% of the original image I bet.
>>108998336
Yeah all the results, even for anus, looked like the peephole for intestinal surgery.
With more detail it was just something resembling a crotch but still body horror.
>>
File: ComfyUI_00063_.png (1.4 MB)
>>108998414
It's time to go full abstract, my forte.
>>
File: ComfyUI_00083_.png (1.5 MB)
>>108998426
Damn, wasn't as fun as with normal models.
>>
File: 1759495099466894.png (1.9 MB)
>>108998577
anima shills moved onto shilling ideogram.
there are discord servers where you can put up bounties for jeets to promote your models
>>
It's interesting to me. I sometimes go on those isthisAI subreddits (I know I know) and I notice how a lot of the gens here are generally more realistic than the very obvious saas slop you see there. I wonder how much of this just escapes normie filters entirely.
>>
File: 1748351684074420.png (217.4 KB)
>>108996994
but why are they censored, since the people of these models do not have genitals?
>>
>Yes, Ideogram (Welcome to Ideogram) lacks native support for complex, non-Latin character sets with unusual diacritics Text and Typography - Ideogram. Generating Polytonic Greek (which includes breathings like psili and dasia, as well as multiple accents) and specifically combining these with macrons will likely result in jumbled, hallucinated, or completely incorrect letters Text and Typography - Ideogram.
Saved myself a download. It doesn't do anything useful to me.
>>
>>108998833
rocm support has gotten better this year.
However, nobody at all has rdna2 support for trellis. One guy, last month, managed to get a fork of Microsoft's trellis code to work with rdna3.
rdna4 apparently works fine.
>>
>>108998905
Deleted it
https://civitai.red/models/2088956/famegrid-2nd-gen-z-image-qwen?modelVersionId=2604982
>>
>>108998890
I've basically switched. But GGUFs aren't working with at least my RDNA2 card, not sure what's up:
https://github.com/leejet/stable-diffusion.cpp/issues/1488
I don't have a github account *shrug*.
I remember trying to get into discussing tech topics and getting confused by all the weird down voting and the massive bad attitude, lots of places, but a big one was Stack Exchange. Indians really are totally incompatible with us in every sense possible, socially.
>>
>>108998829
correction, anima is half as fast as msft lens (lens is trash, but idk maybe lens2 will be good?)
>>108998889
ok I think it's because I had cfg=1.
>>
File: output_1780835577.png (1.3 MB)
cute girls will say hi to you. go to church.
>>108998925
It was good at the start, but the ai "moot chinned" her at the end. real women just rarely have massive boobs or massive chins, but ai thinks moar boob=moar indiangood and moar chin =moar indiangood
>>108999015
It is, I have totally switched. Use Obsidian.
You put the things you want to paste between ``` marks, like:
```
cd [I typed the path to the bin folder here]
# HSA_OVERRIDE_GFX_VERSION=10.3.0 makes ROCm treat the RDNA2 card as gfx1030
HSA_OVERRIDE_GFX_VERSION=10.3.0 HIP_VISIBLE_DEVICES=0 ./sd-cli \
  --diffusion-model ~/ComfyUI/models/diffusion_models/anima-base-v1.0.safetensors \
  --vae ~/ComfyUI/models/vae/qwen_image_vae.safetensors \
  --llm ~/ComfyUI/models/text_encoders/qwen_3_06b_base.safetensors \
  -p "1girl, laughing, yellow socks, green dress, pews. The girl is standing with her legs spread out on top of a church. @kanosawa" \
  -n "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, sepia" \
  --cfg-scale 6.0 -v -W 832 -H 1216 -s -1 --offload-to-cpu --steps 8 -o "output_$(date +%s).png" \
  --sigmas "1.0000, 0.9982, 0.9962, 0.9939, 0.9913, 0.9883, 0.9848, 0.9807, 0.9759, 0.9700, 0.9628, 0.9538, 0.9421, 0.9266, 0.9051, 0.8744, 0.8290, 0.7627, 0.6769, 0.5911, 0.5247, 0.4793, 0.4486, 0.4272, 0.4117, 0.4000, 0.3968, 0.3931, 0.3890, 0.3842, 0.3788, 0.3724, 0.3650, 0.3561, 0.3456, 0.3328, 0.3172, 0.2982, 0.2752, 0.2478, 0.2163, 0.1825, 0.1486, 0.1172, 0.0898, 0.0667, 0.0477, 0.0322, 0.0194, 0.0088" \
  --sampling-method heun --preview proj --preview-path ./preview.png
```
That's just Tan2 sigmas pulled from comfyui. You can steal any sigmas using the preview as text node thing.
It works fine with PowerShell; you just have to use backslashes, idk, stuff like that.
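Moving a sigma schedule from ComfyUI into sd-cli's `--sigmas` argument is mostly string munging. A minimal Python sketch, assuming the preview-as-text output looks like a Python list or a printed tensor such as `tensor([1.0000, 0.9982, ...])`; the exact text format of that node is an assumption, so adjust the parser if yours differs (for example if it appends a device). The non-increasing check is a sanity guard, not something sd-cli is known to require.

```python
import shlex


def parse_sigmas(text):
    """Pull the numbers out of a dump like '[1.0, 0.99]' or 'tensor([1.0000, 0.9982])'."""
    cleaned = text.strip()
    for token in ("tensor(", ")", "[", "]"):
        cleaned = cleaned.replace(token, "")
    return [float(tok) for tok in cleaned.replace("\n", ",").split(",") if tok.strip()]


def format_sigmas(sigmas, decimals=4):
    """Render sigmas as a comma-separated string in the style of the --sigmas example."""
    values = [float(s) for s in sigmas]
    if len(values) < 2:
        raise ValueError("need at least two sigma values")
    if any(later > earlier for earlier, later in zip(values, values[1:])):
        raise ValueError("sigmas should be non-increasing")
    return ", ".join(f"{v:.{decimals}f}" for v in values)


if __name__ == "__main__":
    dumped = "tensor([1.0000, 0.9982, 0.9962, 0.4000, 0.0088])"
    # shlex.quote makes the result safe to paste into a bash command line.
    print("--sigmas " + shlex.quote(format_sigmas(parse_sigmas(dumped))))
```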
>>108999039
You did it. You found the use case for Ideogram: backrooms and Mall World dream simulation.
btw, have you ever been to Mall World in your dreams? It's like a mall, only it stretches forever and the geometry is goofy. Sometimes stores and things that are almost never in malls show up.
>>
File: output_1780836795.png (1.6 MB)
>>108999054
>>
using anima and the prompt scheduler from asagi4, can i specify that i want something like [dog:cat:4]? so i want cat to replace dog after 4 steps? instead of using 0.1 or whatever? there's an advanced node on comfyui that has the number of steps parameter, would i have to repeat the prompt multiple times and then specify only the differing part on each one, in order to assign the number of steps i want? e.g. <full text> + dog with 4 steps, then <full text> + cat on another?
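On the [dog:cat:4] question: if the scheduler only takes fractions, a step count can be converted by dividing by the total steps (switching after step 4 of 20 is 0.2). The sketch below also shows how the classic A1111-style rule resolves the prompt per step, where a value of 1 or more is a step number and a value below 1 is a fraction of the total, with 1-indexed steps and "before" used through the switch step. This is a sketch of that convention only; newer A1111 versions decide by whether the number contains a decimal point, and whether asagi4's prompt control treats integers this way is an assumption worth checking in its README, including off-by-one behaviour at the boundary.

```python
def step_to_fraction(step: int, total_steps: int) -> float:
    """Convert 'switch after N steps' into fractional form, e.g. 4 of 20 -> 0.2."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step <= total_steps:
        raise ValueError("step must be within [0, total_steps]")
    return step / total_steps


def resolve_prompt(before: str, after: str, when: float, step: int, total_steps: int) -> str:
    """Text used at 1-indexed sampling `step` for an A1111-style [before:after:when]."""
    switch_step = when if when >= 1 else when * total_steps
    return before if step <= switch_step else after


if __name__ == "__main__":
    total = 20
    print(step_to_fraction(4, total))
    print([resolve_prompt("dog", "cat", 4, s, total) for s in range(1, 7)])
```

Note that a fraction only maps to a fixed step for a fixed step count: 0.2 means step 4 at 20 steps but step 6 at 30, so recompute it if you change steps.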
>>
File: file.png (123.2 KB)
>>108999094
I only made a little money from paypigs, but that's about it.
apparently you get banned if you don't label your account as AI. wtf is this shit?
IM MAKING CONTENT ON YOUR SITE NIGGER WHO GIVES A FUCK IF ITS "REAL" OR NOT!!
>>
File: output_1780838771.png (1.6 MB)
>>108999167
>>
>>108997619
If he is now training on non-commercial models, why not at least use Anima? That way the model starts with full booru tag knowledge, plus it's probably 3x faster (and therefore cheaper) to train. Anima's realism is already almost good; a finetune on a few million images would easily convert it into a competent realism model.
>>
>>108999309
because imagination takes active focus, no matter how small, meaning you are not exploring something but creating every aspect of it. It's the same reason you can't enjoy a story you came up with yourself the way you enjoy one you're reading for the first time: you made up everything in it, so nothing can surprise you.
and imagination doesn't have the AI's RNG engine, which creates interesting things you wouldn't easily have thought of or wouldn't have known about
>>
File: 1749810492186288.png (15.8 KB)
>>108999382
>>
>>108999397
what I said has nothing to do with what someone can or can't imagine based on their ability. unless you give yourself brain damage after writing a story, you will never be surprised by a big twist in that story when reading it, retard
>>
File: Untitled.png (254.8 KB)
>>108999453
GROKKED
>>
>>108999654
open-weights != non-commercial. it's an open commercial model because there's no restriction on monetizing the model; you just need to pay. commercial use isn't impossible under this license, ergo it's not non-commercial. grok being a sycophant and not realizing this is also hilarious. wtf is Elon even doing anymore? judging by the IPO, he's trying to gtfo
>>
File: 1773542612112793.png (189.7 KB)
>>109000000
>>
>>108999194
What are your options now? Where will you find paypigs?
>>108999039
Wow, mysterious.
>>
File: 1780828711790032.jpg (78.5 KB)
>>109000000
tomorrow
>>
File: ComfyUI_00744_.png (891.0 KB)
>>109000120
here
>>
File: 151264163087903.png (1.6 MB)
>>109000010
>>109000014
Luis Royo lora with (@murata range:0.6)
>>
>>108997619
good to see he hasn't abandoned local training. I hope he recaptioned his dataset and didn't just literally reuse the JoyCaption data though; it's become a bit dated
>>108997651
not just that, I've heard Klein's arch isn't as good as people initially thought it'd be. I wonder why he still picked it over Z-Image; I believe that in December, when Turbo came out, he said somewhere that he was considering it for bigasp3
>>108999378
literal skill issue, just don't butcher the LR, and regularize
>>108999369
Klein has the superior arch (in theory; I've heard people have grown somewhat sour on it), same for Z-Image
>>108997878
>Was his dataset really that good or is everyone nostalgiafagging?
he consistently delivered AND documented what he did in considerable detail. could end up being nostalgiafagging, but so far none of his releases have been outright bad. though he also tried to train a bigasp 2.6 that he abandoned because it turned out shit. beyond that, he's also the same guy who trained JoyCaption
>since then he's had plenty of opportunities to shine
i think chroma came out around the time when he was playing around with SDXL rectified flow models.
>>
File: 1760075868571105.png (1.7 MB)
>>109000086
>>
File: murata_range_comfyui_00001_.png (863.6 KB)
>>109000154
>murata range
thanks
>>109000171
Klein?
>>
>>108998257
>>108998262
you can literally get this quality out of OOTB Anima. like, I'm not even joking, this is about the same quality that "photo (medium), cosplay photo" as tags plus an "a photograph of..." NL prompt gives you
>>
>>109000442
No, there's actually a point where the image gets worse, compositionally I mean.
I guess it depends on the sampler, but for the samplers that change into a new composition every X amount of steps they do get worse (more boring or whatever).
It also depends on the model obviously, some huge models want 50 steps to work properly.
>>
is the optimal step count actually just dependent on resolution?
like, maybe the reason high resolution gives you mistakes and boring compositions is simply that you're using the same old 30 steps when you should be using more?
>>
>>109000600
>>109000593
Big dogs
>>
>>109000618
it is in my experience, and I've seen other anons note this too. the higher you go, the more steps you want. of course there are other factors like sampler etc., but this seems a pretty clear correlation
some anons here talk about 150 steps unironically, but I've had gens at 4MP with Anima that actually benefited from shit in that range, though obviously not with res samplers
honestly tdrussell should add this to the readme on HF too
>>
>>108997154
>>108997352
Is this TRELLIS? Is the retopo a new feature?
>>
File: 1776775319598367.png (1.9 MB)
>>109000183
yeah
>>
>>109000754
that makes sense. you'll usually see lower recommendations because it saves time, and from a certain point onwards you simply get diminishing returns from adding even more steps, as in you're close enough to convergence anyway.
though it also seems like higher resolution -> convergence takes longer. since Anima supports more than just 1024, that matters
30 steps is close enough to convergence at 1024; at 1536 you want to aim higher, and at 2048 even higher than that, from my experience
might post some examples later
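A toy illustration of the diminishing-returns point, assuming sampler convergence behaves like the discretization error of a fixed-step ODE solver (an analogy only, not a diffusion model; an euler-style sampler is roughly a fixed-step Euler solver over the denoising ODE). It runs Euler on dx/dt = -x over t from 0 to 1, whose exact answer is exp(-1); the error shrinks roughly in proportion to 1/steps, so each extra step buys less than the one before:

```python
import math


def euler_error(steps: int) -> float:
    """Fixed-step Euler on dx/dt = -x, x(0) = 1, integrated from t = 0 to t = 1.

    Returns the absolute error against the exact solution x(1) = exp(-1).
    """
    x = 1.0
    dt = 1.0 / steps
    for _ in range(steps):
        x += dt * (-x)  # one Euler step along the derivative
    return abs(x - math.exp(-1.0))


if __name__ == "__main__":
    for n in (10, 30, 50, 90, 150):
        print(f"{n:>4} steps: error {euler_error(n):.5f}")
```

Going from 30 to 50 steps removes more error per added step than going from 50 to 90, which is the "close enough to convergence" effect in miniature.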
>>
>>109000830
This is slow gpu cope
>>109000812
It's true that higher steps help above stock resolution, but only on the first pass. You should go as high as the model will allow on the first pass
>>
File: 173512CUI_00002_.png (2.1 MB)
>>109000154
Kino
>>
>>109000196
sure? neither is the best realism training so far; they're more about addressing the gap in 2D/2.5D/3D artwork, including questionable/NSFW, that most commercial base models suck at
>>
File: ComfyUI_temp_lxupo_00001_.png (1.5 MB)
>>108999339
>>
>>109001048
I mean, try it out. I don't mean to shit on Radiance, but this isn't really impressive so far, especially when an anime model can replicate it more or less unintentionally
>>108998253 this definitely looked better though
>>
>>109000843
more steps = you give the sampler more time to converge
how long the sampler needs to converge, or whether it converges at all, depends on the sampler and scheduler, but with something like euler you won't really see meaningful changes at 1MP (1024x1024px) beyond 30-50 steps
if you go up to 2MP or even 4MP, which Anima can do, this changes though. you'll usually see details still being messy and more steps still cleaning shit up, which feels intuitive
from my own experience, at 2MP steps up to around 70 still yield meaningful progress; at 4MP, around 90
and yes, this shit is slow as fuck, but depending on what you're going for it's worth it
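A rough rule-of-thumb sketch built only from the anecdotal numbers in this thread (about 30 steps at 1MP, around 70 at 2MP, around 90 at 4MP, first pass, euler-like samplers), with linear interpolation between those points as an added assumption. "Megapixel" here means 1024x1024 pixels to match the posts; `suggested_steps` is a hypothetical helper, not a benchmark-backed recommendation:

```python
# Anecdotal (megapixels, steps) points from this thread; not measured benchmarks.
ANECDOTAL_STEPS = [(1.0, 30), (2.0, 70), (4.0, 90)]

PIXELS_PER_MP = 1024 * 1024  # the posts treat 1024x1024 as "1MP"


def megapixels(width: int, height: int) -> float:
    return width * height / PIXELS_PER_MP


def suggested_steps(width: int, height: int, table=ANECDOTAL_STEPS) -> int:
    """Linearly interpolate a step count from the table, clamped at both ends."""
    mp = megapixels(width, height)
    if mp <= table[0][0]:
        return table[0][1]
    if mp >= table[-1][0]:
        return table[-1][1]
    for (mp_lo, steps_lo), (mp_hi, steps_hi) in zip(table, table[1:]):
        if mp_lo <= mp <= mp_hi:
            t = (mp - mp_lo) / (mp_hi - mp_lo)
            return round(steps_lo + t * (steps_hi - steps_lo))
    raise ValueError("table must be sorted by megapixels")


if __name__ == "__main__":
    for w, h in [(832, 1216), (1024, 1024), (1536, 1536), (2048, 2048)]:
        print(f"{w}x{h} ({megapixels(w, h):.2f}MP): ~{suggested_steps(w, h)} steps")
```

Swap in your own (megapixels, steps) points once you have comparisons for your sampler and scheduler.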
>>
>>109001181
>I dont think they will train on dicks though.
>I guess this is stuff for checkpoints isnt it?
It shouldn't need to be trained on that specifically; there should be enough of it in the data for it to learn. Since it didn't learn it, it's either extremely unbalanced or undertrained. I'm pretty sure it's undertrained rather than unbalanced, since it recognizes plenty of NSFW stuff pretty well. Requiring an NSFW finetune just to make dicks better, as opposed to just doing some aesthetic tuning/novel concept learning, feels... wrong, but then again what do I know about image model pre-training
>>
>>109001542
https://civitai.red/models/2268008/realistic-snapshot-z-image-turbo?modelVersionId=2617751
>>
yeah looks like I was wrong about latent upscales being necessary for Anima.
I did some comparisons and you can use a high base resolution as long as you have a much higher step count. my bad, sorry to that anon the other day.
>>
Wondering if https://github.com/BuffaloBuffaloBuffaloBuffalo/ai-toolkit-perceptual works with other model archs like ACEStep. Side-Step is currently the best way to train a LoRA, but this might be worth a look.
>>
File: gob wife.jpg (439.6 KB)
>>109001262
>>109001556
this LoRA's great, I'm using it at 0.5 strength to smooth things out
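For reference, the commonly used LoRA formulation adds a low-rank update scaled by strength and alpha/rank: `W' = W + strength * (alpha / rank) * (up @ down)`, so 0.5 strength simply halves the learned delta. A minimal pure-Python sketch under that assumption; loaders such as ComfyUI may differ in details (how alpha is stored, which modules get patched), and `apply_lora` is illustrative only:

```python
def matmul(a, b):
    """Plain-list matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora(weight, down, up, alpha, strength):
    """Return weight + strength * (alpha / rank) * (up @ down).

    weight: out x in, down: rank x in, up: out x rank.
    """
    rank = len(down)
    scale = strength * alpha / rank
    delta = matmul(up, down)  # out x in
    return [[weight[i][j] + scale * delta[i][j] for j in range(len(weight[0]))]
            for i in range(len(weight))]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]
    down = [[1.0, 2.0]]   # rank 1, 2 input features
    up = [[1.0], [0.0]]   # 2 output features, rank 1
    print(apply_lora(w, down, up, alpha=1.0, strength=0.5))  # [[1.5, 1.0], [0.0, 1.0]]
```

Because the update is linear in strength, dropping to 0.5 keeps the LoRA's direction but applies only half its magnitude.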
>>
>>109001307
The plan was to off himself and then do the cat. He squatted on the floor holding the cat in his lap with one hand. The gun was in the other.
His hand was quivering as he pressed the gun to his temple. He started screaming.
"Wait a minute!" his internal monologue was screaming at him now. "The cat! If you shoot yourself first who's gonna shoot the---"BLAM!
The thought was cut short by a sudden ringing in his ears, followed by a feeling of disorientation. After a short while the ringing died down, followed by a silence so deep and engulfing it was unlike anything he'd ever experienced.
He was straining now to get control of his thoughts, trying to get back to his train of thought.
"The cat!" He heard his own thought echoing in his ears. "You need to take him with you so he doesn't die of hunger."
"Oh, don't worry about me." A voice suddenly echoed in his mind, so intrusive and so all-encompassing it was as if he were submerged in a tank of water.
Perception suddenly rushed back to him and he saw the cat standing there on the tile floor. And the cat was smiling at him.
"You know," the voice came again, "if they should take a while to find me I can always eat..." The cat was now inclining his head toward the crimson pool widening on the floor, the movement fluid and terribly deliberate.
>>
"…you," the voice finished. It did not sound like a meow translated into words. It sounded like an ancient mechanism clicking into place, heavy with the weight of centuries.
He tried to blink, but his eyelids felt like lead weights. The silence of the room was no longer empty; it had become an active pressure, pressing against his skin like deep ocean water. Every small detail of the kitchen tile seemed magnified. He looked at the cat, really looked at him, and a cold dread crawled up his spine.
This wasn't his goofy tabby who used to chase laser pointers. The way the animal stood, its spine perfectly aligned, its ears perked at an angle of absolute authority, radiated a supreme, terrifying intelligence. The eyes were the worst part. The pupils weren't slits anymore; they were deep, cosmic wells that seemed to understand the exact geometry of the universe, looking down at him with the detached pity of a scientist observing an insect. It was evident now, with a clarity that shattered his remaining sanity, that this creature had always been this way. The purring, the begging for food, the clumsy play: it had all been a mask. A centuries-long masquerade.
"No," his mind stammered, the thought bouncing weakly against the walls of his fracturing consciousness. "No, this is a hallucination. The bullet. It's just a firing sequence in the temporal lobe. Synapses discharging. Lack of oxygen."
The denial felt like a flimsy shield against a tidal wave.
"A comforting theory," the cat’s voice vibrated directly through his skull, dripping with amusement. The feline took a slow, elegant step forward, its paws making absolutely no sound on the blood-slicked tile. "But you’ve always been so blind to the obvious, Michael."
>>
>>108997619
wtf is this guy's malfunction? Current latest Anima realistic works out of the box for NSFW; it's almost perfect. And it's fast as fuck, as in instant prompt changing, and it understands natural English + tags perfectly. Yeah, I downloaded the 18GB model to try it: it can do a blowjob, whoopee do; anal produces body horror, no shock there considering they went out of their way to break NSFW prompts in the training of Klein 9B. GUESS WHAT IDIOT, THAT'S WHY IT WILL NEVER WORK... You will be limited to textbook prompting with no variation.
The moment you change that prompt to include some of the naughty words, it will sabotage the image: a hand in the way, clothes put onto the woman, body oriented the wrong way, etc. It's in the damn safety rails; read their blog on how they fucked it before they allowed us to have it. You'd have to strip out everything they put in it.
Besides, fuck that POS model, it's not even good ffs. Its prompting is a nightmare; it doesn't even understand certain angles unless you write a fucking story just to get a rear view of a woman on a bed. And that will include 3 legs or 2 heads...