Thread #108604726
File: highlights_g_108597963_1776186337_1.jpg (2.8 MB)
2.8 MB JPG
Discussion and Development of Local Image and Video Models
Previous: >>108597963
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
305 Replies
>>
>mfw Resource news
04/14/2026
>ERNIE-Image: Text-to-image generation model built on a single-stream Diffusion Transformer
https://huggingface.co/baidu/ERNIE-Image
>Danbooru Dataset Filter: High-Speed Metadata Explorer for AI Training
https://github.com/ThetaCursed/Danbooru-Dataset-Filter
>ChatGPT will praise the mood and 'bedroom/DIY texture' of fart sounds pulled from YouTube
https://www.pcgamer.com/software/ai/chatgpt-will-praise-the-mood-and-bedroom-diy-texture-of-fart-sounds-pulled-from-youtube
>RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
https://limuloo.github.io/RefineAnything
>Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation
https://github.com/leeruibin/hybrid-forcing
>Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
https://jinnh.github.io/E-Bridge
>FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
https://github.com/yuandaxia2001/FashionMV
>Degradation-Aware and Structure-Preserving Diffusion for Real-World Image Super-Resolution
https://github.com/jiyang0315/DASP-SR.git
04/13/2026
>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors
>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations
>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io
>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD
>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt
>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>>
>mfw Research news
04/14/2026
>EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
https://editcrafter.github.io
>VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation
https://arxiv.org/abs/2604.10127
>FineEdit: Fine-Grained Image Edit with Bounding Box Guidance
https://arxiv.org/abs/2604.10954
>AIM-Bench: Benchmarking and Improving Affective Image Manipulation via Fine-Grained Hierarchical Control
https://arxiv.org/abs/2604.10454
>Continuous Adversarial Flow Models
https://arxiv.org/abs/2604.11521
>OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
https://arcomniscript.github.io
>Immune2V: Image Immunization Against Dual-Stream Image-to-Video Generation
https://arxiv.org/abs/2604.10837
>Differentiable Vector Quantization for Rate-Distortion Optimization of Generative Image Compression
https://arxiv.org/abs/2604.10546
>Rethinking the Diffusion Model from a Langevin Perspective
https://arxiv.org/abs/2604.10465
>Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
https://arxiv.org/abs/2604.11177
>SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models
https://arxiv.org/abs/2604.11530
>Revisiting Compositionality in Dual-Encoder Vision-Language Models: The Role of Inference
https://arxiv.org/abs/2604.11496
>LDEPrompt: Layer-importance guided Dual Expandable Prompt Pool for Pre-trained Model-based Class-Incremental Learning
https://arxiv.org/abs/2604.11091
>Agentic Video Generation: From Text to Executable Event Graphs via Tool-Constrained LLM Planning
https://arxiv.org/abs/2604.10383
>Omnimodal Dataset Distillation via High-order Proxy Alignment
https://arxiv.org/abs/2604.10666
>What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
https://arxiv.org/abs/2601.06165
>>
File: 1768864323566751.png (1.7 MB)
1.7 MB PNG
ok ernie turbo is fucking garbage at prompt following
>>
>>108604751
forgot prompt
>A photorealistic candid photo of a woman with long, flowing hair that transitions from icy white at the roots to vibrant cyan-blue at the tips, cascading over her shoulders and partially obscuring her face as she looks downward. She wears a form-fitting, sleeveless top with a high neckline, primarily white with bold geometric yellow trim and a large, faceted blue diamond-shaped emblem centered on the chest. The garment has a structured, armored appearance with gold-brown segmented panels along the waist and hips, suggesting a fantasy or sci-fi outfit. Her right hand rests on a smooth, light-colored surface in the foreground, fingers slightly curled. The background is an out-of-focus twilight landscape under a deep indigo sky, with a soft gradient of magenta and purple along the horizon. A faint, glowing horizontal line runs across the lower portion of the frame, possibly a railing or edge of a platform. The lighting is directional, casting soft shadows and highlights on her hair and clothing, emphasizing texture and form with natural depth and contrast. No text, speech bubbles, or tears are visible.
>>
File: 1746834295472832.png (3.6 MB)
3.6 MB PNG
https://huggingface.co/baidu/ERNIE-Image
https://huggingface.co/baidu/ERNIE-Image-Turbo
https://yiyan.baidu.com/blog/posts/ernie-image
https://ernieimageprompt.com/
LOCAL IS SAVED!!
>>
File: 1768040078201688.png (1.6 MB)
1.6 MB PNG
>>108604754
wait nvm im gay, fucked up a setting
>>
>>
>>
File: Ernie.png (2.1 MB)
2.1 MB PNG
>>108604759
>no edit
that's a shame, imagine doing edits with such a monster of a model, the prompt following is on another level, can't believe it's using a simple 3b text encoder to get that shit, and fucking ministral of all things
>>
>>
>>108604759
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
https://huggingface.co/Comfy-Org/ERNIE-Image
>AttributeError: 'Ministral3_3B' object has no attribute 'generate'
thanks Comfy
>>
>>
>>
File: 1750850705420377.png (1.9 MB)
1.9 MB PNG
>>108604759
bruh, turbo has garbage anatomy, downloading the base model
>>
>>108604759
buy an ad
>>108604810
have you pulled?
>>
>>
File: Ernie-Image_00001_.png (1.3 MB)
1.3 MB PNG
The gen times for non-turbo on my 3060 are a bit slow, two and a half minutes for 20 steps. It probably needs more steps, but that's not unusually slow for a model of this size.
Let's see how it holds up under further testing.
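Back-of-the-envelope math on the reported speed (a rough sanity check, assuming the whole 2.5 minutes is spent on denoising steps and ignoring VAE decode / text encoding):

```python
# ~2.5 minutes for 20 steps on a 3060 works out to 7.5 s per step,
# so bumping to 50 steps would land around 6 minutes per gen.
gen_time_s = 2.5 * 60          # reported wall time for 20 steps
steps = 20
per_step = gen_time_s / steps  # seconds per denoising step
print(per_step)                # 7.5
print(per_step * 50 / 60)      # 6.25 minutes at 50 steps
```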
>>
>>108604817
>Can it do nude?
https://litter.catbox.moe/9z9qwbnxpflyqt27.jpg
>>
>>108604751
>>108604763
What did you fuck up so I can avoid it
>>
>>108604861
I see you tested the base model, I hope it's the good one, I don't really like my tests on turbo so far
>>108604843
yes I'm on the latest version, seems like comfy hasn't implemented the prompt rewriting yet
https://github.com/Comfy-Org/ComfyUI/pull/13395
>Needs template before it works properly.
>>
>>
>>
>>108604817
>>108604862
https://litter.catbox.moe/tz2g5anklf3bmmmt.jpg
as expected, garbage genitals lol
>>108604879
the best one, flux 2's vae
>>
>>108604817
>>108604772
It hasn't been trained on boobs, it generates mediocre breasts. Though from my very limited testing it doesn't seem to be deliberately poisoned like Flux models are.
>>108604871
I just had a feeling that the distill will be problematic and went for the base immediately.
>>
>>
>>108604889
>I just had a feeling that the distill will be problematic and went for the base immediately.
good, was about time that we got a fully finetuned model that isn't distilled, no need for some NAG cope, we can directly use CFG, and we'll be able to train and make loras on it
>>108604893
turbo
>>
>>
>>
File: Ernie base.png (1.7 MB)
1.7 MB PNG
>>108604842
>downloading the base model
I really don't like the anatomy, like this is base at 50 steps, come on
>>
File: 1771543722896827.jpg (647.4 KB)
647.4 KB JPG
>>108604940
smells more and more like a nothingburger, the realism quality is Klein tier, and ernie can't even do edits to compensate, sad
>>
File: Ernie-Image_00006_.png (1.1 MB)
1.1 MB PNG
>>108604940
I am wondering if Comfy fucked something up, or if they did Chroma-tier cherry picking for the showcase images?
>>108604922
FP32 is usually only used for training because the benefits to inference are almost non-existent.
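The weight-memory math behind that FP32 remark, taking the 8B parameter count mentioned elsewhere in the thread as the assumed model size:

```python
# Storing 8B parameters in FP32 (4 bytes each) costs twice what
# BF16/FP16 (2 bytes each) does, for basically no inference benefit.
params = 8e9
fp32_gb = params * 4 / 1e9  # weights in FP32
bf16_gb = params * 2 / 1e9  # weights in BF16/FP16
print(fp32_gb, bf16_gb)     # 32.0 16.0
```

Activations, the text encoder, and the VAE add on top of this, which is why half precision is the default for inference.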
>>
File: Ernie-Image_00007_.png (1.1 MB)
1.1 MB PNG
>>108604974
50 steps turned out better.
It also seems a bit wild when it comes to adding shit to the image. First time I have seen AI add a knife to a "1girl, standing" prompt unsolicited.
>>
File: 1769295316640072.jpg (923.6 KB)
923.6 KB JPG
>>108604959
>>
>>108604983
Oh I think the image is so different because Control after generate is bugged with the retarded subgraph Cumfy has shipped with the template, so it ran a whole new seed.
The point about the knife stands though, same prompt.
>>
File: 1767451626519673.jpg (889.7 KB)
889.7 KB JPG
>>108604991
>>
File: o_00245_.png (390.7 KB)
390.7 KB PNG
>>
File: 1765165312128737.jpg (1.1 MB)
1.1 MB JPG
>>108605000
Ernie knows only one anime style: "Nano Banana Pro"
:]
>>
File: 1757245838942284.jpg (1.5 MB)
1.5 MB JPG
>>108605023
kek, I think I've seen enough
>>
File: 1771533625427241.jpg (1.7 MB)
1.7 MB JPG
>>108605045
maybe turbo at 16 steps is the best it can get
>>
>>
File: 1757491738029006.jpg (973.4 KB)
973.4 KB JPG
>>108605060
Z-image turbo be like:
https://youtu.be/WO23WBji_Z0?t=10
>>
File: Ernie-Image_00009_.png (1.2 MB)
1.2 MB PNG
One of the better gens I got.
Still has this Kleiny look to it.
>>
File: o_00247_.png (1.7 MB)
1.7 MB PNG
>>
>>
>>108605064
>3 feet
>>108605080
>3 hands
lol I think I won't be downloading this
>>
File: LTX3 will beat Seedance 2.0!.png (237.4 KB)
237.4 KB PNG
it's all right, the jews will save us
https://xcancel.com/ltx_model/status/2044108750592643279#m
>>
File: Ernie Comparison.png (2.6 MB)
2.6 MB PNG
This model has been trained on 3 billion images of Nano Banana Pro kek.
>>
File: Ernie-Image_00010_.png (1.6 MB)
1.6 MB PNG
>>
>>108605126
>This model has been trained on 3 billion images of Nano Banana Pro kek.
Z-Image supremacy, yeaaah! We had Qwen Edit and then the Tongyi model/s, but all other Chinese t2i are all equally sloppy, GLM, this, whatever.
>>
File: Ernie-Image_00011_.png (1.5 MB)
1.5 MB PNG
I am kinda liking things about it despite it's faults.
But they probably either overcooked this thing or it needed a little bit of post training aesthetic alignment to temper schizo anatomy.
>>
File: o_00252_.png (770.7 KB)
770.7 KB PNG
>>
>>108604974
>I am wondering if Comfy fucked something up
I think the model is just not that good, in my tests it's inferior to Z-image turbo almost everywhere
It could be a great base model to train on though, but yeah, 8b is big, people prefer something smaller like 2b so that they can do Anima-type models or some shit
>>
File: 1485680357151.png (298.9 KB)
298.9 KB PNG
>>108605183
>8b is big
>>
>>
File: 1756553466182638.jpg (459.2 KB)
459.2 KB JPG
>>108605183
>it's inferior to Z-image turbo almost everywhere
the niggas thought that training a model only on Nano Banana Pro's images would do the trick, all we got is that Synth-ID watermark pattern everywhere lmao, once again, synthetic data BTFO
>>
>>108605115
oops, forgot to attach their paper
https://arxiv.org/abs/2604.11788
>>
>>108605183
I think there are issues with finetuning klein and ZIB for some reason.
If it responds to training well this looks salvageable. Decent text encoder + best vae + good size balance between quality and being runnable on most hardware + OK quality bar anatomy issues + mid instruction following that can possibly be ironed out.
I hope someone besides Kekstone takes a crack at it.
>>108605208
Can't we improve realism with finetuning/lora? I know training on slop sucks but banana pro is really high quality baseline.
>>
File: weird.png (97.8 KB)
97.8 KB PNG
>>108604759
>https://ernieimageprompt.com/
either something is wrong with ComfyUI, or those baidu fucks are straight up lying to us, I'm not getting anything even close to the images on that site
>>
File: 1746478579501469.jpg (37.8 KB)
37.8 KB JPG
>>108605236
Chinks lying? How can it be...
>>
File: jpeg artifacts.png (1.7 MB)
1.7 MB PNG
I love to complain about the jpeg artifacts on Z-image turbo, but with Ernie we've arrived at a whole other level, jesus this is ugly af
>>
>>108605262
I don't think those are jpg artifacts, probably the watermark patterns of NBP >>108605126
>>
>>
File: 1764213941152989.jpg (1 MB)
1 MB JPG
turbo seems more slopped overall, and if there's one thing base does better than Z-image turbo, it's that it seems to know more stuff, but knowing more stuff is useless if the anatomy is ass and the realism isn't even close either
>>
>>
File: 1755056615501464.jpg (873.8 KB)
873.8 KB JPG
>>108605278
I think you are right anon, base doesn't seem to have that much noise
>>
>>
File: 1764603567551374.jpg (869.1 KB)
869.1 KB JPG
I don't see anything Ernie is the best at, Chroma has the best kino, Z-image has the best realism and anatomy, this shit is just slop after slop
>>
>>108605317
it's been compared here >>108605080
>>
File: 1761780172470859.jpg (623.1 KB)
623.1 KB JPG
>>
>>
File: now what?.png (113.9 KB)
113.9 KB PNG
the ledditors are loving it though
https://www.reddit.com/r/StableDiffusion/comments/1slg4wh/we_may_have_a_new_sota_opensource_model/
>>
>>
File: Nano Banana Amateur.jpg (1.1 MB)
1.1 MB JPG
Can't the chinks do anything else than just make cheap copies of murica's products?
>>
File: 635872472572.jpg (2.1 MB)
2.1 MB JPG
>>
File: _AnimaPreview3_00291_.jpg (464.8 KB)
464.8 KB JPG
>>
File: Ernie-Image_00022_.png (1.3 MB)
1.3 MB PNG
>>108605408
>>
File: 1773347391030829.png (706.6 KB)
706.6 KB PNG
>Tezuka Rin \(katawa shoujo\) sitting on a bench
is that how you're supposed to prompt on Anima? I can't manage to get her
>>
>>
>>108605262
>>108605278
>>108605276
i never had the artifacts problem with zit, just dont use the suggested retard samplers and instead use:
euler (/euler_a) + simple (/normal)
>>
>>108605468
Yes for tag based prompts but I don't think there is full consensus on how to prompt characters when prompting with natural language. Try Tezuka Rin from Katawa Shoujo.
If all options are exhausted try it on preview 2.
>>
File: _AnimaPreview3_00310_.jpg (458.8 KB)
458.8 KB JPG
>>
File: 1712175743062.jpg (1.6 MB)
1.6 MB JPG
>>
File: blaze it.png (1.5 MB)
1.5 MB PNG
>>108605468
>Tezuka Rin from Katawa Shoujo, a girl with short messy red hair and green eyes and no arms, sitting on a wooden bench, wearing her school uniform, calm distant expression, soft afternoon light, On the left knee there's a plush of Hatsune Miku, on the right there's a plush of Kazane Teto
skill issue
>>
File: 1770648835789723.mp4 (2.1 MB)
2.1 MB MP4
https://xcancel.com/DylanTFWang/status/2043952886166761519
>Open-source tomorrow
damn, if it's not too big to run locally maybe Tencent finally cooked
>>
big jump in real-time interactable video gen
Waypoint-1.5: apache2, first-person-shooter focused, 1.2b, 720p, 512 frames of context, 56fps on a 5090, needs at least a 30xx
online demo https://www.overworld.stream/
https://github.com/Overworldai/world_engine
>>
>>
>>108605539
forgot that link too
https://3d-models.hunyuan.tencent.com/world/
>>
File: Flux2-Klein_00092_.png (81.8 KB)
81.8 KB PNG
>>
>>108605552
newfag. luddite. brown, even.
the point is to enjoy the cool new tech and tinker with it, thinking about how you can use and change it yourself now, and about how cool it will be a year from now.
for example, chaining multiple generated rooms you can traverse infinitely is a software problem and thus solvable relatively easily, while letting you get much more out of the tech.
>>
File: 1758430737520461.png (2.4 MB)
2.4 MB PNG
>>108605550
>512 frames of context 56fps on 5090
So? less than 10 seconds? lol
>>108605552
desu I'd enjoy lurking on a world made out of a cool drawing image, like this shit
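The "less than 10 seconds" complaint above checks out against the claimed specs:

```python
# 512 frames of context at 56 fps is just over 9 seconds of video
# before the model runs out of context.
frames, fps = 512, 56
print(frames / fps)  # ~9.14 s
```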
>>
File: Ernie-Image_00023_.png (1.1 MB)
1.1 MB PNG
A very sloppy double exposure sloppa.
>>
>>108605586
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/LoRAs/Wan22_Lightx2v
kijai made the loras out of the new lightning version of Wan 2.2
>>
File: Flux2-Klein_00182_.png (1.7 MB)
1.7 MB PNG
There are some who call me...Tim
>>
File: 1747739466716694.jpg (777 KB)
777 KB JPG
>>
>>
File: 1770342916813218.webm (3.7 MB)
3.7 MB WEBM
>>108605592
not bad, Wan 2.2 may be an ancient model, it's still the best thing we have :')
>>
>>108605651
that's cool, I was tired of the ultra metallic sound of ltx, if those jews keep improving on that shit it might end up being a genuinely good model, still a long way to go to seedance 2.0 though lol >>>/wsg/6128285
>>
File: 1759932705998330.webm (3.8 MB)
3.8 MB WEBM
>>108605654
>first frame + last frame
kek, I forgot how much vram wan 2.2 is asking, I think I might return to LTX just for that
>>
>>
>>
File: 720p Wan 2.2.webm (3.8 MB)
3.8 MB WEBM
>>108605701
it uses a lighter VAE so the kv cache usage is less punitive, good luck going for 720p on wan 2.2
>>
>>
>>108604726
does any of this shit run simply and reasonably well on AMD cards yet?
I have tried multiple times over the last couple of years to get a functional pipeline up and running on my 6800xt 16gb and it has never once worked
I'm no genius but I'm also not retarded
>>
File: Flux2-Klein_00152_.png (1.8 MB)
1.8 MB PNG
>>108605689
yeah
>>
>>
>>
File: 00109-58636226.png (885.8 KB)
885.8 KB PNG
>>
File: BULLSHIT.png (632.6 KB)
632.6 KB PNG
https://youtu.be/XUxKm40X__g?t=907
benchmarks was a mistake...
>>
using ltx 2.3 ver 1.1 (new one):
the man says "I'm LITERALLY Ryan Gosling in the movie Drive", and the camera zooms out through the windshield as he speeds down the road in new york city at night.
https://litter.catbox.moe/ekn0ujlh88fd37ox.mp4
>>
File: local lost (again).png (2.1 MB)
2.1 MB PNG
https://xcancel.com/flowersslop/status/2043591433731408126
those retards at Ernie should've trained their model on GPT-image 2's output instead lool
>>
>>
>>
>>108606080
this turned out better
the man says "I'm LITERALLY Ryan Gosling in the movie Drive", and the camera zooms out very far through the windshield as he drives the car off a ramp on a road in new york city at night.
https://litter.catbox.moe/z04qhvm91v3etfmw.mp4
>>
>>
>>
>>108606114
>>108606080
are you using the distilled model or the base model + you apply the distilled lora on top of it?
>>
File: 1745735247929790.jpg (1.5 MB)
1.5 MB JPG
babe, wake up, a second image model has hit the tower
https://huggingface.co/NucleusAI/Nucleus-Image
>>
>>108606190
>We release the full model weights, training code, and dataset, making Nucleus-Image the first fully open-source MoE diffusion model at this quality tier.
kek, if they release the dataset it means they trained this shit with only copyright-free garbage, DOA
>>
File: Overall-Performance.png (655 KB)
655 KB PNG
>>108606190
>>
>>
>>
>>
>>
File: 42.png (7.2 KB)
7.2 KB PNG
>>108606219
geg
>>
>>
>>
File: 242605513.png (228.6 KB)
228.6 KB PNG
>>108606275
even pixart BIGMA is in their report, plus a shitton of models I've never heard of, or only heard about at release and never again
>>
File: animap3_00029_.png (1.6 MB)
1.6 MB PNG
>>
>>
>>
>>
>>
>>
File: Ernie-Image_00002_.png (1.8 MB)
1.8 MB PNG
Ernie Base, 20 steps
>Touhou Project characters in a screenshot of Diablo 1. Screenshot set in a gothic, candlelit cathedral dungeon — stone floors, blood-stained altars, flickering torches casting long shadows. Reimu Hakurei appears as a weathered warrior, clad in rusted plate armor with subtle Shinto motifs, wielding a glowing sword and heavy iron shield. Marisa Kirisame is a gritty sorceress, her blackened robe frayed at the edges, holding a staff crackling with low-res magical sparks. Patchouli Knowledge floats slightly above the ground like a corrupted cleric, surrounded by ancient grimoires emitting a ghostly glow. All characters match the sprite-based, isometric art of Diablo 1
>Visual fidelity must match Diablo 1’s aesthetic: muted earth tones, dark reds and greens, harsh shadows, dithering effects, and low ambient lighting. The entire composition should be a screenshot from a 1996 pre-rendered isometric dungeon crawler. Include UI elements.
Trying again with 50
>>
File: Ernie-Image_00004_.png (1.8 MB)
1.8 MB PNG
>>108606372
50 steps, I guess it's better?
>>
>>
>>
>>108605813
On newer GPUs it should work, but the 6800XT is not officially supported so far. I think the quickest way to try is to update your AMD GPU driver (to either 26.2.2 or 26.3.1), then download the latest ComfyUI portable AMD release from their Github, and see if it just werks:
https://github.com/Comfy-Org/ComfyUI/releases
>>
File: Ernie-Image_00012_.png (1.8 MB)
1.8 MB PNG
does okay with schizoprompts, but honestly it's just not good at fine details, this is base model at 50 steps and it should be way better for how long it takes to gen
>>
>>
File: ComfyUI_temp_targv_00022_.png (1.3 MB)
1.3 MB PNG
>>
File: 1776156560704245.jpg (292.5 KB)
292.5 KB JPG
what diffusion model is best for modifying an image based on text input
>>
>>
>>
>>
>>
>>
File: 1775684896827863.png (206 KB)
206 KB PNG
>>108606370
>>
>>
>>
>>
File: lmaoo.png (975.9 KB)
975.9 KB PNG
>>108606190
absolute slop
>>
File: bbs-zit-2026-04-15_00100_.png (3.9 MB)
3.9 MB PNG
>>108606190
>we have zit at home
>>
>>108606219
Laxhar should train Noob2 on Qwen image. Yes, I know nobody is going to be able to run it, finetune it, and shitmerge it, but:
1. There are no good finetuners or shitmergers.
2. Most of them don't know what they're doing, or they call "improving the dataset" contaminating it with their slop.
3. It's better that this behemoth of a model only gets finetuned and updated by him and his team.
4. LLM bros have been renting GPUs to run their Noromaids since early times.
It's the best option quality-wise. At the end of the day, I want an anime image model of excellent quality. It doesn't bother me to use a free trial from some GPU rental startup to run it. Better to have good models I can't run than millions of snake oil models that waste my time.
>>
>>
File: LOCAL IS SAVED.png (468.9 KB)
468.9 KB PNG
Finally, Tongyi has released what we've been waiting for!
>>
File: 1767168885395325.png (376.4 KB)
376.4 KB PNG
>>
>>
File: 1755091184275991.png (276 KB)
276 KB PNG
https://xcancel.com/peter9863/status/2044269457086779877#m
babe wake up, Flow Matching is not the best diffusion architecture anymore
>>
>>
File: 1756058143539523.png (48.3 KB)
48.3 KB PNG
>>108607260
https://xcancel.com/bdsqlsz/status/2044308129043886119#m
it's obvious we're still far from having found the perfect way to train those image/video models, at some point it'll be so elaborate we'll get a 6b model as good as Seedance 2.0, we're still in the era of computers as big as a house and as powerful as a modern calculator lol
>>
File: bbs-zit-2026-04-15_00121_.png (3.8 MB)
3.8 MB PNG
>>
File: 1761138898231064.jpg (3.7 MB)
3.7 MB JPG
>>108607290
it's impressive how well it's able to reproduce the original image, tencent is shit at making models, but when it's about making cool new training methods they are definitely cooking
https://hy-soar.github.io/
>>
File: CAFM paper Z-Image.png (494.6 KB)
494.6 KB PNG
>>108607260
Sounds like another garbage p-hacked meme paper that will be forgotten desu.
They apparently trained Z-Image on this thing, but while the (most probably cherry picked) prompt adherence often looks better, the images look dogshit aesthetically and fried.
>>
>>108607352
glad that there's someone here who knows what they're talking about, what do you think of that other method too? >>108607290 >>108607345
>>
>>108607363
The examples are better, and it includes more concrete benchmarks like OCR (although these too can easily be benchmemed).
If I must criticize, there is relatively limited data comparing SOAR and RL, despite "better results than RL at roughly the same cost as SFT" being a central part of the paper's premise.
But overall it looks more credible than the other paper.
Also, I have no idea what I am talking about.
>>
File: you should be ashamed of yourself lmao.png (48.7 KB)
48.7 KB PNG
https://www.reddit.com/r/StableDiffusion/comments/1slz1rq/last_week_in_generative_image_video/
the absolute state of localkeking, while seedance 2.0 is making hollywood sweat, we're still trying to figure out how to make a local model count to 3
>>
>>
File: _AnimaPreview3_00326_.jpg (133.8 KB)
133.8 KB JPG
>>
>>
File: bbs-zit-2026-04-15_00015_.png (3.7 MB)
3.7 MB PNG
>>
File: 1775937119882316.png (947.1 KB)
947.1 KB PNG
https://xcancel.com/ErnieforDevs/status/2044290766349185257#m
Oh great, another Klein tier edit model
>>
>>
File: Shifty AnimaPreview3 MK4 sample.jpg (1.5 MB)
1.5 MB JPG
How well can Ernie do anime? I'm not interested in genning 3DPD.
>>
>>108607530
>How well can Ernie do anime?
>>108605023
>>108605126
>>
>>
File: bbs-zit-2026-04-15_00137_.png (3.6 MB)
3.6 MB PNG
>>
File: banished.png (3.5 MB)
3.5 MB PNG
>>
>>
>>108607521
it's always the same: the model can generate sloppy-looking, generic stock-image trash and needs a stack of loras for anything else. You might get lucky if they don't deliberately make the model untrainable.
I want a text to vid/image model trained on the entire EvilAngel catalog (including early Rocco Siffredi titles)
>>
File: 1764774418418463.png (1.3 MB)
1.3 MB PNG
>>108607606
that's probably why Alibaba will never release Z-image edit, it was just too good and unslopped for the gweilos
>>
File: bbs-zit-2026-04-15_00160_.jpg (888.6 KB)
888.6 KB JPG
>>
>>
>>
>>
>>
File: let's go kids, a jeet is in town.png (356.6 KB)
356.6 KB PNG
>>108607707
>I still use sdxl, VAE never was a problem for me but a spook
>>
File: 1775193996577303.png (309.3 KB)
309.3 KB PNG
>>108607713
>no such thing exists
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image_turbo.json
>>
File: Ernie-Q8-Turbo_00017_.png (2.2 MB)
2.2 MB PNG
GGUF version seems completely broken
>>
>>108607521
speaking of edit models, why Comfy didn't implement that one?
https://github.com/jd-opensource/JoyAI-Image/tree/main/joyai_image_comfyui
>>
>>
File: 1770337420719969.png (1.7 MB)
1.7 MB PNG
https://xcancel.com/bdsqlsz/status/2044317768414310633#m
>Illstrious SFT based on Z-image-turbo with S3-DiT.
Are we back?
>>
>>
>>
>>
File: img_00080_.jpg (736.4 KB)
736.4 KB JPG
>>
>>108607868
Describe the things you want to see in the background and it will put them there. You could also try describing the background as cluttered, messy, detailed, etc. (haven't tried this one yet.). You may be using artists that tend to draw undetailed backgrounds, this has a huge effect.
>>
>>
File: bbs-zit-2026-04-15_00219_.png (3.4 MB)
3.4 MB PNG
you fools
>>
>>
File: 1751495597543479.png (90.9 KB)
90.9 KB PNG
>>108607884
just draw, retard
>>
>>
File: ZIT_00008_.png (1.1 MB)
1.1 MB PNG
>>108607817
Good old ZIT or Klein maybe if it hasn't been safety trained against gore.
>>108607834
>Fine-tuned on Z-image-turbo
This is kinda scary. I am skeptical that they managed to pull it off without frying or undershooting on a distilled model.
Also, every gen they use to showcase its textual capabilities has very short text. Makes me further worried that it's so fried it can't gen anything longer than a word now.
>>
File: fucking idiots.png (100.6 KB)
100.6 KB PNG
>>108607888
>he started putting ai gens in his dataset
wait really? it's fucking doa then...
>>
File: img_00087_.jpg (705 KB)
705 KB JPG
Fruit punch with vodke and unpeeled bananas
>>
>>
File: file.png (2.5 MB)
2.5 MB PNG
>>108607834
v3.5 has a lot of sovl, what happened?
>>
File: file.png (2.9 MB)
2.9 MB PNG
>>108607912
sovl vs sovless
>>
File: img_00096_.jpg (708.5 KB)
708.5 KB JPG
>>
>>
>>
>>
>>
>>
File: img_00102_.jpg (890.4 KB)
890.4 KB JPG
>>
>>
File: CHUD DOOMER.png (396.3 KB)
396.3 KB PNG
>>108607916
>Nonsensical second tail on the left.
>Nonsensical background object (lamp) on the right
>The hand and the cup are broken: massive thumb, distorted fingers and handle
>Melted "cleavage" and clothes texture
>And this is probably a good gen that got picked
Yep, they cooked this thing.
It's so fucking over. We will still be finetrooning SDXL clip in 2032 at this rate.
>>
>>
>>
>>108607947
>We will still be finetrooning SDXL clip in 2032 at this rate.
anima went for a retarded base model but it's still better than SDXL so I guess we're moving in the right direction... really slowly though...
>>
>>
>>108607834
>However, as prompts became longer and more descriptive—and as users increasingly required multi-character interactions and structured scene composition—the limitations of the existing architecture became more apparent.
holy LLM slop, come on guys you can't write that shit by yourselves?
>>
File: img_00107_.jpg (691.5 KB)
691.5 KB JPG
>>
>>
>>
>>
>>
>>
>>
File: Midjourney v8.1.jpg (1.9 MB)
1.9 MB JPG
>>108607834
wake me up when someone will manage to bring back the kino of midjourney
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: img_00163_.jpg (700.4 KB)
700.4 KB JPG
>>
>>
File: Ernie-Image_00028_.png (390.2 KB)
390.2 KB PNG
>>108608040
Is there any good theory on how Midjourney created this "Midjourney style"?
>>
File: 1775819798125401.jpg (328.8 KB)
328.8 KB JPG
>>108607834
WE BACK
>>
>>
>>
File: img_00187_.jpg (499 KB)
499 KB JPG
>>
File: Ernie-Image_00035_.png (415.8 KB)
415.8 KB PNG
>>
File: Ernie-Image_00042_.png (337.2 KB)
337.2 KB PNG
>>
>>108608171
>>108608304
>>108608368
I genned 20 images at 512p with the model. 32 steps, cfg 4, euler simple.
Some images had lesser issues like weird composition for backgrounds and problems with minor details like blurry eyes but overall I didn't get any body horror like extra limbs you get at 1024p.
Makes me think they fucked up the high res training (perhaps they intentionally ran too few high res steps to save money, Chinese culture shenanigans), which makes it more likely that the body horror can possibly be ironed out during finetuning.
That is again IF it responds well to training, which is sadly a big if nowadays.
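For anyone unfamiliar with those settings: "euler" is just a first-order ODE step over the sigma schedule, applied once per step (32 steps means 33 sigmas). A toy sketch of that loop, where the denoiser is a dummy that predicts an all-zeros image rather than a real diffusion model (ComfyUI's actual implementation differs in details):

```python
import numpy as np

def euler_sample(denoise, x, sigmas):
    # Classic Euler sampler: estimate the denoised image, convert it to a
    # derivative d = (x - x0) / sigma, then step to the next noise level.
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        x0 = denoise(x, sigma)            # model's estimate of the clean image
        d = (x - x0) / sigma              # direction pointing away from x0
        x = x + d * (sigma_next - sigma)  # first-order (Euler) step
    return x

# Dummy "model": always predicts zeros, so each step just shrinks x
# toward 0 in proportion to the sigma schedule.
denoise = lambda x, sigma: np.zeros_like(x)

sigmas = np.linspace(10.0, 0.0, 33)  # 32 steps, ending at sigma = 0
x = np.random.default_rng(0).normal(size=(4, 4)) * sigmas[0]
out = euler_sample(denoise, x, sigmas)
```

With a real model, `denoise` would be the network conditioned on the prompt, and the schedule ("simple", "karras", etc.) only changes how the sigmas are spaced.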
File: 1767788119001037.png (82.9 KB)
>>108607521
>we are getting literally showered with new stuff.
but we are getting showered with a bunch of nothingburgers. it's literally a golden shower, except it isn't molten gold, it's piss
>>108608469
i think it is impossible to source pre-ai slop image datasets anymore unless you're a gigacorp that's been hoarding data for decades, the days of LAION are over, everything new is trained on synthetic slop
>>108608542
I think it's more about cost.
Mass-generating synth slop is relatively cheap.
For real data you need to scrape the entire internet (in an era where many websites actively fight bots), extensively prune and filter the dataset, and then generate reliable enough captions. It's bandwidth- and time-intensive.
It's still the way to go if you're aiming for a SOTA API model you hope to turn a profit on, but if you're making slop for freeloader local peasants, who only help get your name out there, why bother?
>>108608530
>i think it is impossible to source pre-ai slop image datasets anymore
its trivial to detect and filter ai images now, its basically solved
also cameras still exist so you can just create your own dataset if you wanted to
>>108608601
>its trivial to detect and filter ai images now, its basically solved
A bold claim. Do you have anything to back that up?
>also cameras still exist so you can just create your own dataset if you wanted to
Just travel around the world and take millions of photos. Easy.
File: Hernia-Image-Turbo.jpg (458.9 KB)
its........ kino
File: Hernia-Image-Turbo.png (1.8 MB)
prompt: swastika (flux could do it, and it was made by germans kek)
File: artexceiling.jpg (184 KB)
>>108608688
>the sky
rofl
File: 1765056688171543.png (1.8 MB)
>>108608749
nah, google is more based than that
https://arena.ai/
File: 1756224984536942.png (3.3 MB)
>>108607834
>shill handpicked the best of the lot
How about posting the rest lol. I'd take 8 fingers per hand over this pure slop.
File: AI Models Civitai.png (198.6 KB)
BABE BABE, WAKE UP WAI ANIMA HAS RELEASED!
https://civitai.com/models/2544636/wai-anima
>>108608824
shill better
File: slop.png (6.6 KB)
>>108608861
Thoughts on Ernie Image? I think it's ok, but I'm not sure it offers that much overall in terms of the actual quality/speed ratio.
It also seems a little too reliant on the extra prompt-enhancer model, which adds even more overhead.
>>108608918
they will >>108607521
>>108608876
The best gauge isn't here, it's CivitAI. When Z-Image, Anima, and Klein were released, there were instantly a ton of loras and finetunes. It seems Ernie's lab paid someone to shill here, like you mentioning Ernie again.
>>108608824
>>108608861
ill wait for the noob tune........
File: 1748848618248548.png (1.3 MB)
>>108607868
it's insane to say this when anima has the best backgrounds of all current anime models, NAI 4.5 included
>>108609134
Start by taking your own advice midwit faggot.
>>108609437
There is a difference between "theoretically detectable" and "it's practical and cost-efficient to implement accurate, reliable detection for a wide variety of image generation models, each with its own idiosyncratic quirks that get subtler with every generation, and to keep doing it as new models release every week", but sure, go on.
File: int8.jpg (850.3 KB)
>>108607786
The base works pretty well with int8. I haven't tested turbo though.
15322MB -> 8238MB VRAM, and 2.38s/it -> 1.4s/it on my 3090.
https://github.com/BobJohnson24/ComfyUI-INT8-Fast
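For anyone wondering where the roughly 2x VRAM drop comes from: the usual trick is symmetric per-channel int8 weight quantization, storing one float scale per output channel and dequantizing on the fly during the matmul. A minimal numpy sketch of the idea (not the linked node's actual code):

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-channel quantization: one scale per output row,
    # chosen so the largest weight in that row maps to +/-127.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid div-by-zero on empty rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct an fp32 approximation of the original weights.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 128)).astype(np.float32)
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than fp32, plus one fp scale per row;
# the reconstruction error is bounded by half a quantization step.
err = np.abs(dequantize(q, scale) - w).max()
```

In practice activations stay in higher precision and only the weight matrices are stored as int8, which is why quality usually holds up well for the base model.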
>>108609502
you know you are in an AI general and you can just train a classifier, right? the vast majority of slop gens are going to be SD1.5/SDXL/Flux/ChatGPT, and even a minimal effort to filter those is going to be better than 99% of models
but the people who train those ultra-slopped models train on synthetic garbage on purpose, to avoid copyright issues and morons having a melty about deepfakes/csam
on danbooru, for example, you are gonna get AT MOST like 10k untagged AI images out of several million
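The "just train a classifier" part really is standard supervised learning; the contested part is whether you can get representative labeled data for every new model. A toy sketch with made-up features (a real detector would extract features from actual images, e.g. spectral statistics or a CNN backbone; here column 0 is a hypothetical artifact score that skews higher for generated images):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_batch(n, generated):
    # 8 toy features; generated images get a +2 shift on feature 0.
    x = rng.normal(size=(n, 8))
    if generated:
        x[:, 0] += 2.0
    return x

X = np.vstack([make_batch(200, False), make_batch(200, True)])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = AI-generated

# Logistic regression trained with plain gradient descent.
w, b = np.zeros(8), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

pred = 1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5
acc = np.mean(pred == y)
```

With a 2-sigma separation on one feature the toy classifier lands around 84% accuracy, which illustrates both sides of the argument: easy to beat chance, hard to be reliable enough to trust on millions of images.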
>>108609580
If you think a universal "classifier" is this easy, why not train it yourself? Why is there no such classifier with non-meme accuracy out there, despite the huge demand for filtering out AI slop? Could it be more complex than reiterating that one word which stuck with you after you watched some oversimplified goyslop youtube video a year ago?
>but the people that train those ultra slopped models train on synthetic garbage on purpose to avoid copyright issues and morons having a melty about deepfakes/csam
This is true but:
a) Costs of processing real data are still a huge factor
b) It's irrelevant to the first point
>on danbooru for example you are gonna get AT MOST like 10k untagged ai images out of several million
This is true for making an anime tune of an existing model, but for training a model from scratch you need data from a wide variety of contaminated sources, and you run right back into the curation/filtering problem.
>>108609601
It needs to know them because said synthetic patterns are different for every diffusion model out there.
>>108609631
there is
https://thehive.ai/demos/ai-generated-content-detection
which is quite decent
why would you even train a generation model if you're too incompetent or broke to keep it from looking like synthetic dogshit?
>>108604759
The Chinese always come out with really nice architectures, but they really can't into quality training data. Shame, the model had potential, but it's clearly slopped. It's very strange: with the Flux 2 VAE, some photos look very realistic while others don't look real at all. They likely used a mixture of slopped and real data, and it shows. Now I wait until BFL releases a model with similar capabilities.
>>108609650
1) This thing seems to test each model individually (or at least under an umbrella category), so it's not the universal detector claimed possible earlier, and it needs to be updated for each new model.
2) Even assuming it's accurate enough (I won't spend hours testing), it would cost a metric ton of money to run many millions of images through it.
>>108609706
the anima gens i tried out on it get classified as "other" AI just fine
convenient that you assert a reliable detector can't be trained yet refuse to test it; clearly it is possible
it doesn't even have to be 100% reliable, since reducing the slop would already be a massive positive compared to everyone else who just trains on AI on purpose, like i stated before
Fresh when ready
>>108609718
>>108609718
>>108609718