Thread #108597963
File: highlights_g_108590807_1776095455_1.jpg (3.4 MB)
3.4 MB JPG
Discussion and Development of Local Image and Video Models
Previous: >>108590807
https://rentry.org/ldg-lazy-getting-started-guide
>UI
ComfyUI: https://github.com/comfyanonymous/ComfyUI
SwarmUI: https://github.com/mcmonkeyprojects/SwarmUI
re/Forge/Classic/Neo: https://rentry.org/ldg-lazy-getting-started-guide#reforgeclassicneo
SD.Next: https://github.com/vladmandic/sdnext
Wan2GP: https://github.com/deepbeepmeep/Wan2GP
>Checkpoints, LoRAs, Upscalers, & Workflows
https://civitai.com
https://civitaiarchive.com/
https://openmodeldb.info
https://openart.ai/workflows
>Tuning
https://github.com/spacepxl/demystifying-sd-finetuning
https://github.com/ostris/ai-toolkit
https://github.com/Nerogar/OneTrainer
https://github.com/kohya-ss/musubi-tuner
https://github.com/tdrussell/diffusion-pipe
>Z
https://huggingface.co/Tongyi-MAI/Z-Image
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
>Anima
https://huggingface.co/circlestone-labs/Anima
https://tagexplorer.github.io/
>Qwen
https://huggingface.co/collections/Qwen/qwen-image
>Klein
https://huggingface.co/collections/black-forest-labs/flux2
>LTX-2
https://huggingface.co/Lightricks/LTX-2
>Wan
https://github.com/Wan-Video/Wan2.2
>Chroma
https://huggingface.co/lodestones/Chroma1-Base
https://rentry.org/mvu52t46
>Illustrious
https://rentry.org/comfyui_guide_1girl
>Misc
Local Model Meta: https://rentry.org/localmodelsmeta
Share Metadata: https://catbox.moe | https://litterbox.catbox.moe/
Img2Prompt: https://huggingface.co/spaces/fancyfeast/joy-caption-beta-one
Txt2Img Plugin: https://github.com/Acly/krita-ai-diffusion
Archive: https://rentry.org/sdg-link
Collage: https://rentry.org/ldgcollage
>Neighbors
>>>/aco/csdg
>>>/b/degen
>>>/r/realistic+parody
>>>/gif/vdg
>>>/d/ddg
>>>/e/edg
>>>/h/hdg
>>>/trash/slop
>>>/vt/vtai
>>>/u/udg
>Local Text
>>>/g/lmg
>Maintain Thread Quality
https://rentry.org/debo
https://rentry.org/animanon
298 Replies
>>
>>
File: 1764952745517916.png (139.6 KB)
139.6 KB PNG
>>108598018
>Remember Anima discussion belongs in Anime generals
>>
>>
>>
>>
>>
>>
>>
File: converted.jpg (1.1 MB)
1.1 MB JPG
>post in my shitty general!
>>
File: 1769913703128476.jpg (1.8 MB)
1.8 MB JPG
https://huggingface.co/duongve/AnimaYume
why are they finetuning an unfinished base model? lool
>>
>>
File: Z-image turbo.png (3 MB)
3 MB PNG
>>108598145
>Zimage slop
it's a good model anon, it can even do good anime images out of the box :(
>>
File: 1639326039084.png (1.3 MB)
1.3 MB PNG
>>108598106
He caters to me personally.
>>
>>
>>
>>
>mfw Resource news
04/13/2026
>LTX 2.3 Distilled v1.1
https://huggingface.co/Lightricks/LTX-2.3/blob/main/ltx-2.3-22b-distilled-1.1.safetensors
>UniCom: Unified Multimodal Modeling via Compressed Continuous Semantic Representations
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations
>CatalogStitch: Dimension-Aware and Occlusion-Preserving Object Compositing for Catalog Image Generation
https://catalogstitch.github.io
>Realizing Immersive Volumetric Video: A Multimodal Framework for 6-DoF VR Engagement
https://github.com/Metaverse-AI-Lab-THU/ImViD
>Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
https://github.com/gezbww/Vis_Prompt
>MixFlow: Mixed Source Distributions Improve Rectified Flows
https://github.com/NazirNayal8/MixFlow
>VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
https://zlab-princeton.github.io/VisionFoundry
>Tango: Taming Visual Signals for Efficient Video Large Language Models
https://github.com/xjtupanda/Tango
>VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
https://github.com/Mr-Loevan/VL-Calibration
>pixlstash v1.0
https://github.com/Pikselkroken/pixlstash/releases/tag/v1.0.0
>SD Forge — CivitAI Helper
https://github.com/ArthureCodage/sd-forge-civitai-helper
>Is AI the greatest art heist in history?
https://www.theguardian.com/books/2026/apr/12/is-ai-the-greatest-art-heist-in-history
>VisionCaptioner: Automated image & video captioning using Qwen-VL and SAM3
https://github.com/Brekel/VisionCaptioner
04/12/2026
>Stretchy Studio: FOSS 2D animation tool for turning static illustrations into mesh-deformable characters
https://github.com/MangoLion/stretchystudio
>LTX-2 VBVR LoRA - Video Reasoning
https://huggingface.co/LiconStudio/Ltx2.3-VBVR-lora-I2V
04/11/2026
>ComfyUI-RookieUI: The ultimate A1111-style sidebar
https://github.com/rookiestar28/ComfyUI-RookieUI
>>
>mfw Research news
04/13/2026
>InsEdit: Towards Instruction-based Visual Editing via Data-Efficient Video Diffusion Models Adaptation
https://arxiv.org/abs/2604.08646
>CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation
https://arxiv.org/abs/2604.09201
>On Semiotic-Grounded Interpretive Evaluation of Generative Art
https://arxiv.org/abs/2604.08641
>SCoRe: Clean Image Generation from Diffusion Models Trained on Noisy Images
https://arxiv.org/abs/2604.09436
>Training-free, Perceptually Consistent Low-Resolution Previews with High-Resolution Image for Efficient Workflows of Diffusion Models
https://arxiv.org/abs/2604.09227
>ELT: Elastic Looped Transformers for Visual Generation
https://arxiv.org/abs/2604.09168
>EGLOCE: Training-Free Energy-Guided Latent Optimization for Concept Erasure
https://arxiv.org/abs/2604.09405
>Post-Hoc Guidance for Consistency Models by Joint Flow Distribution Learning
https://arxiv.org/abs/2604.08828
>MeshOn: Intersection-Free Mesh-to-Mesh Composition
https://threedle.github.io/MeshOn
>BlendFusion -- Scalable Synthetic Data Generation for Diffusion Model Training
https://arxiv.org/abs/2604.09022
>Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
https://arxiv.org/abs/2604.09132
>Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing
https://arxiv.org/abs/2604.09386
>Detecting Diffusion-generated Images via Dynamic Assembly Forests
https://arxiv.org/abs/2604.09106
>RIRF: Reasoning Image Restoration Framework
https://arxiv.org/abs/2604.09511
>AniGen: Unified S3 Fields for Animatable 3D Asset Generation
https://arxiv.org/abs/2604.08746
>Do Vision Language Models Need to Process Image Tokens?
https://arxiv.org/abs/2604.09425
>LADR: Locality-Aware Dynamic Rescue for Efficient T2I Generation with Diffusion LLMs
https://arxiv.org/abs/2603.13450
>>
>>108598135
I dunno. If you download any of these anima "finetunes", drop one into the workflow of one of your gens, and regen with it, you get almost the exact same picture, so I can only assume they want to steal credit for how good the model is.
>>
>>108598242
"Is AI the greatest anime tiddie in history?"
>>
>>
>>
>>
File: 1760560355557145.png (1.3 MB)
1.3 MB PNG
>>108598380
really? looks like some Qwen Image slop, the skin is smooth as fuck
>>
>>
>>108597866
Why are you offloading all models? Disable layer offloading.
Probably set TE precision higher and enable unload TE.
Disable caption dropout probably.
Differential Guidance meme didn't work out too well for me with other models, but try your luck I guess.
>>
>>108598449
thanks for the response anon. going to try this another day when I'm moralized again :'( . i wasted a whole 8 hours on this bullshit and was bored as fuck. I wish there were lora commissioners available for high-end models like ltx and qwen.
>>
>>
>>
>>
>>
>>
File: 00005-2336170344.jpg (1.9 MB)
1.9 MB JPG
>>
>>
>>
File: 597411051901623.png (768.6 KB)
768.6 KB PNG
>>
File: 884561598637606.png (1023.2 KB)
1023.2 KB PNG
>>
>>
File: 314212686033261.png (604.8 KB)
604.8 KB PNG
>>
>>
Hello, I uploaded another lora, feel free to post your questions here.
https://civitai.com/models/2540444/anima-highresaesthetic-boost
>>108598018
>>108598106
I’m not going through all the 4chan generals, I’ll just post here.
>>
>>
>>
>>
>>
File: _AnimaPreview3_00041_.jpg (554.7 KB)
554.7 KB JPG
>>
>>
>>
>>
>>
File: _AnimaPreview3_00048_.jpg (495.5 KB)
495.5 KB JPG
>>
>>
File: 1774617995278017.png (308.1 KB)
308.1 KB PNG
>>108598971
>feel free to post your questions here.
I'm not seeing any images on civitai :(
>>
>>108598971
Russ I am busy this week so I will probably make my huggingface post about it next week, but I should give you a heads up so that you can hopefully take your time to test it on your own.
Have you compared character knowledge of preview 3 vs preview 2? I see preview 3 struggling with some characters that preview 2 could do easily; it can't do them with the same consistency. I love your work with anima but it got me a bit worried.
>>
>>
>>
>>
File: _AnimaPreview3_00054_.jpg (401.2 KB)
401.2 KB JPG
>>
>>
>>
>>
>>
>>108598988
>>108599114
you are a hard dude to reach
>>
>>
>>
>>
File: _AnimaPreview3_00066_.jpg (539.8 KB)
539.8 KB JPG
>>
>>
File: 615463074307759.png (1.6 MB)
1.6 MB PNG
>>
>>
>>
File: needahand.png (569.5 KB)
569.5 KB PNG
>>108599119
>doesn't recognize Rin Tezuka
? it does
>>
>>
>>
>>
>>
File: _AnimaPreview3_00090_.jpg (433.6 KB)
433.6 KB JPG
>>
>>108599213
Why would he post in a dedicated hentai thread? Like why do you think every example image on the Civit page is SFW? Same shit as Noob, everyone knows what the model can do but there's reasons to not openly advertise that.
>>
File: 408309586443103.png (2.4 MB)
2.4 MB PNG
>>
>>
>>
File: 00006-3867242695.jpg (937.1 KB)
937.1 KB JPG
>>
>>
>>
File: 123800191121980.png (2.8 MB)
2.8 MB PNG
>>108599274
Works alright in my experience.
>>
>>
>>
>>108599260
>>108599297
the colors are too saturated, decrease the cfg I guess
>>
File: _AnimaPreview3_00108_.jpg (757.1 KB)
757.1 KB JPG
>>
>>
>>
>>
File: 1773217571435873.jpg (961.6 KB)
961.6 KB JPG
>>108598971
finally, we can see the images
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1747796594292521.png (1.8 MB)
1.8 MB PNG
SOUL - SOULLESS
>>
>>
>>
>>
>>
>>
>>
File: ComfyUI_09456_.png (782.6 KB)
782.6 KB PNG
>>
File: 333539074583949.png (2.3 MB)
2.3 MB PNG
>>
>>
>>
>>
>>
>>
>>
>>
File: deMA_zi_00012_.png (2.5 MB)
2.5 MB PNG
>>
File: 102148519303150.png (2.5 MB)
2.5 MB PNG
>>
>>
>>108599350
>>108598971
Just call it illustrious 2.0 lora
>>
>>
>>108598971
>https://civitai.com/models/2540444/anima-highresaesthetic-boost
Noob here, what model do I need to use this with?
>>
>>
>>108599894
The page contains enough information:
>Base Model
>Anima
>About this version
>Trained on preview3
I will handhold you further though:
https://huggingface.co/circlestone-labs/Anima/tree/main/split_files
>>
File: 835673734828.jpg (1.8 MB)
1.8 MB JPG
Anima is getting decent at replicating characters, but the details are still missing sadly.
>>
>>108599911
Ok, I don't know which anime model though, so this one won't work:
>https://civitai.com/models/2458426/anima-official
anyways thanks
>>
>>
>>
>>
>>108599960
>the details are still missing sadly.
that's what happens when you go for a meme base model with a subpar vae >>108596443
>>
File: 1769747479955588.jpg (578.4 KB)
578.4 KB JPG
>>
>>
>>
>>
>>
File: 20260318_123209.jpg (194.9 KB)
194.9 KB JPG
I still can't do two unique characters without it morphing them into one or mutating them. I've used Forge Couple, etc, but it just doesn't work. I had to give up and use Nano Banana...
>>
>>
>>
File: rtthuc.jpg (1 MB)
1 MB JPG
>>108600245
They do have names. They even have their own Booru tags.
>>
>>
>>
>>
>>108600283
There isn't and won't be a one-size-fits-all model. SDXL is better at quickly capturing styles and merging aesthetics. Using SDXL for the hires fix pass on Anima gens is kino, and Anima has much better composition than SDXL. To me the two should coexist and complement each other.
>>
>>
>>
File: deMA_zi_00016_.png (2.1 MB)
2.1 MB PNG
>>
>>
>>
File: o_00233_.png (1.9 MB)
1.9 MB PNG
>>
>>
>>
File: 00405-1635270013.png (1.7 MB)
1.7 MB PNG
>>
>>
>>
>>
>>
>>
>>
>>108600715
apparently it's extracted from kijai's loras
https://huggingface.co/lightx2v/Wan2.2-Distill-Models/tree/main
>>
File: 385188574252897.png (2.3 MB)
2.3 MB PNG
>>
File: 1774466994663736.png (553 KB)
553 KB PNG
i just started genning locally with Comfy, and it's kind of confusing.
How do I set it up so that I can start generating stuff that doesn't look like complete dogshit?
>>
>>
>>
>>
File: ComfyUI_temp_ohcoz_00001_.png (1.7 MB)
1.7 MB PNG
>>
>>
Okay, so there's a million fuckin AI GF websites that go:
>Pick realistic or anime
>Pick race
>Pick hair color
>Pick bust size
>Pick ass size
>Pick relationship
>Pick 3 hobbies from a long list, a lot of which are hyper-specific
>Then it asks you to log in with Google or make an account, then asks for credit card
Hell, there's a 50% chance your 4chan ad is for one right now.
Their urls and specific genned images and videos are different but they're clearly all running the exact same software. And for there to be that many, it's probably something prebuilt and easy to set up.
So, any ideas where to find the guts? I don't want to make my own website, I just want to run it locally. I have the hardware, so fuck paying those blatant scammers.
>>
>>
File: 327.jpg (1.3 MB)
1.3 MB JPG
anima highres lora
https://civitai.com/models/2540444/anima-highresaesthetic-boost?modelVersionId=2855073
with/without
>>
>>
>>
>>
File: 115754419284491.png (1.7 MB)
1.7 MB PNG
>>
File: 54885957594595.png (1.6 MB)
1.6 MB PNG
>>
>>
>>108601306
the guts are probably just a base model and half a dozen loras and a basic bitch llm to write a prompt.
>user selects photorealistic black woman with big tits and a small ass.
>user selects surfing and cooking as hobbies.
>load realism model + black woman lora + big tits lora + small ass lora
>llm prompt big titty black bitch lying on surfboard and rubbing her vagene with a bigmac.
>>
File: 541646248919810.png (612.1 KB)
612.1 KB PNG
>>
>>
>>
File: 420273897860536.png (1.3 MB)
1.3 MB PNG
>>
>>
>>
I've looked through a lot of this info and a lot of these options but I can't find what I'm looking for: I want something akin to chatgpt's "upload an image and a prompt to edit it" where i can do something like post a picture of a green ball and say make it red with blue stripes. Any good options you guys know?
>>
>>
>>
>>
>>108601682
yes
https://www.youtube.com/results?search_query=Run+Qwen-Image-Edit+Locally
>>
File: ComfyUI_temp_eijtg_00001_.png (1.7 MB)
1.7 MB PNG
>>
File: ComfyUI_temp_eijtg_00011_.png (1.7 MB)
1.7 MB PNG
>>
>>
File: ComfyUI_temp_eijtg_00013_.png (1.4 MB)
1.4 MB PNG
>>
>>
>>108601653
well it's not that hard to get consistent gens for generic 1girl shots, worst case you could gen a big batch and then run them through a face analyzer and only output the best matches.
if you are using loras it just gets easier.
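A minimal sketch of the "gen a big batch, keep the best matches" idea. It assumes you already have a face embedding (a list of floats) for a reference image and for each gen; the filenames and vectors below are made up, and no real face-analysis library is called here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_matches(reference, candidates, keep=2):
    """Return the names of the `keep` candidates closest to the reference."""
    ranked = sorted(
        candidates,
        key=lambda name: cosine(reference, candidates[name]),
        reverse=True,
    )
    return ranked[:keep]

# Hypothetical 3-dimensional embeddings; real face embeddings are much longer.
reference = [1.0, 0.0, 0.0]
batch = {
    "gen_001.png": [0.9, 0.1, 0.0],
    "gen_002.png": [0.0, 1.0, 0.0],
    "gen_003.png": [0.7, 0.7, 0.0],
}
print(best_matches(reference, batch))
```

Where the embeddings come from is up to whatever face analyzer you use; the ranking step stays the same.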
>>
File: ComfyUI_temp_eijtg_00022_.png (1.6 MB)
1.6 MB PNG
>>
File: ComfyUI_temp_gqiel_00004_.png (792 KB)
792 KB PNG
>>108601306
Most popular sites like that are sold as a white-glove service by several vendors, this is one
https://www.scrile.com/ai
It's basically pay and deploy, but I don't know if you can tinker with the workflows or stuff like that. Setting up an AI adult site from the ground up can be tricky, since you will have to invest money and time in hosting, coding (even with vibecoding), marketing, payment processors, and creating and setting up the characters. It could take you several months.
>>
File: output.webm (3.9 MB)
3.9 MB WEBM
>>108601380
>>108601744
>>108601749
>>108601751
>>108601778
Damn does this bitch just never wash her clothes?
>>
File: Untitled.png (892.2 KB)
892.2 KB PNG
>>
>>
File: Untitled.png (937.3 KB)
937.3 KB PNG
>>108601831
>>
File: 1742437393421802.jpg (89.1 KB)
89.1 KB JPG
i managed using realistic anima. clearly better than klein, qwen, sdxl. and no need for a million loras for the body, yay
>>
>>
File: ComfyUI_temp_cmbpk_00074_.png (2.3 MB)
2.3 MB PNG
I iterated over this with dozens of different prompts and tried three different models and every time it adds a weird light in the middle of the scene.
>>
>>
File: ComfyUI_temp_cmbpk_00083_.png (2.5 MB)
2.5 MB PNG
>>108601962
I tried describing a gunfight (can't say firefight or it will think of putting out fires) and all the projectile trails (can't say tracers or your image is overwatch themed now) always come from the light in the center of the image, often in a pillar going vertically into the sky. If I describe soldiers or silhouettes in the perimeter, it places them surrounding whatever pyre is in the middle, and half of them are bowing to it like in worship or something. Now it keeps adding a dog for no reason in every seed.
I was looking through images on civitai thinking that I'm just shit at prompting. But it turns out every prompt in there is half ignored anyway. Like I saw one with "disembodied limb" that didn't feature a disembodied limb in the image. This shit is a complete joke.
It's not even that people can't generate good looking images. I can make convincing images that I find interesting but it's never actually what I had in mind or intended. And it's clear none of the stuff other people make is any different. It's all so fucking typical.
Like why not throw a fucking dog into my image right? People love dogs. I didn't ask for one in my prompt but what do I know so fuck me right?
>>
>>108601998
that's just how it is with classifier-free guidance and rng.
if you want a controlled composition you need to control it, either with weighted tokens, clusters of prompts to reinforce concepts, controlnets, proper negative prompts, etc.
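For reference, a toy sketch of what classifier-free guidance does at each step: the sampler combines an unconditional prediction with a prompt-conditioned one, and the CFG scale sets how hard it pushes toward the prompt. The numbers below are made up and stand in for real model outputs.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move along the (cond - uncond) direction by `scale`."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Toy 3-value "noise predictions" (illustrative only, not real model output).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 0.0]

for scale in (1.0, 5.0):
    print(scale, cfg_combine(uncond, cond, scale))
```

At scale 1 the result is just the conditioned prediction. Higher scales exaggerate the difference between the two predictions, which is why very high CFG tends to oversaturate and why the prompt is a nudge rather than a hard constraint.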
>>
File: 00382-2054072178.jpg (286.7 KB)
286.7 KB JPG
kek, i have no idea how to upscale this without slopping it though
High quality cosplay photo of a young and pretty japanese woman with long pink hair cosplaying as power from chainsaw man. The bedroom is full of toys and plushies. The woman is wearing gym shorts with her panties exposed. She is lying on her stomach and looking at the viewer. She is looking back. She has a toned body and the photo has an ass focus. A gaming computer is visible in the background. Her computer has a picture of Donald Trump.
Negative prompt: anime, illustration, cartoon, stable diffusion, worst quality, low quality, score_1, score_2, score_3, bad hands, bad fingers, bad feet, bad anatomy, ai-generated, ai-assisted, bad quality, normal quality, average quality, adversarial noise, resized, downscaled, source larger, lowres, jpeg artifacts, compression artifacts, blurry
Steps: 40, Sampler: ER SDE, Schedule type: Beta, CFG scale: 5, Shift: 3, Seed: 2054072178, Size: 896x1152, Model hash: 14fffe8ad5, Model: anima-preview3-base, Clip skip: 2, RNG: CPU, MaHiRo: True, Beta schedule alpha: 0.6, Beta schedule beta: 0.6, Version: neo, Module 1: Qwen_Image-VAE, Module 2: qwen_3_06b_base
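If you want to reuse settings like the ones above in a script, a rough parser is enough. This sketch assumes the line is a comma-separated list of `key: value` fields and that no value contains `, ` itself; the sample string is a shortened copy of the settings line in this post.

```python
def parse_params(line):
    """Split a 'Key: value, Key: value' settings line into a dict.
    Assumes fields are separated by ', ' and no value contains ', '."""
    params = {}
    for field in line.split(", "):
        key, sep, value = field.partition(": ")
        if sep:  # skip fragments that are not in 'key: value' form
            params[key.strip()] = value.strip()
    return params

line = (
    "Steps: 40, Sampler: ER SDE, Schedule type: Beta, CFG scale: 5, "
    "Shift: 3, Seed: 2054072178, Size: 896x1152, "
    "Model: anima-preview3-base"
)
params = parse_params(line)
width, height = (int(n) for n in params["Size"].split("x"))
print(params["Sampler"], params["CFG scale"], width, height)
```

Values stay strings on purpose; convert only the ones you need, as done for `Size` here.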
>>
>>
File: Flux2-Klein_00071_.png (1.7 MB)
1.7 MB PNG
>>
File: f2k9b_00034.png (2.2 MB)
2.2 MB PNG
>>
>>
>>
>>
>>
>>
>>
File: Flux2-Klein_00119_.png (122.6 KB)
122.6 KB PNG
>>
File: 1746365606508663.jpg (1.3 MB)
1.3 MB JPG
babe wake up, a base model that doesn't use VAE (pixel space) got released
https://huggingface.co/blog/sensenova/neo-unify
>>
>>
>>
>>
File: file.png (374.9 KB)
374.9 KB PNG
>>108602575
>unified
that means it doesn't use a text encoder anymore? damn that looks interesting, it's just 3 models (TE + diffusion model + VAE) in one, I like that
>>
File: file.png (1.9 KB)
1.9 KB PNG
>>108602575
>>
>>
>>
>>
>>
>>
File: yayy.png (83.5 KB)
83.5 KB PNG
>>108602625
>VAEless
>Text-encoder-less
make this shit a 15b model and I'm sold
>>
>>
>>
>>
>>
File: Capture.png (2.1 MB)
2.1 MB PNG
>>108602575
>pixel space
>still has reconstruction loss and color shift
why? isn't it supposed to be a lossless process?
>>
>>
File: When baidu ERNIE?.png (73.2 KB)
73.2 KB PNG
>>
>>
File: 1773121883282732.png (655.1 KB)
655.1 KB PNG
https://huggingface.co/lodestones/Zeta-Chroma
the loss curve is flattened, meaning that the training is over, yet the images it produces are still so fucking ASS
>>
>>108602575
One month later and nothing released.
Judging by the slop look, we are probably not missing out much.
Though it seems like they managed to make it converge into something besides crunchy blurslop. Kekstone might benefit from that.
>>108602843
His schizo meme architecture is unable to converge. Retard is just pointlessly wasting electricity instead of admitting that he fucked (again) with vibe training slop.
>>
>>108602843
>>108602858
that's really disappointing. I was hoping that amazing tunes of z-base would be around by now but it's kind of dead
>>
so many open source ai models die off and get no traction. This one got released yesterday yet not a peep from anyone.
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations
https://github.com/Tencent-Hunyuan/UniCom
https://miazhao7708.github.io/UniComPage/
>>
File: Son I'm crine.png (437.4 KB)
437.4 KB PNG
>>108603004
https://huggingface.co/tencent/Unicom-Unified-Multimodal-Modeling-via-Compressed-Continuous-Semantic-Representations/tree/main/siglip2-so400m-patch16-naflex
>using clip in the year of our lord 2026
>>
File: 1770827840901658.png (64.3 KB)
64.3 KB PNG
>>108603004
anon, you know it's a meme model when they're not comparing with the best edit models like Qwen Image Edit or Klein
>>
File: ComfyUI_21230.png (2.2 MB)
2.2 MB PNG
>>108602575
>2B
Seems small. Too bad there's nothing there to try.
>>108602843
He's got that Civitai mindset; just fry that bitch 'till it's charred.
>>
File: Capture.jpg (435.5 KB)
435.5 KB JPG
>>108603004
this is so ass, it completely changed the poor squirrel
>>
>>
>>108603004
I was interested in trying it out before I saw that it's 15gb.
>>108603013
Siglip isn't clip?
>>108603041
It uses older flux vae which is suboptimal for edit tasks now.
>>
>>108603054
>Siglip isn't clip?
https://medium.com/@jiangmen28/siglip-vs-clip-the-sigmoid-advantage-457f1cb872ab
it's like saying Jake Paul is better than KSI, when ultimately we want fucking Mike Tyson (LLM text encoders)
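For anyone wondering what the actual difference is: both are contrastive image-text encoders, and the main change is the training loss. CLIP uses a softmax over the whole batch, while SigLIP scores every image-text pair independently with a sigmoid. A toy sketch of the two losses on a small similarity matrix, with made-up values and default scale/bias parameters chosen only for illustration:

```python
import math

def clip_loss(sim, temperature=1.0):
    """Symmetric softmax cross-entropy over an n x n image-text
    similarity matrix; matching pairs sit on the diagonal."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        row = [sim[i][j] / temperature for j in range(n)]  # image i vs all texts
        col = [sim[j][i] / temperature for j in range(n)]  # text i vs all images
        total -= row[i] - math.log(sum(math.exp(v) for v in row))
        total -= col[i] - math.log(sum(math.exp(v) for v in col))
    return total / (2 * n)

def siglip_loss(sim, scale=1.0, bias=0.0):
    """Pairwise sigmoid loss: each pair is its own binary problem,
    +1 for matching pairs and -1 for everything else."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            label = 1.0 if i == j else -1.0
            z = label * (scale * sim[i][j] + bias)
            total += math.log(1.0 + math.exp(-z))
    return total / n

aligned = [[5.0, -5.0], [-5.0, 5.0]]  # matching pairs score high
untrained = [[0.0, 0.0], [0.0, 0.0]]  # everything looks the same
print(clip_loss(aligned), clip_loss(untrained))
print(siglip_loss(aligned), siglip_loss(untrained))
```

Either way you still end up with a CLIP-style embedding model, which is the point being made above about wanting LLM text encoders instead.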
>>
File: 00008-2187282030.png (3.3 MB)
3.3 MB PNG
>>
>>
File: Ernie-image.png (465.2 KB)
465.2 KB PNG
https://xcancel.com/bdsqlsz/status/2043981799693660215#m
where did he get those images?
>>
File: 1749297024678439.png (67.2 KB)
67.2 KB PNG
>>108603546
https://github.com/Comfy-Org/ComfyUI/pull/13369#issuecomment-4237642159
get chinese culture'ed (again)
>>
>>
>>
File: 00015-2179401690.png (1.2 MB)
1.2 MB PNG
>>
File: Ernie-image.png (1.6 MB)
1.6 MB PNG
>>108603546
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image-1.webp
Come on Comfy, the dude has 6 fingers on his left hand
>>
>>
>>108603585
Wait a minute...
https://civitai.com/models/2540444/anima-highresaesthetic-boost
>high-res support released as an official lora
Loras can do that? Sweet, gonna try it out.
>also comfy 0.19 came out, with an Intel portable release
Congrats to Intelbros
>>
File: 1759411545203475.png (52.1 KB)
52.1 KB PNG
>>108603546
that's bullshit, but I believe it
>>
File: Anima0.3+HiResLora_00001_compare.png (3.4 MB)
3.4 MB PNG
>>108603758
Left without lora, right with lora. The lighting gets fancier and proportions change a bit.
>>
>>
File: miku.png (1.9 MB)
1.9 MB PNG
>>108603552
lol
>>
>>
File: Anima0.3+HiResLora_00002_compare.png (3 MB)
3 MB PNG
>>108603878
A taller pic. The composition changed a lot with this one, even with the same seed and inputs unless I missed something. The periphery's less fuzzy, but her details look a bit more slopped.
>>
File: o_00235_.png (302.3 KB)
302.3 KB PNG
>>
File: miku 4.png (1.5 MB)
1.5 MB PNG
>>
File: miku 5.png (744.7 KB)
744.7 KB PNG
>>108603985
catbox?
>>
File: o_00236_.png (276.9 KB)
276.9 KB PNG
>>108604000
prompt was just:
cat, @umi \(srtm07\), smoking cigarette, spiral eyes
no negative prompt
>>
File: miku 6.png (1.5 MB)
1.5 MB PNG
>>108604015
The Japanese text on the previous one surprised me because anima isn't supposed to do that usually. Not that it's meaningful.
I guess just a lucky slop. Thanks.
>>
File: Anima0.3+HiResLora_00003_compare.png (3 MB)
3 MB PNG
>>108603983
Same prompt and seed, but 1280x1600. Different composition, butterface.
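A likely reason the composition moves: the seed only fixes the random number stream, while the starting noise is laid out according to the resolution, so the same values land in different positions. A toy sketch of that effect, assuming a hypothetical 8x latent downscale, a made-up 1024x1280 starting size, and Python's `random` module standing in for whatever RNG the sampler really uses:

```python
import random

def initial_noise(seed, width, height, downscale=8):
    """Fill a rows x cols grid with Gaussian noise from a seeded stream.
    The 8x downscale is an assumption for illustration only."""
    rng = random.Random(seed)
    cols, rows = width // downscale, height // downscale
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

small = initial_noise(42, 1024, 1280)  # hypothetical original size
large = initial_noise(42, 1280, 1600)  # size mentioned in this post

print(len(small), len(small[0]), len(large), len(large[0]))
print(small[0][:3] == large[0][:3])  # same stream, same first values
print(small[1][:3] == large[1][:3])  # second row starts at a different offset
```

So "same seed" across resolutions is not really the same starting point, and a different composition is expected even before the LoRA gets involved.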
>>
>>
File: miku 7.png (772.6 KB)
772.6 KB PNG
>>
>>
File: 00001-3142776389.jpg (1.3 MB)
1.3 MB JPG
>>
File: deMA_zi_00020_.png (2.1 MB)
2.1 MB PNG
>>
File: _AnimaPreview3_00229_.jpg (564.2 KB)
564.2 KB JPG
>>
>>
File: o_00241_.png (989.9 KB)
989.9 KB PNG
>>
https://huggingface.co/baidu/ERNIE-Image
https://huggingface.co/baidu/ERNIE-Image-Turbo
comfy workflow when? it seems it was already patched in but i don't see any nod
>>
File: 1749532566066051.jpg (919.9 KB)
919.9 KB JPG
>>
>>
File: 1757044410024398.jpg (2.2 MB)
2.2 MB JPG
>>108604511
you can download the workflow here
https://github.com/Comfy-Org/workflow_templates/blob/main/templates/image_ernie_image.json
>>
>>108604511
This looks good? Unless the images are aggressively cherry picked, we seem to have a decently capable model with non-slopped look and SOTA text capability at just 8B. Hopefully it isn't slow as shit to run inference. And responds well to training.
>>108604519
Kino gen
>>
File: Ernie.png (1.3 MB)
1.3 MB PNG
>>108604511
>>108604578
https://huggingface.co/Comfy-Org/ERNIE-Image
ok that's pretty good
>>
File: 1760655397965910.png (3.6 MB)
3.6 MB PNG
>>108604511
>>108604578
>>108604636
>ERNIE-Image: Our SFT model, delivers stronger general-purpose capability and instruction fidelity
>ERNIE-Image-Turbo: Our Turbo model, optimized by DMD and RL, achieves faster speed and higher aesthetics
I'm getting mixed signals, which one is the least slopped ultimately?
https://yiyan.baidu.com/blog/posts/ernie-image
>>
>>
>>
>>108604659
the text seems next level, and it doesn't look really slopped, can't believe Z-image turbo got beaten so quickly lmao (4chan get your shit together why are you bugging now we have a new decent model I wanna discuss about it!!)
>>
File: 1749319865213888.png (1.5 MB)
1.5 MB PNG
>>108604659
>https://yiyan.baidu.com/blog/posts/ernie-image
Anima btfo!!
>>
Fresh when ready
>>108604726
>>108604726
>>108604726
>>
>>
>>108604729
It's not perfect (green eye in the middle for example), but impressive character consistency for a local model doing multiple views gen.
I am still downloading and haven't tested yet so I don't want to jinx it but we might be eating good with this one.