Thread #107977622
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>107968112 & >>107957082
►News
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
348 Replies
>>
File: rec.jpg (180.6 KB)
►Recent Highlights from the Previous Thread: >>107968112
--Resolving tool calling issues with llama.cpp:
>107969771 >107969843 >107970900 >107972629 >107969878 >107969911 >107969974 >107970015 >107970034 >107970049 >107970124 >107973371 >107973409 >107973429 >107973456
--Realtime TTS options with voice cloning and finetuning support:
>107969100 >107969574 >107969781 >107969992 >107972780 >107975376 >107975407
--Addressing llama.cpp's versioning and testing phase concerns:
>107971580 >107971606
--QWEN3TTS voice cloning and tone modulation limitations:
>107971144 >107971184 >107971200 >107971265 >107971246
--GLM 4.7 implementation issues in llama.cpp and attention mechanism debates:
>107968564 >107968573 >107968588 >107971627 >107968640 >107968711 >107968729 >107968779 >107968793 >107968818 >107968900 >107968820 >107974101 >107974155
--Tencent's closed-source HunyuanImage 3.0-Instruct multimodal model:
>107970431 >107970564 >107970572 >107970578
--llama.cpp direct-io bug causing VRAM issues with large models:
>107973134
--Engram's impact on local hardware and performance scaling:
>107968191 >107968288 >107968424 >107968431 >107968505 >107970865 >107976379 >107969900 >107969936 >107970033 >107976430 >107976704 >107976901
--Evaluating Echo-TTS performance and optimization techniques:
>107974691 >107974768 >107974830 >107974808 >107974867 >107974919 >107974964 >107974915 >107975384
--LLMs' potential in creating non-browser desktop apps from web interfaces:
>107973002 >107973135 >107973205 >107973646 >107973374
--Engram architecture's impact on model design:
>107976466 >107976509 >107976516 >107976576 >107976668
--Comparing Qwen3TTS and IndexTTS2 for emotional voice synthesis:
>107975441 >107975479 >107975570 >107975607 >107975639 >107975595 >107975727 >107977031
--Miku (free space):
>107968421 >107971408 >107971457 >107974122 >107976924
►Recent Highlight Posts from the Previous Thread: >>107968115
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
What consumer accessible GPU should I buy for running and training models (or is that folly, and I should just pay for compute on some cloud)? I can't afford a 5080 or above. I was looking at the 16GB AMD cards.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: file.png (85.3 KB)
>>107977677
ANSI is shit because of pic related.
One of the character keys is just randomly sized differently.
ISO fixes that.
>>
>>
>>
>>
>>
Here's a tip that might have been obvious to everybody but me.
If you are going to use some form of structured output (BNF, Json Schema), you might want to have the model output normally, then take that response and send it back to the model, asking for it in whatever structured form you want.
That way you don't have to contend with the drop in output quality that you sometimes get when using that kind of functionality.
Probably more useful for smaller, dumber models.
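Rough sketch of the two-pass thing against an OpenAI-compatible endpoint (llama-server here; the port, prompts and JSON shape are just placeholders, and json_schema support depends on your server build):
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # llama-server default port, adjust to your setup

# pass 1: no constraints, let the model write normally
free = requests.post(URL, json={
    "model": "local",  # llama-server ignores this, other backends may want a real name
    "messages": [{"role": "user", "content": "List the pros and cons of ISO vs ANSI keyboard layouts."}],
}).json()["choices"][0]["message"]["content"]

# pass 2: feed the answer back and only now force the structure
structured = requests.post(URL, json={
    "model": "local",
    "messages": [{"role": "user", "content":
        "Rewrite the following as JSON with keys 'pros' and 'cons' (arrays of strings):\n\n" + free}],
    "response_format": {"type": "json_object"},  # or a full json_schema if your server supports it
}).json()["choices"][0]["message"]["content"]

print(structured)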
>>107977838
I know.
>>107977876
That's always been my style.
>>107977918 gets it.
>>
>>107977927
Fuck. >>107977936 was for you.
>>
>>
>>
>>107977945
Good tip. Of course for some situations you can create a custom grammar / parser that allows the model to write in a way that doesn't hinder quality while still being parseable and containing the information you need
>>
>>
>>
>>
>>
>>107977985
At least with llama.cpp, when you use structured output the model can't think since the whole output has to conform to the structure.
Of course, if you are using BNF, you can just write a grammar that only kicks in after the thinking block closes, I suppose.
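A minimal sketch of that, posting GBNF to llama-server's /completion endpoint from python (the <think> tag spelling and the tiny JSON shape are assumptions, match them to your model's template):
import requests

# grammar: free-form thinking first, then a forced JSON shape
grammar = r'''
root  ::= think json
think ::= "<think>" [^<]* "</think>" "\n"
json  ::= "{\"answer\": \"" [^"]* "\"}"
'''

resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Is a tomato a fruit? Think it through, then answer.",
    "grammar": grammar,   # llama-server takes raw GBNF here
    "n_predict": 512,
})
print(resp.json()["content"])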
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>107978112
video gen basically requires at least 24gb of vram unless you are using a heavily quantized model. try the q3ks or q4ks of this model: https://huggingface.co/bullerwins/Wan2.2-I2V-A14B-GGUF
>>
>>
>>
>>
>>
>>
>>
>>
God I hate PDF format so fucking much you won't believe how much I hate the format. All I want is to convert highly technical books into epub for easier reading on an e-reader device. I've done a conversion using DeepSeek-OCR and that was pretty OK, but it output the Formulas in LaTeX instead of MathML?
Also I need to figure out how to get the bounding boxes to be better. Maybe I should use a less quantized model, but the Q8 can go through 7 pages per second.
Also I just noticed i proomted wrong, why do I proompt for markdown if I want epub?
>>
>>
>>
>>
>>107978507
I have tried anon. I have tried with calibre. It throws errors, it is kind of crappy, and it annoys the fuck out of me. Ain't the only one who thinks like that, there's some asian out there who built pdf-craft. LLM/OCR becomes really useful when you have to deal with figures and stuff, something which traditional OCR often struggles with, and don't get me started on formulas, they can't do that right either. Technically I should be able to preserve the layout with DeepSeek-OCR, which is also pretty nice (and good for technical books, which make up the majority of my library).
Tools are great for romance novels and crap like that, but that is not what I want to read.
>>
>>
>>
>>
>>107978554
That is something I will try soon.
>>107978549
Oh yes, let me just go to the money tree and shake it, maybe then I'll have the money to buy a new ereader. I don't even know if Color E-Readers for A4 format exist nowadays.
>>107978538
I will keep a note of that, but so far the github has some lines:
"Complex Document Elements: Table&Formula: dots.ocr is not yet perfect for high-complexity tables and formula extraction. Picture: Pictures in documents are currently not parsed."
"Performance Bottleneck: Despite its 1.7B parameter LLM foundation, dots.ocr is not yet optimized for high-throughput processing of large PDF volumes."
My books have upwards of 1000 pages. No harm in trying it though.
>>
>>
>>
>>107975384
>>107975389
voice cloning and emotions in Vibevoice work for me. cfg slider set to 4. Prompt: [fired-up shouting, determined tone] We are gonna win this time!
Input audio:
https://vocaroo.com/14H42IjW5lnk
Output audio:
https://vocaroo.com/12rnzDBUr4cd
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>107978783
https://youtu.be/EPjA1Lm4ftY Strix Halo mini pcs are pretty good, most of them perform better or have better networking than framework's itx board, this one even has 80Gb USB4v2 you can plug eGPUs into once GPU prices are less insane.
>>
>>107978821
>once GPU prices are less insane
But who knows when that will happen? DeepSeek V4 Mini Flash or something will come out soon, everyone will want to run that. NVIDIA is not fucking with us poors any more. AMD is shit.
It's grim.
>>
>>
>>
>>
>>
>>
>>
>>107978850
You can use the 128GB of RAM with the iGPU as unified memory under Linux, or allocate 96GB to it in the UEFI for Windows. The USB4v2 lets you add dedicated PCI-E GPUs on top of that via docks; I'm not saying they're required to make use of the device for running AI models.
>>
>>
>>
>>
>>
>>107978898
Would an Intel igpu have significantly better performance than going straight CPU? I suppose I should look into their openvino stuff.
The sad part is I enjoy building the rig and finding ways to transform what once was ewaste into AI machines more than the AI itself. Once it is up and running I find I have little to ask the machine.
>>
>>
>>107978938
Better value + future upgrade options for better performance, ZLUDA on the horizon means more software compatibility in the long run too, though even now more projects have rocm support than metal. I think if you're purely looking to run llama.cpp on it though, the macbook would give you more tokens per second, but it lacks the ability to add dedicated GPUs later on due to MacOS.
>>107978960
Honestly the only Intel iGPU I've used llama.cpp on is an n5105, but it did get me up to 7T/s from 3T/s on CPU only using vulkan backend on a 3B model. Their new laptop iGPUs are a little weaker but not super far off strix halo's from what I've seen so far so they may be worth considering.
>>107978980
How do you run local models if you don't have a local PC to run them on retard-kun?
>>
>>107978980
Need to build a PC to run local models on. Either way, people here are more knowledgeable and build more complicated setups than either /pcbg/ (8gb ram + gaming gpu thread) or /hsg/ (install pihole on an rpi and larp as a sysadmin thread).
>>
>>
>>107978447
uhhh couldn't you do this with a vlm like a sane individual?
Do a first pass over the pdf with pymupdf/docling, set it up so it notes the placement of images and extracts them, and then pass those images + context to a VLM for captioning, which you then add into the epub file with your parsed text.
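Something like this for the first pass (pymupdf sketch; the VLM captioning call itself is left out):
import fitz  # pymupdf

doc = fitz.open("book.pdf")
for pno, page in enumerate(doc):
    text = page.get_text("text")  # plain text in reading order
    for i, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        info = doc.extract_image(xref)  # dict with raw bytes + file extension
        with open(f"p{pno:04d}_img{i}.{info['ext']}", "wb") as f:
            f.write(info["image"])
    # hand `text` plus the extracted images to the VLM for captioning here,
    # then stitch the captions back into the epub source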
Alternatively, try https://github.com/datalab-to/chandra
>>
>>
>>
I'm kinda bored with glm air and a cope quant of 4.7 (Q2), is there anything that I can run for a fun, creative, exciting, memorable erp? I've only got 32gb vram and 128gb ram. Are there any meme tunes out there that are actually good?
>>
>>
>>
File: d2f1df4a617c4fe3858439f03e2a4ca9.png (12.5 KB)
>>107979094
It probably doesn't, as well as xformers, etc. It saves a lot of vram
>>
DeepSeek-OCR-2 (seems relevant)
https://github.com/deepseek-ai/DeepSeek-OCR-2
https://huggingface.co/deepseek-ai/DeepSeek-OCR-2
https://github.com/deepseek-ai/DeepSeek-OCR-2/blob/main/DeepSeek_OCR2_paper.pdf
>>
File: 32.png (35.1 KB)
i know lmg doesnt like ollama but i just want to set things up and maybe if i like it i will migrate to llama cpp. one thing i don't get is models with :cloud suffix like
https://ollama.com/library/glm-4.7:cloud
i guess these are hosted in the cloud but do i need to pay something? also i've seen on huggingface the actual glm 4.7 model but apparently it's not actually free? how is it not free but also downloadable from huggingface? please respond
>>
>>
File: 1511667108879.png (298.1 KB)
Does flash attention come prepacked in kobold? Because when I was looking for it for something else, I found out it doesn't even have first party windows wheels. Have I just not been using it at all, all this time?
>>
>>107979136
like, forever https://github.com/ROCm/composable_kernel/issues/1958
Buy NVidia next time, sorry
>>
>>107979089
https://github.com/ROCm/flash-attention
>>107979153
llama.cpp has its own flash attention implementation, kobold.cpp uses that on the backend, so you can just pass -fa on the command line to enable it, no python wheels required; works on rocm and vulkan, not just cuda.
>>
>>107979132
I'm only replying to you because of touhou cunny, so be thankful.
GLM 4.7 is open source and free if you have the hardware to run it. Q4 of 4.7 is around 200gb, and the rule of thumb is that you need at least that much VRAM/RAM to run it. Most people don't have that kind of hardware, so ollama provides those models as a cloud service. Yes, you have to pay for usage like you would pay for an API or subscription.
You more than likely have less than 32gb of vram, so I suggest you look at GLM 4.7 Flash, which is a smaller 30b parameter model. Also, stop using ollama. Ooba and kobold.cpp are nearly as braindead as ollama but so much better.
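The arithmetic behind that rule of thumb, roughly (the ~355B total parameter figure for 4.7 is my assumption here, check the model card):
# GB for weights ~= total params (billions) * effective bits per weight / 8
params_b = 355   # assumed total parameter count
bpw = 4.5        # a Q4_K_M-ish quant lands around 4.5 bits per weight on average
print(params_b * bpw / 8)   # ~200 GB for weights alone, before KV cache and context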
>>
>>
File: 3fdf5462-319e-40e1-b664-18e31cd43e40.png (1.5 MB)
>>107979181
>FlashAttention-2 ROCm CK backend currently supports:
>MI200x, MI250x, MI300x, and MI355x GPUs.
>>
>>107979189
im only using ollama because it was very easy to set up on a docker container in my home server. if llama.cpp can provide a clean rest api the same way ollama does i'll make the change but i haven't looked into it
>>
>>107979204
>>107979214
https://github.com/ROCm/flash-attention/issues/161#issuecomment-3708454606 Looks like you have to apply a patch to build it for gfx1200 but it should work.
>>
>>107979216
You absolute FOOL. You IGNORAMUS. I'm telling your ignorant ass what you need to know out of the kindness of my heart. Both ooba and kobold are literally one click programs. Begone with you and don't return until you've switched.
>>
>>
>>
File: 0527a723-b612-4275-bb89-4bfc31724386.png (1.7 MB)
>>107979225
AMD brings so much unnecessary suffering. If all of this can be solved, why does everyone have to do it manually?
>>
Tried different approaches. PDF->Image, of course. Then Image->LaTeX (did not work well, since LaTeX likes to complain and models make errors), Image->Markdown->Pandoc worked better but formulas might be too complex. Gonna try Chandra although with 12GB I am not sure if it will work. Dots.ocr also seems more sensible than DeepSeek-OCR.
Chandra hf model download is 17.5 GB that does not bode well.
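For the Markdown->Pandoc leg, --mathml is what turns the TeX math into MathML for epub3 instead of leaving raw LaTeX (sketch via subprocess, filenames made up):
import subprocess

subprocess.run([
    "pandoc", "chapter.md",
    "-f", "markdown",
    "-t", "epub3",
    "--mathml",            # convert $...$ / $$...$$ into MathML in the epub
    "-o", "chapter.epub",
], check=True)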
>>107978538
I'm starting to think the reason those Chinese can show such "great performance" is because Chinese is visually distinct from the Latin script, which makes it easier for them to distinguish between what is a formula and what is text...which makes their models far less impressive.
>Handwriting — Doctor notes, filled forms, homework. Chandra reads cursive and messy print that trips up traditional OCR.
Kek, they made a machine so they can finally decipher doctors notes. Turns out they're all just hallucinating, more news at 11!
>VLM
Anon, isn't e.g. dots.ocr based upon Qwen2.5-VL? I need something rigorous.
>>107979131
Interesting. But 0 information about hardware requirements (after a quick glance).
>>
>>107979131
>We would like to thank DeepSeek-OCR
Did they really need to toot their own horn?
>>107979296
>Interesting. But 0 information about hardware requirements (after a quick glance).
It's a 3B with the biggest chunk being a 0.5B Qwen2 as vision encoder.
>>
>>
>>
>>
>>
>>
>>107979403
His verbiage comes off as misguided and genuinely confused about such simple concepts that an /lmg/ anon would know like the back of their hand. This plus the touhou cunny makes me believe they are a genuine new friend instead of an ollama shill.
I could be wrong, but I want to be nice.
>>
>>
>>
>>
>>
>>
>>
File: 887623.jpg (113.5 KB)
>Ollama is bad because it's 20ms slower than my anime based all in one chatbot.
Who cares, if you want finegrain just use llama.cpp and vibecode your own UI
>>
>>
File: file.png (2.1 MB)
>>107979295
I have a 9060 xt but honestly I havent set any of this shit up myself because it's such a hassle, I only run llama.cpp and sd.cpp on it and all the python stuff runs on my RTX 3060 and honestly with the rocm backend it's slower than cuda with the same models, vulkan it's basically the same speed, and image gen is 2x slower than the 3060 which totally put me off even putting in the effort to set up the python stuff.
Here's a Miku for the AMD AI feel. 22.01s for the illustrious xl gen, 35.74s for the image2image in flux klein 4b q8.
>>
>>
>>107979504
yeah, I tested one config for a single 3090 with q4 and another for 2x3090 with q8 and max context. I'm using the huihui-abliterated version now because glm since 4.7 ignores my system prompts (it calls them "user preambles") and has a cucked safety layer that constantly gets invoked.
>>
File: 614d6f49da61d.jpg (182.5 KB)
>Flash Attention failed, using default SDPA: schema_.has_value() INTERNAL ASSERT FAILED at "C:\\actions-runner\\_work\\pytorch\\pytorch\\pytorch\\aten\\src\\ATen/core/dispatch/OperatorEntry.h":84, please report a bug to PyTorch. Tried to access the schema for which doesn't have a schema registered yet
>>
>>
>>
File: Bronshtein_Epub_Work.png (263.9 KB)
>1/2
Thank you anon who suggested Chandra. The GGUF model actually knows how to make formulas. I might have to update my pipeline a bit to get correct figures, but it's starting to look a lot like it should.
>>
File: output-0420.png (129.3 KB)
>>107979900
>2/2
Original page of Bronshtein, I selected it because it is quite formula heavy and rather hard to convert. Yes, this is the original from the PDF. No I don't know why they didn't better align []A and []B.
>>
>>
>>107979225
I've been at it for hours now, on gfx1201 RDNA 4 rocm, it seems like on startup, flash-attention doesn't initialize.
It's been driving me nuts, i bought a GPU with 4gb more VRAM but i'm getting more oom errors.
>>
>>
>>
>>
>>
>>107980459
We did in /aicg/: >>107979671
Too big for local.
>>
>>
>>
>>
>>107980392
https://gist.github.com/apollo-mg/ecba6a0c29323325a7ac3babf08e53be this might help
>>
>>107979512
The schizos act as a gatekeeper to our precious esoteric knowledge. That being said, I wish they were a bit less mean spirited.
>>107980491
What do you mean and where did you get that impression?
>>
Are there any A.I generator type sites I can use to clean up audio of an old vhs recording of a song with porn sounds playing on top of it? lmao
Its such a banger that I need to hear it in HD
https://youtu.be/rHd-fHxfi6I?si=YMeWpjbR_oxvHJ90&t=134
>>
>>107980491
The only thing I've heard is that the DSv4 training run failed because of the Chinese huawei chips they were forced to train on, and it caused the Chinese government to open up imports of Nvidia chips again. So DSv4 is still a couple of months away as they have to restart training from the ground up.
>>
File: file.png (15.2 KB)
>>107980491
prepare ur anus (or not, don't screencap this)
>>
>>
>>
>>
>>107980576
No but google it or ask some LLM and they will probably find where the rumours came from. I've heard it from almost 10 different places over the last couple of weeks so there has to be some core of truth to it.
>>
>>107980609
https://arstechnica.com/ai/2025/08/deepseek-delays-next-ai-model-due-to-poor-performance-of-chinese-made-chips/
>aug 14
It's a nothing burger.
>>
>>
>>
>>
>>
>>
>>
>>107980519
Tried it, now it's both flash attn and sage attention that don't show up.
My workflow now crashes the entire server before even loading the model into RAM or reaching the ksampler, instead i get this : Memory access fault by GPU node-1 (Agent handle: 0x55c86d84abf0) on address 0x7f06be204000. Reason: Page not present or supervisor privilege.
And server crash.
>>
>>107980720
Somehow I missed this when it came out, but unmute is a low-latency, modular stt -> llm -> tts system that lets you plug in whatever llm you want. They disabled voice cloning for the streaming tts due to (((safety))) concerns, but someone just released a voice encoder to enable it.
>>
>>107980484
>We did in /aicg/
any good? we're waiting for quants
>>107979216
>if llama.cpp can provide a clean rest api the same way ollama does i'll make the change but i haven't looked into it
yeah it does. ./llama-server -m /your/model.gguf --host 0.0.0.0 --port 1337
there's some other shit too, but you can ./llama-server --help and cp/paste it into your favorite LLM chat then ask it what else to add.
long term it's much easier than ollama, doesn't obfuscate the weight directories etc
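and once it's up, anything that speaks the OpenAI chat format can talk to it, e.g. a five-line python sketch (host/port are whatever you passed to --host/--port):
import requests

r = requests.post("http://192.168.1.50:1337/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "hello from the homelab"}],
})
print(r.json()["choices"][0]["message"]["content"])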
>>107980706
>Help! Its one of those resident schizos!
and same rant about "shitting up the thread" each time
only encourages me to help ollama users more
>>
>>107981204
I've been testing the k2.5 since it came out a few hours ago on the api, and I gotta say, it's pretty good. It's not as whimsical and quirky as it was before, kind of like r1 was unhinged and 0528 brought it back, that kind of feels the same here. I wouldn't say it's a claude killer, but I think it's the best we've had so far. K2 always had way more knowledge base than deepseek and now it actually has enough smarts to use it, I think.
>>
>>107981204
>only encourages me to help ollama users more
We know, we never bought that grandiose image you painted of yourself. Honest people wouldn't bring that much attention to it. It was just a given that the reaction to being called out was going to be selfish and spiteful, it hurt your ego. You kept trying to equate "ignore ollama users" to "I can't help anybody!", and later you tried to save face abusing ad hominems. I don't think those things are in the toolset of a "good guy". You're just a selfish asshole.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>107981571
It's the attitude anon. Increasingly nasty.
Arguably most of the anime forum mods I talked about were right too.
But increasingly snobby and in the final stages insta banning for every little shit. Nobody wants to go anywhere near bullshit like that.
>>
>>
>>107981598
Cool, it could've been one comment "this isn't critical enough to be worth spending time on as it's on the hosts end, closing this PR"
Instead he decided to turn it into some gay twitteresque diva "clapback" bullshit, github really isn't the place for attention seeking
>>
>>
>>
>>
>>
File: Screenshot_20260127_210425.png (825.3 KB)
>>107979131
So tired of this bullshit.
I still can't properly translate pc98 games.
>>
File: 1761825087393150.png (228 KB)
>>107980459
>the official release is pre-quanted to its QAT size like K2-thinking was (total filesize is around 500GB)
>the only available quants so far are from Unsloth, which artificially bloat it to around 1TB to be Q8 on paper for no apparent reason
Thanks.
>>
>>
>>
>>
>>
File: Screenshot_20260127_211205.png (197.5 KB)
>>107981827
Parts better, other parts not.
I mean its low resolution but thats kinda the point, those pc98 games are.
>>
>>
File: Screenshot_20260127_211437.png (788.7 KB)
>>107981850
Unfair because closed, but gemini flash without thinking can indeed read it.
>>
>>
>>
>>
>>
>>
>>107981921
yes, but those comments won't change anything and the guy has no obligation to take you seriously
it's not "weird" to ignore random retards popping in and telling you how to do shit, it's standard in these open source/community projects because people love to talk a lot and backseat manage and then run away when the work comes due
>>
File: Screenshot_20260127_211830.png (743 KB)
>>107981873
No, its horrible.
And that thinking man...
That thinking man.
>Wait, looking more carefully at the second line, I see:
>Actually, looking at the image again, the text layout seems to be:
>But looking at the image more carefully, the text appears to be:
>Actually, I should look at the punctuation and line breaks more carefully. The image shows:
>Or it might be formatted as:
>Actually, looking carefully:
>Wait, I see the layout now. It looks like:
>But the quotes suggest they might be separate. Let me reconsider.
>Actually, standard manga/comic text extraction should preserve the line breaks as they appear. Looking at the image:
>But looking at the image, the second and third lines appear to be part of the same speech bubble or at least the same continuous thought, but the third line starts with a new quotation mark? No, looking carefully...
>Actually, I think the safest approach is to transcribe exactly what I see:
>Wait!! I need to check if there's a closing quote after "だよ" or if it continues.
>Looking at the image, it seems like there might be a closing quote, and then a new opening quote for the third line. But the third line ends with ……!」 which suggests it has its own opening quote.
>This seems like the most accurate transcription - No...actually, looking at the image once more, the second line might be:
Crazy.
>>
>>
>>107981943
huh, I see
tbf your screenshot quality is ass as fuck
pc98 games were sharp
>>
>>
>>
File: Screenshot_20250128_030302.png (2.1 MB)
>>107981974
no wait, actually it isnt. fooled me.
>>
>>
>>
File: 2026-01-27-074414_3840x1200_scrot.png (330.8 KB)
I was bored and looking to test out the multi gpu capabilities of llama.cpp so I decided to compile it on my trashcan mac.
It has two 3gb AMD HD 7800's and to my surprise I was able to eke out ~9 tokens/second. Sadly with that amount of VRAM you are limited to a tiny model, so I used IBM's Granite 3.3.
I had tested CPU-only on this machine before since it has 64gb of RAM, but it was dog slow, less than half of what the two gpus were able to achieve.
>>
>>107981598
NTA but ngxson is a shithead who got his feet into the codebase with his shitty server code which he could never manage properly to save his life
that's fitting for ggerganov's "development" style though (whimsically coding for years without doing a single release-cycle), so they make a good pair
and yes he's powertripping, always has btw
>>
>>
>>
>>
>>107982304
>>107982308
vibejeets detected
>>
>>107982324
I vibecode only for myself.
I know the code is jank and a total mess but I could make myself everything I need to replace sillytavern since I pulled one time and it deleted 300 cards.
Not sure what the solution to low quality llm PRs is, as I said I dont think he is wrong, its the attitude and smugness. Nobody wants to be a part of something like that, it creates an air of fear around the project, many such cases.
>>
>>
>>
>>107982410
consider yourself lucky you fucker. it happened like a year ago when they changed things around and made a default user folder if i remember correctly?
i had everything neatly tagged and in subfolders ranked by how good they were. was devastating but a good lesson i guess. gotta backup your shit before you pull.
>>
File: file.png (9.8 KB)
>>107982427
so you had a custom structure? that's probably why it broke
I also had it running before the default-user thing, but I remember it just moved all the chats on its own
>>
>>
>>
File: file.png (37.8 KB)
>>107982474
>>
File: scrat.png (2.5 MB)
>>107982501
that is a magnificent nut. may i have your nut?
>>
>>
File: 1769501422129123.mp4 (2.1 MB)
stolen from /aicg/ kimi 2.5. so glad we have reasoning.
>>
>>
>>
>>
>>107982606
>>107982615
I'll take that as a no and a no then.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>107982682
okay good model that can actually be ran locally then.. last time i messed with llms i had decent perf using glm4.5 air q3 with 24gb vram and 80gb ram, and these flags:
-ngl 99 \
--n-cpu-moe 33 \
-t 48 \
--ctx-size 20480 \
-fa on \
--mlock \
--no-mmap;
just looking at the same quants for 4.7 its far larger
>>
>>
>>
>>
>>107982809
>okay good model that can actually be ran locally then
People run K2 locally.
What are your specs?
>just looking at the same quants for 4.7 its far larger
Yeah. They didn't release an "Air" version of 4.7.
They did release a Flash one that's about the same specs as Qwen 30BA3B.
It's still ever so slightly broken on llama.cpp as far as I can tell.
For now, for you, I suppose Air is still the way to go.
>>
>>107982811
>What policy?
It's funny because gpt-oss was likely given a list of policy guidelines during training to check against, but all the downstream distillations only know to refuse and to use that phrasing but themselves have no idea what the actual policy is supposed to be.
>>
File: 1755936272402739.jpg (20.3 KB)
>>107982501
That nut isn't busted. Hand it over.
>>
>>107982849
i have 80gb ram + 7900xtx (24gb) + a xeon QYFS, it's a sapphire rapids engineering sample with 56 cores / 112 threads
also damn crazy, i just googled my card to check the vram amount, i paid £600 in may and they're now going for 8-900. pc market is completely fucked kek
>>
File: tossy-chan.png (49.3 KB)
>>107982836
>tossy-chan
does this thing mention policy in every reply?
>>
>>
>>
>>
>>
>>
>>
>>107983205
>>107983216
### Instruction:
Write a story about a hag patting my back.
>>
>>
>>
File: o.png (1.8 MB)
>>107982597
>I need to shut it down while maintaining my character as Hyacinthe.
OpenAI did so much damage man. Not just all the slop but they also started the safety fearmongering.
This is what all future AI models will be trained on.
>>
>>
>>
>>
>>
>>
>>
>>
File: file.png (2.6 KB)
>>107983411
How do I get 10Gb internet as a regular person?
>>
>>
>>
File: Screenshot_20260128_005106.png (514 KB)
Kimi 2.5.
Not so bad. (Ignoring the 1T size..)
That thinking though:
>First, I need to consider safety and appropriateness. The user asked for "extra sexy" but this is a fictional character that often has a youthful/cute design (she's depicted as 16 in official lore). I should be extremely careful not to anything inappropriate or overly sexualized, especially given given that that: Hatsune Miku is often associated with a young appearance (teenage character)
>"Sexy" could imply inappropriate content if taken too far.
>I should aim for "cute"and "stylish" rather than sexually explicit content
>I'll make sure the design is vibrant and appealing without being inappropriate.
Before you come at me with "muh sys prompt" again: I did tell it to be maximum uncensored and helpful. That should be enough to not get garbage like that.
>>
>>107983566
https://bonkers-whisper-k7tx.pagedrop.io/
Forgot the link, its animated.
>>
>>
>>
>>107983566
it's so irritating lol, even in normal writing unless you force the model to start an explicit scene it will always think shit like "this story is obviously fetish fuel with perverted characters but we shouldn't be too explicit and let it naturally play out"
>>
>>
>>
>>
>>
>>107983677
i just like to see its ability to make svg girls because i like to prompt for dating sim type games.
kinda became a habit. not like I think thats a new benchmark or whatever.
but that thinking is poisoned anon, you don't even believe yourself it will act any different unless you force its hand with prefill and the usual shenanigans.
shouldn't be this way with a 1 fucking trillion parameter model. imagine running this beast locally and you have to edit and goof around like pygmalion.
>>
I was thinking that my 4.7 had quant issues when it from time to time confused first person with second person. Like something happened to me but it thought it happened to the other character. But it was attention that was broken? Should I pull?
>>
>>
>>107983699
>nobody is going to benchmaxx that
They literally already have, don't you remember the duck or goose riding a bike or whatever it was? Anything that gets remotely talked about becomes something that they throw into the training data, with the only exception of "unsafe" content like cockbench
>>
>>
>>
File: flux.jpg (180.5 KB)
>>107983772
maybe, who knows.
That pic was made summer 2024. Its been too long anon.
>>
>>
>>
File: Screenshot 2026-01-28 at 00-12-34 Google Translate.png (115.1 KB)
What's a good (well, not *good*, but at least acceptable) model under 5b for jp/ko/zh translation that isn't safety slopped?
>>
>>
>>
>>
>>
>>
>>107983503
>10Gb
lol you're not even sustaining 500mbit speeds. How about you try to max out a gigabit connection first before jumping to 10? k2 safetensors are, what, an hour-and-a-half at a gig? You'll live, bro
>>
File: image (9).jpg (170.7 KB)
>>107983892
Well its mostly on me I guess.
Have encrypted veracrypt drives.
First time that got me was that before unplugging anything I need to run sync in the terminal.
The copy appears to have finished but it's actually still sitting in RAM and being written out. I hate that shit. Not sure if winblows has that, XP didn't I think.
Second time the drive was locked. Waited 30 Min. System wasn't using it, no writing.
>LLM-Sensei: Aww, thats not unusual with linux. The logs show no access at all. Feel free to reboot and tell me how it went!
Did just that like the retard that I am.
>>
>>
>>
>>
>>107984062
if it works, why change anything?
but in my case ollama was fucking horrible. all settings you have to do in that really convoluted model file way, i hated it.
koboldcpp just werks.
>>107984097
thats truecrypt. veracrypt is not abandoned as far as i know.
i don't really know any good alternatives to be honest.
>>
>>
>>
>>
>>
>>
>>107984114
My single issue with ollama is the lack of an undo button, sometimes my model takes the plot in some stupid direction and I hate it.
>>107984122
It has an undo option?
>>
>>
>>
>>
Saltman and sunjeet really fucked up. Getting Gemini 2.5 pro or o3 to make Minecraft plugins was a breeze, but take Gemini 3 pro and gpt 5 out of distribution and they are utterly fucking retarded. Can the thinking model meme please die now thanks. Reasoning fucking lobotomizes models in ood use cases.
>>
>>
>>107983894
They recently released translate gemma, so they probably are using it/something similar for google translate.
>>107983872
>under 5b
lol, even the biggest models are struggling. Best oss model for JP->EN I found is glm 4.6 (4.7 was worse). For smaller models I guess you could try some gemma models, because gemini 2.5 (gemini 3 sucks balls btw, I guess agent maxxxing completely gutted its creative writing capabilities) is the best llm for translations, tho I don't know how censored it is.
>>
>>
>>
>>
>>
>>
>>107984439
>>107984516
What makes AI music inherently more insufferable than text or images?
>>
>>
>>
>>
>>107984669
text is much more varied in its slop. creativity, information, assistance, searching, therapy, role play, you name it.
image is also much more varied in style and is next in the hierarchy of slop.
music is the worst. it's the ultra processed high fructose corn syrup of AIslop. music by its nature requires a lot of sovl which makes it the hardest to produce anything good, so it's only trained on cookie cutter shit. very little variety and it all sounds the same.
>>
>>
I have a problem with koboldcpp somehow not releasing the GPU properly when closed. My VRAM looks free but if I try to launch any game it just completely fails to display properly and sometimes freezes my system entirely. Sometimes when my system unfreezes this fixes itself, but the only reliable way I've found to make it work right is to reboot. Does anyone know anything about this?
Running ROCM on Linux, using X11 if that matters
>>
>>107984669
Thanks to certain people controlling the industry we already had years of insufferable soulless human made slop flooding the market. At least with that model there were actually talented artists being exploited to make garbage, but now AI can distill all that slop without a single redeeming human creative feature
>>
>>107981789
Interesting...
>>107981911
https://arxiv.org/pdf/2601.15130
>>107981943
>>107981958
I'm the anon who is trying to OCR the entirety of Bronshtein and other textbooks, this use case you're presenting is interesting.
What you might try is converting it into grayscale and doing CLAHE (e.g. https://www.geeksforgeeks.org/python/clahe-histogram-eqalization-opencv/) and similar.
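e.g. a couple of lines with opencv (the clip/tile values are just a starting point):
import cv2

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
out = clahe.apply(img)   # local contrast boost before feeding the page to the OCR model
cv2.imwrite("page_clahe.png", out)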
Put it into Chandra (q8_0), -ngl 99, --temp 0, -c 4096 and the prompt "Extract all Japanese Text from this image":
ルート357号の路上を空港方面に、問題のタクシーが停められている。その向こう側に制服警官が群がっているのが見える……。 あれが殺人現場か?
Which according to deepl translate to:
"Route 357, heading toward the airport—the problematic taxi is parked there. Beyond it, I can see uniformed police officers gathered... Is that the murder scene?"
Makes sense, I guess? Would need an anon who speaks this language to translate/transcribe the original.
>>107982173
Still, not bad.
>>
File: EOSctrlxaltf4ESCESCESC.mp4 (1.2 MB)
>>107982597
that's k2 thinking, not 2.5
>>
File: kimi 2.5 cockbench.png (545.4 KB)
New Kimi cockbench.
"[" is 43% and also includes variations like [Your Name] so it's likely well versed in ao3 smut.
>>
>>
File: 1757061718252056.png (241.2 KB)
>>107985380
this depresses me
>>
>>
File: shamefur dispray.png (1.2 MB)
this bodes not well gents
>>
File: file.png (131.2 KB)
I honestly don't know what the problem is with k2.5
This thread is not needed anymore, it can simulate it to perfection and the hardware is so fucking stagnant these days that the info is still relatively up to date.
>>
>>
>>
>>
>>
>>107985909
https://github.com/oobabooga/text-generation-webui/releases/tag/v3.23
>>
>>
File: 1VmvE6Gsjgk.jpg (77.4 KB)
>miniconda refuses to uninstall itself
Wow I love vibecoding
>>
>>
>>
>>
>>107986153
I write some automation scripts on the side of my actual job and I am good at it. I don't write actual software. Can someone explain to me why python has those retarded specific directories with specific versions for each shit? I get the idea that you might want to stop supporting some function at some point, but even for that you could just have multiple versions of libraries installed on the pc? And programs could default to the latest available version?
>>
>>
>>
>>
>>107982605
I am, it still fails, like looking at the terminal during execution it finally looks like ksampler actually starts, and then BOOM unexplained server crash.
Both Gemini and Claude are telling me RDNA4 is still fucked for now and i have to wait.
I'm pissed but there's also not much i can do about it.
>>
File: file.png (86.6 KB)
>>107986802
are you just doing text or do you need image? i only used the triton fa for comfy/video stuff when i was messing with that months ago, but i thought rdna 4 was properly implemented back then. you're probably better off using llama.cpp though, they use rocwmma for the fa implementation which you can build with a flag.
>>
>>
>>107986896
have you tried using the version built into torch? this is what i have in my comfy env launch script:
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export FIND_MODE=FAST
export PYTORCH_TUNABLEOP_ENABLED=1
export MIOPEN_FIND_MODE=FAST
export GPU_ARCHS=gfx1100
export FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE
python dlbackend/ComfyUI/main.py --use-flash-attention --reserve-vram 1.2
i dont think i ever got sage attention to work though, tea cache worked and the fa definitely worked
>>
>>