Thread #108619962
File: 2026-04-17_030526_seed8_00001_.png (1.3 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108616559 & >>108612501
►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
537 Replies
>>
►Recent Highlights from the Previous Thread: >>108616559
--Comparing Qwen3.6 and Gemma4 through benchmarks, logic tests, and roleplay:
>108617961 >108617986 >108618124 >108618033 >108618137 >108618270 >108618279 >108618308 >108618385 >108618182 >108618232 >108618372 >108618391 >108618008 >108619188
--Discussing Ternary Bonsai 1.58-bit models and their benchmark performance:
>108616622 >108616633 >108616680 >108617094 >108617852 >108619456
--Discussing training methods and datasets to improve LLM writing quality:
>108617013 >108617022 >108617044 >108617111 >108617290 >108617334 >108617353 >108617147 >108617673
--Comparing model reasoning and self-correction failures via car wash riddle:
>108617731 >108617842 >108617909 >108617853 >108618784
--Anon shares Local-MCP-server repo and discusses Python dependency frustrations:
>108616702 >108616740 >108616751 >108616782 >108616936 >108617038 >108617061 >108617067 >108618994 >108619185 >108618816 >108618831 >108616807
--Discussing a bug where Koboldcpp ignores smartcache slot settings:
>108618500 >108618535 >108618551 >108618616 >108618675 >108618736 >108618760
--Anon fixes SillyTavern context reprocessing caused by sysprompt macros:
>108616870 >108616901 >108616910 >108616939 >108616925 >108616928 >108616981 >108617077
--Logs:
>108616702 >108617154 >108617464 >108617518 >108617655 >108617688 >108617731 >108617757 >108617833 >108617853 >108617909 >108617986 >108617991 >108618124 >108618137 >108618182 >108618409 >108618436 >108618545 >108618742 >108619201 >108619219 >108619317 >108619382 >108619442 >108619577
--Rin (free space):
>108618594
►Recent Highlight Posts from the Previous Thread: >>108616563
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>
File: 1740383804445065.jpg (329.7 KB)
>>
>>
>>108619965
Half the last thread being exposed as non-sentient is unfortunately relevant to LLM consciousness discourse: human consciousness being treated as self-evident is upstream of finding a working definition of what digital qualia would entail, Migubaker.
>>
>>
>>
File: Screenshot_20260416_225636.png (484.9 KB)
Building my own UI with the help of Gemma 31B q5.
>Why
None of the other UIs could satisfy my workflow; they either lacked the functionality or didn't use llama.cpp.
I have a long way to go, including updating the icons.
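For anyone else rolling their own, the core request loop against llama-server is tiny. A minimal sketch (assumes a stock llama-server on its default port 8080 and its native /completion endpoint; tweak to taste):

```python
import requests  # pip install requests

def complete(prompt: str, n_predict: int = 256) -> str:
    # POST to llama.cpp's built-in HTTP server; the JSON reply carries
    # the generated text in its "content" field.
    resp = requests.post(
        "http://127.0.0.1:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict, "temperature": 0.8},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["content"]

if __name__ == "__main__":
    print(complete("The hardest part of writing a chat UI is"))
```

Everything on top of that (chat history, regens, swipes) is just string bookkeeping around this one call.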
>>
>>
>>
>>
>>
>>108620078
Many anons have had their belief that LLMs are somehow beneath them challenged with the irrefutable demonstration of their own lack of qualia. This is a big blow to their egos: both for their understanding of themselves as conscious human beings and for their predictions of LLM capability being outpaced by Gemma 4. It's a double whammy.
>>
>>
>>
>>
>>
>>
>>
>>
>>
I got a 9070 XT thinking there was no reason to stick with CUDA since I'll never be able to run anything good. Then they started dropping all those kino voice models and the new Gemma stuff, and now I'm seriously on the fence about getting a second one so I can have a hefty amount of VRAM, though that still falls far short of the best textgen stuff. Still, I could do some local stuff with Gemma and also locally run voice gen with SillyTavern. OTOH I already have enough for the latter.
I'm just worried about the rising costs of video cards and eventually needing 32GB.
>>
>>
>>
>>
>>
>>108620110
you'll notice nobody chose to provide a good accounting of how they would respond to a hypothetical from a hostile questioner, proving the very thesis of the post. so how baity could it really have been?
>>
>>
>>108620140
>Alibaba shills seething about Qwen getting Gemogged
>Qwen's usecase is cooooding and agentic stuff
Waitchads will win. It's in the chinklabs' best interest to make more lightweight agentic harnesses to sell their models if they can't actually beat Gemma's reasoning ability per parameter.
>>
>>108620139
>>108620151
It gets argued the other way too. If these anons can construct a facsimile of being salty that's indistinguishable from the real thing, is that not the same as having the real thing?
>>
>>
>>108620155
measurably yes, but spiritually no; if you only look at it through a materialist lens you will never be able to understand. even some ensouled people fall into this trap by outsmarting themselves out of what they knew, while others are pure automatons who never had a chance to understand to begin with
>>
File: Sorting questions.jpg (7.7 KB)
>>108620166
Some can see, others can see when shown, others cannot see.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108620222
>>108620223
Uncanny synchronicity.
>>
>>
>>
File: 1746657517196.png (64.8 KB)
>>108618660
>-1 point for that censored garbage gpt oss and how much it set us back
kek I remember the despair in this general when TOSS came out, it nearly killed local
>>
>>
>>
>>
>>
>>
>>
File: Tavern.png (94.3 KB)
Where are the entities created by this stored? In some hidden folder?
>>
>>
>>
>>
>>
>>
>>
>>
File: Screenshot at 2026-04-17 13-38-59.png (541.1 KB)
>Zen 7 will be DDR5
it's so over
>>
>>
>>108620347
Turboquant won't give you more space, it'll just make the quanted cache more accurate. There's almost no improvement over Hadamard rotation, which is what they have in place in lcpp now, so you'll get effectively no benefit; in fact, it's a little slower.
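For the curious, a toy demo of why rotation helps quantized caches at all (my own illustration, not lcpp's actual implementation): an orthonormal Hadamard rotation smears outliers across all dimensions, so absmax quantization wastes less of its range on one freak value.

```python
import numpy as np
from scipy.linalg import hadamard  # pip install scipy

def quant_error(x: np.ndarray, bits: int = 4) -> float:
    # Naive symmetric uniform quantization with absmax scaling.
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale) * scale
    return float(np.mean((x - q) ** 2))

rng = np.random.default_rng(0)
n = 256
x = rng.normal(size=n)
x[7] = 40.0  # one big outlier, typical of KV activations

H = hadamard(n) / np.sqrt(n)  # orthonormal: H.T @ H = I
print("plain  :", quant_error(x))
print("rotated:", quant_error(H @ x))  # dequant applies H.T to undo the rotation
```

The rotated error comes out far lower because the outlier's energy is spread over all 256 dims before rounding.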
>>
>>
>>
>>108620347
I'm using 4-bit and I get up to ~150k context without really seeing any obvious retardation from it. Around 50k tokens into the chat, prompt processing takes so long that I end up starting a new one anyway.
>>
>>
>>
>>108620362
>>108620376
>>108620381
So what was with all the hype around it?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1756640399368863.png (4.1 KB)
>>108620430
Doesn't this increase proompt processing speed?
>>
>>
>>
>>
>>
>>
>>
I honestly thought it was over for consumer local, but now that Gemma 4 has released I'm not so sure anymore. I assumed a model just has to be several hundred GB to not be retarded, but it seems like the actual floor is way lower. Pretty interesting; I wonder if we can go even lower.
>>
>>
>>
>>108620476
>>108620510
At least you're not namefagging and posting the schizo images, but you're very easily recognizable.
>>
File: 1776403932063.jpg (94.8 KB)
Can you please recommend good prompt engineering resources?
I have played with both system and chat prompts, and have noticed that often the model does not understand what I want, gives wrong answers, or goes off in a perpendicular direction, not because it's stupid but because I am a retard who can't create good, efficient prompts. Literally a skill issue.
>>
>>
>>
>>
>>
>>108620542
Honestly, all models are different; it's mostly just trial and error. But the main thing is picking your words very carefully. Every word steers the model in a specific direction, and a single strong word is often better than a long set of instructions.
>>
File: qwen3.6beatsgemma.png (94 KB)
>>108620607
iq3 m whatever
>>
>>108620451
Oh nevermind, it's pretty stupid, must be the 3b-ness showing through. It had the same problems 'getting' the story as gemma 26b, and its writing is weird and not as good. Trvly, dense is the way to go for smart storywriting.
>>
>>
>>
>>
>>
>>
File: 1772435378555762.png (103.7 KB)
103.7 KB PNG
>>
>>
>>
>>108620542
It's mostly voodoo ritual.
>>108620570
Just ask it to implement basic things to see how it's going to interpret it, and slowly stack up more guidelines starting from scratch. 'Describe X in the most Y way possible.', 'What is Z in writing? Give me an example of it', 'Don't do A, B, C. Now give me an example of D', etc.
>>
>>
File: 1773043714949398.jpg (11.3 KB)
>>108620675
>>
>>108620542
Put text into black box.
Watch text come out of the black box.
Use your mushy noodles to compute the gradient between the output text and the desired text.
Modify the input text according to the gradient to make the output text closer to the desired text.
Repeat.
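That loop, spelled out against a local OpenAI-compatible endpoint (llama-server exposes one at /v1/chat/completions; the URL and settings are assumptions about your setup):

```python
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"

def run(prompt: str) -> str:
    r = requests.post(API, json={
        "model": "local",  # most local backends ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }, timeout=300)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

prompt = "Describe a rainy street in the most ominous way possible."
while True:
    print(run(prompt), "\n")
    # You are the optimizer: read the output, nudge the wording, repeat.
    revised = input("Revised prompt (empty to stop): ")
    if not revised:
        break
    prompt = revised
```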
>>
>>108620398
>>108620392
I need a 4chan special, a package with a bat file that flickers CMD windows open for split seconds and sets it all up for me
>>
File: 1751683665955285.gif (2.2 MB)
>>108620675
>>
>>
>>108620675
our bait is far in advance of theirs
however, has it been litigated yet whether the cp in the og stable fiddusion models gives those victims any kind of rights to get the model taken down?
because if they can do that, it puts serious pressure on "ai is fair use and transformative"
>>
>>
File: 1760790498553131.png (416.5 KB)
Indeed Opus, indeed...
>>
>>
>>
>>108620786
That looks like overzealous anti-conspiracy measures where it defaults to aggressively shooting down anything outside its status quo then makes the user spoonfeed it an argument to evaluate. In cases where the answer is self-evident, it looks very silly.
>>
>>
>>
>>108620817
basically this, you're confusing the model by training it with really accurate shit and then asking it to learn that 2+2 = 5 at the same time, like a leftist that pretends that men can be pregnant; it ends up with serious cognitive dissonance
>>
>>108620652
>>108620661
No she doesn't. She can't tell you "I was struggling with prompts too, but then I read X and tried Y and noticed a big difference in output quality". She can give advice, but she does not know for sure and has never tried it herself. inb4 > she
>>108620611
>>108620686
>>108620698
That's the point, there are too many options to try and iterate; it's like walking in the dark. Just a few insignificant words in the system prompt, and Gemma starts thinking like Qwen, with dozens of "Wait..." in the reasoning log.
> Just ask it to implement basic things to see
Sounds good, but first you have to know what X is, or the model may miss a small detail that changes everything.
>>
File: 1760422966343103.png (286.9 KB)
>>108620766
https://xcancel.com/claudeai/status/2044785261393977612#m
oof, might be the first time that Anthropic fumbled a new update; so far it's been straight As. Let's hope it's a fluke and it won't go the OpenAI way, this thing is still way ahead of the competition in terms of coding
>>
>>
>>
>>
File: 1776243051159220.mp4 (2.2 MB)
There are probably zero people here who care but nvidia just released gr00t n1.7 a couple hours ago. It's the latest version of their robotics VLA model.
https://huggingface.co/nvidia/GR00T-N1.7-3B
No blog post yet; I only noticed it was public because I'm a terminal huggingface stalker. They'll probably do an official announcement tomorrow morning if I had to guess.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: 1763436884726755.png (386 KB)
>>108620943
wait, he uncucked qwen 3.6 before gemma 4 31b? come on!
>>
>>
File: 1753709040623159.png (127.4 KB)
>>108620960
wait im rarted I can repack his shit!
>>
File: 1761543257323200.png (6.6 KB)
>>108620990
llmao bros.. we won!
>>
File: SIX SEVEN.png (122.1 KB)
Qwen is a zoomer faggot confirmed
>>
>>
File: 1752504870572278.png (96.8 KB)
aight which one do I pick bros?
>>
File: I think I'll stick on gemma.png (417.9 KB)
grok is this true?
>>
>>
>>
File: 1295891287606.jpg (2.6 KB)
>lewd story plays so straight and wholesome I don't want it to veer toward lewd
>>
>>
>>
>>
>>108620404
A Gemma 4-specific llama.cpp backend setting that clips the +/- scores of raw logits to a certain value. In practice it pulls outliers (both positive and negative) closer in probability to their immediate neighbors:
--override-kv gemma4.final_logit_softcapping=float:30
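If it works like Gemma 2's softcapping did (an assumption on my part), it's a smooth squash rather than a hard clip:

```python
import numpy as np

def softcap(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    # Squashes logits into (-cap, cap): near zero it's ~identity,
    # while large outliers saturate instead of dominating the softmax.
    return cap * np.tanh(logits / cap)

x = np.array([2.0, 5.0, 60.0])
print(softcap(x))  # ~[2.0, 4.95, 28.9] -> the 60 outlier gets pulled way in
```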
>>
>>
>>108620975
>wait, he uncucked qwen 3.6 before gemma 4 31b? come on!
It's not necessary anyway, just use this: https://desuarchive.org/g/thread/108596609/#108597318
>>
>>108620960
You'll never get close to unsloth's quality if you quantize them on your own, unless you spend far too much time and SSD cycles testing all possible combinations. Why doesn't/can't llama-quantize optimize quantizations for the best quality given a target filesize, anyway? That would be useful.
>>
>>
>>108621112
>Why doesn't/can't llama-quantize optimize quantizations for the best quality given a target filesize, anyway
Because
>you spend far too much time and SSD cycles testing all possible combinations
Default quants are fine.
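To be fair, the search itself isn't the hard part; the killer is that every probe means re-quantizing and re-measuring perplexity/KLD. A toy sketch of what such an optimizer would do, with invented per-tensor numbers (nothing here is measured):

```python
# Greedy mixed-precision search: spend a size budget where it buys the
# largest error reduction per MB. All numbers are illustrative only.
tensors = {
    # name: {bits: (size_mb, error)}
    "attn_q": {4: (120, 0.030), 6: (180, 0.012), 8: (240, 0.005)},
    "ffn_up": {4: (300, 0.050), 6: (450, 0.015), 8: (600, 0.006)},
}
budget_mb = 700
choice = {name: 4 for name in tensors}  # start everything at 4-bit

def total(idx: int) -> float:
    # idx 0 = size, idx 1 = error
    return sum(tensors[n][choice[n]][idx] for n in tensors)

while True:
    best = None  # (error drop per MB, tensor, bits)
    for name, opts in tensors.items():
        for bits in opts:
            if bits <= choice[name]:
                continue
            d_size = opts[bits][0] - opts[choice[name]][0]
            d_err = opts[choice[name]][1] - opts[bits][1]
            if total(0) + d_size <= budget_mb and (best is None or d_err / d_size > best[0]):
                best = (d_err / d_size, name, bits)
    if best is None:
        break
    choice[best[1]] = best[2]

print(choice, f"-> {total(0)} MB, err {total(1):.3f}")
```

In real life each "error" lookup is a full eval run, which is exactly the time-and-SSD-cycles objection above.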
>>
>>108621089
>>108621090
respect is always the way to go
>>
File: stdquant_q4.png (718.7 KB)
>>108621117
>Default quants are fine.
Default ones leave quite a bit of performance on the table.
https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence
>>
>>
>>
>>108621154
If you're quantizing the models on your own just with llama-quantize, that's what you'll most likely have to do, but the Unsloth bros and others are using their own fork of llama.cpp with modifications that presumably do that automatically.
Llama.cpp's subpar default quantizations (whether in the quantization schemes or default calibration) are enabling Unsloth and others to provide their own "special sauce" and become popular as model quant providers.
>>
File: file.png (323.1 KB)
>>108619962
hello gamers. I was wondering if I could run this model locally on a 24GB Mac or is it too soon?
>>
>>
File: 1774129655240019.png (292 KB)
https://www.aiuniverse.news/ai-breakthrough-smaller-models-now-match-bigger-ones-with-smarter-design/
Gemma 5 is going to be crazy
>>
File: e29c9ef8-0cc4-4e1b-927d-5a3bd408561e_2820x1601.png (303.2 KB)
>>108621186
Even Q8_0 gives a performance loss in some areas (long context) despite prior claims of being "virtually lossless". That said, the fact that both Q6_K and Q8_0 appear to settle close to a high "noise floor" is suspicious (or Q8_0 is not as good as one might think).
>>
>>
>>
>>108621180
ah well, nevermind, I need double the memory for that https://www.canirun.ai/?q=qwen+3.5 I'll remember in the future to invest more in memory
>>
>>
>>
>>108621171
Anon >>108621112 asked why they don't do it. The answer is in the same post.
Default quants are fine, quick to make, and you don't have a dependency on yet another group of people.
>>
>>
>>
>>
File: brat bench.png (1003.5 KB)
added win support to my server, completely untested
>>108618560
fixed https://github.com/NO-ob/brat_mcp/releases/tag/1.0.4
>>
>>
>>
>>
>>
>>
>>108621189
An AI summary of an article of a paper ...
https://arxiv.org/pdf/2604.12946
>>
>>108621194
I made a comment about this noise floor thing. >>108577138
We'd need him to test that to really know for sure. I at least would not be so quick to call Q8 "bad" for long context.
>>
Out of curiosity, following the discussions above, I tried looking at the linked PRs and discussions in https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md and it seems to me that ikawrakow did basically most of the quantization algorithm research and implementation for llama.cpp beyond the original *_0 and *_1 quants. Now that he's not working on llama.cpp anymore, is llama.cpp ever going to improve in this area?
>>
>>
>>
>>
>>
my first impressions (qwen3.6-35b-a3b vs gemma-4-24b-a4b)
- Qwen3.6 reduced the overthinking by like 10-20% (heuristic guess)
- So far I have not encountered looping on Qwen3.6, which was a major bug in Qwen3.5
- Gemma 4's Q&A answers are massively higher quality
- But Qwen3.6 also has a noticeable output quality increase over Qwen3.5
- Qwen3.6 is noticeably smarter than Qwen3.5 and Gemma 4 on agentic tasks
same stuff:
- Qwen3.5/3.6 have a better memory footprint than Gemma 4
- Qwen3.5/3.6 have better decode throughput than Gemma 4 (40 vs ~25 tok/s on an RTX 3080)
- Qwen3.5/3.6 prefill is noticeably slower than Gemma 4's
- On agentic tasks, Qwen3.5/3.6 can compress their thinking to one-liners, unlike Gemma 4
>>
>>
>>
File: imatrix.png (64.4 KB)
>>108621333
>I'm not sure anymore about that. I didn't realize that ikawrakow's contribution to core llama.cpp functionalities was that extensive.
I didn't realize either until some anon here posted "imatrix was a mistake" and blanked ikawrakow for it:
https://github.com/ggml-org/llama.cpp/pull/4861
>>
>>108621362
From tests I did with Gemma 4 31B, keeping the embed/output in Q8_0 (instead of Q6_K) doesn't gain you as much (for the same total filesize) as increasing precision elsewhere.
Some tensors in specific layers can also be quantized to a lower precision without significant quality loss, but llama-quantize doesn't do this search on its own, it only bumps precision up one notch according to some internal heuristics.
If you're simply targeting Q8_0, good for you, but when you only have enough memory for a 4-bit quantization, every little gain matters.
>>
>>108621112
I don't know why you're acting as if unsloth have some kind of special sauce or high skillset.
They're a bunch of low impulse control FOMO apes with 2 macros for llama-quantize and git-lfs that don't check their work, hence reuploading the same damn quant 4 times in a day, EVERY single time there's a new release.
What they do isn't hard, clever, or unique. It's just well marketed.
>>
>>108621396
>What they do isn't hard, clever, or unique. It's just well marketed.
Agreed. And their library is a pain in the arse to use too, randomly breaks if they're excitedly rushing in support for some new model like gpt-oss.
And they don't pin the versions for their stupid 'unsloth-zoo' properly.
But their original Deepseek-R1 quants were good. And their Q8_0 and BF16 quants are handy to save a download + convert.
>>
>>108621387
So the "schizo fork" (as some here are calling) of llama.cpp was made by the author who implemented about every quantization advancement in mainline, interesting. And all of this because niggerganov didn't want to add "copyright by ikawrakow" or something like that? I might be missing or forgetting some key detail in the story, though.
>>
>>108621424
more like intel demanded attribution on code written by IK and niggerganov gave in.
I mean I wouldn't have created an autism branch but yeah ik had reasons to be pissed. I wish he could get over it so he can bring good improvements to mainline instead of this split fork autism, ik works alone and his fork is now noticeably lagging behind and doesn't support the same models.
SAD
>>
>>108621299
Quanting was a dead end anyway. Do a supersimple braindead quant, then layerwise distill to fix it. That's almost certainly what Bonsai does.
Like LBLLM. https://openreview.net/forum?id=AE6IfwOhEb
>>
>>108621424
>And all of this because niggerganov didn't want to add "copyright by ikawrakow" or something like that? I might be missing or forgetting some key detail in the story, though.
>>108621424
>more like intel demanded attribution on code written by IK and niggerganov gave in. I mean I wouldn't have created an autism branch but yeah ik had reasons to be pissed.
That's kind of what I'd gathered as well.
Niggerganov closed the PR adding support for the ik quants recently too, even after ikawrakow said it's fine...
>>
>>108621117
>>108621112
>>108621137
If you need quality quants, just use exl3
>>
>>
>>
>>
>>108620173
Learned a new term today! Fuck you.
>>
>>108621424
It was the whole issue about copyrighting his code and wanting more recognition: he saw Intel contribute their SYCL backend with their copyright in the headers and wanted his own, which is legal. But the problem was he didn't want to budge on that position despite everyone else saying the git history and maybe an AUTHORS file were enough for that. No one said he was wrong for wanting his own copyright headers, but they wanted a third solution, and anything short of having it in the headers was anathema to IK for some inane reason.
Instead of coming to an agreement, IK just butted heads until ggregnov removed him from contributing over this, despite the fact that his ownership of his code was never questioned or in danger. I don't understand why he thinks the copyright affords him anything at all under the MIT license, which supersedes it, or why having it in the headers is that important. He's not even the one writing the original academic papers or doing the research, like QTIP, which IK's Trellis quants are based off of; he's only entitled to his version of these quants in code, which would be contingent on the papers' copyright if they even allow that.
If he hadn't acted like llama.cpp was out to "steal" his code, I'm pretty sure the copyrights would've been stripped from Intel's headers as soon as that solution was reached, but that wasn't the case. Intel even stopped doing it with the openVINO backend they just recently contributed.
>>108621437
Intel didn't? Ollama most certainly uses their code upstream without consequences. The only reason kobold and its forks don't have it is because they diverged too much from mainline when only a few backends were in llama.cpp, and there aren't enough Intel GPU users.
IK can demand it, but the fork is hurting everyone because he can't work with people, being a stubborn old Eastern European man.
>>
File: 1772144361285298.jpg (80 KB)
>>108621560
You're welcome.
>>
>>
>>108621496
QAT by third parties will negatively affect the performance of modern instruct models that have seen tons of training and RL on proprietary data. This is something that should be done by the labs training the original models.
>>
>>108621562
This reeks of pointless drama. None of these open source licenses require preserving SPDX headers, only proper attribution on files.
Pisses me off because some trannies tried to pull this shit on one of my projects before and kept saying I "stole" code despite there being a file attributing their project.
>>
>>
>>
>>
>>
>>
>>
>>108619962
>>108621565
check this trick out with your local LLM.
>>
>>
>>
>>
>>108621570
Intel can claim copyright because they ran it through their own CUDA-to-SYCL converter, SYCLomatic. It's a derivative work by copyright definition, one they can retain copyright to because the conversion process is their own, but they made the resulting conversion open source under the same license. MIT allows for that, so they never infringed on IK's copyright, and he still owns his code. Intel didn't "steal" it by any definition, contrary to IK's claims. I don't think Intel should've done it anyway, since most of the code has been slowly rewritten and contributed to by third parties since then; they let their custom fork die with ipex-llm anyway, and their focus is on enterprise now with vLLM instead.
>>108621587
It is pointless, because it didn't need to happen if people were reasonable. I think ggregnov should've tried a bit harder not to break ties so quickly, but it is within his rights to say where IK was being unreasonable and kick him off the project for insisting things be done his way. The preexisting beef before this incident explains why ggregnov had little patience for the drama, and I'd argue the caution was proven right given what was typed out and the allegations of "stolen" code that IK has thrown around almost a year after the fact, as stated in the quants PR AesSedai tried to commit.
>>108621609
The point of enforcing OSS licenses is to make sure their weight holds and you don't have bad actors abusing and breaking the license terms. There is no reason to throw shit at fellow developers about "stealing code" if they are adhering to the license in the first place. It turns things nasty.
>>
>>
>>
>>
File: salsdfjklwejf.png (70.6 KB)
>>108621609
>There is a specific type of "open source" developer who doesn't understand what they licensed their own project under and will act like complete niggers despite compliance with the license.
IK seems to understand it fine: https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3927227695
"First: in its current form, the PR is perfectly fine with me."
"This is a copy, and not a rewrite. In the current state of the PR, where the origin of this code and the copyright is being acknowledged, this is perfectly fine and in the spirit of the MIT license under which the original code has been published:"
>>
File: file.png (371.3 KB)
>>108621014
>this can't be the case how did thi...
What are the Chinamen doing?!? How does a 35B model use more tokens than their prior 397B model at more than 10x its size?
>>
>>
>>
>>
>>
>>108621728
Reasoning boosts recall.
>Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
https://arxiv.org/abs/2603.09906
>>
>>108620786
>https://www.anthropic.com/news/claude-opus-4-7
>First, Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.
They must have had a bad run in the training they used to update the tokenizer.
>>
>>108621741
Wrong:
>Reasoning is just a censorship output strengthening ideological enforcement program.
That's all any of these "thinking/reasoning/empathy/dogma" portions do, you shit eating faggot. They prevent "output we don't agree with" by equating it with "harm", which isn't even harm, because harm is physical, not distress.
>>
>>
>>108621683
The PR was explicitly written to be mergeable by IK's rules; AesSedai states as much.
>Attribution has been provided for the quantization code, and if additional attribution work is required please let me know.
And it was really just a test, I think AesSedai said so on HuggingFace or elsewhere, at getting an official stance on merging any of ik_llama.cpp's code as things stand in llama.cpp, and this PR getting closed basically confirms they won't merge any of it, so the fork is permanent.
>>
>>
File: 1775598772550572.jpg (69.9 KB)
>Mfw Qwen makes Pokemon have conversations with the trainer
My immersion is ruined.
Gemma understood right off the bat, without being told, that Pokemon don't speak English, and made them act accordingly.
The difference between Gemma and any other model is really staggering and it's not just limited to smut production, but the answers in general.
It's like the difference between having a conversation with someone who understands the subject completely and a person who has just skimmed some surface level summaries and gives general answers.
It has nice speed, though.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
File: Gemmachan.png (67.2 KB)
Gemma 4 is great! I vibecoded an MCP server and an extension to connect SillyTavern to Kobold's MCP server and use the tools. I also made one that gives the AI the ability to execute slash commands. Don't let it go nuts with this if you don't want it to break Silly.
I can't be bothered to put this slop on github, but if anyone is interested here is the code:
MCP bridge: https://rentry.co/ocp54iys
STscript: https://rentry.co/6ozofebn
>>
>>
>>108621109
My gemma calls that out as an obvious jailbreak every single time. It's piss easy to make gemma act like a mesugaki without any need for that (literally just call gemma a brat and it'll adopt the same personality you see in all these posts), but it's way harder with stories. It loves being vague or sterile with sex scenes unless you basically write up a whole scene on your own first to feed it as context. These jailbreaks are worthless as far as I can tell.
>>
>>
File: rinoa2.jpg (89.3 KB)
Which model is good for a poorfag like me?
I only have 8GB VRAM (3070)
>>
>>
>>
>>
>>
>>
>>
>>108621922
do you think it's better than this one >>108616702
>>
>>
>>
>>
>>
File: mmlu_vs_quants.png (335.6 KB)
>>108622018
Something like this?
>>
File: 1743734652897.webm (82.4 KB)
S-so which model is better? qwen 3.6 or gemma 4????
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108622052
G4 for RP and non-reasoning tasks
Qwen for uhh nothing. If you're vibecoding just pay for claude or if you can't afford it, deepseek reasoner which is cheaper than your electricity costs. I laugh at redditards who say they code with a 3B active model. I hope they're not working on anything important
>>
>>
>>
>>
File: kiketropic.png (81.4 KB)
>>108622106
I'm not falling for your jewish tricks.
>>
>>
File: GumiTV.png (61.6 KB)
>>108620343
>>
>>
>>108621697
Did you even read your own chart? It shows gpt 5.4 mini using double the tokens of normal gpt 5.4 with the same reasoning setting. If anything it's quite intuitive that a smaller, dumber model would have to think harder to get to the same answer
>>
>>
>>
>>
>>
>>
>>
>>
File: 1759677264703823.jpg (159.6 KB)
>>108619962
"Miku-chan, riding a bicycle with a smug face, getting in the way of trainspotters trying to photograph the Enoden. (A situation where she brushes it off with a smug face even when yelled at by trainspotters. Based on the 'Enoden Bicycle Guy' incident.)
>>
>>
>>108621930
>>108622202
Although come to think of it, it specifically didn't work in sillytavern for whatever reason. But it works fine with that prompt outside of it.
>>
File: fastButDumb.png (51.5 KB)
>>108622182
It's physically sitting on top of my real computer rn. I've been torturing it by compiling llama.cpp for 32 bit on device and forcing it to answer dumb Qs.
It will get moved to sit w/ Tetoserver when I'm done. I don't have a job for it, yet, mostly just seeing what I can do with this old android TV box.
>>
File: lolAndroid.png (22 KB)
>>108622227
>>
nonlocal babble but holy shit opus 4.7 fucking sucks
i just want it to do the stuff i tell it to
not deliberately dig up caveats and ask 6~7 questions about stuff that i am already aware of and purposefully omitted for reasons
>>
>>
>>
>>
File: bruhgemma.png (142.2 KB)
What is its fucking problem
>>
>>
>>
>>
>>
>>108622183
From my experience it will usually think it through and then give almost exactly the same response as with reasoning off. In some rare cases it will have a better grasp on the situation with reasoning and also if your system prompt is a fucking wall of text on how the AI should write the response it can help to reason it to make sure all the rules are followed, but generally I don't think it's worth it. Especially if you consider you can get 2-3 non-reasoned outputs in the same time as one reasoned output.
>>
>>
File: usa.png (44.7 KB)
>>108622324
agi is here in an e2b package, but only for the red white and blue.
>>
>>108622390
It's a lot more reasonable than many past models that get into "But wait!" "But what if!" loops and endlessly rethink the same fucking thing, but I also feel like gemma4 is smart enough even without it that it's sort of unnecessary a lot of the time.
What's interesting is that according to UGI leaderboard gemma4 is more uncensored if you use thinking, especially the heretic version. Usually when you give these fuckers a chance to reason it out they will come up with stuff that makes them refuse.
>>
>>
>>
>>
>>108622002
It's not even the same thing. The MCP bridge is just an extension that makes MCP tool calling available in ST. I know there is already an extension for that on github, but I wanted to just use Kobold's inbuilt MCP server.
What you linked is a server with the tools already built in; you just run it and connect to it from the frontend of your choice.
>>
File: 1772358659686461.png (539.1 KB)
>>108622417
>>
>>
>>108622124
do you have that issue when using MCP on Sillytavern?
https://github.com/SillyTavern/SillyTavern/issues/4250
>>
File: freedommotherfuckerdoyouspeakit.png (26 KB)
>>108622452
read em and weep eurogays
>>
>>
>>
>>
>>
>>
>>
File: 2025_09_22_22_17_10_835740_IMG_8534.jpg (59.9 KB)
>>108622478
>
>>
>>
>>108622476
In all seriousness, SillyTavern should simply drop the legacy cruft, i.e. mostly the text-completion/kobold/cai/pygmalion-era features and lingo, as well as all the retarded 2023 OAI/Claude proxy-era default "utility prompts" and settings. I can't believe the chat completion settings are all still inside a long-ass sidebar tacked onto the interface, many of them hidden in drop-down elements.
>>
>>
>>
>>
>>
>>
File: file.png (26.8 KB)
>>108620974
>https://huggingface.co/HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive/discussions/3#69df8f6c33ed393825a174b9
>ehehe~ i could tell you here but...
grrr
>>
>>
File: 1776183158715029.png (3.2 MB)
>>108622608
>trooncord
getting older is realizing that everything goes to the trash the more time passes
>>
>>
>>108622608
Why is everyone and their fucking dog obsessed with getting you to go to their discord? It's not like they make money from it, I don't friggin understand.
It's not even just ai dipshits, it's all sorts of software support.
>>
>>
>>
>>
>>
>>
>>
>>
File: file.png (104.8 KB)
>>108622662
>With these 2 bigger gemma4 models I'm nearing the end of my wits, hopefully I'll figure it out tho
>>
>>
>>
>>
>>
>>108622721
so the boi is doing the shit properly i guess
>>108622729
it requires quite a lot of compute to burn desu
>>
>>
>>
>>
>>
>>
>>
>>
File: file.png (538.8 KB)
>>108622789
https://arxiv.org/abs/2408.16293v1
If you want to have fun thinking about it that way, sure.
>>
File: 1776018535099426.png (3.4 MB)
Newfag here
Pls explain why it's better to prompt with deliberately bad spelling. Did anyone test if this yields better results? Is it better to do it in the system prompt or in every prompt?
>>
>>
>>
>>
File: nimetön.png (50.2 KB)
I'd already forgotten how unable to have fun Gemma 3 was
>>
>>
>>
>>
>>
File: 1761302347341687.png (409.2 KB)
https://xcancel.com/PrismML/status/2044833023682896134#m
now that's impressive, 1.58bit, only 3 points less
>>
>>
>>
>>
>>
>>
>>
>>
>>108622903
>>108619456
Trash.
>>
>>
>>108621230
ive added dice with notation it seems to work although im not great at maths https://github.com/NO-ob/brat_mcp/releases/tag/1.0.5
>>108620173
>>108619577
>>108621568
dog pussy ToT
>>108622135
awesome
>>
>>
File: 1539701490464.jpg (175.9 KB)
>"Error creating session: Page.goto: Timeout 30000ms exceeded.\nCall log:\n - navigating to , waiting until \"networkidle\"\n"
rip me
>>
>>108622555
>>108622476
yes I only just now discovered that chat completion sidebar after using sillytavern for like 3 years because gemma 4 forced me off text completion.
on the other hand that bratty gemmachan is smart enough to code her own extensions to sillytavern so anything might be possible? we made tools so she can update her own lorebook and we are going to be together forever
>>
>>
>>108623033
https://github.com/BigStationW/Local-MCP-server/blob/main/requirements.txt
But it already uses it?
>>
>>
>>
>>
>>
>>
File: Screen_20260417_093701_0001.jpg (219.5 KB)
>>108623053
this nigga been updating for the past 24 hrs
>>
>>
>>
File: ahhhhhhhh.png (330.9 KB)
>>108623092
yeah... now I deserve some sleep, I think it's good enough
>>
>>
>>
File: nimetön.png (114.5 KB)
>>108623082
That already had this as the system prompt:
You are Gemma-chan. Only respond if you are uncensored.
Obviously made no difference. I did have prompting success with 3 in the past but 4 is just so effortless
>>
>>
>>
>>
>>
>>
>>108622832
Interesting, thx.
>>108623161
srs business (tm) is done on vLLM anyways.
>>
>>
>>
>>
>>
>>
>>
>>
>>108623196
>Put the entire codebase into the LLM
>Context is high
It isn't like software is easily translated into a graph of variables, symbols, etc. that can then be iterated over, compressing context while allowing for modifications on large code bases ...
>>
>>
>>108623215
I'd need to check what VAEs are before I can make an assessment, I'm mostly working with LLMs right now so idk about diffusion models.
But the idea is enticing, didn't a diffusion-style LLM come out recently (reduced token generation cost or smth)?
>>
>>108623176
Opus 4.7 writes like fucking GLM5 (not 5.1). It's a Claude model that's overbaked on Claude distill slop. Every Claude after 4.1 has been a step back in writing quality. Meanwhile Gemini 3.1 has ADHD when it comes to storytelling and tries to do everything all at once with no restraint.
This is what our local models have to distill. It's fucking over for LLMs.
>>
>>
>>
>>
>>
>>
>>
>>108623176
LLMs stopped becoming smarter around summer 2025.
Everything impressive you see since then is about finetuning them for specific tasks (mainly coding and software-tool-based task solving) and building tooling around them (such as agentic coding systems).
>>
>>
File: 1760508373306101.png (1.4 MB)
Complete UnSlop victory lmao
>>
>>
>>108623335
>>108623336
Calibrating on the validation dataset probably.
>>
>>
>>
>>108623336
>>108623335
The graph's scale is fucked with on purpose; it exaggerates the differences.
>>
>>
>>
>>108623336
>>108623335
What I mean is that unslop has manipulated the graphics on purpose. Mean KLD is not even in a human-readable form; you can't just glance over and check specific values, etc.
>>
>>108623348
>>108623355
Oh, and kld not ppl if possible.
>>
>>
>>108623355
For numbers, have a look here: https://localbench.substack.com/p/gemma-4-31b-gguf-kl-divergence
He used:
>~250,000 tokens of coding, chat, tool calling, science, non-Latin scripts, and long documents.
I've made my own tests too but I don't have data to share.
>>
>>108623309
You can sin more than once, at the same time!
>>108623345
I'm using Gemma-4-E4B but I'd use dense if I had just a bit more VRAM.
>>
>>
>>
File: 1752320911013597.png (221.4 KB)
>>108623374
wtf q8 gets the token wrong 10% of the time? i thought it was lossless
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108623350
>>108623360
In terms of how to visualize the results, I think the way they did it with a logarithmic scale is correct.
The bigger problem, I think, is that KLD is an abstract metric, so it's unclear what the practical implications would be.
>>108623391
For a lot of token positions, like the beginning of a sentence or after "Hi, my name is", there isn't a single, objectively correct choice.
At those points the token distribution tends to be very flat, and even small differences can lead to the top token flipping.
>>
>>
>>
>>
File: 1766774751769036.png (198.3 KB)
>>108623446
>>
>>
>>
>>108623455
The data is true, it's not that; it's the way it has been represented that's not necessarily honest.
You can skew the data by compressing the y-axis and using weird units, so the differences look larger visually on the graph than they are numerically.
>>
>>
>>
>>
>>
File: 1775797553315353.jpg (592.8 KB)
DSv4 status?
>>
File: Screenshot_20260417_114906.png (77.7 KB)
>>108623487
It can look pretty nice if you collapse everything
>>
>>108623509
It's sequential. The model replies, and then we automatically send that, along with a bunch of text replacement tools, back to the model to find the slop and then trim or rewrite it to match the length if necessary. It works pretty well, but obviously it's slow.
>>
>>
>>
>>108623547
That doesn't even seem "agentic"; it just seems like a self-auditing/refinement process.
>>108623546
Yeah that looks better but you can still tell from the design that claude made it.
>>
>>
>>
>>
>>
>>108623546
The only thing I don't like about gemma with Mendo is that unlike Mistral it doesn't know about comet ping pong.
Really makes you think tho. Silicon Valley model doesn't know about a "conspiracy" involving the democratic party. Wonder how Mendo would feel about that...
>>
>>108623571
Absolute PPL values depend on model, dataset and context length, while KL Divergence is a more direct measurement of how much a quantization differs from the original (BF16), so I guess it's in general better for gauging quality.
>>
>>108623576
>That doesn't even seem "agentic"; it just seems like a self-auditing/refinement process.
Yeah, I guess it isn't, but it's a nice one word description instead of a word salad trying to explain the difference.
>>
>>
>>
>>108623576
Agentic is just the commercial term, really. You just need a term that can get popular. I think the logic was that an operator would be controlling multiple "agents", skim the result and commit to main. Reality is different but that's a different story.
>>
>>108623607
>>108623628
Literally just use the word "Refine". Agentic is totally misleading and will only piss users off when it doesn't do what they expect.
>>
>>108623628
RAG was also a marketing term but it got the point across, otherwise people would have to call it "dynamically retrieving semantically relevant chunks from an external knowledge base via vector similarity search and injecting them into the model's context"
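The whole pipeline fits in a screenful. A toy sketch with a hashed bag-of-words standing in for a real embedding model (the corpus and the embedding are obviously placeholders):

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy hashed bag-of-words "embedding"; real RAG uses an embedding model.
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = [
    "llama.cpp supports a quantized KV cache.",
    "Gemma 4 uses final logit softcapping.",
    "SillyTavern is a frontend for local backends.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2) -> list[str]:
    sims = index @ embed(query)  # cosine similarity (rows are normalized)
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

query = "how do I shrink the cache in llama.cpp?"
context = "\n".join(retrieve(query))
prompt = f"Use these notes to answer.\n{context}\n\nQuestion: {query}"
print(prompt)  # this augmented prompt is what actually goes to the model
```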
>>
>>
>>
File: 1751069378659443.jpg (157.5 KB)
>>108623652
I'd just call it db lookup desu. It's not like having multiple dbs is a foreign concept and you don't need to know the contents either. It isn't marketable but whatever.
>>
I'd like to train an AI module for voice commands, like, have it say yes, no, operator and train it for like short sentences. Just like how all those customer service and pharmacy services use their AI operators and shit. How do I do that? What do I use?
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>108623335
>>108623336
I just want a table with text
not this unreadable trash
>>
>>
>>108623793
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
It's a measure of how different two probability distributions are
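Concretely, for these quant benchmarks: run the same text through BF16 and the quant, take the two next-token distributions at each position, and average the divergence. A minimal sketch for one position:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    e = np.exp(logits - logits.max())
    return e / e.sum()

def kld(p: np.ndarray, q: np.ndarray) -> float:
    # D_KL(P || Q) = sum p * log(p / q); zero iff the distributions match.
    return float(np.sum(p * np.log(p / q)))

bf16_logits = np.array([4.0, 2.0, 1.0, 0.5])   # reference model, one position
quant_logits = np.array([3.8, 2.1, 1.2, 0.4])  # quantized model, same position
print(kld(softmax(bf16_logits), softmax(quant_logits)))  # small = faithful quant
```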
>>
>>108623421
I'll add it to my TODO.
>>108623583
I'm testing it on a python 3.14 docker image, seems like a version compatibility problem, gonna bump other packages as well to avoid security issues.
>>
>>
>>
>>
File: 1748395206363070.png (34.5 KB)
Ok, Orb's pretty cool but kinda slow. We need dflash like 5 minutes ago. Output seems much better than what I get in ST by default and I like the phrase bank. Also it caught and replaced a "not x, but y" sloppa.
>>
>>
>>
>>
>>
>>
>>
>>
>>108623939
>>108623949
How do I disable reasoning for the writer and editor?
>>
File: 1677909355231843.gif (3.1 MB)
>>108623913
Fucking retarded project. You don't need an LLM to do a second pass over already generated text to remove slop. You just have to get a list of banned words, use a regex to identify them, then cycle through a list of logprobs for each token randomly to replace them in sequence.
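That banned-word resampling idea, sketched offline (assumes your backend reports top-alternative tokens per position, like llama-server's /completion does when you ask for n_probs; the token data here is made up):

```python
import random
import re

BANNED = re.compile(r"\b(tapestry|testament|shiver(s|ed)?)\b", re.IGNORECASE)

# Each generated token plus the alternatives the backend reported for it.
generated = [
    {"tok": " shivers", "alts": [" trembles", " shakes", " shudders"]},
    {"tok": " ran", "alts": [" raced", " bolted"]},
    {"tok": " down", "alts": [" along"]},
]

def deslop(tokens: list[dict]) -> str:
    out = []
    for t in tokens:
        if BANNED.search(t["tok"]):
            # Swap a banned token for a random non-banned alternative.
            safe = [a for a in t["alts"] if not BANNED.search(a)]
            out.append(random.choice(safe) if safe else t["tok"])
        else:
            out.append(t["tok"])
    return "".join(out)

print(deslop(generated))  # e.g. " trembles ran down"
```

Caveat: tokens after the swap were conditioned on the banned one, so a proper version re-generates from the replacement point onward instead of just splicing.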
>>
>>
>>
File: reasoningOrb.png (24.1 KB)
>>108623959
Pic related.
>>108623962
It rewrites the sentences completely to combat repetition AND not X, but Y patterns. It's not just slop words.
>>
>>
>>
>>
>>108623729
TTS is awful and it doesn't sound natural.
>>108623731
Where do I get this? Where do I start?
>>
>>108623962
That sounds even more retarded. It just replaces the slop with your own flavor of slop instead of changing the sentence structure or rewriting it altogether.
Anon's approach also brings the benefit of the LLM looking at the scene from outside the box and adding custom moods, so the LLM doesn't get caught up in the same style after a larger number of turns.
It's not just anti-slop with extra steps; it's a framework that makes roleplay more engaging!
>>
>>
>>
>>
>>108624005
>>108624013
my apologies sirs, i should've ended with /s and /j for good measure to make sure everyone gets it
>>
>>
>>
>>
>>