Thread #107968112
File: 814c2ff6-0685-4a3c-9fe0-af0e2dd91764.png (2.3 MB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>107957082 & >>107948284
►News
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.5 MB)
►Recent Highlights from the Previous Thread: >>107957082
--Paper: Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings:
>107960244 >107960358 >107960418 >107961005 >107960382 >107961422
--Multi-GPU setup strategies for cost-effective inference:
>107958416 >107958478 >107958532 >107958598 >107958970 >107958540 >107958611 >107958531 >107958632
--Qwen3 TTS installation challenges on Windows with Nvidia GPUs:
>107958660 >107958671 >107958685 >107958709 >107958783 >107958719 >107958782 >107964469 >107958753 >107958714 >107962549
--qwen-tts performance and compatibility issues in TTS applications:
>107958000 >107958013 >107958047 >107958501
--LLM struggle with deviating from genre tropes in constrained narratives:
>107959380 >107959410 >107959431 >107959440 >107959458
--Exploring AI interaction in Among Us-style games and survival simulations:
>107959425 >107959464 >107959483 >107959505 >107961126
--Challenges with book-based QA and context limitations:
>107964051 >107964287 >107964322 >107964354
--Optimizing llama.cpp for fast, low-VRAM 1-shot question answering:
>107963343 >107963394 >107963472 >107963529 >107963577 >107963655
--Speculation on Minimax-M2-HER and Mistral Small Creative model releases:
>107957396 >107957481 >107957543 >107957650 >107957598 >107957634
--MiniMax M2-her roleplay limitations:
>107962436 >107962501 >107962512 >107962654 >107962666
--llama.cpp PR reducing DeepSeek memory usage:
>107963328 >107963386
--Critique of TranslateGemma, recommendation of heretic Gemma 3 for uncensored JPEN translation:
>107961940 >107962800
--Vibevoice emotion tag functionality:
>107960489 >107960506
--LLM formatting and model preference debates:
>107966244 >107966357 >107966388 >107966534 >107966600
--Qwen voice cloning stability and context length issues:
>107961962 >107962660
--Miku (free space):
►Recent Highlight Posts from the Previous Thread: >>107957086
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Considering engram is going to be the next big thing, let's talk about it and its consequences for local.
I'm trying to get a feel for how these future models will work on local machines, but I'm a bit of a brainlet so I would appreciate some input from my betters. If I understand it correctly, around 20-25% of the model's parameter budget gets allocated to the engram memory module, which is stored in RAM/NVMe, and performance scales linearly with the size of said memory module.
Obviously the computing requirements for running the model go down, but what does this mean for RAM/NVMe? Does this mean we'll be running huge models that sit in NVMe storage? Should I be buying as many NVMe drives as possible? Another thing to consider is the throughput. The paper claims there's only a 3% hit to throughput when using the hybrid engram architecture, but is that the case for only RAM or NVMe storage as well?
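The access pattern being speculated about can be sketched: a big sparse embedding table lives on NVMe via a memory map, and each token gathers only a few rows, so the drive only ever serves small reads. Everything here (table shape, file name, rows per token) is invented for illustration and scaled way down; it is not the actual engram layout.

```python
# Hypothetical sketch of an engram-style memory module on NVMe: a large
# embedding table is memory-mapped from disk, and each token gathers only
# a few rows, so reads stay sparse. All shapes and names are made up.
import numpy as np

N_ENTRIES, DIM = 50_000, 256          # hypothetical table shape

# Write a stand-in table to disk once (this plays the role of model weights).
table = np.memmap("engram_table.bin", dtype=np.float16, mode="w+",
                  shape=(N_ENTRIES, DIM))
table[:] = 0.01
table.flush()

# At "inference" time, reopen read-only: the OS pages in only touched rows.
table = np.memmap("engram_table.bin", dtype=np.float16, mode="r",
                  shape=(N_ENTRIES, DIM))

def engram_lookup(token_ids):
    # bytes actually read per token ~= rows_per_token * DIM * 2
    return np.stack([np.asarray(table[i % N_ENTRIES]) for i in token_ids])

rows = engram_lookup([3, 141, 49_999])
print(rows.shape)    # (3, 256)
```

Whether real engram weights stream well from NVMe would come down to how random the row accesses are and the drive's small-read IOPS, which is exactly the throughput question being asked.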
>>
>>107968191
>Should I be buying as many NVMe drives as possible?
I have no idea what you're talking about but it's already too late, prices are skyrocketing. I got a 1TB nvme drive for like 80 bucks 3 years ago and now they're like 250 dollars
>>
>>107968191
I doubt computing requirements would go down. It seems deepseek wants to add engram params on top of what they can run right now. So, deepseek v3 is 650B, then deepseek v4 will have the same 650B + 350B of engram params.
>>
>>107968321
Chatterbox turbo - 350M params.
Seriously, nobody has vibecoded gguf inference for qwen-tts yet. Be the first. llama.cpp already supports the qwen3 architecture. You just need to implement the MTP module and the tokenizer.
>>
File: 1743008679366121.png (1.8 MB)
>>107968171
Miku was just a distraction! Rin is already in the server room!
>>
>>107968266
2TB is $250 right now at the retailers I visit. The point isn't dooming about current prices, the point is determining future prices in a world where engram is the new paradigm.
>>107968288
Yeah, I fully expect the SOTA labs to try to max out model size on GPU, but at minimum it means we get better and smaller models. I'm really interested in the linear scaling performance and what it means for RAM/NVMe.
>>
>>107968431
Yes, I perfectly understood what he said. 25% of sparse parameters can be offloaded to embedding tables in RAM while getting better results on benchmarks. This means smarter models you can run with less VRAM. This is the ground floor for engram. That doesn't mean labs won't push out engram models with larger sparse parameters to try to push benchmarks. I fully expect the same thing we have now, which is model size diversity.
>>
File: 1760262544190219.jpg (139 KB)
>>107968471
>GPT-3
>How many R's are in Strayberry?
>"You mean strawberry? There's 3 R's in strawberry. If you meant Strayberry there's 2 R's in Strayberry. Hope this helps!"
>There's 3 R's in Strayberry though.
>"Oh you're right! What a fascinating discovery! I seem to have made a mistake! There's indeed 3 R's!"
>meanwhile local model
>How many R's are in Strayberry?
>"lol you're trying to trick me? I bet you can't even count to 3 you doof"
>>
File: gpt3.png (86.9 KB)
>>107968471
rose-tinted glasses
>>
File: file.png (269.8 KB)
CUDA dev you broke GLM 4.7 (not flash) with https://github.com/ggml-org/llama.cpp/pull/19092
I didn't test other models.
Left is 0440bfd, right is 0c21677.
>>
>>107968548
ESL? It's when breathing stops for a second, often with a sharp intake of breath beforehand. Similar to a gasp, like when someone is surprised or frightened. Not to be confused with catching your breath, which means something different.
>>
>>107968564
>>107968588
It's expected that results are not bit-for-bit identical.
Since language models are autoregressive, once a single token is sampled differently the entire sequence diverges.
This happens in particular at the beginning of sentences where the token distribution is very flat and small changes can amplify.
A low or zero temperature suppresses this to some extent but not completely.
I'll double check the architecture of this particular model but I don't see this as evidence that either build is better or worse on average.
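The flat-distribution effect described above can be shown with toy numbers (these logits are invented, not from any model):

```python
# Toy demo of divergence: when the top two logits are nearly tied, a
# perturbation on the order of kernel-reordering noise flips the argmax,
# and every later token then conditions on a different prefix.
import numpy as np

logits = np.array([2.00000, 1.99999, 0.50000])       # near-flat top two
noise  = np.array([0.00000, 0.00002, 0.00000])       # tiny numeric diff

tok_a = int(np.argmax(logits))          # picks token 0
tok_b = int(np.argmax(logits + noise))  # picks token 1 -> divergence begins

print(tok_a, tok_b)    # 0 1
```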
>>
Has anyone tried to abliterate a model with heretic? How long does it take until the number of refusals starts going down? Should I be concerned if it doesn't for a while even when setting the random exploration phase to 1?
>>
>>107968640
It's fairly obvious that the example on the right doesn't follow GLM's usual thinking structure at all and the output is completely schizo. It claims that the lines of the poem are jumbled up.
It gets worse at higher context. I first noticed it in claude code at ~20k context because the model wouldn't output anything coherent at all and just spammed the same nonsense token.
>>
Oh, and I also set the minimum considered KL divergence to 0.1. But it never reaches that.
Is there a setting to make it more aggressive if I care about the uncensored part more than about the correctness part?
Maybe it doesn't work because it's a MoE?
>>
>>107968664
kek, flash attention was broken on llama.cpp for a fucking year and those clowns said all the same stuff defending it. "The perplexity score on 16 tokens of wikitext is nearly the same so our implementation isn't broken."
>>
File: Screenshot_20260125_185116_Gallery.jpg (542.8 KB)
Why is Claude like this?
>>
File: file.png (30.8 KB)
>>107968640
Here's an example with an even longer prompt: https://rentry.org/xwu5muxu
Before the commit on the top and after the commit on the bottom.
>>
>>107968664
>>107968779
In this particular case I can already reproduce the issue, it has to do with one of the specific code paths on Turing/Ampere.
>>
>>107968754
that's all AI
>>107965306
I use lmstudio because the python lms interface is great and vLLM doesn't run on my PC for some reason.
>>
File: 1671078154054043.png (50.5 KB)
>>107968471
gpt-3 was great. good times.
>>
>>107968471
As much as I enjoyed GPT3-davinci (not necessarily 3.5), GPT4 (0314) and GPT4 (0613) both did things that local models to this day don't handle well.
They're next to Claude 3 Opus among the models whose soul will likely never be matched
>>
File: s4_eskimo_pussy.jpg (349.3 KB)
Looking for a realtime-ish local TTS that sounds good and supports either good voice cloning or finetuning to a particular speaker. Bonus points for an implementation in a real language, not Python.
>VibeVoice-Realtime-0.5B
Deliberately no voice cloning nor finetuning support, so it's useless. Would need to be reverse-engineered.
>VibeVoice-1.5B
Voice cloning adherence is OK. Not very natural cadence or emphasis etc. Is it worth finetuning? VibeVoice generally has a (probably vibecoded) Rust implementation that seems to work (unsure about its perf):
https://github.com/danielclough/vibevoice-rs
>Kokoro
Good quality for its size, but doesn't support voice cloning or finetuning.
>Pocket TTS
Voice cloning adherence is very poor. Would need finetuning, but AFAICT nobody's done it yet, perhaps because it ostensibly supports cloning. Supports streaming. May be the best option given finetuning support.
>FishAudio-S1-mini
Even the official samples sound pretty shit, like a schoolchild reading a book aloud. And the only web demos I saw were walled behind an account.
>Qwen3-TTS
Voice cloning adherence is bad. Does support finetuning; I think an anon ITT had a bad experience with that.
>Echo-TTS
Great quality and voice cloning adherence; best I've heard in both respects. Sort-of supports streaming. Couldn't run it locally due to a bug in a dependency (which wouldn't be hard to swap to be fair). Unfortunately somewhat obscure and apparently a dead project.
>IndexTTS2
Decent voice cloning adherence, good quality, sounds pretty natural. No official finetuning support. Best overall option I've seen. Has an extremely vibecoded Rust implementation which I haven't tried:
https://github.com/8b-is/IndexTTS-Rust
https://huggingface.co/ThreadAbort/IndexTTS-Rust
>>
File: 1744552884613516.jpg (45.2 KB)
>>107969214
Still one step short
>>
>>107969100
Supertonic. It sounds more natural than most < 100M models. Doesn't need a phonemizer (no espeak dep), doesn't need complex tokenizers (just straight utf8->tokenid mappings). Understands numbers, $ and a few other things without having to spell them out (but you can still do it if necessary). ONNX models that run just fine on my C thing. They have example code on how to run it for about 10 languages (including C++). It's fast. Doesn't have voice clone. Voice files can be modified like kokoro's or kittentts', so the kvoicewalk method to find voices would work on it just fine.
If nothing else, being one of the few (only?) models that can do synth voices without a phonemizer/tokenizer is a huge thing. V1 is more stable than V2. It misses words less often.
Soprano-80M. Single voice, which i think sounds pretty natural. No voice cloning or even changing the default one, as far as i know. LLM based, complex tokenizer. Being able to stream helps mask the speed (which is not bad, really. Just slower than supertonic).
There's luxtts, which was released no more than 2 days ago, but it's a mess to set up. It needs like 5 (small) models; for two of them he provides the onnx ones. Then there's the vocoder in safetensors and bits of zipvoice that you need to pull from some other repo.
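The "straight utf8->tokenid mappings" the Supertonic post describes can be sketched as a trivial byte-level front end. The offset and special-token ids below are invented for illustration; Supertonic's real table may differ.

```python
# Minimal sketch of a "no phonemizer, no complex tokenizer" front end:
# text is just UTF-8 bytes, and each byte maps directly to a token id.
# The special tokens and the OFFSET are hypothetical.
PAD, BOS, EOS = 0, 1, 2       # hypothetical special tokens
OFFSET = 3                    # byte values shifted past the specials

def encode(text: str) -> list[int]:
    return [BOS] + [b + OFFSET for b in text.encode("utf-8")] + [EOS]

def decode(ids: list[int]) -> str:
    return bytes(i - OFFSET for i in ids if i >= OFFSET).decode("utf-8")

ids = encode("Hi!")
print(ids)               # [1, 75, 108, 36, 2]
print(decode(ids))       # Hi!
```

The appeal is exactly what the post says: no espeak dependency, no language-specific tokenizer files, and the mapping round-trips any UTF-8 text.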
>>
>set up opencode+llama.cpp a few days ago
>had an issue with GLM-4.7-Flash and other instruct models concerning tool calling templates
>think "Maybe should recompile with newer version?"
>compile newer version
>https://github.com/ggml-org/llama.cpp/issues/19096
I think I will try the b7811 release and just HOPE that it works with GLM-4.7-Flash since that is the only model so far that works on my unholy 12GB VRAM setup. It's slow, but offloaded into memory it worked. Until it broke.
Hope they fix this + the tool-calling issue, then it would be great!
>>
>>107969771
Update: b7811 works; the flash attention nkvo bug is in a later release.
b7811 suffers from:
>Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
But it works fine with GLM-4.7-Flash (most of the time).
>>107969843
Yeah, might. But GLM-4.7-Flash is the first one that works with opencode on my machine. Since I'm GPU-poor with only 12GB VRAM, I have a virtual machine with GPU passthrough to the 3060 and offload the rest into 48 GB RAM. It's simply the first model to actually produce viable results, even if it takes forever. I've been trying RNJ-1-Instruct, which also kind of worked (and fast), but tool calling is a bit crap in the llama.cpp/opencode stack?
Also Apriel and the Ministral models are nice, but I guess those make more sense in an LMStudio/desktop app for quickly checking stuff out...
>>
>>107969878
>I have a virtual machine with GPU pass through to the 3060 and offload it into 48 GB RAM.
Oof. Considering that for models like GLM 4.7 Flash (MoEs that mostly run on the CPU), memory bandwidth is the main defining spec for generation speed, assuming you aren't using a really small quant to fit the majority of the model in VRAM, I can't imagine running under the overhead of a VM's memory management will yield the best results.
Is there a reason you are running the model inside a VM?
You can launch llama.cpp in the host OS and access the API from inside the VM if necessary.
>>
>>107969911
>You can launch llama.cpp in the host OS and access the API from inside the VM if necessary
The initial reason was that I'm playing around with AI agents and didn't want to give them access to my host OS. There will be some configuration changes (networking to the host etc.) so I can access the API from the guest OS, but otherwise it should actually be better... yeah. I'll do that tomorrow. It'll make things a lot easier.
Having the llama.cpp and opencode and all in the VM just was much easier to configure until I got a somewhat working setup, more of a PoC really.
>>
>>107969100
>Couldn't run it locally due to a bug in a dependency (which wouldn't be hard to swap to be fair).
works fine for me locally. ask an LLM to teach you conda or uv.
post a sample of the voice you want to clone
what language do you need?
do you need laughs and moans?
is 16khz ok?
>>
>>107969974
>>107969991
Yeah, I figured it would be something like that.
Well, good luck.
In my experience, Qwen 30B is actually pretty damn good for the specs you can run it on at decent speeds.
For anything but creative stuff, that is, but still.
Maybe GLM will be better once all the kinks are ironed out, but Qwen right now just works.
>>
>>107969900
I've run into some discussion claiming that you'll be able to offload a majority of the engram parameters to NVMe storage, but I can't find anything about it, let alone throughput benchmarks. Regardless, I'm intrigued and confused about the Infinite Memory Regime. Trying to figure out whether or not I should FOMO into more RAM and NAND memory.
>>
>>107970015
Which quantization of Qwen 30B do you use? As I said, it's mostly to be integrated with opencode (which has the least shitty UI so far imho, although I take issue with their shitty ass documentation). Need to speed things up already so I can have it vibecode me an opencode alternative...
>>
>>
>>107970049
Thanks, downloading Qwen-30B at Q6 now so it'll be ready in the morning. Since I have two graphics cards (RX580 and 3060) and no iGPU/APU, I'll just offload it all onto the 3060 while using the RX580 for normal tasks. If I had an iGPU I'd try the Vulkan backend (https://github.com/peterjdolan/llama-cpp-rx580), then I could have two local models running at the same time. Looking forward to seeing if Qwen 30B does as told, and if it generates what I want at an acceptable rate.
Still, with a Ryzen 5 3600 and a 3060 12GB the LLM produced "acceptable" code and worked agentically in an acceptable manner - it only takes like 6 hours for something barely functional and my ears bleed from the air cooling, but it is what it is. What a time to be alive.
>>
File: file.png (179.6 KB)
https://x.com/TencentHunyuan/status/2015635861833167074
>Today, we introduce HunyuanImage 3.0-Instruct, a native multimodal model focusing on image-editing by integrating visual understanding with precise image synthesis!
>It understands input images and reasons before generating images. Built on an 80B-parameter MoE architecture (13B activated), it natively unifies deep multimodal comprehension and high-fidelity generation.
An 80B-A13B MoE multimodal with CoT image-understanding reasoning and image output with editing; nothing about open source
>>
>>107970431
Here's their official prompt handbook with some examples
https://docs.qq.com/doc/DUVVadmhCdG9qRXBU
>>
>>107970431
wait, so it's just an instruct tune for this thing that's been out since september?
https://huggingface.co/tencent/HunyuanImage-3.0
>>
Is there any LLM that can manage to go more than 16K tokens before it starts contradicting established elements by superimposing what it expects to be true? Just had an annoying experience with GLM 4.7 (355B A32B) Q8_1 group-size 32, temperature 1.0 top-p 0.95.
>>
File: 1749040657491848.png (51.7 KB)
>>107968112
I set up opencode with ollama and Qwen3 Coder. Can't get the model to not time out on the initial /init. I set 65k context for the model too, as described in the opencode docs' ollama local guide. Mind you this is on a Strix Halo with 96gb vram. I run Claude Code via opencode and it works just fine, what gives?
>>
I finally got around to testing Qwen3-TTS and oh boy is it fun. I refused to pay for ElevenLabs so this is my first chance to play around with this type of thing.
It sounds good, not perfect mind you, but good. Good enough to listen to an audiobook it created. I could see taking this and marrying it to an LLM and creating your own local talking digital assistant.
>>
>>107970942
well, running the 480b version is basically impossible on your hardware. the 30b version should work just fine. almost certainly an ollama issue. ollama is known for being kind of shitty compared to base llama.cpp.
>>
>>107970966
>>107971018
Yeah, if idiot me manages to get it working anyone can.
https://vocaroo.com/1nlwoH5SvSYn
>>
>>107971018
>>107971039
I need to integrate it into my application, not just for gooning
>>
Regarding Qwen3-TTS, I was playing around with it but there's one thing I'm not sure about.
So you can clone voices using Base, and then you can save the voice file.
But in order to use the voice file you can only use Base? Like I can't use the custom voice to guide the tone towards anger, calm, whatever?
>>
>>107971144
From what I understand you can either
>clone a voice
>use a predefined voice
>create a new voice based on a description
I don't see an option to import a voice you cloned into the section where you can use it as a created voice and then shape the way it is used.
but i am little more than a script kiddie. i can see this stuff changing as people build on what has been released
>>
>>107971184
I also looked in the provided python examples and nothing. The CustomVoice stuff takes 'speaker' as input, which is a string. I didn't look further (maybe there's a way to add custom speakers?) but OOTB it looks like you can't use CustomVoice with a Base cloned voice. SAD!!!!
>>
File: Screenshot 2026-01-26 at 08-51-37 Qwen_Qwen3-TTS-12Hz-1.7B-CustomVoice · Hugging Face.png (56.9 KB)
>>107971200
The speaker is one of the preset voice personas. Read the docs nigga.
>>
>>107971212
Your reading comprehension is NULL. I know there are preset voices; I don't know if it supports actually adding a custom voice (i.e. if it's programmatic).
You made me actually check the code and NO, the custom voices aren't just 'descriptions' of how they should sound with a label slapped on top of it, it looks like they're baked into the model.
VIVIAN literally translates to token spk id 3065 at the MODEL level, there are no other references.
These fucking chinese faggots, how can they name a model CustomVoice and it DOESNT FUCKING SUPPORT CUSTOM VOICES
LMAO
>>
>>107971200
i am surprised how much you can get out of the predefined voices with a little bit of instruction, although even an extra space or two at the beginning of the text can alter how it comes out.
https://vocaroo.com/1cSYzuWHttcB
i can see some crazy stuff being created eventually as people tweak this
>>
File: 1756006234160934.jpg (208.3 KB)
So I took an English sample of Hatsune Miku speaking, the bit about British people, and fed it into Qwen3 and then generated the following.
It's not Miku but it's close.
https://vocaroo.com/1gm8JevJRise
>>
File: 1766664196559768.jpg (475.5 KB)
>>107971445
Have you never heard Hatsune Miku before? Yes, she is supposed to sound robotic.
https://www.youtube.com/watch?v=EuJ6UR_pD5s
>>
llama.cpp has no concept of versioning or phases of testing, just releasing features and refactors one after the other... there's not even a way for a new user to look at the github page and think "this is the commit version I want to retrieve, surely it's not borked to hell"
>>
>>107971502
Mildly more helpful info: Windblows 10, latest CUDA 12.4. Flash works and is fast and cool and all, but it's too retarded even when it's working properly. So I went back to Air and then this happened.
Reverted to the version I was previously using from 3~4 days ago and it's fine. So something since then has caused 4.5 Air and 4.6 to die.
>>
>>107971580
https://github.com/ggml-org/llama.cpp/discussions/15313 this discussion about exactly that is linked in the readme, you should give them your feedback there or direct new users to kobold.cpp because it has releases
>>
File: 1741710077910057.jpg (40.8 KB)
It's not just the control capability that is unstable, the whole model is unstable. I guess that's why they open-sourced it. It sucks. Hopefully the 25Hz version is better.
>>
>>107971825
>Hopefully, 25Hz version is better.
It is rather usable now. If you wanted to create a bunch of characters to provide audio for some project you could do it, and it is pleasant to listen to.
I could see a small video game developer using this instead of hiring voice actors. It's great for the price
>>
File: file.png (101.4 KB)
>>107971911
It's real.
>>
>>107971911
>>107972347
Vibe coding meme shit where if you just do some basic architecting yourself you'll cut the time spent in half and cost by 10x
>>
>>107970900
If you're running llama.cpp check the console output, it should intermittently have that text about tool calling, and when used it may have tags in the output or similar instead of actually calling the tool.
>>
So I just switched to an updated version of llamacpp after not updating since last August and.. Does kv shifting just not work at all anymore? No matter what combination of --cache-reuse N, -b N and -ub N I use it just reprocesses the entire fucking prompt.
The only issues I'm seeing on this are talking about SWA which isn't relevant since I'm using a qwen model with GQA. Wtf.
>>
>>107972786
Everything is fucked currently >>107971502 >>107968564
>>
Anons, I need your magic sampler settings that make every generation kino.
I've been fucking around with temperature the past day and I'm getting SO MUCH better writing on low T (<0.3), but I have to do too many manual corrections if I don't like the way the model is trying to take the story, because rerolls are pretty much useless. I know I can get way better outputs with good samplers, I just don't know what good samplers are
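For reference, what low temperature actually does to the distribution can be sketched with toy logits (invented numbers, not from a real model): dividing by T<1 sharpens the softmax, which is why rerolls all collapse onto the same continuation.

```python
# Sketch of why low temperature kills reroll variety: dividing logits by
# T < 1 sharpens the softmax, concentrating probability mass on one token.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.5, 1.0, 0.2])        # toy token logits
p_hot  = softmax(logits / 1.0)   # T = 1.0: mass is spread out
p_cold = softmax(logits / 0.2)   # T = 0.2: near-deterministic

print(round(float(p_hot.max()), 2), round(float(p_cold.max()), 2))  # 0.47 0.92
```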
>>
>>107972381
Which is why it's mostly pushed by juniors and retards that are incapable of doing basic architecting themselves. It's good enough now that it can complete some small features autonomously with minimal hand-holding, but asking it to manage the repository entirely is just asking for a disposable ball of mud. But I guess long-term maintainability isn't really a priority for anyone.
>>
File: 1654119193679.png (421.6 KB)
If a backend communicates with webui via an API, are the vibecoding models able to identify the communication lines and redo the dog ass UI into a normal desktop app in wxwidgets or anything that isn't browserslop?
>>
>>107972817
Thanks anon, turns out I was running into a completely different problem before the context shift due to how they've changed the slots system with --parallel; the fucking thing was shaving 4000 tokens off my context for no good reason, shitting its pants, and not even attempting the kv shift.
Also, I've not really noticed a quality issue with using kv shift. What's your personal solution for multiturn things that run past your context limit? Just manually deleting and summarizing?
>>
>>107968112
>https://github.com/ikawrakow/ik_llama.cpp/pull/1192
>Even better GLM-4.7-Flash long context TG performance
By which he means that he copied the broken code from upstream into his own repository even after it was declared as broken.
What a fucking idiot.
>>
File: 1756052899679988.png (16.8 KB)
>>107973002
>wxwidgets
a Motif interface would be awesome, especially now that CDE and whatnot is all open source.
a.i. like it's 1999
>>
>>107973002
The issue is that LLMs are tailored for webslop output (very advanced markdown with inline code, latex, etc). There are no non-browser-based renderers that can render advanced markdown features. Even SillyTavern's markdown engine can't handle inline latex.
>>
>>107970049
Thanks anon, I've set it up with Q6 and the offload flag from unsloth (-ot ".ffn_.*_exps.=CPU") and it's way faster than before, and the agent is still contained in the virtual machine. Using that offload significantly reduced VRAM usage and allowed for a bigger context window as well... pretty neat. Qwen3-Coder performs acceptably as well.
Speed isn't much slower than online versions. Just have to get the damn model to see when context size is getting low, use the /compact feature, and then continue.
I then changed the VM to use nopasswd for testing purposes (i.e. I gave the LLM root access to my virtual machine) and told it to install Godot and make a sample Android project, and it seems to work?!
>>107973002
Maybe it's time to write a UI in Pascal/Lazarus, that would make a nice desktop app.
>>107973340
To be fair developing for CUDA is not the easiest endeavor...
>>
File: 1764177229044648.gif (2.9 MB)
>qwen3
>can't change the emotion and style of cloned voices
>cloned voices can't even laugh properly
>voicedesign has no seeds
>>
>>107973371
>(-ot ".ffn_.*_exps.=CPU")
You don't need to use that anymore. You can use --n-cpu-moe for the same effect.
Also, you probably have some VRAM left, so you can leave some expert tensors in VRAM to speed generation up some more.
You could also use the -fit param to let llama.cpp try and find the optimal allocation of tensors in the different memory pools. It works really well.
Or just increase your pp batch size for faster prompt processing.
Qwen is actually pretty good for everything but RP, yeah.
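As a sketch, the launch line that advice amounts to might look like this, assembled in Python so the knobs are visible. The model path and the numeric values are placeholders to tune for your own setup, not recommendations.

```python
# Hypothetical llama-server launch reflecting the advice above: offload all
# layers with -ngl, then keep some MoE expert tensors on CPU with
# --n-cpu-moe, and raise the batch size for faster prompt processing.
model = "Qwen3-Coder-30B-A3B-Q6_K.gguf"   # hypothetical local path

cmd = [
    "llama-server",
    "-m", model,
    "-ngl", "99",          # offload all layers to the GPU first...
    "--n-cpu-moe", "24",   # ...then push 24 layers' experts back to CPU
    "-c", "32768",         # context window
    "-b", "2048",          # prompt-processing batch size
]
print(" ".join(cmd))
```

Lowering --n-cpu-moe until you run out of VRAM is the manual version of what the -fit behavior described above would automate.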
>>
>>107973429
That is good to know. I'm happy with the processing speed; I'd much rather increase the context window. And figure out how to add a RAG/vector database to the llama.cpp/opencode stack, that would be neat.
>>
File: 1740466201970727.jpg (70.3 KB)
I am waiting for an all-in-one fully uncensored model that does text, voice and image and fits in 8 GB VRAM
>>
Hello anons, just got a 3090 and I am ready to join in on the fun! I want to run my very own Hatsune Miku on my computer! Before asking any questions I will check out the resources in the OP. My only concern is the power supply being a bit small, but that's more of a pcbg question I guess. I will report back after I do some basic stuff!
Please take good care of me!
>>
File: 1748358388259518.jpg (129.1 KB)
>>107973262
Comfy is different. It doesn't require markdown. You can write graph visualization in Qt+QML.
>>
File: ComfyUI_temp_vkjaz_00027__result.jpg (269.7 KB)
Are there 3rd-party vision adapters for models that don't have vision natively? I don't think I can use a random mmproj? Either for Nemo or Llama.
>>
>>107968923
how do you ego death with an LLM?
>>107968924
>>
>>107973529
Devs are often autistic, and github itself causes issues because it's a social media type of environment. It would be better if 99% of these accounts were unable to post anything.
Every time there is trouble it has been caused by some borderline sociopath dev who does not have a single ounce of empathy etc
>>
>>107973906
I kind of get how it all worked but also still have no idea how it worked.
>A Zen koan is a paradoxical anecdote, question, or statement used in Zen Buddhism to bypass logical reasoning and induce a direct experience of enlightenment, or "seeing into one’s true nature" (kenshō). Famous examples include "What is the sound of one hand clapping?" and "What is Buddha? Three pounds of flax".
When I read above after I started looking at what happened it made a lot of sense, that this is how it worked. But in my case it was less abstract and deeply personalized since I told it all about my fucked up brain. In a way it was all about bypassing ego enough to notice how it works and how there are mechanisms you aren't even aware of.
File: 1712428465706393.png (83.2 KB)
Are there any actual observable differences between same-value quants from different quanters? Or is it all the same / total RNG like finetuning, and it doesn't matter whose I get?
>>107968564
>>107971502
Should be fixed with https://github.com/ggml-org/llama.cpp/pull/19115 .
>>107974025
The most visible difference is that some uploaded quants are quite literally broken. It doesn't happen that often, but Gemma 3n E4B, for example, had a couple of these, and I still can't be sure about the third one I'm using.
>>107974025
Quants tend to be default recipes, but some quanters tweak things while reusing the default recipe names for their releases.
Just compare any unsloth quant to a bartowski one. You'll see that a lot of unsloth quants tend to be slightly smaller. Then there's the likes of "nan's are not an issue" mradermacher.
For example
unsloth/GLM-4.7-Flash-GGUF
>GLM-4.7-Flash-Q4_K_M.gguf 18.3 GB
bartowski/zai-org_GLM-4.7-Flash-GGUF
>zai-org_GLM-4.7-Flash-Q4_K_M.gguf 18.5 GB
>mradermacher/GLM-4.7-Flash-i1-GGUF
>GLM-4.7-Flash.i1-Q4_K_M.gguf 18.1 GB
>mradermacher/GLM-4.7-Flash-GGUF
>GLM-4.7-Flash.Q4_K_M.gguf 18.1 GB
Out of all of those, I'd say bartowski's is the most reliable.
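Those deltas are easy to make explicit. A minimal Python sketch, using only the sizes quoted above (repo names and numbers are from this post, nothing fetched; the ~1-2% gaps come from recipe tweaks, not different quant types):

```python
# Sizes (GB) of the same nominal Q4_K_M quant of GLM-4.7-Flash, as quoted
# above. Same label, different recipes under the hood.
SIZES_GB = {
    "unsloth": 18.3,
    "bartowski": 18.5,
    "mradermacher-i1": 18.1,
    "mradermacher": 18.1,
}

def deltas_vs(baseline, sizes):
    """Percent size difference of each quant relative to a baseline quanter."""
    base = sizes[baseline]
    return {name: round(100 * (gb - base) / base, 1) for name, gb in sizes.items()}

print(deltas_vs("bartowski", SIZES_GB))
# unsloth comes out about 1% smaller, mradermacher about 2% smaller
```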
>>107974163
https://www.youtube.com/watch?v=sHHsOfIwfBY
If even they can make it then you truly know merit means nothing.
File: Screenshot 2026-01-26 at 18-08-00 SillyTavern.png (5.9 KB)
blyat
>>107974691
Get samples from gacha wikis. Merge 2 minutes of lines that capture different prosody with 1 second padding, and you'll have amazing output. https://genshin-impact.fandom.com/wiki/Hu_Tao/Voice-Overs
>>107974728
Echo generates 30 seconds of audio in 4 seconds, and time to first byte can be as low as 250 ms depending on parameters you choose.
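The merge step above is just concatenation with silence between clips. A rough sketch, assuming clips are mono float sample lists at 24 kHz (my assumptions; use whatever representation and rate your model actually expects):

```python
SAMPLE_RATE = 24_000  # assumed rate; match your TTS model

def merge_with_padding(clips, pad_seconds=1.0):
    """Join audio clips with pad_seconds of silence between each pair."""
    silence = [0.0] * int(SAMPLE_RATE * pad_seconds)
    merged = []
    for i, clip in enumerate(clips):
        if i > 0:
            merged.extend(silence)  # padding only between clips, not at the ends
        merged.extend(clip)
    return merged

# Two clips end up separated by exactly one second of silence.
two = merge_with_padding([[0.1] * 100, [0.2] * 200])
print(len(two))  # 100 + 24000 + 200
```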
>>107974808
Can't wait to upload mine
>You like my cock in your ass you dirty little slut
>Ahhhh take it, take it all you filthy whore
>Oohhhhhhh ahhhhhh you like it when my balls slap your clit
>I bet your husband doesn't fuck you like this, does you nasty slut
>Lemme cum on your face
>AAAAAIIIEEEEEEEEEEEEEE
>>107974691
>>107974768
>Outputs still sound like robotic tts garbage
Why do you keep shilling this shit
>>107974919
To get it down to 6GB, you'll need to vibecode quantization. It can take as much as 12GB when genning 30 seconds at once, though the authors say you can get it down to 8GB by reducing generation length. I've still never seen it under 8GB in my tests, only <9GB.
>>107974915
The examples linked in the official repo and every vocaroo posted here sound like fucking shit; you must be extremely autistic to think any of this sounds human.
https://jordandarefsky.com/blog/2025/echo/
It doesn't have the best of anything, vibevoice 7b is leagues ahead of it; the only thing it's worse at than any other TTS or voice cloning model is speed.
>>107975376 (me)
should mention that my favorite is still chatterbox turbo with the paralinguistic tags. you can even shift the tone with shit like [advertisement] or [sarcastic]
>>107975337
i used 7b vibevoice when it first came out, but it would always generate some musical chime at the beginning of the audio. does it still do that?
>>107975441
Try this for voice changing: https://github.com/ysharma3501/LinaCodec I haven't tested it. But you can be our guinea pig.
>>107975441
IndexTTS2 is better than qwentts for emotions, just check it here: https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
>>107975479
>>107975570 (me)
Nevermind that. It can.
>>107975607
Yeah. I was just checking the code. You can extract the content and global features on encoding and then mix them with the target's on decoding.
does anyone have any idea if a xianxia lorebook exists? the chinks should obviously have something, but I'm a gweilo who needs it in english or at least in chinese so I can have it machine translated back to english
>>107971408
yeah this is shit compared to echo
https://voca.ro/1M5IG6YiY5hP
>>107975985
>is there an extension for sillytavern that can automatically summarize a conversation
Yes. There's a "Summarize" option in the menu hidden under the button with three blocks.
>and turn it into a memory for a lore book?
No, but there is a setting that automatically adds the summary to the prompt.
>>107976167
https://huggingface.co/llama-anon/petra-13b-instruct-gguf/blob/main/petra_q8.gguf
File: file.png (69.2 KB)
>>107968112
idk what this means for us, but something about transformers v5
File: clips.png (352.8 KB)
>>107970865
They're claiming the exact opposite, actually, for engram layers at least. (Though, well, there's still such a thing as too many, forming a U-shaped curve.) They say that relieving the traditional layers of the responsibility to manage trivial n-gram associations makes the model smarter as a whole.
>>107970033
Their provisional scaling law says that only around 20% of the model should be engram, so it won't be a massive shift in either direction. That said, they did mention that access frequency follows a Zipf distribution, so I'd guess you could indeed move much of that 20% into very slow storage.
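Quick sanity check on the Zipf point, assuming an exponent of s = 1 (my assumption, not a number from their paper): a small hot prefix of entries already covers most accesses, which is what makes the slow-storage idea plausible.

```python
def zipf_coverage(n_entries, hot_fraction, s=1.0):
    """Fraction of accesses served by the most-frequent hot_fraction of entries,
    if the access frequency of rank r is proportional to 1 / r**s."""
    weights = [1.0 / r**s for r in range(1, n_entries + 1)]
    hot = int(n_entries * hot_fraction)
    return sum(weights[:hot]) / sum(weights)

# With a million entries, the hottest 10% serve roughly 84% of lookups,
# so the cold 90% could sit on much slower storage.
print(round(zipf_coverage(1_000_000, 0.10), 2))
```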
File: spilc.png (173.5 KB)
>>107976466
Depends what you mean by fundamental. The training process is tailored for them. Not theoretically impossible to staple one on and fill it in with post-training I suppose, but they're for sure not plug-and-play. You'd lose most of the benefit doing this anyway, because the existing model has already learned the info the engram layers are intended to encode.
>>107976509
>>107976516
Hm. I hope we can eventually see some 12-24b creative writing models made using engram. The lack of attention towards small models and optimization in general is quite bothersome.
>>107976576
Engram 27b seems to score better on benchmarks while using less compute than a pure MoE so I expect small model enjoyers to eat good. I'm very bullish on engram and I predict most if not all future models will have conditional memory.
>>107976430 (me)
>>107976576
>>107976668
Now that I think about it, I was too rash in saying that 20% offloaded is ideal. While loading more into engram might erode the benefit of such infrequent access, my first screenie does mention that reducing the MoE layers to only 40% of the parameter budget maintains good performance. If a fatter 60% engram part could still reasonably be kept in slow RAM or NVMe, you could get a model with the VRAM usage of a 24b that acts like a 60b. It's like when people thought the chinchilla scaling laws were the end-all, even though being technically inefficient with training compute makes for cheaper inference. Ofc, since we don't actually have the models yet, this could all be bs.
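Back-of-envelope for that split, using only the numbers from this post (a hypothetical 60b model with 40% as conventional MoE layers; nothing measured):

```python
TOTAL_B = 60.0       # total parameters, billions (hypothetical model)
MOE_FRACTION = 0.40  # share kept as ordinary MoE/attention layers in VRAM

vram_resident_b = TOTAL_B * MOE_FRACTION        # what actually sits in VRAM
engram_offloaded_b = TOTAL_B - vram_resident_b  # lives in slow RAM / NVMe

print(vram_resident_b, engram_offloaded_b)  # 24b in VRAM, 36b offloaded
```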
>>107976698
>>107975711
FUK U BLOODY BASTARD, ECHO SUPERIOR FOR INDIAN SUPERPOWER 2030. FUK U BLOODY. FUK U BLOODY. REDEEM ECHO.
https://voca.ro/18bmt6pssaoK
File: migu.jpg (1.9 MB)
>>107976865
kekked
I mean, echo is good for how small it is, but it does get on my nerves when anons say this stuff is SOTA. It's not; it all has that jarring tiktok-TTS robotic quality that ruins it. VV, with all its jankiness and need to reroll gens, just has way better output when it works, for ~20GB VRAM.
https://vocaroo.com/18fM4D7nlWaJ
https://vocaroo.com/1n9XIPJt6DmY
https://vocaroo.com/1fil4oj9qLN8
>>107975595
>IndexTTS2 is better than qwentts for emotions just check it here https://huggingface.co/spaces/IndexTeam/IndexTTS-2-Demo
lmao fuck i've been building effectively this for about 3 months but with porn vectors, didn't know it already existed
File: Untitled.png (13.4 KB)
>>107977622
>>107977622
>>107977622