Thread #108078850
File: Sirens.jpg (446.9 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108067607 & >>108057380

►News
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108067607

--Papers:
>108074961
--Real-time STT model recommendations and AMD GPU deployment with Whisper.cpp:
>108072225 >108072400 >108072561 >108072577 >108072787 >108072799 >108072811 >108072928 >108072952 >108073000
--Feasibility of speculative decoding without draft models using batched parallel inference:
>108077025 >108077060 >108077099 >108077101 >108077114 >108077137 >108077417 >108077176 >108077197 >108077298 >108077267 >108077321 >108077356 >108077374 >108077428
--Anthropic disables prefill in Claude Opus 4.6 API to prevent misuse:
>108068386 >108072150 >108072882 >108072896 >108072899 >108073088 >108074528 >108075007 >108074281 >108074286
--Qwen3-Coder-Next performance evaluation with temperature sensitivity issues:
>108067656 >108067836 >108067860 >108067946 >108067971 >108067989 >108073119
--GPT-5.3-Codex outperforms GPT-5.2-Codex in benchmark tests:
>108069949
--Testing model knowledge cutoffs using OpenAI Responses API awareness:
>108071195
--Step-3.5-Flash support added to ikawrakow's llama.cpp fork:
>108070436 >108070476 >108070566 >108071304 >108071316 >108073024
--Small TTS model recommendations and output consistency tips:
>108077276 >108077324 >108077327 >108077334 >108077357 >108077359
--Kobold phrase banning vs llama.cpp string bans for roleplay use:
>108071246 >108071323 >108071469 >108071619
--Strategies for summarizing and categorizing large Discord message datasets:
>108075539 >108075614 >108076851
--Dual GPU PCIe lane allocation for X870/9950x systems with pipeline parallelism considerations:
>108073548 >108074065
--Exploring web search frontend alternatives for local LLMs:
>108071960 >108071986 >108072041 >108073241
--Step3.5 Flash support merged into llama.cpp:
>108077798
--Rin and Miku (free space):
>108067820 >108073563 >108074616 >108076620

►Recent Highlight Posts from the Previous Thread: >>108067610

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Is there a way to have something that uses llama.cpp to load models and can ban actual sentences or words, not just tokens through logit bias?
Maybe something doing that through llama-cpp-python?
(Outside of koboldcpp and its antislop feature.)
Has no one actually made something like that?
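For reference, the closest I can sketch with llama-cpp-python is a token-level logits processor (model path and names are made up), which is exactly the thing that can't catch strings spanning multiple tokens:

import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="model.gguf")  # placeholder path

BANNED = ["...", "…"]  # substrings to suppress

# Precompute every single token whose decoded piece contains a banned
# substring. Strings that only appear across token boundaries are the
# unsolved part.
banned_ids = [
    t for t in range(llm.n_vocab())
    if any(b in llm.detokenize([t]).decode("utf-8", errors="ignore")
           for b in BANNED)
]

def ban(input_ids, scores):
    scores[banned_ids] = -np.inf  # zero probability after softmax
    return scores

out = llm.create_completion(
    "Write one sentence about rain.",
    max_tokens=64,
    logits_processor=LogitsProcessorList([ban]),
)
print(out["choices"][0]["text"])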
>>
>>108078930
how does this improve kld?
>>
Has anyone managed to make ace step 1.5 base or base-sft work in comfy? The turbo ver is atrocious.
>>
>>108078930
I have good news and bad news for you.
Good news: regex ban exists https://github.com/ikawrakow/ik_llama.cpp/pull/1243
Bad news: I'm a filthy vibecoder so it will take a while to get accepted, but at least I test if my shitcode works for my usecases, unlike firecoperana
>>
After being thoroughly disappointed in basically everything sub-200B, and not feeling like spending several thousand dollars on my PC to run bigger, I've been trying to beat the shit out of small models until they follow my rules: prompt repetition based off that one arxiv paper, very strict rules so I get the most barebones writing possible to then edit myself, plus tricks like repurposing think blocks to keep only the newest scene information. Feeding it a chapter scene by scene, I get it to actually focus, not spam nonsense filler, and hand me a skeleton for what I ask. So far I like it better than what I get out of the biggest shit I can run.
Who'd have thought that going "hey llm, I want you to write the tedious mundane shit of this chapter for me" without it trying to write like a woman YA novelist would be this involved
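Roughly what the repetition part looks like (pure-Python sketch; the rule text, repeat count, and function name are my own stand-ins, not the paper's exact recipe):

RULES = (
    "Rules:\n"
    "- Bare-bones prose only. No filler, no purple metaphors.\n"
    "- Follow the scene outline exactly. Do not invent events.\n"
)

def build_prompt(scene_outline: str, repeats: int = 3) -> str:
    # The whole trick: restate the rule block several times so the
    # model can't drift away from it by the time it starts writing.
    return RULES * repeats + "\nScene outline:\n" + scene_outline + "\n\nDraft:"

print(build_prompt("Anon fixes the generator while the storm picks up."))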
>>
Adding "..." and "…" to banned strings was the best decision in my SillyTaverning career. Just saying.
>>
k2.5 is one of those models that needs about 20 different em-dash related bans to be remotely usable
>>
>>108079117
>barebones writing to then edit myself
I member doing that! Silly times. What the fuck was I even doing at that point? Should have just opened notepad.txt and written everything myself.

Thankfully I have 4.6 and 4.7 now.
>>
I hope the new lmarena model is GLM-4.7 with dflash. The paper finally dropped yesterday, and their initial Qwen speedups were pretty good.
https://arxiv.org/abs/2602.06036
>>
>>108079117
>>108079134
Why don't we get the logits from the double prompt and use them to distill the same model for infinite recursive self-improvement?
>>
Are 4.6 and 4.7 flash versions worth bothering with for a vramlet or are they just pure shit wearing the glm logo?
>>
>>108079134
Editing and writing more or less go hand in hand, and honestly it gives me more motivation to spitefully fix whatever dumbass shit these things come up with than to slog through writing everything myself. I just want the stupid word predictor to give me a floorplan I can renovate and add onto, so that in the meantime I can do something else instead of researching a topic just to explain it to a reader who may not know it. Plus, once in a while it comes up with something I wouldn't have pursued because I assumed it would be retarded, but there's a grain of a good idea in it that I can repurpose.
As for your 600b model, I can guarantee that if you posted a short story it wrote on some topic, I could point out at least four lazy writing habits it shares with virtually every model down to a 12b, and with three quarters of human writing.
>>
>>108079079
Thanks, I'll check that out, anon.
>>
>>108079129
git good

"*"
"..."
"~"
"—"
"“"
"”"
"…"
>>
Welcome, lmstudio fans! Let's make the Tiger Mom that gets us to meet our goals and objectives on time and under budget.
>>
>>108079377
and one more dash (the en dash) they sometimes use instead of a hyphen

"–"
>>
>>108079433
noob here.

why not ban "("?
>>
>>108079488
never comes up for me. if you see it and don't want to then add that too. being able to ban annoying strings is the best thing they've added in a long time
>>
anima is good
but tough
>>
>>108079613
Clearly not Anima
>>
>>108079133
It has no problem with them for me, just a problem with meaningless flowery bullshit
>>
>>108078930
>>108079079
>>108079267
samefag
>>
Instead of trying to rng something with ACEstep, wouldn't it be better to have a library of existing sounds (like fl studio) and then let the model use that and piece something together?
>>
oops, posted in wrong thread, reposting:

You are a tiger mother. Your son is the user. He has no job and hasn't applied for work in months. He owns a computer but has never been on a date. Your task is to honor your ancestors by producing grandchildren through him, your sole heir. He likes to be called "anon".
>>
what schizo nonsense is that
>>
>>108079775
This + using neurolinguistic programming + dopamine circuit hijack
>>
When will we get pic related....
>>
Was trying to figure out why K2.5 was so dogshit at times, spouting random gibberish, and I think I've figured it out.
Nobody use IQ2 K2.5, ever, at all, and nobody EVER use unsloth quants. I can only imagine how bad their quant is. I should have waited for ubergarm.
>>
DEEEPSEEEKV4 WHEEEEEEN
IWANT ENGRAAAAM
ARRRRRRRRRRRRRRRRRRRRRRGH
>>
I want 1TB of RAM. Is there a sweepstakes for that?
>>
>>108079897
Yeah, they're all retarded except ubergarm's or the Q4_X from AesSedai.
Same for K2-Thinking. smol-IQ2_KS passed the official eval: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/15
>>
How good is STT at handling heavily accented english? Should I even bother?
