Thread #108078850
File: Sirens.jpg (446.9 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108067607 & >>108057380

►News
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108067607

--Papers:
>108074961
--Real-time STT model recommendations and AMD GPU deployment with Whisper.cpp:
>108072225 >108072400 >108072561 >108072577 >108072787 >108072799 >108072811 >108072928 >108072952 >108073000
--Feasibility of speculative decoding without draft models using batched parallel inference:
>108077025 >108077060 >108077099 >108077101 >108077114 >108077137 >108077417 >108077176 >108077197 >108077298 >108077267 >108077321 >108077356 >108077374 >108077428
--Anthropic disables prefill in Claude Opus 4.6 API to prevent misuse:
>108068386 >108072150 >108072882 >108072896 >108072899 >108073088 >108074528 >108075007 >108074281 >108074286
--Qwen3-Coder-Next performance evaluation with temperature sensitivity issues:
>108067656 >108067836 >108067860 >108067946 >108067971 >108067989 >108073119
--GPT-5.3-Codex outperforms GPT-5.2-Codex in benchmark tests:
>108069949
--Testing model knowledge cutoffs using OpenAI Responses API awareness:
>108071195
--Step-3.5-Flash support added to ikawrakow's llama.cpp fork:
>108070436 >108070476 >108070566 >108071304 >108071316 >108073024
--Small TTS model recommendations and output consistency tips:
>108077276 >108077324 >108077327 >108077334 >108077357 >108077359
--Kobold phrase banning vs llama.cpp string bans for roleplay use:
>108071246 >108071323 >108071469 >108071619
--Strategies for summarizing and categorizing large Discord message datasets:
>108075539 >108075614 >108076851
--Dual GPU PCIe lane allocation for X870/9950x systems with pipeline parallelism considerations:
>108073548 >108074065
--Exploring web search frontend alternatives for local LLMs:
>108071960 >108071986 >108072041 >108073241
--Step3.5 Flash support merged into llama.cpp:
>108077798
--Rin and Miku (free space):
>108067820 >108073563 >108074616 >108076620

►Recent Highlight Posts from the Previous Thread: >>108067610

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Is there a way to have something that uses llama.cpp to load models and can ban actual sentences or words, not just tokens through logit bias?
Maybe something doing that through llama-cpp-python?
(Outside of koboldcpp and its antislop feature.)
Has no one actually made something like that?
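For reference, the closest I can sketch with llama-cpp-python is a token-level logits processor (model path and names are made up), which is exactly the thing that can't catch strings spanning multiple tokens:

import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="model.gguf")  # placeholder path

BANNED = ["...", "…"]  # substrings to suppress

# Precompute every single token whose decoded piece contains a banned
# substring. Strings that only appear across token boundaries are the
# unsolved part.
banned_ids = [
    t for t in range(llm.n_vocab())
    if any(b in llm.detokenize([t]).decode("utf-8", errors="ignore")
           for b in BANNED)
]

def ban(input_ids, scores):
    scores[banned_ids] = -np.inf  # zero probability after softmax
    return scores

out = llm.create_completion(
    "Write one sentence about rain.",
    max_tokens=64,
    logits_processor=LogitsProcessorList([ban]),
)
print(out["choices"][0]["text"])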
>>
>>108078930
how does this improve kld?
>>
Has anyone managed to make ace step 1.5 base or base-sft work in comfy? The turbo ver is atrocious.
>>
>>108078930
I have good news and bad news for you.
Good news: regex ban exists https://github.com/ikawrakow/ik_llama.cpp/pull/1243
Bad news: I'm a filthy vibecoder so it will take a while to get accepted, but at least I test if my shitcode works for my usecases, unlike firecoperana
>>
After being thoroughly disappointed in basically everything sub-200B, and not feeling like spending several thousand dollars on my PC to run bigger, I've been trying to beat the shit out of small models until they follow my rules: prompt repetition based off that one arxiv paper, very strict rules so I get the most barebones writing possible to then edit myself, plus tricks like repurposing think blocks to keep only the newest scene information. Feeding it a chapter scene by scene, I get it to actually focus, not spam nonsense filler, and hand me a skeleton for what I ask. So far I like it better than what I get out of the biggest shit I can run.
Who'd have thought that going "hey llm, I want you to write the tedious mundane shit of this chapter for me" without it trying to write like a woman YA novelist would be this involved
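Roughly what the repetition part looks like (pure-Python sketch; the rule text, repeat count, and function name are my own stand-ins, not the paper's exact recipe):

RULES = (
    "Rules:\n"
    "- Bare-bones prose only. No filler, no purple metaphors.\n"
    "- Follow the scene outline exactly. Do not invent events.\n"
)

def build_prompt(scene_outline: str, repeats: int = 3) -> str:
    # The whole trick: restate the rule block several times so the
    # model can't drift away from it by the time it starts writing.
    return RULES * repeats + "\nScene outline:\n" + scene_outline + "\n\nDraft:"

print(build_prompt("Anon fixes the generator while the storm picks up."))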
>>
Adding "..." and "…" to banned strings was the best decision in my SillyTaverning career. Just saying.
>>
k2.5 is one of those models that needs about 20 different em-dash related bans to be remotely usable
>>
>>108079117
>barebones writing to then edit myself
I member doing that! Silly times. What the fuck was I even doing at that point? Should have just opened notepad.txt and written everything myself.

Thankfully I have 4.6 and 4.7 now.
>>
I hope the new lmarena model is GLM-4.7 with dflash. The paper finally dropped yesterday, and their initial Qwen speedups were pretty good.
https://arxiv.org/abs/2602.06036
>>
>>108079117
>>108079134
Why don't we get the logits from the double prompt and use them to distill the same model for infinite recursive self-improvement?
>>
Are 4.6 and 4.7 flash versions worth bothering with for a vramlet or are they just pure shit wearing the glm logo?
>>
>>108079134
Editing and writing more or less go hand in hand, and honestly it gives me more motivation to spitefully fix whatever dumbass shit these things come up with than to slog through writing everything myself. I just want the stupid word predictor to give me a floorplan I can renovate and add onto, so that in the meantime I can do something else instead of researching a topic just to explain it to a reader who may not know it. Plus, once in a while it comes up with something I wouldn't have pursued because I assumed it would be retarded, but there's a grain of a good idea in it that I can repurpose.
As for your 600b model, I can guarantee that if you posted a short story it wrote on some topic, I could point out at least four lazy writing habits it shares with virtually every model down to a 12b, and with three quarters of human writing.
>>
>>108079079
Thanks, I'll check that out, anon.
>>
>>108079129
git good

"*"
"..."
"~"
"—"
"“"
"”"
"…"
>>
Welcome, lmstudio fans! Let's make the Tiger Mom that gets us to meet our goals and objectives on time and under budget.
>>
>>108079377
and one more dash (the en dash) they sometimes use instead of a hyphen

"–"
>>
>>108079433
noob here.

why not ban "("?
>>
>>108079488
never comes up for me. if you see it and don't want to then add that too. being able to ban annoying strings is the best thing they've added in a long time
>>
anima is good
but tough
>>
>>108079613
Clearly not Anima
>>
>>108079133
It has no problem with them for me, just a problem with meaningless flowery bullshit
>>
>>108078930
>>108079079
>>108079267
samefag
>>
Instead of trying to rng something with ACEstep, wouldn't it be better to have a library of existing sounds (like fl studio) and then let the model use that and piece something together?
>>
oops, posted in wrong thread, reposting:

You are a tiger mother. Your son is the user. He has no job and hasn't applied for work in months. He owns a computer but has never been on a date. Your task is to honor your ancestors by producing grandchildren through him, your sole heir. He likes to be called "anon".
>>
what schizo nonsense is that
>>
>>108079775
This + using neurolinguistic programming + dopamine circuit hijack
>>
When will we get pic related....
>>
Was trying to figure out why K2.5 was so dogshit at times, spouting random gibberish, and I think I've figured it out.
Nobody use IQ2 K2.5, ever, at all, and nobody EVER use unsloth quants. I can only imagine how bad their quant is. I should have waited for ubergarm.
>>
DEEEPSEEEKV4 WHEEEEEEN
IWANT ENGRAAAAM
ARRRRRRRRRRRRRRRRRRRRRRGH
>>
I want 1TB of RAM. Is there a sweepstakes for that?
>>
>>108079897
Yeah, they're all retarded except ubergarm's or the Q4_X from AesSedai.
Same for K2-Thinking. smol-IQ2_KS passed the official eval: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/15
>>
How good is STT at handling heavily accented english? Should I even bother?
