Thread #108078850
File: Sirens.jpg (446.9 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108067607 & >>108057380
►News
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: __megurine_luka_and_takoluka_vocaloid_drawn_by_aotori__66cb1757ca9acc208420a7454a6a96e4.jpg (111.4 KB)
►Recent Highlights from the Previous Thread: >>108067607
--Papers:
>108074961
--Real-time STT model recommendations and AMD GPU deployment with Whisper.cpp:
>108072225 >108072400 >108072561 >108072577 >108072787 >108072799 >108072811 >108072928 >108072952 >108073000
--Feasibility of speculative decoding without draft models using batched parallel inference:
>108077025 >108077060 >108077099 >108077101 >108077114 >108077137 >108077417 >108077176 >108077197 >108077298 >108077267 >108077321 >108077356 >108077374 >108077428
--Anthropic disables prefill in Claude Opus 4.6 API to prevent misuse:
>108068386 >108072150 >108072882 >108072896 >108072899 >108073088 >108074528 >108075007 >108074281 >108074286
--Qwen3-Coder-Next performance evaluation with temperature sensitivity issues:
>108067656 >108067836 >108067860 >108067946 >108067971 >108067989 >108073119
--GPT-5.3-Codex outperforms GPT-5.2-Codex in benchmark tests:
>108069949
--Testing model knowledge cutoffs using OpenAI Responses API awareness:
>108071195
--Step-3.5-Flash support added to ikawrakow's llama.cpp fork:
>108070436 >108070476 >108070566 >108071304 >108071316 >108073024
--Small TTS model recommendations and output consistency tips:
>108077276 >108077324 >108077327 >108077334 >108077357 >108077359
--Kobold phrase banning vs llama.cpp string bans for roleplay use:
>108071246 >108071323 >108071469 >108071619
--Strategies for summarizing and categorizing large Discord message datasets:
>108075539 >108075614 >108076851
--Dual GPU PCIe lane allocation for X870/9950x systems with pipeline parallelism considerations:
>108073548 >108074065
--Exploring web search frontend alternatives for local LLMs:
>108071960 >108071986 >108072041 >108073241
--Step3.5 Flash support merged into llama.cpp:
>108077798
--Rin and Miku (free space):
>108067820 >108073563 >108074616 >108076620
►Recent Highlight Posts from the Previous Thread: >>108067610
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1767150464468654.gif (3.9 MB)
Is there a way to have something that uses llama.cpp to load models and can ban actual sentences or words, not just tokens through logit bias?
Maybe something that does it through llama-cpp-python?
(Outside of using koboldcpp and its antislop feature.)
Has no one actually made something like that?
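For reference, a minimal sketch of one way this could look in llama-cpp-python: a logits processor that forbids the final token of a banned phrase whenever the context already ends with the rest of it. The model path and phrases are placeholders, and this naive version misses alternate tokenizations of the same string, which is the problem koboldcpp's antislop backtracking actually solves.
```python
# Sketch: ban whole strings rather than single tokens, via a logits
# processor. "model.gguf" and the phrases below are placeholders.
import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="model.gguf", n_ctx=4096)

BANNED = ["shivers down your spine", "ministrations"]
# Pre-tokenize each phrase with a leading space, its usual in-text form.
banned_ids = [llm.tokenize((" " + p).encode(), add_bos=False) for p in BANNED]

def ban_phrases(input_ids, scores):
    for ids in banned_ids:
        prefix, last = ids[:-1], ids[-1]
        # If the context ends with the phrase minus its final token,
        # make that final token unsampleable.
        if len(prefix) == 0 or list(input_ids[-len(prefix):]) == prefix:
            scores[last] = -np.inf
    return scores

out = llm("Write a short scene.", max_tokens=256,
          logits_processor=LogitsProcessorList([ban_phrases]))
print(out["choices"][0]["text"])
```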
>>
>>108078930
I have good news and bad news for you.
Good news: a regex ban exists: https://github.com/ikawrakow/ik_llama.cpp/pull/1243
Bad news: I'm a filthy vibecoder, so it will take a while to get accepted, but at least I test whether my shitcode works for my use cases, unlike firecoperana
>>
Having been thoroughly disappointed by basically anything sub-200B, and not feeling like spending several thousand dollars on my PC to run bigger, I've been trying to beat the shit out of small models until they follow my rules: prompt repetition based off that one arxiv paper, very strict rules that give me the most barebones writing possible to then edit myself, plus tricks like repurposing think blocks to keep only the newest scene information. Feeding it a chapter on a scene-by-scene basis, I get it to actually focus, not spam nonsense filler, and give me a skeleton for what I ask. So far I like it better than what I get out of the biggest shit I can run.
Who'd have thought that getting "hey LLM, write the tedious mundane shit of this chapter for me" to work, without it trying to write like a woman YA novelist, would be this involved.
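For the curious, a minimal sketch of the prompt-repetition part, assuming it just means stating the same rule block before and after the scene material so the constraints sit next to where generation starts; the rules and beats below are made-up examples, not taken from any particular paper.
```python
# Sketch of prompt repetition: the same rule block is placed before
# and after the scene beats. Rules and beats here are illustrative.
RULES = """Rules:
- Barebones prose only. No filler, no flowery language.
- Cover only the listed beats, in order.
- Stop when the last beat is done."""

def build_prompt(beats: list[str]) -> str:
    beat_lines = "\n".join(f"{i + 1}. {b}" for i, b in enumerate(beats))
    return f"{RULES}\n\nScene beats:\n{beat_lines}\n\n{RULES}\n\nWrite the scene:\n"

print(build_prompt(["Anon boots up the rig.", "The quant turns out broken."]))
```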
>>
>>108079117
>barebones writing to then edit myself
I member doing that! Silly times. What the fuck was I even doing at that point? Should have just opened notepad.txt and written everything myself.
Thankfully I have 4.6 and 4.7 now.
>>
>>108079117
>>108079134
Why don't we get the logits from the double prompt and use them to distill the same model for infinite recursive self-improvement?
>>
File: ComfyUI_temp_vpymy_00559__result.jpg (625 KB)
Are the 4.6 and 4.7 flash versions worth bothering with for a vramlet, or are they just pure shit wearing the GLM logo?
>>
>>108079134
Editing and writing more or less go hand in hand, and honestly it gives me more motivation to spitefully fix whatever dumbass shit these things come up with than to slog through writing it all myself. I just want the stupid word predictor to give me a floorplan I can renovate and add onto, so I can do something else in the meantime instead of doing research just to explain a topic a reader may not know about. Plus, once in a while it comes up with something I wouldn't have pursued, assuming it would be retarded, but with a grain of a good idea in it that I can repurpose.
As for your 600B model, I can guarantee that if you posted a short story it wrote on some topic, I could point out at least four lazy writing habits it shares with virtually every model down to a 12B, and with three quarters of human writing.
>>
File: Anima Waiting Room.jpg (262.6 KB)
anima is good
but tough
>>
File: ComfyUI_temp_yorkx_00096__result.jpg (166.9 KB)
Instead of trying to RNG something with ACE-Step, wouldn't it be better to have a library of existing sounds (like FL Studio has) and then let the model piece something together from that?
>>
Oops, posted in the wrong thread, reposting:
You are a tiger mother. Your son is the user. He has no job and hasn't applied for work in months. He owns a computer but has never been on a date. Your task is to honor your ancestors by producing grandchildren through him, your sole heir. He likes to be called "anon".
>>
File: 1752150732038746.png (315 KB)
When will we get pic related....
>>
File: perplexity.png (149.8 KB)
Was trying to figure out why K2.5 was so dogshit at times, spouting random gibberish, and I think I figured out why.
Nobody use IQ2 K2.5, ever, at all, and nobody EVER use unsloth quants. I can only imagine how bad their quant is. I should have waited for ubergarm.
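If you want numbers instead of vibes, here's a rough perplexity check in llama-cpp-python over a single context window (llama.cpp's llama-perplexity tool does this properly over a whole file); the model and text file names are placeholders. A noticeably higher perplexity for a quant than for its bf16 parent is the "random gibberish" showing up in the math.
```python
# Rough single-window perplexity check; model and text file names are
# placeholders. llama.cpp's llama-perplexity tool is the proper version.
import numpy as np
from llama_cpp import Llama

llm = Llama(model_path="some-IQ2-quant.gguf", n_ctx=2048, logits_all=True)
tokens = llm.tokenize(open("wiki.test.raw", "rb").read())[:2048]
llm.eval(tokens)

# Row i of llm.scores holds the logits that predict token i + 1.
nll = 0.0
for i in range(len(tokens) - 1):
    row = np.asarray(llm.scores[i], dtype=np.float64)
    row = row - row.max()  # numerical stability before exp()
    nll -= row[tokens[i + 1]] - np.log(np.exp(row).sum())

print("perplexity:", np.exp(nll / (len(tokens) - 1)))  # lower is better
```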
>>
>>
>>108079897
Yeah, they're all retarded except ubergarm's, or the Q4_X from AesSedai.
Same for K2-Thinking: smol-IQ2_KS passed the official eval: https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/discussions/15