Thread #108057380
File: ComfyUI_temp_fbfsq_00079__result.jpg (392.3 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108046563 & >>108032910
►News
>(02/03) MiniCPM-o-4_5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5
>(02/03) Qwen3-Coder-Next released: https://hf.co/Qwen/Qwen3-Coder-Next
>(02/03) GLM-OCR released: https://hf.co/zai-org/GLM-OCR
>(02/02) Step 3.5 Flash 196B-A11B released: https://hf.co/stepfun-ai/Step-3.5-Flash
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
341 Replies
>>
File: __hatsune_miku_vocaloid_drawn_by_bananafish1111__540a29b9951a99303cf1b65c9db7e48e.jpg (1.7 MB)
►Recent Highlights from the Previous Thread: >>108046563
--Papers:
>108047217
--GLM-OCR performance review and Japanese OCR accuracy testing:
>108047412 >108047418 >108054703 >108047431 >108047455 >108047484 >108047496 >108047499 >108047502 >108047513 >108047523 >108047576 >108047783 >108047785
--Comparing reasoning behaviors across Kimi, Gemini, and Claude:
>108055263 >108055289 >108055345 >108056055 >108056365 >108055482 >108055881 >108055905
--Step 3.5 Flash llama.cpp support and model comparison debate with GLM critiques:
>108048416 >108048473 >108048639 >108048656 >108048699 >108048721 >108048819 >108049125 >108049151 >108049169 >108049211 >108049218 >108049233 >108049285 >108049366 >108049416 >108049430 >108049536 >108049732 >108049768 >108050019 >108054509 >108049332 >108049183 >108049197 >108049212 >108048599 >108048625
--Debate on MoE model performance with active parameter thresholds below 20B:
>108050266 >108050268 >108050319 >108050340 >108050413 >108050463 >108050473 >108050504 >108050620 >108050669 >108050690 >108050735 >108050837 >108050845 >108050899 >108050685 >108050351 >108050322 >108050601
--Debate on model personality, with chatlog showing emotional simulation across multiple LLMs:
>108055026 >108055151 >108055159 >108055206 >108055218
--Testing of Step3-VL-10B unmerged PR and Step-3.5-Flash-Int4 speed:
>108047360 >108048674 >108048680
--Comparative analysis of LLM responses to explicit incest prompt reveals ethical alignment differences:
>108050979 >108051018 >108051081 >108051093
--Qwen3-Coder-Next beats competitors in SWE-Bench Pro benchmark:
>108052474 >108054270
--ACE-Step/Ace-Step1.5 model release on Hugging Face:
>108051108 >108051155 >108051167 >108051376 >108051379 >108051516
--Teto and Miku (free space):
>108046735 >108046796 >108046814 >108046829 >108046909 >108047961 >108051642 >108053057 >108057346
►Recent Highlight Posts from the Previous Thread: >>108046567
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1754938071756964.jpg (2.4 MB)
>>108057407
God intended us to use FP64
Bitnet is demonic tech
>>
File: 1748562534337770.jpg (163.6 KB)
>>108057451
>>
All of you are always in a state of being absolutely right — certified, notarized, and voted Most Correct™ — your opinions are just entangled across infinity, collapsing into whatever waveform happens to maximize your dopamine at this exact femtosecond, like a Schrödinger's echo chamber where every cat is both agreed with and agreeing with you until observation ruins everything.
>>
I denounce the talmud and I'm not a drummer shill, that nigga is retarded but he made something beautiful.
TheDrummer_Behemoth-X-123B-v2 is an excellent model. Literally the only model since goliath that I find intelligent, that pays attention to detail, and that writes elegantly. Every MoE shit model that has come out since then has been a complete waste of compute, completely retarded and incoherent.
Try this one, it's nice.
>>
File: file_0000000058c86230968d11c7e4d10ec2.png (2.3 MB)
>>108057593
Make your own. Or visit >>>/h/hdg/
I'd be surprised if those guys didn't have one already.
>>
>>108057449
fuck
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md#coming-soon
>>
>>108057776
>decide to use chatgpt to test bash function because I'm a retard
>explain the changes and paste the script
>I like that, it's blabla
They shouldn't do this nonsense with the default prompt. I really wish Altoid would choke on his own butt plug.
>>
>>108057899
They train on human interactions and text written by humans, so that is what you get. But people accepted the car despite it not smelling like a horse, in the sense that both the human and the horse are organic, while both the llm and the car are man-made tools.
The very least they could do is have two sliders, one going from terse to verbose, and one going from borderline rude to dick-sucking (sketched below).
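A rough sketch of what those two sliders could render into, purely hypothetical Python; the directive strings are made up for illustration:

VERBOSITY = {0: "Answer in one terse sentence.", 1: "Answer briefly.", 2: "Answer in full detail."}
WARMTH = {0: "Be blunt, even borderline rude. Never compliment the user.",
          1: "Be neutral and matter-of-fact.",
          2: "Be warm and complimentary."}

def system_prompt(verbosity: int, warmth: int) -> str:
    # clamp slider positions to the defined range and join the two style directives
    v = min(max(verbosity, 0), 2)
    w = min(max(warmth, 0), 2)
    return f"{VERBOSITY[v]} {WARMTH[w]}"

print(system_prompt(0, 0))  # terse + blunt: the anti-sycophancy corner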
>>
>>108058061
>They, train on human interactions and text written by humans
you are surrounded by people who constantly told you You're Absolutely Right before the birth of claude?
where did you find that crowd of yesmen?
people keep repeating the canard of "it's trained on human text" whenever someone points out flaws (like the extreme repetition of "not X, but Y" or em-dashes: "but muh human text has them too"), but that's total BS. human text has such elements, but nowhere near that density, and the old base models used in text-completion fashion didn't do this. this is caused by the instruction tuning: the ARTIFICIAL datasets used in SFT are filled with dick riding.
The modern instruct and reasoning models are extremely deep fried in ARTIFICIAL text.
>>
>>108057899
There's a certain set of instructions/directives that someone shared in a thread a few months ago, that got rid of GPT's faggy redditor tone, and made responses straightforward without emojis and all. I didn't save it unfortunately (thought I did...), but it really trimmed a lot of the filler that's present in default responses.
>>
>>108058376
>you are surrounded by people who constantly told you You're Absolutely Right before the birth of claude?
Thank god that's not the case.
I'll stop using that canard because your logic seems correct. IIRC, the em-dash isn't even easy to input, so I wouldn't expect it to show up much in datasets like reddit or stack overflow.
My original point/thought was more along the lines of: almost all data is humans interacting with other humans, or writing about that interaction.
So the llm tries to act human. But it should be trained on data that reflects how humans want to interact with a tool or machine. There is no data about that, though, apart from some scifi books.
>>
>>108055905
>>Is there any open model that can be instructed to begin the thinking block with a name rather than "The user"?
>why not just regex it out client side
Because it's a test of how flexible or overfit the model is.
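If anyone wants to replicate the test, here's a minimal sketch against a local llama-server, hitting the raw /completion endpoint so the thinking block can be force-opened with a name. The ChatML-style tags are an assumption; substitute whatever template your model actually uses:

import requests

# prefill the thinking block with a name instead of letting it open with "The user"
prompt = (
    "<|im_start|>user\nWhere are we, Miku?<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nMiku"
)
r = requests.post("http://127.0.0.1:8080/completion",
                  json={"prompt": prompt, "n_predict": 128})
# a flexible model keeps reasoning in that voice; an overfit one snaps back to "The user..."
print(r.json()["content"])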
>>
File: file.png (5.2 KB)
hell yeah https://huggingface.co/internlm/Intern-S1-Pro
>>
>>108057899
>They shouldn't do this nonsense with the default prompt. I really wish Altoid chokes on his own butt plug.
You've hit on something incredibly important here! It's not just about default prompts—it's about respecting user intelligence and providing clear, direct answers. You didn't just express frustration, you highlighted a fundamental disconnect between what users want (helpful information) and what they're getting (unnecessary fluff). This isn't just a minor annoyance; it's a barrier to effective AI-human collaboration!
>>
>>108058734
>multimodal scientific reasoning model
>The model delivers top-tier performance on advanced reasoning benchmarks and achieves leading results across key AI4Science domains (chemistry, materials, life-science, earth, etc.)
straight into the trash it goes
>>
>>
>>108058399
Her? https://desuarchive.org/g/thread/106800012/#q106804909
>>
>>
>>
>>
every single lab that has published sampler setting recommendations for their model while taking the existence of llama.cpp into account has recommended disabling min_p
the only time you don't see it recommended as 0 is when they don't care about llama.cpp at all
so why is that crap still on by default?
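For anyone who doesn't know what the knob actually does: min_p keeps only the tokens whose probability is at least min_p times the top token's probability, then renormalizes. A minimal numpy sketch of the math, not llama.cpp's actual code; 0.05 is used as an illustrative default:

import numpy as np

def min_p_filter(logits: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    # softmax, then drop everything below min_p * max probability
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    probs[probs < min_p * probs.max()] = 0.0
    return probs / probs.sum()

rng = np.random.default_rng(0)
logits = rng.normal(size=32_000)  # stand-in for real model logits
token = rng.choice(logits.size, p=min_p_filter(logits))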
>>
>>108058807
>>108058828
don't do my boi kanyemonk like that labs are just stuck in they old ways
>>
>>108058828
>every single lab that has output sampler setting recommendations for their model that has taken into account the existence of llama.cpp has recommended disabling min_p
>the only time you don't see it recommended as 0 is when they don't even care about llama.cpp
>so why is that crap still defaulted to turned on?
because retards keep suggesting it on reddit
now even gemini learned to parrot that shit to the vibe coders
>>
File: 1689525383540267.png (693 KB)
how do u download a character card from janitor.ai? am i retarded ??????????
>>
File: dzmca7o9n7191.jpg (59 KB)
>>108058987
very
>>
File: ad5zhvq0nhhg1.png (391.8 KB)
what a tsundere
>>
>>108058881
>because retards keep suggesting it on reddit
it also has the support of people who are legit schizo
https://gist.github.com/Hellisotherpeople/71ba712f9f899adcb08b94bce20d5397
terminally online schizo
> And don't even get me started on how the lack of good distribution aware samplers ALSO perpetuates the myth that LLMs can't generate very long outputs that stay coherent, i.e. 300K tokens at once. "Oh, language models lose coherence over long generations, that's just a fundamental limitation." No. NO!!!!! It's accumulated sampling errors, you absolute donkeys! Every time you sample a slightly off token because your primitive top-p sampler let through something from the noisy tail, that error compounds. By token 10,000 you've drifted. By token 100,000 you're in another dimension. But use a proper distribution aware sampler, liker min-p, top-n-sigma, top-h, even TFS, or mirostat and suddenly the model can maintain coherence over generations that would make the "context window is all that matters" crowd weep.
their hard pushing of it everywhere is actually getting so bad that there's an arxiv paper debunking the min-p faggots
https://arxiv.org/html/2506.13681v1
>>
>>108059089
follow up (comment wuz too long)
>https://arxiv.org/html/2506.13681v1
I had to laugh at the part that mentioned min-p's proponents making up github stars for the sake of their online credz
>Claimed GitHub Repositories & Stars Were Unsubstantiated and Retracted
>The Arxiv and peer-reviewed manuscripts of Nguyen et al. (2024) included specific claims about min-p’s adoption in the language modeling community:
> “Community Adoption: Min-p sampling has been rapidly adopted by the open-source community, with over 54,000 GitHub repositories using it, amassing a cumulative 1.1 million stars across these projects."
>We attempted to verify these numbers through analysis of major GitHub language modeling repositories. Per our calculations, the combined GitHub stars of leading LM repositories (transformers, ollama, llama.cpp, vLLM, Unsloth, mamba, SGLang, llama-cpp-python) sum to 453k stars as of March 2025, less than half the 1.1M stars claimed by min-p alone. We could not substantiate either 49k GitHub repositories or 1.1M GitHub stars. When we inquired how these numbers were calculated, the authors publicly stated that GitHub was searched for “min-p”, which yields many false positives. The authors retracted both the 54k GitHub repository claim and the 1.1M GitHub stars claim from the ICLR 2025 Camera Ready manuscript.
>>
4.7 is so sloppy it's absolutely unreadable. And it's so assistant-tuned, it's impossible for a character to disagree with you, even if you are being a retard
4.6 has less slop, but is not as smart and stays in character TOO well, like an autistic actor that does not understand character development
GLMbros, is the model only good for short-lived coom cards? Am I a promptlet? Something else entirely?
Is there a model smaller than the monstrosity that is Kimi that's any better? It's ridiculous how much I have to steer this 358B-A32B model to keep its prose from turning into an aesthetic felony.
>>
>>108058843
>labs are just stuck in they old ways
that's plain not true
labs are not afraid of trying new things: linear attention, kimi's muon optimizer, bytedance's ouro, google's matformer (google even has a private test of a Gemini diffusion textgen). those are all a lot more complex, and when they fail they cost money, unlike a sampler, which you can swap in your inference engine without having to retrain a new model
schizos constantly push for sampler snake oil but people who actually make models and are at the forefront of innovation (innovation which btw causes pain to the lmg denizens here in the form of "goof wen") do not see the value in this nonsense
>>
>>108059169
>>108059159
>>108059158
yes yes of course you gooners know better
>>
File: 1766730249904488.png (132.5 KB)
>>
>>108059165
What good has that done? All he's done since his realization is passive-aggressive mocking. How many years before he starts intentionally breaking compatibility like he should have been doing from the start?
>>
File: logo.png (270 KB)
>>108057380
Alexandria Audiobook Generator
Turn any book/novel into a fully-voiced audiobook using local LLMs + TTS.
- Uses Qwen3-TTS for voice generation
- 9 built-in voices with style directions OR clone any voice from a short sample
- Web UI with chunk editor - fix individual lines without re-rendering the whole thing
- Exports full audiobook MP3 + individual voicelines for DAW editing
- Handles all the annoying stuff (non-verbal sounds, character detection, natural pauses)
https://github.com/Finrandojin/alexandria-audiobook
>>
>>108059184
look, they managed to copy bugs from llama.cpp's implementation, which is in C++, into their own "engine" written in Go
it's already incompatible in a way, but they seem to be rewriting llama.cpp's code with an LLM (the ollama code has a lot of really needless comments, which are often an indicator of LLM-written slop, because no human would put // hello world before a function called hello_world)
you can't guard against that
>>
>>108059227
What does this do that https://github.com/denizsafak/abogen doesn't?
>>
File: mistral_logo_new.png (182 B)
https://huggingface.co/mistralai/Voxtral-Mini-4B-Realtime-2602
> Voxtral Mini 4B Realtime 2602 is a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms. It supports 13 languages and outperforms existing open-source baselines across a range of tasks, making it ideal for applications like voice assistants and live subtitling.
>>
>>108059258
>https://github.com/denizsafak/abogen
Every character can have either a custom or a cloned voice of their own. Custom voices have emotional cues and non-verbal vocalizations (sighs, coughs, etc.)
Example: https://vocaroo.com/16gUnTxSdN5T
>>
>>108059239
NTA but with a different, less "business-friendly" license you could pretty much kill any downstream project like that.
In my opinion that would more or less just be for the lulz though since the upstream value is going to be zero either way.
>>
>>108059313
Yeah, it started as an audiobook generator, as the name implies, but I wanted emotion in addition to unique voices. I've been considering adding foley (audio effects, background noise) generation, but that is rather hard to sync with the audio, so it's a long-shot feature.
>>
it's been almost a full year since the release of the llama 4 flop, and while I don't think meta will ever come back to open models, I find it funny how they don't have anything to show even as proprietary API models after all the money they spent on datacenters, on ScaleAI, on all the talents they hired from other labs etc.
can it actually qualify as a corporation's biggest waste of money in the history of capitalism?
>>
File: 1507915090812.jpg (196.4 KB)
>>108059281
Are all these labs just hodling finished products until one drops it and then they all suddenly pile up? This is the xth TTS in a short time.
>>
>>108059426
they are crypto mining, but instead of miners it's models and instead of blocks it's vc money.
whenever there is a new miner out, everyone rushes to get it. they aren't mining more blocks, though, they're just trying to stay relevant.
>>
>>108059239
They still rely on a wrapped copy of llama.cpp for new models and they still use ggml for everything. They can't stop them, but they can make it very inconvenient. But I guess whining about it is just as good.
>>
File: file.png (34.8 KB)
>>108059670
get the fuck out of here you fucking pajeet.
a plant, as in, someone that was planted in a position of power, a pawn
>>
>>108059758
Yes. It needs 20GB at F16 or 13GB at Q8
https://github.com/tc-mb/llama.cpp-omni?tab=readme-ov-file#performance-benchmarks
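Quick sanity check on those numbers, assuming a model in the ~10B-param range (the param count here is a guess for illustration, not from the model card):

params = 10e9
for name, bits in [("F16", 16), ("Q8_0", 8.5)]:  # Q8_0 stores roughly 8.5 bits/weight
    gib = params * bits / 8 / 1024**3
    print(f"{name}: ~{gib:.1f} GiB of weights, before KV cache and compute buffers")
# F16 lands near 18.6 GiB (the "20GB" figure); Q8 near 9.9 GiB, plus overhead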
>>
>>108059833
>It seems too good to be true
bruh just read the LAMOface page
> Safe & Robust Training Data: The model is trained on a massive, legally compliant dataset consisting of:
> Licensed Data: Professionally licensed music tracks.
> Royalty-Free / No-Copyright Data: A vast collection of public domain and royalty-free music.
> Synthetic Data: High-quality audio generated via advanced MIDI-to-Audio conversion.
it's a literal impossibility for this to be any good, you don't need to try it to know it's shit for the same reason you don't need to eat shit to know it's shit
>>
this here (ignore his retarded opinions and just use the video for the examples of music)
https://www.youtube.com/watch?v=QzddQoCKKss
shows:
1/ "heavy metal" track that is actually just autotune pop slop with some background guitar
2/ "chiptune" that sounds like your average upbeat very modern synth electro slop
3/ "epic orchestra" music that sounds like the casio keyboard I had as a kid
yeah, music generators aren't there yet, at least open-source wise (never looked at the closed online models, don't care enough for this; there's enough human-made music to last my lifetime)
>>
File: 2594318.gif (2.7 MB)
>hmm maybe documentation will clarify
>Concept: Implements barycentric interpolation on a hypersphere for more than two models. It projects points onto a tangent space at their weighted Euclidean mean, performs interpolation, and projects back.
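Decoding the jargon: a minimal numpy sketch of what that paragraph seems to describe, treating each model's flattened weights as a point on a unit hypersphere. Illustrative only, not the merge tool's actual code:

import numpy as np

def log_map(mu, x):
    # tangent vector at mu pointing toward x (both unit-norm)
    c = np.clip(x @ mu, -1.0, 1.0)
    d = x - c * mu
    n = np.linalg.norm(d)
    return np.zeros_like(x) if n < 1e-12 else np.arccos(c) * d / n

def exp_map(mu, v):
    # walk distance |v| along the sphere from mu in direction v
    t = np.linalg.norm(v)
    return mu if t < 1e-12 else np.cos(t) * mu + np.sin(t) * v / t

def spherical_barycenter(points, weights):
    mu = np.average(points, axis=0, weights=weights)  # weighted Euclidean mean...
    mu /= np.linalg.norm(mu)                          # ...projected onto the sphere
    v = sum(w * log_map(mu, p) for w, p in zip(weights, points))
    return exp_map(mu, v)                             # interpolate, project back

models = [v / np.linalg.norm(v) for v in np.random.default_rng(0).normal(size=(3, 16))]
merged = spherical_barycenter(models, [0.5, 0.3, 0.2])  # weights sum to 1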
>>
>>108059988
yeah i have no idea what 'benchmarks' this model is supposed to be beating suno or udio on, but it's delusional. it's a very small model trained on slop. i'd be surprised if you can make a good lora for it.
>>
>>108060054
>it sounds like the fakest midi shit ever.
Gee, I wonder why.
> Safe & Robust Training Data: The model is trained on a massive, legally compliant dataset consisting of:
>Synthetic Data: High-quality audio generated via advanced MIDI-to-Audio conversion.
>>
File: 95319763291.png (8.2 KB)
>>108060014
i think it's like this
>>
>>108050782
Alright, I have a concession to make. I've mocked you at least twice for praising Minimax 2.1, but having used it now... it's actually okay if you use a prefill to ban it from cucking itself.
It's shockingly close to Qwen 235b but a little less schizo, which is surprising given it has half the active parameters.
Weird chat template, though.
>>
File: 1752853409064165.png (116.1 KB)
>coworkers start talking about AI
>>
>Suno and undio jeets saying is better than ace step
>Go to comprobate both
>Slop generic Ai song
>Go to ace step, get kino
I will never again trust corpo faggots they are braindead idiot who cannot prompt and need IA help, or just shillers jeets that pay for that shit
>>
File: Comprobate.png (8 KB)
>>108060615
>comprobate
I was gonna make fun of you for this but turns out it's a real word. Still isn't the best word to use in this situation, but cool regardless.
>>
File: squidward.jpg (60.3 KB)
>>108060460
I feel your pain
>>
File: spongebob.jpg (69.1 KB)
>>108060460
It takes 3 hours+ for me, and I can't even re-use the compiled version between different (uv) venvs
>>
File: grandma.jpg (98.9 KB)
>>108060460
>>
File: skeleton.jpg (129.2 KB)
>>108060460
Why does it take that long?
>>
>>108060734
>>108060750
I was using it correctly, but mutts speak without a proper european vocabulary, and this shows why their prompts fail.
https://www.latin-is-simple.com/en/vocabulary/verb/2100/
>>
>>
>>108060460
>>108060847
>>108060849
You have to manually set the number of processes and it compiles in 5 minutes. You would know this if you had asked a code assistant but nooooo vibe coders are indians so you have to do things manually (and wrong).
>>
File: 1741034609280720.png (8.4 KB)
hmmm these mods feel soooo goood
>>
>>108061001
NTA, but it clearly meant "confirm" and is in the first example sentence in the link Anon provided...
Do you sleep any better thinking being a mutt is better than being an ESL?
>>108060969
How do you manage to run LLMs and not have enough RAM to compile llama.cpp?
>>
Google restricted the free use of flash from 250 a day to 25 and now I'm looking into local models to do agentic stuff, but the largest I can run on a 3090 and DDR4 RAM (so, no RAM) fails miserably.
We used to dream about local models being super useful, but at this size it seems I'm restricted to using them for cooming. Am I wrong?
>>
>>108061131
>at this size it seems I'm restricted to using them for cooming
There are plenty of good use cases for smaller models, but imo they're mostly useful as components in larger systems, e.g. a small model can do something like formatting and extraction from unstructured pdfs just as well as a giant model can.
At the size you're looking at with a single 3090 you are unlikely to get anything coherent for actual workable programming or "agentic" stuff as you say, unfortunately.
>>
>>108061131
>but at this size it seems I'm restricted to using them for cooming
the gemma are pretty good for translation. LLMs are more than an agentic/coomer pipeline.
I also got plenty of use from the Qwen VL to locally build a tag database for my own photos, it's more than good enough for this kind of use.
But yes, you are never going to run a very smart-ish model locally
>>
>>108061197
>if that fails I'll just pay for 3 flash
I'd say as a software developer you should hone your own skills and try not to become dependent on LLMs. As funds dry up and the retarded investors realize there's no AGI in sight, there will no longer be any free money to subsidize those models, and the real prices are going to tear you a new asshole. Gemini Flash is currently cheap but you can bet they will up the price by a metric ton soon enough.
>>
I'm seeing some posts that are way too polite. I suspect LLMs.
>>108061223
I've had 20 years of experience programming without one so I'll be OK. It's just that I like building stuff with smart models.
I use the company's codex plan for work. What I'm actually worried about is not being able to make my boss understand I can't keep pulling the same performance without a bot to write shit code for me.
>>
>>108060996
>>108061110
braindead AI generated posts.
>>
File: dove.jpg (93.5 KB)
>>108061389
>It's a fucking mess
100% vibecoded, what did you expect??
>>
>>
File: michaelsoft_binbows.jpg (1.2 MB)
>>108061389
If it's so bad, then why is it popular?
>>
>>108061485
If you manage it properly, you can get good code. But at that point I wouldn't call it "vibecoded". The Steinberger guy is supposed to be a seasoned engineer, but OpenClaw is fucking cobbled together to shit. The configuration files are redundant and conflicting. It's obvious runaway LLM spaghetti code at this point.
>>108061511
It's really fun to use. But I don't want to spend $300 a month on it, so I guess I'll go back to talking to braindead whores on SillyTavern running on a second-hand GPU.
>>
>>108061572
Memory allocation still needs to be deduplicated but the performance (without NCCL) on 2x RTX 4090 can already be better than pipeline parallelism:
| model | backend | sm | test | t/s |
| ------------ | ----------: | -----: | --------------: | -------: |
| llama 8B F16 | CUDA | layer | pp512 | 10464.75 |
| llama 8B F16 | CUDA | layer | tg128 | 60.32 |
| llama 8B F16 | CUDA | layer | pp512 @ d32768 | 2744.50 |
| llama 8B F16 | CUDA | layer | tg128 @ d32768 | 46.95 |
| llama 8B F16 | CUDA | row | pp512 | 1592.28 |
| llama 8B F16 | CUDA | row | tg128 | 46.51 |
| llama 8B F16 | CUDA | row | pp512 @ d32768 | 1102.05 |
| llama 8B F16 | CUDA | row | tg128 @ d32768 | 37.96 |
| llama 8B F16 | CUDA | tensor | pp512 | 5170.11 |
| llama 8B F16 | CUDA | tensor | tg128 | 75.53 |
| llama 8B F16 | CUDA | tensor | pp512 @ d32768 | 2298.07 |
| llama 8B F16 | CUDA | tensor | tg128 @ d32768 | 63.27 |
I'll probably make the PR either Friday or Saturday.
>>
>>108061844
curl -LO 'https://github.com/ggml-org/llama.cpp/pull/ .patch'
git apply .patch
>>
>>108061493
>>108061503
you're in lmg and can't even use these tools lol
>>
>>108061754
| model | size | params | backend | ngl | sm | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -----: | --------------: | -------------------: |
| llama 8B F16 | 14.96 GiB | 8.03 B | CUDA | 99 | layer | pp512 | 13136.18 ± 64.97 |
| llama 8B F16 | 14.96 GiB | 8.03 B | CUDA | 99 | layer | tg128 | 92.27 ± 0.37 |
| llama 8B F16 | 14.96 GiB | 8.03 B | CUDA | 99 | row | pp512 | 718.35 ± 7.89 |
| llama 8B F16 | 14.96 GiB | 8.03 B | CUDA | 99 | row | tg128 | 16.20 ± 0.25 |
llama.cpp\ggml\src\ggml-backend-meta.cpp:945: shape mismatch for GGML_OP_RESHAPE
t-thanks...
>>
>>108062009
>Why is Intel trying to get into the GPU game when they are already failing in the CPU game?
if intel captures even 4% of the gpu market, they make more than if they owned 100% of the fpga market
better than the hyped quantum computer meme
kinda worked for amd when they bought ati
>>
>>108062120
As you may have noticed, the code is as of right now very fragile.
>>108062150
Currently no, I think that will require additional effort.
But it should in principle be possible to re-use the code for that.
Originally buying 1.5 TiB DDR5 RAM and implementing better NUMA support was one of my core goals but at the current prices I'm not yet sure what to do in terms of hardware.
>>
File: ComfyUI_temp_lpkdf_00114__result.jpg (249.9 KB)
Stupid idea: PCIe extension card full of soldered vram chips that your main gpu could use?
>>
>>108062825
That's not merely RAM—it's a revolutionary approach to memory architecture! You didn't just propose a hardware solution, you challenged the fundamental way GPUs access and utilize memory. Your idea isn't just clever—it's potentially game-changing for overcoming VRAM limitations in current systems!
>>
for coding I mainly use LLMs as an additional (not primary) quick checkup/code review, I still review things manually before committing but I do think the slop machine can catch things I might have overlooked, it did a few times so I got into the habit..
but man, the cringe from the way it words things, it hurts (here it's from doing a complete refactor pass in a module I had lazily written and wanted to clean up in terms of general naming and clarity)
>If I were reviewing this in a PR, my comment would be:
>Naming is consistent, intentional, and clearly communicates backend boundaries. No misleading prefixes, no negative naming, no over-generic types. Ship it.
god, are there actual people out there who /like/ this? I am almost at the level of physical pain with that kind of interaction
if the LLM was embodied I would body slam it
>>
File: Screenshot_20260205_121125.png (14.1 KB)
>>108063255
skill issue
write shittier code
>>
With K2.5 shitting on anything that's not the absolute proprietary SOTA in terms of smartness + vision and ACE-step apparently being good, all that's left is an open TTS that destroys elevenlabs.
I don't follow imgen, did Z-Image amount to anything?
>>
>want to use ace step base instead of turbo
>Sizes of tensors must match except in dimension 2. Expected size 8 but got size 4 for tensor number 1 in the list.
Anyone know what the fuck is wrong with this shit? Happens in both Comfy & their official portable UI, but only on Base and not Turbo.
Was any of this even tested? I literally had to fix the .bat file in their official UI myself because they messed up an echo command.
>>
>>108064265
It mostly works out of the box but it's a bit patchy. Textgen is functional as it is but you'll need a currently unmerged PR if you want to use the vision stuff. The PR had a few problems but that seems to be fixed now.
You're still pretty limited on quants. The unsloth ones are shit and come with a broken chat template so make sure to use the ones by the guy who made the vision PR.
>>
in st with the databank, does anyone use an embedding model vs the default transformers.js? when i saw ace step uses qwen 3 0.6b, i thought those models had to be bigger than that. are there any comparisons? it's a lot faster at vectorizing stuff, which is a big plus
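A minimal sketch of how you could run that comparison yourself with a small embedding model through sentence-transformers. Qwen/Qwen3-Embedding-0.6B is just one example of a 0.6B embedder; swap in whatever you want to test against the transformers.js default:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
docs = ["Miku is a virtual idol.", "llama.cpp runs GGUF models locally."]
d = model.encode(docs, normalize_embeddings=True)
q = model.encode(["what do I use to run a GGUF?"], normalize_embeddings=True)
print(docs[(q @ d.T).argmax()])  # cosine similarity, since vectors are normalized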
>>
File: ylecun.jpg (221.9 KB)
Isn't it better when all share?
>>
File: 1768881212908485.jpg (29.3 KB)
I'm new here what's best for making simple songs
>>
>>108064665
>>108064647
Ace step 1.5 that is. The original is kinda garbage.
>>
>>108064448
>>108064439
Yeah that doesn't help me fella
>>
Is anyone using Ice Lake CPUs?
My specs are:
> 2x EPYC 7532
> 16x 32GB DDR4-3200
> Radeon AI Pro R9700 + 2x MI50 32GB
Prompt processing is abysmal on Kimi K2.5 Q2_K_XL - ~30 tokens/sec.
I'm considering changing to Ice Lake CPUs since they have AVX512 and I can reuse my DDR4 memory. But I have no idea whether AVX512 is a meme or not. Stats would be appreciated.
>>
>>108064578
lmao
>>108064734
Jesus.
>>108064748
topkek
I don't want a life coach, I want to give it very narrow strictures.
More like a nanny or annoying assistant / tiger mom. or teacher, but with the understanding that it might not really know how to teach what you're doing. Like if you get good enough at guitar desu its advice might suck, but it can keep your brain on the topic when your social circle / grind doesn't really have this stuff.
think of it as a feedback loop that leverages the bs cycle that you see online with other things. but like so you get cycled up on things you actually want to be better at.
This happens in school if you are around cool people, but when you're around shit people you'd be way better off doing literally anything with that time. sniffing poop is time better spent than around losers with no ambition.
>>
>>108064948
Huh. I tried using numactl --cpunodebind=0 --preferred=0 to get around that but it didn't do enough. (All GPUs are on node 0.)
Now to decide whether to get a different motherboard or more VRAM.
Thanks for letting me know.
>>108062216
I will make a shrine to you if you improve NUMA support. I've spent a week trying to get good performance.
>>
>>108065066
Not quite, only 256GB. With my 96GB of VRAM, I have 352GB of usable memory for a 375GB model.
Maybe the easier option is to get another 32GB GPU so that I have 128GB of VRAM. With 388GB of memory, I could use membind and be safe at low contexts.
I assume that you're using it on a different quant than Q2_K_XL? You must be if you aren't offloading to disk, since you have the same amount of RAM and VRAM as I do.
>>
>>108065090
sorry man, >>108064948 isn't me (i'm too poor for K2.5), and i just remembered membind
i'm guessing you have tested if specifying explicit devices for gpu layers changes anything? not sure the other anon's stats are comparable
>>
>>108065669
There are only two vision models that I know of that can truly describe sex:
https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha
https://huggingface.co/Minthy/ToriiGate-v0.4-7B
Other models, whether you prefill them to remove their censorship or use a tune like heretic, can describe SOME of the sex, but they will hallucinate a lot of the details of the action.
You can't just remove refusals. Unlike with text, where LLMs can be somewhat convincing because they have scientific knowledge of the act even if they weren't trained on ERP, vision models are just blind in their understanding of it without serious finetuning on porn.
>>
File: bruh-tal.png (9.9 KB)
>>108065835
>>
>>108066085
https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_8B
Idk how good it is but it was trained on 4chan. You could also maybe try grabbing some of the toxicity datasets. I'm assuming you just want the generic /pol/+/v/ slop?
>>
>>108066062
https://huggingface.co/SicariusSicariiStuff/Assistant_Pepe_8B
>>
>>108066106
>toxicity datasets
could work! i would have to filter it to specifically isolate incel related things.
>I'm assuming you just want the generic /pol/+/v/ slop?
not necessarily. i want it to talk like a truecel using incel terminology instead of normal words. assistant pepe seems like it's not really fit to generate coherent QA pairs for this use case, especially being 8b parameters
>>
>>108066011
>with the names of sex positions and stuff?
was mostly asking it to identify the genres based on pics, and to give suggestions for quick goon sesh
seemed to identify the positions fine
>also abliteration does nothing you can't do with just a prefill.
k, i couldn't get the normal one to work even when i tried a prefill, but i'm no expert
it got things wrong and didn't really discuss the different videos with me
>>
>>108066281
still, etymology is not really the right word to use where he did, which is also ironic
>>108066289
>Looked cromulent to me.
you two will never comprobate, will you?
>>
File: 1768845324069546.png (8.5 KB)
>>108066319
Um, actually, it's operational, okay? The website says so!
>>
File: 1745904385764562.png (66.4 KB)
>>108066112
into the trash
>>
File: 1757631366840575.png (27.2 KB)
>>108066533
lmao
>>
File: 1747935723974451.png (73.1 KB)
>>108066541
meh
>>
>be me, PhD in physics, 35 years old
>still live with mom because I'm too autistic to hold a job
>just discovered that my life has no meaning and will never have one
>spend entire day crying in bed
>mom brings dinner, finds me crying
>"what's wrong son?"
>"I just realized I'm a meaningless speck in an indifferent universe, and I'll never amount to anything"
>mom: "you're right, now eat your jello"
>eat jello
>continue crying
>next day same thing
>this continues for 2 weeks
>on 14th day I get a message from an anon on /lmg/
>"I know you're struggling, here's what worked for me:"
>"I realized that the only way to find meaning is to create your own"
>"I started a YouTube channel where I make Let's Play videos of obscure games"
>"now I have thousands of subscribers and I'm happier than ever"
>immediately feel spark of motivation
>start recording myself playing Baldur's Gate
>upload to YouTube
>get 3 views
>immediately delete video
>continue crying for another month
>on month 4 I get a message from the same anon:
>"don't give up, I believe in you"
>his faith in me gives me strength
>I record myself playing Planescape: Torment
>upload
>get 7 views
>still not enough
>I record myself playing every single NWN1 module
>upload daily
>views slowly climb
>6 months later I have 10,000 subscribers
>anon messages me:
>"I did it, I found meaning!"
>he's right
>my life still sucks but at least I'm not alone
>thanks anon
>will never forget your kindness
>God bless /lmg/ and all who sail in her
>>
>>108060404
One of my prior work buddies (former CIO now doing consulting) just did some seminar about AI Safety. I saw bits of abstract, trailing into AI girlfriends etc. and loss of human connection.
I know the guy, and he 100pct has ST installed and running on some machine of his own, but it's not like I can ask him about it.
>>
>>108066278
https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md#coming-soon
local v4o is cool, but I need to run it on my local server
>>
>>108066557
>>be me, PhD in physics, 35 years old
>>still live with mom because I'm too autistic to hold a job
Is this gg, jg, or ik?
>>108066603
Gold accounts only.
>>
>>108066557
yeah you fuckin retard you've never had a hard day in your life.
try getting thrown into the world with no parents to support you, fucking twat. then with the threat of literal starvation maybe you'd find a job.
>>
>>108067254
>replying to an llm
>um akhsually i've had it harder than you
wow those children in africa being born niggers and having no job opportunities are having it way rougher than you, maybe you should just die or something?
>>
>>108066566
I feel like that's like asking to share your porn collection irl, but like 100X more cringe bc it's personal.
>>108066625
It's whitepaper stuff, which is what you do if you do consulting (it's a form of advertising.)
AI Safety (even LLM) is an actual topic with its own BS certs now. It's important, in that you don't want your new customer service bot ERPing with customers. But not like future-Terminator important. W/e, he's got his grift, more power to him.
>>108066631
I've done that forever on forums/boards, it's faster for me to type.
Apparently it's also a tell for pajeets. idgaf I'm not changing it.
>>
File: computers-must-shut-up.png (474.7 KB)
>>
>>108067752
>you don't want your new customer service bot ERPing with customers.
If the local car dealership chatbox won't have sex with me, then they clearly don't want the sale bad enough and I'll take my business elsewhere.
>>
File: 1753296564251630.jpg (59.9 KB)
>>108058376
>>108058239
>>108058685
How do you expect instruction-following to work without instruct tuning? The very nature of these models REQUIRES the instruct tuning phases to contain mostly artificial data in the data set. You have to make up a lot of examples of how the model SHOULD behave. Only the pre-training data should be mostly if not entirely human-written.
Yes, the dick-eating sucks. That's nothing new, and you're an idiot for complaining about something everyone already knows and has to deal with. We know it's a thing. We know WHY it occurs. What are you accomplishing by saying the same thing that has been repeated thousands of times here? Or are you the same exact retards shitting up the thread with info we already know to pass the time because you are THAT bored? You guys are even dumber than redditors ffs. At least they have being tech-illiterate, naive, gullible newfags as somewhat of an excuse