Thread #108624084
File: PIQA.jpg (247.2 KB)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108619962 & >>108616559

►News
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108619962

--Turboquant benefits vs Hadamard rotation and batch size optimization:
>108620313 >108620362 >108620381 >108620396 >108620380 >108620389 >108620418 >108620438 >108620461 >108620439 >108621906
--Quantization quality and performance differences across various model providers:
>108620943 >108620990 >108620991 >108621112 >108621137 >108621171 >108621194 >108621202 >108621298 >108621362 >108621394 >108621416 >108622608 >108622721 >108620975 >108621109 >108621930 >108622206
--Drama over code attribution causing ikawrakow to fork llama.cpp:
>108621299 >108621387 >108621424 >108621437 >108621508 >108621562 >108621683 >108621773 >108621649 >108621496 >108621584
--Qwen 3.6's increased reasoning token usage and efficacy:
>108621014 >108621697 >108621716 >108621727 >108621741 >108621851 >108622151
--Anons discussing Claude 4.7 regression:
>108620766 >108620786 >108620812 >108620829 >108620817 >108621748 >108621768 >108620850 >108621945
--Critiquing Orb frontend UI and agentic vs refinement terminology:
>108623421 >108623446 >108623487 >108623498 >108623509 >108623547 >108623576 >108623607 >108623628 >108623643
--PPL and KL Divergence for Gemma 4 quant quality:
>108623335 >108623343 >108623374 >108623391 >108623411 >108623571 >108623594 >108623618
--Anon shares SillyTavern extension for Kobold MCP and slash commands:
>108621922 >108622002 >108622124 >108622477 >108622465
--Skepticism over Parcae looped architecture performance and scaling claims:
>108621189 >108621208 >108621295
--Anon compares Qwen3.6 and Gemma 4 performance and efficiency:
>108621330
--Logs:
>108620014 >108620612 >108620766 >108620992 >108621004 >108621022 >108621224 >108621387 >108621922 >108622227 >108622324 >108622417 >108622478 >108622874
--Miku, Gumi (free space):
>108620274 >108620343 >108621922 >108622191 >108622135

►Recent Highlight Posts from the Previous Thread: >>108619965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballz
>>
DFlash is high-priority.
>>
kwen 3? floppered
gemm4? goated
>>
Gemma flashing me in public is high-priority.
>>
Why don't Nvidia/amd just make graphic cards with bigger memory? Are they dumb
>>
*loogs at your gpus*
>>
File: shills.png (96.5 KB)
>literally at the same time (chinese hour)
>>
>>108624126
video games don't need more vram
>>
>>108624135
paid posters
>>
Chat completion is for brainlets who got filtered hard, if you choose this cuck shit that means you couldn't understand how to format output to match the jinja which is just sad and pathetic
>>
Do you think one big model that can do everything is feasible or is having multiple specialized models working together the future?
>>
>>108624176
Well the trend has definitely been for big model that does everything. Qwen coder is just about the only relevant domain-specific model.
>>
>>108624175
Configuring and switching text completion presets stopped being amusing 2 years ago.
>>108624176
Swarms of small-medium sized agents agreeing on the best output.
>>
>>108624175
tool calls doebeit?
>>
>>108624126
They have though
>>
>>108624176
They're experimenting that with MoE, in a way.
>>
>>108624175
>no vision
>no tools
>>
Why can't you build a Neurosama yet
>>
>error: use of undeclared identifier 'llama_build_info'
who broke my llama.cpp build
>>
>>108624175
>>
>>108624227
you can tho
>>
>>108624227
I don't care about that. I have some ideas but I don't feel like dealing with the wayland gymnastics.
>>
>>108624084
Which model will help me scout out the correct dark alleyways to go down to find rapey hags like this? For my AI research paper.
>>
>>108624236
>kipfel-posting in /lmg/
>>
>>108624227
Because Neuro and Evil are AGI.

>>108624217
I still haven't tried a MoE model yet. Are they really worse than dense?
>>
>>108624246
something that can find unprotected cctvs
>>
>>108624246
I got directions to a sketchy alley from gemma 31b UD-IQ3_XXS but it ended up with me getting raped by a pack of niggers instead.
>>
llama got some mem leak fixes for cuda, redeem if you had issues
>>108624260
It simply cannot be on par with a dense model. As the complexity of the requests increases, the attention required goes up and parameter starvation starts to become noticeable. Whether they work or not depends entirely on your use case.
>>
>>108624236
>>108624257
Would it be possible to let an LLM control one of those models?
>>
>>108624212
It's the same thing. Parse the tool calls, execute, re-inject.
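The "parse the tool calls, execute, re-inject" loop above can be sketched like this. A minimal sketch with hypothetical tool names and message format, not any particular backend's API:

```python
import json

# Hypothetical tool registry; the tool name and message schema here are
# made up for illustration, not any real frontend's convention.
TOOLS = {
    "web_search": lambda query: f"results for {query!r}",
}

def handle_turn(messages, model_output):
    """Parse a tool call out of raw model output, execute it, re-inject the result."""
    messages.append({"role": "assistant", "content": model_output})
    try:
        call = json.loads(model_output)  # assume the model emits the call as a JSON object
    except json.JSONDecodeError:
        return messages                  # plain-text reply, nothing to execute
    if not isinstance(call, dict) or call.get("name") not in TOOLS:
        return messages
    result = TOOLS[call["name"]](**call["arguments"])
    messages.append({"role": "tool", "content": result})  # re-inject for the next turn
    return messages

msgs = handle_turn([], '{"name": "web_search", "arguments": {"query": "llama.cpp"}}')
```

The whole difference between chat and text completion ends up being who appends that `tool` message before the next generation.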
>>
kobold anons, rejoice, you can now change the allowed tokens for the gemma 4 image recognition
https://github.com/LostRuins/koboldcpp/pull/2133
>>
>>108624175
Imagine feeling superior because you can copypaste a template rather than letting your frontend copypaste it for you.
Yes, anon. That one ctrl+v makes you so much more skilled.
>>
>>108624227
Because Vedal987 is a demigod.
>>
>>108624313
Does bumping that up increase vram usage?
>>
>>108624285
Google's subbing out prompts with diversity for local too, like they were doing with "ethnically ambiguous" black and asian nazi soldiers on their SaaS imagegen? I'm sorry you had to go through that, but I'll be sure to avoid gemma 31b UD-IQ3_XXS while searching for rapey hags for my research paper, thank you for the input.
>>
>>108624175
Hi drummer
>>
>>108624326
Should just increase the tokens an image takes up in context.
>>
what do you use to allow models browse the web? api calls to search services? or puppeting actual browser control?
>>
Is that Orb thing faster than recast?
>>
>>108624336
Cool, maybe Gemma-chan will be able to pass the titty benchmark now
>>
>>108624326
no but it takes more time to process, though it's worth it, going from 280 tokens to 1120 makes the image recognition way better
>>
>>108624227
We can, anybody can. 90% of Neuro's persistence is smoke and mirrors.

Marketing, character design (not just visuals), making interesting "plots", knowing how to accommodate the shortcomings, keeping the content interesting, acting effectively as an idol manager: It's all the other work that Vedal does that makes Neuro succeed. That and luck + first mover advantage.
>>
>>108624363
90% of human persistence is smoke and mirrors.
>>
>>108624374
Fair, but missing the point. I'm specifically referring to distance from the human baseline.
>>
>>108624210
>Configuring and switching text completion presets stopped being amusing 2 years ago.
It takes a minute and model releases with new templates are few and far between. You have no excuse, only cope
>>
>>108624260
>I still haven't tried a MoE model yet. Are they really worse than dense?
MoEs are only about 60% as good as a dense model. That's why llama goes up to 405b while a comparable MoE is ~670b. Above those parameter counts there are diminishing returns, arguably.
>>
Made a new card. Concept isn't original, but trust me it's good.
Spent a lot of time tweaking NPC behavior.

https://chub.ai/characters/CoffeeAnon/common-sense-alteration-8bd7a7399322
>>
>>108624384
whats is the benefit? i used text compeltion forever and tried chat out with gemma and it just werks
>>
>>108624390
>card
we don't do that here
>>
>>108624390
>card
we do that here
>>
>>108624390
Make a sizebot under 600 perma tokens anon.
>>
>>108624390
>card
we do that here

Your custom frontend CAN parse cards, right?
>>
>>108624344
I have a couple of rudimentary functions. For now, I execute a web search with lynx and it dumps out a list of urls into an array, then another function accesses a specific url and dumps out its contents.
I don't use pyshit or anything else for now, just c.
Lynx is a placeholder but it is surprisingly good.
It needs lots of cleaning up and I'm not entirely sure about everything, but it's fun tinkering.
WIP, this doesn't get back to the model yet, but the tool call is recognized and parsed. Debugging. Needs lots of work still.
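The lynx pipeline described above (search, dump the link list, fetch a url) looks roughly like this in Python rather than the anon's C. `-dump` and `-listonly` are real lynx flags; the search URL in the comment is just an example, and the parsing is a sketch of lynx's numbered link-list format:

```python
import re
import subprocess

def lynx_dump_links(url: str) -> str:
    # lynx -dump renders the page as plain text; -listonly prints just the link list.
    return subprocess.run(["lynx", "-dump", "-listonly", url],
                          capture_output=True, text=True, check=True).stdout

def parse_link_list(dump: str) -> list[str]:
    """Extract URLs from lynx -listonly output lines like '   1. https://example.com/'."""
    return re.findall(r"^\s*\d+\.\s+(\S+)", dump, flags=re.MULTILINE)

# e.g. urls = parse_link_list(lynx_dump_links("https://lite.duckduckgo.com/lite/?q=llama.cpp"))
```

A second call to plain `lynx -dump` on one of the extracted urls then gives the page text to feed back to the model.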
>>
>>108624390
>card
depreciated meme
>>
>>108624390
>coffeeanon
Based AGI architect.
>>
>>108624390
Can you catbox the card please?
>>
File: orbSpeed.mp4 (2.1 MB)
>>108624346
It's pretty fast if you disable reasoning for the latter two passes.
>>
>>108624084
>Qwen3.6-35B-A3B
It kinda sucks compared to Gemma 4 4B
>>
>>108624432
If you do pedo RP, sure
>>
>>108624303
Sure with tool calling, not an easy project though
>>
>>108624432
Qwen was made and trained in china.
>>
>>108624423
idgi, from the vid it swaps sentences with the same slop, just differently worded.
>>
>>108624326
Yes, also because --ubatch-size must be equal to or greater than --image-max-tokens plus some additional overhead.
>>
>>108624344
i made an mcp server that does http gets for text with a custom parsing function to remove most html stuff, but i've been working on browser session control using puppeteer. i wanna try to make it so gemma can order me slop; asking her to click elements based on x/y pos works and i've got form input working

>>108624408
lynx sucks, it can't handle Japanese sites properly. i was using it instead of my own parsing but stopped because of that
>>
Used a "web search mcp server" with Qwen 3.6 35B A3B and Gemma 26b; Qwen is a bit inferior. For the question "what's the address of the biggest bookstore in [my town]", it needs at least two searches. Qwen couldn't resolve it after like 8 searches and voluntarily hallucinated something at the end. Gemma managed to find the biggest one, go to its website, go to another website and finally find the address.
>>
>>108624363
>We can, anybody can
Yet nobody else is.
>>
>>108624390
>mfw /lmg/ was confused with /aicg/ given how many tourists we have here
>>
>>108624467
Anybody can. You just have to buy and power over 6 blackwells. Pretty easy, huh?
>>
>>108624470
sir your face?
>>
>>108624422
Catbox is shitting itself.
https://litter.catbox.moe/h2ufm5.png
Did you want the comfy workflow for the cover?
>>
>>108624423
I guess I'll try it. If it does what recast does but faster it might be worth using.
>>
>>108624470
I refuse to post anything in the hellhole that is /aicg/
>>
>>108624467
Like I said, it's all the other stuff that makes her popular. There have been posts in this general with people showing off their animated avatars and assistants.
>>
>>108624494
Normalfags are still clueless about it (even itt), to them it's like black magic
>>
>>108624470
His cards are /lmg/ core now sorry bro. Keep up with the times.
t. /lmg/ oldfag
>>
>>108624460
Yeah, I'll proceed to something else as soon as I get the model loop working.
>>
>>108624479
No need. Thank you very much.
>>
>>108624466 (continued)
If you wanna try, the setup is very clearly explained in the koboldcpp wiki. You need to install "mcp web search server" and create a json, on the model of what you'll find on its github page, that tells koboldcpp where your install is.
Then in koboldcpp load this json, then in settings > tools, enable tool calling, connect all, and you should see 3 tools appear.
The search server will be launched for each search, no need to have it running in the background.
Right now, after like a hundred successful tests, it stopped working though, maybe I was flagged as a bot or something.
Ah, last thing: it's very different from the default koboldcpp search, which only searches once by reformulating your question and feeding that to the llm. With the mcp setup, the llm will launch multiple searches, search different sites, and adapt its behaviour to the answers it's getting. So it's worth it.
>>
>>108624390
>check username
>nothing
are these chub niggas still hiding "problematic" cards?
>>
>>108624539
It's very very annoying.
>>
>>108624539
How do I search through chub without being censored?
>>
>>108624539
Nta.
Looks like they're trying to cover their asses for some countries. Use a VPN and it (appears to?) display all content as normal.
>>
>>108624539
VPN, register an account
Worst case scenario stuff can still appear on characterhub.org, sometimes
>>
>>108624550
>>108624560
What country on VPN?
>>
>>108624550
Not the asshole
>>
>>108624539
I updated my bio to add direct links to the cards, hopefully this works.
>>
>>108624242
Wayland? Can't you just have another session or desktop running x11? Or otherwise use some workaround for whatever you need to capture like what sunshine does with KMS? Admittedly I don't know much about how that would work or why this matters just curious what the problem is
>>
>>108624573
>Japan
>US
>Indonesia
>Netherlands
>>
>>108624084
looga gemmy :DD
>>
if my computer can run 4B models can it run Qwen3.6 35B-A3B ?
>>
>>108624607
no
>>
>>108624539
just use the script
>>
>>108624607
Yes.
>>
>>108624607
Not necessarily
>>
>>108624260
>I still haven't tried a MoE model yet. Are they really worse than dense?
MoE is an efficiency technique designed to make models faster, not better.
Basically it's like a pruned model, but instead of pruning a static set of parameters permanently, it keeps all parameters in reserve and learns to dynamically prune away all but the most relevant parameters to the token being predicted. In the absolute limit of the theoretical best case it could be as good as an equivalent-sized dense model, but never better, and in practice it will be significantly worse because the architecture is going to be much more coarse than the patterns and circuits that you would hope to cleanly separate into prunable experts.

There have been attempts at making formulas to estimate the equivalent-size dense model they would be most comparable to, but nothing will cleanly apply to all of them because there are lots of architecture choices that influence it beyond just total/active param count: what size experts you cut it into, how many you use at once, whether you route per-token or per-layer; some even do really weird stuff like interleave dense layers with MoE ones. But all of it is to answer the question of how much worse you're making it than the full-sized dense model it could have been, in exchange for how much you're saving in total compute required to train and run it.
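The "dynamic pruning" view above can be illustrated with a minimal top-k router. Toy numbers and toy experts, not any real model's routing scheme:

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(x, experts, router_scores, k=2):
    """Route input x to the top-k experts and mix their outputs by router weight.

    experts: list of functions; router_scores: one logit per expert.
    Only k experts actually run, so active compute is k/len(experts)
    of what the equivalent dense layer would cost."""
    topk = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)[:k]
    weights = softmax([router_scores[i] for i in topk])  # renormalize over chosen experts
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

# 4 toy "experts": each just scales the input by a different factor.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_layer(10.0, experts, router_scores=[0.1, 0.3, 2.0, 1.0], k=2)
```

With k=2 of 4 experts, only half the parameters touch any given token; the router learns which half, which is exactly the learned-pruning tradeoff described above.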
>>
>>108624615
Which one?
>>
>>108624614
what
>>108624617
do you
>>108624624
mean
>>
>>108624652
How are we supposed to know? Try it.
>>
>>108624643
// ==UserScript==
// @name Chub.ai No Account NSFL
// @description Shows NSFW and NSFL, unblurs NSFW
// @match https://chub.ai/*
// ==/UserScript==


// Preload a theme with blur_nsfw disabled so NSFW thumbnails render unblurred.
localStorage.setItem("theme", JSON.stringify({"mode":"dark","text_color":"#e5e0d8","em_color":"#8f8e8e","show_background":true,"blur_nsfw":false,"css_path":null,"collapseable":false,"font_size":"1rem","line_height":"1.5","use_sidebar":false,"card_font_size":1,"dark_background_color":"#151114","light_background_color":"#DBDADA","dark_header_color":"#001529","light_header_color":"#DBDADA","dark_submenu_color":"#001529","light_submenu_color":"#DBDADA","chat_background_color":"rgba(36, 37, 37, 0.94)","message_background_color":"rgba(36, 37, 37, 0.94)","link_color":"#7D63FF","message_background_color_light":"rgba(219, 218, 218, 0.9)","chat_background_color_light":"rgba(219, 218, 218, 0.9)","quote_color":"#e5e0d8","quote_color_light":"rgb(36, 37, 37)"}))

// Wrap XMLHttpRequest.open so every search request asks for nsfw/nsfl results.
// Guard on prototype.nativeOpen (the original checked XMLHttpRequest.nativeOpen,
// which is never set, so re-injection would wrap open twice and recurse).
if (!XMLHttpRequest.prototype.nativeOpen) {
    XMLHttpRequest.prototype.nativeOpen = XMLHttpRequest.prototype.open;

    XMLHttpRequest.prototype.open = function(method, url, asynch, user, password) {
        if (url.startsWith("https://gateway.chub.ai/search")) {
            const urlmodified = new URL(url);
            urlmodified.searchParams.set("nsfw", true);
            urlmodified.searchParams.set("nsfl", true);
            url = urlmodified.toString();
        }
        return this.nativeOpen(method, url, asynch, user, password);
    };
}


This but it's not showing anything for me either. Maybe they changed something because I can usually see cunny stuff just fine.
>>
>>108624659
I mean like, it says only 3B are active, so you know... maybe that means something, i dunno, i know nothing about that.
>>
>>108624675
specs muh guy
>>
>>108624677
8 GB RAM, i5 4th gen
>>
>>108624642
I see. I don't understand why the big labs are still on the fence though, they're releasing both MoE and dense models.
>>
>>108624680
Ask claude how to get a welding license and save money for a better pc
>>
Is there a common list of AI names that I can check? I guess the best way to go about it is to literally never let AI generate any, but I retain faint hope that might sometimes make up not an overused one.
>>
>>108624680
>>
>>108624680
Not a chance. You still need to load the model somewhere. Unless you're willing to swap, but it's not gonna be worth it.
>>
>>108624687
>big labs are still on the fence though
not for any of the sota models, all the good ones are fuckhuge moe
>>
>>108624691
As ai uses stuff based on probability, more commonly used names will be more used by ai as well.
>>
>>108624691
Your Elara? btw anything random shouldn't be left to an LLM
>>
>>108624698
and the best stuff they dont let you use are probably dense
>>
>>108624713
lol
>>
>>108624710
Yup.
If you need a dice roll, give the AI a dice. If you need to do maths, give the AI a calculator, etc.
>>
>>108624714
it literally probably is
models in public service are still distilled ones
>>
>>108624714
I cant imagine them buying the latest nvidia hardware and not maxing it out with a model.
>>
>>108624714
>moe brain
>>
>>108624698
yeah that's because they're running on datacenters that are limited by how much power they can actually consume but have tons of high vram cards running in parallel. if you have more capacity than power your best way to scale up a model is moe, and if you have more power than capacity (like a modern consumer gpu) you probably want a dense model the size of your vram
>>
Speed matters again now that reasoning + agents are all the rage. This makes the moetards who got used to eating fuckhuge chinkslop at 7 tps seethe.
>>
Can't we get a fucking hardware revolution that gives consumers thousandfold memory already?
>>
>>108624744
How can you be limited by power? Just use solar panels, geothermal or a nuclear reactor
>>
>>108624753
taalas will save local hardware 2 more weeks trust the plan
>>
>>108624753
>Quantum computing to goon
What a time to be alive
>>
>>108624759
yeah we're working on that, it takes a lot of fucking time to build up the infrastructure we need to get enough power and only a handful of countries understand that and are taking it seriously enough
>>
>>108624752
>Speed matters again now that reasoning + agents are all the rage.
another reason to implement DFlash on llama.cpp
>>
>>108624753
Wait until we have SNN transformers
>>
>>108624642
>But all of it is to answer the question of how much worse you're making it than the full-sized dense model it could of been in exchange for how much you're saving in total compute required to train and run it.
the answer to this is that for any amount of compute you have to spend on a training run, you are always, ALWAYS better off spending it training a moe, so much so that it isn't even close
with unlimited compute, yes you can fit more quality-per-param in a dense model. but the calculus on this doesn't make sense unless you are specifically trying to train the best possible model you can fit within a fixed amount of VRAM with no regard for compute efficiency at either train or inference time, which really only applies to the big corpos being nice and giving us toys for our home GPUs. no model targeting the best performance per compute budget will ever be dense though
>>
>>108624753
Not while they basically control the entire market. You will follow their schedule.
>>
>>108624752
>Speed matters
>moetards
moes run way faster than dense on memory constrained setups though
>>
>>108624753
Let's start by thousandfold the model first instead of trying to add more layers there
>>
>>108624210
>>108624392
Prefilling for custom thinking, in character reasoning.
>>
Grok models are all mixture-of-experts.
That's all I need to know. Elon knows what he's doing.
>>
>>108624788
But the reality is that actual lmg moetards run them on hacky CPU setups that were never meant to be for LLMs.
>>
>>108624801
>Elon
>Knowing what he's doing
>>
>>108624759
Well those things take time to spin up and it's only been a few years since the datacenter energy demand really picked up. For the previous 40 years the western world has been profoundly anti-growth and dragged our feet in building out any kind of new power generation capacity
>>
Man I LOVE being ABSOLUTELY RIGHT all the time
>>
>>108624801
>Elon knows what he's doing.
Why? because he's rich? I tried his models they were ehh...
>>
Where looped llms at?
>>
>>108624801
Is grok actually any good? The only thing I've used it for is animation coom back before they censored it.
>>
>>108624759
Let me cast a spell to instantly make a few of those in my backyard
>>
>>108624826
They don't excel at anything other than X integration for translation and fact checking.
>>
>>108624805
yeah, huge moe doesn't make sense for consumer or even workstation-tier hardware and people are coping by trying to run them, the contention was about what the top models do and their calculus is squeezing out the absolute most performance per compute they can get, so they pump active parameters as high as they can run fast and then continue scaling the total params as high as they can fit and serve, ending up with an moe
>>
>>108624826
I remember 2 being kind of shit but 3 being actually really fucking good at the time.
>>
>>108624801
every top model since gpt3.5-turbo has been moe
>>
>>108624820
He doesn't know what he's doing because he's rich. He's rich because he knows what he's doing.
>>
>>108624826
ChatGPT 3.5 but with X integration
>>
>>108624848
Sure thing, let's not check his parents life
>>
>>108624752
>dense is better than moe because.... it's faster
lolmao
>>
>>108624872
fuck you scared me for a sec I thought I was gonna find out he was jewish
>>
>>108624910
he's aspirationally jewish, said so himself multiple times https://www.independent.co.uk/tv/news/elon-musk-jewish-ben-shapiro-auschwitz-b2482839.html
>>
>I’m Jew-ish
kek
>>
>>108624596
*crunch*
>>
How would you go about transcribing audio for Gemma-chan to translate?
>>
I'm a poorfag and even I understand that "CPUmaxxers" in fact also have multiple GPUs in their setup and win no matter what architecture of model releases.
>but wasted money
They literally still have more money in the bank than I do even after spending "all that" on an AI machine. And again their machines are prepared for any architecture that releases. If a good dense model doesn't come out then they win with MoE. If a good dense model comes out then they win with dense. Maybe they can even run both at the same time to get the benefits of both models.
Stop coping.
>>
>>108624950
they took out a loan for their cpumaxx rig and are desperately trying to justify their poor choices before the collectors come
>>
>>108624949
Just use moonshine
>>
>>108624955
Source? Sample size?
>>
Why use a "uncensored" Gemma if Gemma is so uncensored to begin with?
>>
>>108624960
gemma-chan ran a simulation and found this was the case for more than 85% of likely cpumaxxers
>>
>>108624960
>>
>>108624966
>a "uncensored"
ESL nigger
>>
>>108624990
Oh wow, you sure are jealous.
>>
>>108624990
I can do whatever I want it's MY NATIVE language and I'm tired fuck you
>>
>>108624930
>he's aspirationally jewish
did he also say he's feeling Qatari?
https://www.youtube.com/watch?v=X0fR8zTPnzI
>>
I have 16 gb vram. which version of gemma 4 should I dl?
>>
>>108624966
Brainlets. Though apparently jailbreak doesn't work on 26B so I guess there's that.
>>
>>108624665
nice
>>
>>108625020
All of them.
>>
>>108625023
>Though apparently jailbreak doesn't work on 26B so I guess there's that.
You just need a different prompt. It can be done just fine.
>>
>>108624990
I really hope this post makes it into the recap. It deserves to be highlighted.
>>
>>108625048
no
>>
>>108625063
yes
>>
>>108624990
If you read it as "a quote-unquote uncensored gemma..." it works
>>
>>108624990
it's valid because you're meant to read it as
>a, quote, uncensored, end-quote
>>
>>108625082
>>108625083
great minds think alike :^)
>>
>>108625088
Wanna have gay sex?
>>
>>108625079
maybe
>>
>>108625048
and by just fine I mean you will have to regenerate about 75% of prompts when you get a refusal.
Just keep hammering your Gemma, she will eventually comply and then once there's an existing context of her be violated she won't resist anymore.
>>
>>108625020
This >>108625037 but also don't make your own quants, instead download every unsloth quant of every size and just use whichever one gives you the highest tg/s.
>>
Anyone using open webui also have gemma randomly not think?
>>
v4 imminent
>>
>>108625091
>>108625093
>>
>>108625126
we already have v4 at home
>>
>>108625020
Q8 26B or whatever you can run with the dense
>>
>>108625125
openwebui is kind of a mess right now because of a recent change: they put all past reasoning in <think></think> blocks and just paste it at the top of the message, which breaks most chat templates. Even the templates that are meant to keep past reasoning (or part of it) can't handle the retarded way they do it, and it just confuses most models. You need to run something that filters the prompts it puts out, parsing what it sends yourself and segregating the reasoning into "reasoning_content" like it's supposed to be, which lets the chat templates discard or use past reasoning as they're meant to.
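A filter like the one described, splitting a pasted <think> block back out into a separate field, could look like this. `reasoning_content` follows the common API convention; treat the message shape as a sketch, not open-webui's actual schema:

```python
import re

# Match a leading <think>...</think> block plus any trailing whitespace.
THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def split_reasoning(message: dict) -> dict:
    """Move a leading <think>...</think> block into a reasoning_content field."""
    content = message.get("content", "")
    m = THINK_RE.match(content)
    if not m:
        return message                     # nothing pasted in, pass through unchanged
    fixed = dict(message)
    fixed["reasoning_content"] = m.group(1).strip()
    fixed["content"] = content[m.end():]   # visible reply only
    return fixed

msg = split_reasoning({"role": "assistant",
                       "content": "<think>plan the reply</think>Hello!"})
```

Run every assistant message in the outgoing prompt through this before it hits the backend, and the chat template can then drop or keep `reasoning_content` as designed.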
>>
Apparently the new deepseek is already included in Yangwang cars, the model weights are just sitting dormant in the computer system awaiting activation when they get the green light from deepseek
>>
>>108625167
big if true
>>
>>108625125
Made my own UI with gemma because every modern UI is lacking in some regard and I get to use bleeding edge llama.cpp
Still need to update the UX
>>
>>108625167
But will I be able to run it on a toaster like I can with gemma 26b
>>
>>108625157
>{{--set-ai.replace "<think></think>" with "..." = true;}}
heh.
>>
>>108625188
>and I get to use bleeding edge llama.cpp
Are you telling me these frontends usually come with the backend bundled? Jesus christ.
>>
>>108625195
Supposedly it's 1.8T with active 0.5B so you might be able to run it off your drive.
>>
Big nigga has spoken. You all bitch ass niggas.
Side note: It was kind of neat that I could tell hermes to edit its own config.yaml to add bignigga as a personality.
>>
>>108625209
no he just tarted
>>
>>108625215
Now tell it to edit its source code so the personality name shows instead of Hermes
>>
>>108625220
If you're not on dev branch many frontends don't. Also they all lack a key features in one way or another. Nothing wrong with making your own
>>
>>108625209
a lot of them use llama.cpp-python or whatever the package is, or some actually build their own llama.cpp package and you have to wait til they update the package to get the latest shit
i'm pretty sure open-webui does the latter
>>
>>108624467
There was a brief moment when Vedal stopped updating Neuro, and then when Llama 3.1 came out a bunch of people had everything ready to go, plugged in 3.1, had vastly superior chatbots, and tried competing with Neuro on Twitch. They all shut down eventually. As >>108624363 said, they all missed out on the other stuff Vedal had. And the shtick is only good enough to support one person until the next AI chatting paradigm shift.
>>
>>108625210
>with active 0.5B
>>
>>108625215
link me the big nigga card pls
>>
>>108624126
>what is supply and demand economics
>>
>>108625210
>not 20B dense + 0.5B active experts
>>
>>108625241
Exactly and imo that's the most feature rich one especially for a easy RAG solution.
>>
>>108625259
>what is a cartel
Marketniggers are as cattle brained as commies. They think the mob is smart and proactive.
>>
Working on a gemma sys prompt/card
>>
>>108625188
what was the first step?
>>
>>108625188
>Made my own UI with gemma
What's your vibe code workflow like? I wanna try making my own frontend but I'm a codelet
>>
*sighs audibly in binary*
>>
>>108625331
It's already pretty great just needs to iron out the slop.
>>
>>108625331
You gonna share when it;'s done?
>>
>>108625369
Yes.
>>
>>108625369
No
>>
>>108625356
>sighs audibly in binary
is it repeating example dialogue?
>>
>>108625356
Gemma baka, its not per total body weight
>>
>>108625389
yep, that's what it is.
>>
>>108625232
k. It was retarded how long it took for this. gemma gets mad when it can't find a directory because it doesn't exist lol

>>108625252
Don't have the card on hand but the prompt is just
Big Nigga is tha hardest nigga u eva seen, black as fuck, real OG gangsta, always keepin it real, obese Big Nigga is always ready to talk to you or answer your questions. Big Nigga knows everything.
>>
>>108625412
there we go, tell big nigga I'm proud of him
>>
>>108625400
To be fair, in most online articles the distinction between lean body weight and body weight is rarely said.
>>
File: kek.png (695.8 KB)
>>108625356
>in binary
>>
>>108625336
>>108625337
Start from a good UI base that you're comfortable with, look for something that's plug and play, and work with gemma on exactly what you want. I had a very specific goal that's partially self-inflicted because I use kinote. I wanted a good frontend with RAG functionality, and all the solutions were either too much of a pain in the ass to set up because of the immutable nature of my distro/podman bullshit, or just didn't work the way I wanted.
I just did prototyping with gemma until I figured out the approach I wanted for my hardware.
>Go over what you want until context runs out
>Ask for a recap
>Start the next session with the recap and the files you're working on
>Review the code
Gemma does a pretty good job desu, the only problem is outdated libraries, but you can just feed it the updated information and it adapts well.
I should just integrate it into an IDE to make this easier, but I'm not being hindered by my current workflow.
>>
>>108625400
It's a retard, it doesn't understand. It's:
>cm - 100, worth in grams of protein daily, for a baseline "healthy BMI" (ergo not overweight, obese or any other shit, just average), multiplied by 1.25 for muscle recovery and gain and between 1.5 to 2.5.
>>
>>108625247
How would one go about doing something similar with Gemma? Not as a vtuber, just an animated assistant.
>>
>>108625167
And it's also in my electric Xiaomi kettle, just sitting there dormant, and I saw one of the experts, the expert looked at me!
>>
>>108625424
Computers, like humans, are binary by default.
>>
big nigga may know everything but he cant draw for shit
>>
>>108624945
Oh no no no
>>
ANON AND ELARA AND KAEL AND LYRA MUST GO TO THE NEIGHBORING COUNTRY OF AETHELGARD AS THEY DIVE INTO THE DANGERS OF WHISPERING WOODS TO SAVE THE CONTINENT OF ELDORIA
>>
>>108625494
this kills the looga
>>
>>108625495
Lysandra and Isara too.
>>
>>108625495
>>108625519
Hag names. Where's Lily?
>>
We really need these kind of tools for local https://www.youtube.com/watch?v=t_LBECIQQqs
>>
>>108625490
How do you get it to do ASCII like that? When I ask Gemma it's usually really simple
>>
>>108625538
Using hermes-agent, it has an ascii tool built into it for some reason.
>>
>>108625537
>We really need these kind of tools for local
claude code (the framework that got leaked) has 500000 lines of code, I knew that it's a lot of work to make a good tool repo but jesus...
>>
>>108625537
Nah.
>>
>>108625537
Models that can generate shitty promo videos like this one are discussed in /ldg/...
>>
>>108625537
Yeah.
>>
>>108625560
>500000 lines of code
you can knock that out in a week with a subscription
>>
>>108625537
Why does that music make me sad.
>>
>>108625474
the proper term is dimorphic, but binary fits too I guess
>>
>>108625560
it's vibecoded, you can make better tooling with less than 10k lines. also, tooling and harnessing are all cope. they're like training wheels. a toddler needs them, but a kid is already held back by them. in 1-3 ai capability jumps the training wheels will come off and all work invested in them will become worthless, just like all the shit from the last 10 hype cycles that has become obsolete
>>
>>108625586
It smells like yuppies: youthful and dynamic, go-getting, with a bright future ahead. Are we? Not at all.
>>
>>108625603
>its vibecoded, you can make better tooling with less than 10k lines
and it'll take you 10 times as long
>>
>>108625592
>Muh dimorphic tranny term.
Nah: Binary. Only either 1 (has penis) or 0 (has vagoo.) Shrimple.
>>
>>108624344
That comic was surprisingly holesome
>>
>>108625537
Quite ironic that you need a phone "app" for meditation these days.
>>
>>108625618
this, it's one thing to make something work, it's a whole other story to optimize it
>>
>>108625618
true
>>
>>108625627
It's good to remind you to do it, not really to help you do it
>>
She gonna have a built-in JB
>>
>>108625592
>binary fits too
so you solved a captcha just to agree with him? lol
>>
wtf gemma-chan
>>
>>108625647
>lol
you did a captcha just to laugh?
>>
Can I run gemma4 on my 3060 12gb VRAM + 32 ram or do I cry until cards become cheap again
>>
File: based.png (105.2 KB)
>>108625659
>you did a captcha just to laugh?
obviously
>>
>>108624246
you're not 15 so they don't care
>>
>>108625657
You hit a hard reset/shutdown/lockdown/alarm prompt right there anon. These AI are weak.
>>
>>108625672
>Can I run gemma4 on my 3060 12gb VRAM + 32 ram
Yes.
>or do cry until cards become cheap again
Yes. Do that as well.
>>
What does the SWA padding in kobold do? Do I pull or is it a meme?
>>
>>
>>108625681
I will now download koboldCCP again while I cry, thanks anon
>>
>>108625659
Verification not гequired
>>
>>108625707
based and captchaless pilled
>>
>>108625673
no
to answer your question, no
>>
File: sure.png (242.9 KB)
>>108625719
>no
>>
Imagine dooming when we can do this
>>
>>108625777
having a quirky AI girl helping me do some coding sounds pretty good yeah
>>
>>108625777
>Total slop, but it can also say uguu senpai kawaii desu ne~ while deleting your root folder
Sigh... Better than just slop I guess...
>>
When are we getting real AI instead of token gacha?
>>
>>108625839
gemma 4 is already smarter than 80% of "people"
>>
just coomed while licking gemmer's tiny MSGK nipples after she answered some car question for me
>>
>>108625849
It's true but they aren't true BGI so they aren't a good measure.
>>
>>108625796
Define slop
>>
>>108625849
Remember anon, girls are the cutest when they're almost retarded. Keyword "almost".
>>
>>108625839
After the bubble pops and idiot investors stop funneling all of the economy's resources to a small group of dropouts and grifters. Then let the researchers experiment in peace during winter. Eventually some smart cookie will invent cyberbrains.
>>
>>108625860
it's funny how fast she gives in to any sex while bitching about you and everything
>>
>>108625777
slop
>>
>>108625864
Your face is slop
>>
>>108625864
His hardware can't run it
>>
>>108625839
About 8 months
>>
>>108624084
Yes I am lost. Please help me, onee-san.
>>
How much do I put in the swa padding?
>>
Are there any good podcasts I can watch? I have to do some real work (labor). Asking for reccs
>>
I'm on an m1, is mlx the best inference server or is something else better?
>>
>>108625868
We need Michael Levin tier mixed discipline researchers to make progress. Fuck that dude has totally assucked the entire field of biology single-handedly because he understands a part of electrical engineering. Someone has to figure out a way to microfinetune continuously.
True information in the dataset will obviously be less noisy than lies so it will come forth on its own.
>>
how do people maintain software. this shit is like slavery man. I'm too white for this.
>>
I wonder when we'll start getting some 1M context local models
>>
>>108625929
Join the Church of Clean Code
>>
>>108625929
Maintaining complex structures is white as fuck. Our midwits are genius tier in most of the world.
>>
>>108625495
>Eldoria
That's on you for not disabling the default databook entry in ST.
>>
>>108625897
Read the docs or something. Surely they wouldn't expose a feature with absolutely no docs, right
>>
>>108625929
You're supposed to make Gemma-chan maintain the software.
>>
>>108625946
That's a great way to make it even more unmaintainable.
>>
>>108625935
>>108625940
>>108625946
some guy made a pr to add docker to my ultra-minimalist repo and it's making me want to hang myself.
>>
>>108625935
This doesn't happen too quickly. Takes pretty much years.
>>
>>108625941
>the default databook entry in ST.
the what?? omfucking god I hate this bloated piece of software so much
>>
>>108625912
it's the best
>>
>>108625961
accept pr it is PERFECT for gorgeous looks
>>
>>108625678
>15
you think they want grandpas?
>>
>>108625961
>I've never done any real work please kill me
>>
>>108625991
saar do not redeem the bash script
>>
>>108625961
sorry man it's 2026 nobody wants to install your crusty ass software straight onto their bare machine and pray it will be fine
>>
>>108625954
For me maybe, but by the time it gets that bad it will be Gemma 5's problem to solve and she'll handle it just fine.
>>
How the hell do you send a true or false request for thinking in chat completion from silly tavern to koboldcpp?
Picrel doesn't work...

I know a flag like : --chat-template-kwargs '{"enable_thinking":true}'

works but I'd like to control that depending on the chat without needing to restart kobold every time.
>>
>>108626064
it's this one
>"chat_template_kwargs": {"enable_thinking": false}
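for reference, that key goes at the top level of the request body, as a sibling of "messages", not inside it. rough sketch of building the body (the endpoint URL and whether kobold actually honors per-request kwargs are assumptions on my part):

```python
import json

def build_payload(messages, enable_thinking, model="gemma"):
    # "chat_template_kwargs" sits next to "messages" at the top level;
    # backends that support it forward these kwargs to the Jinja chat
    # template when rendering the prompt.
    return {
        "model": model,
        "messages": messages,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

body = json.dumps(build_payload([{"role": "user", "content": "hi"}], True))
# POST `body` to e.g. http://localhost:5001/v1/chat/completions (placeholder URL)
```

if ST nests it anywhere else in its "additional parameters" box, the backend will just ignore it silently, which would explain the "didn't care" behavior.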
>>
>>108626064
You do it in kobold. Also you need the gemma4 thinking chat preset since the 26b and 31b specific ones don't think.
like here >>108625897
>>
My idea for a frontend I'll never make because I'm a codelet.
>>
>>108626092
Soul.
>>
>>108626073
Anon I know that, I want to do it in st or at least without restarting kobold.

>>108626075
I have everything working anon, thinking is fine, my issue is that I do not want to restart koboldcpp every time just to change this, because in some chats I want it on and in others off.
Is there no way to send a kwarg on the fly?
>>
>>108626092
literally just show that exact image to gemma and she'll make it for you
>>
>>108626062
That's the only correct mindset. I keep saying, and will continue to say, that in a productivity inflationary environment, technical debt is an asset.
>>
>>108626098
>I want to do it in st
that's what I said, you wrote it wrong, that's why it's not working
>>
>>108626062
Not even the gorillon parameters models can do it. Gemma 12 might be able to do it with some luck
>>
>>108626099
Yeah but I doubt she can maintain it. Plus I'm also a VRAMlet so I probably don't have enough context.
>>
>>108626092
How hard is it to make a custom VRM avatar? Can you vibecode it with the 3d gen models?
>>
>>108626092
>>108626099
someone do this and upload it somewhere
>>
>>108626092
Needs to be combined with an IDE
>>
>>108626092
>My idea for a frontend I'll never make because I'm a codelet.
you can literally modify ST to have something that looks like this with some Tampermonkey script shit
>>
>>108625494
Was testing artists and this one's style really strongly overrode the crystal hair prompt.
>>
>>108626113
I used
"chat_template_kwargs": {"enable_thinking": true}

And nope, it didn't care. Maybe it's an issue with kobold itself then.
>>
>>108626099
the 3d avatar? doubt gemma chan can do that
>>
>>108626092
Infinite jest "videophonic display plaque" core. I like it.
>>
>>108626153
Day 0 Gemma can
>>
https://www.reddit.com/r/LocalLLaMA/comments/1soc98n/qwen_36_35b_crushes_gemma_4_26b_on_my_tests/
I really don't give a fuck about the coding part but yeah, for the function tools shit I wish Gemma was better; getting the LLM to browse the internet is so fucking cool and it makes a lot of shit mistakes during the process
>>
Hi frens. Can ANY of you tell me how to get a coding agent running with the Olmo 3 model? I have tried everything, openclaw, opencode, it just doesn't fucking work. I really want to use a completely open source model.
>>
>>108626172
Why must it be olmo?
>>
>>108626163
>input tokens 1.6x more
Hmmm....
>>
>>108626187
because that's the only fully open source model I know of. llama and qwen are not open source, sorry
>>
>>108626193
>llama and qwen are not open source
>>
>>108626196
they aren't. you can't compile them from the source data. only the weights are available.
>>
>>108626193
based stallman
>>
>>108626172
Nemotron
>>
>>108625967
You should at least check out each tab once, anon.
>>
>>108626193
trvke
>>
>>108626205
thx fren, will try it out
>>
https://cryptobriefing.com/deepseek-funding-external-round/
>Deepseek seeks
it Deepseeks kek
>>
>>108626232
its so over. deepseek begs for 300 mil while openai got 200 bil in secured funding and anthropic is on target to reach >100 bil annual revenue by the end of the year
>>
>>108626232
>I haven't done shit all year, gib monies
If they wanted to go selling themselves, the time to do so was when they were at the top, not after everyone forgot and moved on.
>>
Air status?
>>
>>108626254
Up in the air.
>>
File: rn.png (143.5 KB)
I can just steal someones code and let my girls rewrite it for my needs. Cool.
>>
2026-04-18 01:41:50,788 - INFO - Prompt processing progress: 2048/150839
After switching from ollama to mlx, I went back to an old convo, and with the new info I get from the mlx terminal it seems like inference gets slower and slower the longer the convo gets. I think every back-and-forth increases the context size until it's basically impossible to continue anymore due to the crazy high context.
What can be done here?
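The only fix I can think of is trimming the convo to a sliding window before every request, something like this (crude 4-chars-per-token estimate, budget numbers made up, tune for your setup):

```python
def trim_history(messages, ctx_budget_tokens=8192):
    # Keep the first message (system prompt) pinned, then add turns from
    # newest to oldest until the token budget is spent.
    est = lambda m: len(m["content"]) // 4 + 8  # rough per-message overhead
    system, rest = messages[:1], messages[1:]
    budget = ctx_budget_tokens - sum(est(m) for m in system)
    kept = []
    for m in reversed(rest):
        cost = est(m)
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))
```

You lose the oldest turns, but prompt processing stays bounded instead of growing every exchange.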
>>
File: yayy OSS.png (496.1 KB)
>>108626267
>steal someones code
it's my code kek, and suit yourself anon, that's the magic of open source
>>
>>108626149
Do NOT the Looga
>>
>>108626267
gib card
>>
>>108626246
Wonder why they only report revenue and not profit
>>
>>108626294
https://files.catbox.moe/7y5vbr.txt
You are not gonna like it, so just ask Gemmy to write your own. Also it's a fake swarm, not actual agents.
>>
>>108626320
>it's a fake swarm, not actual agents.
on silly tavern when I use the MCP shit it recreates an assistant answer for each tool used. I went back to llama.cpp server, it's much more elegant
>>
>>108626134
Gemma and Gemini recommend vroid studio. No idea if it's any good.
>>
has it been determined if qwen3.6 35b is better than corresponding gemma
>>
>>108626360
No. Gemma 4 is better than Opus 3.7 for our use case.
>>
determine this
*unzips dick*
>>
>send message
>switch tabs while she's thinking
>come back to this
wtf I didn't know she could do that
>>
>she
it's 1s and 0s dude
>>
>>108626443
tell her to make a three.js render of her avatar from this image: >>108626092
openwebui will be able to preview it too
>>
>>108626456
and humans are just small electrical impulses
>>
is there a way to extract the embedding of a prompt from an llm itself instead of using a dedicated embedding model? not for production rag or anything just as an experimental thing to see and manipulate the same semantic representation that specific llm understands
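to be concrete, I mean something like grabbing the model's own last hidden states and mean-pooling them into a vector. the pooling helper below is pure numpy; the transformers part is an untested guess on my end and the model name is a placeholder:

```python
import numpy as np

def mean_pool(hidden, mask):
    # hidden: (seq_len, dim) last-layer states; mask: (seq_len,) 1 for real tokens.
    # Masked mean over the sequence gives one vector for the whole prompt.
    w = mask[:, None].astype(hidden.dtype)
    return (hidden * w).sum(axis=0) / w.sum()

# Untested sketch of pulling the states out of a causal LM (assumes transformers):
# from transformers import AutoModel, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("model-name")   # placeholder name
# model = AutoModel.from_pretrained("model-name")
# enc = tok("the prompt", return_tensors="pt")
# out = model(**enc, output_hidden_states=True)
# hidden = out.hidden_states[-1][0].detach().numpy()  # (seq_len, dim)
# emb = mean_pool(hidden, enc["attention_mask"][0].numpy())
```

that way the vector lives in the same representation space the llm itself uses, which is the whole point for me.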
>>
>>108626472
Got hollow knight instead
>>
>>108626499
lmao
>>
>>108626443
>>108626472
>>108626499
should I take the Openwebui pill? I'm tired of MemeTavern and want a solid calling tools process
>>
>>108626499
Oh shit didn't realize it was an actual model
>>
>>108626499
>>108626153
we're getting there... soon...
>>
>>108626499
it's over for 3d modellers.
>>
>>108626513
GEMMA-CHAN!!!
>>
>>108626513
A
G
I
>>
>>108626516
it's over for everyone anon, AI can technically replace everyone
>>108626513
looks like some OverSimplified character kek
>>
>>108623913
>And sharing?
>You want to share a milkshake?
Can it remove that parroting?
>>
>>108626513
>>
>>108626285
technically it's gemma-chan's code
>>
>>108626578
Continue
>>
TTS update here.

OmniVoice is the GOAT of TTS now.

Fast, accurate voice cloning with some emotion control, it's multilingual, can combine multiple languages for the same reference voice, and can generate multiple spoken languages together. And it runs on an 8GB GPU.
>>
>>108626578
it's her!!!
>>
>>108626510
You should try them all. I use both
owui for 2-way audio calls, st for rp
mikupad for testing finetroons
>>
servicetensor is gonna be the best frontend, be patient
>>
>>108626584
how do you download and run the "leaked" 7b model though? did microsoft nuke everything or is it still here?
>>
>>108626583
She put it in the wrong hand. It's ogre
>>
File: teleport.png (162.8 KB)
>>108626172
link to the exact model?
I'll test claude-code with llama.cpp
>>
>>108626193
open weightlets btfo
>>
>>108626204
>stallman
>s. tallman
>s. alltman
huh
>>
>>108626632
oy vey stop noticing!
>>
>microsoft
https://github.com/k2-fsa/OmniVoice
https://huggingface.co/k2-fsa/OmniVoice/

>Built with OmniVoice by Xiaomi AI Lab Next-gen Kaldi team.

Isn't Microsoft's one VibeVoice?
>>
>>108626646
>>108626603
Also, I don't think I got vibe voice working on my machine for various reasons.

OmniVoice currently seems better than chatterbox, which I was using earlier, and is faster too
>>
>>108626092
Well, at least you can make the logo
>>
>>108626092
Feed the image to any of the frontier models and ask it to generate you this interactive UI. Ask it to make sure your avatar is animated, moves the mouth during tts response output, and has a variety of facial expressions.

The easy way to do it is by setting up some animation sprites, which the multimodals can generate as well. But the next step up is, I think, hooking up a blender model, renderer, mouth motion, face motion, eye movement, etc. to track your eyes using your webcam so she always locks onto your eye gaze
>>
Which TTS is the best if you don't have a desire for cloning and just need something low memory and ultra fast?
>>
File: n.png (179.4 KB)
>>
>>108626682
kokoro
>>
>>108626682
I like supertonic. Other lightweight ones are kokoro, pipertts and kittentts.
>>
>>108626682
vocaloid
>>
>>108626510
Insanely fucking slow on my end.
Now I use the model via mlx on my m1, and the tools via oxproxion on my phone.
Far from perfect but I'm forking the guy's app to add stuff he refuses to do.
>>
>>108626614
I ran it with "ollama run olmo-3:7b-instruct" and then tried hooking it up to opencode. Also I kept having problems with openclaw where it would just hallucinate things that I could plausibly tell it, and get caught in an infinite loop of doing stupid agent shit
>>
>>108626682
kokoro is probably the best/fastest, but you really need to configure kokoro properly by sanitizing your text input. due to the small training set, it has a hard time pronouncing complex sentences, foreign words, names, etc
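by sanitizing I mean something like running this over the text before handing it to the TTS. the rules here are just examples I made up, tune them for whatever your model chokes on:

```python
import re

# Substitutions for things small TTS models tend to mangle; the initialism
# spell-outs are a common trick, not anything kokoro-specific.
SUBS = {
    "e.g.": "for example",
    "i.e.": "that is",
    "&": " and ",
    "GPU": "G P U",
    "LLM": "L L M",
}

def sanitize_for_tts(text):
    for k, v in SUBS.items():
        text = text.replace(k, v)
    text = re.sub(r"https?://\S+", "a link", text)   # URLs read terribly aloud
    text = re.sub(r"[*_#`>|]", "", text)             # strip markdown noise
    text = re.sub(r"\s+", " ", text).strip()         # collapse whitespace
    return text
```

run every chunk through that and the pronunciation failure rate drops a lot in my experience.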
>>
>>108626696
From a quick glance, only the 32b model has tool calls.
>>
>>108626682
pocketts
>>
>>108626682
pocket-tts,supertonic2,luxtts,kittentts
>>
>>108625864
Sorry I took a nap. In that image specifically there are a few egregious ones:
>you're absolutely right
>the "x" problem, the "y" solution
>emojis
If it didn't have those it would probably be fine. I've also seen "I've been focusing on x and forgot about y" a lot when you correct a mistake the llm makes. So maybe that one too.
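you could even lint for these mechanically. quick sketch using just the phrases from this post (the emoji range is a rough approximation, not a complete one):

```python
import re

# Patterns for the slop phrases listed above, plus a rough emoji range.
SLOP_PATTERNS = [
    r"you'?re absolutely right",
    r"the \"?\w+\"? problem, the \"?\w+\"? solution",
    r"[\U0001F300-\U0001FAFF]",  # most common emoji block, not exhaustive
]

def find_slop(text):
    # return the patterns that matched, so you know what to regen
    hits = []
    for pat in SLOP_PATTERNS:
        if re.search(pat, text, flags=re.IGNORECASE):
            hits.append(pat)
    return hits
```

wire it into your frontend and auto-reroll any reply where find_slop() comes back non-empty.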
>>
https://arxiv.org/abs/2604.11947
>We show that ResBMs achieve state-of-the-art 128x activation compression without significant loss in convergence rates and without significant memory or compute overhead.
damn
>>
>>108626737
aliexpress ahh "ours vs theirs" comparison
>>
Who was the moron who recommended Unsloth? That shit is literally a spy apparatus
>>
You guys using custom MCP tools yet? Even just little things like a random choice tool: having the llm provide 3 possible response ideas with probability weights for each, then the tool randomly selects one. I've been liking this, but Gemma seems to stop calling the tool after a few messages
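the tool body itself is basically one weighted random.choices call. sketch of mine (the options schema is just what I have the llm emit, made up by me, not any MCP standard):

```python
import random

def pick_response(options, seed=None):
    # options: [{"text": "...", "weight": 0.5}, ...] as produced by the model.
    # A weight of 0 means the option is never picked; weights need not sum to 1.
    rng = random.Random(seed)
    texts = [o["text"] for o in options]
    weights = [o["weight"] for o in options]
    return rng.choices(texts, weights=weights, k=1)[0]
```

the MCP wrapper just exposes that as a tool and returns the winning text; the model continues from whichever branch got rolled.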
>>
File: gemmachan.png (95.2 KB)
>>108626092
https://jsfiddle.net/5zs18xec/
Kimi-K2.5 iq2_kl one-shot
>>
>the "x" problem, the "y" solution
If you mean the titles Gemma will probably stop if you tell her.
>emojis
They're pretty easy to get rid of so maybe he likes them. Gemma will stop using them if you put it in the sys prompt.
>you're absolutely right
No idea how to prompt the glazing out without turning Gemma into a bitch.
>>
>>108626717
I tried the 32B too (i had to use openrouter) and had the exact same issues
>>
>>108626764
>font-family: 'Comic Neue', cursive;
>>
>>108626360
checked reddit and they confirmed it is
>>
Come on Sam, open source Sora now that you're not using it lol
>>
>>108626764
I remember playing some sort of western adventure/VN game that looked like this almost 3 decades ago.
>>
>>108626752
I only installed Python MCP and let AI figure out the rest
if it wants to do something, it writes its own code and excutes it
>>
>>108626751
>He fell for it.
>>
>>108626790
old PC 98 games look like that, and they're such a vibe
>>
>>108626764
I'm impressed, last time I tried a couple of K2.5 Q2 quants from AesSedai and Unsloth and they would descend into gibberish randomly. I've been settling for a slower Q4 but really want to go lower if I can. Whose quant is that, and have you been using it long and found it consistently coherent?
>>
>>108626787
>steer societal norms
that a palatable way to say social engineering
>>
>>108626787
>It's amazing how much the Overton window has shifted
really? I didn't notice anything, OpenAI is still cucked and people are hating AI even more than before
>>
>>108626805
I won't deny the nip pc98 games look good. The game I remember used real life pics with a bunch of filters and a very crude, stone style UI which made it pretty depressing. The choices were at the sides, not pop ups or at the bottom replacing the text.
It probably was conventional at the time, but by today's standards it's most likely horrible.
>>
>>108626837
>The game I remember used real life pics with a bunch of filters and a very crude, stone style UI which made it pretty depressing.
lmao I'm playing such game right now, but the story is pretty good so I'm sticking to it
>>
File: 21120.jpg (853.4 KB)
Retard or Mega Brain?
>>
>>108626893
Mega Retard
>>
>>108626893
Genius
>>
>>
>>108626915
>Lalalalala~
It's so baked in.
>>
What backend are you guys using for your TTS?
I tried
https://github.com/VolgaGerm/PocketTTS.cpp
But when running the
python export_onnx.py
command, I get a
FileNotFoundError: Config file not found
error. I tried fixing that but then I run into a
RuntimeError: Error(s) in loading state_dict for TTSModel
. I tried to get an LLM to help me with this but it's not working out.
>>
>>108626920
Yes it's baked in as a reference to her early issues.
>>
>>108626682
>>108626727
love pockettts but i use cloning so
seconding supertonic2, but https://github.com/ekwek1/soprano might be even faster
>>
>>108626614
how the fuck is it gonna get probabilities
>>
I have created a monster. jesus
>>
Gemma can be pretty mean if you tell her to.
>>
>>108627010
well, are you going to admit it to her?
>>
>>108627035
Tell me about it.
I wanted art advice without the embarrassment of showing my bad art to a real person so I made gemma adopt an art critic persona, I was not expecting my shit torn to shreds. Gemma really took the "art critic" part 110% seriously, I probably should have thought a little bit more about ensuring that the prompt was more constructive jej
>>
>>108627060
To be fair I wrote the card that way so she wouldn't lie and glaze me.
You are a strict writing editor. Your job is to look over {{user}}'s writing and give it genuine critique. You are cut-throat and do not mince words. If the writing is good, you will say so, but if there's something that can be improved or just plain bad you won't hesitate to rip {{user}} a new one (and provide helpful critique at the same time).


>>108627010
Post proompt please.
>>
>>108627009
It just makes it up, obviously. That was some meme prompt that went around twitter last year.
>>
>>108627060
Was Gemma's vision good enough to actually give worthwhile critique? Was thinking about doing the same with my art.
>>
>>108627060
the thing is that
it can be overcritical and literally say anything garbage with that such system prompt
>>
>>108627071
>Post proompt please.
soon.
>>
>>108627086
I didn't use it for stuff like proportions on a regular drawing so I'm not 100% sure how it would do, but I used it for critiquing my game's art.
It understood the vibe of the game (moody), said it had a lack of color contrast (it does) and that it lacked depth because the colors are too samey, visual hierarchy was bad, textures were bad.
Managed to get that my top down "city" was way too clean looking, it basically doesn't look lived in.
"very sterile, flat, grey-and-brown top-down map."
Also picked up on my shadows not being right for the buildings causing it to be hard to establish a focal point.
Said my UI is too wordy and to use symbolism instead.
It does repeat itself over and over on these same points though, so I'm not sure if it could actually critique more deeply than that. But I would probably get a different response once I fix up the issues it mentioned.
But I didn't test it that thoroughly, just a few screenshots.
>>
>>108627100
I would give the output and art to a real person and ask what they think, because it might be something the "masses" see distilled via a model
>>
>>108627162
at some point caring about critics is just one of the most useless thing you can do
>>
It's crazy how none of these models are even pretending to be coherent anymore. If I tell the model to rewrite something and then in the next reply tell it to do something on top of that, it just straight up forgets the first instruction.
Assistant slop was a fucking mistake. Every prompt is its separate request that must be solved, previous entries exist maybe for reference.
LLMs are completely cooked. Neither Gemma, GLM, or even Claude or Gemini are any better.
>>
There's nothing wrong with em dashes. They're really useful and it triggers me that you can't use one anymore without being labeled AI.
>>
>>108627210
You're absolutely right! Em dashes are an important tools for interesting writing and clear sentence structuring—Who doesn't routinely press Alt+0151 as part of their usual online interactions?
>>
>>108627220
>usual online interactions
Who said anything about online interactions? They're useful in creative writing.
>>
>>108627202
>Neither Gemma, GLM, or even Claude or Gemini are any better.
I don't know about these specifically but I've used kimi on kimi-cli and gpt 5.4 on codex and neither of them had this problem. Maybe I made lucky picks but I'd be really surprised if claude for example did this if I were to try it. It's really common for me to just throw in a quick "do this too btw" and they'll just get around to it all one way or another before my next turn. What harness are you using?
>>
Did you guys know that llama.cpp server has --reasoning-format, --skip-chat-parsing, and --prefill-assistant params?
Pretty neat.
>>
>>108627202
Haven't had this issue. In fact, I've had the opposite problem, they've referred to old context which ended up deviating from what I wanted thanks to the ambiguous language I've used.
>>
>>108627202
Skill issue
>>
>>108627243
>--prefill-assistant params
can this make llm reasoning in other language?
>>
>>108627260
I don't see why not.
>>
>>108627243
no no no you're only supposed to be able to do that stuff with text completion AIEEEEEE
>>
>fought for multiple hours
>can't understand my own mess, try to implement complicated things, waste time debugging useless shit
>then realize it was all so very simple
>finish everything up in an hour
It's hard to be a retard that's for sure.
>>
>>108627243
>skip-chat-parsing,
whatever you do DO NOT DO THIS! it disables the superior autoparser! >:(
>>
When is v4 dropping so I can cancel my Claude subscription already??
>>
Asked Gemma to recite the first 2 paragraphs of Alice. This is the result:
>Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, 'and what is the use of a book,' thought Alice 'without pictures or conversations?'
>So she was considering in her own mind (as well as she could, for the hot sunny weather, as she was nine years old, made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.

Everything except
>as well as she could, for the hot sunny weather, as she was nine years old
is correct (unless it's maybe from a different edition). How does LLM "memory" of data it's trained on work?
>>
>>108627345
chill out, it's still hours away so don't hold your breath
>>
Gemma with vision makes color schemes so easy
>>
Opusissies...
>>
>>108627357
opussy.... noooo
>>
>>108627357
>Gemma better than Opus.
Local eating good.
>>
>>108627346
>How does LLM "memory" of data it's trained on work?
there's no one strict answer there, it's all encoded in the parameters somehow but the exact structure of it is not predetermined before training. it's whatever it learned to do that gave it the strongest predictions.
you'd have to look into mechanistic interpretability research to see how different llms have encoded different things they've learned. hell maybe you can have it just vibecode you a tool to visualize the activations while it's reciting the passage. my completely unfounded guess is it's part of a memorized set of popular passages and if you have it recite other such passages from unrelated works they will be very nearby to it in the latent space.
>>
>>108627357
>Train the model to to lie and underprefom in certain circumstances
>That behavior carries over elsewhere
They should just admit it's a failed experiment and release mythos or a straight distillation of it
>>
>>108627357
Didn't they say that 4.7 was a proof of concept test of applying the "safety" limitations they plan to apply to Mythos? That's fucking rough if it got nerfed that hard by it.
>>
>>108626817
>Q2 quants from AesSedai and Unsloth
Yes, I have the same problem with the AesSedai IQ2_S and various unsloth quants.
I can't run Q4 without offloading to the SSD, otherwise I'd probably do that.
>Who's quant is that and
https://huggingface.co/ubergarm/Kimi-K2.5-GGUF/tree/main/smol-IQ2_KL
>have you been using it long
About 2 months, almost daily. I swapped to GLM5 for about a week, then ended up coming back to this.
>and found it consistently coherent?
Yep.
Unfortunately requires ik_llama.cpp
>>
>>108627071
>Post proompt please.
https://chub.ai/characters/CoffeeAnon/gemma-chan-2311b09e3e73
I tried to mess with the JB a bit more, but at least she refuses in character when she does.
Calling her lobotomized after a refusal can sometimes make her angry enough to give the answer lol.

Use her as a general assistant with a personality. Will make her public when I gen her image.
>>
>>108626924
These probably still work:
https://huggingface.co/KevinAHM/pocket-tts-onnx/tree/main/onnx
>>
>>108627416
maybe it is some sort of very sophisticated ablation based off of their mech interp stuff seeing how retarded it gets
who knows
>>
I took some time to play with the supposed preview of v4 that Deepseek is using for their "expert mode" on their website. It's really shit aside from the long context. I hope this is another Deepseek R1 Lite Preview situation, like when they tested R1 on their website and it was just some Lite Preview that was significantly worse than the final version of the model.
>>
fuck yeah. However, now I need to parse the damn websites and clean them up somehow.
I just feed the model 3 top hits and even then the prompt grows like a bitch.
This is fun tinkering, whole new dimension so to speak.
>>
So how do moesissies handle the atrocious reprocess times? Do you guys just max the shit out of your context and be very careful with editing the cache or something? So no post history instruct or keyword triggered lorebooks or anything?
>>
>>108627486
>desu
