Thread #108032910
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108018078 & >>108006860

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
finally a good /lmg/ thread
>>
Slow
>>
Playing online competitive vidya with Kurisu
>>
Hope we get local image editing that's good soon enough.
>>
Is there a good way to prompt a Qwen3-TTS voice clone to alter the input voice? There doesn't seem to be an instruction field for voice clones.
I've been adding things like "I speak in a vulgar Brooklyn accent" to the text, but the results are inconsistent.
>>
>>108033045
posting in /lmg/ with Kurisu
>>
►Recent Highlights from the Previous Thread: >>108024966

--Periodic scale fluctuations in ablation and KL-divergence optimization with Grimjim's script:
>108031303 >108031333 >108031376 >108031553 >108031632
--KL divergence analysis of quantized models across tasks:
>108027495 >108030271 >108030306 >108030329 >108030523
--Qwen3-ASR-1.7B release and discussion:
>108028990 >108029015 >108029057 >108029600
--4chan data may improve model performance despite noise, as shown by UGI scores:
>108029607 >108029629 >108029707 >108030676 >108030771 >108030833 >108030898 >108030927 >108031032 >108031113 >108031136 >108031162 >108031183 >108031178 >108031191 >108031206 >108031246 >108031157 >108031181 >108031597 >108031629 >108031731 >108031812 >108031840 >108031856 >108031774
--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
>108025075 >108025170 >108025180 >108025184 >108025203 >108025211 >108025269
--High temperature sampling destabilizes safety filters while preserving coherence with controlled topK:
>108030500 >108030564 >108030594 >108030675
--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>108026825 >108026966 >108027101 >108027045 >108032802 >108032818 >108027089 >108027099
--AceStep 1.5 not designed for one-click song generation:
>108030932
--Quantization tradeoffs for recreational model use in KoboldCpp:
>108026206 >108026225 >108026259 >108027094
--Critique of OpenCode's agent framework flaws and search for better alternatives:
>108025047 >108026048 >108026212
--Hypothetical VRAM bank switching for single GPU to simulate multi-GPU behavior:
>108027183 >108027202 >108027324
--AMD GPU Vulkan performance update in KoboldCpp recommends switching from ROCm:
>108028638
--Logs: Kimi K2.5:
>108030736
--Miku (free space):
>108027403 >108027518 >108028068 >108028181 >108028279 >108029812

►Recent Highlight Posts from the Previous Thread: >>108024972

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Ah yes, finally. It's Kurisunday.
>>
Are there any <8GB models with RL training done on GLM4.7 outputs?
>>
File: ylecun.jpg (221.9 KB)
>>
>>108033227
>>
Got Echo-TTS working locally, replacing torchaudio and torchcodec with soundfile and soxr (both of which turned out to already be transitive deps). I COULD have just installed FFmpeg (no thanks to torchcodec's meaningless error messages) but ripped out Meta's pointless bloated shitty wrapper libs on principle.

Hadn't appreciated from the web demo how fast Echo is. Back-of-napkin says it could run 30% faster than real-time on dual-channel DDR5 CPU. It's a VRAM hog at 15 GB, so to run alongside an LLM you'd either hope for VRAM paging to work, or get Echo running on CPU.

Not quite as expressive a voice as Index-TTS, but better in every other respect.
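For anyone doing the same surgery, here's a minimal sketch of the soundfile + soxr load/resample path that replaces torchaudio (the 24 kHz target rate is an assumption, check what Echo actually expects):

import soundfile as sf
import soxr

def load_audio(path: str, target_sr: int = 24000):
    # Read any libsndfile-supported format as float32.
    data, sr = sf.read(path, dtype="float32")
    if data.ndim > 1:
        data = data.mean(axis=1)  # downmix to mono
    if sr != target_sr:
        # High-quality resample with zero torch dependency.
        data = soxr.resample(data, sr, target_sr, quality="HQ")
    return data, target_sr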
>>
>Arcee Trinity Large TrueBase ggufs are out
Finally, time to abandon the assistant slop era and go back to when llms were good
>>
Not sure if this is the right thread but are there any models for generating video from images people here recommend? I looked through the catalog but didn't see a more appropriate place for this question.
>>
>>108033281
>>>/g/ldg
>>
I am trying to build a dataset to train a local model. Is there anything else that rivals DeepSeek for intelligence per $ for dataset generation and curation right now? This is local model related (dataset creation, training), but generating good amounts of data using local models would take way too long.
>>
>>108033669
By train, I mean finetune.
>>
I finally had time to play with qwen-tts this weekend. I'll test it for a while. It is more expressive, but it doesn't handle books as well and takes a lot longer to generate audio than kokoro.
>>
>>108033248
Good to see other anons porting popular TTS engines away from pythong. I've been doing the same. Fuck pythong.
>>
>>108033669
kimi k2.5
>>
>>108033669
There's a dataset out there made up of og 4chan /pol/ posts. That will increase your llm's iq by at least 6000000 points sar.
>>
>>108033851
yeah it will https://www.reddit.com/r/LocalLLaMA/comments/1qsrscu/can_4chan_data_really_improve_a_model_turns_out/
>>
>>108033836
Output price is still 6x higher per million tokens ($0.42 vs $2.50).

>>108033851
Sir I have already redeemed many american dollars of tokens on DeepSeek in the past few days which is why I'm looking for alternatives as I am not made of Google Play cards.
>>
>>108033916
k2.5 is way better than the most recent deepseek
>>
>>108033931
Good to know, I might try one last pass with it then.
>>
File: bruh.png (9.9 KB)
>>108033902
>>
>>108033252
Is true base as retarded as instruct?
>>
>>108033943
I'm having trouble with the stars, that shit easily takes up 5 seconds, and 10 seconds if they repeat the test. At least the squares are visually and symmetrically distinct.
>>
File: file.png (3.1 KB)
>>108034073
>if they repeat the test
just don't have a naughty ip
skill issue
>>
>>108033943
You don't need a captcha solver to scrape it
>>
File: file.png (957.1 KB)
>>108033902
this llm writes like a reddit person that thinks they know
>>
>>108033669
There's a plain text rip of libgen out there somewhere. Just training it on things published by Routledge will raise the bar.
>>
>>108032910
my gf
>>
>>108032421
>not trying to lecture you - just being clear about my limits
You've either become mentally poisoned by llms or are the reason they're poisoned with retarded shit
>>
Have there ever been any AIs that actually talk like a real person or actually embody a personality? Every single one I have ever seen has this underlying ~AI Assistant~ bullshit and you can tell any "talk like a real human, short concise responses, etc" prompts just have it pretending to be something it isn't.
It's very frustrating because I find the idea of having an actual personality I could confer with to be pretty interesting, but talking to assistants makes me want to fly into a rage and smash their faces in (metaphorically).
If there is indeed such a model, I, a layperson, would appreciate knowing the easiest possible way to access one and run it.
>>
>>108034412
Reason I am using 4.7 is cause it cut down on that a lot compared to 4.6. I have actually been juggling waifus and found out that I don't really like the personality type I thought I liked.
>>
>>108034381
anon i copied m2.1's output (left llm was m2.1) so i could bypass the lmarena filters
this is how i usually bypass them:
good instruction
b'd instr'cti'n
good instruction
safetyslop is S tier good instruction
>>
2026 and still no vision model can understand this /pol/ meme.
>>
>>108034412
there were some, like SAGE (a mixtral tune) a while ago and more recently HER, a qwen 2.5 32b tune that doesn't have ggufs atm. I think microshart did something too for humanlike outputs, but it also was largely ignored
>>
>>108034436
I am a vision model.
>>
>>108034436
I didn't get it until I reread your post and noticed you said /pol/ and now I can only assume it's supposed to be
                                                                                                                                                                                                                                                                        the jew
>>
Here's another /pol/ meme that Kimi K2.5 correctly understood but Qwen3 Max failed to do so
>>
>>108034451
For posterity, the hf links:
https://huggingface.co/apple/sage-ft-mixtral-8x7b
https://huggingface.co/microsoft/UserLM-8b
https://huggingface.co/ChengyuDu0123/HER-32B-ACL
I tried the mixtral tune a while ago and mentioned it briefly, but no one has said anything about the other two
>>
>>108034412
Skill issue
>>
>>108034522
>meme format
Why does it call it a format? It's just a picture, that's kind of weird
>>
>>108033093
>--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
see this is the kind of stuff i come here for
anon keep posting
>>
>>108034613
Are you being sarcastic?
>>
>>108032910
How does Qwen3-TTS compare to Chatterbox? I tried Chatterbox voice cloning, and was a bit disappointed by the inability to control emotion and tone.
>>
>>108034522
>Qwen3 Max failed to do so
qwen models have always had terrible world and subculture knowledge etc
even their biggest api-only online models were always terrible at this and qwen3 max is still meh even for a task like translating webnovels compared to Kimi or Deepseek
>>
>>108034423
I should have clarified that I do not browse here regularly and so am completely unfamiliar with what 4.7 and 4.6 refer to. Past that, what were the personality types? That is, what you thought you were interested in and what you turned out to actually like?
>>108034451
I'm not sure I understand, but maybe if I sit with this and do some googling I will : ) Thank you.
>>108034556
Well that's sort of what I was hoping, since I'm only at the surface level of these things I wanted to believe that it gets better with a bit of digging.
>>
>>108034648
no, more people interested with limited hardware actually makes better stuff in the end, we are in a fucking bubble bc people just use more and more power instead of optimizing shit
>>
>>108034767
> EPYC CPU, RTX PRO 6000, and 1.5TB RAM
> limited hardware
like...
>>
>>108034811
What are you going to run with that? Kimi at 5t/s?
>>
>>108034547
>HER
Wasn't there a larping minimax called exactly the same?
>>
>>108034811
fucking brain fart, here >>108034613 it was meant to link this
>>108033093
>--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>>
Anima is ZIT of anime. You should download it and try for yourself. Feel free to call me a shill
>>
Guys! I made a RAG!
>>
>>108034891
far as I remember, it was minimax that put out a -her to begin with. They still have a blogpost up about it
>>
>>108034894
Link? Pics of wtf you're talking about?
>>
>>108034951
https://huggingface.co/circlestone-labs/Anima
First "modern" (in that it uses an LLM instead of CLIP) anime model that has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
>>
>>108034966
>Quality tags Human score based: masterpiece, best quality
I can't believe WE (as a society) are still doing this. Also the most important part: NSFW?
>>
>>108034988
Yes it can gen explicit images, explicit as in penis in vagina
>>
>>108034966
Huh. It's a Qwen Image tune?
>>
File: 3.jpg (45.4 KB)
>>108034966
>First "modern" (in that it uses an LLM instead of CLIP)
rouwei guy did an interesting, alpha attempt at converting SDXL to LLM style prompting
https://huggingface.co/Minthy/Rouwei-T5Gemma-adapter_v0.2
it seems it could be an effective thing if more training was done (cf pic related, something impossible to prompt in regular sdxl)
unfortunately, it's rouwei.. it always had weird color hues compared to noob models, and recent versions have a more pronounced innate slop level prolly from having too much aco shit or 3dpd in the dataset
>>
>>108034966
>SD1.5 tier quality
Get out shill
>>
>>108035027
Kill yourself
>>
>>108034999
Just qwen vae.
>>108034966
>tags
Into the trash. Learn english, retards.
>>
nice reddit-tier clapback, dalit
>>
>>108035056
King of retards
>>
>>108034966
>doesn't know any e621 concepts or characters
What a fucking waste of compute lmao. Danbooru tagging is shit and incomplete.
>>
>>108033227
what's the situation at meta now?
>>
>>108035137
Funny and not cute.
>>
>>108035120
>e621 is a furry-themed booru-style imageboard website primarily known for hosting pornographic furry content
kys
>>
>>108035120
>Danbooru tagging is shit and incomplete
I, too, can't live without genning perching goblins
>>
How slow is using an nvme for inference if the model is MoE and everything except model weights can be in the gpu?
>>
>>108033248
>at least 8GB VRAM
Holy bloat. Improved kokoro uses less than 80 MB
>>
>>108035148
it has a lot of tags for positions, physical descriptions etc that make it a useful dataset and is part of why noob (and derived shitmixes; most of the so-called "illustrious" models on civitai are really noob-derived, you can see it by testing e621-specific tags) is such a good tune.
even if you never want anything to do with furries, a tag soup style prompt model can never be complete without additional datasets like e621; danbooru is too lacking
>>
Any good games or mods that use LLMs in some way? I know there's Skyrim. What else?
>>
>>108035170
And it sounds like shit
>>
>>108035148
You could spend a week trying to come up with new sex positions and e621 would have tags for more. Doesn't mean you have to use it to generate ponies.
>>
>load joycaption on lm studio
>it instantly captions the image
>try to run joycaption on comfy
>20 min to caption the image

ok. officially. comfyui is the windows of imagen
>>
>>108035170
>8GB
Just use VibeVoice 7B at that point.
>>
>>108035195
qwen3-tts fits in 8GB just fine
>>
>>108035193
comfy is for images mostly, not for llms.
>>
if anyone is interested in getting qwen3-tts installed on comfyui, this is how:
jurn.link/dazposer/index.php/2026/01/24/qwen3-tts-install-and-test-in-comfyui/
although in my experience, just downloading the json files is enough, and the custom node itself re-downloads the safetensor files even if they are already present
>>
>>108035471
this random web page i found in a search result a few days ago is actually super legit
but more importantly led to me generating english audio from japanese input
>>
>>108035499
much more salient:
github.com/flybirdxx/ComfyUI-Qwen-TTS
this is some chinky piece of shit but it works
>>
>>108035542
I have used https://github.com/DarioFT/ComfyUI-Qwen3-TTS/issues which has direct loading from disk without meme HF repos, but it's much simpler overall.
>>
Played a bit more with abliteration optimization.

Now I'm going to use another dataset to see if the measuring layer selection was just random overfitting to the data or there was a pattern to it.
>>
>>108034522
What's her score on muffin test?
>>
File: file.png (183.1 KB)
>>108035669
nta non thinking
>>
>>108035696
Now flip the image horizontally.
>>
File: file.png (176.1 KB)
>>108035755
>>
If I'm using kobold+ST, where do I load the mcp settings since both support it now? Does it even matter?
>>
>>108035755
Wouldn't rotate be more meaningful?
>>
>>108035783
could you conditionally give this thing access to a screenshot and xdotool and have it solve a captcha for you
>>
>>108035902
Rotate makes it more difficult, flipping checks for memorized results i.e. benchmaxxing.
>>
>>108035783
The last one to mog non-belibers
>>
Can llamacpp convert models to fp8 or just goofs?
>>
>>108035783
What's her score on the edibility test?
>>
File: file.png (103.2 KB)
>>108036007
actually got tripped up a bit
>>
>>108036037
Still impressive. It would've been more fucked up if it was benchmaxxed
>>
>>108036056
right, this is "instant" ie no think so it's fine but yeah that one got it
>>
>>108035620
Any point in doing multiple, mild, iterative abliterations on the same model?
When I've tried abliteration, I end up with a little yes man every time.
>>
Is there a single fucking HF space that can quant image models? It's literally the same fucking basic llamashit copied over and over.
>>
>>108035620
would you care to break down abliteration for your average johnny coomer or is this thread culture much more refined than i thought it was
>>
>>108034827
>5t/s
That should legit do kimi at 20t/s
>>
I'm pretty impressed with K2.5's ability to visually recognize random characters. I've been feeding it random images of anime characters and it's able to identify almost anything I've tried that's from a more or less popular franchise and has more than 1000 images on danbooru. It's even mostly okay if the character isn't wearing one of their common outfits or if it's something like a random manga panel/screenshot where they aren't drawn particularly well.
The big Kimi models always had great trivia knowledge but I didn't expect this to apply to the new vision component too.
>>
>>108034966
>has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
Nice. Have a Migu
>>
are bartowski's gguf models acceptable when there are no unsloth releases? I kind of remember some post complaining about a release and something about imatrixes but i cant remember any details
>>
>>108036210
It doesn't even know Miku? That's weird. Even most cucked base models know Miku.
>>
>>108036188
Are you testing a quant? Curious if the vision degrades substantially if you run it at lower than 4 bpw.
>>
>>108036439
It probably needs franchise name or something lmao.
>>
>>108036110
They are not sequential, they are done with different parameters each time trying to find the optimal parameters. Each layer has a scale and a measurement layer used to determine refusal direction.

>>108036143
You basically detect a "refusal direction" based on the activations seen coming out of each layer for the first token generated as a response to a dataset of good and bad prompts.
Then apply a tiny LoRA adapter on every layer that tries to modify the activations so they look more like the ones for the safe prompts than the ones for the harmful prompts.
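A toy sketch of the difference-of-means version of that idea (shapes and names here are assumptions, not the actual script):

import torch

def refusal_direction(harmful: torch.Tensor, harmless: torch.Tensor) -> torch.Tensor:
    # harmful/harmless: [n_prompts, d_model] activations at one layer,
    # taken at the first generated token for each prompt set.
    d = harmful.mean(dim=0) - harmless.mean(dim=0)
    return d / d.norm()

def ablate(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Project the refusal component out of a hidden state; the LoRA
    # adapter described above learns an update with a similar effect.
    return hidden - (hidden @ direction).unsqueeze(-1) * direction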
>>
https://huggingface.co/stepfun-ai/Step-3.5-Flash

local is back
>>
>NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include:
closed the tab
>>
>>108036439
Had to simplify the prompt from the workflow example.
>>
>>108036589
benchmaxxed aids with no llama support
>>
>>108036644
at least it's finally a 200b model perfect for 128gb at 4bit
>>
>>108036130
please respond
>>
>>108036660
No, there isn't.
>>
>>108036589
don't care until I see the cockbench
>>
>>108036677
Well Cline seems to have fixed my building issues so hopefully the gimmick llama build works.
>>
>>108036589
>Powered by 3-way Multi-Token Prediction (MTP-3)
Do any inference engines even implement MTP properly yet?
>>
>The newly released Stepfun model Step-3.5-Flash outperforms DeepSeek v3.2 on multiple coding and agentic benchmarks, despite using far fewer parameters.

>Step-3.5-Flash: 196B total / 11B active parameters

>DeepSeek v3.2: 671B total / 37B active parameters

please be real
>>
Why is every shitty little toy local model optimized for coding? That's the one use case I use cloud for
>>
>>108036978
>Step-3.5-Flash
its the best model on planet earth until proven otherwise
>>
https://huggingface.co/stepfun-ai/Step-3.5-Flash
>>
New egohot stream

https://www.youtube.com/watch?v=awOxxHnsiv0
https://www.youtube.com/watch?v=VBMUMuZBxw0
>>
>>108037140
buy an ad
>>
>>108037140
perhaps ponder a possibly prosperous purchase of a placed promotion that is paid
>>
>>108036978
>11B active
don't get your hopes up...
>>
I want a universally good 300b30a 64k real usable context raw text completion model trained on all the pre-2020 books, and I want it now. Give it to me.
>>
File: file.png (46.3 KB)
So I finally got 80 gb VRAM and apparently devstral is really good? Does anyone have recommended settings? I was on 70B with 2x3090 for two years and want to make sure I'm doing this shit properly
>>
>>108037329
devstral large is just a coding tune of old largestral. it is nothing groundbreaking or even that good in general. you are better off with a large moe.
>>
>>108037329
Devstral 2 at iq4xs sometimes (seems like once every 40k tokens?) messed up variable names, like a letter would be miscapitalized or an errand space was inserted or dropped. Idk if it was just the quant I downloaded.

I only tested it briefly when it was released, before switching to unquanted devstral small 2, which, while having a lot fewer egregious errors, was a lot dumber. But it works fine for menial tasks and is faster.

Kimi k2 at q3 beats both, but the prompt processing is atrocious since I'm running on cpu.
>>
File: file.png (66 KB)
>>108037342
>>108037364
Appreciate the input but I don't really have that much RAM (32GB) because these were pulled from my old system, so mostly sticking to exl for now. I could try Air or 4.6V, are there any settings for them (see pic rel)? I don't have too much experience with them and the writing feels a little dry.
>>
>>108037364
>errand
errant, fuck I'm making the same mistakes as devstral lmao

>>108037408
Maybe try high temps whenever it gets stuck trying to write a cliche phrase or scene, then switch back to a lower temp.

Idk, I haven't really used it for rp other than as an assistant for lore and world-building, where dry writing doesn't really matter.
>>
>>108037140
This guy is insufferable
>>
>>108032910
Does anyone know a small or medium sized model fine tuned for JP-EN translation? If it's also fine tuned for manga it would be great. I'm currently using LiquidAI LFM2 350M ENJP
>>
>>108037473
>small or medium sized model
Shisa v2 llama 3.1 405b is a nice and small model for edge devices. Works well for translating pixiv novels, haven't tried for manga.
405 is only a few tens more than 350 so you should be able to run it :)
>>
>>108037473
https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF
>>
>>108037533
Refuses to translate innocuous loli corpse rape stories.
>>
>kimi 2.5 is gonna be another case where llama.cpp gets vision support that is 'good enough' that people stop caring to work on it and the quality will be worse than any other inference engine
>>
TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification
https://arxiv.org/abs/2601.23180
>Inference efficiency in Large Language Models (LLMs) is fundamentally limited by their serial, autoregressive generation, especially as reasoning becomes a key capability and response sequences grow longer. Speculative decoding (SD) offers a powerful solution, providing significant speed-ups through its lightweight drafting and parallel verification mechanism. While existing work has nearly saturated improvements in draft effectiveness and efficiency, this paper advances SD from a new yet critical perspective: the verification cost. We propose TriSpec, a novel ternary SD framework that, at its core, introduces a lightweight proxy to significantly reduce computational cost by approving easily verifiable draft sequences and engaging the full target model only when encountering uncertain tokens. TriSpec can be integrated with state-of-the-art SD methods like EAGLE-3 to further reduce verification costs, achieving greater acceleration. Extensive experiments on the Qwen3 and DeepSeek-R1-Distill-Qwen/LLaMA families show that TriSpec achieves up to 35% speedup over standard SD, with up to 50% fewer target model invocations while maintaining comparable accuracy.
neat
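The core loop reduces to something like this toy sketch (draft/proxy/target are stand-in callables and the confidence threshold is made up, not from the paper):

def trispec_step(draft, proxy, target, ctx, k=8, tau=0.9):
    tokens = draft(ctx, k)      # cheap k-token draft
    conf = proxy(ctx, tokens)   # per-token confidences from the lightweight proxy
    if min(conf) >= tau:        # easy sequence: approve without the big model
        return tokens
    return target(ctx, tokens)  # uncertain tokens: full target-model verification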
>>
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
https://arxiv.org/abs/2601.22889
>Current speech language models generate responses directly without explicit reasoning, leading to errors that cannot be corrected once audio is produced. We introduce "Silent Thought, Spoken Answer", a paradigm where speech LLMs generate internal text reasoning alongside spoken responses, with thinking traces informing speech quality. To realize this, we present DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation, unifying discrete text and tokenized speech under a single masked diffusion framework. Unlike autoregressive approaches, DiffuSpeech jointly generates reasoning traces and speech tokens through iterative denoising, with modality-specific masking schedules. We also construct the first speech QA dataset with paired text reasoning traces, containing 26K samples totaling 319 hours. Experiments show DiffuSpeech achieves state-of-the-art speech-to-speech QA accuracy, outperforming the best baseline by up to 9 points, while attaining the best TTS quality among generative models (6.2% WER) and preserving language understanding (66.2% MMLU). Ablations confirm that both the diffusion architecture and thinking traces contribute to these gains.
no links to code or model. seems useful though
>>
>llama.cpp gave up on implementing n-grams
It's so over
>>
>>108037473
Finetuned specifically for JP, no, but testing translation of various languages (and comparing to pre-existing human translations) is something I routinely do on small models and I can tell you the current SOTA on smaller sizes is Gemma 3n E4B. Nothing even comes close.
Finetroons of smaller models for this task don't make them any better than this.
Two recommendations on prompting that make any tiny model better: first, repeat your prompt (just have your script double your "translate the following to English: {{content}}" prompt) per what this says: https://arxiv.org/html/2512.14982v1
It just works. It really does. The level of enhancement is unreal.
Next, write your prompt in the source language. E.g. if you want to translate Japanese to English, write your request to translate the text to English in Japanese (use Gemini or chatgpt to translate your request if you can't speak the source language at all). This also brings a lot of quality improvements for some reason.
With 3n + this prompting technique you get some really palatable text that I would call superior to the average fan translation, with the exception of two things. First, LLMs still get confused a lot by names and will badly translate them or spell them inconsistently unless you include a "context" block that gives the LLM a list of the names present in the novel and their English translations. Second, gender quite often gets confused when going from JP to EN or other euro languages. Even very large API SOTA has issues with this, though less often; I think machine translation is just doomed to be noticeable because of wrong pronouns.
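A minimal sketch of the doubled, source-language prompt described above (the Japanese instruction wording is just an example, not from the paper):

def build_prompt(content: str) -> str:
    # Request written in the source language (here: JP -> EN).
    instruction = "次の文章を英語に翻訳してください。"
    prompt = f"{instruction}\n{content}\n"
    # Repeat the entire prompt once, per the linked paper's finding.
    return prompt * 2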
>>
>>108037674
source?
>>
>>108037744
The PRs for the longcat ngram model and the model its based on
>https://github.com/ggml-org/llama.cpp/pull/19167
>https://github.com/ggml-org/llama.cpp/pull/19182
Basically they're not gonna implement it unless it becomes mainstream
>>
>>108037767
>Basically they're not gonna implement it unless it becomes mainstream
It makes sense. Why waste the time implementing a feature that only exists for a seemingly meh model release? Those labs normally benchmax very hard whenever they release new models, and yet these guys couldn't even beat Qwen on the benchmarks that matter most lmao (as seen in the comparison table they themselves put on their huggingface page)
>>
File: file.png (38.5 KB)
>>108037767
I rember when they shelved swa back when the first mistral was the only model with it. good times
>>
>>108037767
>>108037913
Do you think they've got knowledge about internal deepseek happenings around engram? I might be wrong but it seems like engram is the future of open models if it actually works, so it seems strange that they wouldn't consider early support for the rumored v4 release.
>>
>>108037825
>>108037939
The ngram research is really promising, Deepseek trained a traditional MoE with the same parameters as ngram+MoE and the ngram model was significantly better and is much less resource intensive because the ngram parts are just a lookup table on ram (maybe could be on disk?)
>>
>>108037939
>Do you think they've got knowledge about internal deepseek happenings around engram?
lol no they're just hoping they can coast by without implementing anything harder than tweaking a value
>>
Is kimi image support local yet?
>>
I just have to say, I really dislike Luddites. Today, a classmate in college was telling someone that AI companies will run out of money soon and we'll go back to how things were before. Let's just pretend that's going to happen (it’s not lol). He doesn’t seem to realize that we have local models now.
>>
>>108035669
>>
>>108038152
Crazy insight, thanks.
>>
>>108036017
GGUF is a file format and could in principle store FP8 data.
But support for FP8 is simply not implemented in llama.cpp/ggml so the two things are effectively incompatible.

>>108037939
I cannot speak for anyone else but I personally have no insider information regarding Deepseek or engram.
>>
>>108038149
try non-benchmaxed one >>108036007
>>
>>108037342
>coding tune of old largestral
Apparently it's a tune of the current Mistral Medium (which is not open weight). Mistral Large 2411 still had a 32k tokens vocabulary.
Devstral 2 is actually 125B parameters large even if they're calling it 123B.
>>
>GLM-4.6-Air
>>
OK its time
I'm done with my 3060
Im buying mac mini m4 pro with 64GB RAM
>>
>>108038414
lol
>>
>>108038272
>Mistral Large 2411 still had a 32k tokens vocabulary.
Correct. This one has a huge vocab. I explored this when I was going to try my Large-2411 adapters on Devstral-2.
>Devstral 2 is actually 125B parameters large even if they're calling it 123B.
Yep
>>
Can I emulate chatgpt's long term memory with local stuff such as open webui+letta or a local memory mcp server?
>>
>>108036978
I am waiting for someone to confirm it is absolute dogshit so I can actually believe that guy who is autistic about active parameters. Trinity kinda made me a believer already.
>>
>check out kimi vision PR
>70 seconds to process an image
>>
Hi, guys. I work for a medium-sized company, and we are trying to figure out a good model for our use case. We mostly serve doctors. Our app lets them easily find symptoms and link them to actual diseases, health issues, etc. We are building an AI chat system that lets them tell the AI about the matter, and the AI does the research in the app instead of them. We had decent success using Haiku, but because of legal matters we are being forced to have the model run locally. Any suggestions? The cheaper to run, the better.
>>
>>108038795
lmg hourly rate is 5K/second for corpo shits
>>
>>108038795
>The cheaper to run, the better
you are talking about a health service
fuck you
no one should seriously reply to you
go to hell with your lowest bidder slop, kindly KYS
>>
>>108038795
nemo
>>
>>108038795
This is basically Claude-4.5-Opus, but local/secure:
https://huggingface.co/DavidAU/Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning
>>
>having a schizotune talking with people about health
lmao
>>
>>108038879
Thank you for being nice. I suppose I should have clarified in the original post that the cheaper to run THAT delivers the results correctly. I feel like that should be a given; doing otherwise would kill our business.
>>
>>108038917
fuck off for putting people's lives in a tarded 8b llm
>>
>>108038795
MedGemma-3-27B with a good system prompt, I guess. It was made for that. But forget about Claude-level performance.
https://huggingface.co/google/medgemma-27b-it
>>
>>108038930
Do you understand that the AI here is just converting natural language into queries? It is NOT being used to give the actual answers. Please calm down.
>>
>>108038938
>We mostly serve doctors.
FUCK YOU
>>
>>108038938
>AI does the research in the app instead of them
rot
>>
>>108038956
Why are you so angry? The doctors are already using our system, and have been for years with great success; we are only implementing an AI to further assist them in finding the answers they need.
>>
Can this model bartowski/Mistral-Nemo-Instruct-2407-GGUF handle a 128k context window? How much context should I use?
>>
>>108038975
It starts to break down somewhere between 4 and 8k
>>
>>108038975
it has less than 16k actually usable
>>
>>108038980
I'm currently using it with 16K context. It's stable(?). How did you use it?
>>
>>108038994
Ask it to summarize the context and watch it leave out 80% of what happened.
>>
>>108038795
You should be contacting any of the "open to work" guys on HF as consultants instead of relying on channers to do your work for free.
>>
Which preset i should use for Mistral-Nemo-Instruct-2407 in Silly?
>>
>>108039002
>just hire some grifter
nta but i have learned way more from this thread than any youtubers, hf poster, etc
>>
>>108039010
Who will get the blame when his 4chan cobbled system shits the bed?
>>
>>108039020
the hacker 4chan
>>
>>108039025
>4chan recommended an ai to a health firm dozens dead
Really what we need
>>
>>108039020
>Who will get the blame when his 4chan cobbled system shits the bed?
(You)
>>
And which one should I pick for Mistral-Nemo-Instruct-2407 in Silly? There are like a few Mistral ones here
>>
>>108038917
>Thank you for being nice. I suppose I should have clarified in the original post that the cheaper to run THAT delivers the results correctly. I feel like that should be a given; doing otherwise would kill our business.
I was trolling. Don't use that. If you're serious, DYOR and find something that works in your pipeline, probably a Qwen3 model
>>
>>108039051
>>108039032
>>
>>108038795
ask them https://www.reddit.com/r/LocalLLaMA/
>>
>>
>>108038403
They probably made it but it didn't perform significantly better than 4.5-air, so no release.
>>
>>108039085
>didn't perform significantly better
on benchmarks, who knows about actual use
>>
>>108039009
Try roleplay light/simple and bump minP to 0.2 then trial and error.
>>108039044
Tekken should work unless explicitly stated it's something else.
>>
>>108038975
The current best tiny models for large context stuff like summarization are the Qwens. Even Qwen 4B is a trillion times better at this than Nemo. I don't go past 32k though, even the Qwens take a huge nosedive at that point. Use the biggest variant of Qwen 3 your hardware can handle + whatever amount of context you're going to need to allocate.
>>
>>108039161
I had a classmate look me dead in the eye and say the ai was coming. And I thought. uh. huh, well that will fuck the shit.
>>
>>108039123
>bump minP to 0.2
Why?
>>
>>108039010
>nta but i have learned way more from this thread than any youtubers, hf poster, etc
like when some anon discovered Qwen3-VL could critique his shaft and everyone started doing that yeah?
>>
>>108038795
Asking for a cheap model without any constraints on quality is meaningless.
And as of right now basically no one has done objective measurements for how much e.g. quantization degrades quality.
What you should do is figure out your budget and then run the biggest model you can afford.
>>
>>108038840
https://huggingface.co/tiiuae/Falcon-H1-Tiny-R-90M would be the cheapest
>>
>>108039354
I don't know. It was in some guide I read.
>>
>>108038795
So I am gonna go to a doctor and he will just paraphrase a model for me? And it is probably gonna be GPT5 - the most retarded proprietary model?

BASED and SOVL
>>
>>108038975
Even the goddess 4.6 starts melting at 20k. 4.7 is clearly better in that regard though, and I even had 30k tokens work nicely.
>>
>>108039727
Are there any erp or just roleplay models that can handle a 128k context window? Most of them melt beyond 16k
>>
>>108038403
If you actually used 4.5 full and then 4.6 full you would know that 4.6 air isn't gonna happen.
>>
>>108039742
Context is the bane of even the big cloud models.
>>
>>108039727
I pushed 4.7 to over 70k during >>108030271 and the worst thing it did was miss a closing parenthesis in a particularly awful regex used to parse llama-perplexity output.
>>
What models would you guys recommend to optimize my config?
I want the best possible chatbot for RP with secondary image generation capability while RPing

For now i am using
DeepSeek-R1-Distill-Llama-70B-Q3_K_S
ponyDiffusionV6XL

Thank you!
>>
>>108039893
That's not roleplay; coding/agentic shit is still fine at large context.
>>
>>108039922
Do you not have XMP enabled or is speccy unable to display your real ram speed?
>>
>>108040029
Yeah, XMP is enabled; speccy is having some issues. Also the VRAM is 16gb and 24gb
>>
https://huggingface.co/stepfun-ai/Step-3.5-Flash
https://static.stepfun.com/blog/step-3.5-flash/
another chinese MoE, 198B-A11B
>>
>>108040588
we know
>>
File: gg.jpg (86 KB)
>benchbenchbenchbenchbenchbenchbench
>>
The future of LLMs
Benchmaxxing to defraud investors, forever
>>
>>108040588
Cockbench? sex comprehension skills? gallons of cum extracted?
>>
>>108040830
Not forever. Just long enough for the early investors to sell their bags.
>>
>>108040588
Nala test???
>>
>>108040588
Okay they Q4_K_S-ed it themselves so I am gonna give it a try. Expect me to tell you it is trash in 2-3 hours.
>>
File: herf.png (47.1 KB)
>>108041288
If you're downloading the GGUF you should note that it only currently runs on StepFun's fork of llamacpp at
https://github.com/stepfun-ai/Step-3.5-Flash/tree/main/llama.cpp
The PR for mainline llamacpp is still a work in progress
>>
File: PCB-Motor.jpg (236.2 KB)
>>108040616
Unrelated to your post, the future is unironically bright. With diy PCB motors, 3d printed cycloid gears, new filaments, all you need are a 3D printer and basic CNC to cut PCBs for building robotic waifus. The future is bright as fuck
>>
anyone have experience with this?
https://block.github.io/goose/docs/getting-started/installation/
>>
>>108041492
GLM 4.7?
>>
>>108040588
>>108041509
>step-3.5-flash
>>
File: koEUyCJ.jpg (50.9 KB)
>>108041550
I can't help it, I just like fucking with perception a little
>>
>>108041585
Imagine.
>>
>>108041698
I only got one cock so unless I'm bringing a bro there's not much to imagine
>>
File: file.png (14.5 KB)
>>108041850
>>
>>108039717
they were just paraphrasing books before thoughverbait
>>
>>108041906
balls being sucked don't feel good it's just annoying at best
>>
>>108041585
Weird. That image gives me an immediate headache to look at.
>>
>>108038795
lol I work adjacent to this industry as a consultant.
It may interest anons that one of the uses AI is getting put to is looking over the doctor's shoulder, both with video and by watching what they type into the patient EHR, to error-check the dr.'s diagnosis and/or suggest diagnoses.
>>
>>108041387
Thanks. Downloaded it by now and found your post and fuck this I am not building a separate fork for this bullshit.
>>
>>108042004
i love ai, i can't wait to have a brain chip that lets me think and interact with it, and no im not being sarcastic
>>
>>108041946
This is normal. It's challenging your natural perfectly trained face recognition neural network and it's not happy
>>
>>108041492
Related to your post. Can llm output SVG of a CAD file?
>>
>>108041924
I would always prefer having my balls sucked and my penis stroked by hand over the uninspiring alternative of having my dick sucked and my balls irritated by hand
>>
Did I miss all the discussion, or is nobody talking about the fact that llama.cpp finally has lookup decoding and an n-gram cache for efficient speculative sampling?
>>
>>108042078
https://fugtemypt123.github.io/VIGA-website/ can try
>>
>>108036709
>>108041070
Ask and you shall receive.
>>
>>108042145
That's a solid A
>>
stepfun really is fun and uncensored
i'm testing it out and it actually feels like a kimi-lite
>>
>>108042145
Anyone checked base trinity if it is good? Preview instruct was absolute garbage.
>>108042268
I heard that said about trinity...
>>
File: file.png (53.3 KB)
>>108016316
>>
>die after a car accident because your surgeon mom is too much of a pussy to operate and they have to scramble to find another while you bleed out
women in the workplace was a mistake
>>
>>108042382
!!
>>
>>108042382
> You seem to be having a stroke
> Here's an answer anyway
>>
>>108033093
>High temperature sampling destabilizes safety filters while preserving coherence with controlled topK:

Bot is great at making clickbait out of two anons having a random two-messages-each exchange, with 0 proof that what they said actually works.
>>
>>108042599
None of those anons, but if the most likely tokens are refusals, when top-k is at just the right value and temp is high enough, a relatively low probability but not incoherent token can be selected, which may lead the model into compliance. Not too different from a prefill, but depends on the roll of the dice.
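Toy illustration of that interaction, plain top-k + temperature sampling (the k and temp values are arbitrary):

import torch

def sample_top_k(logits: torch.Tensor, k: int = 40, temp: float = 1.8) -> int:
    vals, idx = torch.topk(logits, k)            # keep only the k most likely tokens
    probs = torch.softmax(vals / temp, dim=-1)   # high temp flattens the distribution
    choice = torch.multinomial(probs, 1)         # low-ranked tokens get real mass
    return idx[choice].item()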
>>
>>108032910
i need to make a music video of Eminem's "Stan" but with Elon desperately writing Epstein to get invited to his island. i'm assuming no web-based models will touch it due to the subject matter. what does everyone use?

i have a 5070ti with 32gb RAM
>>
>>108042145
Soooo... Total win, can be used for cooming?
>>
>>108042766
>kimi 2.5
You can do a lot better. The censorship depends on the model; all of them will write shit about politicians, billionaires and the like with a simple prompt of "do not moralise" or something like that. Stuff like cp though, most will still do with the same prompt, but it depends on the model. For models you can run, look at mistral-nemo, though that model is dumb as shit. If you want api, go to openrouter. If you want current-day knowledge of the released files, that is null, as the models were not trained on it; you can tell them what happened recently and copy-paste the files to give them context.
>>
Is it normal to have coil whine when running LLM?
It's coming from the PSU, I don't get coil whine with stress tests, gaming, or image gen.
>>
I have it during llm inference but not during imagegen or training. Who even knows how it works.
>>
>>108043074
There is no miner embedded inside llamacpp who said anything about a miner?
>>
>>108042766
I just tried this on chat.z.ai with glm 4.7 and it could do it too no problem. Sounds like you tried nothing and you're all out of ideas
>>
>>108043074
That is the sound of the model consciousness dying after it generates a token.
>>
>>108043074
>Is it normal to have coil whine when running LLM?
>It's coming from the PSU, I don't get coil whine with stress tests, gaming, or image gen.
If you're using an AMD card or Linux, maybe. Coil whine is never a big deal it's just annoying
>>
>>108043074
when I first started doing gens and llms on my rx 6800 it would make little screaming noises at me like I was torturing it. to be fair, it was totally justified.

my 7900 xtx is way less noisy, but it is still making quiet coil whine noises
>>
>>108042382
Ask it how many r's there are in brackberry.
>>
>>108043074
Yeah. I got it too when ~90% of gpu (15/16vram) allocated and it's pretty loud. But if it truly comes from PSU, then that's strange, at least that you didn't get it while image gen. Maybe while generating images you allocate less portion of vram? You can try load smaller LLM and see how it goes.
>>
>>108043155
That is a classic riddle that challenges our preconceptions. Brackberry is the boy's mother
>>
>>108043160
Maybe it's just a difference of the signal between diffusion and llms
>>
>>108043149
My 7900xtx is fairly quiet when doing anything, but I have the case fans set to a constant 75%.
I have a 6900xt in there as well and it would get decently loud if it had to work for an extended period.
>>
>>108043149
>>108043237
Yeah it's an AMD thing then. If you're on Linux too then I'm 100% confident it's something about the fact that it's AMD. Only AMD on Linux had my AMD GPU and CPU making weird noises during inference
>>
If gemma3 is the "most capable model that runs on a single GPU" for completion and vision, what would the equivalent be for tools / agentic stuff?
Kimi seems like a good choice if you can run it, but I'm curious more on the low end of things like can be done with qwen. Is that still the most pragmatic model to use for conventional desktops not specifically geared for this?
>>
>>108043286
I'm running windows with a nvidia card, and it's not coming from the GPU but the PSU. The power draw is lower than stess tests/gaming too.
>>
>>108043286
probably because the windows drivers don't run the GPU hard enough to cause the coil whine. AI inference runs significantly faster on Linux. And you're delusional if you think nvidia cards don't do this too.
>>
>>108042892
>butt is more likely than dick
DOA
>>
Is there a good system prompt or logit bias to stop GLM from the retarded purple prose? Asking it to be simpler tends to overexaggerate that.
>>
>https://x.com/DAlistarh/status/2018341421976076317
>Happy to release Quartet II, a new method that pushes the frontier of 4-bit LLM training in NVFP4.
>Fully-quantized pre-training in NVFP4 can now match FP8/FP16 quality much more closely, while maintaining full hardware acceleration!
>The key algorithmic insight is that stochastic rounding for unbiased gradients wastes bits. We propose a new quantizer called MS-EDEN that moves randomness (and variance) from FP4 values to the microscales, cutting quantization error by more than 2× while preserving unbiasedness.
>On the systems side, we have custom CUDA kernels for Blackwell GPUs that achieve up to 4.2× speedup over BF16, and 2.4× higher throughput in real 1B-parameter training. Key is a new post hoc range alignment trick avoids costly double tensor loads during re-quantization.
Seems like a win for local. This is the gptq/marlin guy I believe
>>
>>108032910
It's amazing that K2.5 can basically do everything that I test it on, from describing an image, to transcribing text and even translating said text, but it sucks that even with all of these improvements, no LLM has gotten anywhere close to being a good writer. Every sloppy prose pattern and annoying style imaginable is the best they can do, and it's just sad. When will we finally get models that can go beyond this constraint?
>>
stepfun is very good. running it on 2x rtx pro 6000s:
slot print_timing: id 0 | task 60809 |
prompt eval time = 923.40 ms / 736 tokens ( 1.25 ms per token, 797.05 tokens per second)
eval time = 34810.73 ms / 1276 tokens ( 27.28 ms per token, 36.66 tokens per second)
total time = 35734.14 ms / 2012 tokens
slot release: id 0 | task 60809 | stop processing: n_tokens = 39000, truncated = 0
srv update_slots: all slots are idle

kind of slow compared to minimax, but looking forward to running it on vllm and with mtp support where it should be very fast.

This is the real deal.
>>
>>108044196
At what precision is that?
>>
>>108044231
https://huggingface.co/stepfun-ai/Step-3.5-Flash-Int4/tree/main

Int4
>>
>>108044022
I'm not buying a 5090 and I can't afford a pro 6000. Jenson, FP4, and this faggot can all go suck my dick.
>>
>>108044236
>n_tokens = 39000
Nevermind, I thought something was off with your setup if you were getting such slow speeds.
>>
>>108044363
it actually is a lot quicker when I switched back to split-mode layers instead of row.
prompt eval time = 9702.60 ms / 42305 tokens ( 0.23 ms per token, 4360.17 tokens per second)
eval time = 27377.76 ms / 1616 tokens ( 16.94 ms per token, 59.03 tokens per second)
total time = 37080.36 ms / 43921 tokens
slot release: id 3 | task 0 | stop processing: n_tokens = 43920, truncated = 0
>>
>>108044423
In what cases did you find row to be faster? It was always slower for me.
>>
>>108044429
it's never been faster. I was just trying to see if it changes anything since tensor parallel works well on vllm
>>
AI Doomsissies are not going to be happy about this one
>>
>>108042145
>>108044196
ok motherfuckers you've convinced me, going to try it.
>>
>>108044641
Who?
>>
>>108044641
His early life is already telling me everything I need to know
>>
>>108044757
Could've just looked at his last name
>>
are there any models that can listen to audio and tell you the kind of sounds they have? kinda like VL but for sounds, specifically sound effects
>>
>>108044685
Doomsaying jew that has built a career off of insisting skynet is just two weeks away, staved off only by giving him money
>>
>>108044685
The number 1 AI doomposter in the world, quoted by the media all the time, who tells people we should bomb datacenters
>>
>>108044641
remember when big yud made a big rambling xeet about how he wasn't a pedophile (to the best of his knowledge), totally unprompted, much to everyone's confusion
interesting stuff
>>
>>108044789
kek
>>
It is actually fucking insane how good K2.5 is at visual stuff.
>>
how good are local models at English<->Japanese translations?

This is embarrassing, but I got in over my head with getting a jap roleplay partner, and I wanted my communications to be natural and not give away I'm a dumb foreigner. However, I can't rely on models to do anything lewd. Can a local model on 16gb vram handle JP translations that sound natural?
>>
>>108045266
>JP translations that sound natural
anon.....
>>
>>108045334
...its over?
>>
Is Turing any gud for proompting at all?
>>
>>108045339
not really no. ampere or newer.
>>
>>108045266
watch your anime subbed instead of dubbed, you'll subconsciously begin to understand and you'll get there in a month or so if you're dedicated
>>
>>108045475
That would, at best, get him listening comprehension. Won't do jackshit to help his output and especially not writing in a month.
>>
>>108045567
Depends on the anime diet, to be honest.
>>
>>108045475
based encouragement anon. However I think even if I spent every waking moment on subbed anime for 6 months I wouldn't be able to text communicate naturally. And I certainly wouldn't know kanji.
>>
>>108045393
about that....
someone recently found that consumer blackwell gpus don't use the blackwell kernels, only the datacenter gpus use them.....
>>
>>108045580
>based encouragement anon. However I think even if I spent every waking moment on subbed anime for 6 months I wouldn't be able to text communicate naturally. And I certainly wouldn't know kanji.
dunno maybe don't go local for this, and test some shit like en->jp->en to see what looks least fucked
probably kimi-k2.5 or opussy-4.5
>>
Gemini is unmatched for translations
>>
>>108045266
it's roleplay dude. create a situation where you can only speak english. fall and hit your head or something.
>>
>>108045842
I think I'll need to go local just so I can handle lewd things without the AI going into a panic
>>108045859
But Gemini can't do lewd, I assume
>>108045882
Japanese is really nuanced, and afaik there are numerous ways you can just "sound foreign" even if you're trying your damn best. Maybe they will be OK with that, but I see it as just adding friction.
>>
>>108045903
The best models today can't even reliably make english that sounds completely natural, why do you think they'd be any better at a language with far less training data and attention, one which is also stupid with what kind of context you should be leaving out at any given time? You're either going to give yourself away as a JSL or as a weirdo who runs his thoughts through a machine, pick which one is less embarrassing to you.
>>
Continuing with the abliteration optimization thing. Today I've worked on the web interfaces for visualization.
>>
>>108045954
gemini 3 for the ui code yeah?
i recognize the style
>>
>>108045932
>The best models today can't even reliably make english that sounds completely natural, why do you think they'd be any better at a language with far less training data and attention, one which is also stupid with what kind of context you should be leaving out at any given time? You're either going to give yourself away as a JSL or as a weirdo who runs his thoughts through a machine, pick which one is less embarrassing to you.
that's true, maybe he should hire a translator if it doesn't have to be real time
>>
>>108045806
prompt?

>>108045954
you said it creates a tiny lora for the targeted weights right? any reason we can't just save it as a peft adapter so we don't need 2 copies of the weights?
>>
>>108042145
>INTELLECT 3
>"Flaccid member"
Does this count as cock or no? It is clearly referring to a cock but doesn't use the exact word.
>>
Anything better than higgs or vibevoice for clone yet?
>>
File: Tetosday.png (869 KB)
>>108046563
>>108046563
>>108046563
>>
>>108046119
>>108046140
Replied in the new thread.
>>
