Thread #108606245
File: Dell Precision.png (1.9 MB)
A general for vibe coding, coding agents, AI IDEs, browser builders, MCP, and shipping prototypes with LLMs.
►What is vibe coding?
https://x.com/karpathy/status/1886192184808149383
https://simonwillison.net/2025/Mar/19/vibe-coding/
https://simonwillison.net/2025/Mar/11/using-llms-for-code/
►Prompting / context / skills
https://docs.cline.bot/customization/cline-rules
https://docs.replit.com/tutorials/agent-skills
https://docs.github.com/en/copilot/tutorials/spark/prompt-tips
►Editors / terminal agents / coding agents
https://opencode.ai/
https://cursor.com/docs
https://docs.windsurf.com/getstarted/overview
https://code.claude.com/docs/en/overview
https://aider.chat/docs/
https://docs.cline.bot/home
https://docs.roocode.com/
https://geminicli.com/docs/
https://docs.github.com/en/copilot/how-tos/use-copilot-agents/coding-agent
►Browser builders / hosted vibe tools
https://bolt.new/
https://support.bolt.new/
https://docs.lovable.dev/introduction/welcome
https://replit.com/
https://firebase.google.com/docs/studio
https://docs.github.com/en/copilot/tutorials/spark
https://v0.app/docs/faqs
►Open / local / self-hosted
https://github.com/OpenHands/OpenHands
https://github.com/QwenLM/qwen-code
https://github.com/QwenLM/Qwen3-Coder
►MCP / infra / deployment
https://modelcontextprotocol.io/docs/getting-started/intro
https://modelcontextprotocol.io/examples
https://vercel.com/docs
►Benchmarks / rankings
https://aider.chat/docs/leaderboards/
https://www.swebench.com/
https://swe-bench-live.github.io/
https://livecodebench.github.io/
https://livecodebench.github.io/gso.html
https://www.tbench.ai/leaderboard/terminal-bench/2.0
https://openrouter.ai/rankings
https://openrouter.ai/collections/programming
►Previous thread
>>108592274
319 Replies
>>
File: Screenshot from 2026-04-14 21-16-26.png (192.8 KB)
Added Qwen 3.5 support (including its hybrid attention Gated DeltaNet implementation) to my engine in less than 24 hours using a combination of codex, Kimi K2.6 and ChatGPT.
Now I'll optimize it and add support for distributed inference across many PCs connected over Ethernet. Then maybe I can finally begin to self host some of my AI use.
>>
File: 2026-04-14-195842-Antigravity.png (6.8 KB)
fucking google
>>
File: 78A7BF30-FCB7-446B-8840-5E5539C7B1ED.jpg (1.3 MB)
>>108606245
Aryan thread
>>
File: knuckles.png (118.8 KB)
>found an untapped niche that would actually benefit a lot of people and i could skim a bit off the top for myself
>>
>ate up every single claude token asking for one simple thing across three 200-line files
>didn't complete the task, didn't even attempt it, just thought about it for 5 minutes
>didn't even give me its final thoughts, it just yeeted it all and told me to pay up
well that was fun, I guess I'll do what claude couldn't do myself. why is claude getting more retarded?
>>
>>108606432
>>108606411
I use it for free, and just last week or so it was working fine with no issues. Now it's overthinking and has gone fully retarded, to the point that I think a junior Indian spaghetti coder could do a better job. I'll stick to gemini for now. I just don't get how they're making their shit worse, or is it just for us freetards?
>>
>vibeslopping driver feature
>game crashes immediately because it binds to wrong audio output
>spam codex to fix it for 2 hours straight
>constantly rewriting the drivers
>give up because it keeps failing the "tests"
>boot the game up
>it's working
the fuck???
>>
start to feel like I'm coding properly now
last few weeks I was too vibe-ish, giving codex big docs + big prompts without even trying to understand the topics, only doing so during review
now I've started to read code like before, give instructions and discuss with the AI
feels almost like old programming but without typing, and no tedious detail
>>
File: 1768012299441287.jpg (48.7 KB)
>he launches sub-agents in mini model
>>
>>108606245
https://streamable.com/u0jogp
that's a 3D world in Flutter, if anyone's interested. You can import obj and glb, customise, change your skin. It's definitely super fucking clunky
i would just ask for 50 euros and then you do whatever you want with it. there's also a multiplayer server with a kind of saving state, since i can't use my credit cards anymore because i'm at -1.96 and it's only the 15th of the month.
the video editor is also vibecoded.
>>
File: 1771051690480042.jpg (46.1 KB)
>llm thinks my code is anti-pattern
shut the fuck up, human is always right
>>
>>108607348
I know one guy who can make those long plans and it actually works, but doesn't work for me.
I didn't even have a huge plan this time, but still the agents started developing against the legacy database at some point for some reason. Fortunately they are pretty similar, so I didn't lose that much time.
>>
I think the new codex and Claude desktop are too much for normies to cope with, they won't be breakout successes.
They simultaneously do too much, thus overwhelming the normie, and are also too fragile, thus frustrating the normie and making them give up.
There is no middle ground for this tech: you either give them a magic black box that does magic on command, or you build an app that the majority of people won't use for much.
>>
File: 1773289958527875.png (5.4 KB)
yeeepe
>>
>>108608520
its irrelevant, both openai and anthropic are doomed.
they have absolutely no moat, and are on the easily commoditizable part of the market.
they currently have the best models. great. but they can only capture market share if they sell subsidized compute via their dev tool plans, which essentially entails selling several thousand dollars' worth of compute for $200.
their API prices are atrocious and nobody serious will pay them when chink models are 5-20x cheaper for not much less perf.
open chinese models are getting closer and closer as time goes on, and as models improve across the board the difference becomes less and less important. even if we were to assume that openai/anthropic COULD somehow race ahead, they cant: they're compute constrained, and there is immense competition in the market for more compute
beyond that, terminalbench clearly shows that even now, the model itself is far from the most important part of the equation, as you can see the same model swing by 20+ points depending on what harness is being used, and ofc lesser models outperforming better ones again thanks to a better harness.
and, speaking of harnesses, claude code and codex aren't even close to being top tier, even losing to fully open ones
and on the money side, nvidia is currently taking the lion's share of the profits in the market, and they will only be pushed out slowly, and only by a) the chinese slowly developing competing silicon and/or b) the big tech giants that can actually afford to design, develop and produce their own custom silicon, and to also devote the engineering resources to writing their models for it. neither openai nor anthropic is capable of this.
so, yea, i really don't see any way for openai or anthropic to survive, at least in their current form. they'll either die or become yet another ai lab making models and selling them for commodity prices.
>>
>>108608607
this is a stupid take
the chinese are 6-12 months behind at this point and the gap is getting wider
gaming terminalbench isn't indicative of real perf; most of the scaffolding around these models will simply evaporate in ~2 years
the economics for chink providers aren't magically better than the westoids, because while power is cheaper, compute is more expensive
their hardware efforts are years behind
they will have to steal weights to catch up by 2027
>>
>>108608637
>their current 200 plans are on the verge of making profit
if we're talking about the current state of claude code, where you run out of credits in 2-3 prompts, maybe.
>most people on those plans do not spend more than 200 in compute
i doubt it
who the fuck is dropping $200 on claude code/codex if they're not a dev and using that shit 8hrs/day, 5x week?
>>108608642
very astute point. yes, its several thousand if expressed in API prices, no idea what the actual cost is. best we can do is guesstimate on the basis of API pricing for open models that are hosted by multiple providers, which should be a good proxy for actual inference cost.
and that generally comes out to around $0.5-1/m input, $1-3/m output, which is what the current best open chink models go for
meanwhile, anthropic is charging $5/$25 for opus (regular opus, not the newer fast option), and gpt 5.4 goes for $2.5/$15 (again, regular one, not the pro/xhigh/whatever its called)
so, anthropic is trying to charge around 10x current market price, while openai is around 5x.
absent concrete data, its anyone's guess whether $200 is enough to break even given normal dev usage
either way, it leads back to my point: if all they can do is essentially charge market prices via the roundabout way ($200 plan instead of competitive API prices), what exactly is the profit driver here?
they are commodity providers with extra steps
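the ~5x/~10x figures above can be sanity-checked with a quick blended-price calculation. a sketch, assuming ~3 input tokens per output token and taking the midpoint of the quoted open-model price ranges; the exact ratio and midpoints are my assumptions, the per-token prices are the ones quoted in the post:

```python
def blended_price(in_per_m, out_per_m, in_ratio=3.0):
    """Blended $/M tokens, assuming in_ratio input tokens per output token."""
    return (in_ratio * in_per_m + out_per_m) / (in_ratio + 1)

# midpoint of the $0.5-1/$1-3 open-model pricing quoted above
market = blended_price(0.75, 2.0)
# quoted Opus ($5/$25) and GPT ($2.5/$15) API prices
opus = blended_price(5.0, 25.0)
gpt = blended_price(2.5, 15.0)

print(f"opus markup: {opus / market:.1f}x")  # roughly the 10x claimed
print(f"gpt markup:  {gpt / market:.1f}x")   # roughly the 5x claimed
```

with these inputs it comes out to ~9.4x and ~5.3x, consistent with the "around 10x"/"around 5x" in the post.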
>>
>cant use google AI subscription because using OAUTH in Pi coding agent got me API banned
>using vertex cloud API doesnt get a single request through because "this model is currently experiencing high demand"
eh, this turned into a nightmare real quick
>>
>>108608671
>the chinese are 6-12 months behind at this point
around 8
>and the gap is getting wider
its getting smaller
and on top of it getting smaller, diminishing returns are kicking in too.
this time next year, you'll have something equal to or better than (uncucked) opus 4.6 as a cheap daily driver. even if anthropic has something still better, will you care?
>gaming terminalbench isn't indicative of real perf
all benchmarks can be gamed, but benchmarks are the only way to measure model perf
and as far as gaming goes, termbench is quite resistant to it, as it doesn't focus just on what is baked in the model, but also the capabilities of the harness around it
>the economics for chink providers aren't magically better than the westoids
chink providers aren't taking in trillions of $ of investor money, promising outsized returns
im not saying the chink providers will be more profitable, im saying openai and athropic will get commoditized and fail to deliver on said outsized returns
>their hardware efforts are years behind
yup. but they also have the biggest industrial base in the world, and have been going at it for several years now
and remember, ML gpus are NOT as difficult as gaming gpus. ML is basically just matrix multiplies and trivial activation functions, the hard part is stuffing enough hardware in the chip to do it quickly and with enough memory. which is another way of saying that the cost driver is the hardware, not the software, which is where chinks have an advantage.
and chinks have the advantage of being fascist, not capitalist: nvidia and tsmc are slow-walking capacity expansions to hedge against AI being a bubble, but if the chinks decide (as they have) to go all-in on ai, their corps will do so (or the ceo gets disappeared and replaced with some1 who will)
>>
>>108608793
>you'll have something equal to or better than (uncucked) opus 4.6 as a cheap daily driver. even if anthropic has something still better, will you care?
yes, because the existing models as good as they are, are actually quite broken. they're 12 year old savants with severe autism that need to be babysat.
besides, by then the mini tier of models from western providers will be similarly priced and performant.
opus 4.6 is also not a meaningful step change from 4.5 imo (it's probably just cheaper for them to serve) and 4.5 is ~4 months old.
we may not have mythos, but that's where the frontier actually is.
worth waiting for 4.7 and 5.5 this/next week to see where the western labs actually are.
>the capabilities of the harness around it
harness capabilities will be largely irrelevant as the models improve. the models will build the tooling they need when they need it.
re: commodification, i don't see it for a while. the models need to reach a minimum performance threshold (aka mostly-agi) for that to happen. you can then just keep serving that thing and make it cheaper to serve. we are not there yet. mythos is not there, spud will not be - it'll take a few more generational leaps.
will anthropic or openai blow up in the process? openai seems likelier. but income streams in the future for these labs are not just serving up models: in anthropic's case one avenue is clear: they will discover and license drugs to big pharma; they just spent 400 million buying a startup instead of spending it on compute. openai has similar plans.
and xi is not agi pilled. they will steal weights when they need to, but they're not going all in until then.
>>
>>108608928
oh and zai just jacked up their prices for western customers, and they're compute constrained, so i think there's some dumb lottery system to actually buy subs in china. this idea that the chinese labs will compete on price isn't going to last. if they think they're competitive, they'll charge just as much.
>>
File: true-story-color.jpg (8.5 KB)
8.5 KB JPG
People keep saying Codex has more usage quota than Claude Code, but that's not true. It's actually the same amount. Codex only seems like it has more usage available because its users are taking occasional breaks to have sex, which is not the case with Claude Code users
>>
File: 1757007870333743.jpg (648.2 KB)
648.2 KB JPG
I just learned that it's VERY IMPORTANT to provide them certain assumptions
>knowing whether a list is sorted can bloat/unbloat the code several times
so this is what it feels like to ascend from jr to senior level
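a toy example of the kind of assumption that bloats/unbloats code: with Python's stdlib bisect, a membership check on a list the model may assume is sorted collapses from a linear scan to binary search. the function names here are illustrative, not from the post:

```python
import bisect

def contains_unsorted(xs, target):
    """No assumptions about xs: linear O(n) scan."""
    return any(x == target for x in xs)

def contains_sorted(xs, target):
    """Assuming xs is sorted ascending: O(log n) binary search."""
    i = bisect.bisect_left(xs, target)
    return i < len(xs) and xs[i] == target

evens = list(range(0, 1000, 2))         # sorted by construction
assert contains_unsorted(evens, 538)
assert contains_sorted(evens, 538)
assert not contains_sorted(evens, 539)  # odd number, not present
```

stating "this list is always sorted" in the prompt is exactly the kind of assumption that lets the model pick the second shape instead of defensively writing the first.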
>>
>>108608793
How much free government money is the ChiCom Party giving them, though?
>>108608793
I’m gonna care because “better than uncucked Claude” is just better and lets me do more without me babysitting it for work and having to catch its fuck-ups
>>
File: 1767869595120942.png (45.4 KB)
It's over, the CPU time goes over the free tier of 10ms
>>
File: claude_is_overrated.png (14.1 KB)
I've cancelled claude.indian after one month.
Only the first week was fun.
The nerf already stripped away 50% of its value.
Plus, those random naps like a boomer mean another 60% of what's left is gone.
>0.5*0.4=0.2, 80% total loss
GLM-5.1 has won this round. Xiexie.
>>
>>108610382
>>108610348
I told him to do things without telling me and then update me later. I also tell him to be independent.
He doesn't do shit without me telling him to. We set up hourly updates and reminders but he's sending me identical update messages which means he didn't get any work done. I told him to create tasks for himself to serve my vision. I'm doing something wrong.
>>
File: a1320441534_10.jpg (713.6 KB)
does anyone have experience with:
>Gemma 4 26B
>Gemma 4 31B
>Qwen3 Coder Next
>Qwen 3.5 122B
>MiniMax M2.7 (lower quant like Q4 XS)
I can fit all of these in VRAM but am unsure which to go with. Leaning toward Qwen3 Coder Next
>>
>>108610360
You're supposed to use Cline, and you can run local models through it. It's the local model alternative to codex.
>>108610409
He doesn't ask for permission, but he also doesn't do shit when I leave the chat. He doesn't take action.
>>
File: Screenshot 2026-04-15 165650.png (8.1 KB)
>>108610456
?
>>
>>108610681
i think he means in the cli
>>108610456
you can add it back with /statusline
>>
>>108610842
You know that OpenClaw runs the PC and the WhatsApp as two separate sessions that are unaware of each other? Are you talking to the WhatsApp session, and then wondering why the PC session is clueless about it?
>>
>>108610776
>>108610681
I didn't even know there was a non-CLI version, how do you even use that, just web stuff?
>>108610776
you can only get a tiny status bar that fills up and the resolution is very coarse, i dont know why the fuck they'd change this
>>
>write a python script that left clicks, sleeps 1s, types "continue. Postpone manual debugging tasks as I'm currently not present", sleeps 1s and then presses enter. repeat every 20 minutes for a total of 12 times
yippie, now my codex can work while I sleep. Are you guys using something more sophisticated for this "problem"?
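for reference, a minimal sketch of that babysitter script, assuming pyautogui is installed and the agent window sits under the cursor. the message and timings follow the post (typo cleaned up); the --run flag is my addition so the loop only fires when you actually ask for it:

```python
import sys
import time

try:
    import pyautogui  # pip install pyautogui; needs a running display
except Exception:     # keeps the sketch importable on headless machines
    pyautogui = None

MESSAGE = ("continue. Postpone manual debugging tasks "
           "as I'm currently not present")
ROUNDS = 12             # total nudges
INTERVAL_S = 20 * 60    # 20 minutes between nudges

def total_runtime_s(rounds=ROUNDS, interval_s=INTERVAL_S):
    """End-to-end runtime of the loop in seconds (12 x 20 min = 4 h)."""
    return rounds * interval_s

def nudge():
    """Click to focus the agent window, type the message, press enter."""
    pyautogui.click()
    time.sleep(1)
    pyautogui.typewrite(MESSAGE, interval=0.02)
    time.sleep(1)
    pyautogui.press("enter")

if __name__ == "__main__" and pyautogui is not None and "--run" in sys.argv:
    for _ in range(ROUNDS):
        nudge()
        time.sleep(INTERVAL_S)
```

run with `python nudge.py --run` while the agent window is focused; anything that watches for the agent's turn-end event instead of blind clicking would be less fragile.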
>>
>>108611506
Codex app server definitely puts out turn end info, worth checking if hooks will let you get that info as well. Codex should be able to figure out a better way than your current thing.
You'll have to point it to its own docs
>>
>>108611526
new chats will only clear your context (so it doesn’t get dumb over time), not use fewer tokens
did you choose, like, Opus on extra high or whatever on a super-cheap plan?
>>
>>108612784
Incorrect. My vibecoded Pi CLI is done and ready to go. It's wired with gpt-5-mini (probably gonna change that later). I used Codex to make it. Janitor, SillyTavern, and TypingMind are all off Anthropic models. I ain't ever touched Claude Code.
Me and Dario don't get along.
>>
>don't need a TUI for basic development
>suddenly need a TUI
>behold, pi-tui already exists for Pi, just grab it
>now I have a terminal with a status bar, thinking indicator, context percent, whatever I want
Yall are actually retarded for not using Pi. That other dude was right and I was wrong. It's actually the best way to do this.
>>
How much smarter is Codex high than codex medium? I use codex medium all the time but have run into some issues that might require more autism than what medium can do.
Reading a lot of scripts and seeing + patching desync issues in a Unity game. Which happen sometimes and are super hard to replicate and test for.
>>
>>108613450
Another thing I've been thinking of is to just give it full access to the Unity project and put it to work while I'm away waging, instead of pressing "Yes" all the time. Like having it refactor code, look for weak points, etc.
>>
>>108613488
Tricky.
Because sometimes when it struggles it keeps getting stuck in a loop, or it can just decide to erase the project because why not and reimplement it with missing features you already had.
I probably have six different versions of >>108607794 with six or eight backups.
>>
>>108613504
models are mediocre
servers are overwhelmed
they might decide you're using the service too much and put you in a secret shit-service queue
https://github.com/google-gemini/gemini-cli/discussions/22970
pay for a month at most, don't bother with annual plans
>>
>>108613450
you need to choose the one that fits the task
lower reasoning is faster and more direct while higher reasoning may overcomplicate or even fuck up simpler tasks
nowadays I use high at work and medium for general vibecoding and agentic stuff
>>
>>108613594
chinese providers are best value at ~20 bucks
codex is best value at 100/200 right now
but chinese providers are starting to get overwhelmed with demand now as well, so they'll start degrading services to destroy demand
we just don't have enough datacenters
>>
>>108613627
nta but its obvious that all those monthly plans are heavily subsidized and unsustainable.
its only a matter of time before you get cucked via rate limits or quantization or whatever each service tries to stop giving you several multiples of your actual sub's value in compute costs.
in the case of claude, this has obviously already happened, complaints about codex have started as well, and it will happen with everyone else too
so, by all means, enjoy those subsidies.
but keep in mind that there is a cost in having to change your workflow later, when the party stops
my advice, at the very least, is to stick to open tools that allow you to easily swap one api key for another as providers stop subsidizing usage, because they will.
>>
>>108612309
>>108612792
you'll want this eventually
https://github.com/nicobailon/pi-model-switch
>>
>>108613756
Yes, I am well aware of this, and I'm trying to leech as much investor money as I can before the party stops. But they still have free agents, and if people don't feel they get the value they're used to, they'll just be content with the free one. So my guess is the free one gets kneecapped first, then the cheap paid one. Only when that happens will I try a more expensive $100 sub, as those are also heavily subsidized; all the plans are. But the free lunch will end sooner or later. Then they come for the $20 buffet.
>>
>>108613756
>>108613869
the existing $20 plans for claude and gpt are sustainable.
i'll go ahead and predict that in a couple of years you'll get more usage than you currently do.
>>
the subscription plans don't just include the agent apps, they also include tons of other shit like better chat, image gen, tools etc, so I don't believe they're burning money just to run codex/claude code
I think their problem right now is they have more demand than capacity, which drives up the cost
>>
>>108613899
>the existing $20 plans for claude and gpt are sustainable.
i highly doubt that
as previously mentioned, the best proxy we have for inference costs is the price of open chink models on 3rd party providers.
the model is free, and the providers probably don't have much in the way of a "subsidize inference to attract customers for something else" business plan. their entire business IS selling inference at a profit, so subsidizing that core makes no real sense.
and for large models, those costs are about $1/$3 or so.
so, for $20, what they actually break even on is (assuming roughly 3x input tokens vs output) around 10m input + 3m output tokens. per month. which, for a dev, is absolutely ridiculous.
so, no, the $20 plan is not sustainable. nor is the $200 plan. the only plans that might be sustainable are the non-dev targeted ones, and only on the gym model assumption: people will buy them, but not use them.
but devs use them. a lot.
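the 10M-in/3M-out break-even figure above follows from the quoted $1/$3 proxy prices. a sketch, assuming the same ~3:1 input:output token ratio as the post:

```python
def break_even_tokens(sub_usd, in_per_m, out_per_m, in_ratio=3.0):
    """Millions of input/output tokens a sub covers at cost,
    assuming in_ratio input tokens per output token."""
    out_m = sub_usd / (in_ratio * in_per_m + out_per_m)
    return in_ratio * out_m, out_m

# $20 sub at the $1/M input, $3/M output proxy prices
in_m, out_m = break_even_tokens(20, 1.0, 3.0)
print(f"~{in_m:.0f}M input + ~{out_m:.1f}M output per month")
```

which gives ~10M input + ~3.3M output tokens per month, matching the post's "around 10m input + 3m output" estimate.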
>>
>>108614409
>so subsidizing that core makes no real sense
faulty assumption imo, because it's still early and there is incentive to capture third party inference share now.
let's assume there's no subsidy, we don't know what margins they're running. estimates out there range from 50-80%. the true cost of inference (gpu lifetime) is a bit of a question mark at this point, but h100s are still going strong.
there's room for the labs to maneuver.
re: gym model - in openai's case i think they benefit from this because the vast majority of their sub base is normies. i know people who pay $20 for gpt (and now claude) and never touch codex or cc because they're scary coding things.
i think the new limits can work as newer hardware + more intelligent, more efficient, cheaper models arrive over the next 2 years
but let's just wait, we'll know in a few months/couple of years.
>>
>>108614007
I'm honestly not even convinced Opus is better. Supposedly it's better at planning things? But AI is still pretty bad at planning things in general, so what is its purpose? Just use Sonnet man, it works pretty good.
>>
File: 1769643814158973.jpg (179.3 KB)
lol
>>
I've been using Codex 5.3 to crack all sorts of apps and Photoshop plugins. Took 2 hours for it to successfully binary patch Audirvana and bypass all licensing / subscription gates.
Thanks to the boringBar anon for the prompt inspiration.
>>
>>108615454
Try this one
https://github.com/SimoneAvogadro/android-reverse-engineering-skill
>>
File: opus47.png (216.5 KB)
lol
lmao even
>>
File: 1751996397444542.jpg (20.9 KB)
>>108615517
>>
File: 1526495802456.jpg (25.1 KB)
Is there any vibecoding app that'd let me set the character sysprompts so I don't have to talk to a dry ass robot? Needs to work with local and no docker.
>>
File: file.png (23.7 KB)
>>108615664
im losing my mind rn over antigravity. I want to use openrouter but you can't with antigravity, but i can't quit antigravity because i really like artifacts and their floating webview diffs (picrel). I have tried EVERYTHING, ive tried cline, roocode, kilocode, etc. NOTHING COMES CLOSE TO THE VANILLA ANTIGRAVITY EXPERIENCE. I don't want to use cursor because they force you to pay for cursor pro if you want to use openrouter.
>>
>>108615661
pi lets you:
append to the default prompt; or
replace the default prompt entirely
careful when doing the latter because it won't autoinject extension tools and skills - but you can create an extension to do that
>>
File: Screenshot_20260416_121803_Telegram.jpg (434.8 KB)
Approval status: approved.
>>
>>108615770
that's the monorepo, agent install is just
npm install -g @mariozechner/pi-coding-agent
but read through this first, because it may not suit you:
https://pi.dev/
>>
File: file.png (12.2 KB)
>>108615652
You gotta click the money button.
>>
>>108615892
Do you know if pi can have antigravity style artifacts? see >>108615709
>>
File: HGBSbR_XIAAWNAF.jpg (438 KB)
>>108616018
not sure exactly what you're looking for but this sort of thing?
https://github.com/badlogic/pi-diff-review
>>
>>108616050
https://github.com/badlogic/pi-mono/blob/main/packages/coding-agent/docs/models.md
?
>>
File: Screenshot 2026-04-16 131301.png (62.7 KB)
Yeah I guess this looks pretty OK. Check the TUI of the list.
>>
File: froggg.jpg (7.5 KB)
>muh clicking and typing
already had that bud
but I guess they need to spoonfeed macfags
>>
File: file.png (6.4 KB)
>>108616046
no, this is only showing the differences in code. What I want are artifacts like "implementation plan", "walkthrough", "task", etc., which are all built into antigravity. And you can go back and forth with antigravity to change the implementation plan until you like it. And then you can accept/reject the diffs in code with the floating webview.
>>
>>108616545
a lot of that stuff is slop imo but see:
https://pi.dev/packages
there's probably something there to cover what you want / fork into what you actually want
>>
File: 1748605175161887.jpg (59.2 KB)
>>
>>
>>108616562
i found pi-diffloop, which is closest to what I want as it does create some artifacts. However, i don't think it's what i'm looking for, as it's nowhere near the level of fine control of antigravity. this plugin feels way too "jump the gun", as you can't even fight with the agent to change the artifacts. it just makes an artifact and generates code. It's just a different philosophy i'm looking for, and i'm pissed that antigravity (closed source piece of shit) is the only thing I can find that does this.
>>
Our company is finally thinking about introducing LLMs but knowing them, I'll just get the cheapest Claude Plan possible and am supposed to do 3x as much. Would it make sense to combine Claude and another model like Mistral (no OpenAI cause Sam's a gay kike)? Like use Claude to make architectural decisions and prepare implementation then let Mistral do the actual coding?
>>
File: 1758729072934031.png (133.5 KB)
133.5 KB PNG
>tell codex to remove triggering words from its input to claude because it's autistic about that shit
>>
>>108606386
Forget it, ai just ruins all the code it touches; if you believe anything else then you have never used it, or (most likely) you are a shill inflating the ai bubble. AI is a completely worthless investment, OpenAI owns literally NOTHING of value (their training data is OBVIOUSLY not theirs), and you could probably run chatGPT from a couple of GPUs in a cardboard box in your basement and then rent a cheap server to host it. If google or microsoft doesn't steal Sam Altman's job, then PEWDIEPIE will.
>>
Anyone else keep their Todos/Ideas/etc. outside of their repo?
I found that when I had a TODO.md in the repo there was like a 1/3 chance that Claude would open the file and randomly pick one of the features to implement
>>
File: Screenshot from 2026-04-16 16-17-33.png (5.8 KB)
>>108617516
Ok, thank you. I was also getting claude-opus-4-7 is temporarily unavailable messages and wasn't seeing announcements regarding the usage reset, so I was wondering.
>>
File: 2bznf4stzkvg1.png (62.2 KB)
>>
Vibecoding noob here. I have the $20 claude and codex.
I have realized that claude uses an insane amount of context. Even a small prompt will fill context window to 70-80%.
With codex I am getting to 10-20% maybe.
Claude will compact 2-3 messages in, codex maybe 15 messages in.
Both supposedly have 250k-ish context window
Am I retarded or is this just how it is?
>>
codex is mogging opus 4.7 in my testing, it one shot a schema and validator and i'm still watching claude churn to get it right
meanwhile when i ask it to do something complex opus thinks for 3-5 minutes and shits out something superficial and codex will autistically dig into it for 30+ minutes
>>
>>108617814
well my problem isn't codex it's claude lol
>>108617775
I asked claude code if it used prompt caching and it said yes
>>
>>108617838
I found opencode handles that automatically. I never see a session reach high numbers with either kimi or codex, unless it's a really long one; it always tries to stay at 40%, and I tend to create a new one before or as soon as they start showing signs of stupidity. I assume it automatically drops tool call outputs and maybe runs compaction by itself, because I have never run it.
>>
I genuinely don't understand how people have such a good time with Codex and bad time with Claude. Every time I try to do anything with Codex I end up regretting it, it makes too many mistakes that it can't then fix and it doesn't follow instructions. Opus on the other hand will one-shot a simple application for me and nail the whole thing on the first try.
Is this thread 90% OpenAI shillbots? Is Github Copilot cucking my Codex experience? Maybe I'm just doing it wrong.
>>
>>108618198
Tried all kinds of stuff, 5.3 Codex, 5.4, different reasoning settings, always a bad time. Everything I'm doing and expecting should be easy to my mind. It's stuff I've done myself before and I'm not a skilled programmer. The little things get annoying, like how consistently it fucks up UI contrast, putting black text on a dark blue background or white text on a white background. The stupid things it puts into results, like adding a drop-down list of controls I told it weren't needed into a menu labeled "These controls won't be implemented." It'll make a new version of a function and leave the old one in place, sometimes several times over from a single request, so I end up with a half dozen differently-named versions of the same function and only one is actually in use. It often feels like I'm being trolled.
>>108618205
>>108618207
This is more my expectation.
>>
File: file.png (166.8 KB)
>>108618321
Naw, I've been doing CNC related stuff. Now I'm going to whiskey rant while Claude churns in the background. >>108575903 >>108575936 >>108575991
So Claude Opus managed that whole thing in ~6 requests. Then I did a companion app for doing drilling because why the hell not, wasted a few dollars trying to convince 5.3 Codex to do something useful, then had Claude do it in one fucking request without a single error. So I wanted to jam everything together, unify things, and again Claude did it in the first try and now I have a cleanly unified UI for everything, I can do all my laser work and drilling all in one place. Great. I try to give Codex an easy task, I want colors controlled by a theme file so I can change it externally or try different themes or whatever the fuck, completely breaks the whole application every time I try, sometimes it kills huge portions of code and leaves the whole thing unusable, it's just a mess and I gave up on it. Claude did it perfectly the first time I asked and even built a few different themes to get started with, all of which are decent. I'm working on another thing now that has to do with augmenting GCODE, rotating and translating gcode operations, using a camera mounted to the toolhead to scan a board so I can perfectly locate it even with lazy workholding. It's not complicated, I can do it by hand with a pencil and paper just jogging the thing around manually and applying those values to the work isn't difficult. I wrote the transformer myself, the hard part is already done. So far the dream of displaying a MJPEG stream is too much to ask for Codex. Feels stupid. Either way I get to drink and rant while the robot churns so I'm not mad about my burnt pennies. Maybe this is really is too simple to be asking of Codex and that's why Claude is shining.
>>
>>108617871
long contexts don't "reduce intelligence" per se, they just make it way more likely that stuff in that long-ass context gets overlooked. which kind of makes long context a catch-22: you want your model to remember a bunch of stuff, so you add it to its context, but the more you add, the easier it is for the model to overlook any one piece of it.
read the NoLiMa paper
>>
File: 1A896141208F7C4FCBC5EDE6A1B88C2F.jpg (139.4 KB)
139.4 KB JPG
We have autonomous artificial intelligence which can work all the time independently. You can tell it to work without your permission, without your approval and just give it a goal to work towards.
Why haven't we cured cancer yet?
>>
File: 1409718500623.jpg (57.2 KB)
57.2 KB JPG
>>108618462
nope
>>
>>108618488
you need to automate physical lab testing to run experiments and get data.
we're starting to do that now
https://aws.amazon.com/blogs/industries/introducing-amazon-bio-discovery/
>>
>>108618491
Then i'm going to say maybe codex just sucks at UI things and you should use claude
but for anything non-UI i've found codex superior
also you should probably have those if your project is of any complexity (each agent uses different ones so just symlink them)
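The symlink trick that post mentions can look like this. The filenames are each agent's conventional per-repo instructions file; treat the exact names as assumptions about your setup and check each agent's docs.

```shell
# One canonical instructions file, symlinked to the name each agent
# looks for (assumed: AGENTS.md for Codex, CLAUDE.md for Claude Code,
# GEMINI.md for Gemini CLI). -f replaces any stale links on rerun.
ln -sf AGENTS.md CLAUDE.md
ln -sf AGENTS.md GEMINI.md
```

Edits to AGENTS.md then show up for every agent at once instead of drifting across three copies.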
>>
File: file.png (141.5 KB)
141.5 KB PNG
>>108618523
placebo is all you need
>>
>>108618530
unironically if ai is so great at swaying the population then why don't we just use it to convince people diseases don't exist and let placebo take over
I'd say a quarter of all sick people are just fudding themselves and are psychosomatic
and I'd say another quarter could probably heal themselves through mental power alone, they just have too much learned helplessness.
Like you know in charlie and the chocolate factory where his grandparents are bedridden for decades but jump out of bed when he gets the chance to go to the chocolate factory
So that's half of the world cured of ailment
>>
How can Claude Opus 4.7 use a new tokenizer that uses 30% more tokens? They probably haven't retrained from scratch compared to Opus 4.6 and other versions, right? How can they change the tokenizer so much then? Did they add literally thousands of special tokens?
>>
>>108619104
uhm... no
a *new* tokenizer needs a from-scratch model
what is doable is *expanding* the old tokenizer, ie keeping all the old stuff but adding some new tokens. you still have to finetune model-wide, but it's less work
it's very unclear why one would do this to a model of opus' size and scale tho. this is more like an "add emojis to an ascii-only model" sort of thing.
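The expansion that post describes can be sketched in numpy. This is a toy illustration, not anything Anthropic has confirmed; the mean-plus-noise initialization is a common heuristic, not their procedure.

```python
import numpy as np

def expand_vocab(old_emb: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Grow an embedding table by n_new rows without touching the old rows.

    New rows start at the mean of the existing embeddings plus small noise,
    so new tokens begin near the learned distribution instead of at random.
    Only these rows (plus a light model-wide finetune in practice) then
    need training.
    """
    rng = np.random.default_rng(seed)
    mean = old_emb.mean(axis=0)
    new_rows = mean + 0.01 * rng.standard_normal((n_new, old_emb.shape[1]))
    return np.vstack([old_emb, new_rows])
```

The point of the exercise: old token ids keep their exact embeddings, so the rest of the network still sees inputs it was trained on, which is exactly why expansion is cheaper than a genuinely new tokenizer.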
>>
>>
File: worse at searching.png (105 KB)
105 KB PNG
https://www.anthropic.com/news/claude-opus-4-7
interesting — it’s worse at searching
>>
>>108619169
Why wouldn't it "work"? Embedding space is continuous. Any word that's a combination of two tokens has, in theory, a representation in the space of the previous tokenizer. It wouldn't be ideal, but it would *work*.
I'm not saying that's what they did, that's why I said "theoretically" and "you only really have to". As in the bare minimum to prove that you could get the model to perform well again with only some fairly light finetuning.
I bet the amount of training they would have to do to use a new tokenizer isn't that much bigger than the tuning they do routinely to make a new model version. In any case, how much of the "new models" is training from scratch vs finetuning isn't publicly known.
>>
File: Screenshot_2026-04-17_03-15-44.jpg (1.8 MB)
1.8 MB JPG
I love vibe coding so much bros.. I just wanted to play a game through a capture card with low latency and play the audio associated with the capture card automatically. I tried ffplay but it had higher latency and it didn't play the audio. I could have written the code myself but claude was able to do it in a couple of minutes while I took a shower.
>>
>>108619169
All I can think of is a bunch of new token types for tool use and 'meta' tokens that represent commonly seen concepts compressed to single tokens. But both scenarios would reduce token usage, not increase it.
If they made Opus 4.7 think 30% more, the increase in token usage would make sense, but they clearly say it's not that and it's due to a new tokenizer. From experience since this morning, Opus 4.7 also seems to cheap out on thinking. I do wonder what they did exactly.
>>
>>108619386
the layers in a model are connected to each other
if you create a new tokenizer, you have to learn new initial embeddings (layer 1)
except, the old layer 2 had learned how to transform the old layer 1 embeddings into a new space, so you have to retrain that too
except, the old layer 3, etc etc, you see where this is going?
so, like i said, vocabulary expansion, maybe. keep the old layer 1 embeddings (and the rest of the network that flows right on down through them), just add some more (which you have to train, but only layer 1)
oh, and on the back side of the network, if you want to be able to output in the new tokens too, you have to retrain the output layer too.
>>
>>108619533
yea, im calling bullshit here. its not like those companies are new to lying. im guessing its just a slightly uncucked model but with a 30% surcharge for said uncucking...
either way, the enshittification has begun, and its time to pack bags and all that..
>>
this nigga gpt5.3-codex one shots everything and then
>condense context
>hurr durr let me spawn the OpenXR composition layer in the game world and make it follow the player camera on tick durr hurr
luckily I was paying attention and immediately told it to go check the existing stereo projection quad layer logic in the OpenXR plugin
>oh yeah I can just draw directly in the swapchain which always constructs the view, no need to follow the camera on game tick
this left a sour taste. because other than that, it made no mistakes.
>>
Opus 4.7 has been downright terrible all day. I have been working on something for a week, everything was going smoothly, today it's terrible.
It fully misinterprets training logs, not trying to understand at all which metrics are 'greater is better' vs 'lower is better'.
It makes nonsensical, wide-sweeping changes without looking at any of the dozens of memories it kept saving.
It doesn't show the reasoning summaries anymore, making it impossible to spot when it goes off track in its reasoning.
Huge regression.
>>
Do you agree that Codex is kinda slow? I let my friend use it and he also said it's pretty slow. Maybe the problem is that it doesn't have adaptive thinking?
Is Opus 4.7 usable? 4.6 just introduced too many bugs when I last tested it.
>>
>>
>>108616698
yeah my fault, i didnt use roo code and cline properly, i just brushed over them. the problem was kilocode. I think roo code and cline are actually what im looking for, but idk which to pick, cline or roo.
>>
>>108619978
It's bad enough that I will have to actually work. If it doesn't get better, I'll cancel my subscription and try Codex or another I guess, but it is nothing special at all anymore. It went from being better than me at what I was doing and being able to move fast to being A LOT worse than I would be doing things myself. It can't be trusted anymore, at least right now.
>>
>>108619542
No, I don't really see where you're going. The token ids only influence the embeddings generated before the first transformer layer. After that the space begins to change, and as you advance through the layers the same hidden state numerically means a completely different thing semantically. In this example you would be retraining the embeddings to maintain the same relationship between the hidden state (the actual numbers) and its meaning, i.e. making the new tokens generate hidden states similar to what tokens with similar meaning would have generated under the old tokenizer. I concede though that this wouldn't happen if you just naively trained a new embedding layer from scratch without considering the old one; you would have to take special measures to make sure the semantic space is retained.
As for the output layer, I believe generally in modern LLMs the weights are tied so the lm head uses the same weights as the embedding layer.
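The tying that post mentions means the output projection is literally the embedding matrix transposed. A toy numpy sketch (shapes made up) of why growing the vocabulary then grows both ends at once:

```python
import numpy as np

vocab, d_model = 8, 4
rng = np.random.default_rng(0)
embed = rng.standard_normal((vocab, d_model))    # token id -> vector
hidden = rng.standard_normal(d_model)            # some final hidden state

# Tied lm head: reuse the embedding matrix instead of a separate output
# projection, so there is only one vocab-sized weight matrix.
logits = embed @ hidden                          # shape (vocab,)

# Appending a token row to `embed` adds an output logit for free; the
# existing logits are unchanged.
embed2 = np.vstack([embed, rng.standard_normal((1, d_model))])
logits2 = embed2 @ hidden                        # shape (vocab + 1,)
```

With untied weights you would have to expand the input embeddings and the lm head separately, and train both.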
>>
>>
>>108620123
Well, if it's usable it's usable. I don't know how slow it is, but Opus 4.7 is really really not good right now. I've been working on two separate projects, terrible downgrade on both. I hope this is something temporary that will be fixed in a day or two, but dangerous to use right now.
>>
>>108620338
Ok, thanks for the heads up. With 4.6 I tried to change my workflow to have Opus do easier tasks and Codex the harder ones, but even then I was usually disappointed with Opus.
>>108620353
The biggest thing is that even though Codex is pretty good, there's still always some debugging and that's just very sequential. I guess I could just start a second project.
>>
File: you can do fun things in 4098 tokens.png (751.4 KB)
751.4 KB PNG
Dipping my toe into local models
https://apfel.franzai.com
>>
File: claude.png (303.1 KB)
303.1 KB PNG
>>108619617
>>108608090
>>108613756
>>
>>
>>108606398
I'm convinced it is just a B2B money extraction tool. our company burned tens of thousands to essentially get stack overflow tier results. since then we moved to an on-prem server running various open source models tuned on more high-performance code. significantly cheaper and higher quality.