At a glance
Nate B Jones tried GLM 5.2, the open weight model from Z.ai (formerly Zhipu AI), and it genuinely impressed him. Not a polite, fake kind of impressed, he says: for the fat middle of everyday AI work (the brochure site, the standard slide deck, first pass copy, coding tasks with familiar shapes), GLM 5.2 is often better than Claude, and it is close to free to run. So why isn't every company routing its work to the cheapest model that can do the job? Jones spends the video answering that, and the answer is not about intelligence at all. It is about the harness, the whole system of files, tools, memory, and workflow that a model sits inside, and about who already owns the harness your company runs on. He walks through how the ergonomics of work and employee brand pressure, the difficulty of measuring your own task mix, the cost of rebuilding a harness from scratch, and above all the context lock in of new team level products like Anthropic's Claude Tag conspire to keep companies paying frontier prices even when a model roughly 98 percent cheaper can do most of the work. His conclusion: this is not a story about GLM 5.2 being bad. It is a story about the last mile of AI being a trillion dollar problem, and about who is going to get paid to build it.
Why GLM 5.2 blew my mind on everyday work
Jones opens by naming exactly what he is promising: by the end of the video, viewers should know where GLM 5.2 can be relied on, where it can safely replace an expensive model, and where switching models is a trap, because you are not replacing a model call, you are replacing a whole work system. That throughline is the whole video.
GLM 5.2, he says, did not fake impress him. It actually impressed him: it is not just cheap, it is free if you run your own servers and very cheap on the cloud, and for a lot of normal work it is incredibly good, often better than Claude. By normal work he means the fat middle of everyday AI tasks: a brochure site for a client, a fairly standard PowerPoint outline, a first pass of copy, routine synthesis, and coding tasks tackling familiar problem types, the kind with lots of prior examples and outputs a human can check quickly. He calls this the center of the distribution of AI work: the pattern has been tried with models millions of times before, the answer shape is normal, and the output is easy to inspect. How many brochure sites has anyone seen, he asks. In that world, GLM 5.2 is fast, cheap, easy, and extremely high quality, higher quality than Claude. For a lot of those tasks it is not just good enough; it is, in his words, honestly the best model in the world at center of distribution work, especially anything where front end taste matters.
To be clear, this is not a video about GLM 5.2 being bad. GLM 5.2 is incredible, yet it is not his daily driver, and he says he is going to explain why. A lot of companies he knows are struggling with exactly this tension: they want to move toward a generic router that sends work to the cheapest model available, but that is not easy to do in practice.
| | GLM 5.2 | Claude (frontier) |
|---|---|---|
| Cost | free on your own servers, very cheap on the cloud, about 98% cheaper than Claude | expensive at scale; one engineer burned $80,000 in tokens in a week |
| Center of distribution work | brochure sites, standard decks, first pass copy, familiar coding: often the best model in the world here | solid, but you are paying frontier prices for common patterns |
| Edge of distribution work | not what it is tuned for, and not a case Jones makes in the video | this is where frontier pricing earns its keep |
| Own harness | shipped with a Codex clone harness, a first stab | Claude Code, and now Claude Tag, a team level Slack harness |
| Context lock in | none yet, still building the last mile | Claude Tag reads a team's live Slack context automatically, hard to rip out |
| Switching cost for a company | needs a rebuilt harness: new prompts, memory, tool calls, from scratch | zero, if your stack is already built around it |
Cheap AI is here, and frontier releases are slowing
Cheap AI, Jones says, is not a theory anymore, and it is going to keep arriving, in part because the US government is now slowing down frontier model releases. The next major frontier model, numbered 5.6, is the latest to be affected: it is apparently set to be released customer by customer, which he reads as code for nobody knowing when it is actually coming. For the first time, there is no defined cadence for future frontier releases, even though the labs are still doing phenomenal work training their models and refining them with reinforcement learning. That is going to push more of the conversation toward open source, and a lot of that conversation is frankly about moving down the cost curve, because frontier model costs climb fast. He cites a story making the rounds: one engineer spent $80,000 in token costs in a single week. When people are spending tens of thousands of dollars a week on tokens, there is tremendous incentive to make cheaper models work.
So why isn't there a tipping point away from the frontier labs? Why are Anthropic and OpenAI still growing revenue like crazy when incredibly good, incredibly cheap open models exist? Jones lists the reasons, drawn from conversations with engineers and leaders at companies actually living this.
Why companies still aren't switching to open models
The first reason is the ergonomics of work. If you already have a frontier model at home on your phone, you just want that access at work too. There is real employee pressure around Claude and OpenAI that simply does not exist for open source models, and it is not small: when people vocally say a tool will help their work, overburdened IT departments tend to listen.
Center of distribution vs edge of distribution tasks
The second reason is harder to see: it is genuinely difficult to figure out whether your task load is weighted toward the center of the distribution or the edge of it. If your work is edge weighted (novel, high stakes, hard to verify), you actually do want frontier models. If it is center weighted (common patterns with lots of precedent), open source models are going to be really, really good. But people are not used to measuring their own work this way. Individuals aren't, teams aren't, and almost no company has properly asked what its own distribution of tasks actually looks like.
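The center versus edge test can be written down as an explicit routing rule. A minimal sketch, assuming each task is tagged by hand with the three edge signals named above (novel, high stakes, hard to verify); the `Task` fields, the "any one signal means edge" rule, and the model labels are illustrative assumptions, not anything Jones or a vendor specifies.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    novel: bool           # little precedent for this pattern
    high_stakes: bool     # a mistake is costly
    hard_to_verify: bool  # a human cannot check the output quickly

def is_edge(task: Task) -> bool:
    """Treat a task as edge of distribution if any edge signal is present."""
    return task.novel or task.high_stakes or task.hard_to_verify

def route(task: Task) -> str:
    # Hypothetical model labels; substitute whatever your stack actually calls.
    return "frontier-model" if is_edge(task) else "open-weight-model"

tasks = [
    Task("brochure site", novel=False, high_stakes=False, hard_to_verify=False),
    Task("first pass copy", novel=False, high_stakes=False, hard_to_verify=False),
    Task("novel pricing model", novel=True, high_stakes=True, hard_to_verify=True),
]

for t in tasks:
    print(t.name, "->", route(t))
```

Requiring only one signal to send a task to the frontier model is deliberately conservative: misrouting an edge task costs quality, while misrouting a center task only costs money.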
Lindy rebuilt its whole harness to leave Claude
The people who have gone furthest on this, Jones says, are folks like Flo Crivello, who leads the Lindy team and very publicly wrote up his journey moving away from Claude to a DeepSeek architecture. Crivello saved a lot of money doing it, but he was also honest that his team had to rewrite its harness essentially from scratch around DeepSeek. They could not take their existing systems for working with Claude (all their prompts, all the ways they handled memory, all their tool calls) and lift and shift them onto the new model. It does not work that way. These models need their own harnesses.
Crivello had the incentive to do that work because Lindy sells AI as a service: if he can deliver a cheaper, equally effective product, it hits his margin directly, and that matters enormously. For companies using AI internally, for coding or back office automation, the ROI is not nearly as clear, and the incentive to make the jump is weaker. Jones says he has seen this personally among entrepreneurs: the ones actually making the switch, and dealing with different tool calling conventions and different memory architectures, are the ones with a clear, direct ROI on a specific AI product already in market. Everyone else lacks an equivalent push to wade through the harness rebuild.
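Part of why a harness does not lift and shift is that models disagree on small conventions, such as how a tool call is encoded. A minimal sketch, assuming two hypothetical wire formats (one sends arguments as a JSON string, the other as a parsed object); the formats and the `ToolCall` and `normalize` names are invented for illustration and are not any provider's documented API. The harness maps each format into one neutral shape so downstream code does not care which model produced the call.

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class ToolCall:
    """Model-neutral tool call that the rest of the harness works with."""
    name: str
    args: dict[str, Any]

def from_format_a(raw: dict[str, Any]) -> ToolCall:
    # Hypothetical format A: the arguments arrive as a JSON-encoded string.
    fn = raw["function"]
    return ToolCall(name=fn["name"], args=json.loads(fn["arguments"]))

def from_format_b(raw: dict[str, Any]) -> ToolCall:
    # Hypothetical format B: the arguments arrive as an already-parsed object.
    return ToolCall(name=raw["name"], args=dict(raw["input"]))

# One adapter per model family; adding a model means adding (and testing) an adapter.
ADAPTERS = {"format_a": from_format_a, "format_b": from_format_b}

def normalize(provider_format: str, raw: dict[str, Any]) -> ToolCall:
    return ADAPTERS[provider_format](raw)

a = {"function": {"name": "search_docs", "arguments": '{"query": "refund policy"}'}}
b = {"name": "search_docs", "input": {"query": "refund policy"}}
print(normalize("format_a", a) == normalize("format_b", b))  # True
```

Tool calls are only one layer; prompts and memory handling need the same kind of per-model rework, which is why the rebuild is a real project rather than a config change.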
A model is a brain in a jar without a harness
This is the idea Jones wants viewers to take away above all: a model can be an incredible brain in a jar, and it just is not useful to you without a harness. That is why he pays close attention to harness innovations, and he names three that are top of mind right now.
First, GLM 5.2 shipped with its own Codex clone harness. That tells him open source model makers are realizing they need to deliver harnesses too, not just weights, and he expects more innovation in that direction. Second, Codex itself is starting to publicly call out that you can use the Codex harness without using any OpenAI model at all, which is notable because it gives OpenAI a different path to value, and stickiness, than just being the default model inside it. Third, Anthropic is not sitting still either: they launched Claude Tag this week, and Jones calls it an incredibly sticky product, a team level harness at exactly the moment the industry is trying to figure out how to turn individually productive AI use into team productive AI use.
- GLM 5.2 ships with its own Codex clone harness, an open model maker's first attempt at owning the last mile, not just the weights.
- Codex starts publicly telling users its harness works without any OpenAI model underneath it, a different, stickier path to value than being the default model.
- This week Anthropic launches Claude Tag: tag Claude in a Slack channel and it works the job, a team level harness rather than an individual chatbot.
- The same week, the US government's slowdown on frontier releases leaves the next major model, 5.6, shipping customer by customer with no public cadence.
Claude Tag and the rise of team level harnesses
Team level harnesses are where the energy is going, Jones argues, because so much of the value we have gotten from AI so far is individually productive, one person, one chat window, not team productive. Companies are trying to figure out how to turn individual productivity into something the whole team benefits from, and Claude Tag, literally just tagging Claude in a Slack channel, is one of the first examples of a sticky, viral, consumer facing team harness. An ordinary knowledge worker does not need to know the phrase "team harness." They just tag Claude and it works.
But look at it strategically from Anthropic's side, Jones says. They are no longer just capturing engineers who choose to use Claude Code. Now they are capturing every knowledge worker in Slack, reading all the messy context that lives there, the context nobody has ever managed to codify, and feeding it into Claude automatically. Within privacy policy limits, that is context Anthropic's systems get to learn from, long term, in the context of that specific company, and it starts to let Claude own the harness itself in a way no company can easily walk away from.
Why you can't rip out a model that owns your context
Here is the trap, stated plainly. Say GLM 5.2 really is roughly 98 percent cheaper than Claude and about as good on most tasks. It would be entirely rational to build a routing system that sends most work to GLM 5.2. Except: are you going to have Claude Tag on that stack? Is that convenience going to be there? Are you going to have to restart the job of giving your new AI the company context Claude already acquired automatically inside Slack?
Jones frames this with an old business truism: we have taught companies for decades that data is alpha, that data is the edge you have if you are serious. If that is true, then handing all of that data over as context to a frontier model provider, even a scrupulously ethical one with a strong privacy policy, means you are effectively renting your own context back from the provider. Claude sits inside your Slack as a team level harness, incredibly close to everything your team does, and that proximity makes it nearly impossible to rip out, no matter how cheap a GLM 5.2 class model gets.
Jones thinks the GLM 5.2 team knows this, which is exactly why they shipped a harness, a Codex like interface, alongside the model. It is a first stab, but the industry has to go much further, because the companies that most need to build their own harness generally cannot afford the AI talent to do it. That talent is scarce enough right now that it can charge almost anything, and it usually ends up at a hyperscaler or another large company instead. So the only companies actually building their own last mile harnesses and auto routers today tend to be the ones that can afford that scarce AI talent in the first place.
The harness talent shortage is a builder's opening
Jones frames all of this as, at bottom, not a story about intelligence. It is a story about the last mile in AI, and about how scarce the talent to build that last mile really is. He thinks that should be a source of optimism for a lot of viewers: if talent that scarce is what stands between companies and being locked into frontier contracts, there is an enormous opportunity in knowing how to build a harness.
It is not easy work. Knowing how to handle a tool call correctly in GLM 5.2, and how that should differ from handling one in Claude; figuring out how memory should work for that system; figuring out how a system prompt needs to change for a center of distribution model: all of that is serious technical work. Anyone who can do it, or even parts of it, such as refactoring agentic pipelines to work well with an open source model, is going to be in high demand, especially if they pair it with the skill of routing: recognizing on the fly which tasks are frontier model tasks and which should go to a cheaper open source model instead. Jones calls this a huge investment theme for companies in 2026 and 2027.
Compare that to what the frontier labs are doing with their pricing power. Claude Tag is a great example of how incentives in closed source, high margin models produce genuinely great experiences fast: if you have pricing power, you are heavily incentivized to make your product as convenient and ergonomic as possible, and features like Claude Tag are going to keep appearing rapidly from Anthropic and from OpenAI, because both are incentivized to protect high prices and win that business. Open source model makers do not have the same margin or cash flow to deploy thousands of forward deployed engineers making a harness sing.
So one of the strange, simultaneous truths Jones lands on is that GLM 5.2 can be an incredible model, one that technically savvy entrepreneurs switch to the moment the ROI is clear, and at the same time not an easy model for an average company pulled out of the phone book to actually use. Any given company has to think hard about how to use GLM 5.2 usefully, and it takes a lot less thought to just sign a frontier model contract that slots into existing workflows. That last mile, Jones says, is a literal trillion dollar problem, and one of the biggest open questions right now is whether the talent to build it will scale fast enough for businesses to tackle it without paying so much they cannot afford it. He does not know the answer, but expects it to become clear in the next three to six months, especially with the US government's effective pause on frontier releases still in place. For anyone in an agency or consulting, he calls this a golden goose moment: you can promise real token savings as part of an ROI pitch, as long as you can actually deliver the harness refactor without breaking quality, which he is careful to say is not a trivial task.
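The caveat that a harness refactor must not break quality suggests a simple gate: run the same task set through the current setup and the candidate, and switch only if the candidate's pass rate holds up. A minimal sketch, assuming you already have an automated check per task; the checks, sample outputs, and the two point tolerance below are hypothetical placeholders, and a real evaluation set would be far larger.

```python
from typing import Callable

# A check inspects one output and returns True if it is acceptable.
Check = Callable[[str], bool]

def pass_rate(outputs: list[str], checks: list[Check]) -> float:
    """Fraction of outputs that pass their paired check."""
    if len(outputs) != len(checks) or not outputs:
        raise ValueError("need one check per output, and at least one output")
    passed = sum(1 for out, check in zip(outputs, checks) if check(out))
    return passed / len(outputs)

def safe_to_switch(baseline: float, candidate: float, tolerance: float = 0.02) -> bool:
    """Allow the switch only if the candidate is within `tolerance` of the baseline."""
    return candidate >= baseline - tolerance

# Hypothetical checks for two center of distribution tasks.
checks: list[Check] = [
    lambda out: out.strip().lower().startswith("<html"),  # brochure page is HTML
    lambda out: len(out.split()) >= 3,                     # copy meets a length floor
]
current_outputs = ["<html><body>Hi</body></html>", "Fresh bread baked daily"]
candidate_outputs = ["<html><body>Hello</body></html>", "Bread"]

baseline = pass_rate(current_outputs, checks)      # 1.0
candidate = pass_rate(candidate_outputs, checks)   # 0.5
print(safe_to_switch(baseline, candidate))         # False
```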
Take the last mile seriously before you rent your brain
Jones closes by refusing to let anyone dunk on GLM 5.2 for being good at center of distribution work, because by definition most of humanity's knowledge work is center of distribution work. A model that is genuinely excellent at that is worth taking seriously, and taking it seriously means taking the last mile seriously, which means taking the need for a harness around it seriously. That is a lot of what he has been doing publicly: articulating what it takes to build a harness, whether he calls the pieces open skills, open brain, or open engine, ideas he has talked through elsewhere on his channel, so that the pieces can be assembled in a way that is agent agnostic and model agnostic. The goal is being able to install those pieces and easily take advantage of whatever intelligence is on tap, whether Claude, Codex, Hermes, or whatever system comes next.
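One way to read "model agnostic" is a harness that owns the company's memory and treats the model as a swappable backend. A minimal sketch, assuming a one-method `complete(prompt)` interface; `Harness`, `EchoModel`, and the prompt layout are hypothetical and are not Jones's open skills, open brain, or open engine designs. Because the context lives in the harness, switching backends does not mean re-teaching a new model everything the old one knew.

```python
from dataclasses import dataclass, field
from typing import Protocol

class Model(Protocol):
    """Anything that turns a prompt into text can plug into the harness."""
    def complete(self, prompt: str) -> str: ...

@dataclass
class EchoModel:
    """Stand-in backend; a real one would call a hosted or self-hosted model."""
    label: str

    def complete(self, prompt: str) -> str:
        # Echo only the final prompt line so the example output is predictable.
        return f"[{self.label}] {prompt.splitlines()[-1]}"

@dataclass
class Harness:
    backends: dict[str, Model]
    memory: list[str] = field(default_factory=list)  # company context is kept here, not by a vendor

    def remember(self, note: str) -> None:
        self.memory.append(note)

    def run(self, backend: str, request: str) -> str:
        # Every backend receives the same company-owned context.
        context = "\n".join(self.memory)
        prompt = f"Company context:\n{context}\nRequest: {request}"
        return self.backends[backend].complete(prompt)

h = Harness(backends={"open": EchoModel("open"), "frontier": EchoModel("frontier")})
h.remember("Client X prefers short copy.")
print(h.run("open", "Draft a tagline"))      # [open] Request: Draft a tagline
print(h.run("frontier", "Draft a tagline"))  # [frontier] Request: Draft a tagline
```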
He is direct about the stakes: this is a moment for builders, and there is a lot of custom work ahead for individual companies. But if companies do not start down that path, they are, in his words, going to end up renting their own company brain and company context back from the frontier model providers. Those providers will have that context, and they will use it to keep improving their systems, making them more convenient and more sticky, and companies will have no choice but to keep using them.
Jones calls this a genuinely pivotal moment: the firm has never before faced a situation where its own brain is effectively on rent, and that is exactly what tools like Claude Tag put on the table, useful, he stresses, genuinely useful, which is precisely what makes it dangerous. His advice, whether you run a large company or a one person agency, is to think seriously about whether you want to rent your context and intelligence long term. Ask yourself if you actually know the distribution of your own tasks. Ask whether you have access to the technical talent to build the last mile. Ask which specific task sets would save you real money in tokens if you moved them. Most people, he says, never sit down with pencil and paper and actually answer those questions, and he has a more detailed version of that question set posted on his Substack for leaders working through it. This is a serious moment for open source. GLM 5.2 opened the door, and it is up to each of us to decide how we take advantage of it.
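Several of those questions (what your task distribution is, which task sets would save real money, whether the rebuild pays off) reduce to simple arithmetic once you have a spend log. A minimal sketch, assuming you can tag each task set as center or edge and know its weekly frontier spend; every number below is made up for illustration, and the 98 percent discount is the video's rough comparison, not a measured price. It deliberately leaves out the context lock in cost Jones warns about, which never shows up on a token bill.

```python
import math
from dataclasses import dataclass

@dataclass
class TaskSet:
    name: str
    weekly_cost: float  # dollars per week on a frontier model today
    center: bool        # judged center of distribution, so a candidate to move

def center_share(task_sets: list[TaskSet]) -> float:
    """Fraction of current spend that sits in center of distribution task sets."""
    total = sum(t.weekly_cost for t in task_sets)
    return sum(t.weekly_cost for t in task_sets if t.center) / total

def weekly_savings(task_sets: list[TaskSet], discount: float) -> float:
    """Dollars saved per week if every center task set moves to a model `discount` cheaper."""
    return sum(t.weekly_cost * discount for t in task_sets if t.center)

def payback_weeks(rebuild_cost: float, savings_per_week: float) -> float:
    """Weeks until a one-time harness rebuild pays for itself."""
    return math.inf if savings_per_week <= 0 else rebuild_cost / savings_per_week

# Made-up spend log for illustration only.
log = [
    TaskSet("brochure sites", 10_000, center=True),
    TaskSet("first pass copy", 5_000, center=True),
    TaskSet("novel research", 5_000, center=False),
]
saved = weekly_savings(log, discount=0.98)
print(center_share(log))                     # 0.75
print(round(saved))                          # 14700
print(round(payback_weeks(147_000, saved)))  # 10
```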
Key takeaways
- GLM 5.2 is free to run on your own servers, very cheap on the cloud, and for center of distribution work, brochure sites, standard decks, first pass copy, familiar coding tasks, Jones calls it honestly the best model in the world, often better than Claude.
- The reason companies are not stampeding to switch is not intelligence, it is the harness: everything that turns a raw model into usable work, prompts, memory, tool calls, system design, none of which lifts and shifts between models.
- Flo Crivello's Lindy switched 100 percent of traffic from Claude to a DeepSeek architecture and saved millions, but only after rebuilding its harness from scratch, because AI as a service gave him a direct, clear ROI that most internal AI users do not have.
- A model is a brain in a jar without a harness. GLM 5.2 shipped its own Codex clone harness, Codex is marketing itself as usable without any OpenAI model, and Anthropic launched Claude Tag, a team level Slack harness, in the same window.
- Claude Tag turns individual AI use into team AI use by letting anyone tag Claude in Slack, which also means Anthropic's systems are learning from a company's live, messy Slack context automatically, within privacy limits.
- Because data is alpha, handing that context to a frontier provider means renting your own context back from that provider; that proximity to your company's real work is what makes a model impossible to rip out, regardless of how much cheaper an alternative is.
- The scarce resource is not intelligence, it is the talent to build a last mile harness: figuring out tool calling, memory, and system prompts for a center of distribution model like GLM 5.2 is specialized, valuable, and in high demand.
- One engineer reportedly spent $80,000 in token costs in a single week, illustrating why the incentive to find cheaper models is real even though the switch is hard.
- The next major frontier model, numbered 5.6, is affected by a US government slowdown on frontier releases and is shipping customer by customer with no public cadence.
- Jones's closing warning: without building your own harness, companies risk renting their own company brain and context back from frontier labs indefinitely, a genuinely pivotal moment for how firms operate.
Chapters
- 0:00 Why GLM 5.2 blew my mind on everyday work
- 2:22 Cheap AI is here and frontier releases are slowing
- 3:41 Why companies still aren't switching to open models
- 4:11 Center of distribution vs edge of distribution tasks
- 4:53 Lindy rebuilt its whole harness to leave Claude
- 6:39 A model is a brain in a jar without a harness
- 7:23 Claude Tag and the rise of team-level harnesses
- 8:47 Why you can't rip out a model that owns your context
- 10:36 The harness talent shortage is a builder's opening
- 14:50 Take the last mile seriously before you rent your brain
Notable quotes
GLM 5.2 did not fake impress me. It actually impressed me because it's not just cheap, and it's very cheap to run on the cloud, it's free if you set up your own servers, and for a lot of normal work, it's incredibly good. Nate B Jones, 0:26
This is the best model in the world at those center of distribution kinds of tasks, especially ones where front end taste is important. Nate B Jones, 0:59
One engineer spending $80,000 in token costs in a week. That's a lot. Nate B Jones, 1:45
A model can be an incredible brain in a jar. And it just isn't useful to you without a harness. Nate B Jones, 6:39
We have taught companies for decades that data is alpha. If data is alpha, what do we think about giving all of that data to a frontier model provider as context? Nate B Jones, 8:47
It's actually not a story of intelligence. It's a story of the last mile in AI, and the fact that the talent to build the last mile in AI is incredibly scarce. Nate B Jones, 9:41
That last mile is literally a trillion dollar last mile in AI. Nate B Jones, 11:44
The firm has never faced a moment where the firm's brain has been on rent. And that is what we're on the verge of with tools like Claude Tag. Nate B Jones, 15:31
Resources mentioned
- Nate B Jones, the channel, AI News and Strategy Daily, where this analysis was published.
- Nate's Substack, where he has posted a more detailed question set for leaders working through their own harness and task distribution decisions.
- GLM 5.2, the open weight model from Z.ai (formerly Zhipu AI) that is the subject of the video.
- Claude and Anthropic, the frontier lab and model GLM 5.2 is compared against throughout.
- Claude Tag, Anthropic's new team level Slack harness, launched the week of filming.
- Codex, OpenAI's coding agent harness, cited for now marketing itself as usable without any OpenAI model underneath.
- Lindy, the AI agent startup that switched all of its traffic from Claude to a DeepSeek architecture.
- Flo Crivello, Lindy's founder and CEO, who wrote publicly about rebuilding Lindy's harness from scratch to leave Claude.
- DeepSeek, the model architecture Lindy moved its traffic to.


