The race nobody else is running: why personal AI belongs on hardware you already own
Tim Cook stepped down this week. The next CEO of Apple is John Ternus, the hardware engineer who ran the Mac transition to Apple Silicon. The new Chief Hardware Officer is Johny Srouji, who built the chips that made the transition possible. Two hardware people at the top of the company, at the exact moment the cloud AI labs are quietly losing money on their best customers.
Nate B. Jones made the case that this is Apple declining the race the rest of the industry is running and lining up for a different one. I think he’s right, and I want to add what I can only add from inside it: this is the race I’ve already been running on a four-year-old machine under my desk, and the people I trust are starting to run it too.
The Mac Studio in my office is an M1 Max with 64 GB of unified memory. I bought it almost four years ago for video work. For the last several weeks it has been transcribing my dad’s letters, ~45–50 seconds per page of cursive, running a Qwen multimodal model through LM Studio. Nothing leaves the house. The out-of-pocket cost, once you’ve already paid for the machine, is electricity.
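For anyone who wants to run the same loop, it’s shorter than it sounds. A minimal sketch, assuming LM Studio is serving its OpenAI-compatible API on the default localhost port with a Qwen vision model loaded; the model identifier and directory names are placeholders for whatever you actually have:

```python
# Transcribe scanned letter pages through a local LM Studio server.
# Assumes LM Studio's OpenAI-compatible server is running (default
# http://localhost:1234/v1) with a Qwen vision model loaded. Model
# identifier and paths below are illustrative, not prescriptive.
import base64
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

PROMPT = (
    "Transcribe this handwritten letter page exactly as written. "
    "Preserve line breaks. Mark illegible words as [?]."
)

def transcribe_page(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="qwen2.5-vl-7b-instruct",  # whichever vision model is loaded
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        temperature=0.1,  # transcription wants fidelity, not creativity
    )
    return response.choices[0].message.content

Path("transcripts").mkdir(exist_ok=True)
for page in sorted(Path("letters").glob("*.jpg")):
    Path(f"transcripts/{page.stem}.txt").write_text(transcribe_page(page))
```

The only network hop in that loop is to localhost. That’s the whole point.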
That’s the part I keep coming back to. The reason I can do this — privacy and cost — is the same reason every regulated profession is about to converge on the same hardware. A lawyer can’t put privileged client material in someone else’s data center. Neither can a therapist, an accountant, a fiduciary, or a family historian going through a box of letters their grandparents wrote each other in 1942. The economics of cloud inference are punishing for power users right now and getting worse as model sizes grow. The economics of a chip you already own are fixed. You bought it. The marginal cost of the next page is electrons.
My dad’s letters are the first generation. There are at least two more generations of family paper still in the house — the wartime letters from my grandparents, and the 1910–1913 stack from my great-grandparents. This project is going to keep running on this same machine for a long time.
The conversations I’ve been having lately all rhyme.
A senior developer I respect is migrating his daily workflows to local models, not as a stunt but as a tool change. A CTO friend is counting days to the M5 Mac mini and has already cleared a spot on his desk. Multiple founder friends, when the topic comes up, light up about peers who run 256 or 512 GB Apple Silicon rigs the way you’d light up about someone with a great workshop. None of them are anti-cloud. All of them have noticed something shifting under their feet.
I’m reporting, not forecasting. The shift is showing up first in the part of the room where the people who pay attention to tools live.
Here’s the honest middle, because the post doesn’t work without it.
I still use cloud AI heavily. I’m writing this in a tool that calls a frontier model in a data center, because for a piece like this the frontier is the right tool. The largest models, the multi-step agentic work, the deep research runs — those still belong in the cloud, and I’m not romantic about pretending otherwise. The local-only zealot pose is wrong, and I’ve watched it cost people credibility.
What I am claiming is narrower and, I think, more interesting: the opportunity surface for on-device inference is going to balloon over the next few years, the share of my own work that runs locally has been climbing month over month, and the builders who notice this early will ship different products than the builders who don’t. I am leaning in because of what I am actually seeing on my own desk and in my own conversations, not because I have a thesis I want to defend.
The frontier and the floor are both moving. The frontier is moving up — fine, expected, that’s what frontiers do. The floor — what an individual can run on hardware they already own — is moving up faster than most people have noticed. That second curve is the one I’m building toward.
The framework I keep using for this is Capture, Enrich, Synthesize. Three laps, in that order.
Capture is won by whoever started early. The years of photos, the journal entries, the location traces, the letters, the financial history, the chat logs — you either have them or you don’t. There is no AI that retroactively gives you a 2008 you didn’t record.
Enrich is collaborative. Human plus model. You bring the context the model can’t infer — the family tree, the names, the place a nickname points to — and the model brings the patience and the throughput to apply that context across thousands of items. Some of this work belongs in the cloud and some belongs at home. It depends on what’s in the file.
Synthesize is the lap most people are paying a cloud meter for, and the lap most likely to come home to consumer hardware first. Synthesis on your own corpus is repetitive, latency-tolerant, privacy-sensitive, and — once you’ve tuned a prompt — it doesn’t need a frontier model to do well. It needs a competent model that knows your stuff and never sees anyone else’s. That’s a local-inference profile if I’ve ever seen one.
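To make those last two laps concrete, here’s a minimal sketch of Enrich feeding Synthesize on a box like mine: a human-written context file doing the enrichment, a tuned prompt doing the synthesis, everything against a local OpenAI-compatible endpoint. The paths, model name, and prompt are stand-ins, not the Tractor and Silo internals:

```python
# Enrich + Synthesize over a private corpus, entirely on localhost.
# The context file is the human's contribution (names, nicknames,
# family tree); the model applies it across thousands of items.
# Endpoint, model name, and paths are illustrative assumptions.
from pathlib import Path

from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="local")

# Enrich: context the model can't infer, written once by a human,
# e.g. "'Bud' is Harold, Dad's older brother. 'The lake' is Leech
# Lake, where the family summered."
FAMILY_CONTEXT = Path("context/family.md").read_text()

# Synthesize: the prompt you tune once, then run for years.
TUNED_PROMPT = (
    "Using the family context below, read one page of a private "
    "letter. Identify every person, place, and date, expanding "
    "nicknames, then write a two-sentence summary.\n\n" + FAMILY_CONTEXT
)

Path("summaries").mkdir(exist_ok=True)
for page in sorted(Path("transcripts").glob("*.txt")):
    reply = client.chat.completions.create(
        model="local-model",  # any competent local instruct model
        messages=[
            {"role": "system", "content": TUNED_PROMPT},
            {"role": "user", "content": page.read_text()},
        ],
        temperature=0.2,  # consistency matters more than flair here
    )
    Path(f"summaries/{page.stem}.md").write_text(
        reply.choices[0].message.content
    )
```

Repetitive, latency-tolerant, privacy-sensitive: exactly the profile the paragraph above describes.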
Tractor and Silo — the company I’m building — was always pointed at this world, and not by accident.
The whole product is structured around a record of your life that is yours. The iPhone capture app keeps your timeline on your phone. The Silo is the durable archive nobody else’s revenue model will ever incentivize them to build for you, because nobody else’s revenue model survives you keeping your own data. Lens, the macOS journaling app under the same brand, has been running its daily synthesis entirely on-device with Gemma for weeks now. Not as a demo. As how it works. The pass that reads your day and writes a draft of your journal entry happens on the chip in your machine, and the entry never leaves it.
I didn’t build it that way because I predicted this week’s news. I built it that way because the alternative — your life’s record sitting on someone else’s GPU under someone else’s quarterly earnings pressure — was never the product I wanted to live with, much less ship.
Nate’s strongest point in the video is the unserved professional market, and I want to underline it from a builder’s chair.
There is an entire economy of regulated work — law firms, medical practices, accountants, therapists, fiduciaries, advisors — that cannot put client material in a cloud LLM and is currently improvising. Some of them are stringing together Mac minis. Some of them are pretending the problem doesn’t exist. Most of them are waiting for someone to ship the infrastructure layer that lets a small practice run real models on real client work in a real compliance posture without hiring a platform team.
That layer doesn’t exist yet. Apple isn’t going to build it for them. The frontier labs structurally can’t. Someone is going to, and the someones who do will quietly own a category that the rest of the industry is too busy losing money in the cloud to notice.
The same shape applies to the family historian sitting in front of a box of letters, even if the regulator is just their own conscience. The work is high-context, irreplaceable, and not anyone else’s business. It belongs on hardware you control.
If you’re already running models locally — for client work, for an archive, for a product you’re shipping — I want to compare notes. Especially if you’re hitting the same gotchas I’m hitting (vision-versus-text routing, context windows quietly eaten by reasoning tokens, hardware that is almost enough). The interesting conversations in this corner of the field aren’t on stage at conferences. They’re between two builders comparing what’s actually running on their desks.
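To make the first two of those gotchas concrete, here’s the shape of one workaround, offered as a sketch rather than a prescription: pick the model by payload before building the request, and reserve context headroom so reasoning tokens don’t starve the visible answer. The model names and token budget are placeholders for whatever your local server has loaded:

```python
# One workaround for vision-versus-text routing and for context
# windows quietly eaten by reasoning tokens. Model names and the
# budget are placeholders for whatever a local server has loaded.
VISION_MODEL = "qwen2.5-vl-7b-instruct"  # handles image parts
TEXT_MODEL = "qwen2.5-14b-instruct"      # better prose, no vision
CONTEXT_BUDGET = 8192                    # context length of the loaded model

def pick_model(messages: list[dict]) -> str:
    """Route to the vision model only if any message carries an image."""
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list) and any(
            part.get("type") == "image_url" for part in content
        ):
            return VISION_MODEL
    return TEXT_MODEL

def max_output_tokens(prompt_tokens: int, reasoning_reserve: int = 1024) -> int:
    """Cap the response, leaving headroom for hidden reasoning tokens."""
    return max(256, CONTEXT_BUDGET - prompt_tokens - reasoning_reserve)
```

A dozen lines of routing is unglamorous, but it beats discovering at page 400 that the wrong model was fielding your images.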
I’ll keep using the cloud where the cloud is the right tool. And I’ll keep watching the share of what I do that comes home to a four-year-old Mac under my desk — and to whatever I replace it with — keep going up.
The race nobody else is running is the one most worth running. That’s where I’m putting my hands.