There are two anecdotes that color my opinion here.

1. If you'll recall, I worked for a while inside a tech startup building a mobility app. Dev work was split into a small onshore team and a large offshore team in India. The onshore team was, essentially, the product manager and four coders of varying seniority. The offshore team was something like 20-30 full-time developers? If there's anything I'd want to get built or get changed, I had a feedback Slack channel that the onshore devs would process into Jira. If there was anything big that I needed to be implemented, I'd get the onshore junior dev to essentially parse my request into Jira for me, often after a few meetings, because chopping my idea into baby-sized steps is never a straightforward task. It'd be well-documented what the desired change was, and what the steps were to get there, before it went to the offshore team. Then a week or so later they'd be working on it and have a bunch of extra questions. Then I'd get a new TestFlight version of the app, I'd do a bunch of testing with that, and would often come back with half a dozen edge cases and misinterpretations. Back and forth, testing, questions, back and forth, testing, and then it would usually be fine.

2. A good friend of mine works for a Dutch competitor to EPIC. She graduated in CS just over a year ago; she basically landed the job as a college intern and was allowed to stay. The first months of her job, she was not allowed to write code, instead just having to review other people's code in order to initiate osmosis for the inner workings of EHRs because there is, essentially, no documentation. The entire company from what I gather seems to operate on tribal knowledge, the elders passing down quirks and edge cases that stay in. It is also company policy to forbid writing any documentation in the code. Instead of documentation, they've created a layer cake of processes that code has to go through to be reviewed and checked. The processes are well-defined, but what goes through them could be whateverthefuck. Most of the time, the changes are very small. They're scared to accidentally break anything in production (and hey, rightly so!), but that also means they rarely change anything large, so the codebase is never refactored and is itself a layer cake of cruft, accumulated by hundreds of individual developers making changes and writing code in their little corner of the monolith in their own particular way.

You don't think you can include that in the prompt? Or documentation it reads?

I had to regularly instruct the offshore team to do things in a particular way in the app, because I was using a third-party tool for our analytics and that tool needs data in a specific way to work. "Yeah, I know this is not ideal, but can you please define sessionID as text, even though it always consists of numbers only?"

The timeline in the past half year or so has been wild, in my humble opinion. When the term vibecoding was coined this January, Claude 3.5 was the best we had, which is (still) quite good at writing a few dozen lines of code but gets lost as soon as the task is even slightly bigger than that. My first vibecoding experiences were... rough. The improvements in the past months have been incredibly significant, but specifically for coding. For my first vibecoded app, I had a prototype in 2 minutes and would spend a good hour or two debugging by "there's a new error, go fix it" again and again. For the OVguesser app I mentioned in Pubski?
I had a prototype in 2 minutes and... essentially only one or two bugs, despite it being a more complex app with 50+ files spread across client and server. I could go straight to "this is great, let me tweak it until I'm happy".

Adding reasoning, better prompts, and most of all tool use (you tell the AI how it can Google something, how it can grep a file, how it can use any tool you can imagine) has dramatically improved its ability to do the vast majority of software work. I used to be able to tell AI-gen code apart from regular code. I lost that ability - there are no longer six fingers on the hand, the lighting isn't "off" anymore, especially not with the right prompt. When I showed my EHR friend the code Sonnet 4 produces this week, she too could not find anything bad about it. It genuinely just writes decent code now. The size of what it can write has dramatically increased, from autocompleting your sentence in an IDE to writing functions for you to, now, being able to write an entire app out of nowhere that sometimes actually works.

Now - that does not mean Jesus can take the wheel. I know. The main issue, now, is that the agents are not very good at exploring the solution space. When the code base becomes larger, they often struggle to take the logic of line 265 in file A into account when writing line 1,038 in file B. Or they are too eager to jump to the first solution that sounds remotely like it could work. So you end up with short-sighted solutions that break something else somewhere. It really, really needs checks and balances now to prevent the  's from happening, but let's be honest, do you realize how often shit breaks in normal software development? Are the blanks you are getting from the EHR API any better than the  's because those shit sausages were made differently?

Even if there is not a single inch of progress in the models, I'm fairly certain we'll still see progress in the coming years in the ability of AI to improve software engineering. What I didn't know, until reading this article, is what that future could look like. I don't think the blog is a blueprint? There's every chance it will be kneecapped in multiple ways? But I'd be surprised if this is not the direction we'll be heading in for the next few years. Right now we're in the It Moves Fast And Breaks Things era. But that era will end, and I'm intrigued slash terrified slash in awe of what that might look like.

In a way, some of this is already here and working, just in small pockets. NotebookLM's podcast feature is a single button to the user, but behind the scenes it is basically an agent cluster that takes a document, creates a podcast script, refines the script to add uuhs and ahs and other vocal nuances, and text-to-speeches it. Not just in a single pipeline, but even on-the-fly when you "call into" the podcast. You take a complex task, break it up into its constituent parts, and refine an agent with a specific agent-prompt and toolset. Then you tell the smartest AI you can afford "these are your minions, go do this overarching task" and have only that one talk to the human in the loop.
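To make that concrete, here's a minimal sketch of what such a cluster could look like. Everything in it is a placeholder I made up for illustration: call_llm() stands in for whatever model API you'd actually use, and the grep tool is just one example of the tools you'd hand a worker agent.

import subprocess

def call_llm(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real model call (whatever model you can afford).
    return f"[model output for: {user_prompt[:60]}...]"

def grep_tool(pattern: str, path: str = ".") -> str:
    # One example tool: let the agent search the codebase the way a dev would.
    result = subprocess.run(["grep", "-rn", pattern, path],
                            capture_output=True, text=True)
    return result.stdout[:2000]  # truncate so it fits in a context window

def worker_agent(task: str) -> str:
    # A "minion": narrow prompt, narrow toolset, one task. A real agent loop
    # would let the model decide which tools to call and when; this hard-wires
    # a single grep on the first word of the task to keep the sketch short.
    context = grep_tool(task.split()[0]) if task.split() else ""
    return call_llm(
        system_prompt="You are a coding agent. Use only the context given.",
        user_prompt=f"Task: {task}\n\nRelevant code found:\n{context}",
    )

def orchestrator(big_request: str) -> str:
    # The coordinating agent: chops the request into baby-sized steps,
    # delegates each one, and is the only thing that reports back to the human.
    plan = call_llm(
        system_prompt="Break this request into small, independent dev tasks, one per line.",
        user_prompt=big_request,
    )
    results = [worker_agent(task) for task in plan.splitlines() if task.strip()]
    return call_llm(
        system_prompt="Summarize what was done for the human in the loop.",
        user_prompt="\n\n".join(results),
    )

print(orchestrator("Store sessionID as text everywhere and update the analytics export."))

The point is the shape of it: the top agent never does the work itself, it just decomposes and delegates, and only it talks to you.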
From my experience, both this concept of agent clusters and vibecoding feel eerily similar to the way I worked at that mobility app. The onshore dev I worked with the most was a junior dev. He wasn't particularly bright or experienced (he was the same age as I was at the time), but he had a) a few tools at his disposal, b) knowledge of what the architecture of the app looked like, and c) a decent ability to reason about code (though I'd often be just as good at that). His task was to pour my request into the molds of the Jira processes they were used to, and he would delegate everything else to his offshore colleagues to actually do. Along the entire process of going back and forth with the junior dev, and at times with the Indian devs, I'd do nothing different from what the article describes as agent babysitting. And the result of that process was often just, like, one new class or function call in the backend.

I also don't think my friend at Not-EPIC should be unworried. She knows how wretched the way they develop is. She doesn't know what her code does IRL or what workflow it could break. This fall she has a surgery coming up which will knock her out for a good three months. That means for three months, there are exactly zero people available to deal with problems in her corner of the monolith. When people leave the company (because of course they under-pay and over-ask), it often results in a plethora of problems, because the next person put on that bit of the system breaks a bunch of shit; there is nothing documented.

My expectation is that management will, sometime in the next few years, realize that they can actually fix the fundamental problems with their organization and get more done. Get everyone to talk through the code they manage, record and transcribe it all, autogenerate the mother of all documentation, and mandate doc updates in the code change process from then on. Now you're no longer dependent on tribal knowledge and expensive senior devs. Then, talk to every client about all the wishes they have, for as long as the client wants. Record and transcribe it all and generate the mother of all backlogs and requirements. You give every medior or even junior dev tasks from that list and assign them an agent cluster, only requiring the senior devs for the aforementioned checks and balances. They could even take all 8 hours of a work day just for picking the tasks and doing checks and balances, and have the agent cluster work through the night to provide new code to check. But I highly doubt all of that will require more devs.
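For what it's worth, a rough sketch of that doc pipeline, with made-up paths and the same placeholder call_llm() instead of any real model API. The point is just the shape: transcript plus source in, draft docs out, and a gate in the change process that refuses code changes without a doc update.

from pathlib import Path

def call_llm(prompt: str) -> str:
    # Placeholder again for whatever model would actually do the writing.
    return f"[generated documentation, {len(prompt)} chars of input]"

def generate_module_docs(module_dir: Path, transcript: Path, out_dir: Path) -> Path:
    # One dev talks through their corner of the monolith; the recording gets
    # transcribed, and together with the source it becomes a first-draft doc
    # that a human still has to proofread.
    code = "\n\n".join(p.read_text() for p in module_dir.rglob("*.py"))
    prompt = ("Write reference documentation for this module.\n"
              f"Developer walkthrough transcript:\n{transcript.read_text()}\n"
              f"Source code:\n{code}")
    doc_path = out_dir / f"{module_dir.name}.md"
    doc_path.write_text(call_llm(prompt))
    return doc_path

def change_allowed(changed_files: list[str]) -> bool:
    # The "mandate doc updates from then on" part, as a CI-style gate:
    # a change that touches code but no docs doesn't get merged.
    touches_code = any(f.endswith(".py") for f in changed_files)
    touches_docs = any(f.startswith("docs/") for f in changed_files)
    return touches_docs or not touches_code

print(change_allowed(["billing/session.py"]))                      # False: blocked
print(change_allowed(["billing/session.py", "docs/billing.md"]))   # True: allowed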
Are... you going to explain to the AI fleet why that bug has to stay?

The upcoming wave, which I'm calling "agent clusters" – the chariot I hinted at in the last section – should make landfall by Q3. This wave will enable each of your developers to run many agents at once in parallel, every agent working on a different task: bug fixing, issue refinement, new features, backlog grooming, deployments, documentation, literally anything a developer might do.
We have had this very discussion about self-driving cars. We have had this very discussion about AI in audio.

We have gotten to the point where AI is tentatively shuttling passengers around urban hubs, effectively turning an open network into a closed system. Waymo and a couple others have gotten to the point where they can replace a fresh-to-the-country Uber driver, under ideal conditions, within a closed environment. But all the companies that proposed a wide-open adaptive environment are either (A) gone, like Uber, or (B) killing people with aplomb, like Tesla.

We have gotten to the point where AI is tentatively adding chorus effects and synthesizing voices, but it still can't mix worth a shit. We had a lengthy discussion about how to do a simple recording, and not only did you not hear the chair squeaks, but they were so bad I couldn't do anything with them. AI can't either, of course. iZotope tried to prove that my entire industry was doomed in 2010 and released a tool that went so badly they scrubbed it from the Internet. I'm about to cut a DJ session. You would think that beat-matching and pitch-shifting and crossfading would be the sort of thing an AI could do the shit out of. And yes! Rekordbox has an automix. It's got some fuckin' "AI" tag on it too. And if you want to hear the worst mixes you can imagine, engage it. Fuckin' AI can't even tell the difference between a chorus and a verse.

You're not a coder. Neither am I. I know enough to know that I hate it, but that's based a lot on learning to code in fucking Fortran and Turbo Pascal. You know enough that you'd rather code than pick up a soldering iron, but you're very much at the "genie, make me a thing" phase of the adventure. You never got to try a self-driving car in the beginning; you likely would have survived, but you also likely wouldn't have pushed it into the corner cases necessary to ensure life-safety for everyone on the road. "Push it into the corner cases" is the thing every AI booster refuses to do.

kk in this scenario? For Pro Tools? I'm the onshore dev.

Their problem is a lack of documentation, not the imminent threat of AI. The Birth Center has a 500pp binder of documentation on every fucking thing we do, not because we really like documentation but because we're required to have all this shit documented for certification. Because if we fuck up in the clutch, people may die.

Here's the problem in a nutshell:

- Be this company
- Be ready to join the 20th century
- Be stopping down for eight months to fucking document everything
- Be ready to join the 21st century

OR

- Be this company
- Be ready to join the 20th century
- Be feeding your codebase to an AI to generate documentation
- Be spending three years on pins and needles as you spend eighteen months proofreading the documentation and then eighteen months breaking things you missed

From your addendum: FUCKING HELL DUDE.

Which is faster - writing it yourself or playing minesweeper with someone else's code? If you answered "playing minesweeper" you just tattled on yourself. Part of being a senior developer is making less-able coders productive, be they fleshly or algebraic. Using agents well is both a skill and an engineering project all its own, of prompts, indices, and (especially) tooling. LLMs only produce shitty code if you let them.

Here's a game I play regularly: "Hey receptionist - I would like a slide for the billboard that says this." (crap slide) "Great. Now apply the discussions we've had about whitespace and readability."
(Vaguely less-crap slide) "Okay awesome. Can you mess with the color palette a little?" (heinously more-crap slide) "Okay try these RGB values" (less-crap slide, copy changes, receptionist goes on crying jag) "No no you're doing great. Er... do this." (less-crap slide, let's ship it)

I play this game because it's good for her self-esteem. Her roommate is a designer, so she fancies herself a designer. SHE IS NOT A DESIGNER. We've had all sorts of discussions about rule-of-thirds, read-three-times, don't-flash, etc. About a third of it is accessible to her at any given time. But she has such pride in seeing her work parking-lot sized that it's worth it to me for morale to let her pretend she's designing things, rather than whipping that shit out on my own in a third the time. I give no fux about Gemini's self-esteem.

Your expectation is that everyone will go "well, it'll make it that last 20 percent no problem so we should adopt it now, and damn the consequences." Mine is, too. The difference is I don't think it'll work out.

If there's anything I'd want to get built or get changed, I had a feedback Slack channel that the onshore devs would process into Jira. If there was anything big that I needed to be implemented, I'd get the onshore junior dev to essentially parse my request into Jira for me, often after a few meetings because chopping my idea into baby-sized steps is never a straightforward task. It'd be well-documented what the desired change was, and what the steps were to get there, before it went to the offshore team. Then a week or so later they'd be working on it and have a bunch of extra questions. Then I'd get a new TestFlight version of the app, I'd do a bunch of testing with that, and would often come back with half a dozen edge cases and misinterpretations. Back and forth, testing, questions, back and forth, testing, and then it would usually be fine.
The first months of her job, she was not allowed to write code, instead just having to review other people's code in order to initiate osmosis for the inner workings of EHRs because there is, essentially, no documentation. The entire company from what I gather seems to operate on tribal knowledge, the elders passing down quirks and edge cases that stay in. It is also company policy to forbid writing any documentation in the code. Instead of documentation, they've created a layer cake of processes that code has to go through to be reviewed and checked.
Reading other people’s code is part of the job. If you can’t metabolize the boring, repetitive code an LLM generates: skills issue! How are you handling the chaos human developers turn out on a deadline?
Does an intern cost $20/month? Because that’s what Cursor.ai costs.
My expectation is that management will, sometime in the next few years, realize that they can actually fix the fundamental problems with their organization and get more done.