The Abstractions, They Are A-Changing




Since ChatGPT appeared on the scene, we’ve known that big changes were coming to computing. But it’s taken a few years for us to understand what they were. Now, we’re starting to understand what the future will look like. It’s still hazy, but we’re starting to see some shapes—and the shapes don’t look like “we won’t need to program any more.” But what will we need?

Martin Fowler recently described the force driving this transformation as the biggest change in the level of abstraction since the invention of high-level languages, and that’s a good place to start. If you’ve ever programmed in assembly language, you know what that first change means. Rather than writing individual machine instructions, you could write in languages like Fortran or COBOL or BASIC or, a decade later, C. While we now have much better languages than early Fortran and COBOL—and both languages have evolved, gradually acquiring the features of modern programming languages—the conceptual difference between Rust and an early Fortran is much, much smaller than the difference between Fortran and assembler. There was a fundamental change in abstraction. Instead of using mnemonics to abstract away hex or octal opcodes (to say nothing of patch cables), we could write formulas. Instead of testing memory locations, we could control execution flow with for loops and if branches.

The change in abstraction that language models have brought about is every bit as big. We no longer need to use precisely specified programming languages with small vocabularies and syntax that limited their use to specialists (who we call “programmers”). We can use natural language—with a huge vocabulary, flexible syntax, and lots of ambiguity. The Oxford English Dictionary contains over 600,000 words; the last time I saw a complete English grammar reference, it was four very large volumes, not a page or two of BNF. And we all know about ambiguity. Human languages thrive on ambiguity; it’s a feature, not a bug. With LLMs, we can describe what we want a computer to do in this ambiguous language rather than writing out every detail, step-by-step, in a formal language. That change isn’t just about “vibe coding,” although it does allow experimentation and demos to be developed at breathtaking speed. And that change won’t be the disappearance of programmers because everyone knows English (at least in the US)—not in the near future, and probably not even in the long term. Yes, people who have never learned to program, and who won’t learn to program, will be able to use computers more fluently. But we will continue to need people who understand the transition between human language and what a machine actually does. We will still need people who understand how to break complex problems into simpler parts. And we will especially need people who understand how to manage the AI when it goes off course—when the AI starts generating nonsense, when it gets stuck on an error that it can’t fix. If you follow the hype, it’s easy to believe that those problems will vanish into the dustbin of history. But anyone who has used AI to generate nontrivial software knows that we’ll be stuck with those problems, and that it will take professional programmers to solve them.

The change in abstraction does mean that what software developers do will change. We have been writing about that for the past few years: more attention to testing, more attention to up-front design, more attention to reading and analyzing computer-generated code. The lines continue to shift, as simple code completion gave way to interactive AI assistance, which in turn gave way to agentic coding. But there’s a seismic change coming from the deep layers underneath the prompt, and we’re only now beginning to see it.

A few years ago, everyone talked about “prompt engineering.” Prompt engineering was (and remains) a poorly defined term that sometimes meant using tricks as simple as “tell it to me with horses” or “tell it to me like I am five years old.” We don’t do that so much any more. The models have gotten better. We still need to write prompts that are used by software to interact with AI. That’s a different, and more serious, side to prompt engineering that won’t disappear as long as we’re embedding models in other applications.
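To make that second kind of prompt engineering concrete, here’s a minimal sketch, in Python, of a prompt that lives in application code rather than in a chat window. The call_model function is a stand-in for whichever provider’s client library you actually use; nothing here is a real API.

    # Sketch of an application-embedded prompt. `call_model` is a stand-in
    # for whatever client library you actually use; it is not a real API.
    SUMMARY_PROMPT = (
        "You are a release-notes assistant. Summarize the following commit "
        "messages in three bullet points, in plain English, for a "
        "non-technical audience.\n\nCommit messages:\n{commits}"
    )

    def summarize_commits(commits: list[str], call_model) -> str:
        # The prompt is part of the application: it gets reviewed, versioned,
        # and tested deliberately, unlike an ad hoc chat message.
        prompt = SUMMARY_PROMPT.format(commits="\n".join(commits))
        return call_model(prompt)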

More recently, we’ve realized that it’s not just the prompt that’s important. It’s not just telling the language model what you want it to do. Lying beneath the prompt is the context: the history of the current conversation, what the model knows about your project, what the model can look up online or discover through the use of tools, and even (in some cases) what the model knows about you, as expressed in all your interactions. The task of understanding and managing the context has recently become known as context engineering.
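To pin down what “context” means in practice, here’s a rough sketch, with invented field names rather than any particular provider’s API, of what typically gets assembled and sent along with the prompt on every call:

    # Rough sketch of what "context" means in practice. Field names are
    # illustrative, not any particular provider's API.
    from dataclasses import dataclass, field

    @dataclass
    class Context:
        system_instructions: str                                         # who the model should be
        conversation_history: list[dict] = field(default_factory=list)   # prior turns
        project_files: dict[str, str] = field(default_factory=dict)      # code the model "knows"
        retrieved_documents: list[str] = field(default_factory=list)     # RAG results, tool output
        user_memory: list[str] = field(default_factory=list)             # e.g. "always use 4-space indents"

        def to_messages(self, prompt: str) -> list[dict]:
            # Everything above is flattened into the token stream the model
            # actually sees; the prompt itself is only the last piece.
            background = "\n\n".join(
                list(self.project_files.values())
                + self.retrieved_documents
                + self.user_memory
            )
            return (
                [{"role": "system",
                  "content": self.system_instructions + "\n\n" + background}]
                + self.conversation_history
                + [{"role": "user", "content": prompt}]
            )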

Context engineering must account for what can go wrong with context. That will certainly evolve over time as models change and improve. And we’ll also have to deal with the same dichotomy that prompt engineering faces: A programmer managing the context while generating code for a substantial software project isn’t doing the same thing as someone designing context management for a software project that incorporates an agent, where errors in a chain of calls to language models and other tools are likely to multiply. These tasks are related, certainly. But they differ as much as “explain it to me with horses” differs from reformatting a user’s initial request with dozens of documents pulled from a retrieval system (RAG).
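A quick back-of-the-envelope calculation shows why errors in a chain multiply. The 95% per-step success rate below is invented purely for illustration:

    # If each step in an agent pipeline is right 95% of the time (an invented
    # number), the chance that a chain of n steps is right end to end decays
    # quickly.
    per_step_success = 0.95
    for n in (1, 5, 10, 20):
        print(n, round(per_step_success ** n, 2))
    # prints: 1 0.95, 5 0.77, 10 0.6, 20 0.36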

Drew Breunig has written an excellent pair of articles on the topic: “How Long Contexts Fail” and “How to Fix Your Context.” I won’t enumerate (maybe I should) the context failures and fixes that Drew describes, but I will describe some things I’ve observed:

  • What happens when you’re working on a program with an LLM and suddenly everything goes sour? You can tell it to fix what’s wrong, but the fixes don’t make things better and often make them worse. Something is wrong with the context, but it’s hard to say what, and even harder to fix.
  • It’s been noticed that, with long-context models, the beginning and the end of the context window get the most attention; content in the middle of the window is likely to be ignored. How do you deal with that? (One common mitigation is sketched after this list.)
  • Web browsers have accustomed us to pretty good (if not perfect) interoperability. But different models use their context and respond to prompts differently. Can we have interoperability between language models?
  • What happens when hallucinated content becomes part of the context? How do you prevent that? How do you clear it?
  • At least when using chat frontends, some of the most popular models are implementing conversation history: They will remember what you said in the past. While this can be a good thing (you can say “always use 4-space indents” once), again, what happens if it remembers something that’s incorrect?
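The mitigation mentioned above for the “lost in the middle” problem usually amounts to ranking context items by relevance, dropping what doesn’t fit, and placing the strongest items at the edges of the window. Here’s a rough sketch; score_relevance and the token budget are placeholders, not a real library:

    # Sketch of a "lost in the middle" mitigation: keep the most relevant
    # items, and put the strongest ones at the start and end of the window,
    # where long-context models pay the most attention. `score_relevance`
    # and the token budget are placeholders.
    def pack_context(items: list[str], query: str,
                     score_relevance, budget_tokens: int) -> list[str]:
        ranked = sorted(items, key=lambda item: score_relevance(query, item),
                        reverse=True)

        kept, used = [], 0
        for item in ranked:
            cost = len(item) // 4          # crude token estimate
            if used + cost > budget_tokens:
                break                      # drop (or summarize) what doesn't fit
            kept.append(item)
            used += cost

        # Interleave so the best items land at the beginning and the end,
        # and the weakest items end up in the middle.
        front, back = kept[0::2], kept[1::2]
        return front + back[::-1]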

“Quit and start again with another model” can solve many of these problems. If Claude isn’t getting something right, you can go to Gemini or GPT, which will probably do a good job of understanding the code Claude has already written. They are likely to make different errors—but you’ll be starting with a smaller, cleaner context. Many programmers describe bouncing back and forth between different models, and I’m not going to say that’s bad. It’s similar to asking different people for their perspectives on your problem.
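In practice, “quit and start again” means carrying over only the durable artifacts: the code itself and a short statement of the problem, not the failed conversation. A rough sketch, with call_model again standing in for whichever provider you switch to:

    # Sketch of a "fresh start" handoff: the new model gets the code and a
    # short statement of the problem, not the failed conversation.
    # `call_model` is a placeholder for the provider you switch to.
    def fresh_start(code_files: dict[str, str], problem_statement: str,
                    call_model) -> str:
        code_dump = "\n\n".join(
            f"# {path}\n{content}" for path, content in code_files.items()
        )
        prompt = (
            "Here is the current state of the project:\n\n"
            f"{code_dump}\n\n"
            "The previous attempt stalled on this problem:\n"
            f"{problem_statement}\n\n"
            "Propose a fix, and explain what you changed."
        )
        return call_model(prompt)   # a smaller, cleaner context than the old session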

But that can’t be the end of the story, can it? Despite the hype and the breathless pronouncements, we’re still experimenting and learning how to use generative coding. “Quit and start again” might be a good solution for proof-of-concept projects or even single-use software (“voidware”), but it hardly sounds like a good solution for enterprise software, which, as we know, has lifetimes measured in decades. We rarely program that way, and for the most part, we shouldn’t. It sounds too much like a recipe for repeatedly getting 75% of the way to a finished project only to start again, and to find that Gemini solves Claude’s problem but introduces its own. Drew has interesting suggestions for specific problems, such as using RAG to determine which MCP tools to use so the model won’t be confused by a large library of irrelevant tools. At a higher level, we need to think about what we really need to do to manage context. What tools do we need to understand what the model knows about any project? When we need to quit and start again, how do we save and restore the parts of the context that are important?
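Drew’s RAG-over-tools suggestion can be sketched roughly like this. The embed function and the similarity scoring are placeholders for whatever retrieval stack you already have, and the MCP plumbing is elided:

    # Sketch of RAG over a tool library: instead of handing the model every
    # tool, retrieve only the tools whose descriptions are relevant to the
    # request. `embed` is a placeholder for your embedding model.
    def select_tools(request: str, tools: list[dict], embed,
                     top_k: int = 5) -> list[dict]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
            return dot / norm if norm else 0.0

        query_vec = embed(request)
        scored = sorted(
            tools,
            key=lambda tool: cosine(query_vec, embed(tool["description"])),
            reverse=True,
        )
        return scored[:top_k]   # only these tool definitions go into the context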

Several years ago, O’Reilly author Allen Downey suggested that in addition to a source code repo, we need a prompt repo to save and track prompts. We also need an output repo that saves and tracks the model’s output tokens—both its discussion of what it has done and any reasoning tokens that are available. And we need to track anything that is added to the context, whether explicitly by the programmer (“here’s the spec”) or by an agent that is querying everything from online documentation to in-house CI/CD tools and meeting transcripts. (We’re ignoring, for now, agents where context must be managed by the agent itself.)
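As a minimal sketch of what those repos might record, an append-only log of every prompt, response, and context addition would be a start. The schema here is invented for illustration:

    # Minimal sketch of a prompt/output/context log: one JSON record per
    # event, appended to a file that lives alongside the source repo.
    # The schema is invented for illustration.
    import json, time
    from pathlib import Path

    LOG = Path("context_log.jsonl")

    def record(kind: str, content: str, source: str = "programmer") -> None:
        # kind: "prompt", "output", "reasoning", or "context"
        # source: who added it -- the programmer, an agent, or a tool
        entry = {
            "timestamp": time.time(),
            "kind": kind,
            "source": source,
            "content": content,
        }
        with LOG.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    # record("context", spec_text, source="programmer")
    # record("prompt", "Refactor the payment module to ...")
    # record("output", model_response, source="agent")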

But that just describes what needs to be saved—it doesn’t tell you where the context should be saved or how to reason about it. Saving context in an AI provider’s cloud seems like a problem waiting to happen; what are the consequences of letting OpenAI, Anthropic, Microsoft, or Google keep a transcript of your thought processes or the contents of internal documents and specifications? (In a short-lived experiment, ChatGPT chats were indexed and findable by Google searches.) And we’re still learning how to reason about context, which may well require another AI. Meta-AI? Frankly, that feels like a cry for help. We know that context engineering is important. We don’t yet know how to engineer it, though we’re starting to get some hints. (Drew Breunig said that we’ve been doing context engineering for the past year, but we’ve only started to understand it.) It’s more than just cramming as much as possible into a large context window—that’s a recipe for failure. It will involve knowing how to locate parts of the context that aren’t working, and ways of retiring those ineffective parts. It will involve determining what information will be the most valuable and helpful to the AI. In turn, that may require better ways of observing a model’s internal logic, something Anthropic has been researching.

Whatever is required, it’s clear that context engineering is the next step. We don’t think it’s the last step in understanding how to use AI to aid software development. There are still problems like discovering and using organizational context, sharing context among team members, developing architectures that work at scale, designing user experiences, and much more. Martin Fowler’s observation that there’s been a change in the level of abstraction is likely to have huge consequences: benefits, surely, but also new problems that we don’t yet know how to think about. We’re still negotiating a route through uncharted territory. But we need to take the next step if we plan to get to the end of the road.


AI tools are quickly moving beyond chat UX to sophisticated agent interactions. Our upcoming AI Codecon event, Coding for the Future Agentic World, will highlight how developers are already using agents to build innovative and effective AI-powered experiences. We hope you’ll join us on September 9 to explore the tools, workflows, and architectures defining the next era of programming. It’s free to attend.

Register now to save your seat.



