How AI is breaking traditional remuneration models


“All watched over by llamas of loving grace”

It’s been a busy few weeks on the AI and copyright beat, and while I’ve been following all of the developments closely, I haven’t had the time to react to everything on the blog. However, with the first two decisions handed down in the US, and one in the UK expected soon, it is a perfect time to start looking at the subject from a long-term perspective. A lot of the work from copyright lawyers and researchers who look at AI tends to propose solutions based on the traditional models, namely licensing and remuneration. I have been arguing for a while that the nature of AI training and of the technology does not fit neatly into the existing frameworks, that we should be looking to either reform the system or adjust business models, and that copyright policy should reflect those changes. I’ll try to explain the problems and begin exploring new ways of looking at copyright in this context, although I do not claim to have any answers.

The traditional copyright model

The way copyright has traditionally worked in the creative industries is well documented, but for the purposes of this post, this is roughly how it is supposed to operate. In an ideal world, creators generate a work worthy of copyright protection, the work is then distributed to the public, who pay for it, and royalties flow back from that payment to the author. The reality is infinitely more complex than that: creatives may not own the copyright, intermediaries and licensing deals with distributors enter the picture, and we end up with a tangle of moving parts in which often only a small percentage of the money goes back to the person who created the work.

The term ‘creative industries’ itself covers a vast and varied array of sectors. It’s not just the traditional realms of publishing and music, but also film and television production, theatre, design, fashion, advertising, and the ever-expanding world of video games and digital media. Each of these operates as its own distinct ecosystem, with unique commercial pressures and conventions for how creative work is funded, produced, and sold. Authors often have to operate in a world of literary agents and publishing houses, which presents a different set of challenges from those facing a freelance graphic designer licensing their work for a commercial campaign, or a musician trying to make sense of streaming royalties. This diversity is key to understanding why a single, simple model of copyright rarely fits the bill.

So while the model still relies on individual creators, the reality is a system of publishers, agents, collecting societies, distributors, and licensing bodies. But even with this complexity the idea is simple: work → licensing → remuneration. Create a work, sell it or licence it, profit.

But this model relies on several cogs that are already under threat. There has to be a product that can be sold or licensed to an audience, and a royalty system in place that sends the money back, often operated by intermediaries and collecting societies. Digitisation and the rise of online streaming have changed many of those premises. More often than not nowadays, creators are employees of large conglomerates that churn out content and keep ownership of everything that is produced. Moreover, streaming models often translate into less money from royalties, as the system is propped up by subscription fees that tend to generate less than the traditional product model used to accrue. So there has been a concentration of power at the top, and less money trickling down to the creators who actually make the copyright works. This is coupled with the fact that new generations of creatives are starting to opt out of the traditional model to engage with a content-creation economy that is more direct: influencers, streamers, and YouTubers do not operate within the traditional copyright framework at all.

The problem with the traditional model and AI

The model described above, while it has been suffering, still produces a sizeable amount of returns, which is why it remains in place for the most part. So when generative AI started becoming more popular, most people in the industry, as well as some legal commentators, assumed that AI would move forward in a similar way. I’ve read several opinions arguing that AI should be dealt with roughly as the copyright industries handle remuneration nowadays, but there are a few problems in applying this model to AI, and we are starting to see these play out in practice.

The ideal application of the model would work something like this: an author creates a work protected by copyright, and the AI developer purchases a licence to use the work as training data. Rinse and repeat.

But this is not how things have proceeded in reality. From the start, the data used for training came from the largest repository of works in existence: the public web. Earlier language models, and most image models, were trained on content already found online. Large amounts of data that had been available on the open web for decades were the source, and that usually meant bypassing any sort of negotiation with content owners. Unsurprisingly, creators cried foul, and in many instances proceeded to sue for copyright infringement. At first it looked like this would be an easy win for authors, but things haven’t been so clear-cut, and this appears to be confusing quite a lot of people who were told that copyright would be the silver bullet that would kill generative AI.

Here is what I think is happening.

The first problem is that creators have been playing catch-up from the outset. A couple of years ago I gave a lecture at the London Book Fair on generative AI and copyright infringement; it was a full house, and everyone from the copyright industry was there. It was the copyright maximalist Woodstock. I was a bit nervous as someone who is famously a minimalist, but I think that the talk went quite well. My main message was that the copyright industry had been asleep at the wheel, and that they had already lost the battle. Today that message would go down very differently, but at the time these people needed a wake-up call. My argument was simple: the models are already trained; you could get rid of every single generative AI company tomorrow, and the models would still exist. Moreover, other countries with no copyright enforcement would start getting in on the act. My advice at the time was to start negotiating immediately; I wasn’t sure if that would work, but it was a start. Soon after, the lawsuits started flying, but the assessment remains: the models are already trained, and there is no putting the genie back in the bottle, so the copyright industry has been trying to catch up with a runaway technology ever since. See, it would have been easier to get all the negotiations done beforehand, but back then the industry was mostly worried about the link tax. But I digress…

The second issue is about the models themselves. Traditional copyright remuneration systems work because there is a product that faces a consumer; in other words, a work is published, distributed, and communicated to the public. AI models take in millions of works and do not publish them in any sense of the word; they are not facing the public. You wouldn’t go to ChatGPT or Gemini to read ‘The Hobbit’; you would go there to learn the plot of ‘The Hobbit’, but you can do the same on Wikipedia. Training an AI means accessing a copy to extract data from it, which is neither an adaptation nor a publication. Sure, there is a reproduction, but at least in some systems that could fall under existing exceptions and limitations. There may still be an infringement, but the lack of publication means that damages may for the most part end up being negligible, at least in countries that do not have statutory damages. And even if these companies get hit with billions in damages for those copies, you are back at the first problem: the models already exist.

The third problem for a remuneration model was recently made evident in the Anthropic fair use decision by Judge Alsup. An interesting detail that came to light in the discussion was that Anthropic had purchased books for scanning, something I had been expecting would eventually happen. Copyright forbids making a reproduction of a work without authorisation, which is what most training from the web involves. But what if someone purchases a legal copy of a work and trains on that? I think that this is fine: training an AI is not an exclusive right of the author, and extracting information from a purchased work is not an infringement of copyright; otherwise reading a book would be actionable. The problem here is that, with the exception of evidently infringing outputs, AI outputs for the most part cannot be considered a communication to the public of a work, a publication, or an adaptation, all of which are exclusive rights of the author. So if there is no unauthorised reproduction, there is no cause of action. This may be solved by eventually making AI training something that requires authorisation from the author, but for now, that is not the case.

And the final problem for establishing a remuneration model is one of scale. I’m often baffled by how people simply seem to ignore the gargantuan amount of data that goes into training a model such as an LLM, and what effect that has on any licensing deal or viable remuneration model. A model such as ChatGPT is trained on billions of tokens (a token is a small unit of text, such as a word or part of a word, that language models use to process and generate language). The sheer number of works involved, coupled with the fact that you only need to pay for one copy of each work, means that the value of any individual work in a licensing market is negligible. For a traditional licensing system to function, each individual work has to carry a value that justifies the transaction costs, but if what each creator is going to get is 0.00000001% of an already modest pot, then most creators would not expect to see any money at all. Imagine the Spotify remuneration model, but with a thousand times less money going to creators.
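To make the scale problem concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it is an illustrative assumption of mine (a hypothetical one-off licensing pot, a hypothetical corpus size, a hypothetical cost of processing a single payout), not a real licensing number; the point is only to show what happens when even a large pot is divided across a web-scale corpus.

# Back-of-envelope arithmetic for per-work licensing value in AI training.
# All numbers are illustrative assumptions, not real figures.

licensing_pot = 100_000_000        # hypothetical one-off pot paid by a developer (USD)
works_in_corpus = 1_000_000_000    # hypothetical number of works in a web-scale corpus
payout_cost = 1.00                 # hypothetical cost of processing a single payment (USD)

# Unlike streaming, training needs only one copy of each work,
# so this is a one-off payment, not a recurring royalty.
per_work_share = licensing_pot / works_in_corpus
net_to_creator = per_work_share - payout_cost

print(f"Gross per-work share: ${per_work_share:.2f}")    # $0.10, once, ever
print(f"Net after payout costs: ${net_to_creator:.2f}")  # negative: the admin cost exceeds the share

Under these assumptions, the cost of sending the money swallows the money itself, which is the core of the scale argument: the problem is not that the split is unfair at the margins, it is that there is almost no margin to distribute.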

What next?

I’m tempted to just add the shrug emoji here. Heck, I don’t know; that is way above my pay grade. I’m just diagnosing the problem; I have no idea how to come up with solutions. I know people in the creative industries will not like my analysis above, but I think it is about time that people stopped listening to those peddling 18th-century models, and at least recognised that the current copyright system is just not suited to the challenges posed by AI. We are here because people have been assuming that the traditional models will still hold, but that is not going to happen, for various reasons, and you will keep seeing people surprised, hurt, and disappointed when court decisions don’t go their way. The problem is that a lot of people were sold lies about AI and about copyright. They were told things were certain—but they rarely are. Copyright is messy, costly, and you almost never get the result you wanted—this is because copyright also has to have a system of exceptions and limitations, otherwise even running a game on your computer or browsing the web would be infringing. (Notice the two em-dashes here, I’m trying to make em-dashes happen.)

So, at the very least, stop trying to make copyright do something it was never meant to do. Many of the issues being discussed are really societal challenges; they are also about competition, funding for the arts, capitalism, corporate greed, and inequality. So reform the system if you must, establish grants for creators, tax Big Tech developers to the gills and give the money to creators; anything but the assumption that the traditional remuneration system has any hope of surviving.

And if you’re a copyright holder, time is not on your side. The lawsuits in the US will keep going for years and years, and by that time more models will have come online, and more countries will have joined the AI race. I’d argue that the bulk of the generative AI revolution took place between 2014 and 2017. We’re living in the aftermath of that momentous time, which most people missed. AI is here to stay, so perhaps start acting accordingly.

Concluding

It’s been a weird decade; 10 years ago I published my first blog post on generative AI and copyright. So much has happened since that I don’t know where to begin. I got very sick, and I got better. I had a short NFT copyright obsession, and I got better. I liked ‘The Last Jedi’, and I got better. But one thing has remained clear: AI tools have continued to improve, and like it or not, we are becoming more reliant on them. If you don’t like that statement, that’s fine, but copyright is not the answer.

I don’t know what the answer is; I’m just a guy cosplaying as a llama online.

