Ensuring Fair Compensation for Content Creators in the Age of AI Chatbots

David Gilbertson
14 min read · Jul 7, 2024


Much of the content on the internet is created by humans, and these content-creating humans need food to operate. Food costs money, and this is why websites run ads: to make money, to give to the humans so they can eat and type out more content.

This ad-supported content model has been working well for the past few decades, but the new world of AI chatbots is set to upset the delicate balance.

In the future, fewer people will read website content in a browser; instead, they’ll ask a chatbot (ChatGPT, Claude, Gemini, etc.) and the chatbot will fetch the content and provide summarised information to the user.

The problem, of course, is that chatbots don’t click on ads, so the website owners will have less money to pay the content-creating humans, the humans will get hungry, their fingers will grow weak, and they’ll create less content.

Right now the problem is not so bad, but the trajectory is clear: the ad-supported content model will begin to fail when a certain percentage of users switch from browsers to chatbots as their preferred tool for accessing internet content.

In this post, I’d like to propose a way forward.

As you read the details below, I encourage you to think them over in multiple time frames: what will things look like in 1 year, 5 years, 10 years, as chatbots grow ever more competent and less content is consumed via web browsers?

One day, maybe artists will receive money for images created by DALL·E and friends.

The gist of it

The core of this concept is a new flow of money: from users, to chatbot vendors, to site owners. I’ll propose a few stages of complexity, but a simple implementation would look something like this:

  • The user pays a monthly fee to a chatbot vendor (OpenAI, Anthropic, Google, etc.), as they currently do.
  • The chatbot occasionally visits websites to generate a response (‘realtime search’) and the chatbot vendor pays a sum per site visit to the site owners.

Let’s run through an example, plugging in some numbers to see how this might pan out. (These numbers can vary greatly, so what follows is just to give an idea of the orders of magnitude we’re dealing with.)

For users, a typical monthly fee for a top-shelf chatbot is about $20.

For a site owner, a conservative ballpark figure for expected ad revenue is about $1 per thousand page views. So let’s say this is what the chatbot vendors should aim to pay site owners.

A user who chats with a chatbot a lot might trigger 10 realtime searches per day (where the chatbot actually performs a web search, visits the pages in the results, and generates a response from those pages). Let’s say that on average 3 sites are visited for each search. That’s 30 site views per day, or close enough to 1,000 views per month. In this scenario, the chatbot vendor would distribute $1 (of that user’s $20 monthly fee) to the owners of those sites.
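The per-user arithmetic above can be checked in a few lines of Python. This is only a sketch: the function name and the 30-day month are assumptions for illustration, and the $1-per-thousand-views rate is the ballpark figure from the text, not a real quote.

```python
def monthly_payout_per_user(
    searches_per_day: float,
    sites_per_search: float,
    rate_per_thousand_views: float,
    days_per_month: int = 30,  # assumption: a 30-day month
) -> tuple[float, float]:
    """Return (site views per month, dollars paid to site owners per month) for one user."""
    views = searches_per_day * sites_per_search * days_per_month
    payout = views / 1000 * rate_per_thousand_views
    return views, payout


views, payout = monthly_payout_per_user(10, 3, 1.0)
print(f"{views:.0f} views -> ${payout:.2f} of a $20 fee")
```

With 10 searches a day and 3 sites per search this comes to 900 views and about $0.90, which is the "close enough to 1,000 views" and roughly $1 of the $20 fee in the example.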

Don’t get too caught up in the exact numbers; the point is that 1 is a smaller number than 20.

What about from a site owner’s perspective: how much might they expect to make? This will require some even wilder assumptions; let’s say there are one million paying chatbot users (a milestone we’ll hit soon if we haven’t already). And let’s say that in a given month, each user happens to trigger one realtime search that ends up on a particular popular news site. That would result in one million views and a rather paltry $1,000 per month paid to the news site. For most sites the amounts would be closer to piddling than paltry.

So these numbers will start low, but before you know it we’ll hit the billion-user milestone and start seeing some significant payments.
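The site-owner side of the same calculation uses the same flat $1-per-thousand-views rate. The assumption that each paying user sends one realtime-search visit to the site per month comes from the example above; carrying it over unchanged to a billion users is an extrapolation for illustration, and the function name is hypothetical.

```python
def monthly_payout_for_site(
    paying_users: int,
    visits_per_user: float,
    rate_per_thousand_views: float = 1.0,  # ballpark ad-revenue rate from the text
) -> float:
    """Dollars a single site would receive in a month under a flat per-view rate."""
    views = paying_users * visits_per_user
    return views / 1000 * rate_per_thousand_views


for users in (1_000_000, 1_000_000_000):
    # Assumption: each user triggers one realtime search per month
    # that lands on this particular site.
    print(f"{users:,} users -> ${monthly_payout_for_site(users, 1):,.0f}/month")
```

Because the payout is linear in the number of users, the same one-visit-per-user site goes from $1,000 to $1,000,000 a month as the user base grows a thousandfold.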

This fairly simple idea probably leaves you with quite a few questions. Clearly, chatbots visiting websites is only part of the picture. The initial training data is relied on more heavily, but it’s also much harder to map responses back to the original source. So I suggest we start simple, verify that chatbot vendors have the desire to pay content creators, and move on from there.

To be specific, I propose the following rollout stages.

Stage 1: Realtime search only, manual payouts

A chatbot vendor wanting to implement this concept could start with a manual setup of payments to a handful of site owners. The vendor can look at their realtime search logs and select the few domains that would have the highest payouts. They get in touch and ask the site owners if they’d like to receive some money, and — if the answer is yes — set up the payment mechanism.
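Picking the domains with the highest payouts from realtime search logs is essentially a counting job. A minimal sketch, assuming the vendor’s log can be reduced to a list of fetched URLs (the log format and the `top_domains` helper are hypothetical); it counts per hostname, so sub-domains are counted separately.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(visited_urls: list[str], n: int) -> list[tuple[str, int]]:
    """Count realtime-search page views per hostname and return the n busiest."""
    counts = Counter()
    for url in visited_urls:
        host = urlsplit(url).hostname  # None for relative or malformed URLs
        if host:
            counts[host] += 1
    return counts.most_common(n)


# Hypothetical log: one entry per page fetched during a realtime search.
log = [
    "https://news.example.com/story/1",
    "https://news.example.com/story/2",
    "https://docs.example.org/guide",
    "https://news.example.com/story/1",
    "https://blog.example.net/post",
    "https://docs.example.org/api",
]

for domain, views in top_domains(log, 2):
    print(domain, views)
```

A real log would need deduplication rules and bot filtering, but the shortlist for manual outreach could come from something this simple.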

This stage is mainly about ironing out the details of tracking, reporting, payment methods, regulation, taxation, etc.

This stage will also answer the question of which — if any — chatbot vendors are open to this concept.

Stage 2: Automation

If stage 1 goes well and a few chatbot vendors jump on the bandwagon (for moral or PR reasons; it doesn’t really matter which), then it will be time to automate the system.

This means any website owner should be able to communicate to all chatbot vendors how they would like to be paid for their content.

One option is to add fields to the robots.txt file describing payment information. Perhaps there could be a few payment options: bank details, a payment service ID, or a contact email address.

Then, instead of blocking AI bots like ChatGPT doing realtime search:

User-Agent: ChatGPT-User
Disallow: /

Site owners could open up a new income stream like so:

User-agent: PayingBot
Payment-method: PayPal
Payment-ID: payments@domain.com
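Here is a minimal sketch of how a vendor might read such fields. Everything in it is an assumption: `PayingBot`, `Payment-method` and `Payment-ID` are the hypothetical names proposed above, not part of any robots.txt standard, and Python’s built-in `urllib.robotparser` does not expose unknown fields, so the sketch parses the text directly.

```python
def payment_info(robots_txt: str, agent: str = "PayingBot") -> dict[str, str]:
    """Return the hypothetical Payment-* fields from the group naming `agent`.

    Field names are compared case-insensitively (as robots.txt fields are) and
    returned in lower case. Comments after '#' and blank lines are ignored.
    """
    info: dict[str, str] = {}
    in_group = False        # are we inside a group that names `agent`?
    prev_was_agent = False  # consecutive User-agent lines share one group
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if ":" not in line:
            continue  # blank or malformed line
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "user-agent":
            if not prev_was_agent:
                in_group = False  # a new group starts here
            in_group = in_group or value.lower() == agent.lower()
            prev_was_agent = True
        else:
            prev_was_agent = False
            if in_group and key.startswith("payment-"):
                info[key] = value
    return info


robots_txt = """\
User-Agent: ChatGPT-User
Disallow: /

User-agent: PayingBot
Payment-method: PayPal
Payment-ID: payments@domain.com
"""
print(payment_info(robots_txt))
```

Grouping follows the usual robots.txt convention that consecutive `User-agent` lines share the rules that follow them; a production parser would also need to decide how to treat `User-agent: *`.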

Individual sub-domains can have their own robots.txt files and these could contain different payment information.

Keep in mind that robots.txt operates on the honesty system (any bot can ignore the file and scrape a website), so I see no need to try and enforce a link between who pays and who’s ‘allowed’ access.

Side note: I suspect tools will soon be available that let people with disabilities use the web more easily by interacting with chatbots (e.g. WebLlama or LASER), so the whole ‘bots are bad’ mindset is fast becoming outdated; bots are just another tool, like web browsers, that let humans access content.

Another option might be to leave robots.txt as it is and define a new file (something like robot-payments.txt) to contain payment instructions. If the site owner accepts payments and a chatbot vendor is paying them, then the restrictions defined in robots.txt can be ignored.

At the end of each month, the chatbot vendor would look up every domain with views over some threshold and, if the site owner has defined valid payment information (in robots.txt or robot-payments.txt), pay them.
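The month-end run can be sketched as a filter over per-domain view counts. This is a minimal sketch under several assumptions: payment info has already been parsed into dictionaries keyed by lower-case field name, robot-payments.txt takes precedence over robots.txt, "valid" simply means both a method and a payee are present, and the 1,000-view threshold is a placeholder. `Payout` and `monthly_payouts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Payout:
    domain: str
    views: int
    amount_usd: float
    method: str
    payee: str


def monthly_payouts(
    views_by_domain: dict[str, int],
    payment_files: dict[str, dict[str, str]],  # parsed robot-payments.txt per domain
    robots_files: dict[str, dict[str, str]],   # parsed robots.txt payment fields per domain
    threshold: int = 1000,                     # hypothetical minimum views for a payout
    rate_per_thousand_views: float = 1.0,
) -> list[Payout]:
    """Build one month's payouts: domains over the threshold with valid payment info."""
    payouts = []
    for domain, views in sorted(views_by_domain.items()):
        if views < threshold:
            continue  # too small to be worth a transfer this month
        # Assumption: a dedicated robot-payments.txt takes precedence over robots.txt.
        info = payment_files.get(domain) or robots_files.get(domain) or {}
        method, payee = info.get("payment-method"), info.get("payment-id")
        if not (method and payee):
            continue  # no valid payment info, so nothing is sent
        amount = round(views / 1000 * rate_per_thousand_views, 2)
        payouts.append(Payout(domain, views, amount, method, payee))
    return payouts


views = {
    "news.example.com": 52_000,
    "docs.example.org": 8_500,
    "nopay.example.com": 3_000,
    "tiny.example.net": 40,
}
payment_files = {"docs.example.org": {"payment-method": "PayPal", "payment-id": "payments@docs.example.org"}}
robots_files = {"news.example.com": {"payment-method": "PayPal", "payment-id": "payments@news.example.com"}}

for p in monthly_payouts(views, payment_files, robots_files):
    print(f"{p.domain}: {p.views:,} views -> ${p.amount_usd:.2f} via {p.method} to {p.payee}")
```

In this example `nopay.example.com` clears the threshold but has no payment details, and `tiny.example.net` falls below it, so only two payouts are produced.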

Stage 3: Training data compensation

So far I’ve covered the easy scenario: payments based on the sites chatbots visit via realtime search (there are no significant technical hurdles for those first two stages).

The hard part will be paying website owners for the use of their content in the original training set.

It doesn’t make sense to try and send out payments at the time of training, so what we want is a way to know when content from the training data has been ‘used’ in a chatbot’s response.

I say this is ‘hard’, but that’s probably an understatement. An example, if you feel you need one: in the training data, there are probably thousands of pages with some mention of flour and sugar and chocolate chips, so when a chatbot comes up with a recipe for cookies, how could we possibly say which of those sites ‘contributed’? Not to mention the fact that it learnt verb conjugation from somewhere, but where exactly?

This challenge is often referred to as ‘source data attribution’. There’s plenty of work being done, but it seems like we’re not quite there yet (as of July 2024).

Things look promising though, so it seems reasonable to predict that by the time chatbots are drawing a significant amount of traffic away from old-school web browsing, we’ll be good enough at attribution to support a decent attempt at directing money to the right website owners. Even if attribution can only be identified in a subset of responses, it’s still a step in the right direction.

In stages 1 and 2, paying a fixed amount like $0.001 per visit might add up to a palatable sum (like $1/month as in the earlier example). But a single response may draw on many sources in the training data, so paying that amount to every contributing source of every single response could exceed the user fee. At this stage, then, a different approach to working out payments will be required. I see no point in trying to foresee the form that this could take, as the details will rely heavily on how source data attribution works, among many other factors. So I’ll leave this as a problem for the chatbot vendors of the future.
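A rough calculation shows why a flat per-visit rate breaks down if it is applied to training-data attribution. Only the $0.001 rate comes from the text; the usage figures (50 responses a day, 20 attributed sources per response) are invented for illustration, and `attribution_cost` is a hypothetical helper.

```python
def attribution_cost(
    responses_per_month: int,
    sources_per_response: int,
    rate_per_source: float = 0.001,  # the stage 1/2 per-visit rate from the text
) -> float:
    """Monthly cost if every attributed source of every response got a flat fee."""
    return responses_per_month * sources_per_response * rate_per_source


# Hypothetical heavy user: 50 responses a day for 30 days.
responses = 50 * 30
for sources in (1, 20):
    cost = attribution_cost(responses, sources)
    print(f"{sources} source(s) per response -> ${cost:.2f}/month vs a $20 fee")
```

With one source per response the cost stays small, but at 20 sources it already exceeds the $20 fee, which is why a different payment scheme would be needed.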

So that’s the plan, all laid out. Now we just have the minor question of …

What’s in it for the chatbot vendors?

Any opinion about whether a chatbot vendor would willingly give away money probably says more about the opinion holder than reality, so I’ll refrain from such predictions.

But due to FOAN (Fear Of Appearing Naive), I’ll list a few reasons this idea isn’t entirely unrealistic:

  • When OpenAI announced GPT-4o, which is better than GPT-4, they decided to price it at $0/month instead of $20/month (there’s an asterisk, but still…)
  • For any vendor charging in the ballpark of $20 a month, that number is somewhat arbitrary. You could make it $21 and give $1 back to content creators and that barely shifts the value proposition.
  • The cost of training and serving models is going to fall (probably?) while the models get better, so the profit margin on that $20 will grow as the years go by.
  • AI chatbots get some pretty bad press, with many websites happily accusing chatbot vendors of ‘stealing’ content. But if website owners have the option to earn money from chatbot visits, it might soften their view and lead to less negative press, ultimately benefiting the chatbot vendors. Even if a website owner chooses not to accept payments, the fact that it’s an option makes accusations of ‘stealing’ more dubious.
  • Chatbot vendors love to talk about ‘responsibility’. But talk is cheap. Stating a dollar figure that they’ve given back to content creators might convert eye-rolls to eyebrow-raises. And once one vendor jumps on board, that makes it hard for the others to say “we care deeply about responsibility” without raising the sticky question of why they don’t give back to creators.
  • ChatGPT has been enjoying some alone-time out in front of the pack, but the field will even out with time and chatbots will start competing on other Unique Selling Points. (I for one would gladly switch to a chatbot that was about as good as ChatGPT, but didn’t scroll the bloody response text while I was trying to read it.) My point: non-OpenAI vendors might have the most to gain by making the first move, if they can draw customers away from ChatGPT with the USP of demonstrably caring about creators.

So yes, it might seem a bit optimistic to hope that any organisation would opt in to voluntarily giving away cash, but I have faith regardless.

Abuse case analysis

This idea involves a new flow of money, so naturally some low-quality humans will try and get their dirty mitts on some of that cash.

Let’s look at a few potential weak points…

Site owners

Nefarious site owners might think they can game the system by driving a lot of traffic to their site via carefully crafted chatbot prompts.

This is easy enough to protect against with a cap on views per user per domain (when talking just about stage 1, based on realtime search). Limiting to, say, 1,000 views per user per domain seems like a good start. This caps the potential revenue the website owner gets from such activity at $1 a month, a pretty bad scam if they’re paying $20 a month for the chatbot account.

As long as the system is driven by small amounts of money from large numbers of users, it should be fairly robust in this respect (he says, regretting it already).
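The per-user, per-domain cap is straightforward to apply when counting billable views. A minimal sketch, assuming visits are logged as `(user_id, domain)` pairs for the month; the 1,000-view cap is the figure suggested above, and `capped_views` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def capped_views(
    visits: list[tuple[str, str]],
    cap_per_user_per_domain: int = 1000,  # the cap suggested in the text
) -> dict[str, int]:
    """Billable views per domain for one month, counting at most
    `cap_per_user_per_domain` views from any one user to any one domain."""
    raw = Counter(visits)  # (user_id, domain) -> raw view count
    billable: dict[str, int] = defaultdict(int)
    for (_user_id, domain), views in raw.items():
        billable[domain] += min(views, cap_per_user_per_domain)
    return dict(billable)


visits = (
    [("mallory", "scam.example.com")] * 50_000  # one account hammering one site
    + [("alice", "news.example.com")] * 30
    + [("bob", "news.example.com")] * 45
    + [("bob", "scam.example.com")] * 2
)
print(capped_views(visits))
```

Mallory’s 50,000 prompted visits count as only 1,000 billable views, so the scam site earns about $1 from her $20 account, while ordinary users are unaffected by the cap.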

Site insiders

Another angle of attack is individuals who have some access to the operations of a site: a rogue employee, a dodgy contractor, etc.

Whoever controls robots.txt (or robot-payments.txt) would control where the money goes. This might be a problem for sites that have treated robots.txt as a low-security file that a lot of people can edit.

I’m no payments expert, but only allowing traceable payment methods should mitigate such man-or-woman-in-the-middle attacks. Particularly if chatbot vendors send payment notices to a standard email address like payments@domain.com, any attempt to divert payments is going to be noticed and traceable.

Chatbot vendors

Some chatbot vendors might like the good PR from sending money to content creators, but dislike the idea of actually parting with money. They might realise that all they need to do is lie about it: say they give back, but actually send only a small fraction of what they ‘should’.

Eventually, some level of third party certification might be called for. But this will bring its own vultures to the table, namely the lawyers and consulting firms who will rack up million-dollar bills working out the ins and outs of certification. So certification is only worth doing once this concept looks like The Way Forward and the expected certification costs are small compared to the cash flow going to creators.

If chatbot vendors were to publish itemised payments (how much was paid per domain per month), lying would become very risky, as any site owner could say “no, we never got paid that”. This might be a privacy concern, but since chatbot vendors will be voluntarily paying site owners, that might be part of the deal (we’ll send you money, but the details will be publicly known, take it or leave it).
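Publishing itemised payments is mostly a matter of agreeing on a report format. A minimal sketch, assuming a simple CSV with one row per domain per month (the column names and both helper functions are hypothetical), of how a vendor might publish the list and how a site owner could check it against their own records:

```python
import csv
import io


def publish_itemised(payments: list[tuple[str, str, float]]) -> str:
    """Render (month, domain, amount) rows as a public CSV report."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["month", "domain", "amount_usd"])
    for month, domain, amount in payments:
        writer.writerow([month, domain, f"{amount:.2f}"])
    return buf.getvalue()


def discrepancies(
    report_csv: str, domain: str, received: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Compare a published report with what one site owner actually received.

    Returns {month: (claimed, received)} for every month that doesn't match
    to the nearest cent.
    """
    claimed: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        if row["domain"] == domain:
            claimed[row["month"]] = float(row["amount_usd"])
    months = claimed.keys() | received.keys()
    return {
        m: (claimed.get(m, 0.0), received.get(m, 0.0))
        for m in sorted(months)
        if abs(claimed.get(m, 0.0) - received.get(m, 0.0)) >= 0.005
    }


report = publish_itemised([
    ("2024-07", "news.example.com", 52.0),
    ("2024-07", "docs.example.org", 8.5),
])
# A site owner who only received $12 can point to the published claim.
print(discrepancies(report, "news.example.com", {"2024-07": 12.0}))
```

Because the report is public, any owner can run a check like this without the vendor’s cooperation.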

Journalists/reputation risk

An indirect way to extract money from the system is to write an article with a misleading negative spin. Get people nice and upset so they’ll share the article (I call this high dudgeon journalism).

I can imagine the headline now: Kids Slam Wikipedia: ‘Our Volunteer Work Sold to OpenAI!’, or some such nonsense. A journalist will find a group of underprivileged inner-city kids who banded together (against the odds) to update information about their neighbourhood (threatened by gentrification) on Wikipedia. One-by-one, the kids will be asked how they feel about Wikipedia ‘selling’ this information to the billionaire fat cats at OpenAI without their consent, till a kid with a quotable response (and preferably dimples) is found. People will be outraged. Some Wikipedia contributors will vandalise their prior contributions to protest their data being sold to Big AI. Sounds silly, but people do odd things.

Some websites might wish to get out ahead of this and clarify things, for example Wikipedia could state “Yes, we receive money from chatbot vendors; to put it in perspective, last month it cost us $14,291,860 to keep the lights on, and we received $2,433 from chatbot vendors, so don’t get your knickers twisted in your pitchforks”.

All this is hardly a reason not to proceed, I just wanted to get my prediction on record.

Miscellaneous considerations

In no particular order, some other considerations…

  • This opens up a potential revenue stream for platforms that are useful public resources but don’t want advertising, like Wikipedia, arXiv, technical documentation sites, etc. So that’s nice.
  • Some website owners may not want to accept AI money. That’s fine; they simply don’t provide payment info.
  • If the system is set up and working well, then word needs to spread to all site owners that they can receive money from chatbot vendors by setting up their payment details. Perhaps vendors could publish ‘uncollected funds’ lists so site owners could see if it’s worth their while.
  • This all makes sense for paying chatbot users, but what about the free tiers and the vendors (like Meta) that don’t charge anything? Personally, I can’t see any way that this could work unless the vendor has a lot of spare cash they don’t really want.
  • For vendors offering a free tier, they could trial a $1 ‘ethical’ tier that distributes 100% of funds to content creators. People pay extra for ‘ethically sourced coffee’, maybe they’ll happily pay for ethically sourced data (especially if it comes with some sort of social bragging rights).
  • How should chatbot vendors report to the public what they’re doing? If users want to select a chatbot based in part on how much money is sent back to creators, they’ll need a figure that can be compared between vendors. It’s easy enough for OpenAI or Anthropic to announce that they give, say, 5% of revenue back to creators, but what about Google and X, where the cost of the chatbot covers other features as well?
  • When paying site owners, chatbot vendors may wish to provide a list of visited URLs with view counts, so that the site owner can further distribute funds internally. E.g. Medium.com might choose to pass payments on to writers directly; a sound idea, says me (although these will be tiny sums in the short to medium term).
  • Different types of sites will be impacted differently: sites covering mostly current events are more likely to be visited by realtime search, while sites with timeless content are more likely to have their content baked into the model. So in stage 1, before data attribution is up and running, it will be current events sites (news, weather, etc.) that benefit the most.
  • I’ve focused on the text modality here only because other modalities are a much harder problem, though they’re of course worth pursuing.
  • There is a very long tail of web content, with millions of sites receiving almost no traffic. I don’t think this affects the concept proposed here in any meaningful way, but it seemed worth a mention.
  • I’ve been referring to ‘chatbot vendors’ but really this applies to any entity that is scraping information from a website and profiting from that content at the expense of the original creator. Once the system is in place, other bot operators may wish to ‘do the right thing’ (again, for moral or PR reasons).
  • A lot of what I’ve proposed here assumes a strong link between ‘site owner’ and ‘content creator’, but this isn’t always the case. The goal of this concept is to do something about loss in ad revenue for site owners caused by chatbots. Any attempt to make more precise payments directly to content creators is great, but a different kettle of fish.
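For the per-URL breakdown mentioned in the list above (e.g. a platform like Medium passing payments on to writers), one plausible approach is a proportional split by views. A minimal sketch, assuming amounts are handled in whole cents and that the platform uses the largest-remainder method so the parts always add up to the total; neither detail comes from the original proposal, and `split_by_url` is a hypothetical helper.

```python
def split_by_url(total_payment: float, views_by_url: dict[str, int]) -> dict[str, float]:
    """Split one domain's payment across URLs in proportion to their views.

    Works in whole cents; leftover cents from rounding down go to the URLs
    with the largest fractional remainders (largest-remainder method).
    """
    total_cents = round(total_payment * 100)
    total_views = sum(views_by_url.values())
    if total_views == 0:
        return {url: 0.0 for url in views_by_url}
    exact = {url: total_cents * v / total_views for url, v in views_by_url.items()}
    cents = {url: int(x) for url, x in exact.items()}  # round each share down
    leftover = total_cents - sum(cents.values())
    # Hand out the remaining cents, largest fractional remainder first.
    for url in sorted(exact, key=lambda u: exact[u] - cents[u], reverse=True)[:leftover]:
        cents[url] += 1
    return {url: c / 100 for url, c in cents.items()}


print(split_by_url(5.00, {"/@alice/post-1": 600, "/@bob/post-2": 300, "/@carol/post-3": 100}))
print(split_by_url(1.00, {"/a": 1, "/b": 1, "/c": 1}))
```

Splitting $1.00 three ways yields $0.34, $0.33 and $0.33, so no fraction of a cent is lost or invented along the way.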

There are more little “what-ifs” that I could mention, but let’s not make great the enemy of good. I think it’s important to just take a step in the right direction, then take a look around and work out the best way forward from there.

Related work

This idea is pretty simple, so I went looking, assuming it must have been proposed already, but couldn’t find anything.

OpenAI has its ‘Preferred Publisher Program’ (allegedly?) and is working on Media Manager.

Microsoft were talking about it in 2023, but their goal was to work out how to keep sending traffic to sites. That’s fine in the short term, but the better chatbots get at creating the content we want right in our chat windows, and the more hallucination is reined in, the less we’ll need to click on links. And the better they get at speech, the less likely we are to be sitting with our fingers on a keyboard and eyes on a screen. So I think this idea of sending traffic to websites to try and maintain ad revenue levels is not going to hold up for long.

To ensure that the web stays alive in the long term, we need a system that ensures a fair flow of money to those who create the content, or one by one, they’ll stop creating content.

I hope that what I’ve proposed here at the very least gets people thinking about how this new system might work.

And now, a …

Call to action

If you like the idea of humans continuing to create content on the internet, please let your favourite chatbot vendor know that you’d like them to do this. Tweet at them. Email them. If you’re on holidays, send them a postcard.

Another option is to assume that surely someone else has already let them know, and that there is no need for you to do anything, in which case just think to yourself “this is a good idea, I sure hope it happens” and then quietly go about your day, happy in the knowledge that others will be the change you want to see.

Hey, thanks for reading. Goodbye.
