Cloudflare, which serves about 20% of websites, announced Tuesday that it is now blocking web-scraping AI bots from accessing those sites by default.
Unless a website owner explicitly turns off the default, an AI crawler must obtain permission before scraping the site’s content. Owners can choose whether they want AI crawlers to access their content and decide how AI companies may use it, Cloudflare explained in a statement.
AI companies can now clearly state their purpose — whether their crawlers are used for training, inference, or search — to help website owners decide which crawlers to allow.
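That purpose distinction already has a rough analogue on the open web: major AI companies publish separate crawler user agents for training, search, and user-triggered fetches, which site owners can admit or refuse individually in robots.txt. Below is a minimal sketch of such a per-purpose policy, checked with Python’s standard-library robots.txt parser. The crawler names are OpenAI’s published user agents, but the policy itself is a hypothetical example, not Cloudflare’s mechanism.

```python
# Minimal sketch: a per-purpose robots.txt policy evaluated with Python's
# standard library. GPTBot (training) and OAI-SearchBot (search) are
# OpenAI's published crawler names; the policy here is a hypothetical example.
from urllib.robotparser import RobotFileParser

POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(POLICY.splitlines())

for bot in ("GPTBot", "OAI-SearchBot"):
    verdict = "allowed" if parser.can_fetch(bot, "https://example.com/article") else "blocked"
    print(f"{bot}: {verdict}")
# Prints:
#   GPTBot: blocked
#   OAI-SearchBot: allowed
```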
Cloudflare explained that for decades the internet has operated on a simple exchange: search engines index content and direct users back to original websites, generating traffic and ad revenue for websites of all sizes. This cycle rewards creators who produce high-quality content with financial compensation and a growing following, while helping users discover new and relevant information.
That model is now broken, the company continued. AI crawlers collect text, articles, and images to generate answers without directing visitors to the original source, depriving content creators of revenue and of the satisfaction of knowing someone is viewing their content. If the incentive to create original, high-quality content disappears, society loses, and the future of the internet is at risk.
“If the internet is going to survive the age of AI, we need to give publishers the control they deserve and build a new economic model that works for everyone — creators, consumers, tomorrow’s AI founders and the future of the web itself,” Cloudflare Co-founder and CEO Matthew Prince said in a statement.
“Original content is what makes the internet one of the greatest inventions in the last century, and it’s essential that creators continue making it,” he continued. “AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators while still helping AI companies innovate. This is about safeguarding the future of a free and vibrant internet with a new model that works for everyone.”
Pay Per Crawl Model for AI Access
In addition to blocking AI bot scraping by default, Cloudflare also announced Pay Per Crawl, which allows website owners to decide, crawler by crawler, whether to let an AI bot scrape their site at a set rate — a micropayment for every single “crawl.”
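Cloudflare has described Pay Per Crawl as building on the long-dormant HTTP 402 “Payment Required” status code: a crawler that arrives without payment intent receives a 402 and a quoted price, and can retry if it accepts the charge. The sketch below imagines that handshake from the crawler’s side; the header names and the exact negotiation flow are illustrative assumptions, not a published specification.

```python
# Hypothetical sketch of a pay-per-crawl negotiation over HTTP 402.
# The "crawler-price" and "crawler-exact-price" header names are
# illustrative assumptions, not a confirmed API.
import requests

MAX_PRICE_USD = 0.005  # the most this crawler will pay per request

def crawl(url: str) -> str | None:
    # First attempt: no payment intent attached.
    resp = requests.get(url, headers={"User-Agent": "ExampleBot/1.0"})
    if resp.status_code == 200:
        return resp.text  # free to crawl
    if resp.status_code != 402:
        return None  # blocked outright, or some other failure

    # 402 Payment Required: the site quotes a per-crawl price in a header.
    quoted = float(resp.headers.get("crawler-price", "inf"))
    if quoted > MAX_PRICE_USD:
        return None  # too expensive; walk away

    # Retry, explicitly accepting the quoted charge.
    paid = requests.get(
        url,
        headers={
            "User-Agent": "ExampleBot/1.0",
            "crawler-exact-price": str(quoted),
        },
    )
    return paid.text if paid.status_code == 200 else None
```

The appeal of a 402-based design is that it reuses ordinary HTTP semantics: a crawler that never signals payment intent simply stays blocked, with no side channel required.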
“Cloudflare’s primary goal is to help site owners and publishers decide which crawlers can access their content and create the conditions for a market to develop,” Cloudflare’s Head of AI Control, Privacy and Media Products, Will Allen, told TechNewsWorld.
“With the development of Pay Per Crawl,” he said, “Cloudflare is experimenting with a way to help content creators be compensated for their contributions to the AI economy. Pay Per Crawl will let creators control access and get paid, ensuring AI companies can use quality content the right way — with permission and compensation.”
“Personally, I like this idea of a pay-per-crawl model,” observed Jason Dion, chief product officer and founder of Akylade, a provider of cybersecurity certifications, in Altamonte Springs, Fla. “It is similar to using an API and paying for what you utilize.”
“Just like ChatGPT charges users fractions of a penny per token, a similar model could be used to compensate websites that opt in to scraping of their content,” he explained to TechNewsWorld.
“Handling compensation for creators in an AI-augmented world is a sticky issue,” added Allie Mellen, a senior analyst with Forrester Research, a national market research company headquartered in Cambridge, Mass.
“This is one potential solution; however, it’s unclear how AI providers will handle this cost or if they will look to scrape content elsewhere,” she told TechNewsWorld. “It may also result in a few highly trusted websites being offered compensation per crawl, while others stagnate.”
However, Andy Jung, associate counsel for TechFreedom, a technology advocacy group in Washington, D.C., argued that AI companies may settle for the Pay Per Crawl scheme without much resistance to ensure they don’t get accused of “pirating” content, as Anthropic was in the Bartz v. Anthropic case.
“AI companies might agree to pay to crawl websites just to avoid website owners analogizing unpaid crawling to pirating, thereby casting a shadow of doubt over the data AI companies use to train their models,” he told TechNewsWorld.
Potential Big Deal
Greg Sterling, co-founder of Near Media, a market research firm based in San Francisco, argued that Cloudflare’s move is “potentially a big deal,” as the company powers about 20% of the internet and a third of the higher-profile sites.
“It’s an effort to reclaim power and give publishers control over whether and how their content is used by AI, and it seeks to compensate publishers in a time of declining traffic and clicks, which puts their business models at risk,” he told TechNewsWorld, “but it may ultimately not have a significant impact on AI.”
“It remains to be seen how many sites choose to use this,” he said. “There’s a potential FOMO [fear of missing out] problem or prisoner’s dilemma that advantages the AI companies: ‘If I’m not there, my competitors will be.’”
“Yet, it’s still an important step that potentially shifts the terms of debate and power dynamics between content publishers and AI platforms,” he added.
Cloudflare’s statement listed more than 50 companies supporting a permission-based model for AI web crawling, including Adweek, The Associated Press, The Atlantic, BuzzFeed, Condé Nast, Fortune, Gannett Media, O’Reilly Media, Pinterest, Reddit, Sky News Group, Snopes, Time, Universal Music Group and Ziff Davis.
Mark N. Vena, president and principal analyst at SmartTech Research in Las Vegas, maintained that permission-based AI web crawling could be a significant curveball for AI companies, especially those relying on scraping massive amounts of web data to train their models.
“If large swaths of the internet go dark to bots overnight, it limits the diversity and freshness of the training data,” he told TechNewsWorld. “Big players might pivot to more licensing deals, but smaller startups could be left scrambling.”
Rob Enderle, president and principal analyst of the Enderle Group, an advisory services firm in Bend, Ore., noted that Cloudflare’s permissions play will significantly affect both established and new market players. “For existing AIs that already have their training sets, this will reduce their ability to remain current,” he told TechNewsWorld. “For new AIs, it will potentially reduce their initial training sets, making the result less performant.”
“It also looks like they are getting creative with how to deal with AI revenue loss and what many believe is data theft,” he added. “This effort is early yet, and I expect it will evolve significantly over the years, but it is an impressive initial start.”
Balancing AI Innovation With Content Control
Matt Mittelsteadt, a technology policy research fellow at the Cato Institute, a Washington, D.C., think tank, pointed out that there could be security benefits for websites using Cloudflare’s permission-based scheme.
“A permissioned approach is an improvement on the current wild west model,” he told TechNewsWorld. “As is, permissionless scraping has indeed challenged the ability of content providers to maintain control over their digital property. Soon, however, permissions will matter even more.”
“If AI agents become a reality,” he said, “it will be crucial to build infrastructure that can manage, control, and authenticate bots if sites wish to minimize the security risks of malicious or malfunctioning bots or ensure bandwidth preferences for human users.”
Daniel Castro, vice president of the Information Technology and Innovation Foundation, a research and public policy organization in Washington, D.C., argued that Cloudflare’s decision to block AI bots from scraping websites by default could have a meaningful impact on the AI ecosystem.
“Many AI companies are actively seeking access to trusted, high-quality information to train and refine their models — sometimes paying for it, but often relying on public data,” he told TechNewsWorld. “By defaulting to block these crawlers, Cloudflare risks limiting access to that public information, especially for companies that are transparent about their practices and respectful of site preferences.”
“While website owners have every right to control access to their content, restricting broad access to web data could ultimately diminish the accuracy and quality of AI systems,” he continued. “Over time, this could disadvantage users who depend on AI tools to summarize, interpret, or analyze online information. Meanwhile, less scrupulous actors may simply bypass restrictions by mislabeling crawlers or sourcing the data from third-party aggregators.”
Castro added that the Pay Per Crawl model is an interesting attempt to address the tension between AI demand and publisher control. Still, micropayments for individual crawls may not be viable at scale. “The value in training data lies in its breadth, not any one specific source, so this model may primarily benefit the payment intermediaries rather than website owners or AI developers,” he explained.
“In the end, these moves highlight a broader challenge: balancing innovation with control,” he said. “If we tilt too far toward restricting access, we may undermine the open web and the potential of AI to serve the public interest.”