𒋧 Kudurru

Actively block AI scrapers from your website with Spawning's defense network

Real-Time Protection

Kudurru-protected websites identify active web scrapers and alert the network to reject or misdirect all requests from the scraper, for the duration of the scraping.

Extensive Coverage

Kudurru's network has over a thousand websites hosting millions of media links found in the popular datasets used to train Generative AI models.

Easy to Join

Our first plugin, for WordPress, lets websites join Kudurru's network in just a few clicks. Easy-to-use plugins for other web providers are on the way.

What is Kudurru?

In the past 24 hours, Kudurru blocked 168K requests from 203 active scraping bots.

Frequently Asked Questions

How does it work?

Kudurru monitors popular AI datasets for scraping behavior and coordinates across the network to quickly identify scrapers. When a scraper is identified, its identity is broadcast to every protected Kudurru site, and all of those sites collectively block it from downloading content from their respective hosts. When the scraper has finished, Kudurru informs the network and traffic proceeds as normal.
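As a rough illustration of that flow, here is a minimal Python sketch, assuming a single in-process network object. The class and method names (`ProtectedSite`, `Network`, `broadcast_scraper`, `broadcast_finished`) are hypothetical; Kudurru's real coordination runs between separate sites, and its code is not shown here.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical sketch: names are illustrative, not Kudurru's actual code or API.


@dataclass
class ProtectedSite:
    domain: str
    blocked: Set[str] = field(default_factory=set)

    def allow_request(self, client_id: str) -> bool:
        # Reject any client the network currently flags as an active scraper.
        return client_id not in self.blocked


@dataclass
class Network:
    sites: List[ProtectedSite] = field(default_factory=list)

    def broadcast_scraper(self, client_id: str) -> None:
        # One detection is shared with every member site.
        for site in self.sites:
            site.blocked.add(client_id)

    def broadcast_finished(self, client_id: str) -> None:
        # Once the scraper stops, every site lifts the block.
        for site in self.sites:
            site.blocked.discard(client_id)


a, b = ProtectedSite("a.example"), ProtectedSite("b.example")
net = Network([a, b])
net.broadcast_scraper("203.0.113.7")
print(b.allow_request("203.0.113.7"))  # False
net.broadcast_finished("203.0.113.7")
print(b.allow_request("203.0.113.7"))  # True
```

The point of the shared set is that site `b` rejects the scraper even though only the network, not `b` itself, saw the scraping behavior.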

Is rejecting scrapers my only option with Kudurru?

In addition to rejecting scrapers, you can also select an alternative image to return in place of the images that scrapers are requesting. This misdirection can cause models to form inaccurate associations with your style and influence the output they produce.
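A minimal sketch of the reject-or-misdirect choice for a single image request is below. The `Response` type, the `decoy_image` parameter, and the JPEG content type are assumptions made for illustration; they are not the plugin's real settings or code.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical sketch: response shape and the decoy_image option are assumptions.


@dataclass
class Response:
    status: int
    body: bytes
    content_type: str = "application/octet-stream"


def handle_image_request(
    client_id: str,
    requested_image: bytes,
    flagged: Set[str],
    decoy_image: Optional[bytes] = None,
) -> Response:
    if client_id not in flagged:
        # Normal visitors get the real image (JPEG assumed for the example).
        return Response(200, requested_image, "image/jpeg")
    if decoy_image is not None:
        # Misdirection: serve the site owner's chosen substitute image.
        return Response(200, decoy_image, "image/jpeg")
    # Rejection: refuse the request outright.
    return Response(403, b"", "text/plain")
```

Serving the decoy with a normal 200 status, rather than an error, gives the scraper no HTTP-level signal that it was flagged.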

Is the Kudurru network currently active?

Yes. The network has over one thousand active websites hosting millions of pieces of media found in popular AI datasets. The live map on the Kudurru website shows the web scrapers working their way through those datasets as they are blocked from the content hosted on protected websites.

I already opted out with Spawning/robots.txt/etc. Why do I need Kudurru?

Opt-outs are requests to web scrapers; Kudurru is not a request. While the EU requires opt-outs to be respected when training commercial AI models, many organizations currently ignore them. Websites using Kudurru will reject or misdirect identified web scrapers, even those that ignore opt-outs.
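To see why an opt-out is only a request, consider how robots.txt is honored: the check runs inside the scraper's own code, using a parser such as Python's standard `urllib.robotparser`. The user-agent name `ExampleAIBot` below is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt opting out a hypothetical AI crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/gallery/painting.jpg"
# A well-behaved crawler asks first and skips the URL...
print(parser.can_fetch("ExampleAIBot", url))  # False
# ...but the rule only names one user agent, and a client that never
# runs this check is not stopped at all.
print(parser.can_fetch("SomeOtherBot", url))  # True
```

Nothing on the server enforces the `False` answer; compliance depends entirely on the scraper choosing to check.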

What hosting platforms are supported?

Our first easy-to-use plugin is for WordPress websites. We'll continue to develop plugins for other platforms based on demand from the beta waitlist. If you self-host your website and would like to participate in the beta, please email us at kudurru@spawning.ai. We're happy to walk you through a manual install.

Can I choose certain web scrapers to allow?

In the current beta (as of October 12, 2023), Kudurru rejects all media requests from every identified web scraper. We've seen several educational institutions scraping these datasets, and we are planning to give Kudurru users the option to allow educational institutions access soon.

I have a feature request. How can I get in touch?

Please send us an email at kudurru@spawning.ai.

Is Kudurru open source?

The source code for the current beta version of Kudurru's WordPress plugin is available to members of Kudurru's network. Before leaving beta, we expect to make the code available on GitHub.

What happens if scrapers identify members of the Kudurru network?

Scrapers could choose to avoid scraping those domains, and that's kind of the point.

What was the inspiration for Kudurru?

We were inspired by the excellent paper, “Poisoning Web-Scale Training Datasets is Practical” by Carlini et al. You can download the paper at this link: https://arxiv.org/abs/2302.10149. The authors describe “split-view poisoning,” which takes advantage of the static nature of AI training datasets. We extend this idea to a dynamic context, with live websites coordinating to identify scrapers and react to their activity in real time.
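Split-view poisoning works because many web-scale datasets are distributed as lists of URLs rather than the content itself, so the bytes a trainer downloads can differ from the bytes seen when the index was built. The sketch below shows a content-hash check, a mitigation discussed in that line of work, detecting such a change. The record layout is an assumption for illustration.

```python
import hashlib

# Hypothetical dataset record: a URL plus the SHA-256 of the bytes seen at indexing time.


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_download(expected_sha256: str, downloaded: bytes) -> bool:
    # True only if the bytes fetched now match the bytes indexed earlier.
    return sha256_hex(downloaded) == expected_sha256


indexed = b"original image bytes"
record = {"url": "https://example.com/cat.jpg", "sha256": sha256_hex(indexed)}

# Later, the host may serve different content at the same URL.
print(verify_download(record["sha256"], indexed))               # True
print(verify_download(record["sha256"], b"substituted bytes"))  # False
```

Without a stored hash, a live host is free to serve different content to different clients at download time, which is the dynamic behavior Kudurru builds on.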

If you're a researcher who finds Kudurru interesting, feel free to reach out! We have extensive datasets prepared for people just like you. We'd love to hear your thoughts and insights.

Frequent Misconceptions

Kudurru is not a permanent blocklist for scrapers. IPs change hands frequently. Kudurru temporarily blocks clients who appear to be actively scraping datasets.
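A temporary block can be modeled as a blocklist whose entries expire. This is a minimal sketch; the one-hour default TTL and the `TemporaryBlocklist` name are illustrative assumptions, since the page does not say how long Kudurru keeps an address blocked.

```python
import time
from typing import Callable, Dict

# Hypothetical sketch: the TTL value is an assumption, not Kudurru's setting.


class TemporaryBlocklist:
    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expires: Dict[str, float] = {}

    def block(self, ip: str) -> None:
        # (Re)start the timer each time the IP is seen scraping.
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip: str) -> bool:
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # Expired: forget the IP so a later owner of the address is not penalized.
            del self._expires[ip]
            return False
        return True


now = [0.0]
bl = TemporaryBlocklist(ttl_seconds=60, clock=lambda: now[0])
bl.block("198.51.100.4")
print(bl.is_blocked("198.51.100.4"))  # True
now[0] = 61.0
print(bl.is_blocked("198.51.100.4"))  # False
```

Injecting the clock keeps the expiry logic testable without waiting in real time.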

Kudurru identifies scrapers by several factors, including IP addresses. In extremely rare cases, this can block legitimate users. Kudurru typically identifies around 50 scraper IPs per hour, out of roughly 4.3 billion IPv4 addresses, so the likelihood that a visitor shares an IP address with a scraper at the same time that scraper is accessing the site is vanishingly low. In the rare circumstance where that does happen, Kudurru can present visitors with a CAPTCHA, so legitimate viewers are not locked out.
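The arithmetic behind that estimate, as a back-of-envelope sketch. It assumes flagged IPs are spread uniformly over the whole IPv4 space, which is a simplification: not every address is public, and shared addresses (for example behind NAT) make real overlap somewhat more likely than this figure.

```python
# Back-of-envelope estimate using the figures quoted above,
# assuming a uniform spread over all IPv4 addresses.
IPV4_ADDRESSES = 2 ** 32   # about 4.3 billion
FLAGGED_PER_HOUR = 50      # typical scraper IPs identified per hour

fraction = FLAGGED_PER_HOUR / IPV4_ADDRESSES
print(f"{fraction:.2e}")                                  # 1.16e-08
print(f"about 1 in {IPV4_ADDRESSES // FLAGGED_PER_HOUR:,}")  # about 1 in 85,899,345
```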

You don't need an IT team or networking experience to install Kudurru. We're doing our best to build easy plugins for a variety of services.

Kudurru does not block search engine crawlers such as Googlebot. Google documents how to identify its crawlers (so site owners can deliberately avoid blocking them), and Kudurru ignores them. Kudurru does not affect your SEO ranking or discoverability.
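How a site can tell a real Googlebot from an impostor using Google's user-agent string: the sketch below follows Google's published reverse-then-forward DNS check. We don't know how Kudurru itself exempts search crawlers; the domain list here reflects Google's documentation at the time of writing and may change. The resolver parameters exist only so the logic can be tested without network access.

```python
import socket
from typing import Callable, List, Tuple

Lookup = Callable[[str], Tuple[str, List[str], List[str]]]

# Domains Google documents for Googlebot reverse-DNS names (verify against current docs).
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def is_google_hostname(hostname: str) -> bool:
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS)


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Lookup = socket.gethostbyaddr,
    forward_lookup: Lookup = socket.gethostbyname_ex,
) -> bool:
    try:
        hostname, _, _ = reverse_lookup(ip)         # step 1: reverse DNS
        if not is_google_hostname(hostname):        # step 2: domain check
            return False
        _, _, addresses = forward_lookup(hostname)  # step 3: forward DNS
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) mean "not verified".
        return False
    # The hostname must resolve back to the original IP.
    return ip in addresses
```

The forward lookup matters: anyone can set a reverse-DNS name ending in `googlebot.com` for their own IP, but only Google controls what that name resolves to.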

It's unlikely that a sophisticated or well-funded scraper could overwhelm the Kudurru network. Scraping for AI training from millions of IPs would be unprecedented, and every scraper Kudurru has recorded so far has operated far below that scale. Additionally, such a feat would cost significantly more than the alternative: respecting opt-out requests.