I Caught ChatGPT Stealing My Content!


Should you worry about AI training on your content?

I decided to find out by investigating my site. I spent years building a writing website called Become a Writer Today. I wrote and commissioned hundreds of articles about writing craft, book roundups, and general advice for writers. I netted a couple of million page views over the years.

After ChatGPT launched, my traffic dropped. But I spotted content from my articles showing up in ChatGPT, Claude, and other AI models.

So I asked ChatGPT directly: “Did you train on my site’s content?”

ChatGPT told me that "Become a Writer Today" was absolutely part of OpenAI’s training data. It had ingested my articles, studied the tone, structure, and style. My “legacy site” had indirectly shaped how AI answers writing-related queries.

That felt good for about five seconds. But I never gave OpenAI permission, and I certainly hadn’t received payment for years of work.


Social Content Equals Fodder for AI

OpenAI trained on my WordPress site before I blocked it. It’s also training itself on social media content with the consent of the relevant networks. As an example, Microsoft owns LinkedIn, and it’s also OpenAI’s biggest backer.

It’s not just LinkedIn either. I wrote an article about what happened to my site on Medium. A few weeks later, I asked ChatGPT the same question. Not only did it give the same answer, it cited my Medium article as a source.

SEO experts on X argue that those small citations in ChatGPT responses will drive traffic. According to my web statistics, that traffic is marginal. Most of it looks like bot traffic anyway.

The irony? Medium has strict policies about AI-generated content. You must label any AI use, which I’m all for. Nobody wants to read bland AI slop. But the same policy doesn’t say much about Medium content being fed into AI models.

How to Block AI From Using Your Content

I work with Raptive for display ads. They enable publishers to block AI crawlers by default. Raptive says this default benefits publishers who prioritize fair compensation and copyright protection. I'm all in.

If you’re not on Raptive, you can still block AI crawlers by modifying your robots.txt file, basically the same process you’d use to block SEO bots. WordPress plugins like Yoast or RankMath make this easy.
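If you'd rather see what those rules look like before touching a plugin, here's a minimal sketch in Python. It builds the robots.txt entries and checks them with the standard library's robots.txt parser. The crawler names are user-agent tokens the AI companies have published, but the list is my assumption about who to block and it may be out of date, so check each vendor's documentation. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Illustrative list of AI crawler user-agent tokens (an assumption, not exhaustive).
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Google-Extended"]


def build_rules(agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(blocks) + "\n"


def is_blocked(robots_txt, agent, url):
    """Check whether a rule-following crawler with this agent may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


rules = build_rules(AI_CRAWLERS)
print(rules)  # paste this output into your robots.txt file

# Placeholder URL: swap in a page from your own site.
page = "https://example.com/some-article"
print("GPTBot blocked:", is_blocked(rules, "GPTBot", page))
print("Googlebot blocked:", is_blocked(rules, "Googlebot", page))
```

Rules like these leave regular search crawlers alone, because they only name the AI bots. They are also only a request: a crawler has to choose to honor them.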

However, blocking AI bots may not be effective.

Cloudflare recently called out Perplexity for using stealth crawlers that evade website no-crawl directives. It provided evidence that Perplexity works around these blocks to consume content.

I’m working on the assumption that AI will eventually eat anything online.

As an example, in Empire of AI, Karen Hao describes how OpenAI scraped all of GitHub, Pastebin, StackExchange, and YouTube to train its models 🤯🤯🤯

Rather than fighting AI, I’m focusing on using it as part of my business. I’m vibe coding a few projects with the help of the latest models.

I still prefer writing and creating authentic content, but I’ll use AI to amplify ideas, repurpose content into different formats, and help with research and planning. And I’ll use it for projects outside my area of expertise, like coding.

The key is finding ways to work with AI rather than against it, while protecting your creative work and getting fair compensation.
