Can we detect AI-generated content?

I guess Saruman is the only one who gets to say “Delve” unironically now

August 02, 2024

The word “Delve” has become a meme in AI circles as people have started pointing it out as a dead giveaway for AI-generated content. I’ve personally seen plenty of personalised and very obviously ChatGPT-generated emails that offer to “delve deeper” or “delve into” some sales pitch.

The Guardian later speculated that “delve” and other AI-isms might be getting introduced during the post-training RLHF process that companies like OpenAI must run to make the models fit for the general public.

Although RLHF successfully removes the worst safety issues (e.g. freshly trained LLMs write Walter White fanfic that would get you arrested), it also adds weird word choices like “delve” and develops a writing style that “sounds” like it knows what it’s talking about (i.e. sycophancy) as a strategy to dodge negative feedback from the auditors.

(Embedded tweet from @JeremyNguyenPhD.)

(Embedded TikTok from @steve.green.nz: “Since 1990, 46% of all the papers on OpenAlex that use the word ‘delve’ were added since January 2023.”)

Can we detect AI-generated images?

As of August 2024? Not very well, but like everything else in AI right now this question is a moving target. If I had to guess, I’d say it will boil down to how much information is being tested: AI detection will be an ongoing arms race where the more information is available, the more opportunities detectors have to spot signs of synthetic content.

It’s in the interests of companies like OpenAI and Anthropic to make their content easy to detect, and Adobe Firefly is already watermarking its images; going forward, dodging detection is going to be something only bad actors attempt.
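
Adobe’s watermarking follows the C2PA “Content Credentials” standard, which embeds a signed manifest inside the image file. Real verification needs a proper C2PA library, but as a toy illustration here’s a Python sketch that just checks whether a file carries the manifest markers at all (the filename is a placeholder):

    # Crude heuristic: look for C2PA (JUMBF) manifest markers in an image file.
    # This only detects the *presence* of embedded Content Credentials; it does
    # not verify the signature. Use a real C2PA library for actual verification.
    from pathlib import Path

    def has_c2pa_marker(path: str) -> bool:
        data = Path(path).read_bytes()
        # C2PA manifests live in JUMBF boxes labelled "c2pa"; both the JUMBF
        # box type and the "c2pa" label appear as raw bytes in the file.
        return b"jumb" in data and b"c2pa" in data

    print(has_c2pa_marker("firefly_image.jpg"))  # placeholder filename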

Images and video should give detectors more information to work with, but one way to counter that is to keep the changes minor. Consider this deepfaked version of Back to the Future:

Robert Downey Jr and Tom Holland star in...

'Deepfake to the Future!'

When you do a frame-by-frame comparison you can see how little of the video is actually changed:

Deepfake to the Future!

(That’s an animated GIF, blink and you’ll miss it.)
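
You can quantify that yourself. Here’s a minimal Python sketch that diffs the two clips frame by frame, assuming you have the original and the deepfake as local files with the same resolution and length (filenames are placeholders):

    # Measure how much of a deepfake actually differs from the original,
    # frame by frame. Requires OpenCV: pip install opencv-python numpy
    import cv2
    import numpy as np

    original = cv2.VideoCapture("back_to_the_future.mp4")
    deepfake = cv2.VideoCapture("deepfake_to_the_future.mp4")

    frame_no = 0
    while True:
        ok_a, frame_a = original.read()
        ok_b, frame_b = deepfake.read()
        if not (ok_a and ok_b):
            break
        # Per-pixel absolute difference; count pixels that changed noticeably.
        diff = cv2.absdiff(frame_a, frame_b)
        changed_pct = np.mean(np.any(diff > 25, axis=2)) * 100
        print(f"frame {frame_no}: {changed_pct:.1f}% of pixels changed")
        frame_no += 1

    original.release()
    deepfake.release()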

The fact that so little of the image is modified probably explains why this deepfake detector tool struggles to detect the change:

(Screenshots: the detector’s results for “Back to the Future” and for “Deepfake to the Future”.)

Can we detect AI-generated text?

Does the word “Delve” really give away that text is AI-generated?

We can try triggering a false positive by adding words such as “Delve” to my blog post from last week.

Adding Delve!

  • Adding the 5x words (including “Delve”) didn’t create a false positive; in fact it made writer.com more confident that my blog post was 100% human-generated (which it is, artisanally hand-crafted by a real person just like our ancestors did countless hundreds of days ago).
  • Adding 15x words did make ZeroGPT a little suspicious, but otherwise they all seemed to agree that my blog post was an original work.

So does this mean the Twitter meme was wrong and “delve” isn’t a signal for AI-generated content? Most likely not: these detectors are doing statistical analysis of the text rather than looking for specific words, so my guess is that unless your writing style weirdly matches ChatGPT’s, your content should be safe.
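
What does “statistical analysis” actually mean here? A common building block is perplexity: feed the text to a language model and measure how predictable the model finds it, since LLM output tends to be suspiciously predictable. Here’s a minimal sketch using GPT-2 as a stand-in; the commercial detectors use their own models, features and thresholds:

    # Perplexity as an AI-text signal: low perplexity means the model finds
    # the text very predictable, which is one statistical hint (not proof!)
    # that an LLM wrote it. Requires: pip install torch transformers
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text: str) -> float:
        enc = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            # With labels supplied the model returns the average
            # cross-entropy loss; exp(loss) is the perplexity.
            loss = model(enc.input_ids, labels=enc.input_ids).loss
        return torch.exp(loss).item()

    human = "I guess Saruman is the only one who gets to say Delve unironically now."
    robotic = "In essence, we must delve into the intrinsic nuances at the heart of this matter."
    print(perplexity(human), perplexity(robotic))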

AI detector tools: writer.com, ZeroGPT and the other detectors shown in the screenshots.

LLM Model: For the purposes of today’s test I’m using Anthropic’s Claude 3.5

List of words believed to be commonly added by LLMs today:

  1. Delve
  2. Harnessing
  3. At the heart of
  4. In essence
  5. Facilitating
  6. Intrinsic
  7. Integral
  8. Core
  9. Facet
  10. Nuance
  11. Culmination
  12. Manifestation
  13. Inherent
  14. Confluence
  15. Underlying
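
If you want to reproduce the word-injection test, it’s as simple as it sounds. Here’s a toy version of the idea (an illustration, not the exact script I used):

    # Sprinkle the first n "AI-ism" words into a human-written text and see
    # whether the detectors get suspicious. A toy illustration of the test.
    import random

    AI_ISMS = [
        "delve", "harnessing", "at the heart of", "in essence", "facilitating",
        "intrinsic", "integral", "core", "facet", "nuance",
        "culmination", "manifestation", "inherent", "confluence", "underlying",
    ]

    def inject(text: str, n: int, seed: int = 42) -> str:
        rng = random.Random(seed)
        sentences = text.split(". ")
        for word in AI_ISMS[:n]:
            i = rng.randrange(len(sentences))
            sentences[i] += f" ({word}!)"  # crude splice; placement needn't be natural
        return ". ".join(sentences)

    post = open("blog_post.txt").read()  # placeholder path
    print(inject(post, 5))               # the "5x words" variant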

Let’s try this in the other direction: attempt to plagiarise the blog post and have the LLM write a knockoff version, perhaps to generate marketing copy that Google won’t flag as a duplicate.

I’ll try 3 approaches (a sketch of the first one follows the list):

  • Summarise then Generate: Prompt the LLM to summarise the blog post, capturing all the points made, then use a second prompt to generate a new blog post based on the summary.
  • Plagiarise: Prompt the LLM to simply recreate the blog post, adding a few spelling mistakes and varying the tone and style.
  • Translate: As above, but feed the output through an English > Te Reo translation and back again.
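
For the curious, here’s roughly what “Summarise then Generate” looks like with the Anthropic Python SDK; the prompts are paraphrased illustrations rather than the exact ones I used:

    # "Summarise then Generate" with the Anthropic SDK (pip install anthropic).
    # Expects the ANTHROPIC_API_KEY environment variable to be set.
    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-3-5-sonnet-20240620"

    def ask(prompt: str) -> str:
        response = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    original = open("blog_post.txt").read()  # placeholder path

    # Step 1: summarise, capturing every point made.
    summary = ask(f"Summarise this blog post, capturing every point made:\n\n{original}")

    # Step 2: write a brand-new post from the summary alone.
    knockoff = ask(f"Write a blog post covering these points:\n\n{summary}")
    print(knockoff)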

(Screenshot: detector results for the three approaches.)

  • Summarise then Generate: Easily detected (except by Writer, which seems to be struggling), but it didn’t generate a very compelling blog post.
  • Plagiarise: Much better; the spelling mistakes helped, but providing the full original blog post in the prompt and asking it to match the style had the biggest impact.
  • Translate: This worked surprisingly well at fooling the detectors, but obviously translating anything into a language and back again results in some fairly garbled text.
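
The Translate round trip works with any machine-translation service. Here’s a sketch using the deep-translator package, which is just one convenient option (not necessarily what I used):

    # English -> Te Reo -> English round trip, which launders the text's
    # statistical fingerprint but garbles it a little in each direction.
    # Requires: pip install deep-translator
    from deep_translator import GoogleTranslator

    def round_trip(text: str) -> str:
        te_reo = GoogleTranslator(source="en", target="mi").translate(text)
        return GoogleTranslator(source="mi", target="en").translate(te_reo)

    print(round_trip("The word 'delve' has become a meme in AI circles."))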

(Note: I did have a 4th option that was detected a little more easily than the Translate option but produced much better content. It wasn’t rocket science, but no good will come of sharing it.)

So we’re doomed to AI-Slop forever?

I actually think we’ve already passed peak AI-Slop; the period of time where low-quality AI-generated content could be passed off as human-made and successfully monetized was very short and is hopefully already over.

Companies are also learning that just tacking the letters “AI” onto your product isn’t an AI strategy; in fact it’s driving consumers away, and any attempt to hide AI cost-cutting risks public shaming on Reddit. What’s overlooked with AI-Slop is that it doesn’t add anything novel: AI’s strength is intelligence, not cheap content generation, and we already had Fiverr and content farms long before ChatGPT and LLMs.

Similar to the concerns around job-market disruption, AI doesn’t so much introduce a problem as exacerbate an existing one, usually one we’ve been in denial about. Much of the focus on detecting AI-generated content comes from education, but as any hiring manager or parent will tell you, there was already a disconnect between students’ grades and their actual knowledge and skills.

Likewise with plagiarism and the problem of low-quality content in general: these aren’t new problems. There isn’t a recipe on the internet that was improved by the author’s rambling life story, nor should we pretend AI invented the “Big-4 intern” style of writing it’s now mimicking. The problems were already here; AI has made them worse, and the only thing that’s changed is that we can’t ignore them any longer.

AI vs AI could be the solution

One incredible and slightly scary possibility is that, if this Generative-AI vs AI-Detectors arms race continues, the AI detectors may be forced to become less statistical detectors of AI and more universal detectors of any kind of “low quality content”, regardless of whether the author is human or AI.

Problems that AI makes worse are often opportunities for AI to resolve entirely. If we could detect low-quality content (a crude sketch of the idea follows the list below), we could:

  • Get rid of exams in education, along with the accompanying studying and teaching to standardised tests. Exams could be replaced with personalised evaluations that measure concrete skills and knowledge that has stuck.
  • Companies could transform their HR departments’ recruitment processes into a true ‘diamond in the rough’ detector, rather than today’s CV keyword filters which reject all but the most ‘polished rocks’.
  • Break the effective monopoly Reddit has on user-generated content by allowing anyone to sift out the quality comments without needing an army of volunteers to push back the bots, trolls and brigading that ruin all other smaller platforms.
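
A crude version of that universal detector is already possible today with an “LLM as judge” setup. Here’s a minimal sketch; the rubric is made up for illustration and a real system would need far more care:

    # "LLM as judge" quality scoring: rate content on substance, regardless
    # of whether a human or an AI wrote it. Rubric is illustrative only.
    # Requires: pip install anthropic, with ANTHROPIC_API_KEY set.
    import anthropic

    client = anthropic.Anthropic()

    RUBRIC = (
        "Score the following text from 1 to 10 for quality, regardless of "
        "whether a human or an AI wrote it: does it say something novel, "
        "support its claims, and respect the reader's time? "
        "Reply with the number only."
    )

    def quality_score(text: str) -> int:
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=8,
            messages=[{"role": "user", "content": f"{RUBRIC}\n\n{text}"}],
        )
        return int(response.content[0].text.strip())

    spam = "Great post! Check out my crypto newsletter for more insights!!!"
    print(quality_score(spam))  # hopefully a 1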

So yes, we have an AI-Slop problem and we have an AI-detection problem, but before those we already had a much bigger “low quality content” problem, one which AI could now help us solve. And like everything else in AI, that would change everything.