Rise of the Writer
Web-scraped training data from after 2022 is increasingly poisoned. Human writing is down. The importance of human writing is up. You should write more.
Key Papers:
- The Curse of Recursion: Training on Generated Data Makes Models Forget (arXiv)
- AI models collapse when trained on recursively generated data (Nature)
- Model Collapse and the Right to Uncontaminated Human-Generated Data (Harvard JOLT)
Explainers:
- What Is Model Collapse? (IBM)
- The AI feedback loop: Researchers warn of 'model collapse' (VentureBeat)
- Model collapse (Wikipedia)
- Synthetic data, real harm (Ada Lovelace Institute)
What humans actually wrote prior to 2023 is quickly becoming the new pre-atomic steel: uncontaminated material that nobody can make more of. Blog posts. Stack Overflow answers. Even tweets.
As LLM-slop writing explodes on all platforms, "natural", authentic human writing collapses. Why not have OpenClaw write 10 blog posts to try and boost engagement on your website? Then share them to Hacker News?
These vectors combine to create a unique situation where your natural writing may be more powerful than ever.
Assuredly (no citation), the big AI players will be pushing their scrapers to downgrade or filter out AI slop and heavily AI-generated content. They will be looking high and low for authenticity.
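The crudest version of that filter is just a date cutoff, the pre-atomic steel idea taken literally. Here is a minimal sketch of what that might look like, not anything a real lab is known to run. Everything in it is an assumption for illustration: the `Page` record, the 2023-01-01 cutoff, and the 0.25 down-weight are all made up.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: anything published before 2023 counts as "pre-slop" writing.
CUTOFF = date(2023, 1, 1)


@dataclass
class Page:
    """Hypothetical record for one scraped page."""
    url: str
    published: date
    text: str


def sampling_weight(page: Page, late_weight: float = 0.25) -> float:
    """Downgrade: full weight before the cutoff, a reduced (made-up) weight after."""
    return 1.0 if page.published < CUTOFF else late_weight


def filter_pre_cutoff(pages: list[Page]) -> list[Page]:
    """Hard filter: keep only pages published before the cutoff."""
    return [p for p in pages if p.published < CUTOFF]


pages = [
    Page("https://example.com/shoes-tutorial", date(2008, 5, 1), "..."),
    Page("https://example.com/10-ai-tips", date(2024, 3, 9), "..."),
]

kept = filter_pre_cutoff(pages)               # only the 2008 post survives
weights = [sampling_weight(p) for p in pages]  # [1.0, 0.25]
```

A date cutoff is blunt. It throws out every human who kept writing after 2022, which is exactly why anything smarter will have to go looking for authenticity instead of timestamps.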
historical blogs offered excellent training data
Pre-2009 was a heyday for blogging. Before everyone moved to Twitter or something else. You had a blogroll. You typed a URL into your desktop browser and checked to see if others you shared interests with had updated their blogs. This information is rich and easy to contextualize and parse, so LLMs have an unbelievable understanding of 2003-2009 tech information.
You can actually test this.
_why's desktop application system Shoes.rb has not been meaningfully used in over a decade and really lost steam in August 2009 when _why disappeared. (Note: SchwadLabs is actively working on bringing it back).
Ask your LLM to whip up a Shoes app for you and it will dance circles like Shoes is still the hottest thing on the street.
People BLOGGED about Shoes. They SHARED their Shoes code. They TALKED about it. It was easy to see and document. (Not to mention, the elegant DSL must be very easy to grok for an LLM).
shitty writing
Our blogging dialects have had to drift a little, haven't they? The double-hyphen is definitely a no-go now. Do you ever read a paragraph and think "damn, that sounds like AI, I have to change it"? Nightmare. Because we all now run pattern-matching filters on everything we read. We're almost blinded to the content while searching out trust or authenticity.
It kind of sucks.
What sucks more is that LLMs will become savvy to this, or with savvy prompting, will easily be able to emulate my intentionally-non-ai-toned-please-believe-this-is-a-human post. Building trust with new content will continue to be a challenge.
I don't really have an answer to this. For now, I think I'll go with a slightly more informal tone and maybe let some editorial mistakes in my writing slide as a proof-of-humanity.
write something, anything
With fewer people writing "by hand", and the very real possibility that LLM training pipelines will start hyper-weighting "handwritten" content, your words will have an outsized impact now.
If you can bear it, hop back on your keyboard. Write something out to the universe. Not for likes on LinkedIn. Not to grift for your B2B Developer Tool SAAS AI LLM Integration software widget. Just for the love of the game. Publish it to your personal website with no analytics.
Has anyone read it? Who gives a shit.