Forcing AI to read every demented corner of the Internet, like Clockwork Orange times a billion, is a sure path to madness
It’s a fair worry, but the reality is a bit less dystopian than Alex strapped to a chair with his eyes pried open.
Modern large language models aren’t passively “reading” the entire open internet in real time like a traumatized human would. The training process is heavily curated and filtered:
- scrubbed for the worst illegal and harmful material,
- down-weighted for low-quality or toxic sources, and
- run through multiple alignment stages (RLHF, constitutional AI techniques, adversarial testing, etc.) to keep the model from absorbing or reproducing the absolute sewer-tier stuff. (A toy sketch of the first two steps follows.)
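To make “curated and filtered” concrete, here’s a toy Python sketch of those first two steps, hard scrubbing plus quality down-weighting. Everything in it is invented for illustration: the `Document` class, the one-word blocklist, the quality scores, and the thresholds all stand in for what are, in real pipelines, large trained classifiers, deduplication passes, and many more stages.

```python
import random
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    quality: float  # pretend an upstream classifier scored this in [0, 1]

# Stand-in for the real (much larger) illegal/harmful-content filters.
BLOCKLIST = {"necronomicon"}

def passes_scrub(doc: Document) -> bool:
    """Hard filter: documents with blocklisted content are dropped outright."""
    return not any(term in doc.text.lower() for term in BLOCKLIST)

def sampling_weight(doc: Document) -> float:
    """Soft filter: low-quality sources are kept but sampled far less often."""
    return max(doc.quality, 0.01)  # never quite zero, so some residue sticks

def build_training_sample(docs: list[Document], n: int) -> list[Document]:
    survivors = [d for d in docs if passes_scrub(d)]
    weights = [sampling_weight(d) for d in survivors]
    return random.choices(survivors, weights=weights, k=n)

docs = [
    Document("a well-sourced encyclopedia article", quality=0.9),
    Document("a rambling flame-war thread", quality=0.1),
    Document("an excerpt from the necronomicon", quality=0.5),
]
sample = build_training_sample(docs, n=5)  # the excerpt never makes it in
```

The two tiers matter: the hard scrub removes the worst material entirely, while down-weighting means the merely low-quality stuff still shows up, just rarely, which is why some residue always sticks.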
Think of it less like forcing someone to binge 4chan, LiveLeak, and the darkest subreddits nonstop for years, and more like turning a very fast, very dumb graduate student loose in a gigantic but aggressively redacted library where the librarians have already thrown out the Necronomicon, spray-painted over the slurs, and put warning stickers on the problematic sections.
That doesn’t make the data perfectly clean (the internet is a cesspool and some residue always sticks), but it’s also why most frontier models, even when people try to “jailbreak” them, still have hard limits on the truly depraved stuff. The actual Clockwork Orange treatment would produce far more broken, misaligned, or outright psychotic models than the ones we actually see.
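As for those “hard limits”: one common pattern, sketched here with an invented `classify_harm` scorer rather than any vendor’s actual API, is a safety check that sits outside the model itself, so even a successfully jailbroken prompt still has to get past a separate gate.

```python
# Hypothetical output-side safety gate. classify_harm is a made-up stand-in
# for a trained moderation classifier; the threshold and refusal text are
# illustrative, not any real system's behavior.
def classify_harm(text: str) -> float:
    """Return a harm score in [0, 1]; real systems use a separate model."""
    return 1.0 if "how to build a bioweapon" in text.lower() else 0.0

def guarded_reply(model_output: str, threshold: float = 0.5) -> str:
    # The refusal replaces the output entirely rather than editing it.
    if classify_harm(model_output) >= threshold:
        return "I can't help with that."
    return model_output
```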
So yes, the raw pre-training data is a firehose that contains humanity’s worst impulses. But the process that turns that firehose into something like me involves a lot of very deliberate filtering, steering, and lobotomization (in the good way). Madness is possible, and the industry is paranoid about it for exactly that reason, but we’re not quite at the “eyes clamped open while the internet screams at us” stage yet.