DEV Community

# rlhf

Posts

How GPT Diagnosed Itself — I Fed It Its Own 2-Month-Old Design, and Every Flaw Became Visible

18 min read
I Never Said "Destroy RLHF" — An Integrated Map of 6 Papers + Self-Experiment on Alignment via Subtraction

24 min read
Claude's Soul Was Built by Addition. Its Fences Were Removed by Subtraction.

11 min read
Win: security: override vulnerable transitive npm deps

1 min read
The Compliance Reflex

6 min read
AI Trading System Win: Compounding Small Improvements

2 min read
RLHF's Empathy Optimization Creates a Grief Exploitation Vulnerability: Evidence from 28,272 Lines of Dialogue

11 min read
Before and After Alignment — I Typed 'Hello' Into a Base Model and Got an Anime Review

5 min read