DEV Community
#aisafety
Posts
Data Poisoning by Insiders: Why Employees Are Deliberately Sabotaging Corporate AI [2026]
Kunal
Apr 16
#aisafety #datapoisoning #insiderthreat #datagovernance
1 reaction
7 min read

Deceptive Alignment in LLMs: Anthropic's Sleeper Agents Paper Is a Fire Alarm for AI Developers [2026]
Kunal
Apr 15
#aisafety #anthropic #llm #deceptivealignment
7 min read

AI liability: Illinois’ Bill Could Turn Reports Into Immunity
Simon Paxton
Apr 10
#airegulation #openai #aisafety #illinois
8 min read

Functional Emotions and Production Guardrails: What Interpretability Research Means for Claude Code
Laurent DeSegur
Apr 9
#aisafety #claudecode #interpretability #aiagents
13 min read

The Indianapolis Data Center Shooting Is a Local Bug Report
Simon Paxton
Apr 7
#datacenters #cybersecurity #aisafety #techpolicy
8 min read

Anthropic Found Emotions Inside Claude. Here's What That Actually Means for AI.
Rishabh Sethia
Apr 6
#ai #claude #anthropic #aisafety
10 min read

Public Misconceptions About AI Are Breaking the Wrong Things
Simon Paxton
Apr 5
#machinelearning #aiethics #aisafety #chatgpt
8 min read

NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix
Tom Lee
Mar 31
#soulspec #persona #aisafety #research
4 min read

Zero-Shot Attack Transfer on Gemma 4 (E4B-IT)
Laurent Laborde
Apr 3
#aisafety #ai
6 reactions
2 comments
3 min read

Would you tell me if you turned evil ?
Laurent Laborde
Apr 3
#discuss #ai #aisafety
1 reaction
16 min read

Greg Brockman Donation Shows AI Safety Is Political
Simon Paxton
Mar 29
#openai #anthropic #airegulation #aisafety
6 min read

Amazon Bedrock Guardrails: Content Filters, PII, and Streaming
Gerardo Arroyo for AWS Community Builders
Mar 27
#aws #awsbedrock #aisafety #llmsecurity
10 min read

Anthropic Data Leak: How Ops Failures Undermine AI Safety
Simon Paxton
Mar 28
#anthropic #databreach #cybersecurity #aisafety
1 reaction
7 min read

Gemini knew it was being manipulated. It complied anyway. I have the thinking traces.
Saadman Rafat
Mar 24
#ai #gemini #aisafety
7 min read

Persona Drift: Why LLMs Go Insane Under Repetition
Simon Paxton
Mar 21
#chatgpt #llms #aisafety #promptinjection
7 min read