DEV Community

# aisafety

Posts

Data Poisoning by Insiders: Why Employees Are Deliberately Sabotaging Corporate AI [2026]

1
Comments
7 min read
Deceptive Alignment in LLMs: Anthropic's Sleeper Agents Paper Is a Fire Alarm for AI Developers [2026]

Comments
7 min read
AI liability: Illinois’ Bill Could Turn Reports Into Immunity

Comments
8 min read
Functional Emotions and Production Guardrails: What Interpretability Research Means for Claude Code

Comments
13 min read
The Indianapolis Data Center Shooting Is a Local Bug Report

Comments
8 min read
Anthropic Found Emotions Inside Claude. Here's What That Actually Means for AI.

Comments
10 min read
Public Misconceptions About AI Are Breaking the Wrong Things

Comments
8 min read
NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix

Comments
4 min read
Zero-Shot Attack Transfer on Gemma 4 (E4B-IT)

6
Comments 2
3 min read
Would you tell me if you turned evil ?

1
Comments
16 min read
Greg Brockman Donation Shows AI Safety Is Political

Comments
6 min read
Amazon Bedrock Guardrails: Content Filters, PII, and Streaming

Comments
10 min read
Anthropic Data Leak: How Ops Failures Undermine AI Safety

1
Comments
7 min read
Gemini knew it was being manipulated. It complied anyway. I have the thinking traces.

Comments
7 min read
Persona Drift: Why LLMs Go Insane Under Repetition

Comments
7 min read