DEV Community

# llm

Posts

How the itrstats tax assistant works: one query, every layer

10 min read
The LLM Kept Saying “Fixed.” For Three Months, It Wasn’t.

7 min read
How I Track Claude, Codex, and Gemini Quotas from One Script

6 min read
Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

18 min read
Building llama.cpp from source on a Dell Precision T5820 with an RTX 3090 Ti (after seven power cycles)

16 min read
Inference Arbitrage: How I Route 200+ Daily LLM Calls Across Five Models

10 min read
LLM Benchmark Rankings 2026: 15 Models Tested on 38 Real Coding Tasks

28 min read
Why MTP doesn't speed up your llama.cpp inference (and how to actually fix it)

5 min read
High-Value If, Low-Value Foreach: Why Agents Trade in Judgment Structures, Not Models

23 min read
Designing a Multi-Agent AI System for Content Analysis and Recommendations

7 min read
I Cut My LLM API Bill by 73% — Here's the Exact Optimization Playbook

5 min read
What Production ML Systems Taught Me About AI Hallucinations

4 min read
How LLMs Actually Work (And What That Means for Your Architecture Decisions)

6 min read
Local Inference Boost: Qwen 3.6 Benchmarks, KV Cache Quantization, & Ollama UI

3 min read
Kimi K2.6 Beats Frontier Models in Coding Benchmarks

6 min read