DEV Community

# inference

Posts

- BeeLlama v0.2.0: 164 tok/s on a 27B model, one RTX 3090 (3 min read)
- RAM Coffers: NUMA-Aware LLM Inference — Why Hardware Topology Still Matters (2 min read)
- Your AI speed benchmark is measuring the one workload you don't run (3 min read)
- ReFlect: Training-Free Error Recovery for Long-Horizon LLM Reasoning (4 min read)
- Why Most Browser AI Demos Fail on Real Hardware (4 min read)
- The Inference Inversion (7 min read)
- First Confirmed Directional Move on the AI Inference Frontier Index in 2026 (4 min read)
- Tutorial: This AI Now Tells You if a Meeting Could Be an Email (8 min read, 3 reactions)
- Tutorial: Build a Cost-Aware AI Support Triage API (13 min read, 3 reactions, 1 comment)
- Muse Spark beats Llama 4 with 10x less compute. Here's how. (7 min read)
- First Words: LLM Inference on RISC-V (9 min read)
- Gaussian Process Regression: The Bayesian Approach to Curve Fitting (13 min read)
- Async Batching Is the Real Latency Win Nobody's Talking About (3 min read, 1 reaction, 1 comment)
- Google Dropped TurboQuant Two Weeks Ago. The Community Already Made It Usable. (6 min read, 1 reaction)
- Hierarchical Bayesian Regression with PyMC: When Groups Share Strength (13 min read, 1 reaction)