Inference

AI Explore

Jul 11

You're Not Paying for Compute. You're Paying for Memory Bandwidth

#ai #llm #inference #mlops

4 min read

Tamiz Uddin

Jul 11

Inference Optimization for MiMo v2.5: Mastering Hybrid SWA Efficiency

#ai #inference #optimization #mimo

2 min read

I Want To Learn Programming

Jul 4

The KV cache, why LLM inference is memory-bound, not compute-bound

#gpu #llm #inference #performance

4 min read

Induwara Ashinsana

Jul 1

Etched hits $5B and $1B in orders: why inference chips matter

#aihardware #inference #cost

4 min read

Breach Protocol

Jul 1

Two labs race to make AI write whole paragraphs at once instead of word by word

#diffusion #openweight #google #inference

3 min read

Creeta

Jun 26

96% of cuBLAS, no `unsafe`: what cuTile Rust proves

#cutile #rust #gpu #inference

8 min read

Sonam

Jun 26

Extract Structured JSON from Messy Text with Telnyx AI Inference

#ai #inference #telnyx #json

2 min read

Review Laptop

Jun 21

Running LLMs on an iGPU: VRAM Limits of Intel Arc and Radeon 780M

#llama3 #llm #ollama #inference

3 min read

Jay Grider

Jun 12

How to Build a Secure Homelab for LLM Inference

#homelab #llmsecurity #inference #supplychain

4 min read

Peremptory

Jun 11

Google's DiffusionGemma Generates Text Sideways

#modelrelease #architecture #opensource #inference

3 min read

Constant, Yuan Chen

Jun 24

Sipp: a local-first runtime for Hybrid AI Applications

#inference #ai #localai #llm

11 min read

zxpmail

Jun 28

Lossless, But Not Free: When Speculative Decoding Actually Pays Off (and When It Doesn't)

#ai #llm #inference #engineering

6 min read

zxpmail

Jun 28

KV Cache Is Eating Your VRAM — Here's How to Estimate It Before You Run Out

#llm #inference #engineering #ai

6 min read

zxpmail

Jun 28

I Benchmarked Speculative Decoding — a = 3.5 Wasn't Enough

#llm #inference #engineering #ai

7 min read

Tech_Nuggets

Jun 5

Speculative decoding: when and why it actually speeds up inference

#llm #ai #inference #performance

9 min read
