DEV Community

# gpu

Posts

Auto-Generated CUDA Kernels Need Kernel-Level Validation

5 min read
Notes on CUDA Tensor Core GEMM (WMMA)

4 min read
Next-Gen AV2 v1.0 Video Spec; Wine-Staging 11.10 Fixes Linux GPU Display; NVIDIA's Power-Efficient AI Factories

3 min read
Where Tensor-Parallel Inference Hits the NVLink Wall

2 min read
AMD Linux 7.2 Graphics & SteamOS VRR Drivers, NVIDIA Vera CPU Benchmarks

3 min read
31B Gemma 4 Deployment with NVIDIA Blackwell 6000, MCP, Cloud Run, and Antigravity CLI

15 min read
From Kernel Scheduler to Python Source Line: Tracing a GPU Stall End to End

6 min read
AMD ROCm 7.2.4, Radeon Software 26.12, & Fwupd 2.1.4 Boost Linux GPU Support

4 min read
Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?

5 min read
5090 vs 4090 for AI Workloads: Buy, Rent, or Validate in the Cloud?

15 min read
SemiAnalysis Interviews Makora Co-Founder on Automated GPU Optimization and the AI Inference Frontier

1 min read
CUDA 13.3 Lands, AI Writes Blackwell Kernels, & FP4 VRAM Optimization for LLMs

3 min read
FlashAttention CUDA Kernel, Strix Halo MOE Boost, & NVIDIA DLSS 4.5 Driver Update

3 min read
PatentLLM: CUDA TileLang/Triton B200 5x Speedup, RTX 5090 Power, PTX Grammar

3 min read
How to Detect GPU Waste in a Kubernetes Cluster

5 min read