DEV Community

# cuda

Posts

Notes on CUDA Tensor Core GEMM (WMMA)

4 min read
Where Tensor-Parallel Inference Hits the NVLink Wall

2 min read
Tracing torch.cuda.empty_cache() on an RTX 4090 - Where Do the 53 MB Go?

5 min read
The Microsecond Lie: Why your Go timers are lying about the GPU

3 min read
Profiling a CUDA Python Program with GPUFlight

10 min read
TensorRT `trt.Dims` SIGSEGV inside a GStreamer Python plugin — root cause and fix

4 min read
Calling CUDA from Go without cgo

2 min read
Why CUDA kernels silently corrupt memory and how to catch the bug

5 min read
CUDA Out of Memory at 60% Utilization: Tracing PyTorch GPU Memory Fragmentation

4 min read
How I optimized a Solana vanity address grinder to 44M keys/sec on GPU

2 min read
From Black Magic to Science: The Evolution of the CUDA Optimization Skill

11 min read
Learning Resources Tech

1 min read
512MiB ≠ 512MB — the silent trtexec bug

2 min read
Memory Coalescing: Same computation, 6x Performance Difference

6 min read
Setting Up NVIDIA Drivers and CUDA for ML/DL on Ubuntu 22.04

3 min read