Maxim Saplin

LLM Fine-tuning on RTX 4090: 90% Performance at 55% Power

At little more than half of its power budget, the RTX 4090 still delivers about 90% of its full fine-tuning speed.

While running SFT (supervised fine-tuning) via Hugging Face's TRL library (with PyTorch as the backend), I decided to move the Afterburner power slider down:


I then checked the wandb dashboards for changes in training speed (epochs per hour) and GPU power:


GPU power/time
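Afterburner is a Windows GUI tool. On a Linux or headless machine a similar cap could probably be applied with NVIDIA's `nvidia-smi` command-line tool, which I did not use for the measurements in this post. Below is a minimal sketch that builds the relevant calls; the function names are hypothetical, the limit is given in watts rather than as a percentage, and the range a card accepts depends on its BIOS.

```python
import subprocess


def power_limit_command(watts: int, gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi call that caps board power for one GPU."""
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def power_query_command(gpu_index: int = 0) -> list[str]:
    """Build the nvidia-smi call that reports current draw and active limit."""
    return [
        "nvidia-smi", "-i", str(gpu_index),
        "--query-gpu=power.draw,power.limit",
        "--format=csv,noheader,nounits",
    ]


def parse_power_query(output: str) -> tuple[float, float]:
    """Parse one 'draw, limit' CSV line into (draw_w, limit_w)."""
    draw, limit = (float(field) for field in output.strip().split(","))
    return draw, limit


def set_power_limit(watts: int, gpu_index: int = 0) -> tuple[float, float]:
    """Apply the cap, then read back the reported draw and limit.

    Assumption: the process has administrator/root rights, which
    changing the power limit typically requires.
    """
    subprocess.run(power_limit_command(watts, gpu_index), check=True)
    result = subprocess.run(
        power_query_command(gpu_index),
        check=True, capture_output=True, text=True,
    )
    return parse_power_query(result.stdout)
```

Calling `set_power_limit(220)` would roughly correspond to the 50% Afterburner setting in the measurements below.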

Here's the full table with performance (and a few other measurements) at different power levels:

| Power, W | Temp, °C | Afterburner PWR, % | Perf, epochs/hour | Perf/kW | Power, % | Perf, % |
|---|---|---|---|---|---|---|
| 390 | 72 | 100% | 0.442 | 1.134 | 100.0% | 100.0% |
| 330 | 70 | 80% | 0.436 | 1.322 | 84.6% | 98.6% |
| 300 | 67 | 70% | 0.413 | 1.378 | 76.9% | 93.4% |
| 260 | 62 | 60% | 0.405 | 1.557 | 66.7% | 91.5% |
| 240 | 60 | 55% | 0.394 | 1.644 | 61.5% | 89.2% |
| 220 | 58 | 50% | 0.365 | 1.660 | 56.4% | 82.6% |
| 180 | 52 | 40% | 0.271 | 1.508 | 46.2% | 61.3% |
| 150 | 47 | 33% | 0.221 | 1.473 | 38.5% | 49.9% |
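The derived columns in the table (Perf/kW and the two relative percentages) follow from the measured power and speed alone. Here is a small sketch of that arithmetic, taking the 390W row as the 100% baseline. Because it starts from the rounded values printed in the table, some results may differ from the table in the last digit.

```python
# (reported GPU power in W, training speed in epochs/hour), copied from the table
MEASUREMENTS = [
    (390, 0.442), (330, 0.436), (300, 0.413), (260, 0.405),
    (240, 0.394), (220, 0.365), (180, 0.271), (150, 0.221),
]


def derive_columns(rows):
    """Return (power_w, perf_per_kw, power_pct, perf_pct) for every row.

    The first row is treated as the 100% baseline.
    """
    base_power, base_perf = rows[0]
    derived = []
    for power_w, perf in rows:
        # epochs/hour divided by kW is the same as epochs per kWh
        perf_per_kw = perf / (power_w / 1000)
        power_pct = 100 * power_w / base_power
        perf_pct = 100 * perf / base_perf
        derived.append((power_w, perf_per_kw, power_pct, perf_pct))
    return derived


for power_w, perf_per_kw, power_pct, perf_pct in derive_columns(MEASUREMENTS):
    print(f"{power_w:>3} W  {perf_per_kw:.3f} epochs/kWh  "
          f"{power_pct:5.1f}% power  {perf_pct:5.1f}% perf")
```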

If you run long training sessions on your RTX 4090 PC and would like to save on electricity bills or keep your room cooler (a midi tower drawing 500W is quite a heater), limiting GPU power to 50-60% makes total sense.

Besides, there's a sweet spot at 50% (220W) with maximum efficiency (performance per watt, or epochs trained per kilowatt-hour). At this power level, you still get about 83% of the max speed.
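To put the sweet spot in terms of an electricity bill, the same measurements can be turned into energy per epoch. The sketch below uses the rounded power and speed figures from the table and counts GPU power only, so the rest of the PC is not included; the helper names are made up for illustration.

```python
# reported GPU power in W -> training speed in epochs/hour (from the table)
MEASUREMENTS = {
    390: 0.442, 330: 0.436, 300: 0.413, 260: 0.405,
    240: 0.394, 220: 0.365, 180: 0.271, 150: 0.221,
}


def kwh_per_epoch(power_w, epochs_per_hour):
    """GPU energy needed to train one epoch, in kWh."""
    return (power_w / 1000) / epochs_per_hour


def most_efficient(rows):
    """Power level (W) with the lowest energy per epoch."""
    return min(rows, key=lambda power_w: kwh_per_epoch(power_w, rows[power_w]))


def compare(rows, power_w, baseline_w):
    """Return (energy_ratio, time_ratio) of one power level vs. a baseline."""
    energy_ratio = (kwh_per_epoch(power_w, rows[power_w])
                    / kwh_per_epoch(baseline_w, rows[baseline_w]))
    time_ratio = rows[baseline_w] / rows[power_w]
    return energy_ratio, time_ratio


best = most_efficient(MEASUREMENTS)
energy_ratio, time_ratio = compare(MEASUREMENTS, best, 390)
print(f"most efficient level: {best} W")
print(f"energy per epoch vs 390 W: {energy_ratio:.2f}x, "
      f"time per epoch: {time_ratio:.2f}x")
```

With these numbers, the 220W setting needs roughly a third less GPU energy per epoch than the unrestricted run, while each epoch takes about a fifth longer.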

A Few Notes on RTX 4090 Power Levels

Most desktop RTX 4090 cards are rated at 450W, mine included (a Palit 4090 flashed with an Asus 450W 1.1V BIOS). There are versions with 500W, 600W and even 666W power limits.

I could see 450W power consumption in the OCCT synthetic benchmark. In 3DMark Time Spy, the max power observed was around 430W.

While running the above training (full fine-tuning), the max reported power was 390W: TRL/PyTorch was not able to fully utilize the GPU, with actual utilization at around 90%. A likely explanation is that VRAM was not completely filled (~20GB out of 24GB). Increasing the batch size training parameter could probably fix this, at the risk of VRAM overflowing into shared memory, which significantly slows down training. On some other occasions, I could see 410-420W from TRL running LoRA fine-tuning.

Based on the actual GPU power reported at each setting, it seems that Afterburner calculates its power limits assuming that 100% is roughly 440W.
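That estimate can be reproduced by dividing the reported power at each limited setting by its Afterburner percentage. A rough sketch follows; the 100% row is left out because at that setting the workload, not the limit, held power at 390W, and the reported wattages are rounded, so the individual estimates scatter quite a bit.

```python
from statistics import median

# (Afterburner power limit in %, reported GPU power in W) from the measurements table
LIMITED_RUNS = [
    (80, 330), (70, 300), (60, 260), (55, 240),
    (50, 220), (40, 180), (33, 150),
]


def implied_full_power(limit_pct, reported_w):
    """Wattage that 100% would correspond to if the limit scaled linearly."""
    return reported_w / (limit_pct / 100)


estimates = [implied_full_power(pct, watts) for pct, watts in LIMITED_RUNS]
for (pct, watts), estimate in zip(LIMITED_RUNS, estimates):
    print(f"{pct:>3}% -> {watts} W  implies 100% = {estimate:.0f} W")
print(f"median estimate: {median(estimates):.0f} W")
```

The per-row estimates range from a little over 410W to about 455W, with the middle of the range close to 440W.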

Gaming Performance Follows Suit

The diminishing performance returns of the 4090 at higher power have been evaluated before. For example, in this Reddit post a user shared 3DMark Fire Strike scores from an RTX 4090. The outcome is the same: you get about 80% of the performance at a 50% power limit.

3DMark FireStrike scores
