LLM Fine-tunig on RTX 4090: 90% Performance at 55% Power

#ai #machinelearning #hardware #perfrormance

At just a fraction of power, 4090 is capable of delivering almost full performance.

While running SFT (supervised fine-tuning) via Hugginface's TRL library (using Torch as a backend) I decided to move Afterburner power slider down:

And checked wandb dashboards for changes in training speed (epochs per hour) and GPU power:

Here's the full table with performance (and a few other measurements) at different power levels:

Power, W	Temp, °C	Afterburner PWR %	Perf (Epoch/Hour)	Perf/kW	Power %	Perf %
390	72	100%	0,442	1,134	100,0%	100,0%
330	70	80%	0,436	1,322	84,6%	98,6%
300	67	70%	0,413	1,378	76,9%	93,4%
260	62	60%	0,405	1,557	66,7%	91,5%
240	60	55%	0,394	1,644	61,5%	89,2%
220	58	50%	0,365	1,660	56,4%	82,6%
180	52	40%	0,271	1,508	46,2%	61,3%
150	47	33%	0,221	1,473	38,5%	49,9%

If you run long training sessions on your RTX 4090 PC and would like to save on electricity bills OR keep your room cooler (500W midi tower is quite a heater), limiting GPU power to 50-60% makes total sense.

Besides there's a sweet spot at 50% (220W) with maximum efficiency (performance-per-watt or trained-epochs-per-watt*hour). At this power level, you still get 82% of the max speed.

Few Notes on RTX 4090 Power Levels

Most desktop RTX 4090 cards are rated at 450W, such as mine (Palit 4090 flashed with Asus 450W 1.1v BIOS). There're versions with 500W, 600W and even 666W power limits.

I could see 450W power consumption in the OCCT synthetic benchmark. In 3D Mark TimeSpy max power observed was around 430W.

While running the above training (full fine-tuning) the max reported power was 390W - TRL/Torch was not able to fully utilize the GPU (actual utilization being at around 90%). This can be explained by not filling the entirety of VRAM (~20GB out of 24GB). And it could be fixed by increasing the batch size training param (and risking VRAM overflow into shared memory significantly slowing down the total training time). On some other occasions, I could see 410-420W from TRL running LORA fine-tuning.

Based on actual GPU power reported it seems that Afterburner power limits were calculated assuming 100% is 440W.

Gaming Performance Follows Suit

The diminishing performance returns of 4090 have been evaluated before. E.g. in this reddit post a user shared 3DMark FireStrike scores from RTX 4090. The outcomes are the same, you get 80% performance at a 50% power limit.

DEV Community

LLM Fine-tunig on RTX 4090: 90% Performance at 55% Power

Few Notes on RTX 4090 Power Levels

Gaming Performance Follows Suit

Top comments (0)

Read next

Accelerate AI Workloads with Amazon EC2 Trn1 Instances and AWS Neuron SDK

Brain-Inspired Method Cuts Neural Networks by 90% Without Losing Accuracy

How Machine Learning Models Learn: A Journey from Basics to Foundation Models (2)

ECCV 2024 Redux: Fast and Photo-realistic Novel View Synthesis from Sparse Images