Introducing LoadTime: Bringing Clarity to HuggingFace Model Loading Times

Are you tired of not knowing how long it will take to load a large pre-trained language model, such as one from HuggingFace, into GPU or CPU memory? Do you find yourself gazing blankly at a static screen, wondering when loading will finally finish? Let me present the solution: LoadTime.

LoadTime is a library that I've developed to tackle this very problem. It provides a progress bar during the memory loading process, bringing an end to your uncertainty. Now, let's dive a little deeper into how it works and what it has to offer.

While HuggingFace shows progress bars while a model's files are being downloaded, it leaves you in the dark during memory loading. This is where LoadTime comes into play. The mechanism is simple but effective: during the initial load, LoadTime caches the total loading time. When the same model is loaded again, LoadTime uses that cached time as a reference to display a progress bar.
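
To make that mechanism concrete, here is a minimal sketch of the caching idea. This is not LoadTime's actual implementation; the cache file name, the helper function, and the text-only progress display are all assumptions for illustration:

import json
import os
import threading
import time

CACHE_FILE = "load_times.json"  # hypothetical cache location


def load_with_progress(name, fn):
    """Run fn(), reporting progress against a previously cached duration."""
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            cache = json.load(f)

    expected = cache.get(name)  # None the first time this model is loaded
    done = threading.Event()

    def report():
        start = time.time()
        while not done.wait(timeout=0.5):  # wake every 0.5 s until loading ends
            pct = min(99, int(100 * (time.time() - start) / expected))
            print(f"\r{name}: {pct}%", end="", flush=True)
        print(f"\r{name}: 100%")

    if expected:
        reporter = threading.Thread(target=report)
        reporter.start()

    start = time.time()
    result = fn()  # the actual (slow) loading call
    done.set()
    if expected:
        reporter.join()

    cache[name] = time.time() - start  # cache (or refresh) the measured duration
    with open(CACHE_FILE, "w") as f:
        json.dump(cache, f)
    return result

On the first run nothing is known yet, so no bar is shown; every subsequent run displays a live percentage against the cached duration.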

Unlike other progress display libraries such as tqdm, which need to know the total count up front to show a percentage, LoadTime is designed for situations where the total is unknown. It provides real-time updates based on past load times, making it a versatile tool for this kind of task.
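
For comparison, here is the kind of loop tqdm is built for, where the total is known before the work starts (the sleep is a stand-in for real work):

import time
from tqdm import tqdm

for _ in tqdm(range(100)):  # tqdm can show a percentage because the total (100) is known
    time.sleep(0.01)

Loading a model into memory, by contrast, is a single opaque call with nothing to iterate over, which is exactly the gap LoadTime fills.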

Example

First, install the package from PyPI:

pip install loadtime

Here is a simple example of how to use the LoadTime package. To use it, you simply wrap the model-loading call with LoadTime. It's as easy as that:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from loadtime import LoadTime

model_path = "togethercomputer/RedPajama-INCITE-Chat-3B-v1"


# Instead of calling from_pretrained() directly like this:
# model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)

# wrap the loading call with LoadTime, which displays the progress bar:
model = LoadTime(name=model_path,
                 fn=lambda: AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16))()

tokenizer = AutoTokenizer.from_pretrained(model_path)  # Important: load the tokenizer after the model.

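Once loaded, the object returned by LoadTime is the model itself, so the usual transformers workflow applies. As a quick check, something like the following should work (the prompt and generation settings are arbitrary, and you may want to move the model to a GPU first):

prompt = "Q: What does LoadTime do?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))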

Thanks.
