Save/Load TensorFlow & sklearn pipelines from local and AWS S3 using Joblib

#tensorflow #sklearn #aws #machinelearning

After struggling with this for a while, I finally found a simple way to do it.

IMPORTANT:

I've discovered that if you want to save a model/pipeline and load it again later without hitting a ModuleNotFoundError, you need to make sure the model is built in the same place that it is saved. In the case of a neural network, this means compiling, fitting, and saving in the same module. The reason is that joblib is built on pickle, which records classes and functions by their import path rather than copying their code, so that path has to resolve again at load time. This has been a big headache for me, so I hope you can avoid it.
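A minimal sketch of what gets stored, using only the standard library. `Fraction` is just a stand-in for a model or custom transformer class; it is not part of the original workflow.

```python
import pickle
from fractions import Fraction

obj = Fraction(1, 3)  # stand-in for a model or custom transformer
payload = pickle.dumps(obj)

cls = type(obj)
# The payload stores the class as "module name + class name", not its code.
has_module = cls.__module__.encode() in payload
has_name = cls.__qualname__.encode() in payload
print(has_module, has_name)

# Unpickling imports that module by name again. If the name cannot be
# imported in the loading process, the load fails (e.g. ModuleNotFoundError).
restored = pickle.loads(payload)
```

The same applies to your own classes and functions: whatever module path they had when the model was saved is the path the loader will try to import.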

We can write and read TensorFlow and sklearn models/pipelines using joblib.

Local Write / Read

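from pathlib import Path

import joblib

path = Path("<local path>")  # replace with the path to your model file

# WRITE: open in binary mode, since joblib writes bytes
with path.open("wb") as f:
    joblib.dump(model, f)

# READ: a freshly opened file already starts at position 0
with path.open("rb") as f:
    model = joblib.load(f)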
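joblib can also take the path directly and open the file itself, optionally compressing the output. Below is a small sketch of that variant. The helper names `save_model` and `load_model`, the `model.joblib` file name, and the `compress=3` level are my own choices for illustration, and a plain dict stands in for a fitted pipeline.

```python
import tempfile
from pathlib import Path

import joblib


def save_model(model, path, compress=3):
    """Write `model` to `path`; joblib opens and closes the file itself."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    joblib.dump(model, path, compress=compress)
    return path


def load_model(path):
    """Read back whatever was written by save_model."""
    return joblib.load(Path(path))


# Round trip, with a plain dict standing in for a fitted pipeline
with tempfile.TemporaryDirectory() as tmp:
    target = save_model({"weights": [0.1, 0.2]}, Path(tmp) / "models" / "model.joblib")
    restored = load_model(target)
```

Compression trades a bit of CPU time for a smaller file, which can matter for large models.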

We can do the same thing on AWS S3 using a boto3 client:

AWS S3 Write / Read

import tempfile
import boto3
import joblib

s3_client = boto3.client('s3')
bucket_name = "my-bucket"
key = "model.pkl"

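# WRITE: dump to a temporary file, rewind, then upload its bytes
with tempfile.TemporaryFile() as fp:
    joblib.dump(model, fp)
    fp.seek(0)  # rewind; dump() leaves the position at the end of the file
    s3_client.put_object(Body=fp.read(), Bucket=bucket_name, Key=key)

# READ: download into a temporary file, rewind, then load
with tempfile.TemporaryFile() as fp:
    s3_client.download_fileobj(Fileobj=fp, Bucket=bucket_name, Key=key)
    fp.seek(0)  # rewind; the download leaves the position at the end
    model = joblib.load(fp)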

# DELETE
s3_client.delete_object(Bucket=bucket_name, Key=key)
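If the model fits comfortably in memory, the temporary file can be swapped for an in-memory buffer. This is a sketch of that alternative, not the approach used above: `save_to_s3` and `load_from_s3` are hypothetical helper names, and they take the client as an argument so they are easy to test with a stub.

```python
import io

import joblib


def save_to_s3(s3_client, model, bucket, key):
    """Serialize `model` into memory and upload it as a single object."""
    buffer = io.BytesIO()
    joblib.dump(model, buffer)
    buffer.seek(0)  # rewind so the upload starts at the first byte
    s3_client.upload_fileobj(Fileobj=buffer, Bucket=bucket, Key=key)


def load_from_s3(s3_client, bucket, key):
    """Download the object into memory and deserialize it."""
    buffer = io.BytesIO()
    s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    buffer.seek(0)  # rewind before handing the buffer to joblib
    return joblib.load(buffer)
```

With the client and names defined earlier, usage would look like `save_to_s3(s3_client, model, bucket_name, key)` followed by `model = load_from_s3(s3_client, bucket_name, key)`.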
