nicolaspinocea for AWS Community Builders

Deploying an Xgboost model to AWS from locally developed artifacts, adding an inference pipeline

Some companies and clients need to deploy models in the cloud without retraining them in the AWS environment, since retraining can alter performance, change metrics, and ultimately fail to meet the original requirements.

This blog shows how to deploy an Xgboost model binary built locally by a developer, adding a post-processing layer through a SageMaker inference pipeline and deploying the result as an endpoint.

Xgboost algorithm

Tree-based ensemble methods frequently achieve good performance and offer interpretability of the variables they use, which makes them popular within the machine-learning community. Extreme Gradient Boosting (Xgboost) is a tree-based ensemble method that handles sparse data well, uses a minimal amount of resources, and is highly scalable. Xgboost is a supervised learning algorithm whose learning process is sequential: each new tree corrects the errors of the previous learners, which makes it an adaptive algorithm, and gradient descent is used to minimize the loss. The next figure shows how gradient boosting works:

[Figure: how gradient boosting works in Xgboost]
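
To make the starting point concrete, here is a minimal sketch of how a local model such as model_client.pkl might have been trained. The dataset and hyperparameters below are hypothetical; in the scenario of this post, the real artifact comes from the client.

import joblib
import xgboost
from sklearn.datasets import make_classification

# Hypothetical training data; the real model_client.pkl comes from the client
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Sequential boosting: each of the 100 trees corrects the errors of the previous ones
model = xgboost.XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(X, y)

# Persist the model the same way the client would
joblib.dump(model, 'model_client.pkl')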

Adding the booster from .pkl to tar.gz

This section covers the key process and the central theme of this post. The fundamental artifact produced by training a tree-based model is the booster. When we train and save a model with the Xgboost library, a series of attributes from the modeling stage is saved along with it, and these contribute nothing for inference. By extracting only the booster from the saved model, we can hand it to the pre-built AWS Xgboost container and deploy and use the solution.

import joblib
import tarfile

import xgboost  # required so joblib can unpickle the Xgboost model class

# Load the locally trained model and extract only its booster
model_pkl = joblib.load('model_client.pkl')
booster = model_pkl.get_booster()

# The pre-built AWS Xgboost container expects the file to be named 'xgboost-model'
booster.save_model('xgboost-model')

# Package the booster as model.tar.gz, the artifact format SageMaker expects
with tarfile.open('model.tar.gz', 'w:gz') as fp:
    fp.add('xgboost-model')

Create the model in SageMaker

The first step is to indicate the URI of the algorithm's container. In this case, we use the pre-built container provided by AWS:

import sagemaker
from sagemaker.session import Session

region = Session().boto_region_name
xgboost_container = sagemaker.image_uris.retrieve("xgboost", region, "1.0-1")

The next step is to create a model with the SageMaker SDK, providing the S3 location of the artifacts and the algorithm container:

from sagemaker.model import Model

xgboost_model = Model(
    image_uri=xgboost_container,
    model_data='s3://file_path_in_s3/model.tar.gz',
    role=sagemaker.get_execution_role(),
)

Setting up the inference pipeline

The next step is to set up the processing of the model's output. For this, we create a post-processing model through SKLearnModel.

Post-processing

import sagemaker
from sagemaker.sklearn.model import SKLearnModel

FRAMEWORK_VERSION = '0.23-1'
entry_point = 'postprocessing.py'

role = sagemaker.get_execution_role()
sagemaker_session = sagemaker.Session()

postprocessing_model = SKLearnModel(
    model_data='s3://file_path_in_s3/model.tar.gz',
    role=role,
    entry_point=entry_point,
    framework_version=FRAMEWORK_VERSION,
    sagemaker_session=sagemaker_session,
)

The entry point is a Python file containing functions that manage the model's output (strings), associating each score with a class label. These functions must account for whether the problem is binary or multi-class, and for the context of the project. The following is an extract of this code:


import cgi
import json


def output_fn(prediction, accept):
    """Serialize the prediction into the format requested by the client."""
    accept, params = cgi.parse_header(accept.lower())

    if accept == "application/json":
        results = []
        classes = prediction['classes']
        scores = prediction['scores']
        # Binary case: prepend the complementary probability so that
        # every class has its own score
        scores.insert(0, 1 - scores[0])
        rows = [scores]

        for row_scores in rows:
            row = []
            for class_, class_score in zip(classes, row_scores):
                row.append({
                    'id': class_,
                    'score': class_score,
                })
            results.append(row)

        json_output = {"context": results[0]}
        return json.dumps(json_output)
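
For completeness, here is a hedged sketch of the companion handlers that a file like postprocessing.py would also define. The function names follow the standard SageMaker scikit-learn serving conventions (model_fn, input_fn, predict_fn), but the parsing logic below is an assumption, not the original code:

def model_fn(model_dir):
    # Pure post-processing: no model artifact needs to be loaded
    return None


def input_fn(input_data, content_type):
    # The previous (Xgboost) container's output arrives as the request body;
    # assumption: it is a plain-text score such as "0.75837"
    return input_data


def predict_fn(input_data, model):
    # Hypothetical parsing: build the dict that output_fn above expects
    score = float(input_data)
    return {'classes': ['class-0', 'class-1'], 'scores': [score]}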

Pipeline model

Both models are then chained into a single PipelineModel; at inference time, the containers execute in the order given, each one's output becoming the next one's input:

from sagemaker.pipeline import PipelineModel

model_name = 'name-model'
inference_model = PipelineModel(
    name=model_name,
    role=sagemaker.get_execution_role(),
    models=[
        xgboost_model,
        postprocessing_model,
    ])

Deploy and test the endpoint

Finally, we deploy the models behind a single endpoint, where the containers run sequentially, producing the output according to the configuration designed by the user.

endpoint_name = 'xgboost-inference-pipeline'  # choose any endpoint name

inference_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
    endpoint_name=endpoint_name,
)
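
To test the deployment, the endpoint can be invoked with the boto3 SageMaker runtime client. A minimal sketch, assuming the endpoint name defined above and a hypothetical CSV feature vector:

import boto3

runtime = boto3.client('sagemaker-runtime')

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',     # input format expected by the Xgboost container
    Body='0.5,1.2,3.4,0.7',     # hypothetical feature vector
)
print(response['Body'].read())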

The following is the response from invoking the endpoint, showing the output produced by the post-processing container:

b'{"context": [{"id": "class-0", "score": 0.24162}, {"id": "class-1", "score": 0.75837}]}'

Conclusion and discussion

Using the steps listed above, you can deploy a model to the AWS Cloud while preserving the consistency and performance of a model built locally. A natural next step along this path is to work on the preprocessing used by the algorithm and add a preprocessing layer to the inference pipeline, configuring that stage according to your needs, as sketched below.
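
As an illustration of that extension, here is a hedged sketch: preprocessing_model is a hypothetical SKLearnModel, built the same way as postprocessing_model but with an entry point that transforms raw features before they reach the Xgboost container.

# Hypothetical: a preprocessing SKLearnModel placed at the front of the chain
extended_model = PipelineModel(
    name='name-model-with-preprocessing',
    role=sagemaker.get_execution_role(),
    models=[
        preprocessing_model,   # hypothetical feature-transformation step
        xgboost_model,
        postprocessing_model,
    ])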
