Using a Random Forest Model for Fraud Detection in Confidential Computing

#cloud #security #machinelearning #datascience

Introduction

In this blog we develop a random forest model for detecting credit card fraud and deploy it on Cape Privacy's confidential computing platform to perform secure inference. Cape ensures that both the model and the examined credit card transactions remain private during inference.

Credit Card Fraud

Credit card fraud is a form of identity theft, which involves using another person’s credit card to make purchases or withdraw cash advances without the card owner's consent. Fraudsters may either obtain your physical credit card, or just steal your credit card information such as the account number, cardholder name, and CVV code and use it to take over your account.

In fact, according to the Federal Trade Commision credit card fraud has become the most frequent type of identity theft in 2022 [1]. The good news is that most major credit card providers such Visa, Mastercard or American Express offer $0 liability protection to their customers, which means that individuals whose credit card information has been stolen aren’t personally liable for fraudulent transactions. However, having your identity stolen and going through the process of mitigating the repercussions of it is still no fun. Therefore, timely credit card fraud detection is paramount for protecting credit cardholders against identity theft and for mitigating financial losses that the credit card industry suffers due to fraud.

Fraud Detection

To minimize their losses and ensure their customers satisfaction, credit card companies employ a variety methods to prevent and detect credit card fraud. Modern solutions leverage machine learning to timely detect suspicious transactions to stop fraud [2].

Privacy Intrusion in Fraud Detection

As a credit card holder, I want my account to be maximally protected against credit card fraud, but how do I feel about my credit card transactions data being collected and processed? What if that data is not securely handled and leaks? This is where Cape’s confidential computing platform can help.

Confidential Computing with Cape

Cape’s confidential computing platform based on secure AWS Nitro enclaves allows its users to process data in a privacy preserving manner in the cloud. A secure enclave is an environment that provides for isolation of code and data from OS using hardware-based CPU-level isolation. Secure enclaves offer a process called attestation to verify that the CPU and apps running are genuine and unaltered. Therefore, secure enclaves enable confidential data processing and ensure the privacy of both the code and data within the enclave.

In addition to the platform itself, Cape also provides a CLI that enables its users to easily encrypt their input data, and deploy and run serverless functions with easy commands: cape encrypt, cape deploy, and cape run. Additionally, Cape also provides two SDKs: pycape and cape-js, which allow for using cape within Python and JavaScript programs respectively.

Secure Credit Card Fraud Inference with Cape

In this blog we will train a credit card fraud detection model with Sklearn’s random forest classifier and deploy it on Cape’s secure cloud platform to ensure that the model and input data processed during inference remain confidential.

Train a Model

First, we train a simple random forest classifier and save the model as follows.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from joblib import dump, load

data = pd.read_csv("creditcard.csv")
X = data.drop(['Class'], axis=1)
Y = data["Class"]
X_data = X.values
Y_data = Y.values
X_train, X_test, Y_train, Y_test = train_test_split(X_data, Y_data, test_size = 0.2, random_state = 42)

model = RandomForestClassifier()
model.fit(X_train, Y_train)

y_pred = model.predict(X_test)
print(accuracy_score(Y_test, y_pred))

# save model
dump(model, 'model.joblib')

The above model has a testing accuracy of 99.9%.

Create an Inference Cape Function

The code snippet below shows our app.py. First, we import the libraries we need for our app.

from joblib import load
import pandas as pd
import sklearn

Then we define a cape handler function, which accepts a credit card transaction as input, invokes the previously trained model and outputs a prediction indicating if the transaction is legitimate or fraudulent.

Please note that any function that is deployed with Cape needs to be named app.py, where app.py needs to contain a function called cape_handler() that takes the input that the function processes and returns the results.

def cape_handler(input_data):
    csv = input_data.decode("utf-8")
    csv = csv.replace("\\t", ",").replace("\\n", "\n")
    f = open("data.csv", "w")
    f.write(csv)
    f.close()

    data = pd.read_csv("data.csv")
    clf = load('model.joblib')
    y_pred = clf.predict(data)
    if y_pred == 0:
        return "This credit card transaction is legitimate"
    else:
        return "This credit card transaction is fraudulent"

Deploy with Cape

To deploy our function with Cape, we first need to create a folder that contains all needed dependencies. In case of this app, the deployment folder needs to contain the app.py above and also the trained model, which we saved as model.joblib. Additionally, because the app.py program imports some external libraries (in this case: sklearn, pandas, and joblib), the deployment folder needs to have those as well. We can save a list of those dependencies into a requirements.txt file and run docker to install those dependencies into our deployment folder called app as follows:

sudo docker run -v `pwd`:/build -w /build --rm -it python:3.9-slim-bullseye pip install -r requirements.txt --target ./app/

Now that we have everything ready, we can log into Cape:

cape login

Your CLI confirmation code is: GZPN-KHMT
Visit this URL to complete the login process: https://login.capeprivacy.com/activate?user_code=GZPN-KHMT
Congratulations, you're all set!

And after that we can deploy the app:

cape deploy app

Deploying function to Cape ...
Success! Deployed function to Cape.
Function ID ➜  YdVYPwWkTw2TmP6u7JEF6i
Function Checksum ➜  26ebbba7e81391b9a40ea35f8b29eb969726417897dbfbe5d069973344a5e831

Run with Cape

Now that the app is deployed, we can pass it an input and invoke it with cape run:

cape run YdVYPwWkTw2TmP6u7JEF6i -f fraudulent_transaction.csv --insecure -u https://k8s-cape-enclaver-750003af11-e3080498c852b366.elb.us-east-1.amazonaws.com

This credit card transaction is fraudulent

cape run YdVYPwWkTw2TmP6u7JEF6i -f legitimate_transaction.csv --insecure -u https://k8s-cape-enclaver-750003af11-e3080498c852b366.elb.us-east-1.amazonaws.com

This credit card transaction is legitimate

Conclusion

In this blog we discussed the importance of timely credit card fraud detection, which has become the number one most common form of identity theft in 2022 [1]. Modern fraud detection tools leverage machine learning models, which requires a large scale collection of credit card transaction data. The challenge is to ensure that this sensitive data is handled in a secure manner to prevent data leaks that bad actors can take advantage of. To ensure that the credit card transactions that are being examined remain private during inference, we leveraged Cape’s confidential computing platform.

Specifically, we deployed a trained random forest classifier to Cape’s secure enclave from where we ran inference on some example transactions. Go to Cape’s 5 Minute Quickstart to try out Cape yourself!

[1] https://public.tableau.com/app/profile/federal.trade.commission/viz/TheBigViewAllSentinelReports/TopReports
[2] https://www.google.com/url?q=https://journalofbigdata.springeropen.com/articles/10.1186/s40537-022-00573-8&sa=D&source=docs&ust=1667852611321463&usg=AOvVaw3UX5MKe17ZhDxhabVjhQie