Omkar Ajnadkar

Posted on • Originally published at Medium on

Predicting customer churn in banking using ANN

Dataset

The dataset ‘Churn_Modelling.csv’ contains records of 10,000 customers of a bank, with the following columns:

  1. RowNumber
  2. CustomerId
  3. Surname
  4. CreditScore
  5. Geography
  6. Gender
  7. Age
  8. Tenure
  9. Balance
  10. NumOfProducts
  11. HasCrCard
  12. IsActiveMember
  13. EstimatedSalary
  14. Exited

Using columns 1 to 13, we want to predict whether the customer will exit or not, which is column 14.
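A minimal sketch of that feature/target separation, using only the standard library. The single data row is made up for illustration and is not taken from Churn_Modelling.csv, and the variable names are hypothetical.

```python
import csv
import io

# Header as listed above; the one data row is invented for illustration.
SAMPLE_CSV = """RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
1,10000001,Doe,650,France,Female,40,3,50000.0,1,1,1,75000.0,0
"""

rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
header, data = rows[0], rows[1:]

# Columns 1-13 are the candidate inputs, column 14 (Exited) is the target.
feature_names = header[:13]
target_name = header[13]

X = [row[:13] for row in data]          # inputs, still as raw strings
y = [int(row[13]) for row in data]      # 1 = customer exited, 0 = stayed

print(feature_names)
print(target_name, y)
```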

Data Preprocessing

  • Removing unnecessary features
  • Label Encoder
  • One Hot Encoder
  • Train Test Split
  • Standard Scaler
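To make the five steps concrete, here is a dependency-free sketch of what each one computes. In practice you would more likely reach for scikit-learn's LabelEncoder, OneHotEncoder, train_test_split and StandardScaler; this version only illustrates the idea. The rows are made up, and several choices are assumptions rather than details from the article: Geography taking three values, one dummy column being dropped, and a 20% test split.

```python
import random
import statistics

# Made-up rows for illustration (target column already removed):
# RowNumber, CustomerId, Surname, CreditScore, Geography, Gender, Age, Tenure,
# Balance, NumOfProducts, HasCrCard, IsActiveMember, EstimatedSalary
raw_rows = [
    [1, 10000001, "Doe", 650, "France",  "Female", 40, 3,  50000.0, 1, 1, 1,  75000.0],
    [2, 10000002, "Roe", 700, "Spain",   "Male",   35, 5,      0.0, 2, 0, 1,  60000.0],
    [3, 10000003, "Poe", 580, "Germany", "Male",   52, 8, 120000.0, 1, 1, 0,  90000.0],
    [4, 10000004, "Moe", 720, "France",  "Male",   29, 2,  30000.0, 2, 1, 1,  40000.0],
    [5, 10000005, "Loe", 610, "Germany", "Female", 47, 7,  95000.0, 3, 0, 0, 110000.0],
]

# Assumed category lists; the first Geography value acts as the dropped baseline.
GENDERS = ["Female", "Male"]
GEOGRAPHIES = ["France", "Germany", "Spain"]

def preprocess(row):
    # 1. Remove identifier-like features (RowNumber, CustomerId, Surname).
    credit, geo, gender, *rest = row[3:]
    # 2. Label encoding: map each Gender string to an integer.
    gender_code = GENDERS.index(gender)
    # 3. One-hot encoding: one indicator per Geography, baseline column dropped.
    geo_dummies = [1 if geo == g else 0 for g in GEOGRAPHIES[1:]]
    return [float(v) for v in geo_dummies + [credit, gender_code] + rest]

X = [preprocess(r) for r in raw_rows]

# 4. Train/test split: shuffle the indices and hold out 20% (assumed ratio).
rng = random.Random(0)
indices = list(range(len(X)))
rng.shuffle(indices)
n_test = max(1, round(0.2 * len(X)))
test_idx, train_idx = indices[:n_test], indices[n_test:]
X_train = [X[i] for i in train_idx]
X_test = [X[i] for i in test_idx]

# 5. Standard scaling: mean and std are computed on the training rows only.
columns = list(zip(*X_train))
means = [statistics.fmean(c) for c in columns]
stds = [statistics.pstdev(c) or 1.0 for c in columns]  # avoid division by zero

def scale(rows):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

X_train_scaled = scale(X_train)
X_test_scaled = scale(X_test)
print(len(X_train_scaled[0]))  # number of input features per row
```

Under these assumptions each row ends up with 11 values: 2 Geography indicators, the encoded Gender, and the 8 remaining numeric columns.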

Simple ANN Model using Keras

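Create an ANN with 4 layers in total, counting the 11 input features as a layer of their own. In Keras terms this is three Dense layers:

  • First layer with 11 input features and 6 output features (units)
  • Hidden layer with 6 output features
  • Final layer with 1 output feature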

Activation Functions

  • 1st layer: ReLU
  • 2nd layer: ReLU
  • 3rd layer: Sigmoid
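To see what these layers and activations compute for a single customer, here is a small pure-Python sketch of one forward pass. The weights are random placeholders (a trained network would have learned values), the input is a made-up vector of 11 scaled features, and all function names are hypothetical.

```python
import math
import random

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out, rng):
    """Untrained placeholder weights; training would replace these values."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
layer_sizes = [(11, 6), (6, 6), (6, 1)]   # (inputs, units) for each layer
activations = [relu, relu, sigmoid]
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in layer_sizes]

# One made-up, already scaled customer with 11 features.
x = [rng.gauss(0.0, 1.0) for _ in range(11)]
for (weights, biases), activation in zip(layers, activations):
    x = dense(x, weights, biases, activation)

churn_probability = x[0]   # the sigmoid keeps this strictly between 0 and 1
n_parameters = sum(n_in * n_out + n_out for n_in, n_out in layer_sizes)
print(churn_probability, n_parameters)
```

The sigmoid on the last layer is what lets the single output be read as a probability of exiting. Counting weights and biases, a network of this shape has 121 trainable parameters.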

Hyperparameters

  • optimizer: adam
  • loss: binary_crossentropy
  • metrics: accuracy
  • batch_size: 10
  • epochs: 100
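One possible way to express the layers, activations and hyperparameters above in Keras. Treat it as a sketch rather than the original code: it assumes the Keras bundled with TensorFlow (`tensorflow.keras`), and `build_classifier`, `X_train` and `y_train` are hypothetical names.

```python
from tensorflow import keras

def build_classifier(optimizer="adam"):
    """Build the 11 -> 6 -> 6 -> 1 network described above (sketch)."""
    model = keras.Sequential([
        keras.Input(shape=(11,)),                     # 11 preprocessed features
        keras.layers.Dense(6, activation="relu"),     # 1st layer
        keras.layers.Dense(6, activation="relu"),     # 2nd (hidden) layer
        keras.layers.Dense(1, activation="sigmoid"),  # churn probability
    ])
    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier()
model.summary()

# With scaled training data available (hypothetical X_train, y_train):
# model.fit(X_train, y_train, batch_size=10, epochs=100)
```

Wrapping the model in a function that takes the optimizer as an argument also makes it easier to reuse when trying different hyperparameters later.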

Accuracy (subject to change)

  • Training Set: 0.8610
  • Testing Set: 0.86
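These figures are the fraction of customers whose predicted label matches Exited. A minimal sketch of that calculation, assuming the sigmoid outputs are turned into labels with a 0.5 threshold; the probabilities and labels below are made up and are not model output.

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Share of predictions that agree with the true labels."""
    predictions = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

probs = [0.10, 0.80, 0.35, 0.60, 0.05]   # made-up predicted probabilities
labels = [0, 1, 1, 0, 0]                 # made-up true Exited values
print(accuracy(probs, labels))           # 3 of 5 correct
```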

Improving ANN

  • Use k-fold cross-validation: split the training set into, say, 10 parts, train on 9 of them and test on the remaining one, rotating the held-out part each time. Averaging over the parts reduces the fluctuation in accuracy you see every time you run the code.
  • Use dropout with a chosen rate to decrease overfitting on the training set. Applying it to this dataset gives an accuracy of 0.8321, which suggests the model is now less overfitted to this training set.
  • Use GridSearchCV to find the best parameters automatically. Enter all the hyperparameter values you want to test your network on; after trying every combination, it reports the best accuracy found and the parameters that produced it. I tried the following parameters:

batch_size: 25, 32

epochs: 100, 500

optimizer: adam, rmsprop
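Conceptually, an exhaustive grid search tries every combination of the values listed above and keeps the best-scoring one. A minimal sketch of that loop: `evaluate` is a hypothetical placeholder standing in for training and cross-validating the network, and the numbers it returns are arbitrary, not real results.

```python
from itertools import product

param_grid = {
    "batch_size": [25, 32],
    "epochs": [100, 500],
    "optimizer": ["adam", "rmsprop"],
}

def evaluate(params):
    """Hypothetical stand-in: a real version would train the ANN with
    `params` and return its mean cross-validated accuracy."""
    # Arbitrary deterministic number so the sketch runs without training.
    return (sum(len(str(v)) for v in params.values()) % 7) / 10

names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

best_params = max(candidates, key=evaluate)
print(len(candidates), "combinations tried")
print("best (by placeholder score):", best_params)
```

With two values for each of the three parameters there are 8 combinations, and each one means training the network from scratch at least once.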

After waking up in the morning (yes, it takes a long time…), this is what I found:

best_parameters

  • batch_size: 25
  • epochs: 500
  • optimizer: rmsprop

accuracy: 0.8545
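Going back to the first two bullets of this section, both ideas can be sketched without any library. The sample count, the seed and the dropout rate of 0.1 are made-up values, and the function names are hypothetical. The dropout shown is the "inverted" variant, which rescales the surviving units during training.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate` and rescale
    the survivors so the expected activation stays the same (training only)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# 10 rounds: each trains on 9 parts and validates on the remaining part.
splits = list(k_fold_indices(1000, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))

# Randomly silence some of the 6 hidden activations (made-up values).
rng = random.Random(1)
print(dropout([0.5, 1.2, 0.0, 0.8, 2.0, 0.3], rate=0.1, rng=rng))
```

Reporting the mean and spread of the 10 validation accuracies, rather than a single test score, is what makes the result less sensitive to one particular split.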

Further Improvements

You can further improve this model by changing hyperparameters and trying other ranges of values in GridSearchCV. But it is important to note that, as you increase the number of parameters in GridSearchCV, your training time will also increase.
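A rough way to see how quickly the cost grows is to count training runs: the number of combinations multiplied by the number of cross-validation folds. The sketch below assumes 10 folds, and the wider grid (including the `dropout_rate` entry) is a made-up example, not something tried in this article.

```python
from math import prod

def count_fits(param_grid, cv_folds):
    """Number of training runs an exhaustive grid search needs."""
    return prod(len(values) for values in param_grid.values()) * cv_folds

original_grid = {
    "batch_size": [25, 32],
    "epochs": [100, 500],
    "optimizer": ["adam", "rmsprop"],
}
# Hypothetical wider grid: one more batch size plus a new parameter.
wider_grid = {
    "batch_size": [16, 25, 32],
    "epochs": [100, 500],
    "optimizer": ["adam", "rmsprop"],
    "dropout_rate": [0.1, 0.2, 0.3],
}

print(count_fits(original_grid, cv_folds=10))
print(count_fits(wider_grid, cv_folds=10))
```

Under these assumptions the count goes from 80 training runs to 360, and the 500-epoch runs dominate the total time.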

Code


Send a pull request with any suggestions or corrections…
